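Preface
This article outlines how to deploy each Prometheus component; for details on what each component does, see the article dedicated to it.
Design:
Overview:
The exporters in the collection zone each act as the collector for one function and gather their own data at regular intervals. Consul serves as the service registry: every monitoring target is registered in Consul so that Prometheus can discover it dynamically. Prometheus pulls the metrics, evaluates them against the rules, stores the data in its TSDB (the default; this design uses InfluxDB as remote storage), and passes abnormal metrics to Alertmanager. Alertmanager is configured with the alert channel (WeChat in this design), and Grafana displays the monitoring metrics. For the detailed functions of each component, see the component details in step 4.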
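To make the registration step more concrete, here is a minimal sketch of the JSON body that Consul's agent endpoint `PUT /v1/agent/service/register` accepts. The helper name `consul_service_json`, the service ID and name, and the Consul agent address 127.0.0.1:8500 are assumptions for illustration; they are not taken from the setup described here.

```shell
# Print a minimal Consul service definition as JSON.
# $1 = service ID, $2 = service name, $3 = target address, $4 = target port
consul_service_json() {
  printf '{"ID": "%s", "Name": "%s", "Address": "%s", "Port": %s}\n' "$1" "$2" "$3" "$4"
}

# Example (assumes a Consul agent listening on 127.0.0.1:8500):
#   consul_service_json node-192.168.14.160 node_exporter 192.168.14.160 9100 \
#     | curl -s -X PUT --data @- http://127.0.0.1:8500/v1/agent/service/register
```

Once a target is registered this way, a `consul_sd_configs` block in prometheus.yml can pick it up without editing the Prometheus configuration for every new host.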
2. Setting up the Prometheus monitoring system
1. Preparing the installation environment
1.1 Disable SELinux
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
setenforce 0
1.2 Install the Go environment
Download the Go package and extract it to /usr/local
tar -xvf go1.11.5.linux-amd64.tar.gz -C /usr/local/
Configure the environment variable
cat >>/etc/profile<<'EOF'
export PATH=$PATH:/usr/local/go/bin
EOF
source /etc/profile
go version
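1.3 Host time, time zone, and system locale
Skip this step if an NTP service is already in place
Set the time zone
ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
Set the system locale
echo 'LANG="en_US.UTF-8"' >> /etc/profile && source /etc/profile
Configure NTP time synchronization on the host
yum -y install ntp
echo 'server ntp1.aliyun.com' >> /etc/ntp.conf
echo 'server ntp2.aliyun.com' >> /etc/ntp.conf
systemctl enable ntpd && systemctl start ntpd
If the machines cannot reach the Internet, install the NTP service on one server and configure the other servers as its clients.
Install the NTP service (download the ntp rpm package); omitted here
Configure the NTP server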
vi /etc/ntp.conf
driftfile /var/lib/ntp/drift
restrict default kod nomodify notrap nopeer noquery
restrict -6 default kod nomodify notrap nopeer noquery
restrict 127.0.0.1
restrict -6 ::1
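# Allow machines in the 192.168.6.0/24 network to synchronize time
restrict 192.168.6.0 mask 255.255.255.0 nomodify notrap
# Addresses of the upstream time servers
server 210.72.145.44 prefer # National Time Service Center of China
server 1.cn.pool.ntp.org
# Allow the upstream servers to adjust the local clock
restrict 210.72.145.44 nomodify notrap noquery
restrict 1.cn.pool.ntp.org nomodify notrap noquery
# Fall back to the local clock as the time source when the external servers are unavailable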
server 127.127.1.0
fudge 127.127.1.0 stratum 10
includefile /etc/ntp/crypto/pw
keys /etc/ntp/keys
Configure the NTP client
Set up a crontab entry that runs ntpdate <NTP server IP> every minute.
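A crontab entry for the client could look like the sketch below. It assumes that the intended command is `ntpdate` installed at /usr/sbin/ntpdate, and the server address 192.168.6.10 is a made-up example inside the 192.168.6.0/24 network allowed above; the helper name `ntp_cron_line` is likewise only illustrative.

```shell
# Print a crontab entry that syncs against the given NTP server every minute.
# $1 = IP of the internal NTP server (192.168.6.10 below is a made-up example)
ntp_cron_line() {
  printf '* * * * * /usr/sbin/ntpdate %s >/dev/null 2>&1\n' "$1"
}

# To install it for the current user, append the line to the existing crontab:
#   (crontab -l 2>/dev/null; ntp_cron_line 192.168.6.10) | crontab -
```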
3. Installing Prometheus
- Download and deploy
# Download
[root@prometheus src]# cd /usr/local/src/
[root@prometheus src]# wget https://github.com/prometheus/prometheus/releases/download/v2.0.0/prometheus-2.0.0.linux-amd64.tar.gz
# Deploy to /usr/local/
# Prometheus does not need to be compiled; the extracted directory contains the configuration file and the executable
[root@prometheus src]# tar -zxvf prometheus-2.0.0.linux-amd64.tar.gz -C /usr/local/
[root@prometheus src]# cd /usr/local/
[root@prometheus local]# mv prometheus-2.0.0.linux-amd64/ prometheus/
mkdir /etc/prometheus
mkdir /var/lib/prometheus
cd /usr/local/prometheus
cp prometheus /usr/sbin/
cp promtool /usr/sbin/
cp prometheus.yml /etc/prometheus/
cp -r consoles console_libraries /etc/prometheus/
groupadd prometheus
useradd -g prometheus -s /sbin/nologin prometheus
chown prometheus:prometheus /usr/sbin/prometheus /usr/sbin/promtool
chown prometheus:prometheus /etc/prometheus /var/lib/prometheus/ -R
# Verify
[root@prometheus prometheus]# prometheus --version
# Enable start at boot
[root@prometheus ~]# touch /usr/lib/systemd/system/prometheus.service
[root@prometheus ~]# chown prometheus:prometheus /usr/lib/systemd/system/prometheus.service
[root@prometheus ~]# vim /usr/lib/systemd/system/prometheus.service
[Unit]
Description=Prometheus
Documentation=https://prometheus.io/
After=network.target
[Service]
# With Type set to notify, the service keeps restarting
Type=simple
User=prometheus
ExecStart=/usr/sbin/prometheus \
--config.file=/etc/prometheus/prometheus.yml \
--storage.tsdb.path=/var/lib/prometheus \
--web.console.templates=/etc/prometheus/consoles \
--web.console.libraries=/etc/prometheus/console_libraries
Restart=on-failure
[Install]
WantedBy=multi-user.target
[root@prometheus ~]# systemctl enable prometheus
[root@prometheus ~]# systemctl start prometheus
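Open the port
firewall-cmd --zone=public --add-port=9090/tcp --permanent
firewall-cmd --reload
Configure log rotation for the Prometheus query log (this assumes query_log_file: /var/log/prometheus.log is set in the configuration)
cat /etc/logrotate.d/prometheus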
/var/log/prometheus.log {
create 0644 prometheus prometheus
daily
rotate 7
missingok
notifempty
dateext
compress
sharedscripts
postrotate
ps -ef |grep prometheus.yml |grep -v grep |awk '{print $2}' |xargs kill -HUP
endscript
}
4. Installing node_exporter (and wmi_exporter)
CentOS 6:
tar -zxvf node_exporter-0.18.1.linux-arm64.tar.gz -C /usr/local/
mv node_exporter-0.18.1.linux-arm64 node_exporter
cd /usr/local/node_exporter/
groupadd prometheus
useradd -g prometheus -s /sbin/nologin prometheus
chown -R prometheus:prometheus /usr/local/node_exporter/
Startup script:
cat start-node_exporter.sh
cd /usr/local/node_exporter
nohup ./node_exporter >>/var/log/node_exporter.log 2>&1 &
Log rotation (if needed):
cat /etc/logrotate.d/node_exporter
/var/log/node_exporter.log {
create 644 root root
daily
rotate 7
missingok
notifempty
dateext
sharedscripts
postrotate
ps -ef |grep node_exporter |grep -v grep |awk '{print $2}' |xargs kill -HUP
endscript
}
CentOS 7:
Download: https://github.com/prometheus/node_exporter/releases
Install
tar -zxvf node_exporter-0.18.1.linux-arm64.tar.gz -C /usr/local/
mv node_exporter-0.18.1.linux-arm64 node_exporter
Create the group and user
groupadd prometheus
useradd -g prometheus -s /sbin/nologin prometheus
chown -R prometheus:prometheus /usr/local/node_exporter/
Configure the systemd unit for start at boot
cat /usr/lib/systemd/system/node_exporter.service
[Unit]
Description=node_exporter
Documentation=https://prometheus.io/
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/node_exporter/node_exporter
Restart=on-failure
[Install]
WantedBy=multi-user.target
systemctl enable node_exporter.service
systemctl start node_exporter.service
View the collected metrics:
http://192.168.14.160:9100/metrics
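The endpoint returns plain text in the Prometheus exposition format, one sample per line. As a quick sanity check from the command line, a small filter such as the following can pull out a single value; the helper name `metric_value` is an assumption for illustration, and it only handles metrics without labels, such as `node_load1`.

```shell
# Extract the value of an unlabelled metric from Prometheus exposition text.
# $1 = metric name; the metrics text is read from stdin.
metric_value() {
  awk -v name="$1" '$1 == name { print $2 }'
}

# Example against the exporter above:
#   curl -s http://192.168.14.160:9100/metrics | metric_value node_load1
```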
Windows
Simply install the wmi_exporter MSI package
Open the firewall port
firewall-cmd --zone=public --add-port=9100/tcp --permanent
firewall-cmd --reload
5. Installing blackbox_exporter and implementing URL probing
tar -zxvf blackbox_exporter-0.16.0.linux-amd64.tar.gz -C /usr/local/
cd /usr/local/
mv blackbox_exporter-0.16.0.linux-amd64 blackbox_exporter
cd /usr/local/blackbox_exporter
cp blackbox_exporter /usr/sbin/
mkdir -p /etc/blackbox
cp blackbox.yml /etc/blackbox/
chown prometheus:prometheus /etc/blackbox/ -R
Configure the systemd unit:
vim /usr/lib/systemd/system/blackbox_exporter.service
[Unit]
Description=blackbox_exporter
After=network.target
[Service]
User=prometheus
Group=prometheus
ExecStart=/usr/sbin/blackbox_exporter \
--config.file=/etc/blackbox/blackbox.yml
[Install]
WantedBy=multi-user.target
systemctl enable blackbox_exporter.service
systemctl start blackbox_exporter.service
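Implementing URL probing
Define the module parameters for the endpoint (POST only; GET requests can skip this step)
vim /etc/blackbox/blackbox.yml
Add the following under modules (one module per HTTP POST endpoint),
where fjsmrh_dzjkk is a module name of your own choosing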
  fjsmrh_dzjkk:
    prober: http
    timeout: 25s
    http:
      preferred_ip_protocol: "ip4"
      method: POST
      headers:
        Content-Type: application/json;charset=UTF-8
      body: '{"POST payload"}'
      fail_if_body_not_matches_regexp:
        - "0000|success"
fail_if_body_not_matches_regexp filters the POST response: the probe fails unless the body matches one of the codes that indicate success.
Reference the module in prometheus.yml
  - job_name: "smrh_zm"
    metrics_path: /probe
    params:
      module: [fjsmrh_dzjkk]
    static_configs:
      - targets: ["url"]
        labels:
          tags: "project name"
          product: "function"
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.11.178:9115
For GET requests, add the job directly to prometheus.yml
  - job_name: "blackbox"
    metrics_path: /probe
    params:
      module: [http_2xx] # Look for a HTTP 200 response.
    file_sd_configs:
      - refresh_interval: 1m
        files:
          - "/etc/blackbox/blackbox-dis.yml"
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - target_label: __address__
        replacement: 192.168.11.178:9115
Then add the GET targets to /etc/blackbox/blackbox-dis.yml
- targets: ['url']
  labels:
    instance: 'GET URL (user-defined)'
    tags: 'tag name (user-defined)'
    product: 'product name (user-defined)'
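The relabel_configs in these jobs copy the configured target into the `target` query parameter and then point `__address__` at the blackbox_exporter itself, so Prometheus ends up scraping the exporter rather than the URL being probed. The sketch below builds the equivalent request by hand, which is useful for testing a module before wiring it into Prometheus. The helper name `probe_url` and the target https://example.com are placeholders, and unlike Prometheus the helper does not URL-encode the target.

```shell
# Build the URL that Prometheus effectively requests after relabeling.
# $1 = blackbox_exporter address, $2 = module name, $3 = probe target
probe_url() {
  printf 'http://%s/probe?module=%s&target=%s\n' "$1" "$2" "$3"
}

# Example (https://example.com is a placeholder target):
#   curl -s "$(probe_url 192.168.11.178:9115 http_2xx https://example.com)" | grep '^probe_success'
```

A `probe_success 1` line in the response means the module considered the probe successful.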
Upload blackbox.yml to the Prometheus rule directory (all rule files are collected in a single directory)
groups:
  - name: blackbox_network_stats
    rules:
      - alert: 'URL probe failed'
        expr: probe_success == 0
        for: 60s
        labels:
          severity: high
          alertinfo: push_blackbox_alert
        annotations:
          summary: "{{ $labels.instance }} probe failed"
          description: "URL probe failed; please check whether the service is working!"
A certificate-expiry rule can also be added:
groups:
  - name: ssl_expiry.rules
    rules:
      - alert: 'SSL certificate expiring soon'
        expr: probe_ssl_earliest_cert_expiry{job="blackbox"} - time() < 86400 * 15
        for: 10m
        labels:
          severity: high
          alertinfo: push_ssl_alert
        annotations:
          summary: "{{ $labels.instance }} certificate expiring soon"
          description: "The certificate is about to expire; please renew it in time"
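The expression compares the time remaining until expiry, in seconds, with 15 days expressed in seconds. The arithmetic can be reproduced in the shell as a rough illustration; the helper name `cert_days_left` and the variable `threshold_seconds` are assumptions used only for this sketch.

```shell
# Whole days between the current time and a certificate expiry, both as Unix timestamps.
# $1 = expiry timestamp (the kind of value probe_ssl_earliest_cert_expiry reports)
# $2 = current timestamp (what time() returns in PromQL)
cert_days_left() {
  echo $(( ($1 - $2) / 86400 ))
}

# The alert threshold 86400 * 15 is 15 days expressed in seconds.
threshold_seconds=$(( 86400 * 15 ))
```

For example, `cert_days_left "$expiry" "$(date +%s)"` gives the remaining days for an expiry timestamp stored in a variable `expiry`.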
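Start blackbox_exporter and restart Prometheus so that it reloads the configuration
systemctl start blackbox_exporter.service
systemctl restart prometheus
Import the JSON dashboard template into Grafana to view the result.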
6. Installing mysqld_exporter
1. Create the collection user
create user 'mysqld_exporter'@'%' IDENTIFIED BY 'ylz@yhkj#2020';
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'mysqld_exporter'@'%';
2. Download and configure the exporter
tar -zxvf mysqld_exporter-0.11.0.linux-amd64.tar.gz -C /usr/local
mv mysqld_exporter-0.11.0.linux-amd64 mysqld_exporter
cd mysqld_exporter
vim .my.cnf
[client]
user=mysqld_exporter
password=ylz@yhkj#2020
Enable start at boot
vim /usr/lib/systemd/system/mysql_exporter.service
[Unit]
Description=Prometheus1
Documentation=https://prometheus.io/
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/mysqld_exporter/mysqld_exporter \
--config.my-cnf=/usr/local/mysqld_exporter/.my.cnf
Restart=on-failure
[Install]
WantedBy=multi-user.target
chown -R prometheus:prometheus /usr/local/mysqld_exporter/
systemctl enable mysql_exporter.service
systemctl start mysql_exporter.service
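Troubleshooting:
If the service fails to start (reporting a wrong password):
vim /etc/profile
export DATA_SOURCE_NAME='mysqld_exporter:ylz@yhkj#2020@tcp(127.0.0.1:3306)/'
source /etc/profile
If it still fails to start:
Workaround: start it manually
First set an environment variable
export DATA_SOURCE_NAME='exporter:zZ#342xz666@tcp(127.0.0.1:3306)/'
source /etc/profile
Then
/usr/local/mysqld_exporter/mysqld_exporter &
Alternatively, copy .my.cnf into a directory the service user has permissions on, such as /home/prometheus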
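One likely reason the manual start works while the service does not is that systemd units do not read /etc/profile, so a DATA_SOURCE_NAME exported there never reaches the exporter process. A possible alternative is to pass the variable through a systemd drop-in. The sketch below only prints the drop-in content; the helper name `dsn_dropin` and the drop-in path in the usage comment are assumptions, and a password containing `%` or `"` would need extra escaping for systemd.

```shell
# Print a systemd drop-in that passes DATA_SOURCE_NAME to the exporter unit.
# $1 = DSN in the form user:password@tcp(host:port)/
dsn_dropin() {
  printf '[Service]\nEnvironment="DATA_SOURCE_NAME=%s"\n' "$1"
}

# Example (the drop-in path is an assumption; run as root):
#   mkdir -p /etc/systemd/system/mysql_exporter.service.d
#   dsn_dropin 'mysqld_exporter:<password>@tcp(127.0.0.1:3306)/' \
#     > /etc/systemd/system/mysql_exporter.service.d/env.conf
#   systemctl daemon-reload && systemctl restart mysql_exporter.service
```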
7. Installing the Oracle exporter (Linux)
Create the user
create user prometheus identified by ylz#2020;
grant connect,resource,dba to prometheus;
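1. Download the Oracle client packages and install them
https://www.oracle.com/database/technologies/instant-client/downloads.html (the version must be 18 or later)
Download the basic, sqlplus, and tools packages (in fact, installing basic alone is enough)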
rpm -ivh oracle-instantclient18.5-basic-18.5.0.0.0-3.x86_64.rpm
rpm -ivh oracle-instantclient18.5-sqlplus-18.5.0.0.0-3.x86_64.rpm
The default path is /usr/lib/oracle/<version>/client64
Add tnsnames.ora
vim /usr/lib/oracle/18.5/client64/tnsnames.ora
prometheus =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.7.18)(PORT = 1521))
)
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = orcl)
)
)
Add to /etc/profile:
export ORACLE_HOME=/usr/lib/oracle/18.5/client64
export TNS_ADMIN=$ORACLE_HOME/network
export LD_LIBRARY_PATH=$ORACLE_HOME/lib
export PATH=$ORACLE_HOME/bin:$PATH
export NLS_LANG=AMERICAN_AMERICA.ZHS16GBK
export DATA_SOURCE_NAME="onepay/onepay@192.168.44.90:1521/orcl"
source /etc/profile
Verify that you can log in
sqlplus onepay/onepay@IP/orcl
2. Install oracledb_exporter
Download:
https://github.com/iamseth/oracledb_exporter/releases
tar -zxvf oracledb_exporter-0.2.7.tar.gz -C /usr/local
Add to /etc/profile
export DATA_SOURCE_NAME="onepay/onepay@10.102.0.240:1521/orcl"
Start it (adjust the query timeout to your environment; 40 s or more is recommended)
./oracledb_exporter -query.timeout=50
8. Installing Alertmanager
tar -zxvf alertmanager-0.15.3.linux-amd64.tar.gz
cd alertmanager-0.15.3.linux-amd64
cp -rf alertmanager amtool /usr/sbin/
cp -rf alertmanager.yml /etc/
mkdir -p /var/lib/alertmanager/data
chown prometheus:prometheus /etc/alertmanager.yml /var/lib/alertmanager/ -R
firewall-cmd --zone=public --add-port=9093/tcp --permanent
firewall-cmd --reload
Systemd unit
vim /usr/lib/systemd/system/alertmanager.service
[Unit]
Description=Alertmanager
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=/usr/sbin/alertmanager --config.file=/etc/alertmanager.yml --storage.path=/var/lib/alertmanager/data
Restart=on-failure
[Install]
WantedBy=multi-user.target
systemctl enable alertmanager.service
systemctl start alertmanager.service
Reference configuration file
[root@bogon ~]# cat /etc/alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m
  wechat_api_url: 'https://qyapi.weixin.qq.com/cgi-bin/'
templates:
  - '/etc/alertmanager/wechat.tmpl'
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 10m
  receiver: 'wechat'
receivers:
  - name: 'wechat'
    wechat_configs:
      - corp_id: 'ww7329a73bf83d1d39'
        to_party: '3'
        agent_id: '1000005'
        api_secret: 'lzWFlJIfbVW8QdlTvcVcFqZ-Rf0nR_gTr43oiS4AENE'
        send_resolved: true
inhibit_rules:
  - equal: ['alertname', 'cluster', 'service']
    source_match:
      severity: 'high'
    target_match:
      severity: 'warning'
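9. Installing and deploying Grafana
Download the Grafana package: https://grafana.com/grafana/download
wget https://dl.grafana.com/oss/release/grafana-7.3.7-1.x86_64.rpm
yum install grafana-7.3.7-1.x86_64.rpm
# Alternatively, if the dependencies cannot be resolved (for example, on an offline host):
rpm -ivh --nodeps grafana-7.3.7-1.x86_64.rpm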
systemctl daemon-reload
systemctl enable grafana-server
systemctl start grafana-server