Prometheus solution design and deployment in detail


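Preface

This article gives an overview of how to deploy each Prometheus component. The features of the individual components are covered in detail in separate articles.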


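Design:
[Figure: architecture design]
Overview:
Each exporter in the collection zone gathers the data for its own function, and Prometheus scrapes the exporters at a fixed interval. Consul serves as the service registry: every monitoring target is registered in Consul so that Prometheus can discover it dynamically. Prometheus pulls the metrics from each target, evaluates the rules, and stores the data in its TSDB (the default; this design uses InfluxDB as remote storage). It then sends alerts for abnormal metrics to Alertmanager. Alertmanager delivers the notifications (WeChat in this design), and Grafana displays the monitoring metrics. For a detailed description of each component, see the component details in step 4.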
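For quick reference, the sketch below maps each component in this design to its upstream default listen port. It also covers wmi_exporter and oracledb_exporter from the later steps. The ports are upstream defaults, not values read from this deployment, and the function name component_port is hypothetical. Adjust both if you change any listen address.

```shell
#!/bin/sh
# Hypothetical helper: print the upstream default listen port of a component.
# Returns 1 for an unknown component name.
component_port() {
    case "$1" in
        prometheus)        echo 9090 ;;
        alertmanager)      echo 9093 ;;
        node_exporter)     echo 9100 ;;
        mysqld_exporter)   echo 9104 ;;
        blackbox_exporter) echo 9115 ;;
        oracledb_exporter) echo 9161 ;;
        wmi_exporter)      echo 9182 ;;
        grafana)           echo 3000 ;;
        consul)            echo 8500 ;;
        influxdb)          echo 8086 ;;
        *)                 return 1 ;;
    esac
}

# Example: the URL Prometheus scrapes on the node_exporter host used later on.
echo "http://192.168.14.160:$(component_port node_exporter)/metrics"
```

The same helper also gives the ports to open with firewall-cmd in the steps below.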

2. Building the Prometheus monitoring system

1. Preparing the installation environment

1.1 Disable SELinux

sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
setenforce 0
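setenforce 0 only switches the running system to permissive mode. The disabled value written by sed takes effect after the next reboot. To confirm the edit landed, a small helper can read the configured mode back. This is a sketch: the function name selinux_mode is hypothetical, and getenforce is still the way to see the runtime mode.

```shell
#!/bin/sh
# Hypothetical helper: print the SELINUX= value from an SELinux config file
# (default /etc/selinux/config). Commented-out lines and SELINUXTYPE= are
# ignored; if the key appears more than once, the last assignment is printed.
selinux_mode() {
    conf="${1:-/etc/selinux/config}"
    sed -n 's/^[[:space:]]*SELINUX=\([a-z]*\).*/\1/p' "$conf" | tail -n 1
}
```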

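1.2 Install the Go environment

Download the Go tarball and extract it to /usr/local:
tar -xvf go1.11.5.linux-amd64.tar.gz -C /usr/local/
Configure the environment variable (the quoted 'EOF' keeps $PATH from being expanded while the file is written):
cat >>/etc/profile<<'EOF'
export PATH=$PATH:/usr/local/go/bin
EOF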
source /etc/profile
go version
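Running this step twice appends the same export line to /etc/profile twice. The hypothetical helper append_once below appends a line only when an identical line is not already in the file. It is a sketch that compares whole lines literally.

```shell
#!/bin/sh
# Hypothetical helper: append LINE to FILE unless FILE already contains
# exactly that line, so re-running the setup does not duplicate it.
append_once() {
    line="$1"
    file="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage (single quotes keep $PATH unexpanded until login time):
# append_once 'export PATH=$PATH:/usr/local/go/bin' /etc/profile
```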

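1.3 Host time, time zone and system locale

Skip this step if an NTP service is already in place.
Set the time zone:
ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
Set the system locale:
echo 'LANG="en_US.UTF-8"' >> /etc/profile && source /etc/profile
Configure NTP time synchronization (add the servers first, so that ntpd reads them when it starts):
yum -y install ntp
echo 'server ntp1.aliyun.com' >> /etc/ntp.conf
echo 'server ntp2.aliyun.com' >> /etc/ntp.conf
systemctl enable ntpd && systemctl restart ntpd
If the machines cannot reach the Internet, install the NTP service on one server and configure the other servers as its clients.
Installing the NTP service (from the ntpd RPM package) is omitted here.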
Configure the NTP server:
vi /etc/ntp.conf
driftfile /var/lib/ntp/drift
restrict default kod nomodify notrap nopeer noquery
restrict -6 default kod nomodify notrap nopeer noquery
restrict 127.0.0.1
restrict -6 ::1
# Allow hosts in the 192.168.6.0/24 subnet to synchronize their time
restrict 192.168.6.0 mask 255.255.255.0 nomodify notrap

# Upstream time servers
# 210.72.145.44: National Time Service Center of China
server 210.72.145.44 prefer
server 1.cn.pool.ntp.org

# Allow the upstream servers to adjust this host's time
restrict 210.72.145.44 nomodify notrap noquery
restrict 1.cn.pool.ntp.org nomodify notrap noquery

# Fall back to the local clock when the external time servers are unavailable
server 127.127.1.0
fudge 127.127.1.0 stratum 10

includefile /etc/ntp/crypto/pw
keys /etc/ntp/keys
4. Configure the NTP clients
On each client, add a crontab entry that runs ntpdate <ntp server ip> every minute.
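A minimal sketch of the client-side crontab line follows. It assumes ntpdate is installed at /usr/sbin/ntpdate and uses 192.168.6.1 as a hypothetical address for the internal NTP server. The helper name ntp_cron_line is also hypothetical. ntpdate refuses to run while ntpd holds the NTP port on the same host, so clients set up this way should not also run ntpd.

```shell
#!/bin/sh
# Hypothetical helper: build the crontab line that syncs the clock from the
# internal NTP server every minute. -u makes ntpdate use an unprivileged
# source port, which helps when a firewall blocks port 123.
ntp_cron_line() {
    server="$1"
    echo "* * * * * /usr/sbin/ntpdate -u $server >/dev/null 2>&1"
}

# Install it for root while keeping existing entries (run as root):
# (crontab -l 2>/dev/null; ntp_cron_line 192.168.6.1) | crontab -
```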

3. Installing Prometheus

  1. Download and deploy
    # Download
    [root@prometheus src]# cd /usr/local/src/
    [root@prometheus src]# wget https://github.com/prometheus/prometheus/releases/download/v2.0.0/prometheus-2.0.0.linux-amd64.tar.gz
    # Deploy to /usr/local/
    # Prometheus needs no compilation: the extracted directory contains the config file and the binaries
    [root@prometheus src]# tar -zxvf prometheus-2.0.0.linux-amd64.tar.gz -C /usr/local/
    [root@prometheus src]# cd /usr/local/
    [root@prometheus local]# mv prometheus-2.0.0.linux-amd64/ prometheus/
    mkdir /etc/prometheus
    mkdir /var/lib/prometheus
    cd /usr/local/prometheus
    cp prometheus /usr/sbin/
    cp promtool /usr/sbin/
    cp prometheus.yml /etc/prometheus/
    # The systemd unit below points --web.console.* at these two directories
    cp -r consoles console_libraries /etc/prometheus/
    groupadd prometheus
    useradd -g prometheus -s /sbin/nologin prometheus
    chown prometheus:prometheus /usr/sbin/prometheus /usr/sbin/promtool
    chown prometheus:prometheus /etc/prometheus /var/lib/prometheus/ -R
    # Verify (prints the version)
    [root@prometheus prometheus]# prometheus --version
    # Enable start on boot
    [root@prometheus ~]# touch /usr/lib/systemd/system/prometheus.service
    [root@prometheus ~]# chown prometheus:prometheus /usr/lib/systemd/system/prometheus.service

[root@prometheus ~]# vim /usr/lib/systemd/system/prometheus.service

[Unit]
Description=Prometheus
Documentation=https://prometheus.io/
After=network.target

[Service]
# Use Type=simple: with Type=notify the service keeps restarting
Type=simple
User=prometheus
ExecStart=/usr/sbin/prometheus \
    --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.path=/var/lib/prometheus \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries
Restart=on-failure

[Install]
WantedBy=multi-user.target

[root@prometheus ~]# systemctl enable prometheus
[root@prometheus ~]# systemctl start prometheus
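Open the port (--permanent only changes the saved configuration, so reload firewalld afterwards):
firewall-cmd --zone=public --add-port=9090/tcp --permanent
firewall-cmd --reload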
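Configure rotation of the Prometheus query log. This is only needed if query_log_file: /var/log/prometheus.log is set in the global section of prometheus.yml. The option requires Prometheus 2.16 or later, which is newer than the 2.0.0 release downloaded above.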
cat /etc/logrotate.d/prometheus
/var/log/prometheus.log {
create 0644 prometheus prometheus
daily
rotate 7
missingok
notifempty
dateext
compress
sharedscripts
postrotate
ps -ef |grep prometheus.yml |grep -v grep |awk '{print $2}' |xargs kill -HUP
endscript
}
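The postrotate line finds the Prometheus PID by filtering ps -ef output, then sends SIGHUP, which makes Prometheus reload its configuration. The sketch below isolates the filtering step as a function that reads ps -ef text on standard input, so you can try it without a running process. The name pids_matching is hypothetical. Like the original pipeline, it matches any command line that contains the pattern, so an editor open on prometheus.yml would be signalled too. pkill -HUP -f prometheus.yml does the same in one command, with the same caveat.

```shell
#!/bin/sh
# Hypothetical helper: read `ps -ef` output on stdin and print the PID
# (column 2) of every process whose line contains PATTERN, skipping the
# header line and any grep process (command in column 8).
pids_matching() {
    pattern="$1"
    awk -v pat="$pattern" 'NR > 1 && index($0, pat) && $8 != "grep" { print $2 }'
}

# Live usage, equivalent to the postrotate pipeline:
# ps -ef | pids_matching prometheus.yml | xargs -r kill -HUP
```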

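4. Installing node_exporter (and wmi_exporter)

Note: the commands below use the linux-arm64 build of node_exporter. On x86_64 hosts, download the linux-amd64 package and adjust the file names.
CentOS 6:
tar -zxvf node_exporter-0.18.1.linux-arm64.tar.gz -C /usr/local/
mv /usr/local/node_exporter-0.18.1.linux-arm64 /usr/local/node_exporter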
cd /usr/local/node_exporter/
groupadd prometheus
useradd -g prometheus -s /sbin/nologin prometheus
chown -R prometheus:prometheus /usr/local/node_exporter/
Start script:
cat start-node_exporter.sh
cd /usr/local/node_exporter
nohup ./node_exporter >>/var/log/node_exporter.log 2>&1 &
Log rotation (if needed). node_exporter 0.18.1 does not handle SIGHUP, and a Go program exits on an unhandled SIGHUP, so copy and truncate the log instead of signalling the process:
cat /etc/logrotate.d/node_exporter
/var/log/node_exporter.log {
copytruncate
daily
rotate 7
missingok
notifempty
dateext
}

CentOS 7:
Download: https://github.com/prometheus/node_exporter/releases
Install:
tar -zxvf node_exporter-0.18.1.linux-arm64.tar.gz -C /usr/local/
mv /usr/local/node_exporter-0.18.1.linux-arm64 /usr/local/node_exporter
Create the group and user:
groupadd prometheus
useradd -g prometheus -s /sbin/nologin prometheus
chown -R prometheus:prometheus /usr/local/node_exporter/
Create a systemd unit so that the service starts on boot:
cat /usr/lib/systemd/system/node_exporter.service
[Unit]
Description=node_exporter
Documentation=https://prometheus.io/
After=network.target

[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/node_exporter/node_exporter
Restart=on-failure
[Install]
WantedBy=multi-user.target

systemctl enable node_exporter.service
systemctl start node_exporter.service

Check the exposed metrics:
http://192.168.14.160:9100/metrics
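The /metrics endpoint returns the Prometheus text exposition format. Each sample is one line of the form name{labels} value, and there are also # HELP and # TYPE comment lines. As a rough illustration, the hypothetical helper below pulls the value of one unlabelled metric out of that text, for example with curl -s http://192.168.14.160:9100/metrics | metric_value node_load1. It skips labelled series and is a quick check, not a full parser.

```shell
#!/bin/sh
# Hypothetical helper: print the value of an unlabelled metric from
# Prometheus text-format input on stdin. Comment lines (# HELP / # TYPE)
# and labelled series such as name{cpu="0"} do not match.
metric_value() {
    awk -v name="$1" '$1 == name { print $2; exit }'
}
```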

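Windows (wmi_exporter):
Just install the MSI package.
Open port 9100 on the Linux hosts running node_exporter (firewalld):
firewall-cmd --zone=public --add-port=9100/tcp --permanent
firewall-cmd --reload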

5. Installing blackbox_exporter and probing URLs

tar -zxvf blackbox_exporter-0.16.0.linux-amd64.tar.gz -C /usr/local/
mv /usr/local/blackbox_exporter-0.16.0.linux-amd64 /usr/local/blackbox_exporter
cp /usr/local/blackbox_exporter/blackbox_exporter /usr/sbin/
mkdir -p /etc/blackbox
cd /usr/local/blackbox_exporter
cp blackbox.yml /etc/blackbox/
chown prometheus:prometheus /etc/blackbox/ -R
Create the systemd unit:
vim /usr/lib/systemd/system/blackbox_exporter.service

[Unit]
Description=blackbox_exporter
After=network.target
[Service]
User=prometheus
Group=prometheus
ExecStart=/usr/sbin/blackbox_exporter  \
         --config.file=/etc/blackbox/blackbox.yml
[Install]
WantedBy=multi-user.target

systemctl enable blackbox_exporter.service
systemctl start blackbox_exporter.service
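Implementing URL probing
2. Define a module for each endpoint (only for POST requests; GET requests skip this step)
vim /etc/blackbox/blackbox.yml
Under modules:, add one module for each HTTP POST endpoint.
Here fjsmrh_dzjkk is a custom module name.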

  fjsmrh_dzjkk:
    prober: http
    timeout: 25s
    http:
      preferred_ip_protocol: "ip4"
      method: POST
      headers:
        Content-Type: application/json;charset=UTF-8
      body: '{"POST request body"}'
      fail_if_body_not_matches_regexp:
        - "0000|success"

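fail_if_body_not_matches_regexp: the probe fails unless the response body matches this regular expression. Here it lists the codes the endpoint returns on success.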
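The regular expression is not anchored. The probe therefore counts as successful whenever 0000 or success appears anywhere in the body; a body containing "amount":100000 would also pass. Before putting a pattern in blackbox.yml, you can try it against sample bodies with grep -E. For a simple alternation like this one, grep -E behaves the same as the Go (RE2) regular expressions that blackbox_exporter uses. The helper name body_passes is hypothetical. Because grep works line by line, this check is only reliable for patterns without anchors.

```shell
#!/bin/sh
# Hypothetical helper: report whether a response body would satisfy
# fail_if_body_not_matches_regexp for the given extended regex.
# Exit status 0 = probe passes, non-zero = probe fails.
body_passes() {
    regex="$1"
    body="$2"
    printf '%s\n' "$body" | grep -Eq -- "$regex"
}

# Example: body_passes '0000|success' '{"code":"0000","msg":"ok"}' && echo pass
```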
Reference this module in prometheus.yml:

  - job_name: "smrh_zm"
    metrics_path: /probe
    params:
      module: [fjsmrh_dzjkk]
    static_configs:
    - targets: ["url"]
      labels:
        tags: "project name"
        product: "function"
    relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: 192.168.11.178:9115
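The relabel_configs turn each target into a request to blackbox_exporter. The target URL becomes the target query parameter, the module from params becomes module, and the scrape itself goes to 192.168.11.178:9115/probe. The sketch below builds that request URL, so you can call it with curl to debug a module by hand. It only percent-encodes a few characters (%, &, ?, =, #, space), whereas Prometheus encodes parameters fully. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: percent-encode the characters that would break a
# query-string value. Not a full RFC 3986 encoder.
urlencode_min() {
    printf '%s' "$1" | sed -e 's/%/%25/g' -e 's/&/%26/g' -e 's/?/%3F/g' \
        -e 's/=/%3D/g' -e 's/#/%23/g' -e 's/ /%20/g'
}

# Hypothetical helper: the URL Prometheus requests after relabelling.
probe_url() {
    exporter="$1"   # blackbox_exporter address, e.g. 192.168.11.178:9115
    module="$2"     # module name from params, e.g. fjsmrh_dzjkk
    target="$3"     # the URL listed under targets
    printf '%s\n' "http://$exporter/probe?module=$module&target=$(urlencode_min "$target")"
}

# Debug a module by hand (the response includes probe_success):
# curl -s "$(probe_url 192.168.11.178:9115 http_2xx https://example.com/health)"
```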

For GET requests, add the job directly to prometheus.yml:

  - job_name: "blackbox"
    metrics_path: /probe
    params:
      module: [http_2xx]  # Look for a HTTP 200 response.
    file_sd_configs: 
    - refresh_interval: 1m
      files: 
      - "/etc/blackbox/blackbox-dis.yml"
    relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - target_label: __address__
      replacement: 192.168.11.178:9115

Then add the GET targets to /etc/blackbox/blackbox-dis.yml:

- targets: ['url']
  labels:
    instance: 'get url (custom)'
    tags: 'tag name (custom)'
    product: 'product name (custom)'
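Prometheus watches the files listed under file_sd_configs and also re-reads them every refresh_interval (1m here). New GET targets can therefore be added without restarting Prometheus. Below is a hedged sketch of a helper that appends one entry in the same layout as above. The function name and the example URL are hypothetical, and the helper does not check for duplicates.

```shell
#!/bin/sh
# Hypothetical helper: append one blackbox target to a file_sd YAML file,
# using the layout of /etc/blackbox/blackbox-dis.yml. Values are written
# inside single quotes, so they must not contain a single quote.
add_blackbox_target() {
    file="$1"; url="$2"; instance="$3"; tags="$4"; product="$5"
    cat >> "$file" <<EOF
- targets: ['$url']
  labels:
    instance: '$instance'
    tags: '$tags'
    product: '$product'
EOF
}

# Example:
# add_blackbox_target /etc/blackbox/blackbox-dis.yml 'https://example.com/health' 'example health' 'demo' 'portal'
```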

Upload the rule file blackbox.yml to the Prometheus rules directory (all rule files are kept together in one directory):

groups:
- name: blackbox_network_stats
  rules:
  - alert: 'url probe failed'
    expr: probe_success == 0
    for: 60s
    labels:
      severity: high
      alertinfo: push_blackbox_alert
    annotations:
      summary: "{{ $labels.instance }} probe failed"
      description: "URL probe failed, please check whether the service is working!"

You can also add a certificate-expiry rule:

groups: 
  - name: ssl_expiry.rules 
    rules: 
      - alert: ssl certificate expiring soon
        expr: probe_ssl_earliest_cert_expiry{job="blackbox"} - time() < 86400 * 15
        for: 10m
        labels:
          severity: high
          alertinfo: push_ssl_alert
        annotations:
          summary: "{{ $labels.instance }} certificate expiring soon"
          description: "The certificate is about to expire, please renew it in time"
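The expression compares seconds. probe_ssl_earliest_cert_expiry is the certificate's expiry time as a Unix timestamp, time() is the current evaluation time, and 86400 * 15 = 1296000 seconds is 15 days. The sketch below does the same arithmetic in shell, which is handy when checking a value copied from a probe result. The function names are hypothetical. The value may be printed in floating-point exponent notation, which shell arithmetic cannot parse, so convert it to an integer first.

```shell
#!/bin/sh
# Hypothetical helper: whole days until expiry, given the expiry timestamp
# and the current time, both in Unix seconds (now defaults to date +%s).
days_left() {
    expiry="$1"
    now="${2:-$(date +%s)}"
    echo $(( (expiry - now) / 86400 ))
}

# Mirrors the alert condition: expiry - now < 86400 * 15
would_alert() {
    expiry="$1"
    now="${2:-$(date +%s)}"
    [ $(( expiry - now )) -lt $(( 86400 * 15 )) ]
}
```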

Start blackbox_exporter and restart Prometheus:
systemctl start blackbox_exporter.service
systemctl restart prometheus

Import a JSON dashboard template into Grafana. Result:
[Figure: URL probe dashboard]

6. Installing mysqld_exporter

1. Create the collection user
CREATE USER 'mysqld_exporter'@'%' IDENTIFIED BY 'ylz@yhkj#2020';
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'mysqld_exporter'@'%';
2. Download and configure the exporter
tar -zxvf mysqld_exporter-0.11.0.linux-amd64.tar.gz -C /usr/local
mv /usr/local/mysqld_exporter-0.11.0.linux-amd64 /usr/local/mysqld_exporter
cd /usr/local/mysqld_exporter

vim .my.cnf
[client]
user=mysqld_exporter
password=ylz@yhkj#2020
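.my.cnf holds a plaintext password, so it should be readable only by the account that runs the exporter. The minimal sketch below writes the file with mode 600. The helper name write_mycnf is hypothetical, and the password is passed as an argument only for brevity (it will end up in shell history).

```shell
#!/bin/sh
# Hypothetical helper: write a [client] section to a .my.cnf that only its
# owner can read. umask covers a new file; chmod also fixes an existing one.
write_mycnf() {
    file="$1"; user="$2"; password="$3"
    ( umask 077 && printf '[client]\nuser=%s\npassword=%s\n' "$user" "$password" > "$file" )
    chmod 600 "$file"
}

# Example (run as root, then hand the file to the service user):
# write_mycnf /usr/local/mysqld_exporter/.my.cnf mysqld_exporter 'secret'
# chown prometheus:prometheus /usr/local/mysqld_exporter/.my.cnf
```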
Enable start on boot:

vim /usr/lib/systemd/system/mysql_exporter.service
[Unit]
Description=mysqld_exporter
Documentation=https://prometheus.io/
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/mysqld_exporter/mysqld_exporter \
         --config.my-cnf=/usr/local/mysqld_exporter/.my.cnf  
Restart=on-failure
[Install]
WantedBy=multi-user.target

chown -R prometheus:prometheus /usr/local/mysqld_exporter/
systemctl enable mysql_exporter.service
systemctl start mysql_exporter.service
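Troubleshooting:
If the service does not start and reports a password error, first check that the user name in .my.cnf is exactly mysqld_exporter. A # in the password may also be read as the start of a comment when .my.cnf is parsed. In that case, pass the credentials through DATA_SOURCE_NAME instead: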
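vim /etc/profile
export DATA_SOURCE_NAME='mysqld_exporter:ylz@yhkj#2020@tcp(127.0.0.1:3306)/'
source /etc/profile
systemd services do not read /etc/profile. For mysql_exporter.service, put the same value in the [Service] section instead, then run systemctl daemon-reload and restart the service:
Environment="DATA_SOURCE_NAME=mysqld_exporter:ylz@yhkj#2020@tcp(127.0.0.1:3306)/"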
If it still does not start:
Workaround: start it manually.
First set the variable globally (in /etc/profile):
export DATA_SOURCE_NAME='exporter:zZ#342xz666@tcp(127.0.0.1:3306)/'
source /etc/profile
Then:
/usr/local/mysqld_exporter/mysqld_exporter &
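Alternatively, copy .my.cnf into the home directory of the user that runs the exporter, such as /home/prometheus. Without --config.my-cnf, mysqld_exporter reads $HOME/.my.cnf.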
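DATA_SOURCE_NAME uses the Go MySQL driver's DSN format, user:password@tcp(host:port)/. The sketch below assembles that string. The function name mysql_dsn is hypothetical. As far as the driver's DSN parser goes, the address is split off at the last @, so an @ or # inside the password (as in the examples above) should not need escaping.

```shell
#!/bin/sh
# Hypothetical helper: build a DSN of the form user:password@tcp(host:port)/
# for mysqld_exporter's DATA_SOURCE_NAME. Host and port default to
# 127.0.0.1 and 3306.
mysql_dsn() {
    user="$1"; password="$2"; host="${3:-127.0.0.1}"; port="${4:-3306}"
    printf '%s:%s@tcp(%s:%s)/\n' "$user" "$password" "$host" "$port"
}

# Export it for a manual start in the current shell:
# export DATA_SOURCE_NAME="$(mysql_dsn mysqld_exporter 'ylz@yhkj#2020')"
# /usr/local/mysqld_exporter/mysqld_exporter &
```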

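7. Installing the Oracle exporter (Linux)

Create the user:
create user prometheus identified by ylz#2020;
grant connect,resource,dba to prometheus;
1. Download and install the Oracle Instant Client
https://www.oracle.com/database/technologies/instant-client/downloads.html (version 18 or later is required)
Download the three packages basic, sqlplus and tools. Strictly, only basic is required; sqlplus is used below to verify the login.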
rpm -ivh oracle-instantclient18.5-basic-18.5.0.0.0-3.x86_64.rpm
rpm -ivh oracle-instantclient18.5-sqlplus-18.5.0.0.0-3.x86_64.rpm
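The default install path is /usr/lib/oracle/<version>/client64.
Add tnsnames.ora to the directory that TNS_ADMIN points to ($ORACLE_HOME/network in the profile settings below):

mkdir -p /usr/lib/oracle/18.5/client64/network
vim /usr/lib/oracle/18.5/client64/network/tnsnames.ora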
prometheus =            
  (DESCRIPTION =            
    (ADDRESS_LIST =            
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.7.18)(PORT = 1521))            
    )            
    (CONNECT_DATA =            
      (SERVER = DEDICATED)            
      (SERVICE_NAME = orcl)            
    )            
  )

Add to /etc/profile:
export ORACLE_HOME=/usr/lib/oracle/18.5/client64
export TNS_ADMIN=$ORACLE_HOME/network
export LD_LIBRARY_PATH=$ORACLE_HOME/lib
export PATH=$ORACLE_HOME/bin:$PATH
export NLS_LANG=AMERICAN_AMERICA.ZHS16GBK
export DATA_SOURCE_NAME="onepay/onepay@192.168.44.90:1521/orcl"
source /etc/profile
Verify that you can log in:
sqlplus onepay/onepay@IP/orcl
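Both DATA_SOURCE_NAME and the sqlplus check above use an Easy Connect string, user/password@host:port/service_name. The sketch below builds it from its parts so that the exporter and the login check stay in sync. The function name oracle_dsn is hypothetical, and the defaults (port 1521, service orcl) are taken from the examples above.

```shell
#!/bin/sh
# Hypothetical helper: build an Oracle Easy Connect string
# user/password@host:port/service_name.
oracle_dsn() {
    user="$1"; password="$2"; host="$3"; port="${4:-1521}"; service="${5:-orcl}"
    printf '%s/%s@%s:%s/%s\n' "$user" "$password" "$host" "$port" "$service"
}

# The same string serves both purposes:
# export DATA_SOURCE_NAME="$(oracle_dsn onepay onepay 192.168.44.90)"
# sqlplus "$(oracle_dsn onepay onepay 192.168.44.90)"
```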
2. Install oracledb_exporter
Download:
https://github.com/iamseth/oracledb_exporter/releases
tar -zxvf oracledb_exporter-0.2.7.tar.gz -C /usr/local
Add to /etc/profile:
export DATA_SOURCE_NAME="onepay/onepay@10.102.0.240:1521/orcl"
Start it. Set the query timeout to suit your environment; 40 seconds or more is recommended:
./oracledb_exporter -query.timeout=50

8. Installing Alertmanager

tar -zxvf alertmanager-0.15.3.linux-amd64.tar.gz
cd alertmanager-0.15.3.linux-amd64
cp -rf alertmanager amtool /usr/sbin/
cp -rf alertmanager.yml /etc/
mkdir -p /var/lib/alertmanager/data
chown prometheus:prometheus /etc/alertmanager.yml /var/lib/alertmanager/ -R
firewall-cmd --zone=public --add-port=9093/tcp --permanent
firewall-cmd --reload
systemd unit:

vim /usr/lib/systemd/system/alertmanager.service
[Unit]
Description=Alertmanager
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=/usr/sbin/alertmanager --config.file=/etc/alertmanager.yml --storage.path=/var/lib/alertmanager/data                                   
Restart=on-failure
[Install]
WantedBy=multi-user.target

systemctl enable alertmanager.service
systemctl start alertmanager.service
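Reference configuration. This example lives at /etc/alertmanager/alertmanager.yml; with the unit above, edit /etc/alertmanager.yml instead: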

[root@bogon ~]# cat /etc/alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m
  wechat_api_url: 'https://qyapi.weixin.qq.com/cgi-bin/'
templates:
- '/etc/alertmanager/wechat.tmpl'
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 10m
  receiver: 'wechat'
receivers:
- name: 'wechat'
  wechat_configs:
  - corp_id: 'ww7329a73bf83d1d39'
    to_party: '3'
    agent_id: '1000005'
    api_secret: 'lzWFlJIfbVW8QdlTvcVcFqZ-Rf0nR_gTr43oiS4AENE'
    send_resolved: true
inhibit_rules:
- equal: ['alertname', 'cluster', 'service']
  source_match:
    severity: 'high'
  target_match:
    severity: 'warning'
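To test the WeChat route without waiting for a real alert, you can post a hand-made alert to Alertmanager's HTTP API. The sketch below only builds the JSON body. The alert name, the instance and the helper name are hypothetical. Alertmanager 0.15.3, installed above, accepts the body at /api/v1/alerts, while newer releases use /api/v2/alerts. Because the route groups by alertname with group_wait: 10s, the message should arrive roughly ten seconds after the POST.

```shell
#!/bin/sh
# Hypothetical helper: print a one-alert JSON array for Alertmanager's
# alerts API. Arguments are inserted verbatim, so they must not contain
# double quotes or backslashes.
test_alert_json() {
    name="$1"; instance="$2"; severity="${3:-high}"
    printf '[{"labels":{"alertname":"%s","instance":"%s","severity":"%s"},"annotations":{"summary":"%s test alert"}}]\n' \
        "$name" "$instance" "$severity" "$instance"
}

# Send it (Alertmanager 0.15.x):
# test_alert_json TestAlert demo-host | curl -s -H 'Content-Type: application/json' \
#     -d @- http://localhost:9093/api/v1/alerts
```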

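9. Installing Grafana

Download the Grafana package from https://grafana.com/grafana/download
wget https://dl.grafana.com/oss/release/grafana-7.3.7-1.x86_64.rpm
yum install grafana-7.3.7-1.x86_64.rpm
# Alternatively, on a host without yum repositories (skips dependency checks):
rpm -ivh --nodeps grafana-7.3.7-1.x86_64.rpm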
systemctl daemon-reload
systemctl enable grafana-server
systemctl start grafana-server
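Grafana listens on port 3000. The initial login is admin/admin, and you are asked to change the password on first login. Prometheus must be added as a data source before an imported dashboard can show anything. Besides the web UI, Grafana's HTTP API accepts a POST to /api/datasources for this. The sketch below builds that request body. The data source name and the helper name are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: JSON body for Grafana's POST /api/datasources that
# registers a Prometheus server as the default data source.
grafana_prom_ds_json() {
    name="$1"; url="$2"
    printf '{"name":"%s","type":"prometheus","url":"%s","access":"proxy","isDefault":true}\n' "$name" "$url"
}

# Register it (replace admin:admin with the real credentials):
# grafana_prom_ds_json Prometheus http://localhost:9090 | curl -s -u admin:admin \
#     -H 'Content-Type: application/json' -d @- http://localhost:3000/api/datasources
```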

10. Deploying InfluxDB (see the InfluxDB article for details)


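Copyright notice: this is an original article by liaos666, published under the CC 4.0 BY-SA license. When reposting, include a link to the original source and this notice.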