A Detailed Hands-On Tutorial: One-Click Deployment of a High-Availability Cluster with Ansible

Service Architecture Diagram


Environment Setup

IP Planning and Configuration

  • Load-balancer nodes

    • nginx1: 192.168.146.100

      • First, check the NIC connections on this machine:
      nmcli connection show
      
      • You can see that on my VM the NIC connection name is ens33

    • Set the NIC's IP address using that connection name
    nmcli connection modify ens33 ipv4.address 192.168.146.100/24 ipv4.gateway 192.168.146.2 ipv4.dns 114.114.114.114 ipv4.method manual
    
    • After changing the settings, remember to bounce the connection so they take effect
    nmcli connection down ens33
    nmcli connection up ens33
    
    • nginx2: 192.168.146.101

      • Set the NIC IP. In a VM environment the NIC connection name should be the same on every clone; if in doubt, repeat the steps above to check it first, then modify the NIC
      nmcli connection modify ens33 ipv4.address 192.168.146.101/24 ipv4.gateway 192.168.146.2 ipv4.dns 114.114.114.114 ipv4.method manual
      
      • Restart the connection exactly as above; that step is omitted for the remaining nodes
  • Web servers

    • apache1: 192.168.146.102

      nmcli connection modify ens33 ipv4.address 192.168.146.102/24 ipv4.gateway 192.168.146.2 ipv4.dns 114.114.114.114 ipv4.method manual
      
    • apache2: 192.168.146.103

      nmcli connection modify ens33 ipv4.address 192.168.146.103/24 ipv4.gateway 192.168.146.2 ipv4.dns 114.114.114.114 ipv4.method manual
      
  • NAS storage node:

    • NFS server: 192.168.146.104

      nmcli connection modify ens33 ipv4.address 192.168.146.104/24 ipv4.gateway 192.168.146.2 ipv4.dns 114.114.114.114 ipv4.method manual
      
  • MySQL master-slave cluster

    • master: 192.168.146.105

      nmcli connection modify ens33 ipv4.address 192.168.146.105/24 ipv4.gateway 192.168.146.2 ipv4.dns 114.114.114.114 ipv4.method manual
      
    • slave: 192.168.146.106

      nmcli connection modify ens33 ipv4.address 192.168.146.106/24 ipv4.gateway 192.168.146.2 ipv4.dns 114.114.114.114 ipv4.method manual
      
  • Prometheus server node

    • prometheus: 192.168.146.107
    nmcli connection modify ens33 ipv4.address 192.168.146.107/24 ipv4.gateway 192.168.146.2 ipv4.dns 114.114.114.114 ipv4.method manual
    
  • rsync server node

    • rsyncd: 192.168.146.108
    nmcli connection modify ens33 ipv4.address 192.168.146.108/24 ipv4.gateway 192.168.146.2 ipv4.dns 114.114.114.114 ipv4.method manual
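
  • After assigning each address, it's worth confirming the connection picked up the new settings before moving on — a quick check (assuming the connection name ens33, as above):

    # Show the configured IPv4 values for the connection
    nmcli -g ipv4.addresses,ipv4.gateway,ipv4.dns connection show ens33
    # And the live interface state
    ip addr show ens33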
    

Passwordless SSH Login

Use one management node and configure passwordless SSH from it to every server configured above, so that later we can build the entire architecture from the management node with Ansible in one shot

  • On the management node, generate a key pair

    ssh-keygen
    
    • Just press Enter through all of the prompts
  • Check that the RSA key pair (id_rsa and id_rsa.pub under /root/.ssh) was generated successfully

  • Send the public key id_rsa.pub to every server. Run the command below once for each server, including the management node itself, substituting the matching IP

    ssh-copy-id -i /root/.ssh/id_rsa.pub 192.168.146.134    
    ssh-copy-id -i /root/.ssh/id_rsa.pub 192.168.146.100  
    ssh-copy-id -i /root/.ssh/id_rsa.pub 192.168.146.101  
    ssh-copy-id -i /root/.ssh/id_rsa.pub 192.168.146.102  
    ssh-copy-id -i /root/.ssh/id_rsa.pub 192.168.146.103  
    ssh-copy-id -i /root/.ssh/id_rsa.pub 192.168.146.104  
    ssh-copy-id -i /root/.ssh/id_rsa.pub 192.168.146.105
    ssh-copy-id -i /root/.ssh/id_rsa.pub 192.168.146.106
    ssh-copy-id -i /root/.ssh/id_rsa.pub 192.168.146.107
    ssh-copy-id -i /root/.ssh/id_rsa.pub 192.168.146.108
    
    • The transfer succeeded once the key-installed message appears

    • On the other servers you can check for an authorized_keys file under /root/.ssh; its presence means the public key was received from the management node successfully
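
    • Rather than typing the command ten times, a small loop over the addresses works too (a sketch; each host still prompts once for its root password):

      for ip in 192.168.146.134 192.168.146.{100..108}; do
        ssh-copy-id -i /root/.ssh/id_rsa.pub "$ip"
      done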

  • On the management node, verify that you can now SSH into the other nodes without a password

    [root@server1 ~]# ssh root@192.168.146.101
    Last login: Sat Apr  9 10:39:32 2022 from 192.168.146.1    
    
    # Check the current IP — we have successfully landed on the nginx2 node
    [root@server1 ~]# ip a | grep -v 'LOOPBACK' |awk '/^[0-9]/{print $2;getline;getline;if ($0~/inet/){print $2}}'
    ens33:
    192.168.146.101/24
    

Starting the Build

Management Node

Preparation

  • All of the commands below are run on the management node, using Ansible to do the configuration

  • Install Ansible with yum

    # Install the EPEL repository
    yum install epel-release.noarch -y
    # Install Ansible
    yum install -y ansible
    
  • Edit the Ansible inventory /etc/ansible/hosts and append the following at the end of the file:

    [all_ip]
    192.168.146.134 hostname=manager
    192.168.146.100 hostname=nginx1
    192.168.146.101 hostname=nginx2
    192.168.146.102 hostname=apache1
    192.168.146.103 hostname=apache2
    192.168.146.104 hostname=nas rsync_server=192.168.146.108
    192.168.146.105 hostname=master
    192.168.146.106 hostname=slave
    192.168.146.107 hostname=prometheus
    192.168.146.108 hostname=rsyncd
    
    [balancers]
    nginx1 mb=MASTER priority=100
    nginx2 mb=BACKUP priority=98
    
    [web]
    apache1
    apache2
    
    [mysql]
    master master=true
    slave slave=true
    
    [mysql:vars]
    master_ip=192.168.146.105
    slave_ip=192.168.146.106
    
    [nfs]
    nas
    
    [nfs:vars]
    rsync_server=192.168.146.108
    
    [rsync]
    rsyncd
    
    [prometheus]
    prometheus
    
    [alertmanagers]
    prometheus
    
    [node-exporter]
    192.168.146.100
    192.168.146.101
    192.168.146.102
    192.168.146.103
    192.168.146.104
    192.168.146.105
    192.168.146.106
    192.168.146.108
    
    • Check connectivity; only when every server answers pong is the management node confirmed to reach them all

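    • The check itself is a one-liner with Ansible's ad-hoc ping module:

      ansible all_ip -m ping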

  • Set a hostname on every server and, while we're at it, turn off SELinux and the firewall. Disabling both outright is the lazy option taken here (you could instead write per-role allow rules and set the matching SELinux contexts, which would better fit a production environment)

    [root@server1 ansible]# vim prepare_work.yml 
    
    - name: prepare work
      hosts: all_ip
      tasks:
    
        - name: set hostname
          shell: hostnamectl set-hostname {{ hostname }}
    
        - name: stop firewalld
          service:
            name: firewalld
            state: stopped
            enabled: no
    
        - name: disabled selinux
          selinux:
            state: disabled
            
    # Run the playbook!
    [root@server1 ansible]# ansible-playbook prepare_work.yml
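
  • To spot-check that the play took effect, an ad-hoc sweep like the sketch below should print each new hostname and show firewalld inactive (getenforce may still report Permissive until a reboot fully disables SELinux):

    ansible all_ip -m shell -a 'hostname; getenforce; systemctl is-active firewalld || true'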
    
  • Generate the /etc/hosts resolution file for every server

    [root@server1 ~]# mkdir ansible
    [root@server1 ~]# cd ansible/
    
    # Write the template file for hosts
    [root@server1 ansible]# vim hosts.j2
    127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
    
    {% for host in groups.all_ip %}
    {{ hostvars[host].ansible_ens33.ipv4.address }} {{ hostvars[host].ansible_hostname }}
    {% endfor %}
    
    # Write the playbook
    [root@server1 ansible]# vim hosts.yml
    - name: update hosts file
      hosts: all_ip
      tasks:
        - name: copy hosts.j2 to other servers
          template:
            src: hosts.j2
            dest: /etc/hosts
    
    # Run the playbook
    [root@server1 ansible]# ansible-playbook hosts.yml 
    
  • Create a roles directory to hold the roles

    [root@server1 ansible]# mkdir roles
    
    • The roles directory we just created is not Ansible's default roles path, so we have to point to it in /etc/ansible/ansible.cfg; otherwise Ansible cannot find the roles we create
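      For reference, the setting lives under [defaults] in /etc/ansible/ansible.cfg — assuming the directory we just created is /root/ansible/roles, the line would look like:

      [defaults]
      roles_path = /root/ansible/roles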
  • That wraps up the environment preparation. Now let's configure the individual services!

Building the Database

  • Create the mysql role for the database

    [root@server1 ansible]# ansible-galaxy init roles/mysql
    - Role roles/mysql was created successfully
    
  • The database will be a master-slave setup, which involves quite a few parameters. Some repeat across hosts, and some are what the Ansible tasks use to decide whether a node is the master or the slave, so several variables were pre-defined in the inventory file (the keepalived HA setup later defines inventory variables for the same reason, to tell the master and backup apart). The remaining shared values, to save some repeated typing, can go in the role's vars/main.yml:

    [root@server1 ansible]# vim roles/mysql/vars/main.yml
    ---
    # vars file for roles/mysql
    mysql_sock: /var/lib/mysql/mysql.sock
    mysql_port: 3306
    repl_user: repl
    repl_passwd: "123456"
    
  • Write the template used to generate MySQL's my.cnf. The pre-defined variables come into play here: the master and the slave must end up with different server-id values.

    [root@server1 ansible]# vim roles/mysql/templates/my.cnf.j2

    [mysqld]
    datadir=/var/lib/mysql
    socket=/var/lib/mysql/mysql.sock
    symbolic-links=0
    
    {% if master is defined %}
    server-id=1
    innodb_file_per_table=on
    {% else %}
    server-id=2
    {% endif %}
    
    log-bin=master-bin
    binlog-format=ROW
    # these parameters enable GTID
    log-slave-updates=true
    gtid-mode=on
    enforce-gtid-consistency=true
    master-info-repository=TABLE
    relay-log-info-repository=TABLE
    sync-master-info=1
    binlog-rows-query-log_events=1
    
    [mysqld_safe]
    log-error=/var/log/mysqld.log
    pid-file=/var/run/mysqld/mysqld.pid
    
  • Write mysql/tasks/main.yml — the tasks that build the whole master-slave architecture in one pass.

    Notes:

    • This playbook uses GTID-based replication, which is convenient because we never have to look up the master's binlog position

    • I install MySQL via yum rather than MariaDB, so the MySQL repository has to be configured before installing; otherwise, with no mysql package in the default repo list, yum would install MariaDB in its place

    [root@server1 ansible]# vim roles/mysql/tasks/main.yml 
    
    ---
    # tasks file for roles/mysql
    - name: yum install wget
      yum:
        name: wget
        state: present
    
    - name: wget mysql repo
      shell: wget http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm
    
    - name: rpm mysql repo
      shell: rpm -ivh mysql-community-release-el7-5.noarch.rpm
      ignore_errors: True
    
    - name: yum install mysql
      yum:
        name: mysql-server
        state: present
    
    - name: config my.cnf
      template:
        src: my.cnf.j2
        dest: /etc/my.cnf
    
    - name: start mysql
      service:
        name: mysqld
        state: restarted
        enabled: yes
    
    - name: install MYSQL-python
      yum:
        name: MySQL-python
        state: present
    
    - name: update mysql root password
      shell: mysql -e "update mysql.user set password=password('123456') where user='root' and host='localhost';flush privileges;"
      ignore_errors: True
    
    - name: create repl user
      mysql_user:
        name: "{{ repl_user }}"
        host: '192.168.146.%'
        password: "{{ repl_passwd }}"
        priv: "*.*:REPLICATION SLAVE"
        state: present
        login_user: 'root'
        login_password: '123456'
        login_host: localhost
        login_unix_socket: "{{ mysql_sock }}"
        login_port: "{{ mysql_port }}"
      when: master is defined
      
    - name: change master to
      mysql_replication:
        mode: changemaster
        master_host: "{{ master_ip }}"
        master_user: "{{ repl_user }}"
        master_password: "{{ repl_passwd }}"
        login_password: '123456'
        login_host: localhost
        login_unix_socket: "{{ mysql_sock }}"
        login_port: "{{ mysql_port }}"
      when: slave is defined
    
    - name: start_slave
      mysql_replication:
        mode: startslave
        login_user: 'root'
        login_password: '123456'
        login_host: localhost
        login_unix_socket: "{{ mysql_sock }}"
        login_port: "{{ mysql_port }}"
      when: slave is defined
    
    - name: get_slave_info
      mysql_replication:
        login_host: localhost
        login_user: root
        login_port: "{{ mysql_port }}"
        login_password: '123456'
        login_unix_socket: "{{ mysql_sock }}"
        mode: getslave
      when: slave is defined
      register: info
    
    - name: display_slave
      debug:
        msg: "Slave_IO_Running={{ info.Slave_IO_Running }}       Slave_SQL_Running={{ info.Slave_SQL_Running }}"
      when: slave is defined
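
  • After running the role against the mysql group, you can confirm replication is healthy straight from the management node — a quick sketch using the root credentials set above (both threads should report Yes):

    ansible slave -m shell -a "mysql -uroot -p123456 -e 'show slave status\G' | grep Running"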
    
    

Building the NAS Storage Node

  • This architecture shares the NAS's local storage with the web nodes over NFS. First create the nfs role

    [root@server1 ansible]# ansible-galaxy init roles/nfs
    - Role roles/nfs was created successfully
    
  • With the role created, write nfs/tasks/main.yml

    [root@server1 ansible]# vim roles/nfs/tasks/main.yml 
    
    ---
    # tasks file for roles/nfs
    - name: install nfs,expect
      yum:
        name: "{{ item }}"
        state: present
      loop:
        - nfs-utils
        - rpcbind
        - expect*
    
    - name: config shared data dir
      shell: mkdir -p /data && chmod -Rf 777 /data && echo "/data 192.168.146.0/24(rw,sync,no_root_squash)" >> /etc/exports
    
    
    - name: make dir for exp and sh
      file:
        path: /sh
        state: directory
        
    - name: make dir for backup
      file:
        path: /backup
        state: directory
    
    - name: config expect.sh
      template:
        src: rsync.exp.j2
        dest: /sh/rsync.exp
        
    - name: config beifen.sh
      template:
        src: beifen.sh.j2
        dest: /sh/beifen.sh
    
    - name: chmod beifen.sh
      file:
        path: /sh/beifen.sh
        mode: '0755'
        
    - name: cron tab
      shell: echo "0 1 * * * root /sh/beifen.sh" >> /etc/crontab                           
    
    - name: start nfs
      service:
        name: "{{ item }}"
        state: restarted
        enabled: yes
      loop:
        - rpcbind
        - nfs-server
    
  • Prepare the expect script template rsync.exp.j2

    [root@manager ansible]# vim roles/nfs/templates/rsync.exp.j2
    
    #!/usr/bin/expect
    set mulu [lindex $argv 0]
    set timeout 10
    spawn rsync -avzr /backup/$mulu root@{{ rsync_server }}::backup_server
    expect Password
    send "123456\r"
    expect eof
    
    
  • Prepare the backup script template beifen.sh.j2

    [root@manager ansible]# vim roles/nfs/templates/beifen.sh.j2
    
    #!/bin/bash
    # Name the archive directory after the host IP and today's date
    mulu=`ip a | grep global|awk -F'[ /]+' '{print $3}'`_`date +%F`
    echo $mulu
    mkdir -pv /backup/$mulu &> /dev/null
    # Package the data to be sent
    tar zcf /backup/$mulu/conf.tar.gz /data/* &> /dev/null
    touch /backup/$mulu
    # Send the data
    # This line runs the expect script
    expect /sh/rsync.exp $mulu
    # Keep only the last seven days of backups
    find /backup -mtime +7 -delete
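
  • After the role runs, you can verify the export and give the backup chain a manual dry run — a quick sketch (showmount needs nfs-utils on the node you run it from):

    # From any node: the shared directory should be listed
    showmount -e nas

    # On the nas node: run the backup script once by hand and check the result
    bash /sh/beifen.sh
    ls /backup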
    

Building the Backup Node

  • For backups we run an rsync daemon that receives the backup files pushed on schedule by the NAS storage node. First create the rsync role

    [root@manager ansible]# ansible-galaxy init roles/rsync
    - Role roles/rsync was created successfully
    
  • Write the rsyncd.conf file

    [root@manager ~]# vim ansible/roles/rsync/files/rsyncd.conf 
    
    [backup_server]
    path = /backup
    uid = root
    gid = root
    max connections = 2
    timeout = 300
    read only = false
    auth users = root
    secrets file = /etc/rsync.passwd
    strict modes = yes
    use chroot = yes
    
    
  • Write rsync/tasks/main.yml

    [root@manager ansible]# vim roles/rsync/tasks/main.yml 
    
    ---
    # tasks file for roles/rsync
    - name: yum install rsync
      yum:
        name: rsync
        state: present
    
    - name: config rsyncd.conf
      copy:
        src: rsyncd.conf
        dest: /etc/rsyncd.conf
    
    - name: make dir for backup
      file:
        path: /backup
        state: directory
    
    - name: prepare rsync.passwd
      shell: echo "root:123456" > /etc/rsync.passwd && chmod 600 /etc/rsync.passwd
    
    - name: start rsync
      service:
        name: rsyncd
        state: started
        enabled: yes
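
  • A quick manual push from the NAS node confirms the daemon accepts uploads (a sketch; it prompts for the password 123456 we wrote into /etc/rsync.passwd):

    rsync -avz /etc/hosts root@192.168.146.108::backup_server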
    

Building the Web Nodes

  • Since this architecture is a LAMP stack, Apache serves as the web server, so first create the apache role

    [root@server1 ansible]# ansible-galaxy init roles/apache
    - Role roles/apache was created successfully
    [root@server1 ansible]# ll roles/
    total 0
    drwxr-xr-x. 10 root root 154 Apr  9 14:30 apache
    drwxr-xr-x. 10 root root 154 Apr  9 14:16 nginx
    
  • Write apache/tasks/main.yml

    [root@server1 ansible]# vim roles/apache/tasks/main.yml 
    
    ---
    # tasks file for roles/apache
    - name: yum install lamp environment
      yum:
        name: httpd,php-fpm,php-mysql,mod_php
        state: present
    
    - name: start httpd
      service:
        name: httpd
        state: restarted
        enabled: yes
    
    - name: start php-fpm
      service:
        name: php-fpm
        state: restarted
        enabled: yes
    
    - name: Client Install nfs Server
      yum:
        name: nfs-utils
        state: present
    
    - name: mount nfs resources
      mount:
        src: nas:/data
        path: /var/www/html
        fstype: nfs
        opts: defaults
        state: mounted
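
  • To check the mount and the shared content end to end, something like this sketch works (index.html is just a hypothetical test page):

    # On the nas node, drop a test page into the share
    echo 'hello from nfs' > /data/index.html

    # From the management node: both web servers should show the NFS mount and serve the same page
    ansible web -m shell -a 'df -h /var/www/html; curl -s http://localhost/index.html'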
    
    

Building the Load-Balancer Nodes

  • Nginx provides the load balancing in this architecture, so first create the nginx role

    # Create the nginx role
    [root@server1 ansible]# ansible-galaxy init roles/nginx
    - Role roles/nginx was created successfully
    
  • Write the template that generates lb.conf, the load-balancing configuration for the two front-end nginx nodes. For session persistence we take the simplest possible route here: ip_hash-based load balancing

    [root@server1 ansible]# vim roles/nginx/templates/lb.conf.j2
    upstream webservers {
        ip_hash;
        server apache1;
        server apache2;
    }

    server {
        location / {
            proxy_pass http://webservers;
            proxy_next_upstream error timeout invalid_header http_500 http_502 http_504;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }
    
  • With the nginx role and its template in place, write nginx/tasks/main.yml

    [root@server1 ansible]# vim roles/nginx/tasks/main.yml 
    
    ---
    # tasks file for nginx
    - name: yum install epel
      yum:
        name: epel-release.noarch
        state: present
    
    - name: yum install nginx
      yum:
        name: nginx
        state: present
    
    - name: config lb.conf
      template:
        src: lb.conf.j2
        dest: /etc/nginx/conf.d/lb.conf
    
    - name: start nginx
      service:
        name: nginx
        state: restarted
        enabled: yes
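
  • With the test page from the web section in place, hitting either balancer should return the page from one of the apache backends (ip_hash pins a given client to the same backend):

    curl http://192.168.146.100/index.html
    curl http://192.168.146.101/index.html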
    
    

Configuring Keepalived for High Availability

  • The front-end load-balancer nodes need to be highly available, so create the keepalived role. VIP: 192.168.146.200

    [root@server1 ansible]# ansible-galaxy init roles/keepalived
    - Role roles/keepalived was created successfully
    
  • Next, write the template that generates the keepalived configuration file on the load-balancer nodes

    [root@server1 ansible]# vim roles/keepalived/templates/keepalived.conf.j2
    
    ! Configuration File for keepalived

    global_defs {
        router_id {{ ansible_hostname }}
    }

    vrrp_script check_nginx {
        script "/usr/local/src/check_nginx_pid.sh"
        interval 1
        weight -10
    }

    vrrp_instance VI_1 {
        state {{ mb }}
        interface ens33
        virtual_router_id 10
        priority {{ priority }}
        advert_int 1
        authentication {
            auth_type PASS
            auth_pass 1111
        }

        track_script {
            check_nginx
        }

        virtual_ipaddress {
            192.168.146.200
        }
    }
    
  • Write the script that checks nginx's state, so keepalived can float the VIP over on failure

    [root@manager ansible]# vim roles/keepalived/files/check_nginx_pid.sh
    #!/bin/bash
    nginx_process_num=`ps -C nginx --no-header | wc -l`
    if [ $nginx_process_num -eq 0 ];then
      exit 1
    else
      exit 0
    fi
    
  • With the keepalived role and template created, write keepalived/tasks/main.yml

    [root@server1 ansible]# vim roles/keepalived/tasks/main.yml 
    
    ---
    # tasks file for roles/keepalived
    - name: yum install keepalived
      yum:
        name: keepalived
        state: present
    
    - name: copy check_nginx_pid.sh
      copy:
        src: check_nginx_pid.sh
        dest: /usr/local/src/check_nginx_pid.sh
    
    - name: chmod sh
      file:
        path: /usr/local/src/check_nginx_pid.sh
        mode: '0755'
    
    - name: config keepalived.conf
      template:
        src: keepalived.conf.j2
        dest: /etc/keepalived/keepalived.conf
    
    - name: start keepalived
      service:
        name: keepalived
        state: restarted
        enabled: yes
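
  • Once the role has run, the VIP should sit on nginx1 (the MASTER) and move to nginx2 if nginx dies there — a sketch of the check:

    # On nginx1: the VIP should be bound to ens33
    ip addr show ens33 | grep 192.168.146.200

    # Stop nginx on nginx1, then repeat the grep on nginx2 — the VIP should have floated over
    systemctl stop nginx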
    
    

Configuring Prometheus Monitoring

Here we set up Prometheus monitoring for the service cluster: prometheus-server scrapes and displays the data, node-exporter runs on each node to watch the server's CPU, memory, and up/down state, and alertmanager handles alerting.

The three roles created below cover those three jobs and complete the Prometheus monitoring setup.

Preparing the Binary Packages

All of the Prometheus components here are installed from binary tarballs. Downloading them from the official site is very slow, and extracting them with tar throws archive errors. If the playbook simply told each managed node to download and extract the tarball itself, it would not only be slow but the extraction would error out and abort the run (tested it myself; if the experts have a better approach, please share!!!)

  • Not having found a better fix, I chose to upload the pre-downloaded Prometheus component tarballs to the Ansible management node, extract each one into its role's files directory (e.g. roles/prometheus/files), and then push the extracted directory to the managed nodes with the copy module when writing the role's tasks
# Extract the tarball
[root@manager ansible]# tar -zxvf prometheus-2.25.0.linux-amd64.tar.gz -C roles/prometheus/files/
prometheus-2.25.0.linux-amd64/
prometheus-2.25.0.linux-amd64/consoles/
prometheus-2.25.0.linux-amd64/consoles/index.html.example
prometheus-2.25.0.linux-amd64/consoles/node-cpu.html
prometheus-2.25.0.linux-amd64/consoles/node-disk.html
prometheus-2.25.0.linux-amd64/consoles/node-overview.html
prometheus-2.25.0.linux-amd64/consoles/node.html
prometheus-2.25.0.linux-amd64/consoles/prometheus-overview.html
prometheus-2.25.0.linux-amd64/consoles/prometheus.html
prometheus-2.25.0.linux-amd64/console_libraries/
prometheus-2.25.0.linux-amd64/console_libraries/menu.lib
prometheus-2.25.0.linux-amd64/console_libraries/prom.lib
prometheus-2.25.0.linux-amd64/prometheus.yml
prometheus-2.25.0.linux-amd64/LICENSE
prometheus-2.25.0.linux-amd64/NOTICE
prometheus-2.25.0.linux-amd64/prometheus
prometheus-2.25.0.linux-amd64/promtool

# These error messages can be ignored!!!
gzip: stdin: unexpected end of file
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now

# Extraction done!
[root@manager ansible]# ll roles/prometheus/files/
total 55608
-rw-r--r--. 1 root root     1606 Apr 11 20:20 node.yml
drwxr-xr-x  4 3434 3434      132 Feb 18  2021 prometheus-2.25.0.linux-amd64
  • The other components are extracted the same way
tar -zxvf alertmanager-0.21.0.linux-amd64.tar.gz -C roles/alertmanager/files/

tar -zxvf node_exporter-1.3.1.linux-amd64.tar.gz -C roles/node-exporter/files/

Configuring the Prometheus Server Node

  • As usual, step one is to create the role

    ansible-galaxy init roles/prometheus
    
  • Write the template prometheus.yml.j2 for the Prometheus server configuration file

    [root@manager ansible]# vim roles/prometheus/templates/prometheus.yml.j2 
    
    global:
      scrape_interval:     30s
      evaluation_interval: 30s
      query_log_file: ./promql.log
    
    
    alerting:
      alertmanagers:
      - static_configs:
        - targets:
    {% for alertmanager in groups['alertmanagers'] %}
          - {{ alertmanager }}:9093
    {% endfor %}
    rule_files:
      - "rules/*.yml"
    
    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
        - targets:
    {% for prometheu in groups['prometheus'] %}
          - "{{ prometheu }}:9090"
    {% endfor %}
      - job_name: "node"
        static_configs:
        - targets:
    {% for node in groups['node-exporter'] %}
          - "{{ node }}:9100"
    {% endfor %}
    
  • Write the template prometheus.service.j2 that sets Prometheus up as a systemd service

    [root@manager ansible]# vim roles/prometheus/templates/prometheus.service.j2 
    
    [Unit]
    Description=Prometheus
    After=network.target
    
    [Service]
    WorkingDirectory=/usr/local/prometheus
    ExecStart=/usr/local/prometheus/prometheus
    ExecReload=/bin/kill -HUP $MAINPID
    ExecStop=/bin/kill -KILL $MAINPID
    Type=simple
    KillMode=control-group
    Restart=on-failure
    RestartSec=3s
    
    [Install]
    WantedBy=multi-user.target
    
  • Write the alert rules file node.yml

    • Because node.yml itself needs Jinja2-style syntax for its alert annotations, we don't generate it on the managed node with the template module; instead we write the file in advance and push it with the copy module
    [root@manager ansible]# vim roles/prometheus/files/node.yml 
    
    groups:
    - name: node.rules   # name of the alert rule group
      rules:
      - alert: instance down
        expr: up == 0
        for: 30s  # duration: if the instance is unreachable for 30 seconds, fire the alert
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} down" # custom summary

      - alert: node filesystem
        expr: 100 - (node_filesystem_free_bytes{fstype=~"ext4|xfs"} / node_filesystem_size_bytes{fstype=~"ext4|xfs"}) * 100 > 80
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "{{$labels.instance}}: {{$labels.mountpoint }} partition usage too high"
          description: "{{$labels.instance}}: {{$labels.mountpoint }} partition usage above 80% (current value: {{ $value }})"
    
      - alert: node Memory
        expr: 100 - (node_memory_MemFree_bytes+node_memory_Cached_bytes+node_memory_Buffers_bytes) / node_memory_MemTotal_bytes * 100 > 80
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "{{$labels.instance}}: 内存使用过高"
          description: "{{$labels.instance}}: 内存使用大于 80% (当前值: {{ $value }})"
    
      - alert: node CPU
        expr: 100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) * 100) > 80
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "{{$labels.instance}}: CPU使用过高"
          description: "{{$labels.instance}}: CPU使用大于 80% (当前值: {{ $value }})"
    
    
  • Write prometheus/tasks/main.yml

    [root@manager ansible]# vim roles/prometheus/tasks/main.yml 
    
    ---
    # tasks file for roles/prometheus
    - name: copy prometheus.tar.gz
      copy:
        src: prometheus-2.25.0.linux-amd64
        dest: /usr/local/
    
    - name: create soft link
      file:
        src: /usr/local/prometheus-2.25.0.linux-amd64
        dest: /usr/local/prometheus
        state: link
    
    - name: chmod file
      file:
        path: /usr/local/prometheus/prometheus
        mode: '0755'
    
    - name: copy service file
      template:
        src: prometheus.service.j2
        dest: /etc/systemd/system/prometheus.service
    
    - name: copy config prometheus yaml
      template:
        src: prometheus.yml.j2
        dest: /usr/local/prometheus/prometheus.yml
    
    - name: create rules dir
      file:
        path: /usr/local/prometheus/rules
        state: directory
    
    - name: copy rules yaml
      copy:
        src: node.yml
        dest: /usr/local/prometheus/rules/node.yml
    
    - name: start prometheus
      service:
        name: prometheus
        state: started
        enabled: yes
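
  • Once the service is up, Prometheus answers on port 9090 — a quick readiness probe (the /-/ready endpoint is built into Prometheus 2.x), and the scrape targets are visible at http://192.168.146.107:9090/targets:

    curl -s http://192.168.146.107:9090/-/ready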
    

Configuring the node-exporter Probes

  • Create the role

    ansible-galaxy init roles/node-exporter
    
  • Write the template node_exporter.service.j2

    [root@manager ansible]# vim roles/node-exporter/templates/node_exporter.service.j2 
    
    [Unit]
    Description=Node Exporter
    After=network.target
    
    [Service]
    WorkingDirectory=/prometheus_exporter/node_exporter/
    ExecStart=/prometheus_exporter/node_exporter/node_exporter
    ExecStop=/bin/kill -KILL $MAINPID
    Type=simple
    KillMode=control-group
    Restart=on-failure
    RestartSec=3s
    
    [Install]
    WantedBy=multi-user.target
    
  • Write node-exporter/tasks/main.yml

    [root@manager ansible]# vim roles/node-exporter/tasks/main.yml 
    
    ---
    # tasks file for roles/node-exporter
    - name: create dir
      file:
        path: /prometheus_exporter
        state: directory
    
    - name: copy file
      copy:
        src: node_exporter-1.3.1.linux-amd64
        dest: /prometheus_exporter
    
    
    - name: create link
      file:
        src: /prometheus_exporter/node_exporter-1.3.1.linux-amd64
        dest: /prometheus_exporter/node_exporter
        state: link
    
    - name: chmod file
      file:
        path: /prometheus_exporter/node_exporter/node_exporter
        mode: '0755'
    
    - name: copy service file
      template:
        src: node_exporter.service.j2
        dest: /etc/systemd/system/node_exporter.service
    
    - name: start node_exporter
      service:
        name: node_exporter
        state: restarted
        enabled: yes
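
  • Every probe should now expose metrics on port 9100 — spot-check one node:

    curl -s http://192.168.146.100:9100/metrics | head -n 5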
    

Configuring the Alertmanager Component

  • Create the role

    ansible-galaxy init roles/alertmanager
    
  • Write the template alertmanager.service.j2

    [root@manager ansible]# vim roles/alertmanager/templates/alertmanager.service.j2 
    
    [Unit]
    Description=AlertManager
    After=network.target
    
    [Service]
    WorkingDirectory=/usr/local/alertmanager/
    ExecStart=/usr/local/alertmanager/alertmanager
    ExecReload=/bin/kill -HUP $MAINPID
    ExecStop=/bin/kill -KILL $MAINPID
    Type=simple
    KillMode=control-group
    Restart=on-failure
    RestartSec=3s
    
    [Install]
    WantedBy=multi-user.target
    
    
  • Write the template alertmanager.yml.j2

    • The password for the smtp_auth_password field is not your QQ Mail account password: in QQ Mail, go to Settings -> Account, generate an authorization code there, and fill that code into this field

    [root@manager ansible]# vim roles/alertmanager/templates/alertmanager.yml.j2 
    
    global:
      resolve_timeout: 5m
      smtp_from: "2249807270@qq.com"
      smtp_smarthost: 'smtp.qq.com:465'
      smtp_auth_username: "2249807270@qq.com"
      smtp_auth_password: "vwsorrnckxwjdhgf"
      smtp_require_tls: false
    route:
      group_by: ['alertname']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 24h
      receiver: 'default-receiver'
    
    receivers:
    - name: 'default-receiver'
      email_configs:
      - to: '2249807270@qq.com'
    
    inhibit_rules:
      - source_match:
          severity: 'critical'
        target_match:
          severity: 'warning'
        equal: ['alertname', 'dev', 'instance']
    
  • Write alertmanager/tasks/main.yml

    [root@manager ansible]# vim roles/alertmanager/tasks/main.yml 
    
    ---
    # tasks file for roles/alertmanager
    - name: copy file
      copy:
        src: alertmanager-0.21.0.linux-amd64
        dest: /usr/local/
    
    
    - name: create link
      file:
        src: /usr/local/alertmanager-0.21.0.linux-amd64
        dest: /usr/local/alertmanager
        state: link
    
    - name: chmod file
      file:
        path: /usr/local/alertmanager/alertmanager
        mode: '0755'
    
    - name: copy service file
      template:
        src: alertmanager.service.j2
        dest: /etc/systemd/system/alertmanager.service
    
    - name: copy config yaml
      template:
        src: alertmanager.yml.j2
        dest: /usr/local/alertmanager/alertmanager.yml
    
    - name: start server
      service:
        name: alertmanager
        state: restarted
        enabled: yes
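
  • Alertmanager listens on port 9093 and exposes the same style of readiness endpoint — a quick check:

    curl -s http://192.168.146.107:9093/-/ready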
    

The Master Playbook

  • At this point every service in the architecture has been packaged into a role! The final step is also the simplest: write one master playbook that calls each of the roles in turn

    [root@server1 ansible]# vim all.yml
    
    - name: config mysql replication
      hosts: mysql
      roles:
      - mysql
    
    - name: config nfs
      hosts: nfs
      roles:
      - nfs
    
    - name: config rsync
      hosts: rsync
      roles:
      - rsync
    
    - name: config lamp
      hosts: web
      roles:
      - apache
    
    - name: config lb
      hosts: balancers
      roles:
      - nginx
      - keepalived
    
    - name: install prometheus
      hosts: prometheus
      roles:
        - prometheus
        - alertmanager
    
    - name: install node-exporter
      hosts: node-exporter
      roles:
        - node-exporter
    
  • One-click install!

    ansible-playbook all.yml
    

Setting Up the Blog

  • The blog runs on Typecho. Head over to the NFS server we just built and extract the Typecho tarball into the shared directory /data/

    [root@nas ~]# yum install -y wget
    [root@nas ~]# wget http://typecho.org/downloads/1.1-17.10.30-release.tar.gz
    
    [root@nas ~]# ll
    total 11156
    -rw-r--r--. 1 root root   487445 Oct 30  2017 1.1-17.10.30-release.tar.gz
    
    [root@nas ~]# tar -zxvf 1.1-17.10.30-release.tar.gz
    
    [root@nas ~]# ll
    total 11156
    -rw-r--r--. 1 root root    487445 Oct 30  2017 1.1-17.10.30-release.tar.gz
    drwxr-xr-x. 6  501 games      111 Oct 30  2017 build
    
    [root@nas ~]#  mv build/ typecho
    
    [root@nas ~]# cp -r typecho/ /data/
    
    [root@nas ~]# ll /data/
    total 0
    drwxr-xr-x. 6 root root 111 Apr  9 20:42 typecho
    
  • On the MySQL master, create the database the blog software will log into and use, and grant the needed privileges

    [root@master ~]# mysql -uroot -p123456
    
    mysql> create database bbs;
    
    mysql> grant all on bbs.* to 'bbs'@'192.168.146.%' identified by '123456';
    
    mysql> flush privileges;
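
  • You can confirm the grant works from any 192.168.146.x node that has the mysql client installed (a quick sketch):

    mysql -ubbs -p123456 -h 192.168.146.105 -e 'show databases;'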
    
  • Now open a browser to 192.168.146.200/typecho and you will be taken to the Typecho setup page

  • Click Next and fill in the database connection details

  • Enter the administrator information to finish, and the blog installation begins; just follow the prompts from there.

Wrapping Up

This project's service architecture is not small, and deploying it with Ansible in one shot means writing a lot of playbook and role YAML — it's easy for the eyes to glaze over, so type carefully~~~

This is my first post as a newcomer and it took real effort! Likes appreciated~~~ and corrections from the experts are welcome wherever I fell short!!!


Copyright notice: this is an original article by qq_49010564, released under the CC 4.0 BY-SA license. Please include a link to the original article and this notice when reposting.