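I. About Prometheus
Prometheus is an open-source system monitoring and alerting toolkit. It is commonly used to monitor Kubernetes clusters, collects data through many kinds of exporters, and also accepts pushed metrics through Pushgateway. Its performance is sufficient for clusters on the scale of ten thousand machines or more.
II. About Grafana
Grafana is an open-source visualization platform that is commonly paired with Prometheus. It provides dashboards for visualizing monitoring data.
III. About Locust
Locust is an open-source load testing tool built on coroutines. It can run with a web UI, without a UI (headless), or in distributed mode.
Why use Prometheus and Grafana? Locust's built-in result pages are plain, and its data is not persisted.
Steps
1. Start from the reference prometheus_exporter.py provided by the official project: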
# coding: utf8
import six
from itertools import chain

from flask import request, Response
from locust import stats as locust_stats, runners as locust_runners
from locust import User, task, events
from prometheus_client import Metric, REGISTRY, exposition

# This locustfile adds an external web endpoint to the locust master, and makes it serve as a prometheus exporter.
# Runs it as a normal locustfile, then points prometheus to it.
# locust -f prometheus_exporter.py --master
# Lots of code taken from [mbolek's locust_exporter](https://github.com/mbolek/locust_exporter), thx mbolek!


class LocustCollector(object):
    registry = REGISTRY

    def __init__(self, environment, runner):
        self.environment = environment
        self.runner = runner

    def collect(self):
        # collect metrics only when locust runner is spawning or running.
        runner = self.runner

        if runner and runner.state in (locust_runners.STATE_SPAWNING, locust_runners.STATE_RUNNING):
            stats = []
            for s in chain(locust_stats.sort_stats(runner.stats.entries), [runner.stats.total]):
                stats.append({
                    "method": s.method,
                    "name": s.name,
                    "num_requests": s.num_requests,
                    "num_failures": s.num_failures,
                    "avg_response_time": s.avg_response_time,
                    "min_response_time": s.min_response_time or 0,
                    "max_response_time": s.max_response_time,
                    "current_rps": s.current_rps,
                    "median_response_time": s.median_response_time,
                    "ninetieth_response_time": s.get_response_time_percentile(0.9),
                    # only total stats can use current_response_time, so sad.
                    # "current_response_time_percentile_95": s.get_current_response_time_percentile(0.95),
                    "avg_content_length": s.avg_content_length,
                    "current_fail_per_sec": s.current_fail_per_sec
                })

            # perhaps StatsError.parse_error in e.to_dict only works in python slave, take notices!
            errors = [e.to_dict() for e in six.itervalues(runner.stats.errors)]

            metric = Metric('locust_user_count', 'Swarmed users', 'gauge')
            metric.add_sample('locust_user_count', value=runner.user_count, labels={})
            yield metric

            metric = Metric('locust_errors', 'Locust requests errors', 'gauge')
            for err in errors:
                metric.add_sample('locust_errors', value=err['occurrences'],
                                  labels={'path': err['name'], 'method': err['method'],
                                          'error': err['error']})
            yield metric

            is_distributed = isinstance(runner, locust_runners.MasterRunner)
            if is_distributed:
                metric = Metric('locust_slave_count', 'Locust number of slaves', 'gauge')
                metric.add_sample('locust_slave_count', value=len(runner.clients.values()), labels={})
                yield metric

            metric = Metric('locust_fail_ratio', 'Locust failure ratio', 'gauge')
            metric.add_sample('locust_fail_ratio', value=runner.stats.total.fail_ratio, labels={})
            yield metric

            metric = Metric('locust_state', 'State of the locust swarm', 'gauge')
            metric.add_sample('locust_state', value=1, labels={'state': runner.state})
            yield metric

            stats_metrics = ['avg_content_length', 'avg_response_time', 'current_rps', 'current_fail_per_sec',
                             'max_response_time', 'ninetieth_response_time', 'median_response_time',
                             'min_response_time', 'num_failures', 'num_requests']

            for mtr in stats_metrics:
                mtype = 'gauge'
                if mtr in ['num_requests', 'num_failures']:
                    mtype = 'counter'
                metric = Metric('locust_stats_' + mtr, 'Locust stats ' + mtr, mtype)
                for stat in stats:
                    # Aggregated stat's method label is None, so name it as Aggregated
                    # locust has changed name Total to Aggregated since 0.12.1
                    if 'Aggregated' != stat['name']:
                        metric.add_sample('locust_stats_' + mtr, value=stat[mtr],
                                          labels={'path': stat['name'], 'method': stat['method']})
                    else:
                        metric.add_sample('locust_stats_' + mtr, value=stat[mtr],
                                          labels={'path': stat['name'], 'method': 'Aggregated'})
                yield metric


@events.init.add_listener
def locust_init(environment, runner, **kwargs):
    print("locust init event received")
    if environment.web_ui and runner:
        # Expose the collected metrics on the locust web UI's Flask app.
        @environment.web_ui.app.route("/export/prometheus")
        def prometheus_exporter():
            registry = REGISTRY
            encoder, content_type = exposition.choose_encoder(request.headers.get('Accept'))
            if 'name[]' in request.args:
                # name[] may be repeated, so read every value rather than only the first.
                registry = REGISTRY.restricted_registry(request.args.getlist('name[]'))
            body = encoder(registry)
            return Response(body, content_type=content_type)

        REGISTRY.register(LocustCollector(environment, runner))


class Dummy(User):
    @task(20)
    def hello(self):
        pass

There are two ways to use it:
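a. Edit the file directly and replace the Dummy class with your own load test class. When the load test starts, the service at <ip>:8089/export/prometheus starts with it; the data it serves is what we want to collect.
b. Start this script as the master, and start the load test script as a worker that points at the address of the machine running this script.
The advantage of (b) is that the metrics endpoint can stay up permanently, whereas with (a) it is only available while a load test is running.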
Screenshot of a request to /export/prometheus:

2. If the page above shows content, the Locust exporter is working. Next, install Prometheus and Grafana (installation steps omitted here).
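The "does the endpoint return content" check can also be scripted instead of done in a browser. The following is a minimal sketch using only the Python standard library. It parses the Prometheus text exposition format into (name, labels, value) tuples. The function names and the SAMPLE text are illustrative assumptions, not real output from a Locust run, and the parser only handles the simple lines this exporter produces.

```python
import re
import urllib.request

# Matches sample lines such as:  name{label="value"} 12.5   or   name 3
_SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
_LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples; comment lines are skipped."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_RE.match(line)
        if not match:
            continue
        labels = dict(_LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def fetch_metrics(base_url):
    """Fetch and parse /export/prometheus from a running Locust master (hypothetical helper)."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/export/prometheus", timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


# Illustrative text only; real values depend on the running test.
SAMPLE = """\
# HELP locust_user_count Swarmed users
# TYPE locust_user_count gauge
locust_user_count 10.0
# HELP locust_stats_current_rps Locust stats current_rps
# TYPE locust_stats_current_rps gauge
locust_stats_current_rps{method="GET",path="/hello"} 12.5
locust_stats_current_rps{method="Aggregated",path="Aggregated"} 12.5
"""

if __name__ == "__main__":
    for name, labels, value in parse_metrics(SAMPLE):
        print(name, labels, value)
```

Against a live master, something like fetch_metrics("http://<master-ip>:8089") should return a non-empty list while a test is spawning or running, since the collector only emits metrics in those states.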
3. After starting Prometheus, open <server-ip>:9090 in a browser. If the page loads, Prometheus started correctly.

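4. Edit prometheus.yml and add the endpoint above as a scrape target (the target is the machine that serves /export/prometheus, that is, the Locust master):
  - job_name: locust
    metrics_path: '/export/prometheus'
    static_configs:
      - targets: ['<master-ip>:8089']
        labels:
          instance: locust
After changing the configuration, restart the Prometheus service and refresh the page: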
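Besides refreshing the page, the new job can be checked through the Prometheus HTTP API by running the instant query up{job="locust"}. The sketch below assumes Prometheus listens on localhost:9090 and that the job is named locust as in the configuration above; the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(prometheus_base, promql):
    """Build the instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": promql})
    return prometheus_base.rstrip("/") + "/api/v1/query?" + query


def target_is_up(response_json):
    """Return True if any series in an instant-query response has the value 1."""
    if response_json.get("status") != "success":
        return False
    results = response_json.get("data", {}).get("result", [])
    # Each result carries "value": [timestamp, "<value as string>"].
    return any(float(item["value"][1]) == 1.0 for item in results)


def check_locust_job(prometheus_base="http://localhost:9090"):
    """Ask Prometheus whether the 'locust' job is being scraped successfully."""
    url = build_query_url(prometheus_base, 'up{job="locust"}')
    with urllib.request.urlopen(url, timeout=5) as resp:
        return target_is_up(json.load(resp))


if __name__ == "__main__":
    print(build_query_url("http://localhost:9090", 'up{job="locust"}'))
```

If check_locust_job() returns False, the metrics_path and target address in prometheus.yml are the first things to look at.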

5. With everything in place, configure Grafana.
Add a data source:

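Import a dashboard; https://grafana.com/grafana/dashboards/12081 is recommended.
Everything is now ready for the load test.
Running the load test:
1. Start the master: locust --master --web-host=<local-ip> -f prometheus_exporter.py
2. Check that it is listening:
Run netstat -ano|findstr 8089 in cmd. The output should show an ESTABLISHED connection on port 8089 between the current server's IP and the master's IP.
Open <master-ip>:8089/export/prometheus in a browser to view the Prometheus data.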
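The listening check above can also be done from any machine with Python, which helps when netstat or findstr is not available. This is a minimal sketch that simply attempts a TCP connection; the function name is hypothetical, and 8089 is the web port used throughout this post.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Replace 127.0.0.1 with the master's IP when checking from another machine.
    print(is_port_open("127.0.0.1", 8089))
```

A True result only shows that something accepts connections on the port; opening /export/prometheus is still needed to confirm it is the exporter.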
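3. Start the load generator (worker): go run test.go --master-host=<master-ip> --master-port=5557
4. Open <master-ip>:8089 in a browser, enter the total number of users and the ramp-up rate, and start the test.
5. Open <server-ip>:3000 in a browser and check the dashboard; it should show the data for the current Locust run.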
