[Xinghai Essays] Prometheus (3): metrics

Note: the Web UI, Grafana, and API clients overlap with Alertmanager in functionality and practical use.

Official site:
http://prometheus.io
node_exporter:
https://prometheus.io/download/#node_exporter

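Today I found myself thinking about what really matters in operations work.
Good operations -> requires good automation.
Good automation -> requires continuous refinement.
Well-refined automation -> requires a deep understanding of the business.
A deep understanding of the business -> requires solid experience and the ability to communicate with the business side.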

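Data storage concept: metrics
Prometheus refers to all the data it collects as metrics.

Stored as key/value pairs
Once an exporter (node_exporter) is installed and running on the monitored server, a simple curl command shows what the metrics collected by the exporter look like. They are presented and stored as key/value pairs.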

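Check that the exporter is listening on the monitored machine

netstat -tnlp | grep 9100

View the metrics

curl localhost:9100/metrics

Each metric is preceded by two comment lines, HELP and TYPE, that explain the data returned below them

# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 65535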
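
To make the layout concrete, here is a minimal sketch that separates the comment lines from the sample lines. The helper name list_samples is hypothetical, and the sketch assumes that metric names and label values contain no whitespace, which holds for the sample above but not for every exporter.

```shell
#!/bin/sh
# list_samples (hypothetical helper): read exposition text on stdin and
# print "name value" for each sample, skipping "# HELP" / "# TYPE" comments.
# Assumes metric names and labels contain no whitespace.
list_samples() {
  awk '!/^#/ && NF >= 2 { print $1, $2 }'
}

# The three lines shown above, used as sample input.
sample='# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 65535'

printf '%s\n' "$sample" | list_samples
# Against a live exporter: curl -s localhost:9100/metrics | list_samples
```

Only the last of the three lines is an actual key/value sample; the two comment lines carry its description and its type.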

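Prometheus exporters

The official site provides many kinds of exporters

blackbox_exporter # scrapes data at the service level
consul_exporter
graphite_exporter
haproxy_exporter # dedicated to monitoring haproxy
memcached_exporter
mysqld_exporter
node_exporter
statsd_exporter

After an exporter is downloaded it comes with a start command; usually you just run it with a few parameters.
node_exporter, for example, exposes metrics such as:

node_boot_time: system boot time
node_cpu: system CPU usage
node_disk*: disk I/O
node_filesystem*: filesystem usage
node_load1: system load
node_memory*: memory usage
node_network*: network bandwidth
node_time: current system time
go_: Go-related metrics of node exporter
process_: runtime metrics of the node exporter process itself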

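push (via pushgateway)

pushgateway can be installed on the client or on the server (where it runs does not really matter).
pushgateway is itself an HTTP server.
Write your own script to collect whatever data you want to monitor and push it to pushgateway (over HTTP, usually with POST); the Prometheus server then scrapes pushgateway.
node exporter already covers a lot, such as hardware, resources, and network. Sometimes, however, we also need to collect custom data, for example user-related figures or the usage of specific resources in specific scenarios.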
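
The flow just described can be sketched as a small script. This is only an illustration under stated assumptions: pushgateway is assumed to listen on localhost:9091 (its default port; override PUSHGATEWAY_URL if yours differs), the job and instance names are placeholders, and the helper function names are hypothetical. The metric name is the one used later in this article.

```shell
#!/bin/bash
# Sketch: push one custom gauge to pushgateway over HTTP POST.
# Assumptions: pushgateway listens on localhost:9091 (its default port),
# and the job / instance names below are placeholders.
PUSHGATEWAY_URL="${PUSHGATEWAY_URL:-http://localhost:9091}"
JOB_NAME="pushgateway1"
INSTANCE_NAME="${INSTANCE_NAME:-A}"
METRIC_NAME="count_netstat_wait_connections"

# Count TCP connections whose state contains "WAIT" (TIME_WAIT, CLOSE_WAIT, ...).
count_wait_connections() {
  netstat -ant | grep -c 'WAIT'
}

# Format a single sample in the text exposition format.
build_payload() {
  printf '# TYPE %s gauge\n%s %s\n' "$1" "$1" "$2"
}

# POST the payload; the URL path carries the grouping labels job and instance.
push_metric() {
  build_payload "$METRIC_NAME" "$1" |
    curl --silent --data-binary @- \
      "$PUSHGATEWAY_URL/metrics/job/$JOB_NAME/instance/$INSTANCE_NAME"
}

# Preview what would be sent, using a fixed value:
build_payload "$METRIC_NAME" 42
# To push the real value (for example from cron):
#   push_metric "$(count_wait_connections)"
```

Keeping payload formatting separate from the HTTP call makes it possible to inspect what will be sent before anything reaches pushgateway.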

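CPU-related metrics

Monitoring first requires a thorough understanding of Linux internals,
for example: user time / sys time / nice time / idle time / irq,
that is, user time, kernel (system) time, time spent on niced processes, idle time, and interrupt time.

node_cpu

(1-(  (sum (increase (node_cpu{mode="idle"}[1m]) ) by (instance)) / (sum(increase(node_cpu[1m]) ) by (instance) )  )) * 100

Note: node_exporter 0.16 and later names this counter node_cpu_seconds_total; use that name if node_cpu returns no data.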

CPU usage = (total CPU time spent in all non-idle states) / (total CPU time across all states)
(user(8 mins) + sys(1.5 mins) + iowait(0.5 mins) + 0 + 0 + 0 + 0) / (30 mins)

For idle time:
idle(20 mins) / (30 mins), so CPU usage = 1 - idle / total
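
As a quick check of the arithmetic, the sketch below evaluates both forms with the figures given above. The variable names are illustrative only.

```shell
#!/bin/sh
# Worked example with the figures above: 30 minutes of CPU time in total,
# of which user = 8, sys = 1.5, iowait = 0.5 and idle = 20.
# POSIX shell arithmetic is integer-only, so awk does the division.
busy_pct=$(LC_ALL=C awk 'BEGIN { printf "%.1f", (8 + 1.5 + 0.5) / 30 * 100 }')
idle_pct=$(LC_ALL=C awk 'BEGIN { printf "%.1f", 20 / 30 * 100 }')
from_idle=$(LC_ALL=C awk 'BEGIN { printf "%.1f", (1 - 20 / 30) * 100 }')

echo "busy: ${busy_pct}%  idle: ${idle_pct}%  1-idle: ${from_idle}%"
```

Both routes give the same usage figure, which is why the query can work from the idle counter alone instead of adding up every non-idle mode.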

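The increase function applies here because CPU time is a counter, a value that only accumulates.
[1m] # the increase over 1 minute
increase(node_cpu{mode="idle"}[1m]) # the 1-minute increase of idle CPU time
increase(node_cpu[1m]) # the 1-minute increase of CPU time across all modes
# Note: these return one series per CPU core, so a 32-core machine shows 32 lines.

sum adds the values up, merging all the lines into one.

sum followed by by (instance)
splits the summed value back out by one or more labels.
instance identifies the monitored machine (the scrape target).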
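
The combination of increase, sum, and by (instance) can be imitated offline with made-up numbers. The sketch below is not how Prometheus computes the result (the real increase also handles counter resets and extrapolation); it only mirrors the arithmetic. The host names, counter values, and the helper name cpu_usage_by_instance are all hypothetical.

```shell
#!/bin/sh
# Input columns: instance cpu mode value_at_start value_at_end
# (counter readings in seconds, taken one minute apart; made-up data).
samples='host-a cpu0 idle 1000 1045
host-a cpu0 user 500 512
host-a cpu0 system 200 203
host-a cpu1 idle 2000 2051
host-a cpu1 user 300 306
host-a cpu1 system 100 103
host-b cpu0 idle 100 130
host-b cpu0 user 50 75
host-b cpu0 system 20 25
host-b cpu1 idle 400 430
host-b cpu1 user 80 104
host-b cpu1 system 10 16'

cpu_usage_by_instance() {
  LC_ALL=C awk '
    {
      delta = $5 - $4                        # increase: end minus start
      total[$1] += delta                     # sum by instance, all modes
      if ($3 == "idle") idle[$1] += delta    # sum by instance, idle only
    }
    END {
      for (inst in total)
        printf "%s %.1f\n", inst, (1 - idle[inst] / total[inst]) * 100
    }' | sort
}

printf '%s\n' "$samples" | cpu_usage_by_instance
```

Every core and mode contributes its own delta, and grouping by the first column collapses them into one usage figure per machine, which is the effect of by (instance).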

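gauge-type data

count_netstat_wait_connections

Entering the key on its own returns the result directly, because a gauge is a single point-in-time value.
The returned data looks like

count_netstat_wait_connections{exported_instance="A",exported_job="pushgateway1",instance="localhost:9092",job="pushgateway"}
Here exported_instance="A" means the monitored machine is the one named A. The exported_ prefix appears because the pushed instance and job labels collide with the labels Prometheus attaches to the pushgateway target itself.

Command line

count_netstat_wait_connections is not provided by node_exporter (it is the number of TCP connections in a wait state).
It is a custom metric, collected with a shell script plus pushgateway.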

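Time synchronization

Prometheus is a time-series (T-S) database, so the clocks of the servers involved must be in sync.

timedatectl set-timezone Asia/Shanghai
# Synchronize the time over NTP
ntpdate -u cn.pool.ntp.org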

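Installing with Docker

# Install docker
yum install -y docker-io
# Pull the images
docker pull prom/node-exporter
docker pull prom/prometheus

Installing from the tar package

cp -rf prometheus-xxx.linux-amd64  /usr/local/prometheus
# -r copies recursively; -f overwrites existing files without prompting. Suitable for copying into a known directory.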

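# Start node-exporter
# With --net="host" the container shares the host network, so port 9100 is
# reachable directly and Docker discards any -p mapping.
docker run -d \
-v "/proc:/host/proc:ro" \
-v "/sys:/host/sys:ro" \
-v "/:/rootfs:ro" \
--net="host" \
prom/node-exporter

# Create the directory /opt/prometheus and edit the configuration file prometheus.yml
mkdir /opt/prometheus
cd /opt/prometheus/
vim prometheus.yml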
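
The article does not show the contents of prometheus.yml. As a rough starting point, the sketch below prints a minimal configuration with one scrape job for Prometheus itself and one for node-exporter. The helper name render_config, the 15s interval, the job names, and the address 192.0.2.10 are assumptions for illustration. Because the Prometheus container started in the next step does not use host networking, the node-exporter target should be the host address rather than localhost.

```shell
#!/bin/sh
# render_config (hypothetical helper): print a minimal prometheus.yml that
# scrapes Prometheus itself and one node-exporter target passed as $1.
render_config() {
  cat <<EOF
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "node"
    static_configs:
      - targets: ["$1"]
EOF
}

# 192.0.2.10 is a documentation placeholder; use the monitored host address.
render_config "192.0.2.10:9100"
# To write the file:
#   render_config "<ip>:9100" > /opt/prometheus/prometheus.yml
```

Once Prometheus is running with such a file, both jobs should be listed on the targets page.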

# Start prometheus
docker run -d \
-p 9090:9090 \
-v /opt/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml \
prom/prometheus

# Access
<ip>:9090/graph
<ip>:9090/targets