marksugar: 675 articles written, 140 comments received
Found 53 articles matching "zabbix"
2019-05-25
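linuxea: Monitoring a VIP and port with Simple check in Zabbix 4.2
In practice we usually monitor a number of ports, such as those of nginx, MariaDB and PHP. Monitoring a single port is straightforward:

    net.tcp.listen[80]

This key returns the state of TCP port 80. A trigger then handles the alerting:

    Name:       {HOST.NAME} nginx is not running
    Expression: {nginx_80_port:net.tcp.listen[80].last()}<>1
    Status:     Enabled

That, however, is not the subject here. An environment normally has load balancing and HA redundancy, so a VIP (virtual IP) is unavoidable. When a node fails, the virtual IP floats to another node and the services bound to it are switched over. Each VIP usually comes with a port, as with LVS, HAProxy and nginx. This article shows how to monitor VIP:PORT. Note in particular that linking the template to a single host once is enough to check the VIP and its port.

Create the template
1. Configuration -> Templates -> Create template, and enter a name, for example: Template App Telnet VIP
2. Create application, for example: Telnet VIP
3. Create Items, type: Simple check

Why Simple check rather than telnet? Compared with telnet, Simple check is much easier to use: no script has to be written, and a little configuration covers the requirement. So we use net.tcp.service here. The official documentation pages service_check_details and simple_checks are recommended reading.

    Key: net.tcp.service[tcp,10.10.195.99,9200]

This fetches the state of TCP port 9200 on IP address 10.10.195.99. Select Telnet VIP under Applications.

Create Triggers
The item returns 0 when the service is down and 1 when it is running, so the trigger is:

    {Template App Telnet VIP:net.tcp.service[tcp,10.10.195.99,9200].last(#3)}=0

The intent is to alert when the latest three collected values are all 0. Note that last(#3) returns only the third most recent value; max(#3)=0 expresses "all of the latest three values are 0" directly.

That completes simple monitoring of an IP and port with Zabbix.

Auto discovery
See: auto discovery based on Zabbix 4.2, the Zabbix Discovery tutorial

Further reading
service_check_details
simple_checks

Read more
Zabbix monitoring tutorials
Docker tutorials
zabbix-complete-works
linuxea: Zabbix-complete-works: basic Zabbix installation and configuration
linuxea: Testing the TimescaleDB data source, a new feature in Zabbix 4.2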
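Before creating the item it can help to confirm from a shell that the VIP and port answer at all. The following is a minimal sketch under stated assumptions: it needs bash (for /dev/tcp) and the coreutils timeout command, and the function names tcp_service_key and tcp_service_check are hypothetical helpers, not Zabbix commands. It only mirrors the 1/0 convention of net.tcp.service and does not replace the Simple check.

```shell
#!/bin/sh
# Hypothetical helpers for a manual check; not part of Zabbix.

# Build the Simple check key used for the item.
tcp_service_key() (
    ip=$1; port=$2
    printf 'net.tcp.service[tcp,%s,%s]\n' "$ip" "$port"
)

# Print 1 if a TCP connection to ip:port succeeds within 3 seconds, else 0,
# following the same 1 = running / 0 = down convention as the item.
tcp_service_check() (
    ip=$1; port=$2
    if timeout 3 bash -c "exec 3<>/dev/tcp/$ip/$port" 2>/dev/null; then
        echo 1
    else
        echo 0
    fi
)

tcp_service_key 10.10.195.99 9200
```

Calling tcp_service_check 10.10.195.99 9200 from the Zabbix server host gives a rough idea of what the Simple check will see, since Simple checks are run by the server (or proxy) rather than by an agent.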
2019-05-25
3,392 views
0 comments
0 likes
2019-05-07
linuxea: Testing the TimescaleDB data source, a new feature in Zabbix 4.2
Zabbix has released version 4.2 with a series of new features. There is a good overview on the Zabbix website, but be sure to check the "What's new in Zabbix 4.2" section of the documentation, because it is more complete. One new feature is experimental support for TimescaleDB, a currently popular open-source time-series SQL database packaged as a PostgreSQL extension. It is built on PostgreSQL and provides automatic partitioning across time and space (partition key) while keeping the standard PostgreSQL interface and full SQL support.

Preface
Why use a time-series SQL database, what is it, and how is it configured? The starting point is database partitioning, and to understand partitioning we need to think about the history data of the Zabbix server. Suppose there are five history tables and two trends tables. How long data is kept is configured in the frontend and can be any period. What matters here is housekeeping of Zabbix's internal history and trend data: housekeeping controls the removal of outdated information from the database. A task scans all history and trends tables in the database for older data and then deletes it. This process becomes slow, because it scans every table to delete rows while other internal processes are running at the same time. As a result deletion gets slower and cannot keep up with the amount of data to remove.

Now consider partitioning. Suppose we keep three months of data split into one partition per day, for example partitions 1, 2, 3, 4. If we only want to keep the most recent day, partitions 1, 2 and 3 are dropped as whole tables instead of being scanned. This removes the performance problem, is faster, and frees disk space. TimescaleDB is such a time-series database with automatic partitioning built in. It is not a database engine of its own but an extension of a SQL database (PostgreSQL).

Installation
The official Zabbix Docker images have an option that enables TimescaleDB support:

    # ENABLE_TIMESCALEDB=true

I tried it in my environment with version 4.2. As always I chose the Docker installation (using docker-compose). Zabbix also provides ready-made Docker images; see the Zabbix documentation and the GitHub repository. I have added a TimescaleDB installation to my own GitHub project, see its docker-compose file. At present Zabbix proxy does not support TimescaleDB. My repository: https://github.com/marksugar/zabbix-complete-works

Quick deployment

    curl -Lk https://raw.githubusercontent.com/marksugar/zabbix-complete-works/master/zabbix_server/zabbix-install/install_zabbix_timescaledb.sh|bash

timescaledb
For timescaledb, mount the data directory locally:

    - /data/zabbix/postgresql/data:/var/lib/postgresql/data:rw

Pass two environment variables to set the user and password:

    environment:
      - POSTGRES_USER=zabbix
      - POSTGRES_PASSWORD=abc123

The complete service definition:

    version: '3.5'
    services:
      timescaledb:
        image: timescale/timescaledb:latest-pg11-oss
        container_name: timescaledb
        restart: always
        network_mode: "host"
        volumes:
          - /etc/localtime:/etc/localtime:ro
          - /data/zabbix/postgresql/data:/var/lib/postgresql/data:rw
        user: root
        stop_grace_period: 1m
        environment:
          - POSTGRES_USER=zabbix
          - POSTGRES_PASSWORD=abc123
        logging:
          driver: "json-file"
          options:
            max-size: "1G"

zabbix
Use the pgsql images:

    zabbix/zabbix-server-pgsql:alpine-4.2-latest
    zabbix/zabbix-web-nginx-pgsql:alpine-4.2-latest

zabbix-server-pgsql
Change the database connection in the zabbix-server-pgsql environment variables:

    environment:
      - ENABLE_TIMESCALEDB=true
      - DB_SERVER_HOST=127.0.0.1
      - POSTGRES_DB=zabbix
      - POSTGRES_USER=zabbix
      - POSTGRES_PASSWORD=abc123

and set the housekeeping frequency:

      - ZBX_HOUSEKEEPINGFREQUENCY=1
      - ZBX_MAXHOUSEKEEPERDELETE=100000

zabbix-web-nginx-pgsql
The environment variables of zabbix-web-nginx-pgsql need the same change:

    environment:
      - DB_SERVER_HOST=127.0.0.1
      - POSTGRES_DB=zabbix
      - POSTGRES_USER=zabbix
      - POSTGRES_PASSWORD=abc123
      - ZBX_SERVER_HOST=127.0.0.1

Once configured, the settings mentioned in the documentation are already enabled by default in the web interface. To use partition management for history and trends, these options must be enabled. TimescaleDB partitioning can be used for trends only (by setting "Override item trend period") or for history only ("Override item history period"):

    Override item history period
    Override item trend period

They can be checked under Administration → General → Housekeeping, where Override item history period and Override item trend period are ticked.

Zabbix now runs on TimescaleDB, which brings some benefit for querying and ingesting data. Take Zabbix housekeeping: before TimescaleDB, data was removed with many DELETE queries, which certainly hurt overall performance. With TimescaleDB's chunked tables, outdated data is dropped as a whole chunk and the performance cost is lower.

Test
Suppose history retention is set to 1 day; everything older will then be deleted from the database. The database currently stores several days of data, like this:

    [root@LinuxEA ~]# docker exec -it timescaledb bash
    bash-4.4# psql -U zabbix
    psql (11.2)
    Type "help" for help.

    zabbix=# \d+ history
                                      Table "public.history"
     Column |     Type      | Collation | Nullable | Default | Storage | Stats target | Description
    --------+---------------+-----------+----------+---------+---------+--------------+-------------
     itemid | bigint        |           | not null |         | plain   |              |
     clock  | integer       |           | not null | 0       | plain   |              |
     value  | numeric(16,4) |           | not null | 0.0000  | main    |              |
     ns     | integer       |           | not null | 0       | plain   |              |
    Indexes:
        "history_1" btree (itemid, clock)
        "history_clock_idx" btree (clock DESC)
    Triggers:
        ts_insert_blocker BEFORE INSERT ON history FOR EACH ROW EXECUTE PROCEDURE _timescaledb_internal.insert_blocker()
    Child tables: _timescaledb_internal._hyper_1_11_chunk,
                  _timescaledb_internal._hyper_1_16_chunk,
                  _timescaledb_internal._hyper_1_21_chunk,
                  _timescaledb_internal._hyper_1_26_chunk,
                  _timescaledb_internal._hyper_1_6_chunk

    zabbix=# \d+ trends
                                       Table "public.trends"
      Column   |     Type      | Collation | Nullable | Default | Storage | Stats target | Description
    -----------+---------------+-----------+----------+---------+---------+--------------+-------------
     itemid    | bigint        |           | not null |         | plain   |              |
     clock     | integer       |           | not null | 0       | plain   |              |
     num       | integer       |           | not null | 0       | plain   |              |
     value_min | numeric(16,4) |           | not null | 0.0000  | main    |              |
     value_avg | numeric(16,4) |           | not null | 0.0000  | main    |              |
     value_max | numeric(16,4) |           | not null | 0.0000  | main    |              |
    Indexes:
        "trends_pkey" PRIMARY KEY, btree (itemid, clock)
        "trends_clock_idx" btree (clock DESC)
    Triggers:
        ts_insert_blocker BEFORE INSERT ON trends FOR EACH ROW EXECUTE PROCEDURE _timescaledb_internal.insert_blocker()
    Child tables: _timescaledb_internal._hyper_6_14_chunk,
                  _timescaledb_internal._hyper_6_19_chunk,
                  _timescaledb_internal._hyper_6_24_chunk,
                  _timescaledb_internal._hyper_6_29_chunk,
                  _timescaledb_internal._hyper_6_9_chunk

We set history and trends to 1 day and then run a cleanup to see what happens. The data in timescaledb looks like three days' worth but is really only two days of data, the earliest day and the current day. Taking a retention of one day as the example.

Start the cleanup

    [root@LinuxEA ~]# docker exec -it zabbix-server-pgsql bash
    bash-4.4# zabbix_server -R config_cache_reload
    zabbix_server [260]: command sent successfully
    bash-4.4# zabbix_server -R housekeeper_execute
    zabbix_server [261]: command sent successfully

Back in timescaledb, \d+ history shows the same columns, indexes and trigger as before, but only two child tables remain:

    zabbix=# \d+ history
    Child tables: _timescaledb_internal._hyper_1_21_chunk,
                  _timescaledb_internal._hyper_1_26_chunk

The same holds for \d+ trends:

    zabbix=# \d+ trends
    Child tables: _timescaledb_internal._hyper_6_24_chunk,
                  _timescaledb_internal._hyper_6_29_chunk

To see the effect more clearly, check it in the web interface.

Auto discovery
See: auto discovery based on Zabbix 4.2, the Zabbix Discovery tutorial

Further reading
zabbix TimescaleDB

Read more
Zabbix monitoring tutorials
Docker tutorials
zabbix-complete-works
linuxea: Zabbix-complete-works: basic Zabbix installation and configuration
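The partition argument above boils down to one comparison per chunk instead of one per row: a chunk whose newest data is older than the retention cutoff can be dropped whole. The sketch below illustrates only that selection step. It is an assumption-laden toy: the chunk names and timestamps are made up, the function name chunks_to_drop is hypothetical, and it does not talk to PostgreSQL or TimescaleDB.

```shell
#!/bin/sh
# Illustration only: chunk names and timestamps are invented.

# Read "chunk_name end_epoch" lines on stdin and print the names of chunks
# whose newest data is older than (now - retention), both in seconds.
chunks_to_drop() (
    now=$1; retention=$2
    awk -v cutoff="$((now - retention))" '$2 < cutoff { print $1 }'
)

now=1557200000      # hypothetical "current" time, epoch seconds
one_day=86400
chunks_to_drop "$now" "$one_day" <<'EOF'
chunk_a 1557000000
chunk_b 1557100000
chunk_c 1557150000
chunk_d 1557210000
EOF
```

With these invented values the cutoff is 1557113600, so chunk_a and chunk_b are listed and the two newer chunks are kept, which matches the shape of the result seen after housekeeper_execute.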
2019-05-07
5,210 views
2 comments
0 likes
2019-04-22
linuxea: Zabbix-complete-works: basic Zabbix installation and configuration
I spent some time putting together a set of Zabbix installation scripts to make deployment easier. It covers installing and initializing zabbix-server and zabbix-agent. docker-compose was added after 4.0, and since then the server side is installed with Docker. The latest update introduces elasticsearch:6.1.4.

Git repository: https://github.com/marksugar/zabbix-complete-works

If you like the project, you can click ♥ Star or Fork at the top right of zabbix-complete-works on GitHub.

The server side is orchestrated with docker-compose, and zabbix-agent is installed by script. For docker and docker-compose, follow the installation instructions on the Docker and docker-compose websites.

A first look
I really like the Graph feature of the last few Zabbix versions. It lets me put a custom group, or a subset of hosts together with a given item, into a single graph, which is very effective.

zabbix-server
The latest stable 4.2 release is used, together with Elasticsearch. Elasticsearch support is still under development and only a limited set of versions is supported; I use 6.1.4. The focus here is on the Zabbix parameters: because everything runs in Docker, you need to know them to understand and install the stack quickly. See the docker-compose.yaml file in the zabbix-complete-works project.

zabbix/zabbix-server-mysql:alpine-4.2-latest
As in earlier setups, data is kept on the local host:

    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /etc/timezone:/etc/timezone:ro
      - /data/zabbix/zbx_env/usr/lib/zabbix/alertscripts:/usr/lib/zabbix/alertscripts:ro
      - /data/zabbix/zbx_env/usr/lib/zabbix/externalscripts:/usr/lib/zabbix/externalscripts:ro
      - /data/zabbix/zbx_env/var/lib/zabbix/modules:/var/lib/zabbix/modules:ro
      - /data/zabbix/zbx_env/var/lib/zabbix/enc:/var/lib/zabbix/enc:ro
      - /data/zabbix/zbx_env/var/lib/zabbix/ssh_keys:/var/lib/zabbix/ssh_keys:ro
      - /data/zabbix/zbx_env/var/lib/zabbix/mibs:/var/lib/zabbix/mibs:ro
      - /data/zabbix/zbx_env/var/lib/zabbix/snmptraps:/var/lib/zabbix/snmptraps:rw

Environment variables:

    environment:
      - DB_SERVER_HOST=127.0.0.1
      - MYSQL_DATABASE=zabbix
      - MYSQL_USER=zabbix
      - MYSQL_PASSWORD=password
      - MYSQL_ROOT_PASSWORD=abc123
      - ZBX_HISTORYSTORAGEURL=http://127.0.0.1:9200 # elasticsearch
      - ZBX_HISTORYSTORAGETYPES=dbl,uint,str,log,text # value types stored in elasticsearch
      - DebugLevel=5
      - HistoryStorageDateIndex=1
      - ZBX_STARTDISCOVERERS=10

These environment variables correspond to the parameters in zabbix_server.conf, with ZBX_ added in front.

Notes
1. MYSQL_ROOT_PASSWORD is the database root password. When the root password is provided, zabbix-server creates the user and imports the SQL automatically; watch the log for errors.
2. Elasticsearch is used here. According to the official documentation, after configuring the server the web-side configuration file has to be changed as well:

      - ZBX_HISTORYSTORAGEURL=http://127.0.0.1:9200 # elasticsearch
      - ZBX_HISTORYSTORAGETYPES=dbl,uint,str,log,text

zabbix/zabbix-web-nginx-mysql:alpine-4.2-latest
Environment variables:

    environment:
      - DB_SERVER_HOST=127.0.0.1
      - MYSQL_DATABASE=zabbix
      - MYSQL_USER=zabbix
      - MYSQL_PASSWORD=password
      - ZBX_SERVER_HOST=127.0.0.1
      - ZBX_HISTORYSTORAGEURL=http://127.0.0.1:9200
      - ZBX_HISTORYSTORAGETYPES=['dbl','uint','str', 'text', 'log'] # uint,dbl,str,log,text

The password is the one given in the zabbix-server environment variables, that is, the password the web frontend uses to connect to the database. The Elasticsearch settings must match those of the server. In the end these variables are substituted into the configuration file inside the container:

      - ZBX_HISTORYSTORAGEURL=http://127.0.0.1:9200
      - ZBX_HISTORYSTORAGETYPES=['dbl','uint','str', 'text', 'log']

Quick installation

    mkdir /data/zabbix -p
    curl -Lk https://raw.githubusercontent.com/marksugar/zabbix-complete-works/master/zabbix_server/graphfont.TTF -o /data/zabbix/graphfont.ttf
    wget https://raw.githubusercontent.com/marksugar/zabbix-complete-works/master/zabbix_server/docker_zabbix_server/docker-compose.yaml
    docker-compose -f docker-compose.yaml up -d

> elasticsearch: mind the permissions. The docker-compose file in this example needs: chown -R 1000:1000 /data/elasticsearch/

I have prepared the index files; just run the index creation (creating the indices is essential). You can also refer to the official documentation. Normally you will see the following:

    $ curl 'http://127.0.0.1:9200/_cat/indices?v'
    health status index                       uuid                   pri rep docs.count docs.deleted store.size pri.store.size
    yellow open   str                         MQWM2bNNRzOvBywM7ne-lw   5   1          0            0      1.1kb          1.1kb
    yellow open   .monitoring-es-6-2019.04.20 tIfs0MkNQUCI4YuEHRmQ6g   1   1       1926          208    901.6kb        901.6kb
    yellow open   dbl                         Y0992hqaR8KTin9iXKsljQ   5   1          0            0      1.1kb          1.1kb
    yellow open   text                        s2XMyJtdQQ27b9rS3nWVfg   5   1          0            0      1.1kb          1.1kb
    yellow open   log                         MAysNczpSKGZbjfjJXBvTg   5   1          0            0      1.1kb          1.1kb
    yellow open   uint                        JA_8kyXlSLqawyHzo28Ggw   5   1          0            0      1.1kb          1.1kb

zabbix-agent
Quick deployment

    curl -Lk https://raw.githubusercontent.com/marksugar/zabbix-complete-works/master/zabbix_agent/install-agentd.sh|bash -s local IPADDR

The additional scripts shipped with zabbix-agent monitor the following by default:
/root/.ssh/authorized_keys
/etc/passwd
/etc/zabbix/zabbix_agentd.conf
OOM
iptables
disk io
tcp
nginx and php-fpm
mariadb-galera

The configuration files and scripts are packaged in a zabbix_agent_status.tar.gz archive.

Auto discovery
See: auto discovery based on Zabbix 4.2, the Zabbix Discovery tutorial

Read more
Zabbix monitoring tutorials
Docker tutorials
zabbix-complete-works
linuxea: Testing the TimescaleDB data source, a new feature in Zabbix 4.2
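Since creating the indices is stressed as essential, a small check over the _cat/indices listing can confirm that none of the five history indices is missing. This is a minimal sketch: the function name missing_history_indices is hypothetical, and it assumes the listing was requested with ?v so that the first line is a header and the index name is the third column, as in the output shown in the article.

```shell
#!/bin/sh
# Read the output of `_cat/indices?v` on stdin and print every expected
# history index (uint dbl str log text) that does not appear in it.
missing_history_indices() (
    # Skip the header line; the third column holds the index name.
    listing=$(awk 'NR > 1 { print $3 }')
    for name in uint dbl str log text; do
        printf '%s\n' "$listing" | grep -qx "$name" || echo "$name"
    done
)

# Usage against a running instance (address taken from the article):
#   curl -s 'http://127.0.0.1:9200/_cat/indices?v' | missing_history_indices
```

Empty output means all five indices exist; any printed name still has to be created before the server can store that value type in Elasticsearch.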
2019-04-22
5,720 views
1 comment
0 likes
2018-10-26
linuxea: Sending alerts through Slack with Zabbix 4.0
I previously set up alerting through Telegram and found it produced many false alarms. Trying Slack instead, I found it clean, clear and pleasant to use. The code comes from GitHub and can be consulted alongside this article. Neither the earlier Telegram setup nor this Slack setup does alert aggregation; ready-made alert aggregation exists on GitHub if you are interested.

zabbix configuration
Download the slack.sh script into /usr/lib/zabbix/alertscripts:

    [root@DT_Node-172_25_250_249 ~]# curl -Lk https://raw.githubusercontent.com/ericoc/zabbix-slack-alertscript/master/slack.sh -o /usr/lib/zabbix/alertscripts/slack.sh
    [root@DT_Node-172_25_250_249 /usr/lib/zabbix/alertscripts]# ll
    total 52
    -rw-r--r-- 1 root root 1580 Oct 25 10:10 slack.sh

Enable the setting AlertScriptsPath=/usr/lib/zabbix/alertscripts:

    [root@DT_Node-172_25_250_249 /usr/lib/zabbix/alertscripts]# grep AlertScriptsPath /etc/zabbix/zabbix_server.conf
    ### Option: AlertScriptsPath
    # AlertScriptsPath=${datadir}/zabbix/alertscripts
    AlertScriptsPath=/usr/lib/zabbix/alertscripts

slack
Create a channel and use a webhook:
1. Open Slack and create a channel.
2. On the webhook page, select the channel you created.
3. Get the webhook URL.
4. Write the URL into the script:

    url='https://hooks.slack.com/services/TDP9T4YH4UDP/frkSC='
    username='linuxea.com'

Test from the command line:

    [root@DT_Node ~]# bash slack.sh '#linuxea-zabbix-monitor' PROBLEM '!' ok

zabbix web configuration
1. Configure Media types.
2. Configure the Action. Keep the Default message short.
3. Configure Operations, with the users and media to send to.
4. Configure Resolved in the same way.

The alert messages then arrive in the Slack channel.
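To make it clearer what an alert script of this kind sends to the webhook, here is a minimal sketch of building the JSON body. It is not the code of slack.sh: the function names json_escape and slack_payload are hypothetical, the channel and username fields are assumed to be honored by the webhook (newer Slack app webhooks may ignore them), and only backslashes and double quotes are escaped, so multi-line text is out of scope.

```shell
#!/bin/sh
# Hypothetical helpers; a sketch, not the downloaded slack.sh.

# Escape backslashes first, then double quotes, for use inside a JSON string.
json_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Print a JSON body with channel, username and text.
slack_payload() {
    printf '{"channel": "%s", "username": "%s", "text": "%s"}\n' \
        "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")"
}

slack_payload '#linuxea-zabbix-monitor' 'linuxea.com' 'PROBLEM: nginx is not running'
```

The body could then be posted with something like curl -s -X POST -H 'Content-Type: application/json' --data "$(slack_payload ...)" "$url", where $url is the webhook URL configured above.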
2018-10-26
4,227 views
0 comments
0 likes
2018-10-25
linuxea: Sending alerts through Telegram with Zabbix 4.0
Configuring Zabbix with Zabbix-in-Telegram
If you are in Hong Kong or elsewhere and need Telegram for Zabbix alerting, this article may help. Inside mainland China, messaging tools such as DingTalk, WeChat or QQ are recommended instead.

Prerequisites
1. Enable the Zabbix setting AlertScriptsPath=/usr/lib/zabbix/alertscripts
2. Register a Telegram bot, see: https://core.telegram.org/bots#creating-a-new-bot

Then configure it following Zabbix-in-Telegram: https://github.com/ableev/Zabbix-in-Telegram

Configuring Zabbix-in-Telegram
Clone the code:

    [root@Linuxea_Node ~]# git clone https://github.com/ableev/Zabbix-in-Telegram.git
    Cloning into 'Zabbix-in-Telegram'...
    remote: Enumerating objects: 9, done.
    remote: Counting objects: 100% (9/9), done.
    remote: Compressing objects: 100% (9/9), done.
    remote: Total 474 (delta 3), reused 1 (delta 0), pack-reused 465
    Receiving objects: 100% (474/474), 169.39 KiB | 182.00 KiB/s, done.
    Resolving deltas: 100% (269/269), done.

Install pip:

    [root@Linuxea_Node ~]# yum install python-pip

Install the dependencies listed in requirements.txt:

    [root@Linuxea_Node ~]# cd Zabbix-in-Telegram/
    [root@Linuxea_Node ~/Zabbix-in-Telegram]# pip install -r requirements.txt

Copy zbxtg.py, zbxtg_settings.example.py and zbxtg_group.py to /usr/lib/zabbix/alertscripts/:

    [root@Linuxea_Node ~/Zabbix-in-Telegram]# cp zbxtg.py /usr/lib/zabbix/alertscripts
    [root@Linuxea_Node ~/Zabbix-in-Telegram]# cp zbxtg_settings.example.py /usr/lib/zabbix/alertscripts/
    [root@Linuxea_Node ~/Zabbix-in-Telegram]# cp zbxtg_group.py /usr/lib/zabbix/alertscripts/

Then edit zbxtg_settings.py (the example file saved under that name). Three pieces of configuration mainly need changing:

    tg_key = "KEY"  # telegram bot api key
    zbx_server = "http://www.linuxea.com/zabbix/"  # zabbix server full url
    zbx_api_user = "Admin"
    zbx_api_pass = "zabbix"

tg_key is the key generated when the bot was registered. The Zabbix user name and password must belong to an account that can log in and has sufficient permissions; Admin can be used. You can test with the following command (create the group first, then add the bot to it):

    ./zbxtg.py "group name And username" "test" --group

Configuring the zabbix-server web frontend
Create Media types
Create the required Media types.

Create a user
The user-creation part was added later because it was missing before. The environment is different, so the screenshots look slightly different, but the steps are broadly the same.

Create a group
We create the users needed to send alert messages, and for convenience we should create a group first.
Administration -> User groups -> Create user group
Enter the group name under user group. Under Permissions choose read access, pick everything in the select dialog, click Add to add the groups to Permissions, and finally click Add to create the user group.

Create the user
Administration -> Users -> Create user
Enter the user name. Under groups click select and choose the telegram_group group just created. Then under Media click Add; in the dialog, choose the Media type created earlier as the type. In this Telegram example, "send to" is the name of the Telegram group (Zabbix-in-Telegram). Then choose to send only Disaster alerts.

Create the action
Log in to the frontend and create an action under Configuration -> Actions -> Triggers -> Create action. In the action's New condition choose Trigger severity and select High and Disaster, so that the action fires when a High or Disaster problem occurs. Under Operations, fill in the message sent when it fires:

    Default subject: Alert host: {HOST.NAME}
    Problem details: {ITEM.NAME}:{ITEM.VALUE}
    Alert time: {EVENT.DATE} {EVENT.TIME}
    Alert severity: {TRIGGER.SEVERITY}
    Alert message: {TRIGGER.NAME}
    Alert item: {TRIGGER.KEY1}
    Current status: {TRIGGER.STATUS}.{ITEM.VALUE}
    Event ID: {EVENT.ID}
    zbxtg;graphs
    zbxtg;graphs_period=10800
    zbxtg;itemid:{ITEM.ID1}
    zbxtg;title:{HOST.HOST} - {TRIGGER.NAME}

Then add the user permissions and media. Recovery operations are configured the same way as Operations:

    Default subject: Recovered host: {HOST.NAME}
    Problem details: {ITEM.NAME}:{ITEM.VALUE}
    Recovery time: {EVENT.DATE} {EVENT.TIME}
    Event severity: {TRIGGER.SEVERITY}
    Recovered item: {TRIGGER.KEY1}
    Current status: {TRIGGER.STATUS}.{ITEM.VALUE}
    Event ID: {EVENT.ID}
    zbxtg;graphs
    zbxtg;graphs_period=10800
    zbxtg;itemid:{ITEM.ID1}
    zbxtg;title:{HOST.HOST} - {TRIGGER.NAME}

Finally add the bot to the group and simulate a failure. The graph is delivered to Telegram successfully.
2018-10-25
5,709 views
0 comments
0 likes
2018-03-05
linuxea: Quick Zabbix upgrade
Taking a yum installation as the example: simply uninstall and reinstall. If there is a proxy layer, it has to be updated as well.

Stop the services

    [root@DS-VM-Node114 ~]# systemctl stop zabbix-server.service
    [root@DS-VM-Node114 ~]# systemctl stop zabbix-agent.service
    [root@DS-VM-Node114 ~]# rpm -qa zabbix-server-mysql
    zabbix-server-mysql-3.2.6-1.el7.x86_64
    [root@DS-VM-Node114 ~]# rpm -qa zabbix-web-mysql
    zabbix-web-mysql-3.2.6-1.el7.noarch
    [root@DS-VM-Node114 ~]# rpm -qa zabbix-release
    zabbix-release-3.2-1.el7.noarch

Back up the configuration before uninstalling

    [root@DS-VM-Node114 ~]# mkdir /bak && cp -r /etc/zabbix/* /bak

Uninstall

    [root@DS-VM-Node114 ~]# yum remove zabbix-release-3.2-1.el7.noarch zabbix-web-mysql-3.2.6-1.el7.noarch zabbix-server-mysql-3.2.6-1.el7.x86_64 zabbix-agent.x86_64 zabbix-get.x86_64

Download the release package from http://repo.zabbix.com/zabbix/3.4/rhel/7/x86_64/

    [root@DS-VM-Node114 ~]# rpm -vih http://repo.zabbix.com/zabbix/3.4/rhel/7/x86_64/zabbix-release-3.4-2.el7.noarch.rpm
    获取http://repo.zabbix.com/zabbix/3.4/rhel/7/x86_64/zabbix-release-3.4-2.el7.noarch.rpm
    准备中...                          ################################# [100%]
    正在升级/安装...
       1:zabbix-release-3.4-2.el7         ################################# [100%]

Install

    [root@DS-VM-Node114 ~]# yum install zabbix-web-mysql zabbix-server-mysql zabbix-agent zabbix-get -y

Compared with before, one extra package is installed: zabbix-web.noarch 0:3.4.5-1.el7

Restore the configuration files

    [root@DS-VM-Node114 /etc/zabbix]# mv zabbix_server.conf zabbix_server.conf.bak
    [root@DS-VM-Node114 /etc/zabbix]# mv zabbix_server.conf.rpmsave zabbix_server.conf
    [root@DS-VM-Node114 /etc/zabbix]# mv zabbix_agentd.conf zabbix_agentd.conf.3.4
    [root@DS-VM-Node114 /etc/zabbix]# mv zabbix_agentd.conf.rpmsave zabbix_agentd.conf
    systemctl start zabbix-server.service

Start
Starting the service is all that is needed. While it starts, watch the log for the database update entries:

27104:20180108:143513.438 Starting Zabbix Server. Zabbix 3.4.5 (revision 76340).
27104:20180108:143513.438 ****** Enabled features ****** 27104:20180108:143513.438 SNMP monitoring: YES 27104:20180108:143513.438 IPMI monitoring: YES 27104:20180108:143513.438 Web monitoring: YES 27104:20180108:143513.438 VMware monitoring: YES 27104:20180108:143513.438 SMTP authentication: YES 27104:20180108:143513.438 Jabber notifications: YES 27104:20180108:143513.438 Ez Texting notifications: YES 27104:20180108:143513.438 ODBC: YES 27104:20180108:143513.438 SSH2 support: YES 27104:20180108:143513.438 IPv6 support: YES 27104:20180108:143513.438 TLS support: YES 27104:20180108:143513.438 ****************************** 27104:20180108:143513.438 using configuration file: /etc/zabbix/zabbix_server.conf 27104:20180108:143513.448 current database version (mandatory/optional): 03040000/03040005 27104:20180108:143513.449 required mandatory version: 03040000 27104:20180108:143513.461 slow query: 0.006521 sec, "select i.itemid,i.hostid,i.status,i.type,i.value_type,i.key_,i.snmp_community,i.snmp_oid,i.port,i.snmpv3_securityname,i.snmpv3_securitylevel,i.snmpv3_authpassphrase,i.snmpv3_privpassphrase,i.ipmi_sensor,i.delay,i.trapper_hosts,i.logtimefmt,i.params,i.state,i.authtype,i.username,i.password,i.publickey,i.privatekey,i.flags,i.interfaceid,i.snmpv3_authprotocol,i.snmpv3_privprotocol,i.snmpv3_contextname,i.lastlogsize,i.mtime,i.history,i.trends,i.inventory_link,i.valuemapid,i.units,i.error,i.jmx_endpoint,i.master_itemid from items i,hosts h where i.hostid=h.hostid and h.status in (0,1) and i.flags<>2" 27104:20180108:143513.465 slow query: 0.004410 sec, "select pp.item_preprocid,pp.itemid,pp.type,pp.params,pp.step from item_preproc pp,items i,hosts h where pp.itemid=i.itemid and i.hostid=h.hostid and h.status in (0,1) and i.flags<>2 order by pp.itemid" 27104:20180108:143513.477 slow query: 0.005493 sec, "select i.itemid,f.functionid,f.function,f.parameter,t.triggerid from hosts h,items i,functions f,triggers t where h.hostid=i.hostid and i.itemid=f.itemid and 
f.triggerid=t.triggerid and h.status in (0,1) and t.flags<>2" 27104:20180108:143513.491 slow query: 0.013620 sec, "select distinct t.triggerid,t.description,t.expression,t.error,t.priority,t.type,t.value,t.state,t.lastchange,t.status,t.recovery_mode,t.recovery_expression,t.correlation_mode,t.correlation_tag from hosts h,items i,functions f,triggers t where h.hostid=i.hostid and i.itemid=f.itemid and f.triggerid=t.triggerid and h.status in (0,1) and t.flags<>2" 27104:20180108:143513.496 slow query: 0.004696 sec, "select distinct d.triggerid_down,d.triggerid_up from trigger_depends d,triggers t,hosts h,items i,functions f where t.triggerid=d.triggerid_down and t.flags<>2 and h.hostid=i.hostid and i.itemid=f.itemid and f.triggerid=d.triggerid_down and h.status in (0,1)" 27104:20180108:143513.502 server #0 started [main process] 27126:20180108:143513.503 server #1 started [configuration syncer #1] 27127:20180108:143513.503 server #2 started [alerter #1] 27128:20180108:143513.503 server #3 started [alerter #2] 27129:20180108:143513.504 server #4 started [alerter #3] 27130:20180108:143513.504 server #5 started [housekeeper #1] 27131:20180108:143513.504 server #6 started [timer #1] 27132:20180108:143513.505 server #7 started [http poller #1] 27133:20180108:143513.505 server #8 started [discoverer #1] 27134:20180108:143513.506 server #9 started [history syncer #1] 27135:20180108:143513.506 server #10 started [history syncer #2] 27136:20180108:143513.506 server #11 started [history syncer #3] 27137:20180108:143513.507 server #12 started [history syncer #4] 27138:20180108:143513.507 server #13 started [escalator #1] 27139:20180108:143513.508 server #14 started [proxy poller #1] 27145:20180108:143513.510 server #20 started [poller #4] 27146:20180108:143513.510 server #21 started [poller #5] 27140:20180108:143513.523 server #15 started [self-monitoring #1] 27141:20180108:143513.523 server #16 started [task manager #1] 27142:20180108:143513.523 server #17 started [poller #1] 
27150:20180108:143513.524 server #25 started [trapper #3] 27143:20180108:143513.525 server #18 started [poller #2] 27151:20180108:143513.526 server #26 started [trapper #4] 27152:20180108:143513.528 server #27 started [trapper #5] 27153:20180108:143513.530 server #28 started [icmp pinger #1] 27154:20180108:143513.530 server #29 started [alert manager #1] 27155:20180108:143513.530 server #30 started [preprocessing manager #1] 27144:20180108:143513.533 server #19 started [poller #3] 27148:20180108:143513.538 server #23 started [trapper #1] 27149:20180108:143513.540 server #24 started [trapper #2] 27147:20180108:143513.541 server #22 started [unreachable poller #1] 27133:20180108:143513.542 slow query: 0.001181 sec, "select distinct r.druleid,r.iprange,r.name,c.dcheckid,r.proxy_hostid,r.delay from drules r left join dchecks c on c.druleid=r.druleid and c.uniq=1 where r.status=0 and r.nextcheck<=1515393313 and mod(r.druleid,1)=0" 27157:20180108:143514.116 server #32 started [preprocessing worker #2] 27158:20180108:143514.120 server #33 started [preprocessing worker #3] 27156:20180108:143514.131 server #31 started [preprocessing worker #1]
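The two log lines that matter after an upgrade are "current database version (mandatory/optional)" and "required mandatory version": when the mandatory parts agree, the schema matches what the new server expects. The following minimal sketch automates that comparison. The function name db_version_ok is hypothetical, and the parsing assumes the exact wording of the log lines shown above.

```shell
#!/bin/sh
# Read zabbix_server.log on stdin. Succeed (exit 0) when the last reported
# current mandatory DB version equals the last required mandatory version.
db_version_ok() {
    awk '
        /current database version \(mandatory\/optional\):/ {
            split($NF, v, "/")      # last field looks like 03040000/03040005
            current = v[1]
        }
        /required mandatory version:/ { required = $NF }
        END { exit !(current != "" && current == required) }
    '
}

# Usage (log path depends on the LogFile setting):
#   db_version_ok < /var/log/zabbix/zabbix_server.log && echo "schema up to date"
```

For the log above, both values are 03040000, so the check would pass.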
2018-03-05
3,491 views
0 comments
0 likes
2017-09-30
linuxea: Service self-healing with Zabbix Actions
Many of the earlier Zabbix articles never mentioned Zabbix Actions. Actions can be used for self-healing, although in my view they are best used after the Zabbix setup has been planned properly. An action triggers operations under specified conditions. The operation can be a script or a message, and it can be directed at specific hosts. Actions are commonly used for alert media and for running scripts to reach a desired state.

Example
Restart redis when the redis port goes down.

Create an item
Use the Zabbix agent type here. The type must not be active, because the action's remote command does not support it.

Create the action
Configuration -> Create action -> Action
Reference: https://www.zabbix.com/documentation/3.2/manual/config/notifications/action

Configuration -> Create action -> Action -> Operations
Reference: https://www.zabbix.com/documentation/3.2/manual/config/notifications/action/operation

Agent host configuration
Comment out requiretty:

    [root@linuxea-Node61 /data/rds]# sed -i 's/Defaults requiretty/#Defaults requiretty/g' /etc/sudoers && cat /etc/sudoers|grep Defaults

Add the sudo rule:

    [root@linuxea-Node61 /data/rds]# echo 'zabbix ALL=(root)NOPASSWD:/usr/bin/docker rm -f redis,/usr/local/bin/docker-compose -f /data/rds/docker-compose.yaml up -d' >> /etc/sudoers

Change the zabbix-agentd configuration file:

    [root@linuxea-Node61 /data/rds]# sed -i 's/# EnableRemoteCommands=0/EnableRemoteCommands=1/g' /etc/zabbix/zabbix_agentd.conf
    [root@linuxea-Node61 /data/rds]# sed -i 's/# LogRemoteCommands=0/LogRemoteCommands=1/g' /etc/zabbix/zabbix_agentd.conf

Test
When redis port 6379 is closed, the Zabbix server log records the command:

    [root@linuxea-Node114 /var/log/zabbix]# grep sudo zabbix_server.log
    5891:20170927:222228.830 slow query: 0.002530 sec, "insert into alerts (alertid,actionid,eventid,clock,message,status,error,esc_step,alerttype) values (13,7,44551,1506522148,'10.0.1.61:sudo /usr/bin/docker rm -f redis && sudo /usr/local/bin/docker-compose -f /data/rds/docker-compose.yaml up -d',1,'',1,1);

The zabbix-agent side logs it as well, and the command runs successfully:

    [root@linuxea-Node61 /data/rds]# tail -f /var/log/zabbix/zabbix_agentd.log
    13953:20170927:221641.462 **************************
    13953:20170927:221641.462 using configuration file: /etc/zabbix/zabbix_agentd.conf
    13953:20170927:221641.462 agent #0 started [main process]
    13967:20170927:221641.462 agent #1 started [collector]
    13968:20170927:221641.463 agent #2 started [listener #1]
    13969:20170927:221641.463 agent #3 started [listener #2]
    13974:20170927:221641.465 agent #5 started [active checks #1]
    13973:20170927:221641.465 agent #4 started [listener #3]
    13969:20170927:222416.166 Executing command 'sudo /usr/bin/docker rm -f redis && sudo /usr/local/bin/docker-compose -f /data/rds/docker-compose.yaml up -d'

The alert clears.
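Appending to /etc/sudoers with echo works, but a typo there can lock sudo out entirely. A more cautious variant, sketched below, generates the same rule so it can be written to a separate drop-in file and syntax-checked first. The function name sudoers_rule and the file name /etc/sudoers.d/zabbix are assumptions for illustration, and the approach presumes the system's sudoers includes the /etc/sudoers.d directory.

```shell
#!/bin/sh
# Hypothetical helper: print a sudoers rule that lets USER run the listed
# commands as root without a password. Commands are joined with commas.
sudoers_rule() (
    user=$1; shift
    IFS=,
    printf '%s ALL=(root) NOPASSWD: %s\n' "$user" "$*"
)

sudoers_rule zabbix \
    '/usr/bin/docker rm -f redis' \
    '/usr/local/bin/docker-compose -f /data/rds/docker-compose.yaml up -d'
```

The printed line could be saved to /etc/sudoers.d/zabbix and checked with visudo -cf /etc/sudoers.d/zabbix before relying on it.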
2017-09-30
5,208 views
0 comments
0 likes
2017-09-08
linuxea: Quick Yum installation of zabbix-proxy 3.0
This is the yum-repository way of installing zabbix-proxy. It is quick, simple and highly recommended. Make sure the downloaded packages match the local system. The changes after Zabbix 3.2 are considerable; I will try those another time.

Download locations
centos6: http://repo.zabbix.com/zabbix/3.0/rhel/6/x86_64/
centos7: http://repo.zabbix.com/zabbix/3.0/rhel/7/x86_64/

Download the following packages, then install them:

    http://repo.zabbix.com/zabbix/3.0/rhel/7/x86_64/zabbix-agent-3.0.10-1.el7.x86_64.rpm
    http://repo.zabbix.com/zabbix/3.0/rhel/7/x86_64/zabbix-proxy-mysql-3.0.10-1.el7.x86_64.rpm
    http://repo.zabbix.com/zabbix/3.0/rhel/7/x86_64/zabbix-sender-3.0.10-1.el7.x86_64.rpm
    http://repo.zabbix.com/zabbix/3.0/rhel/7/x86_64/zabbix-server-mysql-3.0.10-1.el7.x86_64.rpm

Database configuration
Docker is used here because there are only a few machines:

    curl -Lk https://raw.githubusercontent.com/LinuxEA-Mark/docker-mariaDB/master/alpine-maridb/create-alpine-mariadb.sh |bash

Grant privileges:

    GRANT ALL PRIVILEGES ON zabbix.* To 'zabbix'@'127.0.0.1' IDENTIFIED BY 'e367361714c9';

Installed paths

    [root@linuxea-Node5 /etc/zabbix]# rpm -ql zabbix-proxy-mysql
    /etc/logrotate.d/zabbix-proxy
    /etc/zabbix/zabbix_proxy.conf
    /usr/lib/systemd/system/zabbix-proxy.service
    /usr/lib/tmpfiles.d/zabbix-proxy.conf
    /usr/lib/zabbix/externalscripts
    /usr/sbin/zabbix_proxy_mysql
    /usr/share/doc/zabbix-proxy-mysql-3.0.10
    /usr/share/doc/zabbix-proxy-mysql-3.0.10/AUTHORS
    /usr/share/doc/zabbix-proxy-mysql-3.0.10/COPYING
    /usr/share/doc/zabbix-proxy-mysql-3.0.10/ChangeLog
    /usr/share/doc/zabbix-proxy-mysql-3.0.10/NEWS
    /usr/share/doc/zabbix-proxy-mysql-3.0.10/README
    /usr/share/doc/zabbix-proxy-mysql-3.0.10/schema.sql.gz
    /usr/share/man/man8/zabbix_proxy.8.gz

Import the SQL

    [root@linuxea-Node5 /etc/zabbix]# cd /usr/share/doc/zabbix-proxy-mysql-3.0.10/
    [root@linuxea-Node5 /usr/share/doc/zabbix-proxy-mysql-3.0.10]# gunzip schema.sql.gz
    [root@linuxea-Node5 /usr/share/doc/zabbix-proxy-mysql-3.0.10]# cp schema.sql /data/mariadb/

In the database:

    MariaDB [zabbix]> source /data/mariadb/schema.sql

Proxy configuration file

    [root@linuxea-Node60 ~]# egrep -v "^$|^#" /etc/zabbix/zabbix_proxy.conf
    Server=415.95.93.21
    Hostname=Zabbix_Porxy-10
    LogFile=/var/log/zabbix_proxy.log
    LogFileSize=0
    PidFile=/tmp/zabbix_proxy.pid
    DBHost=127.0.0.1
    DBName=zabbix_proxy
    DBUser=zabbix_proxy
    DBPassword=tsd3213123
    DBPort=3306
    Timeout=4
    LogSlowQueries=3000
    [root@linuxea-Node60 ~]#

In this configuration file, Server is the IP of the Zabbix server, and Hostname must be identical to the Proxy name defined under Proxies on the server. When creating a host, the proxy has to be selected for it.

Note: on agent machines, Server points to the proxy IP, not the server IP.

Check the log:

    [root@linuxea-Node5 /usr/share/doc/zabbix-proxy-mysql-3.0.10]# tail -f /var/log/zabbix/zabbix_proxy.log
    21119:20170812:150242.742 proxy #10 started [trapper #1]
    21128:20170812:150242.746 proxy #19 started [history syncer #1]
    21126:20170812:150242.746 proxy #17 started [http poller #1]
    21127:20170812:150242.746 proxy #18 started [discoverer #1]
    21117:20170812:150242.747 proxy #9 started [unreachable poller #1]
    21108:20170812:150242.751 cannot send heartbeat message to server at "415.95.93.21": proxy "172.16.0.5" not found
    21130:20170812:150242.752 proxy #21 started [history syncer #3]
    21132:20170812:150242.752 proxy #23 started [self-monitoring #1]
    21129:20170812:150242.752 proxy #20 started [history syncer #2]
    21131:20170812:150242.752 proxy #22 started [history syncer #4]
    [root@linuxea-Node5 /usr/share/doc/zabbix-proxy-mysql-3.0.10]#
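Because Hostname has to match the Proxy name on the server exactly, it is handy to read the effective value straight from the configuration file and compare it with what was entered in the frontend. A message such as the proxy "..." not found line in the log above typically suggests that the server knows no proxy under the name the proxy reports. The sketch below is a minimal reader for Key=Value files; the function name conf_value is hypothetical.

```shell
#!/bin/sh
# Print the value of KEY from a Zabbix-style "Key=Value" config on stdin.
# Comment lines and blank lines are skipped; the last occurrence wins.
conf_value() {
    awk -F= -v key="$1" '
        /^[ \t]*(#|$)/ { next }
        $1 == key { value = substr($0, index($0, "=") + 1) }
        END { if (value != "") print value }
    '
}

# Usage:
#   conf_value Hostname < /etc/zabbix/zabbix_proxy.conf
```

The printed name is the one to enter as Proxy name under Administration -> Proxies.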
2017-09-08
3,707 views
0 comments
0 likes
2017-08-07
linuxea:Zabbix监控Galera Cluster集群和Master slave主从
Galera Cluster monitoring

Grant database access

GRANT SELECT ON *.* TO 'zabbix'@'127.0.0.1' IDENTIFIED BY '123';

Append the UserParameter to the agent configuration

echo "UserParameter=maria.db[*],/etc/zabbix/scripts/mariadb.sh \$1" >> /etc/zabbix/zabbix_agentd.conf

zabbixmy account configuration file

A configuration file is needed here. It stores the account and password granted above and is read by the script, so the credentials are not written into the script itself.

cat > /etc/zabbix/zabbixmy.conf << EOF
[client]
host=127.0.0.1
user=zabbix
password='123'
EOF

Script file

cat /etc/zabbix/scripts/mariadb.sh

The script is fairly long. It covers only basic status, health, and cache monitoring, nothing else. I previously wrote about FPMM monitoring and found that many of its items go unused; when there are many clustered databases, keeping the items lean is essential.

#!/bin/bash
DEF="--defaults-file=/etc/zabbix/zabbixmy.conf"
MYSQL='/usr/local/mariadb/bin/mysqladmin'
ARGS=1
if [ $# -ne "$ARGS" ]; then
    echo "Please input one argument:"
fi
case $1 in
    Com_update)
        result=`${MYSQL} $DEF extended-status | awk '/Com_update\W/{print $4}'`
        echo $result
        ;;
    Slow_queries)
        result=`${MYSQL} $DEF extended-status | awk '/Slow_queries/{print $4}'`
        echo $result
        ;;
    com_select)
        result=`${MYSQL} $DEF extended-status | awk '/Com_select\W/{print $4}'`
        echo $result
        ;;
    Com_insert)
        result=`${MYSQL} $DEF extended-status | awk '/Com_insert\W/{print $4}'`
        echo $result
        ;;
    Com_delete)
        result=`${MYSQL} $DEF extended-status | awk '/Com_delete\W/{print $4}'`
        echo $result
        ;;
    # Number of queries
    Questions)
        result=`${MYSQL} $DEF status | awk '/Questions/{print $6}'`
        echo $result
        ;;
    # Connections currently established
    Threads_connected)
        result=`${MYSQL} $DEF "extended-status" | awk '/Threads_connected/{print $4}'`
        echo $result
        ;;
    # Connections currently running
    Threads_running)
        result=`${MYSQL} $DEF "extended-status" | awk '/Threads_running/{print $4}'`
        echo $result
        ;;
    # Errors caused by the server itself
    Connection_errors_internal)
        result=`${MYSQL} $DEF "extended-status" | awk '/Connection_errors_internal/{print $4}'`
        echo $result
        ;;
    # Failed attempts to connect to the server
    Aborted_connects)
        result=`${MYSQL} $DEF "extended-status" | awk '/Aborted_connects/{print $4}'`
        echo $result
        ;;
    # Errors caused by reaching the maximum number of connections
    Connection_errors_max_connections)
        result=`${MYSQL} $DEF "extended-status" | awk '/Connection_errors_max_connections/{print $4}'`
        echo $result
        ;;
    # InnoDB buffer pool: number of read requests
    Innodb_buffer_pool_read_requests)
        result=`${MYSQL} $DEF "extended-status" | awk '/Innodb_buffer_pool_read_requests/{print $4}'`
        echo $result
        ;;
    # InnoDB buffer pool: requests that had to read from disk
    Innodb_buffer_pool_reads)
        result=`${MYSQL} $DEF "extended-status" | awk '/Innodb_buffer_pool_reads/{print $4}'`
        echo $result
        ;;
    # InnoDB buffer pool: total number of pages
    Innodb_buffer_pool_pages_total)
        result=`${MYSQL} $DEF "extended-status" | awk '/Innodb_buffer_pool_pages_total/{print $4}'`
        echo $result
        ;;
    # InnoDB buffer pool: number of free pages
    Innodb_buffer_pool_pages_free)
        result=`${MYSQL} $DEF "extended-status" | awk '/Innodb_buffer_pool_pages_free/{print $4}'`
        echo $result
        ;;
    # wsrep_cluster_status: cluster status
    wsrep_cluster_status)
        result=`${MYSQL} $DEF "extended-status" | awk '/wsrep_cluster_status/{print $4}'`
        echo $result
        ;;
    # wsrep_cluster_size: number of cluster members
    wsrep_cluster_size)
        result=`${MYSQL} $DEF "extended-status" | awk '/wsrep_cluster_size/{print $4}'`
        echo $result
        ;;
    # wsrep_ready
    wsrep_ready)
        result=`${MYSQL} $DEF "extended-status" | awk '/wsrep_ready/{print $4}'`
        echo $result
        ;;
    # wsrep_local_recv_queue_avg: average receive queue length
    wsrep_local_recv_queue_avg)
        result=`${MYSQL} $DEF "extended-status" | awk '/wsrep_local_recv_queue_avg/{print $4}'`
        echo $result
        ;;
    # wsrep_local_send_queue_avg: average send queue length since the last query
    wsrep_local_send_queue_avg)
        result=`${MYSQL} $DEF "extended-status" | awk '/wsrep_local_send_queue_avg/{print $4}'`
        echo $result
        ;;
    # 1 if the server answers a ping, otherwise 0
    mping)
        result=`${MYSQL} $DEF ping | grep -c alive`
        echo $result
        ;;
    *)
        echo "Usage:$0(Com_update|Slow_queries|com_select|Com_insert|Com_delete|Questions|Threads_connected|Threads_running|Connection_errors_internal|Aborted_connects|Connection_errors_max_connections|Innodb_buffer_pool_read_requests|Innodb_buffer_pool_reads|Innodb_buffer_pool_pages_total|Innodb_buffer_pool_pages_free|wsrep_cluster_status|wsrep_cluster_size|wsrep_ready|wsrep_local_recv_queue_avg|wsrep_local_send_queue_avg|mping)"
        ;;
esac

items

There are no screenshots of the items. The main ones are listed below; each line gives the item name, key, update interval, and history retention.

Aborted_connects (failed attempts to connect to the server)   maria.db[Aborted_connects]   30s   7d
Com_delete (deletes, 30s)   maria.db[Com_delete]   30s   7d
Com_insert (inserts, 30s)   maria.db[Com_insert]   30s   7d
com_select (selects, 30s)   maria.db[com_select]   30s   7d
Com_update (updates, 30s)   maria.db[Com_update]   30s   7d
Connection_errors_internal (errors caused by the server itself)   maria.db[Connection_errors_internal]   30s   7d
Connection_errors_max_connections (errors caused by reaching the maximum number of connections)   maria.db[Connection_errors_max_connections]   30s   7d
Innodb_buffer_pool_pages_free (free pages)   maria.db[Innodb_buffer_pool_pages_free]   30s   7d
Innodb_buffer_pool_pages_total (total pages in the buffer pool)   maria.db[Innodb_buffer_pool_pages_total]   30s   7d
Innodb_buffer_pool_reads (requests that had to read from disk)   maria.db[Innodb_buffer_pool_reads]   30s   7d
Innodb_buffer_pool_read_requests (number of read requests)   maria.db[Innodb_buffer_pool_read_requests]   30s   7d
mping   maria.db[mping]   30s   7d
Questions (number of queries, 2m)   maria.db[Questions]   2m   7d
Slow_queries (slow queries, 5m)   maria.db[Slow_queries]   5m   7d
Threads_connected (connections currently established)   maria.db[Threads_connected]   30s   7d
Threads_running (connections currently running)   maria.db[Threads_running]   30s   7d
wsrep_cluster_size (cluster members)   maria.db[wsrep_cluster_size]   30s   7d
wsrep_cluster_status (cluster status)   maria.db[wsrep_cluster_status]   30s   7d
wsrep_local_recv_queue_avg (average receive queue length)   maria.db[wsrep_local_recv_queue_avg]   30s   7d
wsrep_local_send_queue_avg (average send queue length since the last query)   maria.db[wsrep_local_send_queue_avg]   30s   7d
wsrep_ready   maria.db[wsrep_ready]   30s   7d

Triggers

The alert thresholds are defined here. Each trigger name is followed by its expression.

{HOST.NAME} Node is not ready
{Mariadb_Customize_0607:maria.db[wsrep_ready].regexp(ON)}<>1

{HOST.NAME} mariadb is down!
{Mariadb_Customize_0607:maria.db[mping].last()}=0

{HOST.NAME} Mariadb Cluster change
{Mariadb_Customize_0607:maria.db[wsrep_cluster_size].last(1m)}<>3

{HOST.NAME} Innodb_buffer_pool cache hit rate below 90%
({Mariadb_Customize_0607:maria.db[Innodb_buffer_pool_read_requests].last(0)}-{Mariadb_Customize_0607:maria.db[Innodb_buffer_pool_reads].last(0)})/{Mariadb_Customize_0607:maria.db[Innodb_buffer_pool_read_requests].last(0)}*100<95

{HOST.NAME} Innodb_buffer_pool cache usage above 99%
({Mariadb_Customize_0607:maria.db[Innodb_buffer_pool_pages_total].last(0)}-{Mariadb_Customize_0607:maria.db[Innodb_buffer_pool_pages_free].last(0)})/{Mariadb_Customize_0607:maria.db[Innodb_buffer_pool_pages_total].last(0)}*100>99

{HOST.NAME} cluster_status no-Primary
{Mariadb_Customize_0607:maria.db[wsrep_cluster_status].regexp(^Primary$)}<>1

Note that the hit-rate trigger (below 90%) also fires when the database is not in use.

innodb_buffer_pool calculation

Innodb_buffer_pool_read_requests records the number of read requests, while Innodb_buffer_pool_reads records the number of requests the buffer pool could not satisfy and that therefore had to be read from disk. In other words, if Innodb_buffer_pool_reads starts to climb, database performance is in serious trouble. Cache usage and hit rate can be calculated as follows:

usage = (Innodb_buffer_pool_pages_total - Innodb_buffer_pool_pages_free) / Innodb_buffer_pool_pages_total * 100%
hit rate = (Innodb_buffer_pool_read_requests - Innodb_buffer_pool_reads) / Innodb_buffer_pool_read_requests * 100%

If the database reads heavily from disk while the buffer pool still has plenty of free space, the cache was probably cleared recently and is still warming up.

Cluster monitoring items:

1. mysql -e "show status;" |awk '/wsrep_cluster_status/{print $2}'|grep -cx Primary

wsrep_cluster_status shows the primary status of the cluster component the node belongs to. The normal value is Primary. A value of non-Primary, or anything else, means that membership changes have caused node loss or a split brain. If no node returns Primary, the quorum has to be reset; see http://galeracluster.com/documentation-webpages/quorumreset.html. If every node returns the normal value, replication is working on each node, and the next step is to check each node's state to make sure they all receive write-sets.

show global status like 'wsrep_cluster_status';
+----------------------+---------+
| Variable_name        | Value   |
+----------------------+---------+
| wsrep_cluster_status | Primary |
+----------------------+---------+

2. mysql -e "show status;" |awk '/wsrep_cluster_size/{print $2}'

wsrep_cluster_size shows the number of nodes in the cluster.

3. mysql -e "show status;" |awk '/wsrep_cluster_state_uuid/{print $2}'

wsrep_cluster_conf_id shows the total number of cluster membership changes. It should be identical on every node, as should the wsrep_cluster_state_uuid value read by the command above; otherwise a node has been disconnected from the cluster.

Node status:

1. mysql -e "show status;" |awk '/wsrep_ready/{print $2}'|grep -c ON

wsrep_ready shows whether the node can accept queries. ON is normal. If it is OFF, almost every query fails with an error message.

2. mysql -e "show status;" |awk '/wsrep_connected/{print $2}'|grep -c ON

SHOW GLOBAL STATUS LIKE 'wsrep_connected' shows whether the node has network connectivity to the other nodes. (In testing, the value stayed ON after a node's network interface was brought down, so the variable still treated the network as present.) A lost connection may be caused by a wrong wsrep_cluster_address or wsrep_cluster_name setting.

3. mysql -e "show status;" |awk '/wsrep_local_state_comment/{print $2}'|grep -c Initialized

wsrep_local_state_comment shows the node state in human-readable form. Normal values are Joining, Waiting on SST, Joined, Synced, or Donor. Initialized means the node is no longer operating normally.

Health status:

1. mysql -e "show status;" |awk '/wsrep_local_recv_queue_avg/{print $2}'

The average receive queue length. A value greater than 0 means write-sets are applied more slowly than they are received, so they are waiting. Too large a backlog can trigger flow control.

2. mysql -e "show status;" |awk '/wsrep_local_send_queue_avg/{print $2}'

The average send queue length since the last query. A network bottleneck or flow control are both possible causes.

Master-slave monitoring

Grant:

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 217076
Server version: 10.0.29-MariaDB-wsrep MariaDB Server, wsrep_25.16.rc3fc46e
Copyright (c) 2000, 2016, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> GRANT replication client on *.* TO 'zabbix'@'127.0.0.1' IDENTIFIED BY 'password';
Query OK, 0 rows affected (0.02 sec)
MariaDB [(none)]> exit
Bye

If you only want to monitor the replication threads, this file is all you need:

### User file
cat > /etc/zabbix/zabbixmy.conf << EOF
[client]
host=127.0.0.1
user=zabbix
password='password'
EOF

Calling script

cat /etc/zabbix/scripts/IO_SQL.sh

#!/bin/bash
DEF="--defaults-file=/etc/zabbix/zabbixmy.conf"
MYSQL='/usr/local/mariadb/bin/mysql'
ARGS=1
if [ $# -ne "$ARGS" ]; then
    echo "Please input one argument:"
fi
case $1 in
    # The trailing colon keeps longer field names from matching as well
    Slave_IO_Running)
        result=`${MYSQL} $DEF -e "show slave status\G" | awk '/Slave_IO_Running:/{print $2}'`
        echo $result
        ;;
    Slave_SQL_Running)
        result=`${MYSQL} $DEF -e "show slave status\G" | awk '/Slave_SQL_Running:/{print $2}'`
        echo $result
        ;;
    *)
        echo "Usage:$0(Slave_SQL_Running|Slave_IO_Running)"
        ;;
esac

Append to zabbix_agentd.conf

echo "UserParameter=maria.IO_SQL[*],/etc/zabbix/scripts/IO_SQL.sh \$1" >> /etc/zabbix/zabbix_agentd.conf

The items are:

maria.IO_SQL[Slave_IO_Running]
maria.IO_SQL[Slave_SQL_Running]

The triggers are as follows. Both threads report Yes when they are running, so the expressions match on Yes.

{HOST.NAME} Mariadb Slave SQL Not Running
{Mariadb_M-S_Thread:maria.IO_SQL[Slave_SQL_Running].regexp(Yes)}<>1

{HOST.NAME} Mariadb Slave IO Not Running
{Mariadb_M-S_Thread:maria.IO_SQL[Slave_IO_Running].regexp(Yes)}<>1
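As a possible alternative to the two text items, both replication threads could be folded into a single numeric value. The following is only a sketch under stated assumptions: `slave_threads_ok` is a hypothetical helper that is not part of the original scripts, and it assumes the usual `show slave status\G` layout in which each field appears as `Name: value` on its own line.

```shell
#!/bin/bash
# Hypothetical helper (not from the original post): reads the text of
# "show slave status\G" on stdin and prints 1 when both replication threads
# report "Yes", otherwise 0.
slave_threads_ok() {
    awk '
        # Compare the whole field name, including the colon, so that longer
        # names such as Slave_SQL_Running_State are ignored.
        $1 == "Slave_IO_Running:"  { io  = $2 }
        $1 == "Slave_SQL_Running:" { sql = $2 }
        END { print ((io == "Yes" && sql == "Yes") ? 1 : 0) }
    '
}

# Example wiring, assuming the same client binary and defaults file as IO_SQL.sh:
# /usr/local/mariadb/bin/mysql --defaults-file=/etc/zabbix/zabbixmy.conf \
#     -e "show slave status\G" | slave_threads_ok
```

With a numeric item like this, a trigger could compare the last value with 0 instead of matching text, at the cost of no longer showing which of the two threads stopped.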
August 7, 2017
4,586 views
0 comments
0 likes
2017-08-01
linuxea: Optimizing the Basic Zabbix Monitoring Templates
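This post tunes Zabbix's basic monitoring of system resources such as CPU, memory, network, disk I/O, and TCP connections. It also covers checks that are easy to overlook, such as the firewall and certain files.

CPU load tuning

1. Adjusting CPU monitoring on Linux

We usually check CPU load with top. In Zabbix, system.cpu.load[percpu,avg1] is the load per core (The processor load is calculated as system CPU load divided by number of CPU cores.). In the basic template the alert threshold (Triggers) defaults to 5, and it needs to be changed.

There are many rules of thumb online. A common one assumes that, for a single core, a load below 0.7 is normal, while a load above 0.7 is high and performance starts to degrade. The default threshold is 5.

Reference: https://www.zabbix.com/documentation/3.2/pt/manual/config/items/itemtypes/zabbix_agent?s%5B%5D=zabbix&s%5B%5D=boottime#chaves_suportadas

Change the triggers as follows:

Average CPU load has exceeded 3 for 5 minutes
{Template OS Linux:system.cpu.load[percpu,avg15].avg(5m)}>3

Average CPU load has exceeded 1 for 5 minutes
{Template OS Linux:system.cpu.load[percpu,avg15].avg(5m)}>1

It is also worth having a graph that shows the CPU load average, so add a CPU item that matches what top shows. With it we can evaluate the 0.7 threshold described above, using the 15-minute window.

This is the original per-core item:

Processor load (15 min average per core)   system.cpu.load[percpu,avg15]

Clone it and change it to:

CPU Load Average (15 min average per core)   system.cpu.load[all,avg15]

We also need the number of CPU cores for the calculation: CPU cores * 0.7 < CPU Load Average.

That is still not intuitive enough, so add the core count as well. system.cpu.num is a built-in key; add it directly and include it in the graph:

CPU number   system.cpu.num

Triggers

{HOST.NAME} CPU load has been high for 15 minutes
({Template OS Linux:system.cpu.num.last()}*0.7)<{Template OS Linux:system.cpu.load[all,avg15].last()}

{HOST.NAME} load has exceeded the number of cores for 15 minutes
({Template OS Linux:system.cpu.num.last()}*1)<{Template OS Linux:system.cpu.load[all,avg15].last()}

2. Adjusting basic monitoring on Windows

For CPU usage percentages on Windows, add the following keys:

perf_counter["\Processor(_Total)\% User Time"]   percentage of CPU time in use
perf_counter["\Processor(_Total)\% Processor Time"]   percentage of CPU load
system.cpu.util[,,avg1]   average CPU utilization over 1 minute, in percent

Add I/O reads and writes to the graph. These items already exist; you only need to find them:

File write bytes per second   perf_counter[\2\18]   disk writes (bytes)
File read bytes per second   perf_counter[\2\16]   disk reads (bytes)
Average disk write queue length   perf_counter[\234(_Total)\1404]   disk write queue length
Average disk read queue length   perf_counter[\234(_Total)\1402]   disk read queue length

Trigger: use the CPU load percentage item, perf_counter["\Processor(_Total)\% Processor Time"], and alert when it reaches 85%:

{HOST.NAME} CPU usage has stayed above 85% for 5 minutes
{Template OS Windows:perf_counter["\Processor(_Total)\% Processor Time"].avg(5m)}>85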
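The Windows trigger above compares the 5-minute average with 85 rather than a single sample, so a short spike does not raise an alert on its own. A minimal sketch of that comparison follows; `avg_over` is a hypothetical helper written for illustration only, and it is not how Zabbix evaluates the expression internally.

```shell
#!/bin/bash
# avg_over THRESHOLD VALUE...
# Hypothetical helper: prints 1 when the mean of the given samples is above
# THRESHOLD, otherwise 0. This mirrors the idea behind avg(5m)>85.
avg_over() {
    local threshold=$1
    shift
    printf '%s\n' "$@" | awk -v t="$threshold" '
        { sum += $1; n++ }
        END { print ((n > 0 && sum / n > t) ? 1 : 0) }
    '
}

# A single spike to 100 among lower samples stays below the threshold:
# avg_over 85 100 60 70
# Sustained high usage crosses it:
# avg_over 85 90 95 80
```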
Also adjust the thresholds of the Windows Processor load (15 min average) triggers:

{HOST.NAME} average CPU load over 15 minutes exceeds 7
{Template OS Windows:system.cpu.load[percpu,avg15].avg(5m)}>7

{HOST.NAME} average CPU load over 15 minutes exceeds 4
{Template OS Windows:system.cpu.load[percpu,avg15].avg(5m)}>4

{HOST.NAME} average CPU load over 15 minutes exceeds 2
{Template OS Windows:system.cpu.load[percpu,avg15].avg(5m)}>2

The Windows agent installation package is also provided for download.

3. iptables monitoring

Beyond monitoring the user and password files themselves, it is well worth monitoring iptables, authorized_keys, and configuration files. This assumes everything is installed in the default paths. sudo privileges are required: comment out Defaults requiretty and restart the agent.

echo "UserParameter=iptables_lins,/usr/bin/sudo iptables -S |md5sum|awk '{print \$1}'" >> /etc/zabbix/zabbix_agentd.conf
echo 'zabbix ALL=(root)NOPASSWD:/usr/sbin/iptables,/usr/bin/cksum /etc/sysconfig/iptables' >> /etc/sudoers
echo "UserParameter=iptables_file,/usr/bin/sudo /usr/bin/cksum /etc/sysconfig/iptables|awk '{print \$1}'" >> /etc/zabbix/zabbix_agentd.conf
sed -i 's/^Defaults[[:space:]]\+requiretty/#&/' /etc/sudoers && grep Defaults /etc/sudoers
systemctl restart zabbix-agent

items

iptables runtime rule monitoring   iptables_lins
iptables configuration file changed   iptables_file

Triggers

iptables runtime rules have changed
{Templates IPtables:iptables_lins.diff(0)}>0

iptables configuration file has changed
{Templates IPtables:iptables_file.diff(0)}>0

4. authorized_keys

authorized_keys also needs some monitoring, and it should be locked.

echo 'zabbix ALL=(root)NOPASSWD:/usr/bin/cksum /root/.ssh/authorized_keys' >> /etc/sudoers
echo "UserParameter=authorized_keys,sudo /usr/bin/cksum /root/.ssh/authorized_keys|awk '{print \$1}'" >> /etc/zabbix/zabbix_agentd.conf
systemctl restart zabbix-agent.service

Locking

chattr -R +i /root/.ssh/    # lock
chattr -R -i /root/.ssh/    # unlock

items

Checksum of authorized_keys   authorized_keys

Triggers

/root/.ssh/authorized_keys has been changed on {HOST.NAME}
{Templates_authorized_keys:authorized_keys.diff(0)}>0

5. Zabbix configuration file monitoring

Checksum of /etc/zabbix/zabbix_agentd.conf   vfs.file.cksum[/etc/zabbix/zabbix_agentd.conf]

The Zabbix configuration file does not need sudo privileges; just add the key.

Triggers

/etc/zabbix/zabbix_agentd.conf has been changed on {HOST.NAME}
{Template OS Linux:vfs.file.cksum[/etc/zabbix/zabbix_agentd.conf].diff(0)}>0

Locking

chattr -R +i /etc/zabbix/zabbix_agentd.conf    # lock
chattr -R -i /etc/zabbix/zabbix_agentd.conf    # unlock

This assumes the UserParameter entries are written to this file; otherwise, lock the directory instead.

On the subject of CPU load, the original post links to an article that explains it very well.
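The two Linux load triggers in this post compare the 15-minute load average with 0.7 times the core count and with the core count itself. The same comparison can be tried by hand on a host before changing the template. This is a minimal sketch: `load_level` is a hypothetical helper, and its output labels (ok, high, over) are illustrative names, not anything Zabbix defines.

```shell
#!/bin/bash
# load_level CORES LOAD15
# Hypothetical helper: classifies a 15-minute load average against the two
# thresholds used by the triggers (0.7 x cores and 1 x cores).
load_level() {
    awk -v cores="$1" -v load="$2" 'BEGIN {
        if (load > cores)            print "over"   # load exceeds the core count
        else if (load > cores * 0.7) print "high"   # load exceeds 0.7 per core
        else                         print "ok"
    }'
}

# On a Linux host the inputs could come from nproc and /proc/loadavg
# (the third field is the 15-minute average):
# load_level "$(nproc)" "$(awk '{print $3}' /proc/loadavg)"
```

For example, with 4 cores the "high" level starts above a load of 2.8 and the "over" level above 4.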
August 1, 2017
7,620 views
3 comments
0 likes