linuxea:MHA0.56配置实践


MHA专门用作多个MYSQL服务器(主从环境)提供master故障后转移,并保证数据一致性,由于MHA本身使用的资源很少,甚至于管理100对以上的。
MHA分为manager和node,manager可以运行在node节点之上,manager负责监控和故障转移,其中包含一些脚本配置,如:发送邮件等。

  • 可以配置为如下:
    一主多从:一个主节点,多个slave
    单主多从:一个主节点,多个从节点,从节点不接替主节点
    单主多从,一个候选节点:一个主节点,多个slave,其中一个slave为候选主节点
    以及多主,三层复制等

一般情况下,我们使用一个主,多个从节点,并且从节点中有一个节点是候选主节点,当主节点down掉后,候选节点便成为主节点对外提供服务。这个场景中会分配一个VIP
(虚拟IP),主崩溃后候选节点接管,在MHA中可以通过配置文件参数调用外部脚本设置,也可以手动更新,更或者使用Keepalived来执行。

当然,如果读并不大,使用一主一从更为稳妥,因为数据一致性得到了保障,但是同时牺牲了读性能和扩展空间,并且可用性大大降低

在MHA场景中使用半同步复制大大降低了这种数据丢失的风险。通常MHA会试图从故障的主设备保存二进制文件,但是有些时候不总是成功,如果出现SSH链接不通的问题,则二进制文件会出现问题,将有丢失最新数据的分享

MHA是可以使用半同步复制,它基于MySQL复制机制。值得注意的是,如果只有一个从站接收到最新的二进制日志事件,MHA可以将这些事件应用到所有其他从站,以便它们可以保持一致。

  • 优点:
    主节点故障切换和从节点升级可以快速完成
    主站崩溃不会导致数据不一致()
    不用修改mysql配置
    性能有一定的提升
    适用场景多
    自定义扩展(脚本)
    基于GTID的故障转移

  • 要求:
    1,SSH公钥认证
    Manager通过SSH内部连接到MySQL服务器。最新从站上的MHA节点也通过SSH(scp)在内部将中继日志文件发送到其他从站(如果大于10个主机, 修改/etc/ssh/ sshd_config,默认为10)。为了使这些程序自动化,SSH公钥认证必不可少。而后使用MHA Manager中包含的masterha_check_ssh命令来检查SSH连接是否正常工作。
    2,MHA支持5.0或者以上的版本
    3,主节点必须启用log-bin
    4,备用写库需要授权
    5,禁用中继日志自动删除
    6,仅支持mysql
    为了使MHA真正起作用,大多数设置在启动masterha_manager或masterha_check_repl时自动检查。

结构如下:

前端使用LVS(DR)+keepalived代理端口到中间件Cetus,Cetus是网易的开源项目(后面会说cetus和maxscale),LVS会将请求发送到两台Cetus,由Cetus发送到后端数据库,后端使用MHA,本篇关注红色区域内容即可

篇幅太长,如果有误请指正。所有的内容来自https://github.com/yoshinorim/mha4mysql-manager/wiki

MHA

+-----------------+----------------+------------------+------------+
+   IP    +      类型       +       角色        +     OS     +
+-----------------+----------------+------------------+------------+
+    10.0.1.61    +      MHA       + MHA manager(0.56)+  Centos7.2 +
+-----------------+----------------+------------------+------------+
+                        VIP 10.0.1.161                            +
+-----------------+----------------+------------------+------------+
+  10.10.240.117  +  Mysql-Master  +     MHA NODE     +  Centos7.2 +
+-----------------+----------------+------------------+------------+
+   10.10.0.98    +  Mysql-Slave   +     MHA NODE     +  Centos7.2 +
+-----------------+----------------+------------------+------------+
+   10.0.1.49     +  Mysql-Slave   +     MHA NODE     +  Centos7.2 +
+-----------------+----------------+------------------+------------+
  • 一主多从环境准备

1.1 免密码登陆

+----------------------------+
+ 10.10.240.117 master       +
+----------------------------+
+ ssh-keygen -t rsa          +
+ ssh-copy-id 10.10.0.98     +
+ ssh-copy-id 10.0.1.49      +
+ ssh-copy-id 10.0.1.61      +
+----------------------------+

+----------------------------+
+     10.10.0.98  slave      +
+----------------------------+
+ ssh-keygen -t rsa          +
+ ssh-copy-id 10.10.240.117  +
+ ssh-copy-id 10.0.1.49      +
+ ssh-copy-id 10.0.1.61      +
+----------------------------+

+----------------------------+
+   10.0.1.49   slave        +
+----------------------------+
+ ssh-keygen -t rsa          +
+ ssh-copy-id 10.10.240.117  +
+ ssh-copy-id 10.10.0.98     +
+ ssh-copy-id 10.0.1.61      +
+----------------------------+

+----------------------------+
+     10.0.1.61  MHA         +
+----------------------------+
+ ssh-keygen -t rsa          +
+ ssh-copy-id 10.10.240.117  +
+ ssh-copy-id 10.10.0.98     +
+ ssh-copy-id 10.0.1.49      +
+----------------------------+

1.2 三台机器安装好Mysql

[root@Linuxea-VM-Node117 ~]# yum -y install make automake libtool pkgconfig libaio-devel mysql-devel
[root@Linuxea-VM-Node117 ~]# axel -n 50 http://ftp.ntu.edu.tw/MySQL/DownloaLinuxea/MySQL-5.6/mysql-5.6.40-linux-glibc2.12-x86_64.tar.gz
[root@Linuxea-VM-Node117 ~]# tar xf mysql-5.6.40-linux-glibc2.12-x86_64.tar.gz -C /usr/local
[root@Linuxea-VM-Node117 ~]# mkdir /data/mysql -p
[root@Linuxea-VM-Node117 ~]# groupadd -r -g 306 mysql
[root@Linuxea-VM-Node117 ~]# useradd -g 306  -r  -u 307 mysql 
[root@Linuxea-VM-Node117 ~]# chown -R mysql.mysql /data/mysql/
[root@Linuxea-VM-Node117 ~]# cd /usr/local && ln -s mysql-5.6.40-linux-glibc2.12-x86_64 mysql && cd mysql
[root@Linuxea-VM-Node117 /usr/local/mysql]# chown -R mysql.mysql /usr/local/mysql
[root@Linuxea-VM-Node117 /usr/local/mysql]# ./scripts/mysql_install_db --user=mysql --datadir=/data/mysql --basedir=/usr/local/mysql
[root@Linuxea-VM-Node117 /usr/local/mysql]# echo 'export PATH=/usr/local/mysql/bin:$PATH' > /etc/profile.d/mysql.sh && source /etc/profile.d/mysql.sh
[root@Linuxea-VM-Node117 /usr/local/mysql]# cp support-files/mysql.server /etc/init.d/mysqld 

其中配置文件之一,10.10.240.117 master

[root@Linuxea-VM-Node117 /etc]# cat /etc/my.cnf
[client]
port = 3306
socket = /tmp/mysql.sock
#default-character-set = utf8

[mysqld]
relay_log = relay-bin
basedir = /usr/local/mysql
datadir = /data/mysql
pid-file = /data/mysql/mysql.pid
user = mysql
bind-address = 0.0.0.0
server-id = 2

init-connect = 'SET NAMES utf8'
character-set-server = utf8
default-time_zone = '-4:00'

skip-name-resolve
back_log = 300

max_connections = 999
max_connect_errors = 1024000
open_files_limit = 65535
table_open_cache = 256
max_allowed_packet = 50M
max_heap_table_size = 512M
tmp_table_size = 32M

read_buffer_size = 2M
read_rnd_buffer_size = 16M
sort_buffer_size = 8M
join_buffer_size = 8M
key_buffer_size = 16M

thread_cache_size = 16

query_cache_type = 2
query_cache_size = 16M
query_cache_limit = 5M
thread_stack = 192k

ft_min_word_len = 4

log_bin = mysql-bin
binlog_format = ROW
expire_logs_days = 7

log_error = /data/mysql/mysql-error.log
slow_query_log = 1
long_query_time = 1
#log_slow_verbosity=query_plan
slow_query_log_file = /data/mysql/mysql-slow.log

performance_schema = 0
skip-external-locking #跳过外部锁定,避免external locking

###myisam###
bulk_insert_buffer_size = 64M
myisam_sort_buffer_size = 16M
myisam_max_sort_file_size = 1G
myisam_repair_threaLinuxea = 1

###InnoDB###
default_storage_engine = InnoDB
innodb_file_per_table = 1
innodb_open_files = 500
innodb_buffer_pool_size = 2G
innodb_write_io_threaLinuxea = 8
innodb_read_io_threaLinuxea = 8
innodb_thread_concurrency = 16
innodb_purge_threaLinuxea = 1
innodb_flush_log_at_trx_commit = 1
innodb_log_buffer_size = 8M
innodb_log_file_size = 256M
innodb_log_files_in_group = 3
innodb_max_dirty_pages_pct = 90
innodb_lock_wait_timeout = 10
innodb_autoinc_lock_mode = 2
innodb_doublewrite = 1
innodb_rollback_on_timeout = 1
innodb_additional_mem_pool_size = 16M
innodb_force_recovery=0
innodb_locks_unsafe_for_binlog = 1

interactive_timeout = 28800
wait_timeout = 120


[mysqldump]
quick
max_allowed_packet = 16M

[myisamchk]
key_buffer_size = 16M
sort_buffer_size = 8M
read_buffer = 4M
write_buffer = 4M

[mysqld_safe]
malloc_lib=/usr/lib64/libjemalloc.so.1

1.3 数据库初始化

DELETE FROM mysql.user WHERE User='';
DELETE FROM mysql.db WHERE Db LIKE 'test%';
DROP DATABASE test;
UPDATE mysql.user SET password = password('linuxea') WHERE user = 'root';
flush privileges;

1.4 在master上创建用户进行同步

10.10.240.117 master
修改添加

server-id = 1 # 主从中,这里的ID号不能一样
log_bin = master-bin
binlog_format = ROW
read-only = on  

1.5 创建复制的用户

mysql>  grant replication slave on *.* to slave@'%' identified by "password";
Query OK, 0 rows affected (0.00 sec)

主节点的logs位置

mysql> show master logs;
+------------------+-----------+
| Log_name         | File_size |
+------------------+-----------+
| mysql-bin.000001 |      2510 |
+------------------+-----------+
1 row in set (0.00 sec)

查看中继日志打开状态

mysql> SHOW GLOBAL VARIABLES LIKE '%only';
+-------------------------------+-------+
| Variable_name                 | Value |
+-------------------------------+-------+
| innodb_optimize_fulltext_only | OFF   |
| innodb_read_only              | OFF   |
| read_only                     | ON    |
| tx_read_only                  | OFF   |
+-------------------------------+-------+
4 rows in set (0.00 sec)
  • 开始主从同步

2 在slave上进行同步

10.10.0.98 slave

可以使用SHOW GLOBAL VARIABLES LIKE '%only';查看状态,如果没有开启则需要开启

mysql> SHOW GLOBAL VARIABLES LIKE '%only';
+-------------------------------+-------+
| Variable_name                 | Value |
+-------------------------------+-------+
| innodb_optimize_fulltext_only | OFF   |
| innodb_read_only              | OFF   |
| read_only                     | ON    |
| tx_read_only                  | OFF   |
+-------------------------------+-------+
4 rows in set (0.00 sec)

打开中继日志和修改id

server-id = 2  # 主从中,这里的ID号不能一样
relay-log = relay-bin 

2.1 开始复制

开始复制的配置,指明开始的log和点

mysql> CHANGE MASTER TO MASTER_HOST='10.10.240.117',MASTER_USER='slave',MASTER_PASSWORD='password',MASTER_LOG_FILE='mysql-bin.000001',MASTER_LOG_POS=2788,MASTER_CONNECT_RETRY=5,MASTER_HEARTBEAT_PERIOD=2;
Query OK, 0 rows affected, 2 warnings (0.00 sec)

打开slave io进程和sql进程

mysql> start slave;
Query OK, 0 rows affected (0.00 sec)

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.10.240.117
                  Master_User: slave
                  Master_Port: 3306
                Connect_Retry: 5
              Master_Log_File: mysql-bin.000001
          Read_Master_Log_Pos: 2788
               Relay_Log_File: relay-bin.000002
                Relay_Log_Pos: 283
        Relay_Master_Log_File: mysql-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

mysql> 

重点关注这两个状态先

             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

2.2 在第二台从库授权(10.0.1.49)

10.0.1.49 slave 同样执行和10.10.0.98一样的操作

  • 半复制状态

3.半复制

在上面已经配置好主从了,现在开启半复制

mysql> show global variables like 'have_dynamic_loading';
+----------------------+-------+
| Variable_name        | Value |
+----------------------+-------+
| have_dynamic_loading | YES   |
+----------------------+-------+
1 row in set (0.00 sec)

在所有的主从节点安装插件(master/slave)

mysql> INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
Query OK, 0 rows affected (0.10 sec)

mysql>  INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
Query OK, 0 rows affected (0.01 sec)

状态信息

mysql>  SHOW GLOBAL VARIABLES LIKE '%semi%';
+------------------------------------+-------+
| Variable_name                      | Value |
+------------------------------------+-------+
| rpl_semi_sync_master_enabled       | OFF   |
| rpl_semi_sync_master_timeout       | 10000 |  # 等待响应时间
| rpl_semi_sync_master_trace_level   | 32    |  # 跟踪级别
| rpl_semi_sync_master_wait_no_slave | ON    |
| rpl_semi_sync_slave_enabled        | OFF   |
| rpl_semi_sync_slave_trace_level    | 32    |
+------------------------------------+-------+
6 rows in set (0.00 sec)

3.1 主节点开启同步(master)配置

SET GLOBAL rpl_semi_sync_master_enabled=1; 表示在 master 上已经开启半同步复制模式。

mysql>  SET GLOBAL rpl_semi_sync_master_enabled=1;
Query OK, 0 rows affected (0.00 sec)

mysql>  SET GLOBAL rpl_semi_sync_slave_enabled = 1;
Query OK, 0 rows affected (0.00 sec)

mysql> SET GLOBAL rpl_semi_sync_master_timeout =4000;
Query OK, 0 rows affected (0.00 sec)

mysql> SHOW GLOBAL VARIABLES LIKE '%semi%';
+------------------------------------+-------+
| Variable_name                      | Value |
+------------------------------------+-------+
| rpl_semi_sync_master_enabled       | ON    |
| rpl_semi_sync_master_timeout       | 4000  |
| rpl_semi_sync_master_trace_level   | 32    |
| rpl_semi_sync_master_wait_no_slave | ON    |
| rpl_semi_sync_slave_enabled        | ON    |
| rpl_semi_sync_slave_trace_level    | 32    |
+------------------------------------+-------+
6 rows in set (0.00 sec)

mysql> 

3.2 从库半同步(slave)配置

mysql> show variables like "rpl_semi_sync_slave_enabled";
+-----------------------------+-------+
| Variable_name               | Value |
+-----------------------------+-------+
| rpl_semi_sync_slave_enabled | OFF   |
+-----------------------------+-------+
1 row in set (0.00 sec)

3.3 需要一次停止和开启IO_THREAD

mysql> STOP SLAVE IO_THREAD;
Query OK, 0 rows affected (0.25 sec)

mysql> SET GLOBAL rpl_semi_sync_slave_enabled = 1;
Query OK, 0 rows affected (0.00 sec)

mysql> START SLAVE IO_THREAD;
Query OK, 0 rows affected (0.00 sec)

mysql> show variables like "rpl_semi_sync_slave_enabled";
+-----------------------------+-------+
| Variable_name               | Value |
+-----------------------------+-------+
| rpl_semi_sync_slave_enabled | ON    |
+-----------------------------+-------+
1 row in set (0.00 sec)

3.4 在查看主节点状态

mysql> show global status like '%semi%';
+--------------------------------------------+---------+
| Variable_name                              | Value   |
+--------------------------------------------+---------+
| Rpl_semi_sync_master_clients               | 2       |
| Rpl_semi_sync_master_net_avg_wait_time     | 472     |
| Rpl_semi_sync_master_net_wait_time         | 1972924 |
| Rpl_semi_sync_master_net_waits             | 4174    |
| Rpl_semi_sync_master_no_times              | 0       |
| Rpl_semi_sync_master_no_tx                 | 0       |
| Rpl_semi_sync_master_status                | ON      |
| Rpl_semi_sync_master_timefunc_failures     | 0       |
| Rpl_semi_sync_master_tx_avg_wait_time      | 573     |
| Rpl_semi_sync_master_tx_wait_time          | 1578871 |
| Rpl_semi_sync_master_tx_waits              | 2752    |
| Rpl_semi_sync_master_wait_pos_backtraverse | 0       |
| Rpl_semi_sync_master_wait_sessions         | 0       |
| Rpl_semi_sync_master_yes_tx                | 2752    |
| Rpl_semi_sync_slave_status                 | OFF     |   # 主节点上这个是关闭的
+--------------------------------------------+---------+
15 rows in set (0.00 sec)

mysql> 

3.5 在查看从节点状态

mysql>  show global status like 'Rpl_semi_sync_%';
+--------------------------------------------+-------+
| Variable_name                              | Value |
+--------------------------------------------+-------+
| Rpl_semi_sync_master_clients               | 0     |
| Rpl_semi_sync_master_net_avg_wait_time     | 0     |
| Rpl_semi_sync_master_net_wait_time         | 0     |
| Rpl_semi_sync_master_net_waits             | 0     |
| Rpl_semi_sync_master_no_times              | 0     |
| Rpl_semi_sync_master_no_tx                 | 0     |
| Rpl_semi_sync_master_status                | OFF   |  # 从节点上这个也是关闭的
| Rpl_semi_sync_master_timefunc_failures     | 0     |
| Rpl_semi_sync_master_tx_avg_wait_time      | 0     |
| Rpl_semi_sync_master_tx_wait_time          | 0     |
| Rpl_semi_sync_master_tx_waits              | 0     |
| Rpl_semi_sync_master_wait_pos_backtraverse | 0     |
| Rpl_semi_sync_master_wait_sessions         | 0     |
| Rpl_semi_sync_master_yes_tx                | 0     |
| Rpl_semi_sync_slave_status                 | ON    |
+--------------------------------------------+-------+
15 rows in set (0.00 sec)

mysql> 
  • 注意
Rpl_semi_sync_master_yes_tx     Semi-sync模式下,成功的事务数
Rpl_semi_sync_master_no_tx      库未及时响应的事务数,如果这个值很大就有问题
  • 写入配置文件
    写可以写入到配置文件中永久生效
    Master上:
[mysqld]
rpl_semi_sync_master_enabled=1
rpl_semi_sync_master_timeout=10000  # 10秒(默认)

所有Slave上:

[mysqld]
rpl_semi_sync_slave_enabled=1
  • 注意:两台slave服务器设置read_only(由mha设置原master备机为read_only=0 ,从库对外提供读服务,只所以没有写进配置文件,是因为随时slave会提升为master)
mysql> select @@read_only;
+-------------+
| @@read_only |
+-------------+
|           0 |
+-------------+
1 row in set (0.00 sec)

mysql> set global read_only=1;
Query OK, 0 rows affected (0.00 sec)

mysql> select @@read_only;
+-------------+
| @@read_only |
+-------------+
|           1 |
+-------------+
1 row in set (0.00 sec)
  • 4,MHA安装
    在10.0.1.61 mha-manager安装

4.1 mha4mysql-manager安装

[root@Linuxea-VM-Node61 ~]# yum install perl-DBD-MySQL -y
[root@Linuxea-VM-Node61 ~]# rpm -Uvh http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
[root@Linuxea-VM-Node61 ~]# yum -y install perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes perl-devel  \
perl-Module-Install.noarch perl cpan perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager \ 
perl-Time-HiRes net-tools
[root@Linuxea-VM-Node61 ~]# wget https://downloads.mariadb.com/MHA/mha4mysql-manager-0.56.tar.gz
[root@Linuxea-VM-Node61 ~]# tar xf mha4mysql-manager-0.56.tar.gz
[root@Linuxea-VM-Node61 ~]# cd mha4mysql-manager-0.56/

安装便是一路回车和yes即可

[root@Linuxea-VM-Node61 ~/mha4mysql-manager-0.56]#  perl Makefile.PL 
*** Module::AutoInstall version 1.03
*** Checking for Perl dependencies...
[Core Features]
- DBI                   ...loaded. (1.627)
- DBD::mysql            ...loaded. (4.023)
- Time::HiRes           ...loaded. (1.9725)
- Config::Tiny          ...loaded. (2.14)
- Log::Dispatch         ...loaded. (2.41)
- Parallel::ForkManager ...loaded. (1.18)
- MHA::NodeConst        ...missing.
==> Auto-install the 1 mandatory module(s) from CPAN? [y] y
*** Dependencies will be installed the next time you type 'make'.
*** Module::AutoInstall configuration finished.
Checking if your kit is complete...

Would you like to configure as much as possible automatically? [yes] yes
Would you like me to automatically choose some CPAN mirror
sites for you? (This means connecting to the Internet) [yes] yes
Looks good
Warning: prerequisite MHA::NodeConst 0 not found.
Writing Makefile for mha4mysql::manager
Writing MYMETA.yml and MYMETA.json
[root@Linuxea-VM-Node61 ~/mha4mysql-manager-0.56]# make
[root@Linuxea-VM-Node61 ~/mha4mysql-manager-0.56]# make install

4.2 mha4mysql-node安装(所有MHA NODE节点都需要安装,包括MHA管理节点)

rpm -Uvh http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
yum -y install perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes perl-devel \
perl-Module-Install.noarch perl cpan perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager \
perl-Time-HiRes net-tools               
wget https://downloaLinuxea.mariadb.com/MHA/mha4mysql-node-0.56.tar.gz
tar xf mha4mysql-node-0.56.tar.gz 
cd mha4mysql-node-0.56/
perl Makefile.PL
make 
make install 

4.3 所有机器授权mha账户

创建mha用户

grant all privileges on *.* to mha@'%' identified by 'password_123';

4.4 管理配置文件配置(MHA 主机10.0.1.61 )

创建工作目录

mkdir /data/masterha/logs -p 

配置文件如下
vim /etc/masterha_default.cnf

[server default]
# 授权的账户和密码
user=mha
password=password_123
# 工作目录和日志文件为孩子
manager_workdir=/data/masterha
manager_log=/data/masterha/manager.log
# bing-log的保存位置
master_binlog_dir=/data/mysql
#设置远端mysql在发生切换时binlog的保存位置
remote_workdir=/data/masterha
# 免密钥登录的账户名
ssh_user=root
# 授权的数据库复制的账户和密码
repl_user=slave
repl_password=password
#设置监控主库,发送ping包的时间间隔,默认是3秒,尝试三次没有回应的时候自动进行railover
ping_interval=1
#---------------------------------------------------------------------------------------------------
# 设置故障发生后关闭故障主机脚本(该脚本的主要作用是关闭主机放在发生脑裂,这里没有使用)
# shutdown_script=""
# 设置手动切换时候的切换脚本(可省略)
# master_ip_online_change_script=/data/masterha/master_ip_failover
# 设置发生切换后发送的报警的脚本(可省略)
# report_script=/data/masterha/send_report
# 设置自动failover时候的切换脚本(可省略)
# master_ip_failover_script=/data/masterha/master_ip_failover
#----------------------------------------------------------------------------------------------------
# secondary_check_script= /usr/local/bin/masterha_secondary_check -s server1 -s server2 --master_host=server1 --master_ip=10.10.240.117 --master6

[server1]
hostname=10.10.240.117
#设置master 保存binlog
master_binlog_dir=/data/mysql
#设置为候选master,如果设置该参数以后,发生主从切换以后将会将此从库提升为主库,即使这个主库不是集群中时间最新的slave
candidate_master=1
#relay_log_purge=0
port=3306
#-------------------------check_repl_delay=0--------------------------------
check_repl_delay=0
#默认情况下如果一个slave落后master 100M的relay logs的话,MHA将不会选择该
#slave作为一个新的master,因为对于这个slave的恢复需要花费很长时间通过设置
# check_repl_delay=0,MHA触发切换在选择一个新的master的时候将会忽略复制延时
# 这个参数对于设置了candidate_master=1的主机非常有用
# 因为这个候选主在切换的过程中一定是新的master
#---------------------------------------------------------------------------

[server2]
hostname=10.10.0.98
port=3306
master_binlog_dir=/data/mysql
candidate_master=1
check_repl_delay=0

[server3]
hostname=10.0.1.49
port=3306
master_binlog_dir=/data/mysql
no_master=1

4.5验证互信

[root@Linuxea-VM-Node146 ~/mha4mysql-node-0.56]# masterha_check_ssh --conf=/etc/masterha_default.cnf
Sat May 26 13:29:31 2018 - [info] Reading default configuratoins from /etc/masterha_default.cnf..
Sat May 26 13:29:31 2018 - [info] Reading application default configurations from /etc/masterha_default.cnf..
Sat May 26 13:29:31 2018 - [info] Reading server configurations from /etc/masterha_default.cnf..
Sat May 26 13:29:31 2018 - [info] Starting SSH connection tests..
Sat May 26 13:29:31 2018 - [debug] 
Sat May 26 13:29:31 2018 - [debug]  Connecting via SSH from root@10.10.240.117(10.10.240.117:22) to root@10.10.0.98(10.10.0.98:22)..
Sat May 26 13:29:31 2018 - [debug]   ok.
Sat May 26 13:29:31 2018 - [debug]  Connecting via SSH from root@10.10.240.117(10.10.240.117:22) to root@10.0.1.49(10.0.1.49:22)..
Sat May 26 13:29:31 2018 - [debug]   ok.
Sat May 26 13:29:32 2018 - [debug] 
Sat May 26 13:29:31 2018 - [debug]  Connecting via SSH from root@10.10.0.98(10.10.0.98:22) to root@10.10.240.117(10.10.240.117:22)..
Sat May 26 13:29:32 2018 - [debug]   ok.
Sat May 26 13:29:32 2018 - [debug]  Connecting via SSH from root@10.10.0.98(10.10.0.98:22) to root@10.0.1.49(10.0.1.49:22)..
Sat May 26 13:29:32 2018 - [debug]   ok.
Sat May 26 13:29:33 2018 - [debug] 
Sat May 26 13:29:32 2018 - [debug]  Connecting via SSH from root@10.0.1.49(10.0.1.49:22) to root@10.10.240.117(10.10.240.117:22)..
Sat May 26 13:29:32 2018 - [debug]   ok.
Sat May 26 13:29:32 2018 - [debug]  Connecting via SSH from root@10.0.1.49(10.0.1.49:22) to root@10.10.0.98(10.10.0.98:22)..
Sat May 26 13:29:32 2018 - [debug]   ok.
Sat May 26 13:29:33 2018 - [info] All SSH connection tests passed successfully.

4.6 检查配置

[root@Linuxea-VM-Node61 ~]# masterha_check_repl --conf=/etc/masterha_default.cnf
Sat May 26 14:31:39 2018 - [info] Reading default configuratoins from /etc/masterha_default.cnf..
Sat May 26 14:31:39 2018 - [info] Reading application default configurations from /etc/masterha_default.cnf..
Sat May 26 14:31:39 2018 - [info] Reading server configurations from /etc/masterha_default.cnf..
Sat May 26 14:31:39 2018 - [info] MHA::MasterMonitor version 0.56.
Sat May 26 14:31:40 2018 - [info] Dead Servers:
Sat May 26 14:31:40 2018 - [info] Alive Servers:
Sat May 26 14:31:40 2018 - [info]   10.10.240.117(10.10.240.117:3306)
Sat May 26 14:31:40 2018 - [info]   10.10.0.98(10.10.0.98:3306)
Sat May 26 14:31:40 2018 - [info]   10.0.1.49(10.0.1.49:3306)
Sat May 26 14:31:40 2018 - [info] Alive Slaves:
Sat May 26 14:31:40 2018 - [info]   10.10.0.98(10.10.0.98:3306)  Version=5.6.40-log (oldest major version between slaves) log-bin:enabled
Sat May 26 14:31:40 2018 - [info]     Replicating from 10.10.240.117(10.10.240.117:3306)
Sat May 26 14:31:40 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat May 26 14:31:40 2018 - [info]   10.0.1.49(10.0.1.49:3306)  Version=5.6.40-log (oldest major version between slaves) log-bin:enabled
Sat May 26 14:31:40 2018 - [info]     Replicating from 10.10.240.117(10.10.240.117:3306)
Sat May 26 14:31:40 2018 - [info]     Not candidate for the new Master (no_master is set)
Sat May 26 14:31:40 2018 - [info] Current Alive Master: 10.10.240.117(10.10.240.117:3306)
Sat May 26 14:31:40 2018 - [info] Checking slave configurations..
Sat May 26 14:31:40 2018 - [warning]  relay_log_purge=0 is not set on slave 10.10.0.98(10.10.0.98:3306).
Sat May 26 14:31:40 2018 - [warning]  relay_log_purge=0 is not set on slave 10.0.1.49(10.0.1.49:3306).
Sat May 26 14:31:40 2018 - [info] Checking replication filtering settings..
Sat May 26 14:31:40 2018 - [info]  binlog_do_db= , binlog_ignore_db= 
Sat May 26 14:31:40 2018 - [info]  Replication filtering check ok.
Sat May 26 14:31:40 2018 - [info] Starting SSH connection tests..
Sat May 26 14:31:42 2018 - [info] All SSH connection tests passed successfully.
Sat May 26 14:31:42 2018 - [info] Checking MHA Node version..
Sat May 26 14:31:42 2018 - [info]  Version check ok.
Sat May 26 14:31:42 2018 - [info] Checking SSH publickey authentication settings on the current master..
Sat May 26 14:31:42 2018 - [info] HealthCheck: SSH to 10.10.240.117 is reachable.
Sat May 26 14:31:42 2018 - [info] Master MHA Node version is 0.53.
Sat May 26 14:31:42 2018 - [info] Checking recovery script configurations on the current master..
Sat May 26 14:31:42 2018 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql --output_file=/data/masterha/save_binary_logs_test --manager_version=0.56 --start_f 
Sat May 26 14:31:42 2018 - [info]   Connecting to root@10.10.240.117(10.10.240.117).. 
  Creating /data/masterha if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /data/mysql, up to mysql-bin.000001
Sat May 26 14:31:42 2018 - [info] Master setting check done.
Sat May 26 14:31:42 2018 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Sat May 26 14:31:42 2018 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=10.10.0.98 --slave_ip=10.10.0.98 --slave_port=3306 --workdir=/data/masterha --targetx
Sat May 26 14:31:42 2018 - [info]   Connecting to root@10.10.0.98(10.10.0.98:22).. 
  Checking slave recovery environment settings..
    Opening /data/mysql/relay-log.info ... ok.
    Relay log found at /data/mysql, up to relay-bin.000002
    Temporary relay log file is /data/mysql/relay-bin.000002
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Sat May 26 14:31:42 2018 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=10.0.1.49 --slave_ip=10.0.1.49 --slave_port=3306 --workdir=/data/masterha --target_vx
Sat May 26 14:31:42 2018 - [info]   Connecting to root@10.0.1.49(10.0.1.49:22).. 
  Checking slave recovery environment settings..
    Opening /data/mysql/relay-log.info ... ok.
    Relay log found at /data/mysql, up to relay-bin.000002
    Temporary relay log file is /data/mysql/relay-bin.000002
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Sat May 26 14:31:43 2018 - [info] Slaves settings check done.
Sat May 26 14:31:43 2018 - [info] 
10.10.240.117 (current master)
 +--10.10.0.98
 +--10.0.1.49

Sat May 26 14:31:43 2018 - [info] Checking replication health on 10.10.0.98..
Sat May 26 14:31:43 2018 - [info]  ok.
Sat May 26 14:31:43 2018 - [info] Checking replication health on 10.0.1.49..
Sat May 26 14:31:43 2018 - [info]  ok.
Sat May 26 14:31:43 2018 - [warning] master_ip_failover_script is not defined.
Sat May 26 14:31:43 2018 - [warning] shutdown_script is not defined.
Sat May 26 14:31:43 2018 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.
  • 报错:
Sat May 26 13:35:35 2018 - [error][/usr/local/share/perl5/MHA/Server.pm, ln383] 10.10.0.98(10.10.0.98:3306): User slave does not exist or does not have REPLICATION SLAVE privilege! Other slaves can not start replication from this host.

10.10.0.98是候选master,所以要授权有复制的权限
GRANT REPLICATION SLAVE ON . TO 'slave'@'%' IDENTIFIED by 'password';

使用type查看mysql和mysqlbinlog位置,链接到/usr/bin下,示例如下:

ln -s /usr/local/mysql/bin/mysql /usr/bin/mysql
ln -s /usr/local/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog
  • 高可用配置

5.配置VIP

到此位置,仅仅配置好了mha,我们测试他的高可用性
配置中打开这个配置

master_ip_failover_script=/data/masterha/master_ip_failover
master_ip_online_change_script=/data/masterha/master_ip_failover

脚本内容如下,要设置好VIP,不过一开始的VIP要在主机master上手动设置,脚本中设置VIP地址

[root@Linuxea-Node61 /data/masterha]# cat master_ip_failover

#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
  
use Getopt::Long;
  
my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port
);
  
my $vip = '10.0.1.161/24';  # Virtual IP           #设置VIP地址
my $key = "1";
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";
$ssh_user = "root";
GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);
  
exit &main();
  
sub main {
  
    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
  
    if ( $command eq "stop" || $command eq "stopssh" ) {
  
        # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
        # If you manage master ip address at global catalog database,
        # invalidate orig_master_ip here.
        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {
  
        # all arguments are passed.
        # If you manage master ip address at global catalog database,
        # activate new_master_ip here.
        # You can also grant write access (create user, set read_only=0, etc) here.
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        `ssh $ssh_user\@cluster1 \" $ssh_start_vip \"`;
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}
  
# A simple system call that enable the VIP on the new master
sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disable the VIP on the old_master
sub stop_vip() {
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
  
sub usage {
    print
    "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

脚本修改后在配置Master VIP(在主节点/W)

[root@Linuxea-Node117 ~/mha4mysql-node-0.56]# /sbin/ifconfig eth0:1 10.0.1.161/24

在mha-manager节点启动

[root@Linuxea-Node61 /data/masterha]#  nohup masterha_manager --conf=/etc/masterha_default.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /data/masterha/manager.log 2>&1 & 

观察日志(日志太长,删了一些不需要的)

[root@Linuxea-Node61 /data/masterha]# tail -f /data/masterha/manager.log 
Sun Jun  3 18:12:30 2018 - [info]  read_only=1 is not set on slave 10.10.240.117(10.10.240.117:3306).
Sun Jun  3 18:12:30 2018 - [warning]  relay_log_purge=0 is not set on slave 10.10.240.117(10.10.240.117:3306).
Sun Jun  3 18:12:30 2018 - [info]  read_only=1 is not set on slave 10.0.1.49(10.0.1.49:3306).
Sun Jun  3 18:12:30 2018 - [warning]  relay_log_purge=0 is not set on slave 10.0.1.49(10.0.1.49:3306).
Sun Jun  3 18:12:30 2018 - [info] Checking replication filtering settings..
Sun Jun  3 18:12:30 2018 - [info]  binlog_do_db= , binlog_ignore_db= 
Sun Jun  3 18:12:30 2018 - [info]  Replication filtering check ok.
Sun Jun  3 18:12:30 2018 - [info] Starting SSH connection tests..
Sun Jun  3 18:12:31 2018 - [info] All SSH connection tests passed successfully.
Sun Jun  3 18:12:31 2018 - [info] Checking MHA Node version..
Sun Jun  3 18:12:32 2018 - [info]  Version check ok.
Sun Jun  3 18:12:32 2018 - [info] Checking SSH publickey authentication settings on the current master..
Sun Jun  3 18:12:32 2018 - [info] HealthCheck: SSH to 10.10.0.98 is reachable.
Sun Jun  3 18:12:32 2018 - [info] Master MHA Node version is 0.56.
Sun Jun  3 18:12:32 2018 - [info] Checking recovery script configurations on the current master..
Sun Jun  3 18:12:32 2018 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql --output_file=/data/masterha/save_binary_logs_test --manager_version=0.56 --start_file=mysql-bin.000002 
Sun Jun  3 18:12:32 2018 - [info]   Connecting to root@10.10.0.98(10.10.0.98).. 
  Creating /data/masterha if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /data/mysql, up to mysql-bin.000002
Sun Jun  3 18:12:32 2018 - [info] Master setting check done.
Sun Jun  3 18:12:32 2018 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Sun Jun  3 18:12:32 2018 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=10.10.240.117 --slave_ip=10.10.240.117 --slave_port=3306 --workdir=/data/masterha --target_version=5.6.40-log --manager_version=0.56 --relay_log_info=/data/mysql/relay-log.info  --relay_dir=/data/mysql/  --slave_pass=xxx
Sun Jun  3 18:12:32 2018 - [info]   Connecting to root@10.10.240.117(10.10.240.117:22).. 
  Checking slave recovery environment settings..
    Opening /data/mysql/relay-log.info ... ok.
    Relay log found at /data/mysql, up to relay-bin.000004
    Temporary relay log file is /data/mysql/relay-bin.000004
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Sun Jun  3 18:12:32 2018 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=10.0.1.49 --slave_ip=10.0.1.49 --slave_port=3306 --workdir=/data/masterha --target_version=5.6.40-log --manager_version=0.56 --relay_log_info=/data/mysql/relay-log.info  --relay_dir=/data/mysql/  --slave_pass=xxx
Sun Jun  3 18:12:32 2018 - [info]   Connecting to root@10.0.1.49(10.0.1.49:22).. 
  Checking slave recovery environment settings..
    Opening /data/mysql/relay-log.info ... ok.
    Relay log found at /data/mysql, up to relay-bin.000004
    Temporary relay log file is /data/mysql/relay-bin.000004
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Sun Jun  3 18:12:32 2018 - [info] Slaves settings check done.
Sun Jun  3 18:12:32 2018 - [info] 
10.10.0.98 (current master)
 +--10.10.240.117
 +--10.0.1.49

Sun Jun  3 18:12:32 2018 - [info] Checking master_ip_failover_script status:
Sun Jun  3 18:12:32 2018 - [info]   /data/masterha/master_ip_failover --command=status --ssh_user=root --orig_master_host=10.10.0.98 --orig_master_ip=10.10.0.98 --orig_master_port=3306 


IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 10.0.1.161/24===

Checking the Status of the script.. OK 
ssh: Could not resolve hostname cluster1: Name or service not known
Sun Jun  3 18:12:32 2018 - [info]  OK.
Sun Jun  3 18:12:32 2018 - [warning] shutdown_script is not defined.
Sun Jun  3 18:12:32 2018 - [info] Set master ping interval 1 seconds.
Sun Jun  3 18:12:32 2018 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Sun Jun  3 18:12:32 2018 - [info] Starting ping health check on 10.10.0.98(10.10.0.98:3306)..
Sun Jun  3 18:12:32 2018 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
  • 可以运行masterha_master_switch来测试
[root@linuea-VM-Node145 /data/masterha]#  masterha_master_switch --conf=/etc/masterha_default.cnf --master_state=alive
Sun Jun  3 18:05:47 2018 - [info] MHA::MasterRotate version 0.56.
Sun Jun  3 18:05:47 2018 - [info] Starting online master switch..
Sun Jun  3 18:05:47 2018 - [info] 
Sun Jun  3 18:05:47 2018 - [info] * Phase 1: Configuration Check Phase..
Sun Jun  3 18:05:47 2018 - [info] 
Sun Jun  3 18:05:47 2018 - [info] Reading default configuratoins from /etc/masterha_default.cnf..
Sun Jun  3 18:05:47 2018 - [info] Reading application default configurations from /etc/masterha_default.cnf..
Sun Jun  3 18:05:47 2018 - [info] Reading server configurations from /etc/masterha_default.cnf..
Sun Jun  3 18:05:48 2018 - [info] Current Alive Master: 10.10.240.117(10.10.240.117:3306)
Sun Jun  3 18:05:48 2018 - [info] Alive Slaves:
Sun Jun  3 18:05:48 2018 - [info]   10.10.0.98(10.10.0.98:3306)  Version=5.6.40-log (oldest major version between slaves) log-bin:enabled
Sun Jun  3 18:05:48 2018 - [info]     Replicating from 10.10.240.117(10.10.240.117:3306)
Sun Jun  3 18:05:48 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Sun Jun  3 18:05:48 2018 - [info]   10.0.1.49(10.0.1.49:3306)  Version=5.6.40-log (oldest major version between slaves) log-bin:enabled
Sun Jun  3 18:05:48 2018 - [info]     Replicating from 10.10.240.117(10.10.240.117:3306)
Sun Jun  3 18:05:48 2018 - [info]     Not candidate for the new Master (no_master is set)

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 10.10.240.117(10.10.240.117:3306)? (YES/no): yes
Sun Jun  3 18:05:50 2018 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Sun Jun  3 18:05:50 2018 - [info]  ok.
Sun Jun  3 18:05:50 2018 - [info] Checking MHA is not monitoring or doing failover..
Sun Jun  3 18:05:50 2018 - [info] Checking replication health on 10.10.0.98..
Sun Jun  3 18:05:50 2018 - [info]  ok.
Sun Jun  3 18:05:50 2018 - [info] Checking replication health on 10.0.1.49..
Sun Jun  3 18:05:50 2018 - [info]  ok.
Sun Jun  3 18:05:50 2018 - [info] Searching new master from slaves..
Sun Jun  3 18:05:50 2018 - [info]  Candidate masters from the configuration file:
Sun Jun  3 18:05:50 2018 - [info]   10.10.240.117(10.10.240.117:3306)  Version=5.6.40-log log-bin:enabled
Sun Jun  3 18:05:50 2018 - [info]   10.10.0.98(10.10.0.98:3306)  Version=5.6.40-log (oldest major version between slaves) log-bin:enabled
Sun Jun  3 18:05:50 2018 - [info]     Replicating from 10.10.240.117(10.10.240.117:3306)
Sun Jun  3 18:05:50 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Sun Jun  3 18:05:50 2018 - [info]  Non-candidate masters:
Sun Jun  3 18:05:50 2018 - [info]   10.0.1.49(10.0.1.49:3306)  Version=5.6.40-log (oldest major version between slaves) log-bin:enabled
Sun Jun  3 18:05:50 2018 - [info]     Replicating from 10.10.240.117(10.10.240.117:3306)
Sun Jun  3 18:05:50 2018 - [info]     Not candidate for the new Master (no_master is set)
Sun Jun  3 18:05:50 2018 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Sun Jun  3 18:05:50 2018 - [info] 
From:
10.10.240.117 (current master)
 +--10.10.0.98
 +--10.0.1.49

To:
10.10.0.98 (new master)
 +--10.0.1.49

Starting master switch from 10.10.240.117(10.10.240.117:3306) to 10.10.0.98(10.10.0.98:3306)? (yes/NO): yes
Sun Jun  3 18:05:53 2018 - [info] Checking whether 10.10.0.98(10.10.0.98:3306) is ok for the new master..
Sun Jun  3 18:05:53 2018 - [info]  ok.
Sun Jun  3 18:05:53 2018 - [info] ** Phase 1: Configuration Check Phase completed.
Sun Jun  3 18:05:53 2018 - [info] 
Sun Jun  3 18:05:53 2018 - [info] * Phase 2: Rejecting updates Phase..
Sun Jun  3 18:05:53 2018 - [info] 
Sun Jun  3 18:05:53 2018 - [info] Executing master ip online change script to disable write on the current master:
Sun Jun  3 18:05:53 2018 - [info]   /data/masterha/master_ip_online_change_script --command=stop --orig_master_host=10.10.240.117 --orig_master_ip=10.10.240.117 --orig_master_port=3306 --orig_master_user='mha' --orig_master_password='password_123' --new_master_host=10.10.0.98 --new_master_ip=10.10.0.98 --new_master_port=3306 --new_master_user='mha' --new_master_password='password_123' --orig_master_ssh_user=root --new_master_ssh_user=root  
Unknown option: orig_master_user
Unknown option: orig_master_password
Unknown option: new_master_user
Unknown option: new_master_password
Unknown option: orig_master_ssh_user
Unknown option: new_master_ssh_user


IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 10.0.1.161/24===

Disabling the VIP on old master: 10.10.240.117 
Sun Jun  3 18:05:53 2018 - [info]  ok.
Sun Jun  3 18:05:53 2018 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Sun Jun  3 18:05:53 2018 - [info] Executing FLUSH TABLES WITH READ LOCK..
Sun Jun  3 18:05:53 2018 - [info]  ok.
Sun Jun  3 18:05:53 2018 - [info] Orig master binlog:pos is mysql-bin.000002:120.
Sun Jun  3 18:05:53 2018 - [info]  Waiting to execute all relay logs on 10.10.0.98(10.10.0.98:3306)..
Sun Jun  3 18:05:53 2018 - [info]  master_pos_wait(mysql-bin.000002:120) completed on 10.10.0.98(10.10.0.98:3306). Executed 0 events.
Sun Jun  3 18:05:53 2018 - [info]   done.
Sun Jun  3 18:05:53 2018 - [info] Getting new master's binlog name and position..
Sun Jun  3 18:05:53 2018 - [info]  mysql-bin.000001:637
Sun Jun  3 18:05:53 2018 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.10.0.98', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=637, MASTER_USER='slave', MASTER_PASSWORD='xxx';
Sun Jun  3 18:05:53 2018 - [info] Executing master ip online change script to allow write on the new master:
Sun Jun  3 18:05:53 2018 - [info]   /data/masterha/master_ip_online_change_script --command=start --orig_master_host=10.10.240.117 --orig_master_ip=10.10.240.117 --orig_master_port=3306 --orig_master_user='mha' --orig_master_password='password_123' --new_master_host=10.10.0.98 --new_master_ip=10.10.0.98 --new_master_port=3306 --new_master_user='mha' --new_master_password='password_123' --orig_master_ssh_user=root --new_master_ssh_user=root  
Unknown option: orig_master_user
Unknown option: orig_master_password
Unknown option: new_master_user
Unknown option: new_master_password
Unknown option: orig_master_ssh_user
Unknown option: new_master_ssh_user


IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 10.0.1.161/24===

Enabling the VIP - 10.0.1.161/24 on the new master - 10.10.0.98 
Sun Jun  3 18:05:53 2018 - [info]  ok.
Sun Jun  3 18:05:53 2018 - [info] 
Sun Jun  3 18:05:53 2018 - [info] * Switching slaves in parallel..
Sun Jun  3 18:05:53 2018 - [info] 
Sun Jun  3 18:05:53 2018 - [info] -- Slave switch on host 10.0.1.49(10.0.1.49:3306) started, pid: 3835
Sun Jun  3 18:05:53 2018 - [info] 
Sun Jun  3 18:05:54 2018 - [info] Log messages from 10.0.1.49 ...
Sun Jun  3 18:05:54 2018 - [info] 
Sun Jun  3 18:05:53 2018 - [info]  Waiting to execute all relay logs on 10.0.1.49(10.0.1.49:3306)..
Sun Jun  3 18:05:53 2018 - [info]  master_pos_wait(mysql-bin.000002:120) completed on 10.0.1.49(10.0.1.49:3306). Executed 0 events.
Sun Jun  3 18:05:53 2018 - [info]   done.
Sun Jun  3 18:05:53 2018 - [info]  Resetting slave 10.0.1.49(10.0.1.49:3306) and starting replication from the new master 10.10.0.98(10.10.0.98:3306)..
Sun Jun  3 18:05:53 2018 - [info]  Executed CHANGE MASTER.
Sun Jun  3 18:05:53 2018 - [info]  Slave started.
Sun Jun  3 18:05:54 2018 - [info] End of log messages from 10.0.1.49 ...
Sun Jun  3 18:05:54 2018 - [info] 
Sun Jun  3 18:05:54 2018 - [info] -- Slave switch on host 10.0.1.49(10.0.1.49:3306) succeeded.
Sun Jun  3 18:05:54 2018 - [info] Unlocking all tables on the orig master:
Sun Jun  3 18:05:54 2018 - [info] Executing UNLOCK TABLES..
Sun Jun  3 18:05:54 2018 - [info]  ok.
Sun Jun  3 18:05:54 2018 - [info] All new slave servers switched successfully.
Sun Jun  3 18:05:54 2018 - [info] 
Sun Jun  3 18:05:54 2018 - [info] * Phase 5: New master cleanup phase..
Sun Jun  3 18:05:54 2018 - [info] 
Sun Jun  3 18:05:54 2018 - [info]  10.10.0.98: Resetting slave info succeeded.
Sun Jun  3 18:05:54 2018 - [info] Switching master to 10.10.0.98(10.10.0.98:3306) completed successfully.

如果没有报错,我们需要重新组成mha,因为运行masterha_master_switch已经使mha主节点(10.10.240.117)发生改变,将剔除的机器重新同步主从即可,也就是说原来的主节点(10.10.240.117)已经手动切换到候选节点10.10.0.98了,10.10.240.117需要重新同步加入主从

  • 模拟故障
    现在主节点是10.10.0.98,关闭master(10.10.0.98)的mysql,模拟自动切换,脚本在上面已经有
    注意:现在已经将10.10.240.117组成一主两从了,配置中候选节点分别是10.10.240.117和10.10.0.98,现在关闭98,主节点将会切换到10.10.240.117,并且vip也会随之偏移
[root@Linuxea-Node98 ~/mha4mysql-node-0.56]# /etc/init.d/mysqld stop
Shutting down MySQL..... SUCCESS! 

观察日志(日志贴的不完整):

Sun Jun  3 18:21:58 2018 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Sun Jun  3 18:21:58 2018 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql --output_file=/data/masterha/save_binary_logs_test --manager_version=0.56 --binlog_prefix=mysql-bin
Sun Jun  3 18:21:58 2018 - [info] HealthCheck: SSH to 10.10.0.98 is reachable.
Sun Jun  3 18:21:59 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.10.0.98' (111))
Sun Jun  3 18:21:59 2018 - [warning] Connection failed 1 time(s)..
Sun Jun  3 18:22:00 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.10.0.98' (111))
Sun Jun  3 18:22:00 2018 - [warning] Connection failed 2 time(s)..
Sun Jun  3 18:22:01 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.10.0.98' (111))
Sun Jun  3 18:22:01 2018 - [warning] Connection failed 3 time(s)..
Sun Jun  3 18:22:01 2018 - [warning] Master is not reachable from health checker!
Sun Jun  3 18:22:01 2018 - [warning] Master 10.10.0.98(10.10.0.98:3306) is not reachable!
Sun Jun  3 18:22:01 2018 - [warning] SSH is reachable.
Sun Jun  3 18:22:01 2018 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/masterha_default.cnf again, and trying to connect to all servers to check server status..
Sun Jun  3 18:22:02 2018 - [info]   10.0.1.49(10.0.1.49:3306)
Sun Jun  3 18:22:02 2018 - [info] Alive Slaves:
Sun Jun  3 18:22:02 2018 - [info]   10.10.240.117(10.10.240.117:3306)  Version=5.6.40-log (oldest major version between slaves) log-bin:enabled
Sun Jun  3 18:22:02 2018 - [info]     Replicating from 10.10.0.98(10.10.0.98:3306)
Sun Jun  3 18:22:02 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Sun Jun  3 18:22:02 2018 - [info]   10.0.1.49(10.0.1.49:3306)  Version=5.6.40-log (oldest major version between slaves) log-bin:enabled
Sun Jun  3 18:22:02 2018 - [info]     Replicating from 10.10.0.98(10.10.0.98:3306)
Sun Jun  3 18:22:02 2018 - [info]     Not candidate for the new Master (no_master is set)
Sun Jun  3 18:22:02 2018 - [info] Checking slave configurations..
Sun Jun  3 18:22:02 2018 - [info]  read_only=1 is not set on slave 10.10.240.117(10.10.240.117:3306)
Sun Jun  3 18:22:03 2018 - [info] Dead Servers:
Sun Jun  3 18:22:03 2018 - [info]   10.10.0.98(10.10.0.98:3306)
Sun Jun  3 18:22:03 2018 - [info] Checking master reachability via mysql(double check)..
Sun Jun  3 18:22:03 2018 - [info]  ok.
Sun Jun  3 18:22:03 2018 - [info] Alive Servers:
Sun Jun  3 18:22:03 2018 - [info]   10.10.240.117(10.10.240.117:3306)
Sun Jun  3 18:22:03 2018 - [info]   10.0.1.49(10.0.1.49:3306)
Sun Jun  3 18:22:03 2018 - [info] Alive Slaves:
Sun Jun  3 18:22:03 2018 - [info]   10.10.240.117(10.10.240.117:3306)  Version=5.6.40-log (oldest major version between slaves) log-bin:enabled
Sun Jun  3 18:22:03 2018 - [info]     Replicating from 10.10.0.98(10.10.0.98:3306)
Sun Jun  3 18:22:03 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Sun Jun  3 18:22:03 2018 - [info]   10.0.1.49(10.0.1.49:3306)  Version=5.6.40-log (oldest major version between slaves) log-bin:enabled
Sun Jun  3 18:22:03 2018 - [info]     Replicating from 10.10.0.98(10.10.0.98:3306)
Sun Jun  3 18:22:03 2018 - [info]     Not candidate for the new Master (no_master is set)
Sun Jun  3 18:22:03 2018 - [info] ** Phase 1: Configuration Check Phase completed.
Sun Jun  3 18:22:03 2018 - [info] 
Sun Jun  3 18:22:03 2018 - [info] * Phase 2: Dead Master Shutdown Phase..
Sun Jun  3 18:22:03 2018 - [info] 
Sun Jun  3 18:22:03 2018 - [info] Forcing shutdown so that applications never connect to the current master..
Sun Jun  3 18:22:03 2018 - [info] Executing master IP deactivatation script:
Sun Jun  3 18:22:03 2018 - [info]   /data/masterha/master_ip_failover --orig_master_host=10.10.0.98 --orig_master_ip=10.10.0.98 --orig_master_port=3306 --command=stopssh --ssh_user=root  


IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 10.0.1.161/24===

Disabling the VIP on old master: 10.10.0.98 
Sun Jun  3 18:22:03 2018 - [info]  done
Sun Jun  3 18:22:04 2018 - [info]     Not candidate for the new Master (no_master is set)
Sun Jun  3 18:22:04 2018 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Sun Jun  3 18:22:04 2018 - [info] New master is 10.10.240.117(10.10.240.117:3306)
Sun Jun  3 18:22:04 2018 - [info] Starting master failover..
Sun Jun  3 18:22:04 2018 - [info] 
From:
10.10.0.98 (current master)
 +--10.10.240.117
 +--10.0.1.49

To:
10.10.240.117 (new master)
 +--10.0.1.49
Sun Jun  3 18:22:04 2018 - [info] 
Sun Jun  3 18:22:04 2018 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Sun Jun  3 18:22:04 2018 - [info] Read_Master_Log_Pos(mysql-bin.000002:120). No need to recover from Exec_Master_Log_Pos.
Sun Jun  3 18:22:04 2018 - [info] Connecting to the target slave host 10.10.240.117, running recover script..
Sun Jun  3 18:22:04 2018 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='mha' --slave_host=10.10.240.117 --slave_ip=10.10.240.117  --slave_port=3306 --apply_files=/data/masterha/saved_master_binlog_from_10.10.0.98_3306_20180603182202.binlog --workdir=/data/masterha --target_version=5.6.40-log --timestamp=20180603182202 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx
Sun Jun  3 18:22:04 2018 - [info] 
MySQL client version is 5.6.40. Using --binary-mode.
Applying differential binary/relay log files /data/masterha/saved_master_binlog_from_10.10.0.98_3306_20180603182202.binlog on 10.10.240.117:3306. This may take long time...
Applying log files succeeded.
Sun Jun  3 18:22:04 2018 - [info]  All relay logs were successfully applied.
Sun Jun  3 18:22:04 2018 - [info] Getting new master's binlog name and position..
Sun Jun  3 18:22:04 2018 - [info]  mysql-bin.000002:532
Sun Jun  3 18:22:04 2018 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.10.240.117', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000002', MASTER_LOG_POS=532, MASTER_USER='slave', MASTER_PASSWORD='xxx';
Sun Jun  3 18:22:04 2018 - [info] Executing master IP activate script:
Sun Jun  3 18:22:04 2018 - [info]   /data/masterha/master_ip_failover --command=start --ssh_user=root --orig_master_host=10.10.0.98 --orig_master_ip=10.10.0.98 --orig_master_port=3306 --new_master_host=10.10.240.117 --new_master_ip=10.10.240.117 --new_master_port=3306 --new_master_user='mha' --new_master_password='password_123'  
Unknown option: new_master_user
Unknown option: new_master_password


IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 10.0.1.161/24===

Enabling the VIP - 10.0.1.161/24 on the new master - 10.10.240.117 
Sun Jun  3 18:22:04 2018 - [info]  OK.
Sun Jun  3 18:22:06 2018 - [info]  Slave started.
Sun Jun  3 18:22:06 2018 - [info] End of log messages from 10.0.1.49.
Sun Jun  3 18:22:06 2018 - [info] Master failover to 10.10.240.117(10.10.240.117:3306) completed successfully.
Sun Jun  3 18:22:06 2018 - [info] Deleted server2 entry from /etc/masterha_default.cnf .
Sun Jun  3 18:22:06 2018 - [info] 

----- Failover Report -----

masterha_default: MySQL Master failover 10.10.0.98 to 10.10.240.117 succeeded

Master 10.10.0.98 is down!

Check MHA Manager logs at DS-VM-Node145.cluster.com:/data/masterha/manager.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 10.10.0.98.
The latest slave 10.10.240.117(10.10.240.117:3306) has all relay logs for recovery.
Selected 10.10.240.117 as a new master.
10.10.240.117: OK: Applying all logs succeeded.
10.10.240.117: OK: Activated master IP address.
10.0.1.49: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
10.0.1.49: OK: Applying all logs succeeded. Slave started, replicating from 10.10.240.117.
10.10.240.117: Resetting slave info succeeded.
Master failover to 10.10.240.117(10.10.240.117:3306) completed successfully.

如果配置没有问题,vip会如愿飘到配置文件中的备用master(10.10.240.117)上,而主从就剩下10.10.240.117(主),10.0.1.49(从)

[root@Linuxea-Node117 ~]# ip a|grep eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    inet 10.10.240.117/8 brd 10.255.255.255 scope global dynamic eth0
    inet 10.0.1.161/24 brd 10.0.1.255 scope global eth0:0

现在只有10.10.240.117(master)和10.0.1.49(slave),并且mha节点的配置文件已经发生变更,默认会将关闭的节点,或者说是宕机的mysql节点配置会被剔除,大概如下(如果恢复需要将配置加回来):

[server default]
manager_log=/data/masterha/manager.log
manager_workdir=/data/masterha
master_binlog_dir=/data/mysql
master_ip_failover_script=/data/masterha/master_ip_failover
master_ip_online_change_script=/data/masterha/master_ip_online_change_script
password=password_123
ping_interval=1
remote_workdir=/data/masterha
repl_password=password
repl_user=slave
ssh_user=root
user=mha

[server1]
candidate_master=1
check_repl_delay=0
hostname=10.10.240.117
master_binlog_dir=/data/mysql
port=3306

[server3]
hostname=10.0.1.49
master_binlog_dir=/data/mysql
no_master=1
port=3306
  • 恢复节点

删除mha上的/data/masterha/masterha_default.failover.complete

6.恢复节点

现在候选节点已经接替原有的master,原有的(10.10.0.98)master以从节点的角色上线即可(总之需要重新加入到主从中),也就是说将10.10.0.98重新同步主节点
mysql> reset slave;
Query OK, 0 rows affected (0.00 sec)

mysql> CHANGE MASTER TO
    -> MASTER_HOST='10.10.240.117',
    -> MASTER_USER='slave',
    -> MASTER_PASSWORD='password',
    -> MASTER_PORT=3306,
    -> MASTER_LOG_FILE='mysql-bin.000002',
    -> MASTER_LOG_POS=1653,
    -> MASTER_CONNECT_RETRY=20;
Query OK, 0 rows affected, 2 warnings (0.01 sec)

mysql> start slave;
Query OK, 0 rows affected (0.00 sec)

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.10.240.117
                  Master_User: slave
                  Master_Port: 3306
                Connect_Retry: 20
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 1653
               Relay_Log_File: relay-bin.000002
                Relay_Log_Pos: 283
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

注意:故障后配置文件/etc/masterha_default.cnf已经变更,需要添加原主节点的信息(10.10.0.98),因为在切换的过程中配置将被删除
将配置添加

[server2]
candidate_master=1
check_repl_delay=0
hostname=10.10.0.98
master_binlog_dir=/data/mysql
port=3306

并且此IP地址是10.10.240.117,并不是VIP地址,VIP地址只会随着脚本飘逸供应用使用(也就是会飘逸到主节点)

1 分享

您可以选择一种方式赞助本站

支付宝扫码赞助

支付宝扫码赞助

日期: 2018-05-31分类: Mariadb

标签: mariadb

发表评论