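Overview
MHA (Master High Availability) is a mature solution for MySQL high availability. It was developed by Yoshinori Matsunobu at DeNA in Japan (he later joined Facebook) and handles failover and slave promotion in a highly available MySQL environment.
The software has two parts: MHA Manager (the management node) and MHA Node (the data node). MHA Manager can be deployed on a dedicated machine to manage several master-slave clusters, or on one of the slave nodes. MHA Node runs on every MySQL server. MHA Manager probes the cluster's master at regular intervals; when the master fails, it automatically promotes the slave with the most recent data to be the new master and then repoints all other slaves to the new master. The whole failover process is transparent to applications.
The sections below walk through the MHA installation and the failover process.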
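The periodic probing described above can be pictured as a simple loop. The sketch below is not MHA's code: watch_master, its arguments and the failure threshold are hypothetical, and the real manager probes over a MySQL connection rather than with an arbitrary command.

```shell
#!/bin/bash
# Hypothetical sketch of a periodic health probe; not MHA's actual implementation.
# $1: probe command (assumption: exits 0 while the master answers)
# $2: seconds to wait between probes
# $3: consecutive failures tolerated before the master is reported unreachable
watch_master() {
    probe_cmd=$1
    interval=$2
    max_failures=$3
    failures=0
    while [ "$failures" -lt "$max_failures" ]; do
        if $probe_cmd >/dev/null 2>&1; then
            failures=0                      # any success resets the counter
        else
            failures=$((failures + 1))
        fi
        if [ "$failures" -lt "$max_failures" ]; then
            sleep "$interval"
        fi
    done
    echo "master unreachable after $failures consecutive failed probes"
}
```

For example, watch_master "mysqladmin -h 10.230.20.157 ping" 1 3 would keep looping while the server answers and return once three probes in a row fail. Requiring several consecutive failures is a common way to avoid reacting to a single dropped packet.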
Environment
manager: 10.230.20.156
master (mysql): 10.230.20.157
slave (mysql): 10.230.20.158
vip: 10.230.20.159
Architecture diagram
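Configuring SSH trust
Set up SSH trust for the same user, root, on the hosts named manager, node1-master and node2-slave.
The steps are as follows:
First, generate a key pair on every machine that takes part in the trust relationship;
Next, collect all the public keys into a single authorized_keys file;
Distribute that file, which now holds every machine's public key, to each machine;
Verify the trust relationship.
Log in as root on each host and generate a key pair: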
cd /root/.ssh
ssh-keygen -t rsa
Run the following commands on manager to merge every host's public key into manager's authorized_keys file:
ssh manager cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
ssh node1-master cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
ssh node2-slave cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
chmod 600 /root/.ssh/authorized_keys
Then, still on manager, copy the merged file to the other hosts:
scp /root/.ssh/authorized_keys node1-master:/root/.ssh/
scp /root/.ssh/authorized_keys node2-slave:/root/.ssh/
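On each node, run a remote command such as ssh node1-master date against the other hosts. If the current system date is printed without a password prompt, SSH trust has been configured successfully.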
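The verification step can be scripted so that every host is checked in one pass. This is a minimal sketch, not part of the original procedure: check_ssh_trust and the SSH_BIN override are hypothetical names, and the host names are the ones used in this article.

```shell
#!/bin/bash
# SSH_BIN is a hypothetical override so the loop can be dry-run without a network.
SSH_BIN="${SSH_BIN:-ssh}"

# Print one line per host; return non-zero if any host cannot be reached
# without a password.
check_ssh_trust() {
    failed=0
    for host in "$@"; do
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        if "$SSH_BIN" -o BatchMode=yes -o ConnectTimeout=5 "$host" date >/dev/null 2>&1; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
            failed=1
        fi
    done
    return "$failed"
}
```

Running check_ssh_trust manager node1-master node2-slave on each of the three machines covers every direction of the trust relationship.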
Configuring master-slave replication
(omitted)
VIP mount script
The first time, the VIP has to be brought up manually on the master. The VIP mount script:
#!/bin/bash
#
# vip is for MHA.
# Usage: vip.sh {start|status|stop} <interface>
#
. /etc/rc.d/init.d/functions
VIP=10.230.20.159
SIP=$2
host=`/bin/hostname`
case "$1" in
    start)
        # Start vip with the Communication port on this machine.
        /sbin/ifconfig "$SIP:0" "$VIP" broadcast "$VIP" netmask 255.255.255.0 up
        /sbin/route add -host "$VIP" dev "$SIP:0"
        ;;
    stop)
        # Stop vip with the Communication port on this machine.
        /sbin/ifconfig "$SIP:0" down
        ;;
    status)
        # Status of vip with the Communication port on this machine.
        islothere=`/sbin/ifconfig "$SIP:0" | grep "$VIP"`
        isrothere=`netstat -rn | grep "$SIP:0" | grep "$VIP"`
        if [ -z "$islothere" ] || [ -z "$isrothere" ]; then
            # Either the route or the $SIP:0 device
            # not found.
            echo "vip is Stopped."
        else
            echo "vip is Running."
        fi
        ;;
    *)
        # Invalid entry.
        echo "$0: Usage: $0 {start|status|stop} <interface>"
        exit 1
        ;;
esac
# Make the script executable and bring up the VIP (on the master host)
[root@node1-master ~]# chmod +x vip.sh
[root@node1-master ~]# ./vip.sh start eth0
[root@node1-master ~]# ifconfig
eth0 Link encap:Ethernet HWaddr 00:50:56:BB:7A:26
inet addr:10.230.20.157 Bcast:10.230.20.255 Mask:255.255.255.0
inet6 addr: fe80::250:56ff:febb:7a26/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:9922122 errors:0 dropped:0 overruns:0 frame:0
TX packets:782588 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2027380843 (1.8 GiB) TX bytes:60565453 (57.7 MiB)
eth0:0 Link encap:Ethernet HWaddr 00:50:56:BB:7A:26
inet addr:10.230.20.159 Bcast:10.230.20.159 Mask:255.255.255.255
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:1584 errors:0 dropped:0 overruns:0 frame:0
TX packets:1584 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:149120 (145.6 KiB) TX bytes:149120 (145.6 KiB)
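Rather than reading the interface listing by eye, the presence of the VIP can be tested with a small filter. A minimal sketch: has_vip is a hypothetical helper that only inspects text on standard input, so it works with ifconfig output as shown above or with ip -o addr show on systems without net-tools.

```shell
#!/bin/bash
# Succeed if the address given as $1 appears in the interface listing on stdin.
# -F matches the address literally (dots are not wildcards);
# -w rejects longer addresses that merely contain it, such as 10.230.20.1590.
has_vip() {
    grep -qFw -- "$1"
}
```

Typical use on the master would be ifconfig | has_vip 10.230.20.159 && echo "vip present".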
Installing MHA
Install from RPM packages. The manager node needs both the manager package and the node package; the master and slave nodes need only the node package.
On the manager node (the node package is installed first because the manager package depends on it):
# yum install perl-DBI perl-DBD-mysql perl-Time-HiRes perl-Parallel-ForkManager perl-Log-Dispatch perl-Config-Tiny rrdtool perl-rrdtool rrdtool-devel perl-Params-Validate
# rpm -ivh mha4mysql-node-0.56-0.el6.noarch.rpm
# rpm -ivh mha4mysql-manager-0.56-0.el6.noarch.rpm
On the master and slave nodes:
# yum install perl-DBI perl-DBD-mysql perl-Time-HiRes perl-Parallel-ForkManager perl-Log-Dispatch perl-Config-Tiny rrdtool perl-rrdtool rrdtool-devel perl-Params-Validate
# rpm -ivh mha4mysql-node-0.56-0.el6.noarch.rpm
Create the configuration file on the manager node:
[root@manager ~]# vi /etc/masterha/app1.cnf
[server default]
manager_workdir=/var/log/masterha/app1
manager_log=/var/log/masterha/app1/manager.log
user=root
password=111111
# user for SSH trust
ssh_user=root
# replication account and password
repl_user=repl
repl_password=111111
# probe interval in seconds
ping_interval=1
# IP switch script
master_ip_failover_script="/etc/masterha/master_ip_failover"
#shutdown_script=""
#master_ip_online_change_script=""
report_script="/etc/masterha/send_report"
[server1]
# first MySQL instance
hostname=10.230.20.157
port=3306
candidate_master=1
# binlog location
master_binlog_dir="/opt/mysql/log/binlog"
[server2]
# second MySQL instance
hostname=10.230.20.158
port=3306
candidate_master=1
# binlog location
master_binlog_dir="/opt/mysql/log/binlog"
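Before starting the manager it can be useful to print which servers the configuration file actually declares. The following is a minimal sketch with a hypothetical helper name, list_mha_servers. It does its own parsing with awk, strips "#" comments itself, and assumes 3306 when a section gives no port.

```shell
#!/bin/bash
# Print "section host:port" for every [serverN] block of an MHA-style config file.
# Assumption: a missing port means 3306.
list_mha_servers() {
    awk '
        function emit() {
            if (section != "" && host != "") print section, host ":" port
        }
        # Drop comments and surrounding whitespace before looking at the line.
        { sub(/#.*/, ""); gsub(/^[ \t]+|[ \t]+$/, "") }
        /^\[server[0-9]+\]$/ {
            emit()
            section = substr($0, 2, length($0) - 2)
            host = ""; port = "3306"
            next
        }
        /^\[/ { emit(); section = ""; next }
        section != "" && /^hostname[ \t]*=/ { sub(/^[^=]*=[ \t]*/, ""); host = $0 }
        section != "" && /^port[ \t]*=/     { sub(/^[^=]*=[ \t]*/, ""); port = $0 }
        END { emit() }
    ' "$1"
}
```

Calling list_mha_servers /etc/masterha/app1.cnf prints one line per server section, which makes a mistyped address easy to spot before any of the MHA checks are run.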
[root@manager masterha]# vi /etc/masterha/master_ip_failover
[root@manager masterha]# chmod +x /etc/masterha/master_ip_failover
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;

my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port
);
my $vip = '10.230.20.159/24';    # Virtual IP
my $key = "0";
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip  = "/sbin/ifconfig eth0:$key down";
$ssh_user = "root";

# Option names must be quoted strings and targets must be references.
GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);

exit &main();

sub main {
    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
    if ( $command eq "stop" || $command eq "stopssh" ) {
        # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
        # If you manage master ip address at global catalog database,
        # invalidate orig_master_ip here.
        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {
        # all arguments are passed.
        # If you manage master ip address at global catalog database,
        # activate new_master_ip here.
        # You can also grant write access (create user, set read_only=0, etc) here.
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        # "cluster1" is a placeholder hostname that does not resolve in this
        # environment, which is why the check output contains
        # "Could not resolve hostname cluster1". The check still exits 0.
        `ssh $ssh_user\@cluster1 \" $ssh_start_vip \"`;
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

# A simple system call that enables the VIP on the new master
sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}

# A simple system call that disables the VIP on the old master
sub stop_vip() {
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
    print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
Checking status
Start MHA:
[root@manager ~]# masterha_manager --conf=/etc/masterha/app1.cnf > /var/log/masterha/app1/manager.log 2>&1 &
Run the following commands on manager to check the connections between the manager and the data nodes, the replication status, and whether the monitored database is up:
[root@manager ~]# masterha_check_ssh --conf=/etc/masterha/app1.cnf
[root@manager ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
[root@manager ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
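The three checks can also be chained so that the first failure stops the run. This is a minimal sketch with a hypothetical helper name, run_mha_checks, and it assumes each masterha_check_* command exits non-zero when its check fails. The full output of each command is discarded here, so rerun a failing check by hand to see the details.

```shell
#!/bin/bash
# Run the MHA checks in order against the config file given as $1 and stop at
# the first failure. Assumption: a non-zero exit status means the check failed.
run_mha_checks() {
    conf=$1
    for check in masterha_check_ssh masterha_check_repl masterha_check_status; do
        if "$check" --conf="$conf" >/dev/null 2>&1; then
            echo "$check: passed"
        else
            echo "$check: failed"
            return 1
        fi
    done
}
```

It would be called as run_mha_checks /etc/masterha/app1.cnf after the manager has been started, because masterha_check_status reports on the running manager process.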
[root@manager ~]# masterha_check_ssh --conf=/etc/masterha/app1.cnf
Thu Jul 14 14:34:37 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Jul 14 14:34:37 2020 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Thu Jul 14 14:34:37 2020 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Thu Jul 14 14:34:37 2020 - [info] Starting SSH connection tests..
Thu Jul 14 14:34:38 2020 - [debug]
Thu Jul 14 14:34:37 2020 - [debug] Connecting via SSH from root@10.230.20.157(10.230.20.157:22) to root@10.230.20.158(10.230.20.158:22)..
Thu Jul 14 14:34:37 2020 - [debug] ok.
Thu Jul 14 14:34:38 2020 - [debug]
Thu Jul 14 14:34:38 2020 - [debug] Connecting via SSH from root@10.230.20.158(10.230.20.158:22) to root@10.230.20.157(10.230.20.157:22)..
Thu Jul 14 14:34:38 2020 - [debug] ok.
Thu Jul 14 14:34:38 2020 - [info] All SSH connection tests passed successfully.
[root@manager ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
Thu Jul 14 14:39:16 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Jul 14 14:39:16 2020 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Thu Jul 14 14:39:16 2020 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Thu Jul 14 14:39:16 2020 - [info] MHA::MasterMonitor version 0.56.
Thu Jul 14 14:39:17 2020 - [info] GTID failover mode = 1
Thu Jul 14 14:39:17 2020 - [info] Dead Servers:
Thu Jul 14 14:39:17 2020 - [info] Alive Servers:
Thu Jul 14 14:39:17 2020 - [info] 10.230.20.157(10.230.20.157:3306)
Thu Jul 14 14:39:17 2020 - [info] 10.230.20.158(10.230.20.158:3306)
Thu Jul 14 14:39:17 2020 - [info] Alive Slaves:
Thu Jul 14 14:39:17 2020 - [info] 10.230.20.158(10.230.20.158:3306) Version=5.7.12-log (oldest major version between slaves) log-bin:enabled
Thu Jul 14 14:39:17 2020 - [info] GTID ON
Thu Jul 14 14:39:17 2020 - [info] Replicating from 10.230.20.157(10.230.20.157:3306)
Thu Jul 14 14:39:17 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Jul 14 14:39:17 2020 - [info] Current Alive Master: 10.230.20.157(10.230.20.157:3306)
Thu Jul 14 14:39:17 2020 - [info] Checking slave configurations..
Thu Jul 14 14:39:17 2020 - [info] read_only=1 is not set on slave 10.230.20.158(10.230.20.158:3306).
Thu Jul 14 14:39:17 2020 - [info] Checking replication filtering settings..
Thu Jul 14 14:39:17 2020 - [info] binlog_do_db= , binlog_ignore_db=
Thu Jul 14 14:39:17 2020 - [info] Replication filtering check ok.
Thu Jul 14 14:39:17 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Thu Jul 14 14:39:17 2020 - [info] Checking SSH publickey authentication settings on the current master..
Thu Jul 14 14:39:17 2020 - [info] HealthCheck: SSH to 10.230.20.157 is reachable.
Thu Jul 14 14:39:17 2020 - [info]
10.230.20.157(10.230.20.157:3306) (current master)
+--10.230.20.158(10.230.20.158:3306)
Thu Jul 14 14:39:17 2020 - [info] Checking replication health on 10.230.20.158..
Thu Jul 14 14:39:17 2020 - [info] ok.
Thu Jul 14 14:39:17 2020 - [info] Checking master_ip_failover_script status:
Thu Jul 14 14:39:17 2020 - [info] /etc/masterha/master_ip_failover --command=status --ssh_user=root --orig_master_host=10.230.20.157 --orig_master_ip=10.230.20.157 --orig_master_port=3306
IN SCRIPT TEST====/sbin/ifconfig eth0:0 down==/sbin/ifconfig eth0:0 10.230.20.159/24===
Checking the Status of the script.. OK
ssh: Could not resolve hostname cluster1: Name or service not known
Thu Jul 14 14:39:18 2020 - [info] OK.
Thu Jul 14 14:39:18 2020 - [warning] shutdown_script is not defined.
Thu Jul 14 14:39:18 2020 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
[root@manager ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:7773) is running(0:PING_OK), master:10.230.20.157
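When the status line is consumed by another script, the current master can be pulled out of it with sed. A minimal sketch: current_master is a hypothetical helper, and it assumes the line has exactly the shape shown above.

```shell
#!/bin/bash
# Read masterha_check_status output on stdin and print the master host,
# but only when the manager reports PING_OK. Prints nothing otherwise.
current_master() {
    sed -n 's/^.* is running(0:PING_OK), master:\([^ ]*\)$/\1/p'
}
```

For instance, masterha_check_status --conf=/etc/masterha/app1.cnf | current_master yields the address after "master:" while monitoring is healthy.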
# The details can be reviewed in manager.log
Thu Jul 14 14:41:04 2020 - [info] MHA::MasterMonitor version 0.56.
Thu Jul 14 14:41:04 2020 - [info] GTID failover mode = 1
Thu Jul 14 14:41:04 2020 - [info] Dead Servers:
Thu Jul 14 14:41:04 2020 - [info] Alive Servers:
Thu Jul 14 14:41:04 2020 - [info] Alive Slaves:
Thu Jul 14 14:41:04 2020 - [info] 10.230.20.158(10.230.20.158:3306) Version=5.7.12-log (oldest major version between slaves) log-bin:enabled
Thu Jul 14 14:41:04 2020 - [info] GTID ON
Thu Jul 14 14:41:04 2020 - [info] Replicating from 10.230.20.157(10.230.20.157:3306)
Thu Jul 14 14:41:04 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Jul 14 14:41:04 2020 - [info] Current Alive Master: 10.230.20.157(10.230.20.157:3306)
Thu Jul 14 14:41:04 2020 - [info] Checking slave configurations..
Thu Jul 14 14:41:04 2020 - [info] read_only=1 is not set on slave 10.230.20.158(10.230.20.158:3306).
Thu Jul 14 14:41:04 2020 - [info] Checking replication filtering settings..
Thu Jul 14 14:41:04 2020 - [info] binlog_do_db= , binlog_ignore_db=
Thu Jul 14 14:41:04 2020 - [info] Replication filtering check ok.
Thu Jul 14 14:41:04 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Thu Jul 14 14:41:04 2020 - [info] Checking SSH publickey authentication settings on the current master..
Thu Jul 14 14:41:04 2020 - [info] HealthCheck: SSH to 10.230.20.157 is reachable.
Thu Jul 14 14:41:04 2020 - [info]
10.230.20.157(10.230.20.157:3306) (current master)
+--10.230.20.158(10.230.20.158:3306)
Thu Jul 14 14:41:04 2020 - [info] Checking master_ip_failover_script status:
Thu Jul 14 14:41:04 2020 - [info] /etc/masterha/master_ip_failover --command=status --ssh_user=root --orig_master_host=10.230.20.157 --orig_master_ip=10.230.20.157 --orig_master_port=3306
IN SCRIPT TEST====/sbin/ifconfig eth0:0 down==/sbin/ifconfig eth0:0 10.230.20.159/24===
Checking the Status of the script.. OK
ssh: Could not resolve hostname cluster1: Name or service not known
Thu Jul 14 14:41:04 2020 - [info] OK.
Thu Jul 14 14:41:04 2020 - [warning] shutdown_script is not defined.
Thu Jul 14 14:41:04 2020 - [info] Set master ping interval 1 seconds.
Thu Jul 14 14:41:04 2020 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Thu Jul 14 14:41:04 2020 - [info] Starting ping health check on 10.230.20.157(10.230.20.157:3306)..
Thu Jul 14 14:41:04 2020 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
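Verifying MHA
Expected behaviour: the VIP starts on the master. When the manager detects that the master is down, it runs its failover routine, which fetches from the master the log events the slave has not yet received and applies them on the slave (waiting for sql_thread to finish). It then calls the master_ip_failover script to bind the VIP to the slave, which continues to serve clients.
Stop MySQL on the master machine: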
[root@node1-master ~]# service mysqld stop
Shutting down MySQL............ SUCCESS!
[root@node2-slave ~]# ifconfig
eth0 Link encap:Ethernet HWaddr 00:50:56:BB:7A:27
inet addr:10.230.20.158 Bcast:10.230.20.255 Mask:255.255.255.0
inet6 addr: fe80::250:56ff:febb:7a27/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:9413285 errors:0 dropped:0 overruns:0 frame:0
TX packets:574709 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1368635304 (1.2 GiB) TX bytes:46609394 (44.4 MiB)
eth0:0 Link encap:Ethernet HWaddr 00:50:56:BB:7A:27
inet addr:10.230.20.159 Bcast:10.230.20.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:1627 errors:0 dropped:0 overruns:0 frame:0
TX packets:1627 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:155778 (152.1 KiB) TX bytes:155778 (152.1 KiB)
[root@client ~]# mysql -uroot -p111111 -h 10.230.20.159 -P 3306
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 112
Server version: 5.7.12-log MySQL Community Server (GPL)
Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
[root@10.230.20.159][(none)]> show slave status\G
Empty set (0.00 sec)
Failover log
Log seen on the management node (after the failover, the MHA manager process exits automatically):
IN SCRIPT TEST====/sbin/ifconfig eth0:0 down==/sbin/ifconfig eth0:0 10.230.20.159/24===
Disabling the VIP on old master: 10.230.20.157
Thu Jul 14 14:54:19 2020 - [info] done.
Thu Jul 14 14:54:19 2020 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Thu Jul 14 14:54:19 2020 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Thu Jul 14 14:54:19 2020 - [info]
Thu Jul 14 14:54:19 2020 - [info] * Phase 3: Master Recovery Phase..
Thu Jul 14 14:54:19 2020 - [info]
Thu Jul 14 14:54:19 2020 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Thu Jul 14 14:54:19 2020 - [info]
Thu Jul 14 14:54:19 2020 - [info] The latest binary log file/position on all slaves is bin.000013:234
Thu Jul 14 14:54:19 2020 - [info] Latest slaves (Slaves that received relay log files to the latest):
Thu Jul 14 14:54:19 2020 - [info] 10.230.20.158(10.230.20.158:3306) Version=5.7.12-log (oldest major version between slaves) log-bin:enabled
Thu Jul 14 14:54:19 2020 - [info] GTID ON
Thu Jul 14 14:54:19 2020 - [info] Replicating from 10.230.20.157(10.230.20.157:3306)
Thu Jul 14 14:54:19 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Jul 14 14:54:19 2020 - [info] The oldest binary log file/position on all slaves is bin.000013:234
Thu Jul 14 14:54:19 2020 - [info] Oldest slaves:
Thu Jul 14 14:54:19 2020 - [info] 10.230.20.158(10.230.20.158:3306) Version=5.7.12-log (oldest major version between slaves) log-bin:enabled
Thu Jul 14 14:54:19 2020 - [info] GTID ON
Thu Jul 14 14:54:19 2020 - [info] Replicating from 10.230.20.157(10.230.20.157:3306)
Thu Jul 14 14:54:19 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Jul 14 14:54:19 2020 - [info]
Thu Jul 14 14:54:19 2020 - [info] * Phase 3.3: Determining New Master Phase..
Thu Jul 14 14:54:19 2020 - [info]
Thu Jul 14 14:54:19 2020 - [info] Searching new master from slaves..
….
Thu Jul 14 14:54:19 2020 - [info] done.
Thu Jul 14 14:54:19 2020 - [info] Getting new masters binlog name and position..
Thu Jul 14 14:54:19 2020 - [info] bin.000006:194
Thu Jul 14 14:54:19 2020 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: bin.000006, 194, 548dab0b-3691-11e6-b14e-005056bb7a26:137-154:157,
74645350-368b-11e6-912d-005056bb7a27:1-142
Thu Jul 14 14:54:19 2020 - [info] Executing master IP activate script:
Unknown option: new_master_user
Unknown option: new_master_password
IN SCRIPT TEST====/sbin/ifconfig eth0:0 down==/sbin/ifconfig eth0:0 10.230.20.159/24===
Enabling the VIP - 10.230.20.159/24 on the new master - 10.230.20.158
Thu Jul 14 14:54:19 2020 - [info] OK.
Thu Jul 14 14:54:19 2020 - [info] ** Finished master recovery successfully.
Thu Jul 14 14:54:19 2020 - [info] * Phase 3: Master Recovery Phase completed.
Thu Jul 14 14:54:19 2020 - [info]
Thu Jul 14 14:54:19 2020 - [info] * Phase 4: Slaves Recovery Phase..
Thu Jul 14 14:54:19 2020 - [info]
Thu Jul 14 14:54:19 2020 - [info]
Thu Jul 14 14:54:19 2020 - [info] * Phase 4.1: Starting Slaves in parallel..
Thu Jul 14 14:54:19 2020 - [info]
Thu Jul 14 14:54:19 2020 - [info] All new slave servers recovered successfully.
Thu Jul 14 14:54:19 2020 - [info]
Thu Jul 14 14:54:19 2020 - [info] * Phase 5: New master cleanup phase..
Thu Jul 14 14:54:19 2020 - [info]
Thu Jul 14 14:54:19 2020 - [info] Resetting slave info on the new master..
Thu Jul 14 14:54:19 2020 - [info] 10.230.20.158: Resetting slave info succeeded.
Thu Jul 14 14:54:19 2020 - [info] Master failover to 10.230.20.158(10.230.20.158:3306) completed successfully.
Thu Jul 14 14:54:19 2020 - [info]
----- Failover Report -----
app1: MySQL Master failover 10.230.20.157(10.230.20.157:3306) to 10.230.20.158(10.230.20.158:3306) succeeded
Master 10.230.20.157(10.230.20.157:3306) is down!
Check MHA Manager logs at manager:/var/log/masterha/app1/manager.log for details.
Started automated(non-interactive) failover.
Invalidated master IP address on 10.230.20.157(10.230.20.157:3306)
Selected 10.230.20.158(10.230.20.158:3306) as a new master.
10.230.20.158(10.230.20.158:3306): OK: Applying all logs succeeded.
10.230.20.158(10.230.20.158:3306): OK: Activated master IP address.
10.230.20.158(10.230.20.158:3306): Resetting slave info succeeded.
Master failover to 10.230.20.158(10.230.20.158:3306) completed successfully.
Thu Jul 14 14:54:19 2020 - [info] Sending mail..
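For a quick summary, the old and new master can be extracted from the Failover Report line of the log. A minimal sketch: failover_summary is a hypothetical helper, and it assumes the report line has the format shown in the log above.

```shell
#!/bin/bash
# Print "old_master new_master" from the Failover Report line of the manager
# log given as $1. Prints nothing if no successful failover is recorded.
failover_summary() {
    sed -n 's/^.*MySQL Master failover \([^(]*\)([^)]*) to \([^(]*\)([^)]*) succeeded.*$/\1 \2/p' "$1"
}
```

It would be run as failover_summary /var/log/masterha/app1/manager.log on the manager node.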
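Summary
This completes the MHA installation and the failover test. The failover process can be summarised in the following steps:
Configuration check phase, which checks the configuration of the whole cluster;
Handling of the dead master, which includes removing the virtual IP;
Copying the relay log differences between the dead master and the most up-to-date slave, and saving them in a directory on MHA Manager;
Identifying the slave that holds the latest updates;
Applying the binary log events saved from the master (this information matters when the failed master is repaired and rejoined to the cluster);
Promoting one slave to be the new master;
Pointing the other slaves at the new master for replication.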
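After the switch completes, note the following changes:
The VIP moves automatically from the old master to the new master, and the monitoring process on the manager node exits.
An app1.failover.complete file is created in the manager's working directory (/var/log/masterha/app1).
The old master's section is removed from /etc/masterha/app1.cnf (this applies when masterha_manager is started with --remove_dead_master_conf).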
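The marker file mentioned above is easy to check from a script. A minimal sketch: failover_marker is a hypothetical helper, and the directory and application name are taken from the configuration used in this article. Some MHA guides recommend removing this marker, or starting the manager with --ignore_last_failover, before relying on automatic failover again; confirm that against the MHA documentation for the version in use.

```shell
#!/bin/bash
# $1: manager_workdir, $2: application name.
# Report whether <workdir>/<app>.failover.complete exists.
failover_marker() {
    marker="$1/$2.failover.complete"
    if [ -e "$marker" ]; then
        echo "found $marker"
    else
        echo "no failover marker in $1"
        return 1
    fi
}
```

On the manager node this would be called as failover_marker /var/log/masterha/app1 app1.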
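Copyright belongs to the author. Do not repost without permission. If this article violates any rules, you can contact the administrator to have it removed.
When reposting, cite this article's address: https://www.ucloud.cn/yun/129765.html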