MHA Installation and Failover Explained in Detail

Published by IT那活儿 on 2023-01-11 13:20 / 1819 reads

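Overview

MHA (Master High Availability) is a mature solution for MySQL high availability. It was developed by Yoshinori Matsunobu at DeNA in Japan (he later joined Facebook), and it automates master failover and slave promotion in a MySQL replication environment.

MHA consists of two parts: MHA Manager (the management node) and MHA Node (the data node). MHA Manager can be deployed on a dedicated machine to manage several master-slave clusters, or on one of the slave nodes. MHA Node runs on every MySQL server. MHA Manager probes the master at regular intervals; when the master fails, it automatically promotes the slave with the most recent data to be the new master and repoints all other slaves to it. The failover is designed to be transparent to applications: with the VIP used in this article, clients keep connecting to the same address.

The rest of this article walks through the MHA installation and the failover process.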

Environment

manager: 10.230.20.156

master(mysql): 10.230.20.157

slave(mysql): 10.230.20.158

vip: 10.230.20.159

Architecture diagram (figure omitted)

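Configure SSH trust

Set up SSH trust for the same user, root, on the hosts named manager, node1-master, and node2-slave.

The steps for setting up SSH trust are:

First, generate a key pair on each machine that will take part;
Next, collect all the public keys into one combined authorized_keys file;
Then distribute that file, which now contains the public key of every machine, to each machine;
Finally, verify the trust.

1. Create an RSA private/public key pair on each host

Log in as root: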

cd /root/.ssh

ssh-keygen -t rsa

2. Merge the public key files

Run the following commands on manager (they append every host's public key to manager's authorized_keys file):

ssh manager cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
ssh node1-master cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
ssh node2-slave cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
chmod 600 /root/.ssh/authorized_keys

3. Distribute the merged authorized_keys file

Run the following commands on manager:

scp /root/.ssh/authorized_keys node1-master:/root/.ssh/
scp /root/.ssh/authorized_keys node2-slave:/root/.ssh/

4. Test SSH trust

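Run the following commands on every node. If each one prints the current system date without asking for a password, SSH trust is configured correctly.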

ssh manager date

ssh node1-master date

ssh node2-slave date
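
The manual check above can also be scripted. The following is a minimal sketch, not part of the original setup: it loops over the three host names used in this article and relies on ssh's BatchMode=yes option, so a missing key makes the command fail instead of prompting for a password. The function name check_ssh_trust and the SSH_BIN variable are illustrative assumptions.

```shell
#!/bin/bash
# Sketch: verify passwordless SSH to every host without ever prompting.
# SSH_BIN is a hypothetical override hook so the loop can be dry-run tested.
SSH_BIN="${SSH_BIN:-ssh}"

check_ssh_trust() {
    local failed=0 host
    for host in "$@"; do
        # BatchMode=yes makes ssh fail instead of asking for a password.
        if "$SSH_BIN" -o BatchMode=yes -o ConnectTimeout=5 "$host" date >/dev/null 2>&1; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return "$failed"
}

# Usage (run on each of the three nodes):
#   check_ssh_trust manager node1-master node2-slave
```

The function returns non-zero if any host fails, so it can be used as a gate before the MHA checks later in the article.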

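Configure master-slave replication
(omitted)

VIP mount script

The first time, the VIP has to be brought up manually on the master. The VIP script: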

#!/bin/bash
#
#vip is for MHA.
#
. /etc/rc.d/init.d/functions

VIP=10.230.20.159
SIP=$2

host=`/bin/hostname`

case "$1" in
start)
       # Start vip with the Communication port on this machine.
        /sbin/ifconfig $SIP:0 $VIP broadcast $VIP netmask 255.255.255.0 up
        /sbin/route add -host $VIP dev $SIP:0

;;
stop)

        # Stop vip with the Communication port on this machine.
        /sbin/ifconfig $SIP:0 down

;;
status)

        # Status of vip with the Communication port on this machine.
        islothere=`/sbin/ifconfig $SIP:0 | grep $VIP`
        isrothere=`netstat -rn | grep "$SIP:0" | grep $VIP`
        if [ ! "$islothere" -o ! "$isrothere" ];then
            # Either the route or the $SIP:0 device
            # not found.
            echo "vip is Stopped."
        else
            echo "vip is Running."
        fi
;;
*)
            # Invalid entry.
            echo "$0: Usage: $0 {start|status|stop}"
            exit 1
;;
esac

# Make the script executable (on the master host)
[root@node1-master ~]# chmod +x vip.sh
[root@node1-master ~]# ./vip.sh start eth0
[root@node1-master ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:50:56:BB:7A:26
          inet addr:10.230.20.157  Bcast:10.230.20.255  Mask:255.255.255.0
          inet6 addr: fe80::250:56ff:febb:7a26/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:9922122 errors:0 dropped:0 overruns:0 frame:0
          TX packets:782588 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2027380843 (1.8 GiB) TX bytes:60565453 (57.7 MiB)

eth0:0    Link encap:Ethernet  HWaddr 00:50:56:BB:7A:26
          inet addr:10.230.20.159  Bcast:10.230.20.159  Mask:255.255.255.255
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:1584 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1584 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:149120 (145.6 KiB) TX bytes:149120 (145.6 KiB)
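
On newer distributions ifconfig and route (net-tools) may not be installed. As a hedged sketch, the same VIP handling can be expressed with the ip command from iproute2. The function name vip_commands and the /24 prefix are assumptions (the prefix matches the 255.255.255.0 netmask in vip.sh). The function only prints the commands so that they can be reviewed before being run as root.

```shell
#!/bin/bash
# Sketch: iproute2 equivalents of the ifconfig calls in vip.sh.
VIP=10.230.20.159
PREFIX=24   # assumption: /24, matching the 255.255.255.0 netmask in vip.sh

vip_commands() {
    local action=$1 dev=$2
    case "$action" in
        start)
            echo "ip addr add $VIP/$PREFIX dev $dev label $dev:0"
            # Gratuitous ARP so neighbours refresh their cache (iputils arping).
            echo "arping -U -c 3 -I $dev $VIP"
            ;;
        stop)
            echo "ip addr del $VIP/$PREFIX dev $dev"
            ;;
        *)
            echo "usage: vip_commands {start|stop} <device>" >&2
            return 1
            ;;
    esac
}

# Print the commands for review; pipe to "sh -x" as root to actually run them:
#   vip_commands start eth0 | sh -x
```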

Install MHA

MHA is installed from RPM packages. The manager node needs both the manager package and the node package; the master and slave nodes need only the node package. On the manager node, install the node package first, because the manager package depends on it.

manager node:

#yum install perl-DBI perl-DBD-mysql perl-Time-HiRes perl-Parallel-ForkManager perl-Log-Dispatch perl-Config-Tiny rrdtool perl-rrdtool rrdtool-devel perl-Params-Validate

#rpm -ivh "mha4mysql-node-0.56-0.el6.noarch(1).rpm"

#rpm -ivh "mha4mysql-manager-0.56-0.el6.noarch(1).rpm"

master and slave nodes:

#yum install perl-DBI perl-DBD-mysql perl-Time-HiRes perl-Parallel-ForkManager perl-Log-Dispatch perl-Config-Tiny rrdtool perl-rrdtool rrdtool-devel perl-Params-Validate

#rpm -ivh "mha4mysql-node-0.56-0.el6.noarch(1).rpm"

MHA main configuration file:

[root@manager ~]# vi /etc/masterha/app1.cnf
[server default]
manager_workdir=/var/log/masterha/app1
manager_log=/var/log/masterha/app1/manager.log
user=root
password=111111

ssh_user=root #user for SSH trust

repl_user=repl #replication account and password
repl_password=111111

ping_interval=1            #health check interval in seconds
master_ip_failover_script="/etc/masterha/master_ip_failover"     #VIP switchover script
#shutdown_script=""
#master_ip_online_change_script=""
report_script="/etc/masterha/send_report"


[server1]
hostname=10.230.20.157                            #first MySQL server
port=3306
candidate_master=1
master_binlog_dir="/opt/mysql/log/binlog"          #binlog location


[server2]
hostname=10.230.20.158                   #second MySQL server
port=3306
candidate_master=1
master_binlog_dir="/opt/mysql/log/binlog"        #binlog location
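
Before running the MHA checks it can help to confirm which servers the configuration file actually declares. The following sketch lists every [serverN] block with its hostname and port; the function name list_mha_servers is an assumption for illustration, and the parser handles only the simple key=value layout shown above.

```shell
#!/bin/bash
# Sketch: print "section hostname port" for every [serverN] block of an MHA config.
list_mha_servers() {
    awk -F= '
        # Strip a trailing "#comment" and trailing blanks from a value.
        function clean(v) { sub(/[ \t]*#.*/, "", v); sub(/[ \t]+$/, "", v); return v }
        /^\[server[0-9]+\]/ { section = substr($0, 2, index($0, "]") - 2); order[++n] = section; next }
        /^\[/               { section = ""; next }   # e.g. [server default]
        section != "" && $1 == "hostname" { host[section] = clean($2) }
        section != "" && $1 == "port"     { port[section] = clean($2) }
        END {
            for (i = 1; i <= n; i++) {
                s = order[i]
                # MHA assumes port 3306 when none is given.
                print s, host[s], (s in port ? port[s] : 3306)
            }
        }
    ' "$1"
}

# Usage:
#   list_mha_servers /etc/masterha/app1.cnf
```

Since app1.cnf stores the MySQL and replication passwords in plain text, restricting it with chmod 600 is a sensible precaution.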

master_ip_failover switchover script:

[root@manager masterha]# vi /etc/masterha/master_ip_failover
[root@manager masterha]# chmod +x /etc/masterha/master_ip_failover

#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';

use Getopt::Long;

my (
    $command, $ssh_user, $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);

my $vip = '10.230.20.159/24'; # Virtual IP
my $key = "0";
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";
$ssh_user = "root";

# MHA 0.56 also passes --new_master_user and --new_master_password on "start".
# They are not declared here, so Getopt::Long prints "Unknown option" warnings
# (visible in the failover log) and carries on.
GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);

exit &main();

sub main {

    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

    if ( $command eq "stop" || $command eq "stopssh" ) {

        # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
        # If you manage master ip address at global catalog database,
        # invalidate orig_master_ip here.
        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {

        # all arguments are passed.
        # If you manage master ip address at global catalog database,
        # activate new_master_ip here.
        # You can also grant write access (create user, set read_only=0, etc) here.
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";

        # Leftover from the sample script: "cluster1" is not a host in this
        # setup, so this call fails harmlessly (see "Could not resolve
        # hostname cluster1" in the logs) and can be removed.
        `ssh $ssh_user\@cluster1 " $ssh_start_vip "`;
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

# A simple system call that enable the VIP on the new master
sub start_vip() {
    `ssh $ssh_user\@$new_master_host " $ssh_start_vip "`;
}
# A simple system call that disable the VIP on the old_master
sub stop_vip() {
    `ssh $ssh_user\@$orig_master_host " $ssh_stop_vip "`;
}

sub usage {
    print
    "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

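Check status

Start MHA:

[root@manager ~]#masterha_manager --conf=/etc/masterha/app1.cnf > /var/log/masterha/app1/manager.log 2>&1 &

Run the following commands on manager to check the connectivity from the manager node to the data nodes, the master-slave replication status, whether the databases are up, and so on: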

[root@manager ~]#masterha_check_ssh --conf=/etc/masterha/app1.cnf
[root@manager ~]#masterha_check_repl --conf=/etc/masterha/app1.cnf
[root@manager ~]#masterha_check_status --conf=/etc/masterha/app1.cnf
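
The first two checks are natural pre-flight steps before starting the manager. The following sketch runs them in order and stops at the first failure. The function name mha_preflight and the CONF variable are assumptions for illustration; masterha_check_status is left out because it reports on an already running manager.

```shell
#!/bin/bash
# Sketch: run the MHA pre-flight checks in order and stop at the first failure.
CONF="${CONF:-/etc/masterha/app1.cnf}"

mha_preflight() {
    local check
    for check in masterha_check_ssh masterha_check_repl; do
        echo "== $check"
        if ! "$check" --conf="$CONF"; then
            echo "$check failed; fix it before starting masterha_manager" >&2
            return 1
        fi
    done
    echo "pre-flight checks passed"
}

# Usage:
#   mha_preflight && echo "safe to start masterha_manager"
```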

[root@manager ~]# masterha_check_ssh --conf=/etc/masterha/app1.cnf 
Thu Jul 14 14:34:37 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Jul 14 14:34:37 2020 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Thu Jul 14 14:34:37 2020 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Thu Jul 14 14:34:37 2020 - [info] Starting SSH connection tests..
Thu Jul 14 14:34:38 2020 - [debug] 
Thu Jul 14 14:34:37 2020 - [debug] Connecting via SSH from root@10.230.20.157(10.230.20.157:22) to root@10.230.20.158(10.230.20.158:22)..
Thu Jul 14 14:34:37 2020 - [debug] ok.
Thu Jul 14 14:34:38 2020 - [debug] 
Thu Jul 14 14:34:38 2020 - [debug] Connecting via SSH from root@10.230.20.158(10.230.20.158:22) to root@10.230.20.157(10.230.20.157:22)..
Thu Jul 14 14:34:38 2020 - [debug] ok.
Thu Jul 14 14:34:38 2020 - [info] All SSH connection tests passed successfully.


[root@manager ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf 
Thu Jul 14 14:39:16 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Jul 14 14:39:16 2020 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Thu Jul 14 14:39:16 2020 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Thu Jul 14 14:39:16 2020 - [info] MHA::MasterMonitor version 0.56.
Thu Jul 14 14:39:17 2020 - [info] GTID failover mode = 1
Thu Jul 14 14:39:17 2020 - [info] Dead Servers:
Thu Jul 14 14:39:17 2020 - [info] Alive Servers:
Thu Jul 14 14:39:17 2020 - [info] 10.230.20.157(10.230.20.157:3306)
Thu Jul 14 14:39:17 2020 - [info] 10.230.20.158(10.230.20.158:3306)
Thu Jul 14 14:39:17 2020 - [info] Alive Slaves:
Thu Jul 14 14:39:17 2020 - [info] 10.230.20.158(10.230.20.158:3306) Version=5.7.12-log (oldest major version between slaves) log-bin:enabled
Thu Jul 14 14:39:17 2020 - [info] GTID ON
Thu Jul 14 14:39:17 2020 - [info] Replicating from 10.230.20.157(10.230.20.157:3306)
Thu Jul 14 14:39:17 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Jul 14 14:39:17 2020 - [info] Current Alive Master: 10.230.20.157(10.230.20.157:3306)
Thu Jul 14 14:39:17 2020 - [info] Checking slave configurations..
Thu Jul 14 14:39:17 2020 - [info] read_only=1 is not set on slave 10.230.20.158(10.230.20.158:3306).
Thu Jul 14 14:39:17 2020 - [info] Checking replication filtering settings..
Thu Jul 14 14:39:17 2020 - [info] binlog_do_db= , binlog_ignore_db= 
Thu Jul 14 14:39:17 2020 - [info] Replication filtering check ok.
Thu Jul 14 14:39:17 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Thu Jul 14 14:39:17 2020 - [info] Checking SSH publickey authentication settings on the current master..
Thu Jul 14 14:39:17 2020 - [info] HealthCheck: SSH to 10.230.20.157 is reachable.
Thu Jul 14 14:39:17 2020 - [info] 
10.230.20.157(10.230.20.157:3306) (current master)
 +--10.230.20.158(10.230.20.158:3306)

Thu Jul 14 14:39:17 2020 - [info] Checking replication health on 10.230.20.158..
Thu Jul 14 14:39:17 2020 - [info] ok.
Thu Jul 14 14:39:17 2020 - [info] Checking master_ip_failover_script status:
Thu Jul 14 14:39:17 2020 - [info] /etc/masterha/master_ip_failover --command=status --ssh_user=root --orig_master_host=10.230.20.157 --orig_master_ip=10.230.20.157 --orig_master_port=3306 


IN SCRIPT TEST====/sbin/ifconfig eth0:0 down==/sbin/ifconfig eth0:0 10.230.20.159/24===

Checking the Status of the script.. OK 
ssh: Could not resolve hostname cluster1: Name or service not known
Thu Jul 14 14:39:18 2020 - [info] OK.
Thu Jul 14 14:39:18 2020 - [warning] shutdown_script is not defined.
Thu Jul 14 14:39:18 2020 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.


[root@manager ~]# masterha_check_status --conf=/etc/masterha/app1.cnf 
app1 (pid:7773) is running(0:PING_OK), master:10.230.20.157
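
For monitoring scripts it can be handy to pull the current master out of this status line. The sketch below assumes the exact output format shown above; current_master is a hypothetical helper name, and any other output (for example when the manager is not running) yields an empty result.

```shell
#!/bin/bash
# Sketch: extract the master address from masterha_check_status output.
current_master() {
    # Expects a line like:
    #   app1 (pid:7773) is running(0:PING_OK), master:10.230.20.157
    sed -n 's/.*is running(0:PING_OK), master:\([^ ]*\).*/\1/p'
}

# Usage:
#   masterha_check_status --conf=/etc/masterha/app1.cnf | current_master
```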

#The details can be reviewed in manager.log

Thu Jul 14 14:39:16 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Jul 14 14:39:16 2020 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Thu Jul 14 14:39:16 2020 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Thu Jul 14 14:39:16 2020 - [info] MHA::MasterMonitor version 0.56.
Thu Jul 14 14:39:17 2020 - [info] GTID failover mode = 1
Thu Jul 14 14:39:17 2020 - [info] Dead Servers:
Thu Jul 14 14:39:17 2020 - [info] Alive Servers:
Thu Jul 14 14:39:17 2020 - [info] 10.230.20.157(10.230.20.157:3306)
Thu Jul 14 14:39:17 2020 - [info] 10.230.20.158(10.230.20.158:3306)
Thu Jul 14 14:39:17 2020 - [info] Alive Slaves:
Thu Jul 14 14:39:17 2020 - [info] 10.230.20.158(10.230.20.158:3306) Version=5.7.12-log (oldest major version between slaves) log-bin:enabled
Thu Jul 14 14:39:17 2020 - [info] GTID ON
Thu Jul 14 14:39:17 2020 - [info] Replicating from 10.230.20.157(10.230.20.157:3306)
Thu Jul 14 14:39:17 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Jul 14 14:39:17 2020 - [info] Current Alive Master: 10.230.20.157(10.230.20.157:3306)
Thu Jul 14 14:39:17 2020 - [info] Checking slave configurations..
Thu Jul 14 14:39:17 2020 - [info] read_only=1 is not set on slave 10.230.20.158(10.230.20.158:3306).
Thu Jul 14 14:39:17 2020 - [info] Checking replication filtering settings..
Thu Jul 14 14:39:17 2020 - [info] binlog_do_db= , binlog_ignore_db= 
Thu Jul 14 14:39:17 2020 - [info] Replication filtering check ok.
Thu Jul 14 14:39:17 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Thu Jul 14 14:39:17 2020 - [info] Checking SSH publickey authentication settings on the current master..
Thu Jul 14 14:39:17 2020 - [info] HealthCheck: SSH to 10.230.20.157 is reachable.
Thu Jul 14 14:39:17 2020 - [info] 
10.230.20.157(10.230.20.157:3306) (current master)
 +--10.230.20.158(10.230.20.158:3306)

Thu Jul 14 14:39:17 2020 - [info] Checking replication health on 10.230.20.158..
Thu Jul 14 14:39:17 2020 - [info] ok.
Thu Jul 14 14:39:17 2020 - [info] Checking master_ip_failover_script status:
Thu Jul 14 14:39:17 2020 - [info] /etc/masterha/master_ip_failover --command=status --ssh_user=root --orig_master_host=10.230.20.157 --orig_master_ip=10.230.20.157 --orig_master_port=3306 


IN SCRIPT TEST====/sbin/ifconfig eth0:0 down==/sbin/ifconfig eth0:0 10.230.20.159/24===

Checking the Status of the script.. OK 
ssh: Could not resolve hostname cluster1: Name or service not known
Thu Jul 14 14:39:18 2020 - [info] OK.
Thu Jul 14 14:39:18 2020 - [warning] shutdown_script is not defined.
Thu Jul 14 14:39:18 2020 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

Thu Jul 14 14:41:04 2020 - [info] MHA::MasterMonitor version 0.56.
Thu Jul 14 14:41:04 2020 - [info] GTID failover mode = 1
Thu Jul 14 14:41:04 2020 - [info] Dead Servers:
Thu Jul 14 14:41:04 2020 - [info] Alive Servers:
Thu Jul 14 14:41:04 2020 - [info] Alive Slaves:
Thu Jul 14 14:41:04 2020 - [info] 10.230.20.158(10.230.20.158:3306) Version=5.7.12-log (oldest major version between slaves) log-bin:enabled
Thu Jul 14 14:41:04 2020 - [info] GTID ON
Thu Jul 14 14:41:04 2020 - [info] Replicating from 10.230.20.157(10.230.20.157:3306)
Thu Jul 14 14:41:04 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Jul 14 14:41:04 2020 - [info] Current Alive Master: 10.230.20.157(10.230.20.157:3306)
Thu Jul 14 14:41:04 2020 - [info] Checking slave configurations..
Thu Jul 14 14:41:04 2020 - [info] read_only=1 is not set on slave 10.230.20.158(10.230.20.158:3306).
Thu Jul 14 14:41:04 2020 - [info] Checking replication filtering settings..
Thu Jul 14 14:41:04 2020 - [info] binlog_do_db= , binlog_ignore_db=
Thu Jul 14 14:41:04 2020 - [info] Replication filtering check ok.
Thu Jul 14 14:41:04 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Thu Jul 14 14:41:04 2020 - [info] Checking SSH publickey authentication settings on the current master..
Thu Jul 14 14:41:04 2020 - [info] HealthCheck: SSH to 10.230.20.157 is reachable.
Thu Jul 14 14:41:04 2020 - [info]
10.230.20.157(10.230.20.157:3306) (current master)
 +--10.230.20.158(10.230.20.158:3306)

Thu Jul 14 14:41:04 2020 - [info] Checking master_ip_failover_script status:
Thu Jul 14 14:41:04 2020 - [info] /etc/masterha/master_ip_failover --command=status --ssh_user=root --orig_master_host=10.230.20.157 --orig_master_ip=10.230.20.157 --orig_master_port=3306


IN SCRIPT TEST====/sbin/ifconfig eth0:0 down==/sbin/ifconfig eth0:0 10.230.20.159/24===

Checking the Status of the script.. OK
ssh: Could not resolve hostname cluster1: Name or service not known
Thu Jul 14 14:41:04 2020 - [info] OK.
Thu Jul 14 14:41:04 2020 - [warning] shutdown_script is not defined.
Thu Jul 14 14:41:04 2020 - [info] Set master ping interval 1 seconds.
Thu Jul 14 14:41:04 2020 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Thu Jul 14 14:41:04 2020 - [info] Starting ping health check on 10.230.20.157(10.230.20.157:3306)..
Thu Jul 14 14:41:04 2020 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..

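Verify MHA

Expected behaviour: the VIP starts out on the master. When the manager detects that the master is down, it pulls the logs that the slave is still lagging behind on from the master onto the slave and applies them to completion (the sql_thread finishes). It then calls the master_ip_failover script, which binds the VIP to the slave so that service continues.

Stop MySQL on the master machine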
[root@node1-master ~]# service mysqld stop
Shutting down MySQL............ SUCCESS!

[root@node2-slave ~]# ifconfig
eth0 Link encap:Ethernet HWaddr 00:50:56:BB:7A:27
          inet addr:10.230.20.158  Bcast:10.230.20.255  Mask:255.255.255.0
          inet6 addr: fe80::250:56ff:febb:7a27/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500  Metric:1
          RX packets:9413285 errors:0 dropped:0 overruns:0 frame:0
          TX packets:574709 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1368635304 (1.2 GiB) TX bytes:46609394 (44.4 MiB)

eth0:0    Link encap:Ethernet HWaddr 00:50:56:BB:7A:27
          inet addr:10.230.20.159  Bcast:10.230.20.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST MTU:1500  Metric:1

lo Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:16436  Metric:1
          RX packets:1627 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1627 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:155778 (152.1 KiB) TX bytes:155778 (152.1 KiB)

[root@client ~]# mysql -uroot -p111111 -h 10.230.20.159 -P 3306
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 112
Server version: 5.7.12-log MySQL Community Server (GPL)

Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

[root@10.230.20.159][(none)]> show slave status\G
Empty set (0.00 sec)

Failover log

Log as seen on the management node (after the failover, the MHA manager process exits automatically)

IN SCRIPT TEST====/sbin/ifconfig eth0:0 down==/sbin/ifconfig eth0:0 10.230.20.159/24===

Disabling the VIP on old master: 10.230.20.157
Thu Jul 14 14:54:19 2020 - [info] done.
Thu Jul 14 14:54:19 2020 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Thu Jul 14 14:54:19 2020 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Thu Jul 14 14:54:19 2020 - [info]
Thu Jul 14 14:54:19 2020 - [info] * Phase 3: Master Recovery Phase..
Thu Jul 14 14:54:19 2020 - [info]
Thu Jul 14 14:54:19 2020 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Thu Jul 14 14:54:19 2020 - [info]
Thu Jul 14 14:54:19 2020 - [info] The latest binary log file/position on all slaves is bin.000013:234
Thu Jul 14 14:54:19 2020 - [info] Latest slaves (Slaves that received relay log files to the latest):
Thu Jul 14 14:54:19 2020 - [info] 10.230.20.158(10.230.20.158:3306) Version=5.7.12-log (oldest major version between slaves) log-bin:enabled
Thu Jul 14 14:54:19 2020 - [info] GTID ON
Thu Jul 14 14:54:19 2020 - [info] Replicating from 10.230.20.157(10.230.20.157:3306)
Thu Jul 14 14:54:19 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Jul 14 14:54:19 2020 - [info] The oldest binary log file/position on all slaves is bin.000013:234
Thu Jul 14 14:54:19 2020 - [info] Oldest slaves:
Thu Jul 14 14:54:19 2020 - [info] 10.230.20.158(10.230.20.158:3306) Version=5.7.12-log (oldest major version between slaves) log-bin:enabled
Thu Jul 14 14:54:19 2020 - [info] GTID ON
Thu Jul 14 14:54:19 2020 - [info] Replicating from 10.230.20.157(10.230.20.157:3306)
Thu Jul 14 14:54:19 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Jul 14 14:54:19 2020 - [info]
Thu Jul 14 14:54:19 2020 - [info] * Phase 3.3: Determining New Master Phase..
Thu Jul 14 14:54:19 2020 - [info]
Thu Jul 14 14:54:19 2020 - [info] Searching new master from slaves..
….
Thu Jul 14 14:54:19 2020 - [info] done.
Thu Jul 14 14:54:19 2020 - [info] Getting new master's binlog name and position..
Thu Jul 14 14:54:19 2020 - [info] bin.000006:194
Thu Jul 14 14:54:19 2020 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: bin.000006, 194, 548dab0b-3691-11e6-b14e-005056bb7a26:137-154:157,
74645350-368b-11e6-912d-005056bb7a27:1-142
Thu Jul 14 14:54:19 2020 - [info] Executing master IP activate script:
Unknown option: new_master_user
Unknown option: new_master_password


IN SCRIPT TEST====/sbin/ifconfig eth0:0 down==/sbin/ifconfig eth0:0 10.230.20.159/24===

Enabling the VIP - 10.230.20.159/24 on the new master - 10.230.20.158
Thu Jul 14 14:54:19 2020 - [info] OK.
Thu Jul 14 14:54:19 2020 - [info] ** Finished master recovery successfully.
Thu Jul 14 14:54:19 2020 - [info] * Phase 3: Master Recovery Phase completed.
Thu Jul 14 14:54:19 2020 - [info]
Thu Jul 14 14:54:19 2020 - [info] * Phase 4: Slaves Recovery Phase..
Thu Jul 14 14:54:19 2020 - [info]
Thu Jul 14 14:54:19 2020 - [info]
Thu Jul 14 14:54:19 2020 - [info] * Phase 4.1: Starting Slaves in parallel..
Thu Jul 14 14:54:19 2020 - [info]
Thu Jul 14 14:54:19 2020 - [info] All new slave servers recovered successfully.
Thu Jul 14 14:54:19 2020 - [info]
Thu Jul 14 14:54:19 2020 - [info] * Phase 5: New master cleanup phase..
Thu Jul 14 14:54:19 2020 - [info]
Thu Jul 14 14:54:19 2020 - [info] Resetting slave info on the new master..
Thu Jul 14 14:54:19 2020 - [info] 10.230.20.158: Resetting slave info succeeded.
Thu Jul 14 14:54:19 2020 - [info] Master failover to 10.230.20.158(10.230.20.158:3306) completed successfully.
Thu Jul 14 14:54:19 2020 - [info]

----- Failover Report -----

app1: MySQL Master failover 10.230.20.157(10.230.20.157:3306) to 10.230.20.158(10.230.20.158:3306) succeeded

Master 10.230.20.157(10.230.20.157:3306) is down!

Check MHA Manager logs at manager:/var/log/masterha/app1/manager.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 10.230.20.157(10.230.20.157:3306)
Selected 10.230.20.158(10.230.20.158:3306) as a new master.
10.230.20.158(10.230.20.158:3306): OK: Applying all logs succeeded.
10.230.20.158(10.230.20.158:3306): OK: Activated master IP address.
10.230.20.158(10.230.20.158:3306): Resetting slave info succeeded.
Master failover to 10.230.20.158(10.230.20.158:3306) completed successfully.
Thu Jul 14 14:54:19 2020 - [info] Sending mail..

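Summary

This completes the MHA installation and failover test. The MHA failover process can be summarized as follows:

Configuration check phase: the configuration of the whole cluster is checked;
Handling of the dead master: this phase includes removing the virtual IP;
Copy the relay log differences between the dead master and the most up-to-date slave, and save them in the MHA Manager's working directory;
Identify the slave that holds the latest updates;
Apply the binary log events (binlog events) saved from the master (this information matters when the failed master is later repaired and rejoined to the cluster);
Promote one slave to be the new master;
Point the other slaves at the new master for replication.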
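
Once the failed master has been repaired, it can be rejoined as a slave of the new master. Because the logs in this article show GTID ON, a sketch along the following lines may be enough. It assumes the repl account from app1.cnf, that the old master's data has not diverged from the new master, and a hypothetical helper name rejoin_sql. It only prints the SQL so that it can be reviewed before being fed to the mysql client on the old master.

```shell
#!/bin/bash
# Sketch: print the statements that repoint a repaired old master at the new master.
rejoin_sql() {
    local new_master=$1 port=${2:-3306} user=${3:-repl}
    # The password is deliberately a placeholder; fill it in at run time.
    cat <<EOF
CHANGE MASTER TO
  MASTER_HOST='$new_master',
  MASTER_PORT=$port,
  MASTER_USER='$user',
  MASTER_PASSWORD='<repl_password>',
  MASTER_AUTO_POSITION=1;
START SLAVE;
EOF
}

# Usage (on the repaired old master, after reviewing the output):
#   rejoin_sql 10.230.20.158
```

After the old master is replicating again, its [serverN] section has to be present in app1.cnf before masterha_manager is restarted.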

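After the failover completes, note the following changes:

The VIP moves automatically from the old master to the new master, and the monitoring process on the manager node exits.
An app1.failover.complete file is created in the manager working directory (/var/log/masterha/app1).
The old master's section is removed from the /etc/masterha/app1.cnf configuration file (MHA does this only when masterha_manager is started with --remove_dead_master_conf).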
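
As far as the MHA documentation describes, the app1.failover.complete marker also matters for the next failover: by default the manager refuses to fail over again if the previous failover finished within the last 8 hours, unless the marker is removed or masterha_manager is started with --ignore_last_failover. A small check such as the following sketch can be run before restarting the manager. The function name and the WORKDIR and APP variables are assumptions for illustration.

```shell
#!/bin/bash
# Sketch: report whether a failover marker is present in the manager workdir.
WORKDIR="${WORKDIR:-/var/log/masterha/app1}"   # manager_workdir from app1.cnf
APP="${APP:-app1}"

failover_marker_status() {
    local marker="$WORKDIR/$APP.failover.complete"
    if [ -e "$marker" ]; then
        echo "present: $marker"
        return 1
    fi
    echo "absent: $marker"
}

# Usage:
#   failover_marker_status || echo "remove the marker or use --ignore_last_failover"
```

Because the manager process exits after a failover, it has to be started again by hand once the cluster is back in the desired shape.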


Source (please cite this address when reposting): https://www.ucloud.cn/yun/129765.html
