In modern internet architectures, high availability is a crucial consideration in system design. This article will detail how to use Keepalived and HAProxy to build a highly available load balancing cluster, ensuring service continuity and reliability.
Technical Overview
Keepalived Introduction
Keepalived is a high availability solution based on VRRP (Virtual Router Redundancy Protocol), primarily used to implement server failover and load balancing.
Key Features:
- VRRP Protocol Support: Enables master/backup failover of a virtual IP address
- Health Checks: Monitors service status and automatically performs failover
- Simple Configuration: Complex high availability architectures can be achieved through configuration files alone
- Lightweight: Low resource consumption and excellent performance
Working Principle: Keepalived utilizes the VRRP protocol to share a virtual IP address across multiple servers. In normal operation, the master server holds the virtual IP and provides services; when the master server fails, the backup server automatically takes over the virtual IP, ensuring service continuity.
HAProxy Overview
HAProxy is a high-performance load balancer and reverse proxy server, widely used in high concurrency scenarios.
Key Features:
- Load Balancing: Supports various load balancing algorithms
- Health Checks: Monitors backend server status in real-time
- SSL Termination: Supports HTTPS traffic handling
- Statistical Monitoring: Provides detailed running state statistics
Application Scenarios:
- Web service load balancing
- Database connection pooling
- Microservice gateways
- API interface proxy
Architecture Design
Overall Architecture
                 ┌─────────────────┐
                 │     Client      │
                 └─────────┬───────┘
                           │
                 ┌─────────▼───────┐
                 │   Virtual IP    │
                 │      (VIP)      │
                 └─────────┬───────┘
                           │
           ┌───────────────┴───────────────┐
           │                               │
 ┌─────────▼───────┐             ┌─────────▼───────┐
 │    HAProxy-1    │             │    HAProxy-2    │
 │    (Master)     │◄───────────►│    (Backup)     │
 │  + Keepalived   │    VRRP     │  + Keepalived   │
 └─────────┬───────┘             └─────────┬───────┘
           │                               │
           └───────────────┬───────────────┘
                           │
          ┌────────────────┼────────────────┐
          │                │                │
  ┌───────▼───────┐ ┌──────▼──────┐ ┌───────▼───────┐
  │ Web Server 1  │ │ Web Server 2│ │ Web Server 3  │
  │    Backend    │ │   Backend   │ │    Backend    │
  └───────────────┘ └─────────────┘ └───────────────┘
Component Description
- Virtual IP (VIP): A unified entry point for clients to access.
- HAProxy Master/Backup Nodes: Provide load balancing and achieve high availability through Keepalived.
- Backend Servers: The actual web servers providing the service.
Environment Setup
Server Planning
Role | IP Address | Hostname | Service |
---|---|---|---|
HAProxy Master Node | 192.168.1.10 | lb-master | HAProxy + Keepalived |
HAProxy Backup Node | 192.168.1.11 | lb-backup | HAProxy + Keepalived |
Virtual IP | 192.168.1.100 | - | VIP |
Web Server 1 (backend) | 192.168.1.20 | web1 | Nginx |
Web Server 2 (backend) | 192.168.1.21 | web2 | Nginx |
Web Server 3 (backend) | 192.168.1.22 | web3 | Nginx |
Software Installation
Install the necessary software on the HAProxy master and backup nodes:
# CentOS/RHEL
yum install -y haproxy keepalived
# Ubuntu/Debian
apt-get update
apt-get install -y haproxy keepalived
# Enable services to start automatically on boot
systemctl enable haproxy keepalived
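After installation, it is worth confirming the installed versions before moving on; both tools support standard version flags:
# Verify the installed versions
haproxy -v
keepalived --version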
Keepalived Configuration
Master Node Configuration (lb-master)
Create the configuration file /etc/keepalived/keepalived.conf:
! Configuration File for keepalived
global_defs {
    router_id LB_MASTER
    script_user root
    enable_script_security
}

# Script to check HAProxy service status
vrrp_script chk_haproxy {
    script "/etc/keepalived/check_haproxy.sh"
    interval 2
    # The weight must exceed the master/backup priority gap (100 - 90),
    # otherwise a failed check never demotes the master below the backup
    weight -20
    fall 3
    rise 2
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass mypassword123
    }
    virtual_ipaddress {
        192.168.1.100/24
    }
    track_script {
        chk_haproxy
    }
    notify_master "/etc/keepalived/notify.sh master"
    notify_backup "/etc/keepalived/notify.sh backup"
    notify_fault "/etc/keepalived/notify.sh fault"
}
Backup Node Configuration (lb-backup)
Create the configuration file /etc/keepalived/keepalived.conf:
! Configuration File for keepalived
global_defs {
    router_id LB_BACKUP
    script_user root
    enable_script_security
}

vrrp_script chk_haproxy {
    script "/etc/keepalived/check_haproxy.sh"
    interval 2
    weight -20
    fall 3
    rise 2
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 90
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass mypassword123
    }
    virtual_ipaddress {
        192.168.1.100/24
    }
    track_script {
        chk_haproxy
    }
    notify_master "/etc/keepalived/notify.sh master"
    notify_backup "/etc/keepalived/notify.sh backup"
    notify_fault "/etc/keepalived/notify.sh fault"
}
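Before starting the services, recent Keepalived releases can check the configuration syntax directly (a quick sanity check; older versions may not support the -t flag):
# Validate keepalived.conf on both nodes (Keepalived 2.x)
keepalived -t -f /etc/keepalived/keepalived.conf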
Health Check Script
Create the HAProxy health check script /etc/keepalived/check_haproxy.sh:
#!/bin/bash
# Check if the haproxy process is running
if [ $(ps -C haproxy --no-header | wc -l) -eq 0 ]; then
    # Attempt to start HAProxy
    systemctl start haproxy
    sleep 2
    # Check again; if it is still not running, report failure
    if [ $(ps -C haproxy --no-header | wc -l) -eq 0 ]; then
        exit 1
    fi
fi

# Check if the HAProxy port is listening
if ! netstat -tuln | grep -q ":80 "; then
    exit 1
fi

exit 0
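Once the script is in place (and made executable, as shown below), it can be exercised by hand to confirm the exit codes behave as Keepalived expects:
# Manually run the health check; exit code 0 means healthy, non-zero triggers the VRRP priority change
bash /etc/keepalived/check_haproxy.sh && echo "check passed" || echo "check failed"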
State Notification Script
Create the state notification script /etc/keepalived/notify.sh:
#!/bin/bash
# The state is passed explicitly as the first argument from keepalived.conf
# (notify_master/notify_backup/notify_fault call this script with master|backup|fault)
STATE=$1

case "$STATE" in
    master)
        echo "$(date): Became MASTER" >> /var/log/keepalived-state.log
        ;;
    backup)
        echo "$(date): Became BACKUP" >> /var/log/keepalived-state.log
        ;;
    fault)
        echo "$(date): Fault detected" >> /var/log/keepalived-state.log
        ;;
    *)
        echo "$(date): Unknown state: $STATE" >> /var/log/keepalived-state.log
        ;;
esac
Set script execution permissions:
chmod +x /etc/keepalived/check_haproxy.sh
chmod +x /etc/keepalived/notify.sh
HAProxy Configuration
Main Configuration
Create the HAProxy configuration file /etc/haproxy/haproxy.cfg with identical content on both the master and backup nodes:
global
    log 127.0.0.1:514 local0
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

defaults
    mode http
    log global
    option httplog
    option dontlognull
    option log-health-checks
    option forwardfor except 127.0.0.0/8
    option redispatch
    retries 3
    timeout http-request 10s
    timeout queue 1m
    timeout connect 10s
    timeout client 1m
    timeout server 1m
    timeout http-keep-alive 10s
    timeout check 10s
    maxconn 3000

# Statistics page configuration
listen stats
    bind *:8080
    stats enable
    stats uri /stats
    stats realm HAProxy\ Statistics
    stats auth admin:password123
    stats refresh 30s

# Frontend configuration
frontend web_frontend
    bind *:80
    default_backend web_servers

# Backend server configuration
backend web_servers
    balance roundrobin
    option httpchk GET /health
    server web1 192.168.1.20:80 check inter 2000 rise 2 fall 3
    server web2 192.168.1.21:80 check inter 2000 rise 2 fall 3
    server web3 192.168.1.22:80 check inter 2000 rise 2 fall 3
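HAProxy can validate the configuration file without starting the service, which is worth doing after any change:
# Syntax-check the configuration before (re)starting HAProxy
haproxy -c -f /etc/haproxy/haproxy.cfg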
Configuration Instructions
Global Configuration:
- log: Logging configuration
- chroot: Runs HAProxy in a chroot jail for security
- stats socket: Management socket for runtime administration
- daemon: Run in the background
Default Configuration:
- mode http: Work in HTTP mode
- timeout: The various timeout settings
Backend Servers:
- balance roundrobin: Round-robin load balancing
- option httpchk GET /health: HTTP health check (the backends must actually expose this path; see the sketch after this list)
- check: Enable health checks on each server line
- inter 2000: Check interval of 2 seconds (2000 ms)
- rise 2: Mark as available after 2 consecutive successful checks
- fall 3: Mark as unavailable after 3 consecutive failed checks
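Because the backend health check uses option httpchk GET /health, each backend must answer on that path. The snippet below is a minimal sketch for Nginx backends (Nginx is assumed here only because the failure test later stops nginx); adapt it to whatever web server you actually run.
# Hypothetical /health location in the Nginx server block on web1/web2/web3
location /health {
    access_log off;
    return 200 "OK\n";
}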
Service Startup and Testing
Start Service
Start the service on the master and backup nodes:
# Start HAProxy
systemctl start haproxy
systemctl status haproxy
# Start Keepalived
systemctl start keepalived
systemctl status keepalived
Verify VIP Binding
Check if the virtual IP is correctly bound:
# View IP address on the master node
ip addr show
# You should see output similar to:
# eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
# inet 192.168.1.10/24 brd 192.168.1.255 scope global eth0
# inet 192.168.1.100/24 scope global secondary eth0
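On the backup node the VIP should be absent while the master is healthy; a quick check:
# On lb-backup: no output is expected while lb-master holds the VIP
ip addr show eth0 | grep 192.168.1.100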
Functional Testing
1. Load Balancing Testing
# Repeatedly access the VIP and observe request distribution
for i in {1..10}; do
curl -s http://192.168.1.100/ | grep "Server"
done
2. Failover Testing
# Stop the HAProxy service on the primary node
systemctl stop haproxy
# Observe if the VIP switches to the backup node
ip addr show
# Test if the service is working normally
curl http://192.168.1.100/
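Note that the check script above tries to restart HAProxy automatically, so simply stopping HAProxy may not move the VIP. Stopping Keepalived on the master is a more reliable way to exercise the switch, and with the default preemption behaviour the VIP returns once the master recovers:
# Force a failover by stopping Keepalived on the master
systemctl stop keepalived
# On the backup node, the VIP should now appear
ip addr show eth0 | grep 192.168.1.100
# Restore the master; with preemption enabled (the default) and priority 100 > 90,
# the VIP moves back within a few advertisement intervals
systemctl start keepalived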
3. Backend Server Failure Testing
# Stop one of the web servers
# On the web1 server:
systemctl stop nginx
# Observe the HAProxy statistics page
curl http://192.168.1.100:8080/stats
Monitoring and Maintenance
Log Monitoring
HAProxy Logs
# View HAProxy logs
tail -f /var/log/haproxy.log
# View access statistics
grep "HTTP/1.1" /var/log/haproxy.log | tail -20
Keepalived Logs
# View Keepalived logs
tail -f /var/log/messages | grep keepalived
# View state change logs
tail -f /var/log/keepalived-state.log
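On systemd-based distributions the same information is available through the journal, which avoids depending on /var/log/messages being populated:
# Follow Keepalived (and HAProxy) logs via the systemd journal
journalctl -u keepalived -f
journalctl -u haproxy -f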
Performance Monitoring
Statistics Page Monitoring
Access the HAProxy statistics page: http://192.168.1.100:8080/stats
Key Metrics:
- Session Rate: New sessions per second
- Session Total: Total number of sessions
- Bytes In/Out: Traffic statistics
- Response Time: Backend response time
- Server Status: Per-server health status
Command Line Monitoring
# Check HAProxy process status
ps aux | grep haproxy
# Check port listening status
netstat -tuln | grep -E "(80|8080)"
# Check connection count
ss -ant | grep :80 | wc -l
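The stats socket configured in the global section can also be queried from the command line, which is handy for scripting; this assumes the socat package is installed:
# Dump per-backend / per-server statistics in CSV form
echo "show stat" | socat stdio /run/haproxy/admin.sock
# Show process-level information
echo "show info" | socat stdio /run/haproxy/admin.sock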
Troubleshooting FAQs
1. VIP Does Not Fail Over
Problem Description: After the master node fails, the VIP does not switch to the backup node.
Troubleshooting Steps:
# Check Keepalived configuration
keepalived -t -f /etc/keepalived/keepalived.conf
# View VRRP communication
tcpdump -i eth0 vrrp
# Check firewall settings
iptables -L | grep vrrp
Solution:
- Ensure that VRRP communication between the two nodes is not blocked (see the firewall example below).
- Check the network interface configuration (the interface name must match the keepalived.conf setting).
- Verify that the authentication password is identical on both nodes.
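If tcpdump shows advertisements leaving the master but the firewall check suggests they are being dropped, allowing the VRRP protocol (IP protocol 112) is a common fix; pick the variant that matches your firewall tooling:
# firewalld
firewall-cmd --permanent --add-rich-rule='rule protocol value="vrrp" accept'
firewall-cmd --reload
# plain iptables (VRRP is IP protocol 112)
iptables -I INPUT -p vrrp -j ACCEPT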
2. Health Check Failed
Problem Description: A backend server is marked as unavailable.
Troubleshooting Steps:
# Manually execute health check
curl -I http://192.168.1.20/health
# View HAProxy logs
grep "Health check" /var/log/haproxy.log
Solution:
- Ensure the health check URL is accessible
- Adjust the check interval and thresholds
- Check the backend server status
3. Uneven Load Distribution
Problem Description: Requests are not evenly distributed across the backend servers.
Troubleshooting Steps:
# View statistics page
curl -s http://192.168.1.100:8080/stats
# Analyze access logs
awk '{print $6}' /var/log/haproxy.log | sort | uniq -c
Solution:
- Check the load balancing algorithm configuration
- Verify the server weight settings
- Consider session persistence requirements (see the sketch below)
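If stickiness is actually what you need, cookie-based persistence is one option; a sketch of the backend section with it enabled (alternatively, balance source pins clients by source IP):
backend web_servers
    balance roundrobin
    cookie SERVERID insert indirect nocache
    option httpchk GET /health
    server web1 192.168.1.20:80 check cookie web1
    server web2 192.168.1.21:80 check cookie web2
    server web3 192.168.1.22:80 check cookie web3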
Optimization Suggestions
1. Performance Optimization
# Adjust system parameters
echo 'net.core.somaxconn = 65535' >> /etc/sysctl.conf
echo 'net.ipv4.tcp_max_syn_backlog = 65535' >> /etc/sysctl.conf
sysctl -p
# Optimize HAProxy configuration
# Increase maxconn value
# Adjust timeout parameters
# Enable compression functionality
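As a concrete illustration of the comments above, the relevant haproxy.cfg fragments might look like this (the values are examples to be sized against your traffic and available memory, not recommendations):
global
    maxconn 20000          # raise the process-wide connection ceiling

defaults
    maxconn 10000          # per-proxy limit, kept below the global value
    compression algo gzip  # enable response compression
    compression type text/html text/plain text/css application/javascript application/json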
2. Security Hardening
# Restrict access to the statistics page
# Add ACL rules in haproxy.cfg
acl allowed_ips src 192.168.1.0/24
http-request deny if !allowed_ips
# Enable SSL/TLS
bind *:443 ssl crt /etc/ssl/certs/server.pem
redirect scheme https if !{ ssl_fc }
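For context, here is one way those directives could be placed in the configuration built earlier; this is a sketch only, assuming /etc/ssl/certs/server.pem is a combined certificate-plus-key file as HAProxy expects:
# Stats page restricted to the internal subnet
listen stats
    bind *:8080
    stats enable
    stats uri /stats
    acl allowed_ips src 192.168.1.0/24
    http-request deny if !allowed_ips

# HTTPS termination with a redirect from plain HTTP
frontend web_frontend
    bind *:80
    bind *:443 ssl crt /etc/ssl/certs/server.pem
    http-request redirect scheme https unless { ssl_fc }
    default_backend web_servers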
3. Monitoring and Alerts
# Integrate with a monitoring system
# Configure Prometheus for monitoring
# Set up Grafana dashboards
# Define alert rules
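One concrete option, assuming HAProxy 2.x built with the bundled Prometheus exporter, is to expose a /metrics endpoint that Prometheus can scrape and Grafana can visualize:
# Add to haproxy.cfg: native Prometheus metrics endpoint (HAProxy 2.x)
frontend prometheus
    bind *:8405
    http-request use-service prometheus-exporter if { path /metrics }
    no log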
Summary
By combining Keepalived and HAProxy, we successfully built a highly available load balancing cluster. This solution offers the following advantages:
- High Availability: Achieved through VRRP protocol for automatic failover.
- Load Balancing: Intelligently distributes requests to improve system performance.
- Health Checks: Real-time monitoring of service status, automatically removing faulty nodes.
- Ease of Maintenance: Simple configuration and convenient management.
- Cost Effectiveness: Utilizing open-source software to reduce operational costs.
When deploying in a production environment, it’s also necessary to consider comprehensive aspects such as network security, monitoring and alerts, and backup/restore procedures to ensure stable and reliable operation of the system.