OpenNebula HA Setup¶
This guide walks you through the process of setting up a highly available (HA) cluster for the OpenNebula core services: the core daemon (oned) and the scheduler (mm_sched).
OpenNebula uses a distributed consensus protocol to provide fault-tolerance and state consistency across OpenNebula services. In this section, you learn the basics of how to bootstrap and operate an OpenNebula distributed cluster.
If you are interested in fail-over protection against hardware and operating system outages within your virtualized IT environment, check the Virtual Machines High Availability Guide.
This section covers some internals of how OpenNebula implements Raft. You do not need to know these details to operate OpenNebula in HA effectively. They are provided for those who wish to fine-tune their deployments.
A consensus algorithm is built around two concepts:
- System State, the OpenNebula data stored in the database tables (users, ACLs, or the VMs in the system).
- Log, a sequence of SQL statements that are consistently applied to the OpenNebula DB in all servers to evolve the system state.
To preserve a consistent view of the system across servers, modifications to system state are performed through a special node, the leader. The servers in the OpenNebula cluster elect a single node to be the leader. The leader periodically sends heartbeats to the other servers, the followers, to keep its leadership. If the leader fails to send heartbeats, followers promote themselves to candidates and start a new election.
Whenever the system is modified (e.g. a new VM is added to the system), the leader updates the log and replicates the entry in a majority of followers before actually writing it to the database. The latency of DB operations is thus increased, but the system state is safely replicated, and the cluster can continue its operation in case of node failure.
In OpenNebula, read-only operations can be performed through any oned server in the cluster; this means that reads can be arbitrarily stale but generally within the round-trip time of the network.
Requirements and Architecture¶
The recommended deployment size is either 3 or 5 servers, which provides fault tolerance for 1 or 2 server failures, respectively. You can add, replace or remove servers once the cluster is up and running.
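The fault-tolerance figures above follow from majority arithmetic: a log entry needs a majority of servers, so the cluster keeps working as long as a majority is reachable. A minimal sketch of that calculation (the function names `quorum` and `tolerated_failures` are illustrative, not part of OpenNebula):

```shell
#!/bin/sh
# Majority (quorum) needed in a cluster of N servers.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Number of servers that can fail while a majority is still reachable.
tolerated_failures() {
    echo $(( ($1 - 1) / 2 ))
}

# Compare the recommended sizes (3 and 5) with an even-sized cluster (4).
for n in 3 4 5; do
    echo "servers=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

A 4-server cluster tolerates the same single failure as a 3-server one, which is why an odd number of servers is required below.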
Every HA cluster requires:
- An odd number of servers (3 is recommended).
- (Recommended) identical server capacities.
- The same software configuration of the servers. (The sole difference would be the
- A working database connection of the same type. MySQL is recommended.
- All the servers must share the credentials.
- A floating IP, which will be assigned to the leader.
- A shared filesystem.
The servers should be configured in the following way:
- Sunstone (with or without Apache/Passenger) running on all the nodes.
- Shared datastores must be mounted on all the nodes.
Bootstrapping the HA cluster¶
This section shows examples of all the steps required to deploy the HA Cluster.
To keep the cluster healthy while adding servers, make sure you add only one server at a time.
In the following, each configuration step starts with (initial) Leader or (future) Follower to indicate the server where the step must be performed.
Configuration of the initial leader¶
We start with the first server, to perform the initial system bootstrapping.
- Leader: Start OpenNebula
- Leader: Add the server itself to the zone:
onezone list
C    ID NAME             ENDPOINT
*     0 OpenNebula       http://localhost:2633/RPC2

# We are working on Zone 0
onezone server-add 0 --name server-0 --rpc http://192.168.150.1:2633/RPC2

# It's now available in the zone:
onezone show 0
ZONE 0 INFORMATION
ID                : 0
NAME              : OpenNebula

ZONE SERVERS
ID NAME            ENDPOINT
 0 server-0        http://192.168.150.1:2633/RPC2

HA & FEDERATION SYNC STATUS
ID NAME            STATE      TERM       INDEX      COMMIT     VOTE  FED_INDEX
 0 server-0        solo       0          -1         0          -1    -1

ZONE TEMPLATE
ENDPOINT="http://localhost:2633/RPC2"
Use the floating IP for zone endpoints, and the cluster-private addresses for the zone server endpoints.
- Leader: Stop OpenNebula service and update
FEDERATION = [
    MODE        = "STANDALONE",
    ZONE_ID     = 0,
    SERVER_ID   = 0, # changed from -1 to 0 (as 0 is the server id)
    MASTER_ONED = ""
]
- Leader: [Optional] Enable the Raft hooks in /etc/one/oned.conf. This will add a floating IP to the system.
# Executed when a server transits from follower->leader
RAFT_LEADER_HOOK = [
    COMMAND = "raft/vip.sh",
    ARGUMENTS = "leader eth0 10.3.3.2/24"
]

# Executed when a server transits from leader->follower
RAFT_FOLLOWER_HOOK = [
    COMMAND = "raft/vip.sh",
    ARGUMENTS = "follower eth0 10.3.3.2/24"
]
- Leader: Start OpenNebula.
- Leader: Check the zone. The server is now the leader and has the floating IP:
onezone show 0
ZONE 0 INFORMATION
ID                : 0
NAME              : OpenNebula

ZONE SERVERS
ID NAME            ENDPOINT
 0 server-0        http://192.168.150.1:2633/RPC2

HA & FEDERATION SYNC STATUS
ID NAME            STATE      TERM       INDEX      COMMIT     VOTE  FED_INDEX
 0 server-0        leader     1          3          3          -1    -1

ZONE TEMPLATE
ENDPOINT="http://localhost:2633/RPC2"

ip -o a sh eth0|grep 10.3.3.2/24
2: eth0    inet 10.3.3.2/24 scope global secondary eth0\       valid_lft forever preferred_lft forever
Adding more servers¶
This procedure will discard the OpenNebula database in the server you are adding and substitute it with the database of the initial leader.
Add only one host at a time. Repeat this process for every server you want to add.
- Leader: Create a DB backup in the initial leader and distribute it to the new server, along with the files in
onedb backup -u oneadmin -p oneadmin -d opennebula
MySQL dump stored in /var/lib/one/mysql_localhost_opennebula_2017-6-1_11:52:47.sql
Use 'onedb restore' or restore the DB using the mysql command:
mysql -u user -h server -P port db_name < backup_file

# Copy it to the other servers
scp /var/lib/one/mysql_localhost_opennebula_2017-6-1_11:52:47.sql <ip>:/tmp

# Copy the .one directory (make sure you preserve the owner: oneadmin)
ssh <ip> rm -rf /var/lib/one/.one
scp -r /var/lib/one/.one/ <ip>:/var/lib/one/
- Follower: Stop OpenNebula on the new server if it is running.
- Follower: Restore the database backup on the new server.
onedb restore -f -u oneadmin -p oneadmin -d opennebula /tmp/mysql_localhost_opennebula_2017-6-1_11:52:47.sql
MySQL DB opennebula at localhost restored.
- Leader: Add the new server to OpenNebula (in the initial leader), and note the server id.
onezone server-add 0 --name server-1 --rpc http://192.168.150.2:2633/RPC2
- Leader: Check the zone. The new server is in the error state, since OpenNebula on the new server is still not running. Make a note of the server id, in this case 1.
onezone show 0
ZONE 0 INFORMATION
ID                : 0
NAME              : OpenNebula

ZONE SERVERS
ID NAME            ENDPOINT
 0 server-0        http://192.168.150.1:2633/RPC2
 1 server-1        http://192.168.150.2:2633/RPC2

HA & FEDERATION SYNC STATUS
ID NAME            STATE      TERM       INDEX      COMMIT     VOTE  FED_INDEX
 0 server-0        leader     1          19         19         -1    -1
 1 server-1        error      -          -          -          -

ZONE TEMPLATE
ENDPOINT="http://localhost:2633/RPC2"
- Follower: Edit /etc/one/oned.conf on the new server to set the SERVER_ID for the new server. Make sure to enable the hooks as in the initial leader’s configuration.
- Follower: Start the OpenNebula service.
- Leader: Run onezone show 0 to make sure that the new server is in follower state.
onezone show 0
ZONE 0 INFORMATION
ID                : 0
NAME              : OpenNebula

ZONE SERVERS
ID NAME            ENDPOINT
 0 server-0        http://192.168.150.1:2633/RPC2
 1 server-1        http://192.168.150.2:2633/RPC2

HA & FEDERATION SYNC STATUS
ID NAME            STATE      TERM       INDEX      COMMIT     VOTE  FED_INDEX
 0 server-0        leader     1          21         19         -1    -1
 1 server-1        follower   1          16         16         -1    -1

ZONE TEMPLATE
ENDPOINT="http://localhost:2633/RPC2"
It may happen that the TERM/INDEX/COMMIT does not match (as above). This is not important right now; it will sync automatically when the database is changed.
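To see how far each follower is behind the leader, the INDEX column of the sync status table can be compared. The following is a minimal sketch, assuming the output layout of `onezone show` shown in this guide; the helper name `index_lag` is hypothetical and not an OpenNebula command.

```shell
#!/bin/sh
# index_lag: read `onezone show <id>` output on stdin and print, for every
# follower, its name and how many log entries it is behind the leader's INDEX.
# Returns non-zero if the status table contains no leader.
index_lag() {
    awk '
        BEGIN { n = 0; have_leader = 0 }
        /HA & FEDERATION SYNC STATUS/ { in_status = 1; next }
        /ZONE TEMPLATE/               { in_status = 0 }
        # Data rows of the status table start with a numeric server ID.
        in_status && $1 ~ /^[0-9]+$/ {
            name[n] = $2; state[n] = $3; idx[n] = $5; n++
            if ($3 == "leader") { leader_index = $5; have_leader = 1 }
        }
        END {
            if (!have_leader) exit 1
            for (i = 0; i < n; i++)
                if (state[i] == "follower")
                    print name[i], leader_index - idx[i]
        }
    '
}
```

A typical call would be `onezone show 0 | index_lag`. A lag that does not shrink after further changes to the database is worth investigating in the logs.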
Repeat this section to add new servers. Make sure that you only add servers when the cluster is in a healthy state. That means there is 1 leader and the rest are in follower state. If there is one server in error state, fix it before proceeding.
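The Follower step that edits the SERVER_ID in /etc/one/oned.conf can be scripted when several servers are added one after another. The sketch below is an assumption about how one might do it, not an OpenNebula tool: `set_server_id` is a hypothetical helper that rewrites only the numeric value of the SERVER_ID line and leaves the rest of the file untouched.

```shell
#!/bin/sh
# set_server_id FILE ID
# Replace the numeric SERVER_ID value in FILE (for example -1) with ID.
set_server_id() {
    conf=$1
    id=$2
    tmp=$(mktemp) || return 1
    # Match "SERVER_ID = <number>" at the start of a line, keep the left part.
    if sed "s/^\([[:space:]]*SERVER_ID[[:space:]]*=[[:space:]]*\)-\{0,1\}[0-9][0-9]*/\1$id/" "$conf" > "$tmp"; then
        # Write back through the original file so owner and mode are preserved.
        cat "$tmp" > "$conf"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}

# Example (run on the new server, with OpenNebula stopped):
#   set_server_id /etc/one/oned.conf 1
```

Review the resulting FEDERATION section by hand before starting the service, since the server id must match the one reported by onezone show.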
Checking Cluster Health¶
Run onezone show <id> to see if any of the servers are in error state. If they are, check /var/log/one/oned.log both on the current leader (if any) and on the host that is in error state. All Raft messages are logged in that file.
If there is no leader in the cluster, please review /var/log/one/oned.log to make sure that there are no errors taking place.
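The health rule used throughout this guide (exactly one leader, no server in error state) can be checked automatically. This is a minimal sketch under the assumption that the status table looks like the examples in this guide; `zone_health` is a hypothetical helper name.

```shell
#!/bin/sh
# zone_health: read `onezone show <id>` output on stdin, print a one-line
# summary, and succeed only if there is exactly one leader and no error state.
zone_health() {
    awk '
        BEGIN { leaders = 0; followers = 0; errors = 0 }
        /HA & FEDERATION SYNC STATUS/ { in_status = 1; next }
        /ZONE TEMPLATE/               { in_status = 0 }
        # Data rows of the status table start with a numeric server ID.
        in_status && $1 ~ /^[0-9]+$/ {
            if ($3 == "leader")        leaders++
            else if ($3 == "follower") followers++
            else if ($3 == "error")    errors++
        }
        END {
            printf "leaders=%d followers=%d errors=%d\n", leaders, followers, errors
            exit !(leaders == 1 && errors == 0)
        }
    '
}
```

For example, `onezone show 0 | zone_health || echo "check /var/log/one/oned.log"`. Note that a server in solo state is not counted as a leader, so a zone that has not been switched to HA yet is reported as unhealthy.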
Adding and Removing Servers¶
In order to add servers you need to use this command:
onezone server-add
Command server-add requires one parameter to run

## USAGE
server-add <zoneid>
    Add an OpenNebula server to this zone.
    valid options: server_name, server_rpc

## OPTIONS
-n, --name                 Zone server name
-r, --rpc                  Zone server RPC endpoint
-v, --verbose              Verbose mode
-h, --help                 Show this message
-V, --version              Show version and copyright information
--user name                User name used to connect to OpenNebula
--password password        Password to authenticate with OpenNebula
--endpoint endpoint        URL of OpenNebula xmlrpc frontend
Make sure that there is one leader (by running onezone show <id>); otherwise the command will not work.
To remove a server, use the command:
onezone server-del
Command server-del requires 2 parameters to run.

## USAGE
server-del <zoneid> <serverid>
    Delete an OpenNebula server from this zone.

## OPTIONS
-v, --verbose              Verbose mode
-h, --help                 Show this message
-V, --version              Show version and copyright information
--user name                User name used to connect to OpenNebula
--password password        Password to authenticate with OpenNebula
--endpoint endpoint        URL of OpenNebula xmlrpc frontend
The whole procedure is documented above.
When a follower is down for some time it may fall out of the recovery window, i.e. the log may not include all the records needed to bring it up-to-date. In order to recover this server you need to:
- Leader: Create a DB backup and copy it to the failed follower. Note that you cannot reuse a previous backup.
- Follower: Stop OpenNebula if it is running.
- Follower: Restore the DB backup from the leader.
- Follower: Start OpenNebula.
- Leader: Reset the failing follower with:
onezone server-reset <zone_id> <server_id_of_failed_follower>
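The recovery steps above can be written down as an ordered checklist before touching a production cluster. The sketch below only prints the commands and executes nothing; `recovery_plan` is a hypothetical helper, and the database credentials and the /tmp destination are the example values used earlier in this guide, so adjust them to your setup.

```shell
#!/bin/sh
# recovery_plan ZONE_ID SERVER_ID FOLLOWER_ADDR BACKUP_FILE
# Print the recovery steps in order (dry run: no command is executed).
# BACKUP_FILE is the path reported by `onedb backup` on the leader.
recovery_plan() {
    zone_id=$1
    server_id=$2
    follower=$3
    backup=$4
    cat <<EOF
# Leader: create a fresh DB backup and copy it to the failed follower
onedb backup -u oneadmin -p oneadmin -d opennebula
scp $backup $follower:/tmp
# Follower: stop OpenNebula if it is running, then restore the backup
onedb restore -f -u oneadmin -p oneadmin -d opennebula /tmp/$(basename "$backup")
# Follower: start OpenNebula
# Leader: reset the failed follower
onezone server-reset $zone_id $server_id
EOF
}
```

Calling `recovery_plan 0 1 192.168.150.2 /var/lib/one/backup.sql` (an illustrative file name) prints the sequence for server 1 in zone 0. The stop and start steps are left as comments because the service command depends on the distribution.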
There are several types of Sunstone deployment in an HA environment. The basic one is Sunstone running on each OpenNebula front-end node, configured to use the local OpenNebula. Only one server, the leader with the floating IP, is used by the clients.
It is possible to configure a load balancer (e.g. HAProxy, Pound, Apache or Nginx) over the front-ends to spread the load (read operations) among the nodes. In this case, Memcached and a shared /var/tmp/ may be required; please see Configuring Sunstone for Large Deployments.
To easily scale out beyond the total number of core OpenNebula daemons, Sunstone can run on separate machines. They should talk to the cluster floating IP (see sunstone-server.conf) and may also require Memcached and a shared /var/tmp/ between Sunstone and the front-end nodes. Please check Configuring Sunstone for Large Deployments.
Raft Configuration Attributes¶
The Raft algorithm can be tuned by several parameters in the configuration file /etc/one/oned.conf. The following options are available:
|Raft: Algorithm Attributes|
||Number of DB log records that will be deleted on each purge.|
||Number of DB log records kept; it determines the synchronization window across servers and the extra storage space needed.|
||How often applied records are purged according to the log retention value (in seconds).|
||Timeout to start an election process if no heartbeat or log is received from the leader.|
||How often heartbeats are sent to followers.|
||Timeout for Raft-related API calls. To set an infinite timeout, set this value to 0.|
Any change in these parameters can lead to unexpected behavior during the fail-over and result in whole-cluster malfunction. After any configuration change, always check the crash scenarios for the correct behavior.
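One relationship worth checking before changing the timing attributes: the election timeout has to be longer than the heartbeat interval, otherwise followers would start elections while the leader is still healthy. The sketch below is a plain sanity check on two numbers in milliseconds; `check_raft_timing` and its argument order are hypothetical, and the values in the example are illustrative rather than recommended settings.

```shell
#!/bin/sh
# check_raft_timing ELECTION_MS HEARTBEAT_MS
# Fail if followers would time out before the next heartbeat can arrive.
check_raft_timing() {
    election_ms=$1
    heartbeat_ms=$2
    if [ "$election_ms" -le "$heartbeat_ms" ]; then
        echo "unsafe: election timeout (${election_ms} ms) <= heartbeat interval (${heartbeat_ms} ms)"
        return 1
    fi
    echo "ok: a follower waits about $(( election_ms / heartbeat_ms )) heartbeat intervals before starting an election"
}

# Example with illustrative values:
#   check_raft_timing 2500 500
```

This check does not replace testing the crash scenarios; it only catches an obviously inconsistent pair of values.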
Compatibility with the earlier HA¶
In OpenNebula <= 5.2, HA was configured using a classical active-passive approach, using Pacemaker and Corosync. While this still works for OpenNebula > 5.2, it is not the recommended way to set up a cluster. However, it is fine if you want to continue using that HA method when coming from earlier versions.
This is documented here: Front-end HA Setup.