OpenNebula HA Setup

This guide walks you through the process of setting up a highly available cluster for the OpenNebula core services: core (oned) and scheduler (mm_sched).

OpenNebula uses a distributed consensus protocol to provide fault-tolerance and state consistency across OpenNebula services. In this section, you learn the basics of how to bootstrap and operate an OpenNebula distributed cluster.

Warning

If you are interested in fail-over protection against hardware and operating system outages within your virtualized IT environment, check the Virtual Machines High Availability Guide.

Raft Overview

This section covers some internals of how OpenNebula implements Raft. You do not need to know these details to operate OpenNebula in HA effectively; they are provided for those who wish to fine-tune their deployments.

A consensus algorithm is built around two concepts:

  • System State, in OpenNebula the system state is the data stored in the database tables (users, ACLs, or the VMs in the system).
  • Log, a sequence of SQL statements that are consistently applied to the OpenNebula DB in all servers to evolve the system state.

To preserve a consistent view of the system across servers, modifications to the system state are performed through a special node, the leader. The servers in the OpenNebula cluster elect a single node to be the leader. The leader periodically sends heartbeats to the other servers, the followers, to keep its leadership. If a leader fails to send the heartbeat, the followers promote themselves to candidates and start a new election.

Whenever the system is modified (e.g. a new VM is added to the system), the leader updates the log and replicates the entry in a majority of followers before actually writing it to the database. The latency of DB operations is thus increased, but the system state is safely replicated, and the cluster can continue its operation in case of node failure.

In OpenNebula, read-only operations can be performed through any oned server in the cluster; this means that reads can be arbitrarily stale, although in practice they are generally within the round-trip time of the network.
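
For example, you can point any CLI read operation at a specific server with the --endpoint option (the address below is the follower endpoint used later in this guide):

onevm list --endpoint http://192.168.150.2:2633/RPC2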

Requirements and Architecture

The recommended deployment size is either 3 or 5 servers, which provides fault tolerance against 1 or 2 server failures, respectively. You can add, replace or remove servers once the cluster is up and running.

Every HA cluster requires:

  • An odd number of servers (3 is recommended).
  • Servers with identical capacity are recommended.
  • The same software configuration on all servers (the sole difference will be the SERVER_ID field in /etc/one/oned.conf).
  • A working database connection of the same type on all servers; MySQL is recommended.
  • All the servers must share the credentials.
  • A floating IP, which will be assigned to the leader.
  • A shared filesystem.

The servers should be configured in the following way:

  • Sunstone (with or without Apache/Passenger) running on all the nodes.
  • Shared datastores must be mounted on all the nodes.

Bootstrapping the HA cluster

This section shows, through examples, all the steps required to deploy the HA cluster.

Warning

To maintain a healthy cluster during the procedure of adding servers, make sure you add only one server at a time.

Important

In the following, each configuration step starts with (initial) Leader or (future) Follower to indicate the server where the step must be performed.

Configuration of the initial leader

We start with the first server, to perform the initial system bootstrapping.

  • Leader: Start OpenNebula
  • Leader: Add the server itself to the zone:
onezone list
C    ID NAME                      ENDPOINT
*     0 OpenNebula                http://localhost:2633/RPC2

# We are working on Zone 0
onezone server-add 0 --name server-0 --rpc http://192.168.150.1:2633/RPC2

# It's now available in the zone:
onezone show 0
ZONE 0 INFORMATION
ID                : 0
NAME              : OpenNebula


ZONE SERVERS
ID NAME            ENDPOINT
 0 server-0        http://192.168.150.1:2633/RPC2

HA & FEDERATION SYNC STATUS
ID NAME            STATE      TERM       INDEX      COMMIT     VOTE  FED_INDEX
 0 server-0        solo       0          -1         0          -1    -1

ZONE TEMPLATE
ENDPOINT="http://localhost:2633/RPC2"

Important

The floating IP should be used for the zone endpoint, and the cluster private addresses for the zone server endpoints.

  • Leader: Stop OpenNebula service and update SERVER_ID in /etc/one/oned.conf
FEDERATION = [
    MODE          = "STANDALONE",
    ZONE_ID       = 0,
    SERVER_ID     = 0, # changed from -1 to 0 (as 0 is the server id)
    MASTER_ONED   = ""
]
  • Leader: [Optional] Enable the RAFT Hooks in /etc/one/oned.conf. This will add a floating IP to the system.
# Executed when a server transitions from follower->leader
RAFT_LEADER_HOOK = [
     COMMAND = "raft/vip.sh",
     ARGUMENTS = "leader eth0 10.3.3.2/24"
]

# Executed when a server transitions from leader->follower
RAFT_FOLLOWER_HOOK = [
    COMMAND = "raft/vip.sh",
    ARGUMENTS = "follower eth0 10.3.3.2/24"
]
  • Leader: Start OpenNebula.
  • Leader: Check the zone; the server is now the leader and has the floating IP:
onezone show 0
ZONE 0 INFORMATION
ID                : 0
NAME              : OpenNebula


ZONE SERVERS
ID NAME            ENDPOINT
 0 server-0        http://192.168.150.1:2633/RPC2

HA & FEDERATION SYNC STATUS
ID NAME            STATE      TERM       INDEX      COMMIT     VOTE  FED_INDEX
 0 server-0        leader     1          3          3          -1    -1

ZONE TEMPLATE
ENDPOINT="http://localhost:2633/RPC2"
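
# The floating IP is now attached to the interface (set by the RAFT_LEADER_HOOK):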
ip -o a sh eth0|grep 10.3.3.2/24
2: eth0    inet 10.3.3.2/24 scope global secondary eth0\       valid_lft forever preferred_lft forever

Adding more servers

Warning

This procedure will discard the OpenNebula database in the server you are adding and substitute it with the database of the initial leader.

Warning

Add only one server at a time. Repeat this process for every server you want to add.

  • Leader: Create a DB backup in the initial leader and distribute it to the new server, along with the files in /var/lib/one/.one/:
onedb backup -u oneadmin -p oneadmin -d opennebula
MySQL dump stored in /var/lib/one/mysql_localhost_opennebula_2017-6-1_11:52:47.sql
Use 'onedb restore' or restore the DB using the mysql command:
mysql -u user -h server -P port db_name < backup_file

# Copy it to the other servers
scp /var/lib/one/mysql_localhost_opennebula_2017-6-1_11:52:47.sql <ip>:/tmp

# Copy the .one directory (make sure you preserve the owner: oneadmin)
ssh <ip> rm -rf /var/lib/one/.one
scp -r /var/lib/one/.one/ <ip>:/var/lib/one/
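# If the owner was not preserved, restore it on the new server; e.g. (adjust
# the group if your installation differs):
ssh <ip> chown -R oneadmin:oneadmin /var/lib/one/.one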
  • Follower: Stop OpenNebula on the new server if it is running.
  • Follower: Restore the database backup on the new server.
onedb restore -f -u oneadmin -p oneadmin -d opennebula /tmp/mysql_localhost_opennebula_2017-6-1_11:52:47.sql
MySQL DB opennebula at localhost restored.
  • Leader: Add the new server to OpenNebula (in the initial leader), and note the server id.
onezone server-add 0 --name server-1 --rpc http://192.168.150.2:2633/RPC2
  • Leader: Check the zone; the new server is in error state, since OpenNebula on the new server is not running yet. Make a note of the server ID; in this case it is 1.
onezone show 0
ZONE 0 INFORMATION
ID                : 0
NAME              : OpenNebula


ZONE SERVERS
ID NAME            ENDPOINT
 0 server-0        http://192.168.150.1:2633/RPC2
 1 server-1        http://192.168.150.2:2633/RPC2

HA & FEDERATION SYNC STATUS
ID NAME            STATE      TERM       INDEX      COMMIT     VOTE  FED_INDEX
 0 server-0        leader     1          19         19         -1    -1
 1 server-1        error      -          -          -          -

ZONE TEMPLATE
ENDPOINT="http://localhost:2633/RPC2"
  • Follower: Edit /etc/one/oned.conf on the new server to set the SERVER_ID for the new server. Make sure to enable the hooks as in the initial leader’s configuration.
  • Follower: Start OpenNebula service.
  • Leader: Run onezone show 0 to make sure that the new server is in follower state.
onezone show 0
ZONE 0 INFORMATION
ID                : 0
NAME              : OpenNebula


ZONE SERVERS
ID NAME            ENDPOINT
 0 server-0        http://192.168.150.1:2633/RPC2
 1 server-1        http://192.168.150.2:2633/RPC2

HA & FEDERATION SYNC STATUS
ID NAME            STATE      TERM       INDEX      COMMIT     VOTE  FED_INDEX
 0 server-0        leader     1          21         19         -1    -1
 1 server-1        follower   1          16         16         -1    -1

ZONE TEMPLATE
ENDPOINT="http://localhost:2633/RPC2"

Note

It may happen that the TERM/INDEX/COMMIT values do not match (as above). This is not important right now; they will sync automatically when the database is changed.

Repeat this section to add new servers. Make sure that you only add servers when the cluster is in a healthy state, that is, there is 1 leader and the rest of the servers are in follower state. If any server is in error state, fix it before proceeding.

Checking Cluster Health

Execute onezone show <id> to check whether any of the servers are in error state. If they are, check /var/log/one/oned.log both on the current leader (if any) and on the server in error state. All Raft messages are logged in that file.

If there is no leader in the cluster please review /var/log/one/oned.log to make sure that there are no errors taking place.
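
For example, a quick way to filter the Raft-related messages on any server (a plain grep; adjust the pattern if needed):

grep -i raft /var/log/one/oned.log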

Adding and Removing Servers

In order to add servers, you need to use this command:

onezone server-add
Command server-add requires one parameter to run
## USAGE
server-add <zoneid>
        Add an OpenNebula server to this zone.
        valid options: server_name, server_rpc

## OPTIONS
     -n, --name                Zone server name
     -r, --rpc                 Zone server RPC endpoint
     -v, --verbose             Verbose mode
     -h, --help                Show this message
     -V, --version             Show version and copyright information
     --user name               User name used to connect to OpenNebula
     --password password       Password to authenticate with OpenNebula
     --endpoint endpoint       URL of OpenNebula xmlrpc frontend

Make sure that there is one leader (by running onezone show <id>); otherwise the command will not work.

To remove a server, use the command:

onezone server-del
Command server-del requires 2 parameters to run.
## USAGE
server-del <zoneid> <serverid>
        Delete an OpenNebula server from this zone.

## OPTIONS
     -v, --verbose             Verbose mode
     -h, --help                Show this message
     -V, --version             Show version and copyright information
     --user name               User name used to connect to OpenNebula
     --password password       Password to authenticate with OpenNebula
     --endpoint endpoint       URL of OpenNebula xmlrpc frontend
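
For example, to delete the server with ID 1 from zone 0 (the IDs shown by onezone show):

onezone server-del 0 1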

The whole procedure is documented above.

Recovering servers

When a follower is down for some time, it may fall out of the recovery window, i.e. the log may no longer include all the records needed to bring it up to date. In order to recover this server you need to:

  • Leader: Create a DB backup and copy it to the failed follower. Note that you cannot reuse a previous backup.
  • Follower: Stop OpenNebula if running.
  • Follower: Restore the DB backup from the leader.
  • Follower: Start OpenNebula.
  • Leader: Reset the failing follower with:
onezone server-reset <zone_id> <server_id_of_failed_follower>
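
For example, if the failed follower was registered with server ID 1 in zone 0:

onezone server-reset 0 1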

Shared data between HA nodes

An HA deployment requires the filesystem view of most datastores (by default in /var/lib/one/datastores/) to be the same on all front-ends. It is necessary to set up a shared filesystem over the datastore directories. This document does not cover the configuration and deployment of the shared filesystem; it is left completely up to the cloud administrator.
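
As an illustration only (the NFS server and export path below are hypothetical), the datastore directories could be shared by mounting an NFS export on every front-end, e.g. via /etc/fstab:

# Hypothetical NFS export mounted over the datastores directory
nfs.example.org:/export/one_datastores  /var/lib/one/datastores  nfs  defaults,_netdev  0  0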

OpenNebula stores virtual machine logs inside /var/log/one/ as files named ${VMID}.log. It is not recommended to share the whole log directory between the front-ends, as other OpenNebula logs would be randomly overwritten. It is up to the cloud administrator to periodically back up the virtual machine logs on the cluster leader and, on fail-over, to restore them from the backup on the new leader (e.g. as part of the Raft hook).
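
A minimal sketch of such a backup, assuming a hypothetical backup directory /srv/one-vm-logs/ (only the numeric ${VMID}.log files are copied; the other OpenNebula logs are skipped):

rsync -a --include='[0-9]*.log' --exclude='*' /var/log/one/ /srv/one-vm-logs/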

Sunstone

There are several ways to deploy Sunstone in an HA environment. The basic one is Sunstone running on each OpenNebula front-end node, configured against the local OpenNebula. Only one server, the leader with the floating IP, is used by the clients.

It is possible to configure a load balancer (e.g. HAProxy, Pound, Apache or Nginx) over the front-ends to spread the load (read operations) among the nodes. In this case, Memcached and a shared /var/tmp/ may be required; please see Configuring Sunstone for Large Deployments.
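
As an illustration only (the addresses come from the examples above; 9869 is assumed to be the Sunstone port, check your sunstone-server.conf), an HAProxy configuration spreading HTTP requests over two front-ends could look like this:

frontend sunstone_http
    bind *:80
    mode http
    default_backend sunstone_servers

backend sunstone_servers
    mode http
    balance roundrobin
    server server-0 192.168.150.1:9869 check
    server server-1 192.168.150.2:9869 check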

To easily scale out beyond the total number of core OpenNebula daemons, Sunstone can be run on separate machines. These should talk to the cluster floating IP (see :one_xmlrpc: in sunstone-server.conf) and may also require Memcached and a shared /var/tmp/ between the Sunstone and front-end nodes. Please check Configuring Sunstone for Large Deployments.
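
For example, with the floating IP configured earlier in this guide, a standalone Sunstone host would point at the cluster in /etc/one/sunstone-server.conf:

:one_xmlrpc: http://10.3.3.2:2633/RPC2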

Raft Configuration Attributes

The Raft algorithm can be tuned with several parameters in the configuration file /etc/one/oned.conf. The following options are available:

Raft: Algorithm Attributes

  • LIMIT_PURGE: Number of DB log records that will be deleted on each purge.
  • LOG_RETENTION: Number of DB log records kept. It determines the synchronization window across servers and the extra storage space needed.
  • LOG_PURGE_TIMEOUT: How often applied records are purged, according to the log retention value (in seconds).
  • ELECTION_TIMEOUT_MS: Timeout to start an election process if no heartbeat or log is received from the leader.
  • BROADCAST_TIMEOUT_MS: How often heartbeats are sent to followers.
  • XMLRPC_TIMEOUT_MS: Timeout for Raft-related API calls. Set this value to 0 for an infinite timeout.
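
As an illustration, these attributes are grouped in the RAFT section of /etc/one/oned.conf; the values below are only an example, check the defaults shipped with your version:

RAFT = [
    LIMIT_PURGE          = 100000,
    LOG_RETENTION        = 500000,
    LOG_PURGE_TIMEOUT    = 600,
    ELECTION_TIMEOUT_MS  = 2500,
    BROADCAST_TIMEOUT_MS = 500,
    XMLRPC_TIMEOUT_MS    = 450
]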

Warning

Any change in these parameters can lead to unexpected behavior during fail-over and result in a malfunction of the whole cluster. After any configuration change, always check the crash scenarios for correct behavior.

Compatibility with the earlier HA

In OpenNebula <= 5.2, HA was configured using a classical active-passive approach with Pacemaker and Corosync. While this still works for OpenNebula > 5.2, it is not the recommended way to set up a cluster. However, it is fine to keep using it if you are coming from an earlier version.

This is documented here: Front-end HA Setup.