The Ceph datastore driver allows OpenNebula users to use Ceph block devices as their Virtual Images.
This driver requires the OpenNebula nodes using it to be Ceph clients of a running Ceph cluster. More information is available in the Ceph documentation.
Images are stored in a Ceph pool as RBD volumes named after their OpenNebula ID, one-<IMAGE ID>. Virtual machine disks are stored by default in the same pool (Ceph Mode). You can also choose to export the Image RBD volume to the hypervisor's local storage using the SSH Mode.
You only need to register each image once; it can then be deployed using either mode, Ceph or SSH.
Ceph Mode (Default)¶
In this mode, virtual machines use the same Image RBD volumes for their disks (persistent images), or a new snapshot of the image is created in the form one-<IMAGE ID>-<VM ID>-<DISK ID> (non-persistent images).
For example, consider a system using an Image and System Datastore backed by a Ceph pool named one. The pool with one Image (ID 0) and two Virtual Machines (14 and 15) using this Image as virtual disk 0 would look similar to:
rbd ls -l -p one --id libvirt
NAME       SIZE   PARENT         FMT PROT LOCK
one-0      10240M                2
one-0@snap 10240M                2   yes
one-0-14-0 10240M one/one-0@snap 2
one-0-15-0 10240M one/one-0@snap 2
In this case the context disk and auxiliary files (deployment description and checkpoints) are stored locally on the nodes.
SSH Mode¶
In this mode, the associated RBD volume for each disk is exported to a file stored in the local file system of the hypervisor.
Continuing the previous example, if VM 14 is set to be deployed in an SSH System Datastore (e.g. 100), the layout of the datastore on the hypervisor would be similar to:
ls -l /var/lib/one/datastores/100/14
total 609228
-rw-rw-r-- 1 oneadmin oneadmin        1020 Dec 20 14:41 deployment.0
-rw-r--r-- 1 oneadmin oneadmin 10737418240 Dec 20 15:19 disk.0
-rw-rw-r-- 1 oneadmin oneadmin      372736 Dec 20 14:41 disk.1
In this case disk.0 is generated with a command similar to:
rbd export one/one-0@snap disk.0
Ceph Cluster Setup¶
This guide assumes that you already have a functional Ceph cluster in place. Additionally you need to:
- Create a pool for the OpenNebula datastores. Write down the name of the pool to include it in the datastore definitions.
ceph osd pool create one 128
ceph osd lspools
0 data,1 metadata,2 rbd,6 one,
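On Ceph Luminous and later, a pool must also be tagged with the application that will use it before RBD images can be stored in it. A minimal sketch, assuming the one pool created above:

# Tag the pool for RBD use (required on Luminous and later)
ceph osd pool application enable one rbd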
- Define a Ceph user to access the datastore pool; this user will also be used by libvirt to access the disk images. For example, create a user libvirt:
On Ceph Jewel (v10.2.x) and earlier:
ceph auth get-or-create client.libvirt \
    mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=one'
On Ceph Luminous (v12.2.x) and later:
ceph auth get-or-create client.libvirt \
    mon 'profile rbd' osd 'profile rbd pool=one'
The Ceph Luminous release comes with simplified RBD capabilities (more information about user management and authorization capabilities can be found in the Ceph documentation). When upgrading an existing Ceph deployment to Luminous or later, please ensure the selected user has the proper new capabilities, for example for the above user libvirt by running:
ceph auth caps client.libvirt \
    mon 'profile rbd' osd 'profile rbd pool=one'
- Get a copy of the key of this user to distribute it later to the OpenNebula nodes.
ceph auth get-key client.libvirt | tee client.libvirt.key
ceph auth get client.libvirt -o ceph.client.libvirt.keyring
- Although RBD format 1 is supported, it is strongly recommended to use format 2. Check that the following is set in ceph.conf:
[global]
rbd_default_format = 2
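To verify that newly created images actually get format 2, one option is to create a throwaway image and inspect it. A minimal sketch, assuming the one pool and the libvirt user from the previous steps:

# Create a small test image, check its format, then clean up
rbd create one/format-test --size 128 --id libvirt
rbd info one/format-test --id libvirt | grep format
rbd rm one/format-test --id libvirt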
- Pick a set of client nodes of the cluster to act as storage bridges. These nodes will be used to import images into the Ceph Cluster from OpenNebula. These nodes must have the qemu-img command installed.
For production environments it is recommended not to co-locate Ceph services (monitors, OSDs) with OpenNebula nodes or the Front-end.
Frontend and Node Setup¶
In order to use the Ceph cluster the nodes need to be configured as follows:
- The Ceph client tools must be available on the machine.
- The mon daemon must be defined in ceph.conf for all the nodes, so the port doesn't need to be specified explicitly in any Ceph command. For example:
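A minimal sketch of such a ceph.conf on the nodes; the fsid and monitor addresses below are hypothetical placeholders for your cluster's actual values:

[global]
# Hypothetical cluster ID and monitor addresses; use your cluster's values
fsid = b5a1f8c2-1111-2222-3333-444455556666
mon_host = 10.0.0.1,10.0.0.2,10.0.0.3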
- Copy the Ceph user keyring (ceph.client.libvirt.keyring) to the nodes under /etc/ceph, and the user key (client.libvirt.key) to the oneadmin home.
scp ceph.client.libvirt.keyring root@node:/etc/ceph
scp client.libvirt.key oneadmin@node:
The nodes need extra steps to set up credentials in libvirt:
- Generate a secret for the Ceph user and copy it to the nodes under the oneadmin home. Write down the UUID for later use.
UUID=`uuidgen`; echo $UUID
c7bdeabf-5f2a-4094-9413-58c6a9590980
cat > secret.xml <<EOF
<secret ephemeral='no' private='no'>
  <uuid>$UUID</uuid>
  <usage type='ceph'>
    <name>client.libvirt secret</name>
  </usage>
</secret>
EOF
scp secret.xml oneadmin@node:
- Define the libvirt secret and remove the key files on the nodes:
virsh -c qemu:///system secret-define secret.xml
virsh -c qemu:///system secret-set-value --secret $UUID --base64 $(cat client.libvirt.key)
rm client.libvirt.key
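To double-check that the secret was stored correctly, you can list the defined secrets and read the value back. A quick sketch:

# List libvirt secrets and read back the stored value
virsh -c qemu:///system secret-list
virsh -c qemu:///system secret-get-value $UUID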
- The oneadmin account needs to access the Ceph Cluster using the libvirt Ceph user defined above. This requires access to the Ceph user keyring. Test that the Ceph client is properly configured on the node:
ssh oneadmin@node rbd ls -p one --id libvirt
You can read more information about this in the Ceph guide Using libvirt with Ceph.
- Ancillary virtual machine files like context disks, deployment and checkpoint files are created on the nodes under /var/lib/one/datastores/. Make sure that enough storage for these files is provisioned on the nodes.
- If you are going to use the SSH mode, you also have to take into account the space needed for the System Datastore under /var/lib/one/datastores/<ds_id>, where ds_id is the ID of the System Datastore.
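A quick way to check how much space is currently available for these files on a node (using the datastore path mentioned above):

# Check free space where the datastore files will be created
ssh oneadmin@node df -h /var/lib/one/datastores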
To use your Ceph cluster with OpenNebula you need to define System and Image Datastores. Each Image/System Datastore pair will share the following Ceph configuration attributes:
| Attribute | Description | Mandatory |
| --- | --- | --- |
| NAME | The name of the datastore | YES |
| POOL_NAME | The Ceph pool name | YES |
| CEPH_USER | The Ceph user name, used by libvirt and rbd commands | YES |
| CEPH_KEY | Key file for user; if not set, default locations are used | NO |
| CEPH_CONF | Non-default Ceph configuration file, if needed | NO |
| RBD_FORMAT | By default RBD format 2 will be used | NO |
| BRIDGE_LIST | List of storage bridges to access the Ceph cluster | YES |
| CEPH_HOST | Space-separated list of Ceph monitors. Example: host1 host2:port2 | YES |
| CEPH_SECRET | The UUID of the libvirt secret | YES |
| EC_POOL_NAME | Name of the Ceph erasure coded pool | NO |
| CEPH_TRASH | Enables the trash feature on the given datastore (Luminous+), values: yes or no | NO |
You may add additional Image and System Datastores pointing to other pools with different allocation/replication policies in Ceph.
The Ceph Luminous release allows the use of erasure coding for RBD images. In general, erasure coded images take up less space but have worse I/O performance. Erasure coding can be enabled on Image and/or System Datastores by setting EC_POOL_NAME to the name of the erasure coded data pool. The regular replicated Ceph pool POOL_NAME is still required for image metadata. More information is available in the Ceph documentation.
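A minimal sketch of creating an erasure coded data pool suitable for RBD; the pool name one_ec and the PG counts are hypothetical, and overwrites must be enabled for RBD to work on EC pools:

# Create the erasure coded data pool and allow RBD overwrites (Luminous+)
ceph osd pool create one_ec 128 128 erasure
ceph osd pool set one_ec allow_ec_overwrites true
ceph osd pool application enable one_ec rbd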
Create a System Datastore¶
The System Datastore also requires these attributes:

| Attribute | Description | Mandatory |
| --- | --- | --- |
| TYPE | SYSTEM_DS | YES |
| TM_MAD | ceph (or ssh for the SSH mode) | YES |
Create a System Datastore in Sunstone or through the CLI, for example:
cat systemds.txt

NAME        = ceph_system
TM_MAD      = ceph
TYPE        = SYSTEM_DS
POOL_NAME   = one
CEPH_HOST   = "host1 host2:port2"
CEPH_USER   = libvirt
CEPH_SECRET = "6f88b54b-5dae-41fe-a43e-b2763f601cfc"
BRIDGE_LIST = cephfrontend

onedatastore create systemds.txt
ID: 101
When different System Datastores are available, the TM_MAD_SYSTEM attribute will be set after picking the datastore.
Create an Image Datastore¶
Apart from the previous attributes, which need to be the same as in the associated System Datastore, the following can be set for an Image Datastore:
| Attribute | Description | Mandatory |
| --- | --- | --- |
| NAME | The name of the datastore | YES |
| STAGING_DIR | Default path for image operations in the bridges | NO |
An example of a datastore definition:
> cat ds.conf
NAME        = "cephds"
DS_MAD      = ceph
TM_MAD      = ceph
DISK_TYPE   = RBD
POOL_NAME   = one
CEPH_HOST   = "host1 host2:port2"
CEPH_USER   = libvirt
CEPH_SECRET = "6f88b54b-5dae-41fe-a43e-b2763f601cfc"
BRIDGE_LIST = cephfrontend

> onedatastore create ds.conf
ID: 101
If you are going to use the TM_MAD_SYSTEM attribute with the SSH mode, you need to have an SSH type System Datastore configured.
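For reference, a minimal sketch of such an SSH System Datastore definition; the name ssh_system is a hypothetical example:

cat ssh_system.txt

NAME   = ssh_system
TM_MAD = ssh
TYPE   = SYSTEM_DS

onedatastore create ssh_system.txt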
See quotas for more information about quotas over Ceph backend storage.
Default values for the Ceph drivers can be set in the drivers' configuration file:
- POOL_NAME: Default Ceph pool name
- STAGING_DIR: Default path for image operations in the storage bridges
- RBD_FORMAT: Default format for RBD volumes
- DD_BLOCK_SIZE: Block size for dd operations (default: 64kB)
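For illustration only, the defaults could look like the following. This is a hypothetical sketch; the exact file location and syntax depend on your OpenNebula version, so check the configuration file shipped with your installation:

# Hypothetical driver defaults; verify against your installation
POOL_NAME=one
STAGING_DIR=/var/tmp
RBD_FORMAT=2
DD_BLOCK_SIZE=64k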
Using different modes¶
When creating a VM Template you can choose to deploy the disks using the default Ceph mode or the SSH one. Note that the same mode will be used for all disks of the VM. To set the deployment mode, add the following attribute to the VM template:

TM_MAD_SYSTEM = "ssh"
When using Sunstone, the deployment mode needs to be set in the Storage tab.