1 =============================
2 Block Devices and OpenStack
3 =============================
5 .. index:: Ceph Block Device; OpenStack
7 You may use Ceph Block Device images with OpenStack through ``libvirt``, which
8 configures the QEMU interface to ``librbd``. Ceph stripes block device images as
9 objects across the cluster, which means that large Ceph Block Device images have
10 better performance than a standalone server!
12 To use Ceph Block Devices with OpenStack, you must install QEMU, ``libvirt``,
13 and OpenStack first. We recommend using a separate physical node for your
14 OpenStack installation. OpenStack recommends a minimum of 8GB of RAM and a
quad-core processor. The following diagram depicts the OpenStack/Ceph
technology stack.
.. ditaa::  +---------------------------------------------------+
            |                    OpenStack                      |
            +---------------------------------------------------+
            |                     libvirt                       |
            +------------------------+--------------------------+
                                     |
                                     | configures
                                     v
            +---------------------------------------------------+
            |                       QEMU                        |
            +---------------------------------------------------+
            |                      librbd                       |
            +---------------------------------------------------+
            |                     librados                      |
            +------------------------+-+------------------------+
            |          OSDs          | |        Monitors        |
            +------------------------+ +------------------------+
37 .. important:: To use Ceph Block Devices with OpenStack, you must have
38 access to a running Ceph Storage Cluster.
40 Three parts of OpenStack integrate with Ceph's block devices:
42 - **Images**: OpenStack Glance manages images for VMs. Images are immutable.
43 OpenStack treats images as binary blobs and downloads them accordingly.
45 - **Volumes**: Volumes are block devices. OpenStack uses volumes to boot VMs,
  or to attach volumes to running VMs. OpenStack manages volumes using
  Cinder services.
49 - **Guest Disks**: Guest disks are guest operating system disks. By default,
50 when you boot a virtual machine, its disk appears as a file on the filesystem
51 of the hypervisor (usually under ``/var/lib/nova/instances/<uuid>/``). Prior
52 to OpenStack Havana, the only way to boot a VM in Ceph was to use the
53 boot-from-volume functionality of Cinder. However, now it is possible to boot
54 every virtual machine inside Ceph directly without using Cinder, which is
55 advantageous because it allows you to perform maintenance operations easily
56 with the live-migration process. Additionally, if your hypervisor dies it is
57 also convenient to trigger ``nova evacuate`` and run the virtual machine
  elsewhere almost seamlessly (see the sketch after this list).
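
As an illustration, an RBD-backed instance on a dead hypervisor can be
rebuilt on another host with ``nova evacuate``. This is only a sketch; the
instance name and target host below are placeholders::

    # Rebuild the instance on another compute host. With an RBD-backed disk,
    # --on-shared-storage preserves the existing disk contents.
    nova evacuate --on-shared-storage myinstance other-compute-host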
60 You can use OpenStack Glance to store images in a Ceph Block Device, and you
61 can use Cinder to boot a VM using a copy-on-write clone of an image.
63 The instructions below detail the setup for Glance, Cinder and Nova, although
64 they do not have to be used together. You may store images in Ceph block devices
65 while running VMs using a local disk, or vice versa.
67 .. important:: Ceph doesn’t support QCOW2 for hosting a virtual machine disk.
68 Thus if you want to boot virtual machines in Ceph (ephemeral backend or boot
69 from volume), the Glance image format must be ``RAW``.
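
For example, before uploading an image you can check its on-disk format with
``qemu-img`` (the filename is a placeholder)::

    qemu-img info myimage.img    # look for "file format: raw" in the output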
71 .. tip:: This document describes using Ceph Block Devices with OpenStack Havana.
72 For earlier versions of OpenStack see
73 `Block Devices and OpenStack (Dumpling)`_.
75 .. index:: pools; OpenStack
Create a Pool
=============

By default, Ceph block devices use the ``rbd`` pool. You may use any available
81 pool. We recommend creating a pool for Cinder and a pool for Glance. Ensure
82 your Ceph cluster is running, then create the pools. ::
84 ceph osd pool create volumes 128
85 ceph osd pool create images 128
86 ceph osd pool create backups 128
87 ceph osd pool create vms 128
89 See `Create a Pool`_ for detail on specifying the number of placement groups for
90 your pools, and `Placement Groups`_ for details on the number of placement
91 groups you should set for your pools.
Newly created pools must be initialized prior to use. Use the ``rbd`` tool
to initialize the pools::

    rbd pool init volumes
    rbd pool init images
    rbd pool init backups
    rbd pool init vms
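
You can confirm that all four pools exist with::

    ceph osd lspools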
101 .. _Create a Pool: ../../rados/operations/pools#createpool
102 .. _Placement Groups: ../../rados/operations/placement-groups
105 Configure OpenStack Ceph Clients
106 ================================
108 The nodes running ``glance-api``, ``cinder-volume``, ``nova-compute`` and
109 ``cinder-backup`` act as Ceph clients. Each requires the ``ceph.conf`` file::
111 ssh {your-openstack-server} sudo tee /etc/ceph/ceph.conf </etc/ceph/ceph.conf
114 Install Ceph client packages
115 ----------------------------
117 On the ``glance-api`` node, you will need the Python bindings for ``librbd``::
119 sudo apt-get install python-rbd
120 sudo yum install python-rbd
On the ``nova-compute``, ``cinder-backup``, and ``cinder-volume`` nodes,
install both the Python bindings and the client command line tools::
125 sudo apt-get install ceph-common
126 sudo yum install ceph-common
129 Setup Ceph Client Authentication
130 --------------------------------
132 If you have `cephx authentication`_ enabled, create a new user for Nova/Cinder
133 and Glance. Execute the following::
135 ceph auth get-or-create client.glance mon 'profile rbd' osd 'profile rbd pool=images'
136 ceph auth get-or-create client.cinder mon 'profile rbd' osd 'profile rbd pool=volumes, profile rbd pool=vms, profile rbd pool=images'
137 ceph auth get-or-create client.cinder-backup mon 'profile rbd' osd 'profile rbd pool=backups'
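
You can review a user and its capabilities afterwards, for example::

    ceph auth get client.glance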
139 Add the keyrings for ``client.cinder``, ``client.glance``, and
140 ``client.cinder-backup`` to the appropriate nodes and change their ownership::
142 ceph auth get-or-create client.glance | ssh {your-glance-api-server} sudo tee /etc/ceph/ceph.client.glance.keyring
143 ssh {your-glance-api-server} sudo chown glance:glance /etc/ceph/ceph.client.glance.keyring
144 ceph auth get-or-create client.cinder | ssh {your-volume-server} sudo tee /etc/ceph/ceph.client.cinder.keyring
145 ssh {your-cinder-volume-server} sudo chown cinder:cinder /etc/ceph/ceph.client.cinder.keyring
146 ceph auth get-or-create client.cinder-backup | ssh {your-cinder-backup-server} sudo tee /etc/ceph/ceph.client.cinder-backup.keyring
147 ssh {your-cinder-backup-server} sudo chown cinder:cinder /etc/ceph/ceph.client.cinder-backup.keyring
Nodes running ``nova-compute`` need the keyring file for the ``nova-compute``
process::
152 ceph auth get-or-create client.cinder | ssh {your-nova-compute-server} sudo tee /etc/ceph/ceph.client.cinder.keyring
154 They also need to store the secret key of the ``client.cinder`` user in
155 ``libvirt``. The libvirt process needs it to access the cluster while attaching
156 a block device from Cinder.
Create a temporary copy of the secret key on the nodes running
``nova-compute``::
161 ceph auth get-key client.cinder | ssh {your-compute-node} tee client.cinder.key
163 Then, on the compute nodes, add the secret key to ``libvirt`` and remove the
164 temporary copy of the key::
    uuidgen
    457eb676-33da-42ec-9a8c-9293d545c337

    cat > secret.xml <<EOF
    <secret ephemeral='no' private='no'>
      <uuid>457eb676-33da-42ec-9a8c-9293d545c337</uuid>
      <usage type='ceph'>
        <name>client.cinder secret</name>
      </usage>
    </secret>
    EOF
    sudo virsh secret-define --file secret.xml
    Secret 457eb676-33da-42ec-9a8c-9293d545c337 created
    sudo virsh secret-set-value --secret 457eb676-33da-42ec-9a8c-9293d545c337 --base64 $(cat client.cinder.key) && rm client.cinder.key secret.xml
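
You can verify that ``libvirt`` stored the secret with::

    sudo virsh secret-list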
181 Save the uuid of the secret for configuring ``nova-compute`` later.
.. important:: You don't necessarily need the UUID on all the compute nodes.
   However from a platform consistency perspective, it's better to keep the
   same UUID.
187 .. _cephx authentication: ../../rados/configuration/auth-config-ref/#enabling-disabling-cephx
190 Configure OpenStack to use Ceph
191 ===============================
Configuring Glance
------------------

Glance can use multiple back ends to store images. To use Ceph block devices by
default, configure Glance like the following.

Prior to Juno
~~~~~~~~~~~~~

Edit ``/etc/glance/glance-api.conf`` and add under the ``[DEFAULT]`` section::
    default_store = rbd
    rbd_store_user = glance
    rbd_store_pool = images
    rbd_store_chunk_size = 8
Juno
~~~~

Edit ``/etc/glance/glance-api.conf`` and add under the ``[glance_store]`` section::

    [DEFAULT]
    ...
    default_store = rbd
    ...
    [glance_store]
    stores = rbd
    rbd_store_pool = images
    rbd_store_user = glance
    rbd_store_ceph_conf = /etc/ceph/ceph.conf
    rbd_store_chunk_size = 8
.. important:: Glance has not completely moved to the 'store' concept yet, so
   the store must still be configured in the ``[DEFAULT]`` section until Kilo.
Kilo and after
~~~~~~~~~~~~~~

Edit ``/etc/glance/glance-api.conf`` and add under the ``[glance_store]`` section::

    [glance_store]
    stores = rbd
    default_store = rbd
    rbd_store_pool = images
    rbd_store_user = glance
    rbd_store_ceph_conf = /etc/ceph/ceph.conf
    rbd_store_chunk_size = 8
For more information about the configuration options available in Glance, please refer to the OpenStack Configuration Reference: http://docs.openstack.org/.
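
To check the integration, you can upload a RAW image and confirm that it lands
in the ``images`` pool. This is only a sketch; the image file is a placeholder
and the exact CLI flags vary across Glance versions::

    glance image-create --name cirros --disk-format raw \
      --container-format bare --file cirros.raw
    rbd ls images    # the new image's ID should appear here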
244 Enable copy-on-write cloning of images
245 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
247 Note that this exposes the back end location via Glance's API, so the endpoint
248 with this option enabled should not be publicly accessible.
250 Any OpenStack version except Mitaka
251 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
253 If you want to enable copy-on-write cloning of images, also add under the ``[DEFAULT]`` section::
255 show_image_direct_url = True
For Mitaka only
^^^^^^^^^^^^^^^

To enable image locations and take advantage of copy-on-write cloning for images, add under the ``[DEFAULT]`` section::
262 show_multiple_locations = True
263 show_image_direct_url = True
265 Disable cache management (any OpenStack version)
266 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Disable the Glance cache management to avoid images getting cached under ``/var/lib/glance/image-cache/``,
assuming your configuration file has ``flavor = keystone+cachemanagement``::

    [paste_deploy]
    flavor = keystone

Image properties
~~~~~~~~~~~~~~~~

We recommend using the following properties for your images:
- ``hw_scsi_model=virtio-scsi``: add the virtio-scsi controller for better performance and discard operation support
- ``hw_disk_bus=scsi``: connect every Cinder block device to that controller
- ``hw_qemu_guest_agent=yes``: enable the QEMU guest agent
- ``os_require_quiesce=yes``: send fs-freeze/thaw calls through the QEMU guest agent
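
As an illustrative sketch, these properties can be set on an existing image
with the Glance CLI (the image ID is a placeholder)::

    glance image-update --property hw_scsi_model=virtio-scsi \
      --property hw_disk_bus=scsi \
      --property hw_qemu_guest_agent=yes \
      --property os_require_quiesce=yes \
      {id of image}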
Configuring Cinder
------------------

OpenStack requires a driver to interact with Ceph block devices. You must also
specify the pool name for the block device. On your OpenStack node, edit
``/etc/cinder/cinder.conf`` by adding::
    [DEFAULT]
    ...
    enabled_backends = ceph
    ...
    [ceph]
    volume_driver = cinder.volume.drivers.rbd.RBDDriver
    volume_backend_name = ceph
    rbd_pool = volumes
    rbd_ceph_conf = /etc/ceph/ceph.conf
    rbd_flatten_volume_from_snapshot = false
    rbd_max_clone_depth = 5
    rbd_store_chunk_size = 4
    rados_connect_timeout = -1
    glance_api_version = 2
307 If you are using `cephx authentication`_, also configure the user and uuid of
308 the secret you added to ``libvirt`` as documented earlier::
    [ceph]
    ...
    rbd_user = cinder
    rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337
315 Note that if you are configuring multiple cinder back ends,
316 ``glance_api_version = 2`` must be in the ``[DEFAULT]`` section.
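
Once the services have been restarted (see the Restart OpenStack section
below), you can sanity-check the backend by creating a small volume and
looking for it in the ``volumes`` pool; the RBD driver stores volumes as
images named ``volume-<UUID>``. A sketch::

    cinder create --display-name test-volume 1
    rbd ls volumes    # expect an image named volume-<UUID>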
319 Configuring Cinder Backup
320 -------------------------
OpenStack Cinder Backup requires a specific daemon (``cinder-backup``), so don't forget to install it.
323 On your Cinder Backup node, edit ``/etc/cinder/cinder.conf`` and add::
325 backup_driver = cinder.backup.drivers.ceph
326 backup_ceph_conf = /etc/ceph/ceph.conf
327 backup_ceph_user = cinder-backup
328 backup_ceph_chunk_size = 134217728
329 backup_ceph_pool = backups
330 backup_ceph_stripe_unit = 0
331 backup_ceph_stripe_count = 0
332 restore_discard_excess_bytes = true
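
With the daemon running, a volume can then be backed up into the ``backups``
pool, for example (the volume ID is a placeholder, and newer clients use
``--name`` instead of ``--display-name``)::

    cinder backup-create --display-name test-backup {id of volume}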
335 Configuring Nova to attach Ceph RBD block device
336 ------------------------------------------------
338 In order to attach Cinder devices (either normal block or by issuing a boot
339 from volume), you must tell Nova (and libvirt) which user and UUID to refer to
340 when attaching the device. libvirt will refer to this user when connecting and
341 authenticating with the Ceph cluster. ::
    [libvirt]
    ...
    rbd_user = cinder
    rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337
348 These two flags are also used by the Nova ephemeral backend.
Configuring Nova
----------------

In order to boot all the virtual machines directly into Ceph, you must
configure the ephemeral backend for Nova.
It is recommended to enable the RBD cache in your Ceph configuration file
(enabled by default since Giant). Moreover, enabling the admin socket
brings a lot of benefits while troubleshooting. Having one socket per
virtual machine using a Ceph block device helps when investigating
performance and/or wrong behaviors.
362 This socket can be accessed like this::
364 ceph daemon /var/run/ceph/ceph-client.cinder.19195.32310016.asok help
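
For instance, to dump the client's performance counters (including RBD cache
statistics) from the same socket::

    ceph daemon /var/run/ceph/ceph-client.cinder.19195.32310016.asok perf dump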
Now on every compute node, edit your Ceph configuration file::

    [client]
        rbd cache = true
        rbd cache writethrough until flush = true
        admin socket = /var/run/ceph/guests/$cluster-$type.$id.$pid.$cctid.asok
        log file = /var/log/qemu/qemu-guest-$pid.log
        rbd concurrent management ops = 20
375 Configure the permissions of these paths::
377 mkdir -p /var/run/ceph/guests/ /var/log/qemu/
378 chown qemu:libvirtd /var/run/ceph/guests /var/log/qemu/
380 Note that user ``qemu`` and group ``libvirtd`` can vary depending on your system.
The provided example works for Red Hat based systems.
.. tip:: If your virtual machine is already running, you can simply restart it to get the socket.
Havana and Icehouse
~~~~~~~~~~~~~~~~~~~

Havana and Icehouse require patches to implement copy-on-write cloning and fix
390 bugs with image size and live migration of ephemeral disks on rbd. These are
391 available in branches based on upstream Nova `stable/havana`_ and
392 `stable/icehouse`_. Using them is not mandatory but **highly recommended** in
393 order to take advantage of the copy-on-write clone functionality.
395 On every Compute node, edit ``/etc/nova/nova.conf`` and add::
    libvirt_images_type = rbd
    libvirt_images_rbd_pool = vms
    libvirt_images_rbd_ceph_conf = /etc/ceph/ceph.conf
    disk_cachemodes="network=writeback"
    rbd_user = cinder
    rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337
404 It is also a good practice to disable file injection. While booting an
405 instance, Nova usually attempts to open the rootfs of the virtual machine.
406 Then, Nova injects values such as password, ssh keys etc. directly into the
filesystem. However, it is better to rely on the metadata service and
``cloud-init``.
410 On every Compute node, edit ``/etc/nova/nova.conf`` and add::
412 libvirt_inject_password = false
413 libvirt_inject_key = false
414 libvirt_inject_partition = -2
416 To ensure a proper live-migration, use the following flags::
418 libvirt_live_migration_flag="VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,VIR_MIGRATE_PERSIST_DEST,VIR_MIGRATE_TUNNELLED"
Juno
~~~~

In Juno, the Ceph block device settings were moved under the ``[libvirt]``
section. On every Compute node, edit ``/etc/nova/nova.conf`` under the
``[libvirt]`` section and add::

    [libvirt]
    images_type = rbd
    images_rbd_pool = vms
    images_rbd_ceph_conf = /etc/ceph/ceph.conf
    rbd_user = cinder
    rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337
    disk_cachemodes="network=writeback"
436 It is also a good practice to disable file injection. While booting an
437 instance, Nova usually attempts to open the rootfs of the virtual machine.
438 Then, Nova injects values such as password, ssh keys etc. directly into the
filesystem. However, it is better to rely on the metadata service and
``cloud-init``.
442 On every Compute node, edit ``/etc/nova/nova.conf`` and add the following
443 under the ``[libvirt]`` section::
    inject_password = false
    inject_key = false
    inject_partition = -2
449 To ensure a proper live-migration, use the following flags (under the ``[libvirt]`` section)::
451 live_migration_flag="VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,VIR_MIGRATE_PERSIST_DEST,VIR_MIGRATE_TUNNELLED"
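
With these flags in place, a running instance can then be moved between
hypervisors, for example (the instance name and target host are
placeholders)::

    nova live-migration myinstance other-compute-host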
Kilo
~~~~

Enable discard support for virtual machine ephemeral root disk::

    [libvirt]
    ...
    ...
    hw_disk_discard = unmap # enable discard support (be careful of performance)
Restart OpenStack
=================

To activate the Ceph block device driver and load the block device pool name
into the configuration, you must restart OpenStack. For Debian based systems,
execute these commands on the appropriate nodes::
471 sudo glance-control api restart
472 sudo service nova-compute restart
473 sudo service cinder-volume restart
474 sudo service cinder-backup restart
476 For Red Hat based systems execute::
478 sudo service openstack-glance-api restart
479 sudo service openstack-nova-compute restart
480 sudo service openstack-cinder-volume restart
481 sudo service openstack-cinder-backup restart
Once OpenStack is up and running, you should be able to create a volume
and boot from it.
487 Booting from a Block Device
488 ===========================
490 You can create a volume from an image using the Cinder command line tool::
492 cinder create --image-id {id of image} --display-name {name of volume} {size of volume}
Note that the image must be in RAW format. You can use `qemu-img`_ to convert
495 from one format to another. For example::
497 qemu-img convert -f {source-format} -O {output-format} {source-filename} {output-filename}
498 qemu-img convert -f qcow2 -O raw precise-cloudimg.img precise-cloudimg.raw
When Glance and Cinder are both using Ceph block devices, the image is a
copy-on-write clone, so new volumes can be created quickly. In the OpenStack
dashboard, you can boot from that volume by performing the following steps:
504 #. Launch a new instance.
#. Choose the image associated with the copy-on-write clone.
506 #. Select 'boot from volume'.
507 #. Select the volume you created.
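
Alternatively, the same thing can be done from the command line; a sketch
using the classic block device mapping syntax (the flavor and instance name
are placeholders)::

    nova boot --flavor m1.small --block-device-mapping vda={id of volume}:::0 test-instance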
509 .. _qemu-img: ../qemu-rbd/#running-qemu-with-rbd
510 .. _Block Devices and OpenStack (Dumpling): http://docs.ceph.com/docs/dumpling/rbd/rbd-openstack
511 .. _stable/havana: https://github.com/jdurgin/nova/tree/havana-ephemeral-rbd
512 .. _stable/icehouse: https://github.com/angdraug/nova/tree/rbd-ephemeral-clone-stable-icehouse