=============================
 Block Devices and OpenStack
=============================

.. index:: Ceph Block Device; OpenStack

You may use Ceph Block Device images with OpenStack through ``libvirt``, which
configures the QEMU interface to ``librbd``. Ceph stripes block device images as
objects across the cluster, which means that large Ceph Block Device images have
better performance than a standalone server!

To use Ceph Block Devices with OpenStack, you must install QEMU, ``libvirt``,
and OpenStack first. We recommend using a separate physical node for your
OpenStack installation. OpenStack recommends a minimum of 8GB of RAM and a
quad-core processor. The following diagram depicts the OpenStack/Ceph
technology stack.


.. ditaa::  +---------------------------------------------------+
            |                    OpenStack                      |
            +---------------------------------------------------+
            |                     libvirt                       |
            +------------------------+--------------------------+
                                     |
                                     | configures
                                     v
            +---------------------------------------------------+
            |                       QEMU                        |
            +---------------------------------------------------+
            |                      librbd                       |
            +---------------------------------------------------+
            |                     librados                      |
            +------------------------+-+------------------------+
            |          OSDs          | |        Monitors        |
            +------------------------+ +------------------------+

.. important:: To use Ceph Block Devices with OpenStack, you must have
   access to a running Ceph Storage Cluster.

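For example, you can confirm that the cluster is reachable from a client node
before proceeding (the exact output will vary with your cluster)::

    ceph health
    ceph -s
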
Three parts of OpenStack integrate with Ceph's block devices:

- **Images**: OpenStack Glance manages images for VMs. Images are immutable.
  OpenStack treats images as binary blobs and downloads them accordingly.

- **Volumes**: Volumes are block devices. OpenStack uses volumes to boot VMs,
  or to attach volumes to running VMs. OpenStack manages volumes using
  Cinder services.

- **Guest Disks**: Guest disks are guest operating system disks. By default,
  when you boot a virtual machine, its disk appears as a file on the filesystem
  of the hypervisor (usually under ``/var/lib/nova/instances/<uuid>/``). Prior
  to OpenStack Havana, the only way to boot a VM in Ceph was to use the
  boot-from-volume functionality of Cinder. However, now it is possible to boot
  every virtual machine inside Ceph directly without using Cinder, which is
  advantageous because it allows you to perform maintenance operations easily
  with the live-migration process. Additionally, if your hypervisor dies, it is
  also convenient to trigger ``nova evacuate`` and run the virtual machine
  elsewhere almost seamlessly.

You can use OpenStack Glance to store images in a Ceph Block Device, and you
can use Cinder to boot a VM using a copy-on-write clone of an image.

The instructions below detail the setup for Glance, Cinder, and Nova, although
they do not have to be used together. You may store images in Ceph block devices
while running VMs using a local disk, or vice versa.

.. important:: Ceph doesn't support QCOW2 for hosting a virtual machine disk.
   Thus if you want to boot virtual machines in Ceph (ephemeral backend or boot
   from volume), the Glance image format must be ``RAW``.

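If you are not sure whether an existing image is already ``RAW``, you can check
it with ``qemu-img`` before uploading it to Glance (the placeholder below is
just an example)::

    qemu-img info {image-filename}

The ``file format`` line of the output reports ``raw``, ``qcow2``, and so on.
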
.. tip:: This document describes using Ceph Block Devices with OpenStack
   Havana and later releases. For earlier versions of OpenStack see
   `Block Devices and OpenStack (Dumpling)`_.

.. index:: pools; OpenStack

Create a Pool
=============

By default, Ceph block devices use the ``rbd`` pool. You may use any available
pool. We recommend creating a pool for Cinder and a pool for Glance. Ensure
your Ceph cluster is running, then create the pools. ::

    ceph osd pool create volumes 128
    ceph osd pool create images 128
    ceph osd pool create backups 128
    ceph osd pool create vms 128

See `Create a Pool`_ for details on specifying the number of placement groups
for your pools, and `Placement Groups`_ for details on how many placement
groups you should set for your pools.

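For example, you can check the placement group count of a pool after creating
it and, if needed, increase it (the pool name and count here are only
illustrative)::

    ceph osd pool get volumes pg_num
    ceph osd pool set volumes pg_num 256
    ceph osd pool set volumes pgp_num 256
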
Newly created pools must be initialized prior to use. Use the ``rbd`` tool
to initialize the pools::

    rbd pool init volumes
    rbd pool init images
    rbd pool init backups
    rbd pool init vms

.. _Create a Pool: ../../rados/operations/pools#createpool
.. _Placement Groups: ../../rados/operations/placement-groups


Configure OpenStack Ceph Clients
================================

The nodes running ``glance-api``, ``cinder-volume``, ``nova-compute`` and
``cinder-backup`` act as Ceph clients. Each requires the ``ceph.conf`` file::

    ssh {your-openstack-server} sudo tee /etc/ceph/ceph.conf </etc/ceph/ceph.conf


Install Ceph client packages
----------------------------

On the ``glance-api`` node, you will need the Python bindings for ``librbd``::

    sudo apt-get install python-rbd
    sudo yum install python-rbd

On the ``nova-compute``, ``cinder-backup`` and ``cinder-volume`` nodes, use both
the Python bindings and the client command line tools::

    sudo apt-get install ceph-common
    sudo yum install ceph-common


Setup Ceph Client Authentication
--------------------------------

If you have `cephx authentication`_ enabled, create a new user for Nova/Cinder
and Glance. Execute the following::

    ceph auth get-or-create client.glance mon 'profile rbd' osd 'profile rbd pool=images'
    ceph auth get-or-create client.cinder mon 'profile rbd' osd 'profile rbd pool=volumes, profile rbd pool=vms, profile rbd-read-only pool=images'
    ceph auth get-or-create client.cinder-backup mon 'profile rbd' osd 'profile rbd pool=backups'

Add the keyrings for ``client.cinder``, ``client.glance``, and
``client.cinder-backup`` to the appropriate nodes and change their ownership::

    ceph auth get-or-create client.glance | ssh {your-glance-api-server} sudo tee /etc/ceph/ceph.client.glance.keyring
    ssh {your-glance-api-server} sudo chown glance:glance /etc/ceph/ceph.client.glance.keyring
    ceph auth get-or-create client.cinder | ssh {your-volume-server} sudo tee /etc/ceph/ceph.client.cinder.keyring
    ssh {your-cinder-volume-server} sudo chown cinder:cinder /etc/ceph/ceph.client.cinder.keyring
    ceph auth get-or-create client.cinder-backup | ssh {your-cinder-backup-server} sudo tee /etc/ceph/ceph.client.cinder-backup.keyring
    ssh {your-cinder-backup-server} sudo chown cinder:cinder /etc/ceph/ceph.client.cinder-backup.keyring

Nodes running ``nova-compute`` need the keyring file for the ``nova-compute``
process::

    ceph auth get-or-create client.cinder | ssh {your-nova-compute-server} sudo tee /etc/ceph/ceph.client.cinder.keyring

They also need to store the secret key of the ``client.cinder`` user in
``libvirt``. The libvirt process needs it to access the cluster while attaching
a block device from Cinder.

Create a temporary copy of the secret key on the nodes running
``nova-compute``::

    ceph auth get-key client.cinder | ssh {your-compute-node} tee client.cinder.key

Then, on the compute nodes, add the secret key to ``libvirt`` and remove the
temporary copy of the key::

    uuidgen
    457eb676-33da-42ec-9a8c-9293d545c337

    cat > secret.xml <<EOF
    <secret ephemeral='no' private='no'>
      <uuid>457eb676-33da-42ec-9a8c-9293d545c337</uuid>
      <usage type='ceph'>
        <name>client.cinder secret</name>
      </usage>
    </secret>
    EOF
    sudo virsh secret-define --file secret.xml
    Secret 457eb676-33da-42ec-9a8c-9293d545c337 created
    sudo virsh secret-set-value --secret 457eb676-33da-42ec-9a8c-9293d545c337 --base64 $(cat client.cinder.key) && rm client.cinder.key secret.xml

Save the UUID of the secret for configuring ``nova-compute`` later.

.. important:: You don't necessarily need the UUID on all the compute nodes.
   However, from a platform consistency perspective, it's better to keep the
   same UUID.

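If you did not record the UUID, you can list the secrets that libvirt already
knows about on a compute node and read the value back (the UUID below is just
the example used throughout this document)::

    sudo virsh secret-list
    sudo virsh secret-get-value 457eb676-33da-42ec-9a8c-9293d545c337
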
.. _cephx authentication: ../../rados/configuration/auth-config-ref/#enabling-disabling-cephx


Configure OpenStack to use Ceph
===============================

Configuring Glance
------------------

Glance can use multiple back ends to store images. To use Ceph block devices by
default, configure Glance like the following.

Prior to Juno
~~~~~~~~~~~~~~

Edit ``/etc/glance/glance-api.conf`` and add under the ``[DEFAULT]`` section::

    default_store = rbd
    rbd_store_user = glance
    rbd_store_pool = images
    rbd_store_chunk_size = 8


Juno
~~~~

Edit ``/etc/glance/glance-api.conf`` and add under the ``[glance_store]`` section::

    [DEFAULT]
    ...
    default_store = rbd
    ...
    [glance_store]
    stores = rbd
    rbd_store_pool = images
    rbd_store_user = glance
    rbd_store_ceph_conf = /etc/ceph/ceph.conf
    rbd_store_chunk_size = 8

.. important:: Glance has not completely moved to 'store' yet, so we still need
   to configure the store in the ``[DEFAULT]`` section until Kilo.

Kilo and after
~~~~~~~~~~~~~~

Edit ``/etc/glance/glance-api.conf`` and add under the ``[glance_store]`` section::

    [glance_store]
    stores = rbd
    default_store = rbd
    rbd_store_pool = images
    rbd_store_user = glance
    rbd_store_ceph_conf = /etc/ceph/ceph.conf
    rbd_store_chunk_size = 8

For more information about the configuration options available in Glance,
please refer to the OpenStack Configuration Reference: http://docs.openstack.org/.

Enable copy-on-write cloning of images
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Note that this exposes the back end location via Glance's API, so the endpoint
with this option enabled should not be publicly accessible.

Any OpenStack version except Mitaka
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you want to enable copy-on-write cloning of images, also add under the ``[DEFAULT]`` section::

    show_image_direct_url = True

For Mitaka only
^^^^^^^^^^^^^^^

To enable image locations and take advantage of copy-on-write cloning for images, add under the ``[DEFAULT]`` section::

    show_multiple_locations = True
    show_image_direct_url = True

Disable cache management (any OpenStack version)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Disable the Glance cache management to avoid images getting cached under ``/var/lib/glance/image-cache/``,
assuming your configuration file has ``flavor = keystone+cachemanagement``::

    [paste_deploy]
    flavor = keystone

Image properties
~~~~~~~~~~~~~~~~

We recommend using the following properties for your images; they can be set
with the OpenStack CLI as sketched after this list:

- ``hw_scsi_model=virtio-scsi``: add the virtio-scsi controller and get better
  performance and support for discard operations
- ``hw_disk_bus=scsi``: connect every Cinder block device to that controller
- ``hw_qemu_guest_agent=yes``: enable the QEMU guest agent
- ``os_require_quiesce=yes``: send fs-freeze/thaw calls through the QEMU guest agent

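For example, assuming the unified ``openstack`` client is available, the
properties can be attached to an existing image like this (the image name is a
placeholder)::

    openstack image set --property hw_scsi_model=virtio-scsi \
                        --property hw_disk_bus=scsi \
                        --property hw_qemu_guest_agent=yes \
                        --property os_require_quiesce=yes \
                        {image-name}
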

Configuring Cinder
------------------

OpenStack requires a driver to interact with Ceph block devices. You must also
specify the pool name for the block device. On your OpenStack node, edit
``/etc/cinder/cinder.conf`` by adding::

    [DEFAULT]
    ...
    enabled_backends = ceph
    glance_api_version = 2
    ...
    [ceph]
    volume_driver = cinder.volume.drivers.rbd.RBDDriver
    volume_backend_name = ceph
    rbd_pool = volumes
    rbd_ceph_conf = /etc/ceph/ceph.conf
    rbd_flatten_volume_from_snapshot = false
    rbd_max_clone_depth = 5
    rbd_store_chunk_size = 4
    rados_connect_timeout = -1

If you are using `cephx authentication`_, also configure the user and UUID of
the secret you added to ``libvirt`` as documented earlier::

    [ceph]
    ...
    rbd_user = cinder
    rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337

Note that if you are configuring multiple Cinder back ends,
``glance_api_version = 2`` must be in the ``[DEFAULT]`` section.

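A minimal sketch of such a multi-backend layout, assuming a second,
hypothetical RBD backend named ``ceph-ssd`` that uses a ``volumes-ssd`` pool,
could look like this::

    [DEFAULT]
    enabled_backends = ceph, ceph-ssd
    glance_api_version = 2

    [ceph]
    volume_driver = cinder.volume.drivers.rbd.RBDDriver
    volume_backend_name = ceph
    rbd_pool = volumes
    rbd_user = cinder
    rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337

    [ceph-ssd]
    volume_driver = cinder.volume.drivers.rbd.RBDDriver
    volume_backend_name = ceph-ssd
    rbd_pool = volumes-ssd
    rbd_user = cinder
    rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337

Each ``volume_backend_name`` can then be mapped to a Cinder volume type so
users can choose between the back ends.
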

Configuring Cinder Backup
-------------------------

OpenStack Cinder Backup requires a specific daemon, so don't forget to install it.
On your Cinder Backup node, edit ``/etc/cinder/cinder.conf`` and add::

    backup_driver = cinder.backup.drivers.ceph
    backup_ceph_conf = /etc/ceph/ceph.conf
    backup_ceph_user = cinder-backup
    backup_ceph_chunk_size = 134217728
    backup_ceph_pool = backups
    backup_ceph_stripe_unit = 0
    backup_ceph_stripe_count = 0
    restore_discard_excess_bytes = true


Configuring Nova to attach Ceph RBD block device
------------------------------------------------

In order to attach Cinder devices (either normal block devices or by issuing a
boot from volume), you must tell Nova (and libvirt) which user and UUID to refer
to when attaching the device. libvirt will refer to this user when connecting
and authenticating with the Ceph cluster. ::

    [libvirt]
    ...
    rbd_user = cinder
    rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337

These two flags are also used by the Nova ephemeral backend.


Configuring Nova
----------------

In order to boot all the virtual machines directly into Ceph, you must
configure the ephemeral backend for Nova.

It is recommended to enable the RBD cache in your Ceph configuration file
(enabled by default since Giant). Moreover, enabling the admin socket
brings a lot of benefits while troubleshooting. Having one socket
per virtual machine using a Ceph block device helps when investigating
performance and/or wrong behaviors.

This socket can be accessed like this::

    ceph daemon /var/run/ceph/ceph-client.cinder.19195.32310016.asok help

Now on every compute node, edit your Ceph configuration file::

    [client]
    rbd cache = true
    rbd cache writethrough until flush = true
    admin socket = /var/run/ceph/guests/$cluster-$type.$id.$pid.$cctid.asok
    log file = /var/log/qemu/qemu-guest-$pid.log
    rbd concurrent management ops = 20

Configure the permissions of these paths::

    mkdir -p /var/run/ceph/guests/ /var/log/qemu/
    chown qemu:libvirtd /var/run/ceph/guests /var/log/qemu/

Note that user ``qemu`` and group ``libvirtd`` can vary depending on your system.
The provided example works for Red Hat based systems.

.. tip:: If your virtual machine is already running, you can simply restart it
   to get the socket.


Havana and Icehouse
~~~~~~~~~~~~~~~~~~~

Havana and Icehouse require patches to implement copy-on-write cloning and fix
bugs with image size and live migration of ephemeral disks on rbd. These are
available in branches based on upstream Nova `stable/havana`_ and
`stable/icehouse`_. Using them is not mandatory but **highly recommended** in
order to take advantage of the copy-on-write clone functionality.

On every Compute node, edit ``/etc/nova/nova.conf`` and add::

    libvirt_images_type = rbd
    libvirt_images_rbd_pool = vms
    libvirt_images_rbd_ceph_conf = /etc/ceph/ceph.conf
    disk_cachemodes="network=writeback"
    rbd_user = cinder
    rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337

It is also a good practice to disable file injection. While booting an
instance, Nova usually attempts to open the rootfs of the virtual machine.
Then, Nova injects values such as passwords, SSH keys, etc. directly into the
filesystem. However, it is better to rely on the metadata service and
``cloud-init``.

On every Compute node, edit ``/etc/nova/nova.conf`` and add::

    libvirt_inject_password = false
    libvirt_inject_key = false
    libvirt_inject_partition = -2

To ensure a proper live-migration, use the following flags::

    libvirt_live_migration_flag="VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,VIR_MIGRATE_PERSIST_DEST,VIR_MIGRATE_TUNNELLED"

Juno
~~~~

In Juno, the Ceph block device settings were moved under the ``[libvirt]`` section.
On every Compute node, edit ``/etc/nova/nova.conf`` under the ``[libvirt]``
section and add::

    [libvirt]
    images_type = rbd
    images_rbd_pool = vms
    images_rbd_ceph_conf = /etc/ceph/ceph.conf
    rbd_user = cinder
    rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337
    disk_cachemodes="network=writeback"


It is also a good practice to disable file injection. While booting an
instance, Nova usually attempts to open the rootfs of the virtual machine.
Then, Nova injects values such as passwords, SSH keys, etc. directly into the
filesystem. However, it is better to rely on the metadata service and
``cloud-init``.

On every Compute node, edit ``/etc/nova/nova.conf`` and add the following
under the ``[libvirt]`` section::

    inject_password = false
    inject_key = false
    inject_partition = -2

To ensure a proper live-migration, use the following flags (under the ``[libvirt]`` section)::

    live_migration_flag="VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,VIR_MIGRATE_PERSIST_DEST,VIR_MIGRATE_TUNNELLED"

Kilo
~~~~

Enable discard support for virtual machine ephemeral root disk::

    [libvirt]
    ...
    ...
    hw_disk_discard = unmap # enable discard support (be careful of performance)


Restart OpenStack
=================

To activate the Ceph block device driver and load the block device pool name
into the configuration, you must restart OpenStack. For Debian based systems,
execute these commands on the appropriate nodes::

    sudo glance-control api restart
    sudo service nova-compute restart
    sudo service cinder-volume restart
    sudo service cinder-backup restart

For Red Hat based systems, execute::

    sudo service openstack-glance-api restart
    sudo service openstack-nova-compute restart
    sudo service openstack-cinder-volume restart
    sudo service openstack-cinder-backup restart

Once OpenStack is up and running, you should be able to create a volume
and boot from it.


Booting from a Block Device
===========================

You can create a volume from an image using the Cinder command line tool::

    cinder create --image-id {id of image} --display-name {name of volume} {size of volume}

Note that the image must be in RAW format. You can use `qemu-img`_ to convert
from one format to another. For example::

    qemu-img convert -f {source-format} -O {output-format} {source-filename} {output-filename}
    qemu-img convert -f qcow2 -O raw precise-cloudimg.img precise-cloudimg.raw

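Once converted, the RAW image can be uploaded to Glance; a minimal sketch using
the ``glance`` client, reusing the file name from the example above::

    glance image-create --name precise-cloudimg --disk-format raw \
        --container-format bare --file precise-cloudimg.raw
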
When Glance and Cinder are both using Ceph block devices, the new volume is a
copy-on-write clone of the image, so volume creation is quick. In the OpenStack
dashboard, you can boot from that volume by performing the following steps:

#. Launch a new instance.
#. Choose the image associated with the copy-on-write clone.
#. Select 'boot from volume'.
#. Select the volume you created.

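If you want to confirm that the new volume really is a copy-on-write clone
rather than a full copy, you can inspect it with the ``rbd`` tool on a Ceph
client node (the pool name and volume UUID are placeholders)::

    rbd ls volumes
    rbd info volumes/volume-{volume-uuid}

A cloned volume shows a ``parent:`` line in the ``rbd info`` output that points
back at the Glance image in the ``images`` pool.
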
.. _qemu-img: ../qemu-rbd/#running-qemu-with-rbd
.. _Block Devices and OpenStack (Dumpling): http://docs.ceph.com/docs/dumpling/rbd/rbd-openstack
.. _stable/havana: https://github.com/jdurgin/nova/tree/havana-ephemeral-rbd
.. _stable/icehouse: https://github.com/angdraug/nova/tree/rbd-ephemeral-clone-stable-icehouse