========================
 QEMU and Block Devices
========================

.. index:: Ceph Block Device; QEMU KVM

The most frequent Ceph Block Device use case involves providing block device
images to virtual machines. For example, a user may create a "golden" image
with an OS and any relevant software in an ideal configuration. Then the user
takes a snapshot of the image. Finally, the user clones the snapshot (potentially
many times). See `Snapshots`_ for details. The ability to make copy-on-write
clones of a snapshot means that Ceph can provision block device images to
virtual machines quickly, because the client doesn't have to download the entire
image each time it spins up a new virtual machine.

.. ditaa::

    +---------------------------------------------------+
    |                       QEMU                        |
    +---------------------------------------------------+
    |                      librbd                       |
    +---------------------------------------------------+
    |                     librados                      |
    +------------------------+-+------------------------+
    |          OSDs          | |        Monitors        |
    +------------------------+ +------------------------+

Ceph Block Devices attach to QEMU virtual machines. For details on
QEMU, see `QEMU Open Source Processor Emulator`_. For QEMU documentation, see
`QEMU Manual`_. For installation details, see `Installation`_.

.. important:: To use Ceph Block Devices with QEMU, you must have access to a
   running Ceph cluster.

Usage
=====

The QEMU command line expects you to specify the Ceph pool and image name. You
may also specify a snapshot.

QEMU will assume that Ceph configuration resides in the default
location (e.g., ``/etc/ceph/$cluster.conf``) and that you are executing
commands as the default ``client.admin`` user unless you expressly specify
another Ceph configuration file path or another user. When specifying a user,
QEMU uses the ``ID`` rather than the full ``TYPE:ID``. See `User Management -
User`_ for details. Do not prepend the client type (i.e., ``client.``) to the
beginning of the user ``ID``, or you will receive an authentication error. You
should have the key for the ``admin`` user or the key of another user you
specify with the ``:id={user}`` option in a keyring file stored in the default
path (i.e., ``/etc/ceph`` or the local directory) with appropriate file
ownership and permissions. Usage takes the following form::

    qemu-img {command} [options] rbd:{pool-name}/{image-name}[@snapshot-name][:option1=value1][:option2=value2...]

For example, specifying the ``id`` and ``conf`` options might look like the following::

    qemu-img {command} [options] rbd:glance-pool/maipo:id=glance:conf=/etc/ceph/ceph.conf

.. tip:: Configuration values containing ``:``, ``@``, or ``=`` can be escaped with a
         leading ``\`` character.
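
The form and the escaping rule described above can be sketched as a small
Bash helper. This is only an illustration: ``rbd_escape`` and ``rbd_spec`` are
hypothetical names that are not part of QEMU or Ceph, and the ``mon_host``
value in the usage note is an assumed example of a value that contains a
``:`` character.

.. code-block:: bash

    #!/usr/bin/env bash

    # Hypothetical helper (not part of QEMU or Ceph): put a backslash in
    # front of every ':', '@', or '=' found in a configuration value.
    rbd_escape() {
        printf '%s' "$1" | sed -e 's/[:@=]/\\&/g'
    }

    # Hypothetical helper: build rbd:{pool}/{image}[:key=value...] from a
    # pool name, an image name, and any number of key=value options.
    # Only the value (everything after the first '=') is escaped.
    rbd_spec() {
        local spec="rbd:$1/$2"
        shift 2
        local opt
        for opt in "$@"; do
            spec+=":${opt%%=*}=$(rbd_escape "${opt#*=}")"
        done
        printf '%s\n' "$spec"
    }

    # Reproduces the example string used in this section.
    rbd_spec glance-pool maipo id=glance conf=/etc/ceph/ceph.conf

Calling ``rbd_spec data foo mon_host=192.168.0.1:6789`` would print
``rbd:data/foo:mon_host=192.168.0.1\:6789``, with the colon inside the value
escaped and the colons that separate options left alone.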


Creating Images with QEMU
=========================

You can create a block device image from QEMU. You must specify ``rbd``, the
pool name, and the name of the image you wish to create. You must also specify
the size of the image. ::

    qemu-img create -f raw rbd:{pool-name}/{image-name} {size}

For example::

    qemu-img create -f raw rbd:data/foo 10G

.. important:: The ``raw`` data format is really the only sensible
   ``format`` option to use with RBD. Technically, you could use other
   QEMU-supported formats (such as ``qcow2`` or ``vmdk``), but doing
   so would add additional overhead, and would also render the volume
   unsafe for virtual machine live migration when caching (see below)
   is enabled.


Resizing Images with QEMU
=========================

You can resize a block device image from QEMU. You must specify ``rbd``,
the pool name, and the name of the image you wish to resize. You must also
specify the new size of the image. ::

    qemu-img resize rbd:{pool-name}/{image-name} {size}

For example::

    qemu-img resize rbd:data/foo 10G
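
``qemu-img resize`` also accepts a size prefixed with ``+`` or ``-``, which
changes the image relative to its current size. The sketch below only prints
the command for growing an image so that it can be reviewed before it is run.
``rbd_grow_cmd`` is a hypothetical helper name, and the restriction to whole
numbers with a ``K``, ``M``, ``G``, or ``T`` suffix is a simplification made
for this example.

.. code-block:: bash

    #!/usr/bin/env bash

    # Hypothetical dry-run helper: print (do not execute) a qemu-img command
    # that grows an RBD image by a relative amount such as 5G.
    rbd_grow_cmd() {
        local pool=$1 image=$2 delta=$3
        # Accept only a positive integer with an optional K, M, G, or T suffix.
        if [[ ! $delta =~ ^[1-9][0-9]*[KMGT]?$ ]]; then
            echo "invalid size: $delta" >&2
            return 1
        fi
        printf 'qemu-img resize rbd:%s/%s +%s\n' "$pool" "$image" "$delta"
    }

    # Prints the command that would add 5G to the image data/foo.
    rbd_grow_cmd data foo 5G

Shrinking is deliberately left out of the sketch, because reducing the size
of an image can discard data that the guest still uses.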


Retrieving Image Info with QEMU
===============================

You can retrieve block device image information from QEMU. You must
specify ``rbd``, the pool name, and the name of the image. ::

    qemu-img info rbd:{pool-name}/{image-name}

For example::

    qemu-img info rbd:data/foo


Running QEMU with RBD
=====================

QEMU can pass a block device from the host on to a guest, but since
QEMU 0.15, there's no need to map an image as a block device on
the host. Instead, QEMU attaches an image as a virtual block
device directly via ``librbd``. This strategy increases performance
by avoiding context switches and taking advantage of `RBD caching`_.

You can use ``qemu-img`` to convert existing virtual machine images to Ceph
block device images. For example, if you have a qcow2 image, you could run::

    qemu-img convert -f qcow2 -O raw debian_squeeze.qcow2 rbd:data/squeeze

To run a virtual machine booting from that image, you could run::

    qemu -m 1024 -drive format=raw,file=rbd:data/squeeze

`RBD caching`_ can significantly improve performance.
Since QEMU 1.2, QEMU's cache options control ``librbd`` caching::

    qemu -m 1024 -drive format=rbd,file=rbd:data/squeeze,cache=writeback

If you have an older version of QEMU, you can set the ``librbd`` cache
configuration (like any Ceph configuration option) as part of the
``file`` parameter::

    qemu -m 1024 -drive format=raw,file=rbd:data/squeeze:rbd_cache=true,cache=writeback

.. important:: If you set ``rbd_cache=true``, you must set ``cache=writeback``
   or risk data loss. Without ``cache=writeback``, QEMU will not send
   flush requests to ``librbd``. If QEMU exits uncleanly in this
   configuration, file systems on top of RBD can be corrupted.

.. _RBD caching: ../rbd-config-ref/#rbd-cache-config-settings


.. index:: Ceph Block Device; discard trim and libvirt

Enabling Discard/TRIM
=====================

Since Ceph version 0.46 and QEMU version 1.1, Ceph Block Devices support the
discard operation. This means that a guest can send TRIM requests to let a Ceph
block device reclaim unused space. This can be enabled in the guest by mounting
``ext4`` or ``XFS`` with the ``discard`` option.

For this to be available to the guest, it must be explicitly enabled
for the block device. To do this, you must specify a
``discard_granularity`` associated with the drive::

    qemu -m 1024 -drive format=raw,file=rbd:data/squeeze,id=drive1,if=none \
         -device driver=ide-hd,drive=drive1,discard_granularity=512

Note that this uses the IDE driver. The virtio driver does not
support discard.
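
Because the ``-drive`` and ``-device`` options must refer to the same drive
``id``, it can help to generate the pair from one place. The following is a
minimal sketch that only prints the two options shown above.
``discard_drive_args`` is a hypothetical helper name, and the sketch assumes
the same IDE device and ``raw`` format as the example in this section.

.. code-block:: bash

    #!/usr/bin/env bash

    # Hypothetical helper: print the -drive/-device option pair that enables
    # discard for one RBD image, using a single drive id for both options.
    #   $1: pool/image   $2: drive id   $3: discard granularity in bytes
    discard_drive_args() {
        local image=$1 drive_id=$2 granularity=$3
        printf -- '-drive format=raw,file=rbd:%s,id=%s,if=none ' \
            "$image" "$drive_id"
        printf -- '-device driver=ide-hd,drive=%s,discard_granularity=%s\n' \
            "$drive_id" "$granularity"
    }

    # Prints the options used in the example above for data/squeeze.
    discard_drive_args data/squeeze drive1 512

The printed options could then be appended to a ``qemu -m 1024`` command
line by hand.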

If using libvirt, edit your libvirt domain's configuration file using ``virsh
edit`` to include the ``xmlns:qemu`` value. Then, add a ``qemu:commandline``
block as a child of that domain. The following example shows how to set two
devices with ``qemu id=`` to different ``discard_granularity`` values.

.. code-block:: xml

    <domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
        <qemu:commandline>
            <qemu:arg value='-set'/>
            <qemu:arg value='block.scsi0-0-0.discard_granularity=4096'/>
            <qemu:arg value='-set'/>
            <qemu:arg value='block.scsi0-0-1.discard_granularity=65536'/>
        </qemu:commandline>
    </domain>


.. index:: Ceph Block Device; cache options

QEMU Cache Options
==================

QEMU's cache options correspond to the following Ceph `RBD Cache`_ settings.

Writeback::

    rbd_cache = true

Writethrough::

    rbd_cache = true
    rbd_cache_max_dirty = 0

None::

    rbd_cache = false

QEMU's cache settings override Ceph's cache settings (including settings that
are explicitly set in the Ceph configuration file).

.. note:: Prior to QEMU v2.4.0, if you explicitly set `RBD Cache`_ settings
   in the Ceph configuration file, your Ceph settings override the QEMU cache
   settings.
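
The correspondence listed in this section can be written as a small lookup.
This is only a sketch for checking which settings a given QEMU cache mode
implies: ``rbd_cache_settings`` is a hypothetical helper name, and it covers
only the three modes named above.

.. code-block:: bash

    #!/usr/bin/env bash

    # Hypothetical helper: print the Ceph RBD cache settings that correspond
    # to a QEMU cache mode, following the list in this section.
    rbd_cache_settings() {
        case "$1" in
            writeback)    printf 'rbd_cache = true\n' ;;
            writethrough) printf 'rbd_cache = true\nrbd_cache_max_dirty = 0\n' ;;
            none)         printf 'rbd_cache = false\n' ;;
            *)            echo "unknown cache mode: $1" >&2; return 1 ;;
        esac
    }

    # Prints the two settings that correspond to cache=writethrough.
    rbd_cache_settings writethrough

Any other mode name makes the helper return a non-zero status instead of
guessing at a mapping.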

.. _QEMU Open Source Processor Emulator: http://wiki.qemu.org/Main_Page
.. _QEMU Manual: http://wiki.qemu.org/Manual
.. _RBD Cache: ../rbd-config-ref/
.. _Snapshots: ../rbd-snapshot/
.. _Installation: ../../install
.. _User Management - User: ../../rados/operations/user-management#user