# Block Device User Guide {#bdev}

# Introduction {#bdev_ug_introduction}

The SPDK block device layer, often simply called *bdev*, is a C library
intended to be equivalent to the operating system block storage layer that
often sits immediately above the device drivers in a traditional kernel
storage stack. Specifically, this library provides the following
functionality:

* A pluggable module API for implementing block devices that interface with different types of block storage devices.
* Driver modules for NVMe, malloc (ramdisk), Linux AIO, virtio-scsi, Ceph RBD, Pmem, Vhost-SCSI Initiator, and more.
* An application API for enumerating and claiming SPDK block devices and then performing operations (read, write, unmap, etc.) on those devices.
* Facilities to stack block devices to create complex I/O pipelines, including logical volume management (lvol) and partition support (GPT).
* Configuration of block devices via JSON-RPC.
* Request queueing, timeout, and reset handling.
* Multiple, lockless queues for sending I/O to block devices.

The bdev module creates an abstraction layer that provides a common API for all
devices. Users can use the available bdev modules or create their own module with
any type of device underneath (please refer to @ref bdev_module for details).
SPDK also provides vbdev modules, which create block devices on existing bdevs,
for example @ref bdev_ug_logical_volumes or @ref bdev_ug_gpt.

# Prerequisites {#bdev_ug_prerequisites}

This guide assumes that you can already build the standard SPDK distribution
on your platform. The block device layer is a C library with a single public
header file named bdev.h. All SPDK configuration described in the following
chapters is done using JSON-RPC commands. SPDK provides a Python-based
command-line tool for sending RPC commands, located at `scripts/rpc.py`. Users
can list the available commands by running this script with the `-h` or
`--help` flag. Additionally, users can retrieve the currently supported set of
RPC commands directly from an SPDK application by running
`scripts/rpc.py rpc_get_methods`. Detailed help for each command can be
displayed by adding the `-h` flag as a command parameter.
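
For example, the following sequence shows how to discover the available commands
and get detailed help for one of them (here `get_bdevs` is used purely as an
illustration):

~~~
scripts/rpc.py --help              # list all available RPC commands
scripts/rpc.py rpc_get_methods     # ask a running SPDK app which RPCs it supports
scripts/rpc.py get_bdevs -h        # detailed help for a single command
~~~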

# General Purpose RPCs {#bdev_ug_general_rpcs}

## get_bdevs {#bdev_ug_get_bdevs}

A list of the currently available block devices, including detailed information
about them, can be retrieved using the `get_bdevs` RPC command. Users can add the
optional parameter `name` to get details about the bdev specified by that name.
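
Example command

`rpc.py get_bdevs -b Malloc0`

This will show details for the `Malloc0` bdev only; running the command without
arguments lists all bdevs. (The `-b` short option is an assumption here;
`rpc.py get_bdevs -h` shows the exact spelling of the name parameter.)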

Example response

~~~
{
  "num_blocks": 32768,
  "assigned_rate_limits": {
    "rw_ios_per_sec": 10000,
    "rw_mbytes_per_sec": 20
  },
  "supported_io_types": {
    "reset": true,
    "nvme_admin": false,
    "unmap": true,
    "read": true,
    "write_zeroes": true,
    "write": true,
    "flush": true,
    "nvme_io": false
  },
  "driver_specific": {},
  "claimed": false,
  "block_size": 4096,
  "product_name": "Malloc disk",
  "name": "Malloc0"
}
~~~

## set_bdev_qos_limit {#set_bdev_qos_limit}

Users can use the `set_bdev_qos_limit` RPC command to enable, adjust, and disable
rate limits on an existing bdev. Two types of rate limits are supported:
IOPS and bandwidth. The rate limits can be enabled, adjusted, and disabled at any
time for the specified bdev. The bdev name is a required parameter for this
RPC command, and at least one of `rw_ios_per_sec` and `rw_mbytes_per_sec` must be
specified. When both rate limits are enabled, whichever limit is reached first
takes effect. A value of 0 may be specified to disable the corresponding rate
limit. Users can run this command with `-h` or `--help` for more information.
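
Example command

`rpc.py set_bdev_qos_limit Malloc0 --rw_ios_per_sec 10000 --rw_mbytes_per_sec 20`

This would cap `Malloc0` at 10000 read/write IOPS and 20 MB/s of read/write
bandwidth, whichever is reached first. (The long-option spellings shown are
assumptions based on the parameter names above; `rpc.py set_bdev_qos_limit -h`
shows the exact syntax.)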

## Histograms {#rpc_bdev_histogram}

The `enable_bdev_histogram` RPC command allows enabling or disabling the gathering
of latency data for a specified bdev. The histogram can be downloaded by
calling `get_bdev_histogram` and parsed using the `scripts/histogram.py` script.

Example command

`rpc.py enable_bdev_histogram Nvme0n1 --enable`

This command will enable gathering of histogram data on the Nvme0n1 device.

`rpc.py get_bdev_histogram Nvme0n1 | histogram.py`

This command will download the gathered histogram data. The script will parse
the data and show a table containing the I/O count for each latency range.

`rpc.py enable_bdev_histogram Nvme0n1 --disable`

This command will disable the histogram on the Nvme0n1 device.

# Ceph RBD {#bdev_config_rbd}

The SPDK RBD bdev driver provides SPDK block layer access to Ceph RADOS block
devices (RBD). Ceph RBD devices are accessed via the librbd and librados
libraries, which provide access to the RADOS block devices exported by Ceph.
To create a Ceph bdev, the RPC command `construct_rbd_bdev` should be used.

Example command

`rpc.py construct_rbd_bdev rbd foo 512`

This command will create a bdev that represents the 'foo' image from a pool called 'rbd', using a block size of 512 bytes.

To remove a block device representation use the `delete_rbd_bdev` command.

`rpc.py delete_rbd_bdev Rbd0`

# Crypto Virtual Bdev Module {#bdev_config_crypto}

The crypto virtual bdev module can be configured to provide at-rest data encryption
for any underlying bdev. The module relies on the DPDK CryptoDev framework to provide
all cryptographic functionality. The framework provides support for many software-only
cryptographic modules as well as hardware-assisted support for the Intel QAT board. The
framework also provides support for cipher, hash, authentication, and AEAD functions. At this
time the SPDK virtual bdev module supports ciphers only, as follows:

- AES-NI Multi Buffer Crypto Poll Mode Driver: RTE_CRYPTO_CIPHER_AES128_CBC
- Intel(R) QuickAssist (QAT) Crypto Poll Mode Driver: RTE_CRYPTO_CIPHER_AES128_CBC
(Note: QAT is functional, however it is marked as experimental until the hardware has
been fully integrated with the SPDK CI system.)

In order to support using the bdev block offset (LBA) as the initialization vector (IV),
the crypto module breaks up all I/O into crypto operations of a size equal to the block
size of the underlying bdev. For example, a 4K I/O to a bdev with a 512B block size
would result in 8 cryptographic operations.

For reads, the buffer provided to the crypto module will be used as the destination buffer
for unencrypted data. For writes, however, a temporary scratch buffer is used as the
destination buffer for encryption, which is then passed on to the underlying bdev as the
write buffer. This is done to avoid encrypting the data in the original source buffer, which
may cause problems in some use cases.

Example command

`rpc.py construct_crypto_bdev -b NVMe1n1 -c CryNvmeA -d crypto_aesni_mb -k 0123456789123456`

This command will create a crypto vbdev called 'CryNvmeA' on top of the NVMe bdev
'NVMe1n1' and will use the DPDK software driver 'crypto_aesni_mb' and the key
'0123456789123456'.

To remove the vbdev use the `delete_crypto_bdev` command.

`rpc.py delete_crypto_bdev CryNvmeA`

# GPT (GUID Partition Table) {#bdev_config_gpt}

The GPT virtual bdev driver is enabled by default and does not require any configuration.
It will automatically detect @ref bdev_ug_gpt on any attached bdev and will create
virtual bdevs, possibly more than one, for the partitions it finds.

## SPDK GPT partition table {#bdev_ug_gpt}

The SPDK partition type GUID is `7c5222bd-8f5d-4087-9c00-bf9843c7b58c`. Existing SPDK bdevs
can be exposed as Linux block devices via NBD and can then be partitioned with
standard partitioning tools. After partitioning, the bdevs will need to be deleted and
attached again for the GPT bdev module to see any changes. The NBD kernel module must be
loaded first. To create an NBD device, the user should use the `start_nbd_disk` RPC command.

Example command

`rpc.py start_nbd_disk Malloc0 /dev/nbd0`

This will expose an SPDK bdev `Malloc0` under the `/dev/nbd0` block device.

To remove an NBD device, the user should use the `stop_nbd_disk` RPC command.

Example command

`rpc.py stop_nbd_disk /dev/nbd0`

To display the full NBD device list, or a specified device, the user should use the
`get_nbd_disks` RPC command.

Example command

`rpc.py get_nbd_disks -n /dev/nbd0`

## Creating a GPT partition table using NBD {#bdev_ug_gpt_create_part}

~~~
# Expose bdev Nvme0n1 as kernel block device /dev/nbd0 by JSON-RPC
rpc.py start_nbd_disk Nvme0n1 /dev/nbd0

# Create GPT partition table.
parted -s /dev/nbd0 mklabel gpt

# Add a partition consuming 50% of the available space.
parted -s /dev/nbd0 mkpart MyPartition '0%' '50%'

# Change the partition type to the SPDK GUID.
# sgdisk is part of the gdisk package.
sgdisk -t 1:7c5222bd-8f5d-4087-9c00-bf9843c7b58c /dev/nbd0

# Stop the NBD device (stop exporting /dev/nbd0).
rpc.py stop_nbd_disk /dev/nbd0

# Now Nvme0n1 is configured with a GPT partition table, and
# the first partition will be automatically exposed as
# Nvme0n1p1 in SPDK applications.
~~~

# iSCSI bdev {#bdev_config_iscsi}

The SPDK iSCSI bdev driver depends on libiscsi and hence is not enabled by default.
In order to use it, build SPDK with the extra `--with-iscsi-initiator` configure option.

The following command creates an `iSCSI0` bdev from a single LUN exposed at the given
iSCSI URL, with `iqn.2016-06.io.spdk:init` as the reported initiator IQN.

`rpc.py construct_iscsi_bdev -b iSCSI0 -i iqn.2016-06.io.spdk:init --url iscsi://127.0.0.1/iqn.2016-06.io.spdk:disk1/0`

The URL is in the following format:
`iscsi://[<username>[%<password>]@]<host>[:<port>]/<target-iqn>/<lun>`
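
For example, a URL carrying CHAP credentials for a target at 10.0.0.1 could look as
follows (the host, credentials, and target IQN here are purely illustrative):

`iscsi://user%secret@10.0.0.1:3260/iqn.2016-06.io.spdk:disk1/0`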

# Linux AIO bdev {#bdev_config_aio}

The SPDK AIO bdev driver provides SPDK block layer access to Linux kernel block
devices, or to a file on a Linux filesystem, via Linux AIO. Note that O_DIRECT is
used, which bypasses the Linux page cache. This mode is probably as close to
a typical kernel-based target as a user space target can get without using a
user-space driver. To create an AIO bdev, the RPC command `construct_aio_bdev`
should be used.

Example commands

`rpc.py construct_aio_bdev /dev/sda aio0`

This command will create an `aio0` device from `/dev/sda`.

`rpc.py construct_aio_bdev /tmp/file file 8192`

This command will create a `file` device with a block size of 8192 from `/tmp/file`.

To delete an aio bdev use the `delete_aio_bdev` command.

`rpc.py delete_aio_bdev aio0`

# OCF Virtual bdev {#bdev_config_cas}

The OCF virtual bdev module is based on the [Open CAS Framework](https://github.com/Open-CAS/ocf), a
high-performance block storage caching meta-library.
To enable the module, configure SPDK with the `--with-ocf` flag.
An OCF bdev can be used to enable caching for any underlying bdev.

Below is an example command for creating an OCF bdev:

`rpc.py construct_ocf_bdev Cache1 wt Malloc0 Nvme0n1`

This command will create a new OCF bdev `Cache1` with bdev `Malloc0` as the caching
device, `Nvme0n1` as the core device, and an initial cache mode of write-through (`wt`).
`Malloc0` will be used as a cache for `Nvme0n1`, so data written to `Cache1` will
eventually be present on `Nvme0n1`.
By default, OCF will be configured with a cache line size of 4KiB
and non-volatile metadata disabled.

To remove `Cache1`:

`rpc.py delete_ocf_bdev Cache1`

During removal, the OCF cache will be stopped and all cached data will be written to the core device.

Note that OCF has a per-device RAM requirement
of about 56000 + _cache device size_ * 58 / _cache line size_ (in bytes).
For more information on OCF,
please visit the [OCF documentation](https://open-cas.github.io/).
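
As a rough worked example of that formula (the device size is illustrative): a
100 GiB cache device with the default 4 KiB cache line size needs about
107374182400 * 58 / 4096 ≈ 1.4 GiB of RAM, plus the 56000-byte fixed overhead.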

# Malloc bdev {#bdev_config_malloc}

Malloc bdevs are ramdisks. Because of their nature, they are volatile. They are
created from the hugepage memory given to the SPDK application.
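
A Malloc bdev can be created with the `construct_malloc_bdev` RPC command, for example:

`rpc.py construct_malloc_bdev -b Malloc0 64 512`

This is assumed to create a 64 MiB ramdisk named `Malloc0` with a 512-byte block
size; `rpc.py construct_malloc_bdev -h` shows the exact meaning and units of the
size arguments.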

# Null {#bdev_config_null}

The SPDK null bdev driver is a dummy block I/O target that discards all writes and returns undefined
data for reads. It is useful for benchmarking the rest of the bdev I/O stack with minimal block
device overhead, and for testing configurations that can't easily be created with the Malloc bdev.
To create a Null bdev, the RPC command `construct_null_bdev` should be used.

Example command

`rpc.py construct_null_bdev Null0 8589934592 4096`

This command will create an 8 petabyte `Null0` device with a block size of 4096.

To delete a null bdev use the `delete_null_bdev` command.

`rpc.py delete_null_bdev Null0`

# NVMe bdev {#bdev_config_nvme}

There are two ways to create a block device based on an NVMe device in SPDK: the
first is to attach a local PCIe drive, and the second is to connect to a remote
NVMe-oF device. In both cases, the user should use the `construct_nvme_bdev` RPC command.

Example commands

`rpc.py construct_nvme_bdev -b NVMe1 -t PCIe -a 0000:01:00.0`

This command will create an NVMe bdev for the physical device at PCI address 0000:01:00.0.

`rpc.py construct_nvme_bdev -b Nvme0 -t RDMA -a 192.168.100.1 -f IPv4 -s 4420 -n nqn.2016-06.io.spdk:cnode1`

This command will create an NVMe bdev for an NVMe-oF resource. Here `-b` is the bdev
name prefix, `-t` the transport type, `-a` the target address, `-f` the address family,
`-s` the transport service ID (port), and `-n` the subsystem NQN.

To remove an NVMe controller use the `delete_nvme_controller` command.

`rpc.py delete_nvme_controller Nvme0`

This command will remove the NVMe controller named Nvme0.

# Logical volumes {#bdev_ug_logical_volumes}

The Logical Volumes library is a flexible storage space management system. It allows
creating and managing virtual block devices with variable size on top of other bdevs.
The SPDK Logical Volume library is built on top of @ref blob. For a detailed
description, please refer to @ref lvol.

## Logical volume store {#bdev_ug_lvol_store}

Before creating any logical volumes (lvols), an lvol store has to be created on the
selected block device. The lvol store is the container for lvols; it is responsible
for assigning underlying bdev space to lvol bdevs and for storing metadata. To create
an lvol store, the user should use the `construct_lvol_store` RPC command.

Example command

`rpc.py construct_lvol_store Malloc2 lvs -c 4096`

This will create an lvol store named `lvs` with a cluster size of 4096 bytes, built
on top of the `Malloc2` bdev. In response, the user will be provided with a UUID,
the unique lvol store identifier.

Users can get the list of available lvol stores using the `get_lvol_stores` RPC
command (it takes no parameters).
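
Example command

`rpc.py get_lvol_stores`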

Example response

~~~
{
  "uuid": "330a6ab2-f468-11e7-983e-001e67edf35d",
  "base_bdev": "Malloc2",
  "free_clusters": 8190,
  "cluster_size": 8192,
  "total_data_clusters": 8190,
  "block_size": 4096,
  "name": "lvs"
}
~~~

To delete an lvol store, the user should use the `destroy_lvol_store` RPC command.

Example commands

`rpc.py destroy_lvol_store -u 330a6ab2-f468-11e7-983e-001e67edf35d`

`rpc.py destroy_lvol_store -l lvs`

## Lvols {#bdev_ug_lvols}

To create lvols on an existing lvol store, the user should use the
`construct_lvol_bdev` RPC command. Each created lvol will be represented by a new
bdev. The lvol store is selected either by name with `-l` or by UUID with `-u`.

Example commands

`rpc.py construct_lvol_bdev lvol1 25 -l lvs`

`rpc.py construct_lvol_bdev lvol2 25 -u 330a6ab2-f468-11e7-983e-001e67edf35d`
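
To remove an lvol, delete its bdev representation. In SPDK releases of this vintage
the command is expected to be `destroy_lvol_bdev` (an assumption worth verifying with
`rpc.py -h`, since RPC names have changed between versions):

`rpc.py destroy_lvol_bdev lvs/lvol1`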

# RAID {#bdev_ug_raid}

The RAID virtual bdev module provides functionality to combine any SPDK bdevs into
one RAID bdev. Currently SPDK supports only RAID 0. The RAID functionality does not
store on-disk metadata on the member disks, so the user must reconstruct the RAID
volume when restarting the application. The user may specify member disks to create
a RAID volume even if they do not exist yet - as the member disks are registered at
a later time, the RAID module will claim them and will surface the RAID volume
after all of the member disks are available. Disks of different sizes may be
used - the smallest disk size will be the amount of space used on
each member disk.

Example commands

`rpc.py construct_raid_bdev -n Raid0 -z 64 -r 0 -b "lvol0 lvol1 lvol2 lvol3"`

`rpc.py get_raid_bdevs`

`rpc.py destroy_raid_bdev Raid0`

In the `construct_raid_bdev` example, `-n` names the RAID bdev, `-z` sets the strip
size (in KiB), `-r` selects the RAID level (0 is the only supported level), and `-b`
lists the member bdevs.

# Passthru {#bdev_config_passthru}

The SPDK Passthru virtual block device module serves as an example of how to write a
virtual block device module. It implements the required functionality of a vbdev module
and demonstrates some other basic features, such as the use of per-I/O context.

Example commands

`rpc.py construct_passthru_bdev -b aio -p pt`

`rpc.py delete_passthru_bdev pt`

# Pmem {#bdev_config_pmem}

The SPDK pmem bdev driver uses a pmemblk pool as the target for block I/O operations.
For details on Pmem memory, please refer to the PMDK documentation at http://pmem.io.
First, the user needs to configure SPDK to include PMDK support:

`configure --with-pmdk`

To create a pmemblk pool for use with SPDK, the user should use the `create_pmem_pool` RPC command.

Example command

`rpc.py create_pmem_pool /path/to/pmem_pool 25 4096`

To get information on a created pmem pool file, the user can use the `pmem_pool_info` RPC command.

Example command

`rpc.py pmem_pool_info /path/to/pmem_pool`

To remove a pmem pool file, the user can use the `delete_pmem_pool` RPC command.

Example command

`rpc.py delete_pmem_pool /path/to/pmem_pool`

To create a bdev based on a pmemblk pool file, the user should use the
`construct_pmem_bdev` RPC command.

Example command

`rpc.py construct_pmem_bdev /path/to/pmem_pool -n pmem`

To remove a block device representation use the `delete_pmem_bdev` command.

`rpc.py delete_pmem_bdev pmem`

# Virtio Block {#bdev_config_virtio_blk}

The Virtio-Block driver allows creating SPDK bdevs from Virtio-Block devices.

The following command creates a Virtio-Block device named `VirtioBlk0` from a vhost-user
socket `/tmp/vhost.0` exposed directly by SPDK @ref vhost. The optional `vq-count` and
`vq-size` parameters specify the number of request queues and the queue depth to be used.

`rpc.py construct_virtio_dev --dev-type blk --trtype user --traddr /tmp/vhost.0 --vq-count 2 --vq-size 512 VirtioBlk0`

The driver can also be used inside QEMU-based VMs. The following command creates a Virtio
Block device named `VirtioBlk1` from a Virtio PCI device at address `0000:01:00.0`.
The entire configuration will be read automatically from PCI Configuration Space. It will
reflect all parameters passed to QEMU's vhost-user-blk-pci device.

`rpc.py construct_virtio_dev --dev-type blk --trtype pci --traddr 0000:01:00.0 VirtioBlk1`

Virtio-Block devices can be removed with the following command

`rpc.py remove_virtio_bdev VirtioBlk0`

# Virtio SCSI {#bdev_config_virtio_scsi}

The Virtio-SCSI driver allows creating SPDK block devices from Virtio-SCSI LUNs.

Virtio-SCSI bdevs are constructed the same way as Virtio-Block ones.

`rpc.py construct_virtio_dev --dev-type scsi --trtype user --traddr /tmp/vhost.0 --vq-count 2 --vq-size 512 VirtioScsi0`

`rpc.py construct_virtio_dev --dev-type scsi --trtype pci --traddr 0000:01:00.0 VirtioScsi0`

Each Virtio-SCSI device may export up to 64 block devices named VirtioScsi0t0 ~ VirtioScsi0t63,
one LUN (LUN0) per SCSI device. The above two commands will output the names of all exposed bdevs.

Virtio-SCSI devices can be removed with the following command

`rpc.py remove_virtio_bdev VirtioScsi0`

Removing a Virtio-SCSI device will destroy all its bdevs.