# Writing a Custom Block Device Module {#bdev_module}

## Target Audience

This programming guide is intended for developers authoring their own block
device modules to integrate with SPDK's bdev layer. For a guide on how to use
the bdev layer, see @ref bdev_pg.

## Introduction

A block device module is SPDK's equivalent of a device driver in a traditional
operating system. The module provides a set of function pointers that are
called to service block device I/O requests. SPDK provides a number of block
device modules including NVMe, RAM-disk, and Ceph RBD. However, some users
will want to write their own to interact with either custom hardware or an
existing storage software stack. This guide demonstrates how to write such a
module.

## Creating A New Module

Block device modules are currently located in subdirectories under lib/bdev. It
is not currently possible to place the code for a bdev module elsewhere, but
the build system could be updated to enable this in the future. To create a
module, add a new directory with a single C file and a Makefile. A good
starting point is to copy the existing 'null' bdev module.

The primary interface that bdev modules interact with is in
include/spdk/bdev_module.h. That header defines a macro,
SPDK_BDEV_MODULE_REGISTER, that registers a new bdev module. The macro takes as
its argument a pointer to the spdk_bdev_module structure that describes the
module being registered.

The spdk_bdev_module structure describes the module's properties: its
initialization (`module_init`) and teardown (`module_fini`) functions, the
function that returns the context size (`get_ctx_size`), which is the scratch
space allocated in each I/O request for use by this module, and the callbacks
that are invoked each time a new bdev is registered by another module
(`examine_config` and `examine_disk`). See the documentation of
struct spdk_bdev_module for more details.

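As a rough illustration of how these pieces fit together, the sketch below
fills in a module descriptor. It is deliberately self-contained:
`struct demo_bdev_module`, `struct demo_io_ctx`, and every `demo_*` function
are hypothetical stand-ins that only mirror the field names mentioned above.
A real module would use struct spdk_bdev_module from
include/spdk/bdev_module.h and register it with SPDK_BDEV_MODULE_REGISTER.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct spdk_bdev_module. Hypothetical: only the fields named
 * in the text are modeled; the real layout lives in
 * include/spdk/bdev_module.h. */
struct demo_bdev_module {
	const char *name;
	int (*module_init)(void);
	void (*module_fini)(void);
	int (*get_ctx_size)(void);
};

/* Per-I/O scratch space this module wants inside every I/O request. */
struct demo_io_ctx {
	uint64_t submit_tick;
	int retries;
};

static int demo_initialized;

/* Module-wide setup; returning 0 signals success in this sketch. */
static int demo_init(void)
{
	demo_initialized = 1;
	return 0;
}

/* Module-wide teardown. */
static void demo_fini(void)
{
	demo_initialized = 0;
}

/* Tells the generic layer how much scratch space to reserve per I/O. */
static int demo_get_ctx_size(void)
{
	return (int)sizeof(struct demo_io_ctx);
}

struct demo_bdev_module demo_if = {
	.name = "demo",
	.module_init = demo_init,
	.module_fini = demo_fini,
	.get_ctx_size = demo_get_ctx_size,
};
```

Because the generic layer sizes the per-I/O scratch space from
`get_ctx_size`, the module does not need to allocate its own context on
every request.
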
## Creating Bdevs

New bdevs are created within the module by calling spdk_bdev_register(). The
module must allocate a struct spdk_bdev, fill it out appropriately, and pass
it to the register call. The most important field to fill out is `fn_table`,
which points at this data structure:

~~~{.c}
/*
 * Function table for a block device backend.
 *
 * The backend block device function table provides a set of APIs to allow
 * communication with a backend. The main commands are read/write API
 * calls for I/O via submit_request.
 */
struct spdk_bdev_fn_table {
	/* Destroy the backend block device object */
	int (*destruct)(void *ctx);

	/* Process the IO. */
	void (*submit_request)(struct spdk_io_channel *ch, struct spdk_bdev_io *);

	/* Check if the block device supports a specific I/O type. */
	bool (*io_type_supported)(void *ctx, enum spdk_bdev_io_type);

	/* Get an I/O channel for the specific bdev for the calling thread. */
	struct spdk_io_channel *(*get_io_channel)(void *ctx);

	/*
	 * Output driver-specific configuration to a JSON stream. Optional - may be NULL.
	 *
	 * The JSON write context will be initialized with an open object, so the bdev
	 * driver should write a name (based on the driver name) followed by a JSON value
	 * (most likely another nested object).
	 */
	int (*dump_config_json)(void *ctx, struct spdk_json_write_ctx *w);

	/* Get spin-time per I/O channel in microseconds.
	 * Optional - may be NULL.
	 */
	uint64_t (*get_spin_time)(struct spdk_io_channel *ch);
};
~~~

The bdev module must implement these function callbacks, except for those
marked optional.

The `destruct` function is called to tear down the device when the system no
longer needs it. What `destruct` does is up to the module: it may simply free
memory, or it may shut down a piece of hardware.

The `io_type_supported` function returns whether a particular I/O type is
supported. The available I/O types are:

~~~{.c}
/** bdev I/O type */
enum spdk_bdev_io_type {
	SPDK_BDEV_IO_TYPE_INVALID = 0,
	SPDK_BDEV_IO_TYPE_READ,
	SPDK_BDEV_IO_TYPE_WRITE,
	SPDK_BDEV_IO_TYPE_UNMAP,
	SPDK_BDEV_IO_TYPE_FLUSH,
	SPDK_BDEV_IO_TYPE_RESET,
	SPDK_BDEV_IO_TYPE_NVME_ADMIN,
	SPDK_BDEV_IO_TYPE_NVME_IO,
	SPDK_BDEV_IO_TYPE_NVME_IO_MD,
	SPDK_BDEV_IO_TYPE_WRITE_ZEROES,
};
~~~

For the simplest bdev modules, only `SPDK_BDEV_IO_TYPE_READ` and
`SPDK_BDEV_IO_TYPE_WRITE` are necessary. `SPDK_BDEV_IO_TYPE_UNMAP` is often
referred to as "trim" or "deallocate", and is a request to mark a set of
blocks as no longer containing valid data. `SPDK_BDEV_IO_TYPE_FLUSH` is a
request to make all previously completed writes durable. Many devices do not
require flushes. `SPDK_BDEV_IO_TYPE_WRITE_ZEROES` is just like a regular
write, but does not provide a data buffer (it would have contained only
zeroes). If it isn't supported, the generic bdev code is capable of emulating
it by sending regular write requests.

`SPDK_BDEV_IO_TYPE_RESET` is a request to abort all I/O and return the
underlying device to its initial state. Do not complete the reset request
until all outstanding I/O has been completed in some way.

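One way to honor the reset rule is to count outstanding I/O and hold the
reset until that count reaches zero. The following is only a minimal sketch
under that assumption: `struct demo_dev` and the `demo_*` functions are
hypothetical and not part of SPDK, and a real module would complete the reset
request with spdk_bdev_io_complete() at the point marked in the comment.

```c
#include <stdbool.h>

/* Hypothetical per-device bookkeeping; not an SPDK type. */
struct demo_dev {
	int outstanding;      /* I/O submitted but not yet completed */
	bool reset_pending;   /* a reset request is waiting to finish */
	bool reset_completed; /* set once the reset itself is completed */
};

/* Finish the reset only when no other I/O is still in flight. */
static void demo_try_finish_reset(struct demo_dev *dev)
{
	if (dev->reset_pending && dev->outstanding == 0) {
		dev->reset_pending = false;
		/* A real module would complete the reset request here. */
		dev->reset_completed = true;
	}
}

/* Account for a newly submitted read or write. */
void demo_submit_io(struct demo_dev *dev)
{
	dev->outstanding++;
}

/* Account for a finished (or aborted) read or write. */
void demo_complete_io(struct demo_dev *dev)
{
	dev->outstanding--;
	demo_try_finish_reset(dev);
}

/* Accept a reset; it completes immediately only if the device is idle. */
void demo_submit_reset(struct demo_dev *dev)
{
	dev->reset_pending = true;
	dev->reset_completed = false;
	demo_try_finish_reset(dev);
}
```

With two I/Os in flight, a reset submitted through `demo_submit_reset()`
stays pending until `demo_complete_io()` has been called for both of them.
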
`SPDK_BDEV_IO_TYPE_NVME_ADMIN`, `SPDK_BDEV_IO_TYPE_NVME_IO`, and
`SPDK_BDEV_IO_TYPE_NVME_IO_MD` are all mechanisms for passing raw NVMe
commands through the SPDK bdev layer. They're strictly optional, and it
probably only makes sense to implement them if the backing storage device is
capable of handling NVMe commands.

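The I/O type discussion above can be sketched as an `io_type_supported`
callback for the simplest kind of module, one that handles only reads and
writes. The enum is repeated locally so that the example compiles on its own
(a real module would take it from the SPDK headers), and
`demo_io_type_supported` is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local copy of the enum listed earlier, so the sketch is self-contained. */
enum spdk_bdev_io_type {
	SPDK_BDEV_IO_TYPE_INVALID = 0,
	SPDK_BDEV_IO_TYPE_READ,
	SPDK_BDEV_IO_TYPE_WRITE,
	SPDK_BDEV_IO_TYPE_UNMAP,
	SPDK_BDEV_IO_TYPE_FLUSH,
	SPDK_BDEV_IO_TYPE_RESET,
	SPDK_BDEV_IO_TYPE_NVME_ADMIN,
	SPDK_BDEV_IO_TYPE_NVME_IO,
	SPDK_BDEV_IO_TYPE_NVME_IO_MD,
	SPDK_BDEV_IO_TYPE_WRITE_ZEROES,
};

/* Hypothetical callback for a backend that services reads and writes only. */
bool demo_io_type_supported(void *ctx, enum spdk_bdev_io_type io_type)
{
	(void)ctx; /* this sketch keeps no per-device state */

	switch (io_type) {
	case SPDK_BDEV_IO_TYPE_READ:
	case SPDK_BDEV_IO_TYPE_WRITE:
		return true;
	default:
		/* Everything else, including WRITE_ZEROES and NVMe passthrough. */
		return false;
	}
}
```

Reporting `SPDK_BDEV_IO_TYPE_WRITE_ZEROES` as unsupported is acceptable for
such a module, because the generic bdev code can emulate it with regular
writes as described above.
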
The `get_io_channel` function should return an I/O channel. For a detailed
explanation of I/O channels, see @ref concurrency. The generic bdev layer will
call `get_io_channel` one time per thread, cache the result, and pass that
result to `submit_request`. When it calls `submit_request` on a given thread,
it passes the channel that was cached for that thread.

The `submit_request` function is called to actually submit I/O requests to the
block device. Once the I/O request is completed, the module must call
spdk_bdev_io_complete(). The I/O does not have to finish within the calling
context of `submit_request`.

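To make the last point concrete, here is a minimal sketch of deferred
completion: the submit path only queues the request, and a later polling step
on the same thread completes it. All names are hypothetical stand-ins.
`struct demo_io` plays the role of struct spdk_bdev_io, `struct demo_channel`
the role of the per-thread channel context, and `demo_io_complete()` the role
of spdk_bdev_io_complete().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an I/O request. */
struct demo_io {
	struct demo_io *next;
	bool completed;
	bool success;
};

/* Hypothetical per-thread channel: a FIFO of submitted, unfinished I/O. */
struct demo_channel {
	struct demo_io *head;
	struct demo_io *tail;
};

/* Stand-in for the completion call: records the final status. */
void demo_io_complete(struct demo_io *io, bool success)
{
	io->completed = true;
	io->success = success;
}

/* Analogue of submit_request: queue the I/O and return immediately. */
void demo_submit_request(struct demo_channel *ch, struct demo_io *io)
{
	io->next = NULL;
	io->completed = false;
	if (ch->tail != NULL) {
		ch->tail->next = io;
	} else {
		ch->head = io;
	}
	ch->tail = io;
}

/* Runs later on the same thread and completes everything that was queued.
 * Returns the number of requests completed by this call. */
int demo_poll(struct demo_channel *ch)
{
	int count = 0;

	while (ch->head != NULL) {
		struct demo_io *io = ch->head;

		ch->head = io->next;
		if (ch->head == NULL) {
			ch->tail = NULL;
		}
		demo_io_complete(io, true);
		count++;
	}
	return count;
}
```

Keeping the queue inside the channel means each thread touches only its own
list, so this sketch needs no locking.
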
## Creating Virtual Bdevs

Block devices are considered virtual if they handle I/O requests by routing
the I/O to other block devices. The canonical example would be a bdev module
that implements RAID. Virtual bdevs are created in the same way as regular
bdevs, but take one additional step. The module can look up the underlying
bdevs it wishes to route I/O to using spdk_bdev_get_by_name(), where the string
name is provided by the user in a configuration file or via an RPC. The module
may then proceed as normal by opening the bdev to obtain a descriptor, and
creating I/O channels for the bdev (probably in response to the
`get_io_channel` callback). The final step is to have the module use its open
descriptor to call spdk_bdev_module_claim_bdev(), indicating that it is
consuming the underlying bdev. This prevents other users from opening
descriptors with write permissions. It effectively 'promotes' the descriptor
to write-exclusive and is an operation only available to bdev modules.
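
The effect of a claim can be illustrated with a toy model. This is not SPDK
code: `struct demo_bdev`, `demo_open()`, and `demo_claim()` are hypothetical,
and the `-EPERM` return value is an assumption chosen for the sketch. The
model only captures the rule stated above, namely that once a module has
claimed a bdev, other users can no longer open it for writing.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a bdev's claim state; not SPDK's actual data structure. */
struct demo_bdev {
	const char *name;
	const void *claimed_by; /* module that claimed the bdev, or NULL */
};

/* Analogue of opening a descriptor: writers are refused once claimed. */
int demo_open(const struct demo_bdev *bdev, bool write)
{
	if (write && bdev->claimed_by != NULL) {
		return -EPERM;
	}
	return 0;
}

/* Analogue of the claim step: only one module may claim a given bdev. */
int demo_claim(struct demo_bdev *bdev, const void *module)
{
	if (bdev->claimed_by != NULL) {
		return -EPERM;
	}
	bdev->claimed_by = module;
	return 0;
}
```

In this model a RAID-like module would open each underlying bdev, claim it,
and from then on be the only writer, while read-only opens by other users
still succeed.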