# Writing a Custom Block Device Module {#bdev_module}

## Target Audience

This programming guide is intended for developers authoring their own block
device modules to integrate with SPDK's bdev layer. For a guide on how to use
the bdev layer, see @ref bdev_pg.

## Introduction

A block device module is SPDK's equivalent of a device driver in a traditional
operating system. The module provides a set of function pointers that are
called to service block device I/O requests. SPDK provides a number of block
device modules, including NVMe, RAM-disk, and Ceph RBD. However, some users
will want to write their own to interact with either custom hardware or an
existing storage software stack. This guide demonstrates how to write such a
module.

## Creating A New Module

Block device modules are located in subdirectories under lib/bdev today. It is not
currently possible to place the code for a bdev module elsewhere, but updates
to the build system could be made to enable this in the future. To create a
module, add a new directory with a single C file and a Makefile. A great
starting point is to copy the existing 'null' bdev module.

The primary interface that bdev modules interact with is in
include/spdk/bdev_module.h. That header defines the macro
SPDK_BDEV_MODULE_REGISTER, which registers a new bdev module. The macro takes
as an argument a pointer to the spdk_bdev_module structure that describes the
module being registered.

The spdk_bdev_module structure describes module properties such as the
initialization (`module_init`) and teardown (`module_fini`) functions; the
function that returns the context size (`get_ctx_size`), i.e. the size of the
scratch space allocated in each I/O request for use by this module; and the
callbacks that are called each time another module registers a new bdev
(`examine_config` and `examine_disk`). See the documentation of
struct spdk_bdev_module for more details.
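
To make the registration step more concrete, the sketch below models a module descriptor and a registration macro using hypothetical, simplified stand-ins (`struct demo_bdev_module`, `DEMO_BDEV_MODULE_REGISTER`, `demo_bdev_module_find`). It assumes, as SPDK's own macro appears to do, that registration is implemented with a GCC/Clang `constructor` function that adds the module to a global list before `main()` runs. It is not the real SPDK definition; check include/spdk/bdev_module.h for the exact macro signature in your release.

~~~{.c}
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct spdk_bdev_module;
 * the real structure has more fields. */
struct demo_bdev_module {
	const char *name;
	int (*module_init)(void);
	void (*module_fini)(void);
	int (*get_ctx_size)(void);
	struct demo_bdev_module *next;	/* link in the global module list */
};

/* Global list of registered modules, populated before main() runs. */
static struct demo_bdev_module *g_demo_modules;

static void
demo_bdev_module_list_add(struct demo_bdev_module *module)
{
	assert(module != NULL);
	module->next = g_demo_modules;
	g_demo_modules = module;
}

/* Hypothetical registration macro: expands to a constructor function
 * (GCC/Clang extension) that adds the module to the list at load time. */
#define DEMO_BDEV_MODULE_REGISTER(name, module)			\
	static void __attribute__((constructor))		\
	demo_bdev_module_register_##name(void)			\
	{							\
		demo_bdev_module_list_add(module);		\
	}

/* Per-I/O scratch space this module wants reserved in every request. */
struct demo_null_io_ctx {
	int status;
};

static int demo_null_init(void) { return 0; }
static void demo_null_fini(void) { }
static int demo_null_get_ctx_size(void) { return (int)sizeof(struct demo_null_io_ctx); }

static struct demo_bdev_module demo_null_if = {
	.name = "demo_null",
	.module_init = demo_null_init,
	.module_fini = demo_null_fini,
	.get_ctx_size = demo_null_get_ctx_size,
};

DEMO_BDEV_MODULE_REGISTER(demo_null, &demo_null_if)

/* Find a registered module by name; returns NULL if none matches. */
static struct demo_bdev_module *
demo_bdev_module_find(const char *name)
{
	for (struct demo_bdev_module *m = g_demo_modules; m != NULL; m = m->next) {
		if (strcmp(m->name, name) == 0) {
			return m;
		}
	}
	return NULL;
}
~~~

In this sketch registration happens at load time, so linking the module's object file into the program is enough for the module to appear in the list. No explicit call from the application is needed.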

## Creating Bdevs

New bdevs are created within the module by calling spdk_bdev_register(). The
module must allocate a struct spdk_bdev, fill it out appropriately, and pass
it to the register call. The most important field to fill out is `fn_table`,
which points at this data structure:

~~~{.c}
/*
 * Function table for a block device backend.
 *
 * The backend block device function table provides a set of APIs to allow
 * communication with a backend. The main commands are read/write API
 * calls for I/O via submit_request.
 */
struct spdk_bdev_fn_table {
	/* Destroy the backend block device object */
	int (*destruct)(void *ctx);

	/* Process the IO. */
	void (*submit_request)(struct spdk_io_channel *ch, struct spdk_bdev_io *);

	/* Check if the block device supports a specific I/O type. */
	bool (*io_type_supported)(void *ctx, enum spdk_bdev_io_type);

	/* Get an I/O channel for the specific bdev for the calling thread. */
	struct spdk_io_channel *(*get_io_channel)(void *ctx);

	/*
	 * Output driver-specific configuration to a JSON stream. Optional - may be NULL.
	 *
	 * The JSON write context will be initialized with an open object, so the bdev
	 * driver should write a name (based on the driver name) followed by a JSON value
	 * (most likely another nested object).
	 */
	int (*dump_config_json)(void *ctx, struct spdk_json_write_ctx *w);

	/* Get spin-time per I/O channel in microseconds.
	 * Optional - may be NULL.
	 */
	uint64_t (*get_spin_time)(struct spdk_io_channel *ch);
};
~~~

The bdev module must implement these callbacks, except for `dump_config_json`
and `get_spin_time`, which are optional and may be NULL.

The `destruct` function is called to tear down the device when the system no
longer needs it. What `destruct` does is up to the module: it may simply free
memory, or it may shut down a piece of hardware.
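
As an illustration of the simplest case, here is a sketch of `destruct` for a purely software, RAM-backed module. It also includes a two-entry stand-in function table that shows how an optional entry left NULL can be handled. All names (`struct demo_ram_disk`, `struct demo_fn_table`, `demo_get_spin_time`) are hypothetical, and the convention that returning 0 means "destroyed synchronously" is an assumption of this sketch rather than a statement about SPDK.

~~~{.c}
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-bdev context of a RAM-backed module. */
struct demo_ram_disk {
	uint8_t *buf;		/* backing memory, num_blocks * block_size bytes */
	uint64_t num_blocks;
	uint32_t block_size;
};

static struct demo_ram_disk *
demo_ram_disk_create(uint64_t num_blocks, uint32_t block_size)
{
	struct demo_ram_disk *disk;

	assert(num_blocks > 0 && block_size > 0);
	disk = calloc(1, sizeof(*disk));
	if (disk == NULL) {
		return NULL;
	}
	disk->buf = calloc(num_blocks, block_size);
	if (disk->buf == NULL) {
		free(disk);
		return NULL;
	}
	disk->num_blocks = num_blocks;
	disk->block_size = block_size;
	return disk;
}

/* destruct: there is no hardware to shut down, so just free what create
 * allocated. This sketch always finishes synchronously and returns 0. */
static int
demo_ram_disk_destruct(void *ctx)
{
	struct demo_ram_disk *disk = ctx;

	free(disk->buf);
	free(disk);
	return 0;
}

/* Hypothetical two-entry stand-in for struct spdk_bdev_fn_table. */
struct demo_fn_table {
	int (*destruct)(void *ctx);		/* required */
	uint64_t (*get_spin_time)(void *ch);	/* optional, may be NULL */
};

static const struct demo_fn_table demo_ram_fn_table = {
	.destruct = demo_ram_disk_destruct,
	/* .get_spin_time is left NULL because it is optional. */
};

/* A caller has to check optional entries before using them. */
static uint64_t
demo_get_spin_time(const struct demo_fn_table *fn, void *ch)
{
	return fn->get_spin_time != NULL ? fn->get_spin_time(ch) : 0;
}
~~~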

The `io_type_supported` function returns whether a particular I/O type is
supported. The available I/O types are:

~~~{.c}
/** bdev I/O type */
enum spdk_bdev_io_type {
	SPDK_BDEV_IO_TYPE_INVALID = 0,
	SPDK_BDEV_IO_TYPE_READ,
	SPDK_BDEV_IO_TYPE_WRITE,
	SPDK_BDEV_IO_TYPE_UNMAP,
	SPDK_BDEV_IO_TYPE_FLUSH,
	SPDK_BDEV_IO_TYPE_RESET,
	SPDK_BDEV_IO_TYPE_NVME_ADMIN,
	SPDK_BDEV_IO_TYPE_NVME_IO,
	SPDK_BDEV_IO_TYPE_NVME_IO_MD,
	SPDK_BDEV_IO_TYPE_WRITE_ZEROES,
};
~~~
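
A minimal `io_type_supported` for a module that only services reads and writes might look like the following sketch. It carries a local copy of the enum shown above so that it builds without SPDK headers (real SPDK releases may define additional I/O types), and `demo_io_type_supported` is a hypothetical name.

~~~{.c}
#include <assert.h>
#include <stdbool.h>

/* Local copy of the enum listed above, so this sketch builds without SPDK. */
enum spdk_bdev_io_type {
	SPDK_BDEV_IO_TYPE_INVALID = 0,
	SPDK_BDEV_IO_TYPE_READ,
	SPDK_BDEV_IO_TYPE_WRITE,
	SPDK_BDEV_IO_TYPE_UNMAP,
	SPDK_BDEV_IO_TYPE_FLUSH,
	SPDK_BDEV_IO_TYPE_RESET,
	SPDK_BDEV_IO_TYPE_NVME_ADMIN,
	SPDK_BDEV_IO_TYPE_NVME_IO,
	SPDK_BDEV_IO_TYPE_NVME_IO_MD,
	SPDK_BDEV_IO_TYPE_WRITE_ZEROES,
};

/* io_type_supported for a module that only handles reads and writes.
 * ctx would normally be the module's per-bdev context; it is unused here. */
static bool
demo_io_type_supported(void *ctx, enum spdk_bdev_io_type io_type)
{
	(void)ctx;

	switch (io_type) {
	case SPDK_BDEV_IO_TYPE_READ:
	case SPDK_BDEV_IO_TYPE_WRITE:
		return true;
	default:
		/* Every other type is reported as unsupported. */
		return false;
	}
}
~~~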

For the simplest bdev modules, only `SPDK_BDEV_IO_TYPE_READ` and
`SPDK_BDEV_IO_TYPE_WRITE` are necessary. `SPDK_BDEV_IO_TYPE_UNMAP` is often
referred to as "trim" or "deallocate", and is a request to mark a set of
blocks as no longer containing valid data. `SPDK_BDEV_IO_TYPE_FLUSH` is a
request to make all previously completed writes durable. Many devices do not
require flushes. `SPDK_BDEV_IO_TYPE_WRITE_ZEROES` is just like a regular
write, but does not provide a data buffer (the buffer would have contained
only zeroes). If it isn't supported, the generic bdev code is capable of
emulating it by sending regular write requests.
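
The emulation mentioned above can be pictured with the following sketch. It covers the requested range with ordinary writes from a shared zero-filled buffer, a bounded number of blocks at a time. The write callback type, the chunk size, and the in-memory test device are all hypothetical. The actual generic bdev code is asynchronous and differs in its details.

~~~{.c}
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_BLOCK_SIZE 512u
#define DEMO_ZERO_CHUNK_BLOCKS 8u	/* max blocks per emulated write */

/* Hypothetical write primitive: writes num_blocks blocks from buf at offset_blocks. */
typedef int (*demo_write_fn)(void *dev, const void *buf,
			     uint64_t offset_blocks, uint64_t num_blocks);

/* Emulate WRITE_ZEROES with ordinary writes from a shared zero-filled buffer.
 * Returns 0, or the first error reported by write_fn. */
static int
demo_emulate_write_zeroes(void *dev, demo_write_fn write_fn,
			  uint64_t offset_blocks, uint64_t num_blocks)
{
	static const uint8_t zeroes[DEMO_ZERO_CHUNK_BLOCKS * DEMO_BLOCK_SIZE] = {0};

	while (num_blocks > 0) {
		uint64_t n = num_blocks < DEMO_ZERO_CHUNK_BLOCKS ? num_blocks : DEMO_ZERO_CHUNK_BLOCKS;
		int rc = write_fn(dev, zeroes, offset_blocks, n);

		if (rc != 0) {
			return rc;
		}
		offset_blocks += n;
		num_blocks -= n;
	}
	return 0;
}

/* Hypothetical 32-block in-memory device used to exercise the sketch. */
#define DEMO_DEV_BLOCKS 32u
struct demo_mem_dev {
	uint8_t data[DEMO_DEV_BLOCKS * DEMO_BLOCK_SIZE];
	unsigned writes;	/* number of write calls received */
};

static int
demo_mem_write(void *dev, const void *buf, uint64_t offset_blocks, uint64_t num_blocks)
{
	struct demo_mem_dev *d = dev;

	if (offset_blocks + num_blocks > DEMO_DEV_BLOCKS) {
		return -1;	/* out of range */
	}
	memcpy(d->data + offset_blocks * DEMO_BLOCK_SIZE, buf, num_blocks * DEMO_BLOCK_SIZE);
	d->writes++;
	return 0;
}
~~~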

`SPDK_BDEV_IO_TYPE_RESET` is a request to abort all I/O and return the
underlying device to its initial state. Do not complete the reset request
until all I/O has been completed in some way.
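
As a sketch of the reset rule above, a module can count its in-flight I/O and hold back the reset's completion until that count drops to zero. The names (`struct demo_dev_state`, `demo_submit_reset`, and so on) are hypothetical, and the callback stands in for completing the reset request. A single counter is also a simplification: a real module has to account for I/O on every channel of the bdev.

~~~{.c}
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical module state: in-flight I/O count and a pending reset. */
struct demo_dev_state {
	unsigned outstanding;		/* I/O submitted but not yet completed */
	bool reset_pending;
	void (*reset_done)(void *arg);	/* stand-in for completing the reset request */
	void *reset_arg;
};

static void
demo_io_started(struct demo_dev_state *s)
{
	s->outstanding++;
}

/* Complete a waiting reset once nothing is in flight any more. */
static void
demo_maybe_finish_reset(struct demo_dev_state *s)
{
	if (s->reset_pending && s->outstanding == 0) {
		s->reset_pending = false;
		s->reset_done(s->reset_arg);
	}
}

/* Called whenever one I/O finishes, whether it succeeded, failed or was aborted. */
static void
demo_io_finished(struct demo_dev_state *s)
{
	assert(s->outstanding > 0);
	s->outstanding--;
	demo_maybe_finish_reset(s);
}

/* Reset handler. A real module would also abort queued I/O and reinitialize
 * the device here; this sketch only defers the completion. */
static void
demo_submit_reset(struct demo_dev_state *s, void (*done)(void *arg), void *arg)
{
	s->reset_pending = true;
	s->reset_done = done;
	s->reset_arg = arg;
	demo_maybe_finish_reset(s);	/* completes at once if the device is idle */
}

/* Example completion callback: counts how many resets have completed. */
static void
demo_count_done(void *arg)
{
	(*(int *)arg)++;
}
~~~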

`SPDK_BDEV_IO_TYPE_NVME_ADMIN`, `SPDK_BDEV_IO_TYPE_NVME_IO`, and
`SPDK_BDEV_IO_TYPE_NVME_IO_MD` are all mechanisms for passing raw NVMe
commands through the SPDK bdev layer. They're strictly optional, and it
probably only makes sense to implement them if the backing storage device is
capable of handling NVMe commands.

The `get_io_channel` function should return an I/O channel. For a detailed
explanation of I/O channels, see @ref concurrency. The generic bdev layer
calls `get_io_channel` once per thread, caches the result, and passes it to
`submit_request`, always using the channel that belongs to the thread on which
it calls `submit_request`.
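
The per-thread caching described above can be sketched with C11 thread-local storage. Each thread lazily creates its own channel on first use and gets the same pointer on every later call. Everything here (`struct demo_io_channel`, `demo_get_thread_channel`) is hypothetical. In SPDK itself, a module's `get_io_channel` typically just returns the result of spdk_get_io_channel() for an I/O device the module registered, and the thread library handles the per-thread bookkeeping.

~~~{.c}
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical per-thread channel state. */
struct demo_io_channel {
	unsigned long submitted;	/* e.g. a per-thread counter that needs no lock */
};

/* Each thread has its own cached pointer (C11 thread-local storage). */
static _Thread_local struct demo_io_channel *t_demo_channel;
/* Total number of channels created across all threads. */
static atomic_uint g_demo_channels_created;

/* Return the calling thread's channel, creating it on first use.
 * This sketch never frees channels; real code releases them when the thread exits. */
static struct demo_io_channel *
demo_get_thread_channel(void)
{
	if (t_demo_channel == NULL) {
		t_demo_channel = calloc(1, sizeof(*t_demo_channel));
		if (t_demo_channel != NULL) {
			atomic_fetch_add(&g_demo_channels_created, 1);
		}
	}
	return t_demo_channel;
}
~~~

Because each thread only ever touches its own channel, per-channel state such as the counter above can be updated without locks. That is the main reason for handing out one channel per thread.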

The `submit_request` function is called to actually submit I/O requests to the
block device. Once the I/O request is completed, the module must call
spdk_bdev_io_complete(). The I/O does not have to finish within the calling
context of `submit_request`.
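
The following sketch shows a RAM-backed `submit_request` that moves the data immediately but completes the request later from a poller, illustrating that completion need not happen inside `submit_request`. `struct demo_bdev_io`, `demo_bdev_io_complete` and `demo_poll_completions` are hypothetical stand-ins for `struct spdk_bdev_io`, spdk_bdev_io_complete() and an SPDK poller; they are not the real types or signatures.

~~~{.c}
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_BLOCK_SIZE 512u
#define DEMO_NUM_BLOCKS 16u
#define DEMO_QUEUE_DEPTH 32u

enum demo_io_type { DEMO_IO_READ, DEMO_IO_WRITE, DEMO_IO_UNMAP };
enum demo_io_status { DEMO_IO_PENDING, DEMO_IO_SUCCESS, DEMO_IO_FAILED };

/* Hypothetical, simplified stand-in for struct spdk_bdev_io. */
struct demo_bdev_io {
	enum demo_io_type type;
	uint64_t offset_blocks;
	uint64_t num_blocks;
	void *buf;
	enum demo_io_status status;
};

/* Per-thread channel: a RAM "device" plus I/O waiting to be completed. */
struct demo_channel {
	uint8_t disk[DEMO_NUM_BLOCKS * DEMO_BLOCK_SIZE];
	struct demo_bdev_io *done[DEMO_QUEUE_DEPTH];
	unsigned num_done;
};

/* Stand-in for spdk_bdev_io_complete(): records the final status. */
static void
demo_bdev_io_complete(struct demo_bdev_io *io, enum demo_io_status status)
{
	io->status = status;
}

/* submit_request: move the data now, but leave completion to the poller. */
static void
demo_submit_request(struct demo_channel *ch, struct demo_bdev_io *io)
{
	uint8_t *dev;
	size_t len;

	if (io->offset_blocks + io->num_blocks > DEMO_NUM_BLOCKS ||
	    ch->num_done == DEMO_QUEUE_DEPTH) {
		demo_bdev_io_complete(io, DEMO_IO_FAILED);	/* failing inline is allowed too */
		return;
	}
	dev = ch->disk + io->offset_blocks * DEMO_BLOCK_SIZE;
	len = io->num_blocks * DEMO_BLOCK_SIZE;

	switch (io->type) {
	case DEMO_IO_READ:
		memcpy(io->buf, dev, len);
		break;
	case DEMO_IO_WRITE:
		memcpy(dev, io->buf, len);
		break;
	default:
		demo_bdev_io_complete(io, DEMO_IO_FAILED);	/* unsupported type */
		return;
	}
	ch->done[ch->num_done++] = io;
}

/* Poller: complete every queued I/O; returns how many were completed. */
static unsigned
demo_poll_completions(struct demo_channel *ch)
{
	unsigned n = ch->num_done;

	for (unsigned i = 0; i < n; i++) {
		demo_bdev_io_complete(ch->done[i], DEMO_IO_SUCCESS);
	}
	ch->num_done = 0;
	return n;
}
~~~

Deferring completion this way mirrors a hardware-backed module, where the completion arrives later from the device rather than inside `submit_request`.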

## Creating Virtual Bdevs

Block devices are considered virtual if they handle I/O requests by routing
the I/O to other block devices. The canonical example would be a bdev module
that implements RAID. Virtual bdevs are created in the same way as regular
bdevs, but take one additional step. The module can look up the underlying
bdevs it wishes to route I/O to using spdk_bdev_get_by_name(), where the string
name is provided by the user in a configuration file or via an RPC. The module
may then proceed as normal by opening the bdev to obtain a descriptor and
creating I/O channels for the bdev (probably in response to the
`get_io_channel` callback). The final step is to have the module use its open
descriptor to call spdk_bdev_module_claim_bdev(), indicating that it is
consuming the underlying bdev. This prevents other users from opening
descriptors with write permissions. This effectively 'promotes' the descriptor
to write-exclusive and is an operation only available to bdev modules.
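
Since RAID is the canonical virtual bdev, the following sketch shows the routing arithmetic a RAID-0 module might apply when it receives a request: a logical block of the virtual bdev maps to one underlying bdev (member) and an offset within it. The struct and function names and the striping layout (fixed-size strips assigned round-robin to members) are assumptions for illustration. Opening, claiming, and submitting to the underlying bdevs are omitted.

~~~{.c}
#include <assert.h>
#include <stdint.h>

/* Hypothetical RAID-0 geometry: num_members underlying bdevs, striped in
 * strips of strip_blocks blocks assigned round-robin. */
struct demo_raid0 {
	uint32_t num_members;
	uint64_t strip_blocks;
};

/* Where a block of the virtual bdev lands on the underlying bdevs. */
struct demo_target {
	uint32_t member;		/* index of the underlying bdev to route to */
	uint64_t offset_blocks;		/* block offset within that bdev */
};

/* Map one logical block of the virtual bdev to (member, offset). */
static struct demo_target
demo_raid0_map(const struct demo_raid0 *r, uint64_t lba)
{
	assert(r->num_members > 0 && r->strip_blocks > 0);

	uint64_t strip = lba / r->strip_blocks;		/* global strip index */
	uint64_t in_strip = lba % r->strip_blocks;	/* offset inside that strip */
	struct demo_target t = {
		.member = (uint32_t)(strip % r->num_members),
		.offset_blocks = (strip / r->num_members) * r->strip_blocks + in_strip,
	};
	return t;
}

/* Number of blocks starting at lba that stay on one member before the strip
 * ends; longer requests must be split across underlying bdevs. */
static uint64_t
demo_raid0_run_length(const struct demo_raid0 *r, uint64_t lba, uint64_t num_blocks)
{
	uint64_t left = r->strip_blocks - lba % r->strip_blocks;

	return num_blocks < left ? num_blocks : left;
}
~~~

For example, with two members and 4-block strips, logical block 9 falls in strip 2, which lands on member 0 at offset 5.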