# NVMe over Fabrics Target {#nvmf}

@sa @ref nvme_fabrics_host
@sa @ref nvmf_tgt_tracepoints

# NVMe-oF Target Getting Started Guide {#nvmf_getting_started}

The NVMe over Fabrics target is a user space application that presents block devices over the
network using RDMA. It requires an RDMA-capable NIC with its corresponding OFED software package
installed to run. The target should work on all flavors of RDMA, but it is currently tested against
Mellanox NICs (RoCEv2) and Chelsio NICs (iWARP).

The NVMe over Fabrics specification defines subsystems that can be exported over the network. SPDK
has chosen to call the software that exports these subsystems a "target", which is the term used
for iSCSI. The specification refers to the "client" that connects to the target as a "host". Many
people will also refer to the host as an "initiator", which is the equivalent term in iSCSI
parlance. SPDK will try to stick to the terms "target" and "host" to match the specification.

The Linux kernel also implements an NVMe-oF target and host, and SPDK is tested for
interoperability with the Linux kernel implementations.

If you want to kill the application with a signal, use SIGTERM. The application will then release
all of its shared memory resources before exiting. If the application is killed with SIGKILL, it
has no chance to release those shared memory resources, and you may need to release them manually.
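
For example, once the target is running (see the sections below), it can be shut down cleanly from
the shell. This is a minimal sketch that assumes the process is named `nvmf_tgt`:

~~~{.sh}
# SIGTERM lets the target release its shared memory before exiting;
# SIGKILL would leave that cleanup to the operator.
kill -SIGTERM $(pidof nvmf_tgt)
~~~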

## Prerequisites {#nvmf_prereqs}

This guide starts by assuming that you can already build the standard SPDK distribution on your
platform. By default, the NVMe over Fabrics target is not built. To build nvmf_tgt there are some
additional dependencies.

Fedora:
~~~{.sh}
dnf install libibverbs-devel librdmacm-devel
~~~

Ubuntu:
~~~{.sh}
apt-get install libibverbs-dev librdmacm-dev
~~~

Then build SPDK with RDMA enabled:

~~~{.sh}
./configure --with-rdma <other config parameters>
make
~~~

Once built, the binary will be in `app/nvmf_tgt`.

## Prerequisites for InfiniBand/RDMA Verbs {#nvmf_prereqs_verbs}

Before starting our NVMe-oF target we must load the InfiniBand and RDMA modules that allow
userspace processes to use InfiniBand/RDMA verbs directly.

~~~{.sh}
modprobe ib_cm
modprobe ib_core
# Please note that ib_ucm does not exist in newer versions of the kernel and is not required.
modprobe ib_ucm || true
modprobe ib_umad
modprobe ib_uverbs
modprobe iw_cm
modprobe rdma_cm
modprobe rdma_ucm
~~~
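
Optionally, verify that the verbs and RDMA connection manager modules are loaded. This is just a
sanity check:

~~~{.sh}
# Each of these modules should appear in the output if it loaded successfully.
lsmod | grep -E 'ib_uverbs|rdma_ucm'
~~~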

## Prerequisites for RDMA NICs {#nvmf_prereqs_rdma_nics}

Before starting our NVMe-oF target we must detect RDMA NICs and assign them IP addresses.

### Finding RDMA NICs and associated network interfaces

~~~{.sh}
ls /sys/class/infiniband/*/device/net
~~~
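
If devices are listed, the following loop (a small illustrative sketch) prints each RDMA device
together with the network interface that belongs to it:

~~~{.sh}
# Print each RDMA device name and its associated netdev.
for dev in /sys/class/infiniband/*; do
    echo "$(basename $dev) -> $(ls $dev/device/net)"
done
~~~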

### Mellanox ConnectX-3 RDMA NICs

~~~{.sh}
modprobe mlx4_core
modprobe mlx4_ib
modprobe mlx4_en
~~~

### Mellanox ConnectX-4 RDMA NICs

~~~{.sh}
modprobe mlx5_core
modprobe mlx5_ib
~~~

### Assigning IP addresses to RDMA NICs

~~~{.sh}
ifconfig eth1 192.168.100.8 netmask 255.255.255.0 up
ifconfig eth2 192.168.100.9 netmask 255.255.255.0 up
~~~
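
On distributions where `ifconfig` is not available, the equivalent iproute2 commands are:

~~~{.sh}
ip addr add 192.168.100.8/24 dev eth1
ip link set eth1 up
ip addr add 192.168.100.9/24 dev eth2
ip link set eth2 up
~~~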

## Configuring the SPDK NVMe over Fabrics Target {#nvmf_config}

An NVMe over Fabrics target can be configured using JSON RPCs.
The basic RPCs needed to configure the NVMe-oF subsystem are detailed below. More information about
working with NVMe over Fabrics specific RPCs can be found on the @ref jsonrpc_components_nvmf_tgt RPC page.

Configuring the NVMe-oF target with .ini style configuration files is deprecated; use JSON based
RPCs instead. .ini style configuration files can be converted to JSON format with the script
`scripts/config_converter.py`.
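
As a sketch, assuming an existing .ini configuration in a file named `nvmf.conf` (a hypothetical
name) and that the converter reads the old format on stdin and writes JSON to stdout:

~~~{.sh}
# nvmf.conf is a placeholder for your existing .ini configuration file.
scripts/config_converter.py < nvmf.conf > nvmf.json
~~~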

### Using RPCs {#nvmf_config_rpc}

Start the nvmf_tgt application with elevated privileges and instruct it to wait for RPCs.
The set_nvmf_target_options RPC can then be used to configure basic target parameters.
Below is an example where the target is configured with an I/O unit size of 8192,
a maximum of 4 qpairs per controller, and an in-capsule data size of 0. The parameters controlled
by set_nvmf_target_options may only be modified before the SPDK NVMe-oF subsystem is initialized.
Once the target options are configured, start the NVMe-oF subsystem with start_subsystem_init.

~~~{.sh}
app/nvmf_tgt/nvmf_tgt --wait-for-rpc
scripts/rpc.py set_nvmf_target_options -u 8192 -p 4 -c 0
scripts/rpc.py start_subsystem_init
~~~

Note: The start_subsystem_init RPC refers to SPDK application subsystems, not the NVMe over Fabrics concept.

Below is an example of creating a malloc bdev and assigning it to a subsystem. Adjust the bdevs,
NQN, serial number, and IP address to your own circumstances.

~~~{.sh}
scripts/rpc.py construct_malloc_bdev -b Malloc0 512 512
scripts/rpc.py nvmf_subsystem_create nqn.2016-06.io.spdk:cnode1 -a -s SPDK00000000000001
scripts/rpc.py nvmf_subsystem_add_ns nqn.2016-06.io.spdk:cnode1 Malloc0
scripts/rpc.py nvmf_subsystem_add_listener nqn.2016-06.io.spdk:cnode1 -t rdma -a 192.168.100.8 -s 4420
~~~
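
To confirm that the subsystem, namespace, and listener were created as intended, the resulting
configuration can be dumped over RPC. This assumes the `get_nvmf_subsystems` RPC provided by this
SPDK version (the name may differ in other releases):

~~~{.sh}
# Dumps all configured subsystems with their namespaces and listen addresses as JSON.
scripts/rpc.py get_nvmf_subsystems
~~~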

### NQN Formal Definition

NVMe qualified names or NQNs are defined in section 7.9 of the
[NVMe specification](http://nvmexpress.org/wp-content/uploads/NVM_Express_Revision_1.3.pdf). SPDK has attempted to
formalize that definition using [Extended Backus-Naur form](https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form).
SPDK modules use this formal definition (provided below) when validating NQNs.

~~~{.sh}

Basic Types
year = 4 * digit ;
month = '01' | '02' | '03' | '04' | '05' | '06' | '07' | '08' | '09' | '10' | '11' | '12' ;
digit = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ;
hex digit = 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ;

NQN Definition
NVMe Qualified Name = ( NVMe-oF Discovery NQN | NVMe UUID NQN | NVMe Domain NQN ), '\0' ;
NVMe-oF Discovery NQN = "nqn.2014-08.org.nvmexpress.discovery" ;
NVMe UUID NQN = "nqn.2014-08.org.nvmexpress:uuid:", string UUID ;
string UUID = 8 * hex digit, '-', 3 * (4 * hex digit, '-'), 12 * hex digit ;
NVMe Domain NQN = "nqn.", year, '-', month, '.', reverse domain, ':', utf-8 string ;

~~~

Please note that the following types from the definition above are defined elsewhere:
1. utf-8 string: Defined in [rfc 3629](https://tools.ietf.org/html/rfc3629).
2. reverse domain: Equivalent to domain name as defined in [rfc 1034](https://tools.ietf.org/html/rfc1034).
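
For example, the discovery NQN `nqn.2014-08.org.nvmexpress.discovery` and the domain NQN
`nqn.2016-06.io.spdk:cnode1` used elsewhere in this guide both satisfy the definition above.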

While not stated in the formal definition, SPDK enforces the requirement from the spec that the
"maximum name is 223 bytes in length". SPDK does not include the null terminating character when
defining the length of an NQN, and will accept an NQN containing up to 223 valid bytes with an
additional null terminator. To be precise, SPDK follows the same conventions as the C standard
library function [strlen()](http://man7.org/linux/man-pages/man3/strlen.3.html).

#### NQN Comparisons

SPDK compares NQNs byte for byte, without case folding or Unicode normalization. This has specific implications for
UUID-based NQNs. The following pair of NQNs, for example, would not match when compared in the SPDK NVMe-oF Target:

    nqn.2014-08.org.nvmexpress:uuid:11111111-aaaa-bbdd-ffee-123456789abc
    nqn.2014-08.org.nvmexpress:uuid:11111111-AAAA-BBDD-FFEE-123456789ABC

In order to ensure the consistency of UUID-based NQNs while using SPDK, users should use lowercase when representing
alphabetic hex digits in their NQNs.
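
For example, a UUID-based NQN can be normalized to lowercase before it is passed to any RPC. This
is a simple sketch using standard shell tools:

~~~{.sh}
# Convert any uppercase hex digits in the NQN to lowercase.
NQN="nqn.2014-08.org.nvmexpress:uuid:11111111-AAAA-BBDD-FFEE-123456789ABC"
echo "$NQN" | tr '[:upper:]' '[:lower:]'
~~~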

### Assigning CPU Cores to the NVMe over Fabrics Target {#nvmf_config_lcore}

SPDK uses the [DPDK Environment Abstraction Layer](http://dpdk.org/doc/guides/prog_guide/env_abstraction_layer.html)
to gain access to hardware resources such as huge memory pages and CPU core(s). DPDK EAL provides
functions to assign threads to specific cores.
To ensure the SPDK NVMe-oF target has the best performance, configure the NICs and NVMe devices to
be located on the same NUMA node.

The `-m` core mask option specifies a bit mask of the CPU cores that
SPDK is allowed to execute work items on.
For example, to allow SPDK to use cores 24, 25, 26 and 27:
~~~{.sh}
app/nvmf_tgt/nvmf_tgt -m 0xF000000
~~~
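
The mask sets one bit per core; for cores 24 through 27 that is
`(1 << 24) | (1 << 25) | (1 << 26) | (1 << 27) = 0xF000000`, which can be checked from the shell:

~~~{.sh}
# Prints 0xF000000: one bit set for each of cores 24, 25, 26 and 27.
printf '0x%X\n' $(( (1 << 24) | (1 << 25) | (1 << 26) | (1 << 27) ))
~~~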

## Configuring the Linux NVMe over Fabrics Host {#nvmf_host}

Both the Linux kernel and SPDK implement an NVMe over Fabrics host.
The Linux kernel NVMe-oF RDMA host support is provided by the `nvme-rdma` driver.

~~~{.sh}
modprobe nvme-rdma
~~~

The nvme-cli tool may be used to interface with the Linux kernel NVMe over Fabrics host.

Discovery:
~~~{.sh}
nvme discover -t rdma -a 192.168.100.8 -s 4420
~~~

Connect:
~~~{.sh}
nvme connect -t rdma -n "nqn.2016-06.io.spdk:cnode1" -a 192.168.100.8 -s 4420
~~~
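
After a successful connect, the remote namespace appears as a local NVMe block device, which can be
confirmed with nvme-cli:

~~~{.sh}
# The newly attached controller and its namespaces appear in this listing.
nvme list
~~~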

Disconnect:
~~~{.sh}
nvme disconnect -n "nqn.2016-06.io.spdk:cnode1"
~~~

## Enabling NVMe-oF target tracepoints for offline analysis and debug {#nvmf_trace}

SPDK has a tracing framework for capturing low-level event information at runtime.
The tracepoints described in @ref nvmf_tgt_tracepoints enable analysis of both performance issues and application crashes.