==============
NVMe Emulation
==============

QEMU provides NVMe emulation through the ``nvme``, ``nvme-ns`` and
``nvme-subsys`` devices.

See the following sections for specific information on

  * `Adding NVMe Devices`_, `additional namespaces`_ and `NVM subsystems`_.
  * Configuration of `Optional Features`_ such as `Controller Memory Buffer`_,
    `Simple Copy`_, `Zoned Namespaces`_, `metadata`_ and `End-to-End Data
    Protection`_.

Adding NVMe Devices
===================

Controller Emulation
--------------------

The QEMU emulated NVMe controller implements version 1.4 of the NVM Express
specification. All mandatory features are implemented with a couple of
exceptions and limitations:

  * Accounting numbers in the SMART/Health log page are reset when the device
    is power cycled.
  * Interrupt Coalescing is not supported and is disabled by default.

The simplest way to attach an NVMe controller on the QEMU PCI bus is to add the
following parameters:

.. code-block:: console

    -drive file=nvm.img,if=none,id=nvm
    -device nvme,serial=deadbeef,drive=nvm

There are a number of optional general parameters for the ``nvme`` device. Some
are mentioned here, but see ``-device nvme,help`` to list all possible
parameters.

``max_ioqpairs=UINT32`` (default: ``64``)
  Set the maximum number of allowed I/O queue pairs. This replaces the
  deprecated ``num_queues`` parameter.

``msix_qsize=UINT16`` (default: ``65``)
  The number of MSI-X vectors that the device should support.

``mdts=UINT8`` (default: ``7``)
  Set the Maximum Data Transfer Size of the device.

``use-intel-id`` (default: ``off``)
  Since QEMU 5.2, the device uses a QEMU allocated "Red Hat" PCI Device and
  Vendor ID. Set this to ``on`` to revert to the unallocated Intel ID
  previously used.

Additional Namespaces
---------------------

In the simplest possible invocation sketched above, the device only supports a
single namespace with the namespace identifier ``1``. To support multiple
namespaces and additional features, the ``nvme-ns`` device must be used.

.. code-block:: console

    -device nvme,id=nvme-ctrl-0,serial=deadbeef
    -drive file=nvm-1.img,if=none,id=nvm-1
    -device nvme-ns,drive=nvm-1
    -drive file=nvm-2.img,if=none,id=nvm-2
    -device nvme-ns,drive=nvm-2

The namespaces defined by the ``nvme-ns`` device will attach to the most
recently defined ``nvme-bus`` that is created by the ``nvme`` device. Namespace
identifiers are allocated automatically, starting from ``1``.

There are a number of parameters available:

``nsid`` (default: ``0``)
  Explicitly set the namespace identifier.

``uuid`` (default: *autogenerated*)
  Set the UUID of the namespace. This will be reported as a "Namespace UUID"
  descriptor in the Namespace Identification Descriptor List.

``eui64``
  Set the EUI-64 of the namespace. This will be reported as an "IEEE Extended
  Unique Identifier" descriptor in the Namespace Identification Descriptor List.
  Since machine type 6.1 a non-zero default value is used if the parameter
  is not provided. For earlier machine types the field defaults to 0.

``bus``
  If more than one ``nvme`` device is defined, this parameter may be used to
  attach the namespace to a specific ``nvme`` device (identified by an ``id``
  parameter on the controller device).

NVM Subsystems
--------------

Additional features become available if the controller device (``nvme``) is
linked to an NVM Subsystem device (``nvme-subsys``).

The NVM Subsystem emulation allows features such as shared namespaces and
multipath I/O.

.. code-block:: console

    -device nvme-subsys,id=nvme-subsys-0,nqn=subsys0
    -device nvme,serial=deadbeef,subsys=nvme-subsys-0
    -device nvme,serial=deadbeef,subsys=nvme-subsys-0

This will create an NVM subsystem with two controllers. Having controllers
linked to an ``nvme-subsys`` device allows additional ``nvme-ns`` parameters:

``shared`` (default: ``on`` since 6.2)
  Specifies that the namespace will be attached to all controllers in the
  subsystem. If set to ``off``, the namespace will remain a private namespace
  and may only be attached to a single controller at a time. Shared namespaces
  are always automatically attached to all controllers (also when controllers
  are hotplugged).

``detached`` (default: ``off``)
  If set to ``on``, the namespace will be available in the subsystem, but
  not attached to any controllers initially. A shared namespace with this set
  to ``on`` will never be automatically attached to controllers.

Thus, adding

.. code-block:: console

    -drive file=nvm-1.img,if=none,id=nvm-1
    -device nvme-ns,drive=nvm-1,nsid=1
    -drive file=nvm-2.img,if=none,id=nvm-2
    -device nvme-ns,drive=nvm-2,nsid=3,shared=off,detached=on

will cause NSID 1 to be a shared namespace that is initially attached to both
controllers. NSID 3 will be a private namespace due to ``shared=off`` and only
attachable to a single controller at a time. Additionally, it will not be
attached to any controller initially (due to ``detached=on``) or to hotplugged
controllers.

Optional Features
=================

Controller Memory Buffer
------------------------

``nvme`` device parameters related to the Controller Memory Buffer support:

``cmb_size_mb=UINT32`` (default: ``0``)
  This adds a Controller Memory Buffer of the given size at offset zero in BAR
  2.

``legacy-cmb`` (default: ``off``)
  By default, the device uses the "v1.4 scheme" for the Controller Memory
  Buffer support (i.e., the CMB is initially disabled and must be explicitly
  enabled by the host). Set this to ``on`` to behave as a v1.3 device with
  respect to the CMB.

Simple Copy
-----------

The device includes support for TP 4065 ("Simple Copy Command"). A number of
additional ``nvme-ns`` device parameters may be used to control the Copy
command limits:

``mssrl=UINT16`` (default: ``128``)
  Set the Maximum Single Source Range Length (``MSSRL``). This is the maximum
  number of logical blocks that may be specified in each source range.

``mcl=UINT32`` (default: ``128``)
  Set the Maximum Copy Length (``MCL``). This is the maximum number of logical
  blocks that may be specified in a Copy command (the total for all source
  ranges).

``msrc=UINT8`` (default: ``127``)
  Set the Maximum Source Range Count (``MSRC``). This is the maximum number of
  source ranges that may be used in a Copy command. This is a 0's based value.

Zoned Namespaces
----------------

A namespace may be "Zoned" as defined by TP 4053 ("Zoned Namespaces"). Set
``zoned=on`` on an ``nvme-ns`` device to configure it as a zoned namespace.

The namespace may be configured with additional parameters:

``zoned.zone_size=SIZE`` (default: ``128MiB``)
  Define the zone size (``ZSZE``).

``zoned.zone_capacity=SIZE`` (default: ``0``)
  Define the zone capacity (``ZCAP``). If left at the default (``0``), the zone
  capacity will equal the zone size.

``zoned.descr_ext_size=UINT32`` (default: ``0``)
  Set the Zone Descriptor Extension Size (``ZDES``). Must be a multiple of 64
  bytes.

``zoned.cross_read=BOOL`` (default: ``off``)
  Set to ``on`` to allow reads to cross zone boundaries.

``zoned.max_active=UINT32`` (default: ``0``)
  Set the maximum number of active resources (``MAR``). The default (``0``)
  allows all zones to be active.

``zoned.max_open=UINT32`` (default: ``0``)
  Set the maximum number of open resources (``MOR``). The default (``0``)
  allows all zones to be open. If ``zoned.max_active`` is specified, this value
  must be less than or equal to that.

``zoned.zasl=UINT8`` (default: ``0``)
  Set the maximum data transfer size for the Zone Append command. Like
  ``mdts``, the value is specified as a power of two (2^n) and is in units of
  the minimum memory page size (CAP.MPSMIN). The default value (``0``)
  has this property inherit the ``mdts`` value.

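As a rough illustration of how the ``mdts`` and ``zoned.zasl`` exponents
translate into a byte limit, the following sketch computes ``2^n`` pages. The
helper name ``max_transfer_bytes`` and the 4096-byte page size are assumptions
made for this example; the actual CAP.MPSMIN value should be read from the
controller.

.. code-block:: shell

    # Sketch only: convert an mdts/zasl exponent into a byte limit.
    # max_transfer_bytes is a hypothetical helper, not part of QEMU or nvme-cli.
    # The second argument is an assumed minimum memory page size (CAP.MPSMIN).
    max_transfer_bytes() {
        exponent=$1
        mpsmin_bytes=${2:-4096}
        # 2^exponent pages of mpsmin_bytes each
        echo $(( (1 << exponent) * mpsmin_bytes ))
    }

    # Default mdts=7 with an assumed 4 KiB page size
    max_transfer_bytes 7 4096    # prints 524288

The helper is only meaningful for non-zero exponents; a ``zoned.zasl`` of
``0`` means the ``mdts`` value is inherited, as described above.
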
Flexible Data Placement
-----------------------

The device may be configured to support TP4146 ("Flexible Data Placement") by
configuring it (``fdp=on``) on the subsystem::

    -device nvme-subsys,id=nvme-subsys-0,nqn=subsys0,fdp=on,fdp.nruh=16

The subsystem emulates a single Endurance Group, on which Flexible Data
Placement will be supported. Also note that the device emulation deviates
slightly from the specification, by always enabling the "FDP Mode" feature on
the controller if the subsystem is configured for Flexible Data Placement.

Enabling Flexible Data Placement on the subsystem enables the following
parameters:

``fdp.nrg`` (default: ``1``)
  Set the number of Reclaim Groups.

``fdp.nruh`` (default: ``0``)
  Set the number of Reclaim Unit Handles. This is a mandatory parameter and
  must be non-zero.

``fdp.runs`` (default: ``96M``)
  Set the Reclaim Unit Nominal Size. Defaults to 96 MiB.

Namespaces within this subsystem may request Reclaim Unit Handles::

    -device nvme-ns,drive=nvm-1,fdp.ruhs=RUHLIST

The ``RUHLIST`` is a semicolon-separated list (e.g. ``0;1;2;3``) and may
include ranges (e.g. ``0;8-15``). If no reclaim unit handle list is specified,
the controller will assign the controller-specified reclaim unit handle to
placement handle identifier 0.

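To make the ``RUHLIST`` syntax concrete, the sketch below expands such a list
into individual handle numbers. It only mirrors the syntax described above and
is not QEMU's parser; the function name ``expand_ruhlist`` is hypothetical, and
treating ranges as inclusive on both ends is an assumption.

.. code-block:: shell

    # Sketch only: expand a RUHLIST such as "0;8-10" into one handle per line.
    # expand_ruhlist is a hypothetical helper; ranges are assumed inclusive.
    expand_ruhlist() {
        old_ifs=$IFS
        IFS=';'
        # Split the argument on semicolons into positional parameters
        set -- $1
        IFS=$old_ifs
        for item in "$@"; do
            case $item in
                *-*)
                    first=${item%-*}
                    last=${item#*-}
                    i=$first
                    while [ "$i" -le "$last" ]; do
                        echo "$i"
                        i=$((i + 1))
                    done
                    ;;
                *)
                    echo "$item"
                    ;;
            esac
        done
    }

    expand_ruhlist '0;8-10'

For the list ``0;8-10`` this yields the handles 0, 8, 9 and 10, one per line.
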
Metadata
--------

The virtual namespace device supports LBA metadata in the form of separate
metadata (``MPTR``-based) and extended LBAs.

``ms=UINT16`` (default: ``0``)
  Defines the number of metadata bytes per LBA.

``mset=UINT8`` (default: ``0``)
  Set to ``1`` to enable extended LBAs.

End-to-End Data Protection
--------------------------

The virtual namespace device supports DIF- and DIX-based protection information
(depending on ``mset``).

``pi=UINT8`` (default: ``0``)
  Enable protection information of the specified type (type ``1``, ``2`` or
  ``3``).

``pil=UINT8`` (default: ``0``)
  Controls the location of the protection information within the metadata. Set
  to ``1`` to transfer protection information as the first eight bytes of
  metadata. Otherwise, the protection information is transferred as the last
  eight bytes.

Virtualization Enhancements and SR-IOV (Experimental Support)
-------------------------------------------------------------

The ``nvme`` device supports Single Root I/O Virtualization and Sharing
along with Virtualization Enhancements. The controller has to be linked to
an NVM Subsystem device (``nvme-subsys``) for use with SR-IOV.

A number of parameters are present (**please note that they may be
subject to change**):

``sriov_max_vfs`` (default: ``0``)
  Indicates the maximum number of PCIe virtual functions supported
  by the controller. Specifying a non-zero value enables reporting of both
  SR-IOV and ARI (Alternative Routing-ID Interpretation) capabilities
  by the NVMe device. Virtual function controllers will not report SR-IOV.

``sriov_vq_flexible``
  Indicates the total number of flexible queue resources assignable to all
  the secondary controllers. Implicitly sets the number of the primary
  controller's private resources to ``(max_ioqpairs - sriov_vq_flexible)``.

``sriov_vi_flexible``
  Indicates the total number of flexible interrupt resources assignable to
  all the secondary controllers. Implicitly sets the number of the primary
  controller's private resources to ``(msix_qsize - sriov_vi_flexible)``.

``sriov_max_vi_per_vf`` (default: ``0``)
  Indicates the maximum number of virtual interrupt resources assignable
  to a secondary controller. The default ``0`` resolves to
  ``(sriov_vi_flexible / sriov_max_vfs)``.

``sriov_max_vq_per_vf`` (default: ``0``)
  Indicates the maximum number of virtual queue resources assignable to
  a secondary controller. The default ``0`` resolves to
  ``(sriov_vq_flexible / sriov_max_vfs)``.

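The resource split implied by these parameters can be worked through with
plain shell arithmetic. In the sketch below, ``max_ioqpairs`` and
``msix_qsize`` use the defaults listed earlier, while the three ``sriov_*``
values are hypothetical numbers picked for illustration. Integer division is
assumed for the per-VF defaults.

.. code-block:: shell

    # Sketch only: derive private and per-VF resource counts from the
    # formulas above. The sriov_* values are hypothetical example inputs.
    max_ioqpairs=64        # nvme device default
    msix_qsize=65          # nvme device default
    sriov_max_vfs=2        # hypothetical
    sriov_vq_flexible=8    # hypothetical
    sriov_vi_flexible=4    # hypothetical

    # Resources that stay private to the primary controller
    primary_private_vq=$((max_ioqpairs - sriov_vq_flexible))
    primary_private_vi=$((msix_qsize - sriov_vi_flexible))

    # Per-VF maximums when sriov_max_vq_per_vf / sriov_max_vi_per_vf are 0
    # (integer division assumed)
    max_vq_per_vf=$((sriov_vq_flexible / sriov_max_vfs))
    max_vi_per_vf=$((sriov_vi_flexible / sriov_max_vfs))

    echo "primary private VQ: $primary_private_vq"
    echo "primary private VI: $primary_private_vi"
    echo "max VQ per VF: $max_vq_per_vf"
    echo "max VI per VF: $max_vi_per_vf"

With these inputs the primary controller keeps 56 queue and 61 interrupt
resources, and each secondary controller may be assigned at most 4 queue and
2 interrupt resources.
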
The simplest possible invocation enables the capability to set up one VF
controller and assign an admin queue, an I/O queue, and an MSI-X interrupt.

.. code-block:: console

    -device nvme-subsys,id=subsys0
    -device nvme,serial=deadbeef,subsys=subsys0,sriov_max_vfs=1,
     sriov_vq_flexible=2,sriov_vi_flexible=1

The minimum steps required to configure a functional NVMe secondary
controller are:

  * unbind flexible resources from the primary controller

.. code-block:: console

    nvme virt-mgmt /dev/nvme0 -c 0 -r 1 -a 1 -n 0
    nvme virt-mgmt /dev/nvme0 -c 0 -r 0 -a 1 -n 0

  * perform a Function Level Reset on the primary controller to actually
    release the resources

.. code-block:: console

    echo 1 > /sys/bus/pci/devices/0000:01:00.0/reset

  * enable VF

.. code-block:: console

    echo 1 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs

  * assign the flexible resources to the VF and set it ONLINE

.. code-block:: console

    nvme virt-mgmt /dev/nvme0 -c 1 -r 1 -a 8 -n 1
    nvme virt-mgmt /dev/nvme0 -c 1 -r 0 -a 8 -n 2
    nvme virt-mgmt /dev/nvme0 -c 1 -r 0 -a 9 -n 0

  * bind the NVMe driver to the VF

.. code-block:: console

    echo 0000:01:00.1 > /sys/bus/pci/drivers/nvme/bind
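
The steps above can be collected into a single guest-side script. The sketch
below is one possible arrangement, not a tested procedure: the helper names
(``run``, ``sysfs_write``, ``setup_vf``) and the ``DRY_RUN`` switch are
hypothetical, and the PCI addresses and ``/dev/nvme0`` are simply the values
from the example and will likely differ on a real guest. By default it only
prints the commands it would run.

.. code-block:: shell

    # Sketch only: replay the documented VF setup steps in order.
    # PF, VF and CTRL are taken from the example above; adjust for your guest.
    PF=0000:01:00.0
    VF=0000:01:00.1
    CTRL=/dev/nvme0
    DRY_RUN=${DRY_RUN:-1}   # 1 = print commands only (default), 0 = execute

    # Print or execute a command, depending on DRY_RUN
    run() {
        if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
    }

    # Print or perform a sysfs write, depending on DRY_RUN
    sysfs_write() {
        if [ "$DRY_RUN" = 1 ]; then
            echo "echo $1 > $2"
        else
            echo "$1" > "$2"
        fi
    }

    setup_vf() {
        # Unbind flexible resources from the primary controller
        run nvme virt-mgmt "$CTRL" -c 0 -r 1 -a 1 -n 0
        run nvme virt-mgmt "$CTRL" -c 0 -r 0 -a 1 -n 0
        # Function Level Reset on the primary controller
        sysfs_write 1 "/sys/bus/pci/devices/$PF/reset"
        # Enable one VF
        sysfs_write 1 "/sys/bus/pci/devices/$PF/sriov_numvfs"
        # Assign the flexible resources to the VF and set it ONLINE
        run nvme virt-mgmt "$CTRL" -c 1 -r 1 -a 8 -n 1
        run nvme virt-mgmt "$CTRL" -c 1 -r 0 -a 8 -n 2
        run nvme virt-mgmt "$CTRL" -c 1 -r 0 -a 9 -n 0
        # Bind the NVMe driver to the VF
        sysfs_write "$VF" /sys/bus/pci/drivers/nvme/bind
    }

    setup_vf

Running it with ``DRY_RUN=0`` as root inside the guest would execute the
commands instead of printing them.
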