QEMU provides NVMe emulation through the ``nvme``, ``nvme-ns`` and
``nvme-subsys`` devices.

See the following sections for specific information on:

* `Adding NVMe Devices`_, `additional namespaces`_ and `NVM subsystems`_.
* Configuration of `Optional Features`_ such as `Controller Memory Buffer`_,
  `Simple Copy`_, `Zoned Namespaces`_, `metadata`_ and `End-to-End Data
  Protection`_.

Adding NVMe Devices
===================

The QEMU emulated NVMe controller implements version 1.4 of the NVM Express
specification. All mandatory features are implemented, with a couple of
exceptions:

* Accounting numbers in the SMART/Health log page are reset when the device
  is power cycled.
* Interrupt Coalescing is not supported and is disabled by default.

The simplest way to attach an NVMe controller on the QEMU PCI bus is to add the
following parameters:

.. code-block:: console

   -drive file=nvm.img,if=none,id=nvm
   -device nvme,serial=deadbeef,drive=nvm

There are a number of optional general parameters for the ``nvme`` device. Some
are mentioned here, but see ``-device nvme,help`` to list all possible
parameters.

``max_ioqpairs=UINT32`` (default: ``64``)
  Set the maximum number of allowed I/O queue pairs. This replaces the
  deprecated ``num_queues`` parameter.

``msix_qsize=UINT16`` (default: ``65``)
  The number of MSI-X vectors that the device should support.

``mdts=UINT8`` (default: ``7``)
  Set the Maximum Data Transfer Size of the device, as a power of two in units
  of the minimum memory page size (CAP.MPSMIN).

``use-intel-id`` (default: ``off``)
  Since QEMU 5.2, the device uses a QEMU-allocated "Red Hat" PCI Device and
  Vendor ID. Set this to ``on`` to revert to the unallocated Intel ID
  previously used.
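
The ``mdts`` value is an exponent, not a byte count: like ``zoned.zasl``
below, it is a power of two in units of the minimum memory page size. The
following bash sketch converts it to bytes, assuming a 4 KiB minimum page
size; ``mdts_bytes`` is a hypothetical helper for illustration only.

.. code-block:: bash

   # Hypothetical helper (not part of QEMU): convert an mdts exponent to
   # bytes. Assumes a 4 KiB minimum memory page size (CAP.MPSMIN) unless a
   # different page size in bytes is passed as the second argument.
   mdts_bytes() {
       local mdts=$1
       local mpsmin=${2:-4096}
       if (( mdts == 0 )); then
           echo "unlimited"    # in NVMe, an MDTS of 0 means no limit
       else
           echo $(( (1 << mdts) * mpsmin ))
       fi
   }

   mdts_bytes 7    # the default: 2^7 * 4096 = 524288 bytes (512 KiB)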

Additional Namespaces
---------------------

In the simplest possible invocation sketched above, the device only supports a
single namespace with the namespace identifier ``1``. To support multiple
namespaces and additional features, the ``nvme-ns`` device must be used.

.. code-block:: console

   -device nvme,id=nvme-ctrl-0,serial=deadbeef
   -drive file=nvm-1.img,if=none,id=nvm-1
   -device nvme-ns,drive=nvm-1
   -drive file=nvm-2.img,if=none,id=nvm-2
   -device nvme-ns,drive=nvm-2

The namespaces defined by the ``nvme-ns`` device will attach to the most
recently defined ``nvme-bus`` that is created by the ``nvme`` device. Namespace
identifiers are allocated automatically, starting from ``1``.
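
When several namespaces are needed, the repeated ``-drive``/``-device nvme-ns``
pairs can be generated by a script. The bash function below, ``nvme_ns_args``,
is a hypothetical helper (not part of QEMU) that prints the arguments for a
given number of namespaces, reusing the ``nvm-N.img`` naming from the example.
It sets ``nsid`` explicitly to the same values that automatic allocation would
assign.

.. code-block:: bash

   # Hypothetical helper (not part of QEMU): print -drive/-device nvme-ns
   # argument pairs for COUNT namespaces, using the nvm-N.img naming from
   # the example. Each nsid matches what automatic allocation would assign.
   nvme_ns_args() {
       local count=$1 i
       for (( i = 1; i <= count; i++ )); do
           printf '%s\n' "-drive file=nvm-$i.img,if=none,id=nvm-$i"
           printf '%s\n' "-device nvme-ns,drive=nvm-$i,nsid=$i"
       done
   }

   nvme_ns_args 2

Its output could follow a controller definition, e.g.
``qemu-system-x86_64 ... -device nvme,id=nvme-ctrl-0,serial=deadbeef $(nvme_ns_args 4)``.
The unquoted command substitution relies on word splitting, which is safe here
only because the generated values contain no spaces.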

There are a number of parameters available:

``nsid`` (default: ``0``)
  Explicitly set the namespace identifier.

``uuid`` (default: *autogenerated*)
  Set the UUID of the namespace. This will be reported as a "Namespace UUID"
  descriptor in the Namespace Identification Descriptor List.

``nguid``
  Set the NGUID of the namespace. This will be reported as a "Namespace Globally
  Unique Identifier" descriptor in the Namespace Identification Descriptor List.
  It is specified as a string of hexadecimal digits containing exactly 16 bytes,
  or ``auto`` for a random value. An optional ``-`` separator may be used to
  group bytes. If not specified, the NGUID will remain all zeros.

``eui64``
  Set the EUI-64 of the namespace. This will be reported as an "IEEE Extended
  Unique Identifier" descriptor in the Namespace Identification Descriptor List.
  Since machine type 6.1, a non-zero default value is used if the parameter
  is not provided. For earlier machine types the field defaults to 0.

``bus``
  If more than one ``nvme`` device is defined, this parameter may be used to
  attach the namespace to a specific ``nvme`` device (identified by an ``id``
  parameter on the controller device).
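
The NGUID format described above can be checked before starting QEMU. The bash
function below is a hypothetical pre-flight check written from this
description, not QEMU's own parser, and it assumes that a ``-`` separator may
appear only between whole bytes.

.. code-block:: bash

   # Hypothetical pre-flight check, written from the description above and
   # not taken from QEMU: accept "auto", or exactly 16 bytes as 32 hex
   # digits with an optional '-' between whole bytes.
   is_valid_nguid() {
       local s=$1
       local re='^([0-9A-Fa-f]{2}-?){15}[0-9A-Fa-f]{2}$'
       [[ $s == auto ]] && return 0
       [[ $s =~ $re ]]
   }

   for v in auto 00112233445566778899aabbccddeeff \
            00112233-4455-6677-8899-aabbccddeeff 0011; do
       if is_valid_nguid "$v"; then echo "ok:  $v"; else echo "bad: $v"; fi
   done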

NVM Subsystems
--------------

Additional features become available if the controller device (``nvme``) is
linked to an NVM Subsystem device (``nvme-subsys``).

The NVM Subsystem emulation allows features such as shared namespaces and
multipath I/O.

.. code-block:: console

   -device nvme-subsys,id=nvme-subsys-0,nqn=subsys0
   -device nvme,serial=deadbeef,subsys=nvme-subsys-0
   -device nvme,serial=deadbeef,subsys=nvme-subsys-0

This will create an NVM subsystem with two controllers. Having controllers
linked to an ``nvme-subsys`` device allows additional ``nvme-ns`` parameters:

``shared`` (default: ``on`` since 6.2)
  Specifies that the namespace will be attached to all controllers in the
  subsystem. If set to ``off``, the namespace will remain a private namespace
  and may only be attached to a single controller at a time. Shared namespaces
  are always automatically attached to all controllers (also when controllers
  are hotplugged).

``detached`` (default: ``off``)
  If set to ``on``, the namespace will be available in the subsystem, but
  not attached to any controllers initially. A shared namespace with this set
  to ``on`` will never be automatically attached to controllers.

For example, adding the following namespaces to that subsystem:

.. code-block:: console

   -drive file=nvm-1.img,if=none,id=nvm-1
   -device nvme-ns,drive=nvm-1,nsid=1
   -drive file=nvm-2.img,if=none,id=nvm-2
   -device nvme-ns,drive=nvm-2,nsid=3,shared=off,detached=on

makes NSID 1 a shared namespace that is initially attached to both
controllers. NSID 3 will be a private namespace due to ``shared=off`` and only
attachable to a single controller at a time. Additionally, it will not be
attached to any controller initially (due to ``detached=on``) or to hotplugged
controllers.

Optional Features
=================

Controller Memory Buffer
------------------------

``nvme`` device parameters related to the Controller Memory Buffer support:

``cmb_size_mb=UINT32`` (default: ``0``)
  This adds a Controller Memory Buffer of the given size (in MiB) at offset
  zero in BAR 2.

``legacy-cmb`` (default: ``off``)
  By default, the device uses the "v1.4 scheme" for the Controller Memory
  Buffer support (i.e., the CMB is initially disabled and must be explicitly
  enabled by the host). Set this to ``on`` to behave as a v1.3 device with
  respect to the CMB.

Simple Copy
-----------

The device includes support for TP 4065 ("Simple Copy Command"). A number of
additional ``nvme-ns`` device parameters may be used to control the Copy
command limits:

``mssrl=UINT16`` (default: ``128``)
  Set the Maximum Single Source Range Length (``MSSRL``). This is the maximum
  number of logical blocks that may be specified in each source range.

``mcl=UINT32`` (default: ``128``)
  Set the Maximum Copy Length (``MCL``). This is the maximum number of logical
  blocks that may be specified in a Copy command (the total for all source
  ranges).

``msrc=UINT8`` (default: ``127``)
  Set the Maximum Source Range Count (``MSRC``). This is the maximum number of
  source ranges that may be used in a Copy command. This is a 0's based value,
  so the default of ``127`` allows 128 source ranges.
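
The three Copy limits interact, and a request must satisfy all of them at
once. The bash sketch below checks a list of source range lengths against
``mssrl``, ``mcl`` and ``msrc``. ``copy_ok`` is a hypothetical helper. For
simplicity it takes range lengths as plain block counts rather than the 0's
based values carried in the actual command.

.. code-block:: bash

   # Hypothetical pre-check of a Copy request against the namespace limits.
   # Range lengths are plain logical block counts (not 0's based).
   # Usage: copy_ok MSSRL MCL MSRC LEN...
   copy_ok() {
       local mssrl=$1 mcl=$2 msrc=$3 total=0 len
       shift 3
       # MSRC is 0's based, so MSRC + 1 ranges are allowed.
       (( $# <= msrc + 1 )) || { echo "too many source ranges"; return 1; }
       for len in "$@"; do
           (( len <= mssrl )) || { echo "range of $len blocks exceeds MSSRL"; return 1; }
           total=$(( total + len ))
       done
       (( total <= mcl )) || { echo "total of $total blocks exceeds MCL"; return 1; }
       echo "ok"
   }

   copy_ok 128 128 127 64 64                      # 2 ranges, 128 blocks in total
   copy_ok 128 128 127 100 100 || echo "(rejected)"  # 200 blocks exceed MCL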

Zoned Namespaces
----------------

A namespace may be "Zoned" as defined by TP 4053 ("Zoned Namespaces"). Set
``zoned=on`` on an ``nvme-ns`` device to configure it as a zoned namespace.

The namespace may be configured with additional parameters:

``zoned.zone_size=SIZE`` (default: ``128MiB``)
  Define the zone size (``ZSZE``).

``zoned.zone_capacity=SIZE`` (default: ``0``)
  Define the zone capacity (``ZCAP``). If left at the default (``0``), the zone
  capacity will equal the zone size.

``zoned.descr_ext_size=UINT32`` (default: ``0``)
  Set the Zone Descriptor Extension Size (``ZDES``). Must be a multiple of 64
  bytes.

``zoned.cross_read=BOOL`` (default: ``off``)
  Set to ``on`` to allow reads to cross zone boundaries.

``zoned.max_active=UINT32`` (default: ``0``)
  Set the maximum number of active resources (``MAR``). The default (``0``)
  allows all zones to be active.

``zoned.max_open=UINT32`` (default: ``0``)
  Set the maximum number of open resources (``MOR``). The default (``0``)
  allows all zones to be open. If ``zoned.max_active`` is specified, this value
  must be less than or equal to that.

``zoned.zasl=UINT8`` (default: ``0``)
  Set the maximum data transfer size for the Zone Append command. Like
  ``mdts``, the value is specified as a power of two (2^n) and is in units of
  the minimum memory page size (CAP.MPSMIN). The default value (``0``)
  has this property inherit the ``mdts`` value.
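
The defaulting rules above can be summarized in a small check: a capacity of
``0`` means the zone size, ``zoned.max_open`` is bounded by
``zoned.max_active``, and ``zoned.zasl`` falls back to ``mdts``.
``zoned_check`` is a hypothetical bash helper, not QEMU's validation code. The
extra requirement that the capacity not exceed the zone size is assumed from
the zoned namespace specification.

.. code-block:: bash

   # Hypothetical sanity check of zoned.* settings, following the rules above.
   # Sizes are plain integers in one unit (e.g. MiB); 0 means "default".
   # Usage: zoned_check ZONE_SIZE ZONE_CAP MAX_ACTIVE MAX_OPEN ZASL MDTS
   zoned_check() {
       local size=$1 cap=$2 max_active=$3 max_open=$4 zasl=$5 mdts=$6
       (( cap == 0 )) && cap=$size          # capacity defaults to zone size
       if (( cap > size )); then
           echo "zone capacity exceeds zone size"; return 1
       fi
       if (( max_active != 0 && max_open > max_active )); then
           echo "max_open must not exceed max_active"; return 1
       fi
       (( zasl == 0 )) && zasl=$mdts        # zasl inherits mdts when 0
       echo "capacity=$cap zasl=$zasl"
   }

   zoned_check 128 0 0 0 0 7   # prints: capacity=128 zasl=7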

Flexible Data Placement
-----------------------

The device may be configured to support TP 4146 ("Flexible Data Placement") by
enabling it (``fdp=on``) on the subsystem::

   -device nvme-subsys,id=nvme-subsys-0,nqn=subsys0,fdp=on,fdp.nruh=16

The subsystem emulates a single Endurance Group, on which Flexible Data
Placement will be supported. Also note that the device emulation deviates
slightly from the specification by always enabling the "FDP Mode" feature on
the controller if the subsystem is configured for Flexible Data Placement.

Enabling Flexible Data Placement on the subsystem enables the following
parameters:

``fdp.nrg`` (default: ``1``)
  Set the number of Reclaim Groups.

``fdp.nruh`` (default: ``0``)
  Set the number of Reclaim Unit Handles. This is a mandatory parameter and
  must be specified.

``fdp.runs`` (default: ``96M``)
  Set the Reclaim Unit Nominal Size. Defaults to 96 MiB.

Namespaces within this subsystem may request Reclaim Unit Handles::

   -device nvme-ns,drive=nvm-1,fdp.ruhs=RUHLIST

The ``RUHLIST`` is a semicolon-separated list (e.g. ``0;1;2;3``) and may
include ranges (e.g. ``0;8-15``). If no reclaim unit handle list is specified,
the controller will assign the controller-specified reclaim unit handle to
placement handle identifier 0.
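
The ``RUHLIST`` syntax can be illustrated by expanding a list into individual
handle indices. The bash function below is a hypothetical helper written from
the description above. It is not QEMU's parser, and it does no bounds checking
against ``fdp.nruh``.

.. code-block:: bash

   # Hypothetical helper: expand a semicolon-separated RUHLIST such as
   # "0;8-15" into individual reclaim unit handle indices.
   expand_ruhlist() {
       local item lo hi i
       local -a items out=()
       IFS=';' read -ra items <<< "$1"
       for item in "${items[@]}"; do
           if [[ $item == *-* ]]; then
               lo=${item%-*} hi=${item#*-}
               for (( i = lo; i <= hi; i++ )); do out+=("$i"); done
           else
               out+=("$item")
           fi
       done
       echo "${out[*]}"
   }

   expand_ruhlist '0;8-15'   # prints: 0 8 9 10 11 12 13 14 15

Because ``;`` separates commands in the shell, a ``fdp.ruhs`` value containing
semicolons must be quoted on a shell command line, e.g.
``-device 'nvme-ns,drive=nvm-1,fdp.ruhs=0;8-15'``.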

Metadata
--------

The virtual namespace device supports LBA metadata in the form of separate
metadata (``MPTR``-based) and extended LBAs.

``ms=UINT16`` (default: ``0``)
  Defines the number of metadata bytes per LBA.

``mset=UINT8`` (default: ``0``)
  Set to ``1`` to enable extended LBAs.
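
The difference between the two metadata layouts is easiest to see in terms of
buffer sizes. The hypothetical ``lba_layout`` helper below reports, per
logical block, how many bytes go into the data buffer and how many go into the
separate (``MPTR``) metadata buffer. The 512-byte data block in the example is
chosen purely for illustration.

.. code-block:: bash

   # Hypothetical illustration: bytes per logical block in the data buffer
   # and in the separate metadata buffer for a given data size, ms and mset.
   # Usage: lba_layout DATA_SIZE MS MSET
   lba_layout() {
       local data=$1 ms=$2 mset=$3
       if (( mset == 1 )); then
           # Extended LBA: each block's metadata follows its data directly.
           echo "data buffer: $(( data + ms )) bytes/block, metadata buffer: 0"
       else
           # Separate metadata: a second buffer, pointed to by MPTR.
           echo "data buffer: $data bytes/block, metadata buffer: $ms bytes/block"
       fi
   }

   lba_layout 512 8 1   # data buffer: 520 bytes/block, metadata buffer: 0
   lba_layout 512 8 0   # data buffer: 512 bytes/block, metadata buffer: 8 bytes/block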

End-to-End Data Protection
--------------------------

The virtual namespace device supports DIF-based (extended LBAs, ``mset=1``) and
DIX-based (separate metadata, ``mset=0``) protection information.

``pi=UINT8`` (default: ``0``)
  Enable protection information of the specified type (type ``1``, ``2`` or
  ``3``).

``pil=UINT8`` (default: ``0``)
  Controls the location of the protection information within the metadata. Set
  to ``1`` to transfer protection information as the first bytes of metadata.
  Otherwise, the protection information is transferred as the last bytes of
  metadata.

``pif=UINT8`` (default: ``0``)
  By default, the namespace device uses the 16-bit guard protection information
  format (``pif=0``). Set to ``2`` to enable the 64-bit guard protection
  information format. This requires at least 16 bytes of metadata. Note that
  ``pif=1`` (32-bit guards) is currently not supported.
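
The effect of ``pil`` can be expressed as a byte offset inside each block's
metadata. The bash sketch below is hypothetical. It assumes that the NVMe
protection information tuple is 8 bytes for the 16-bit guard format
(``pif=0``) and 16 bytes for the 64-bit guard format (``pif=2``), which is
consistent with the 16-byte metadata requirement stated above.

.. code-block:: bash

   # Hypothetical helper: offset of the protection information (PI) tuple
   # inside each block's metadata. Assumes an 8-byte tuple for 16-bit guards
   # (pif=0) and a 16-byte tuple for 64-bit guards (pif=2).
   # Usage: pi_offset MS PIL PIF
   pi_offset() {
       local ms=$1 pil=$2 pif=$3 size
       case $pif in
           0) size=8 ;;
           2) size=16 ;;
           *) echo "unsupported pif=$pif"; return 1 ;;
       esac
       if (( ms < size )); then
           echo "ms=$ms is too small for a $size-byte PI tuple"; return 1
       fi
       if (( pil == 1 )); then
           echo 0                  # PI in the first bytes of the metadata
       else
           echo $(( ms - size ))   # PI in the last bytes of the metadata
       fi
   }

   pi_offset 16 0 0   # prints 8: the tuple occupies bytes 8..15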

Virtualization Enhancements and SR-IOV (Experimental Support)
-------------------------------------------------------------

The ``nvme`` device supports Single Root I/O Virtualization and Sharing
along with Virtualization Enhancements. The controller has to be linked to
an NVM Subsystem device (``nvme-subsys``) for use with SR-IOV.

A number of parameters are present (**please note that they may be
subject to change**):

``sriov_max_vfs`` (default: ``0``)
  Indicates the maximum number of PCIe virtual functions supported
  by the controller. Specifying a non-zero value enables reporting of both
  SR-IOV and ARI (Alternative Routing-ID Interpretation) capabilities
  by the NVMe device. Virtual function controllers will not report SR-IOV.

``sriov_vq_flexible``
  Indicates the total number of flexible queue resources assignable to all
  the secondary controllers. Implicitly sets the number of the primary
  controller's private resources to ``(max_ioqpairs - sriov_vq_flexible)``.

``sriov_vi_flexible``
  Indicates the total number of flexible interrupt resources assignable to
  all the secondary controllers. Implicitly sets the number of the primary
  controller's private resources to ``(msix_qsize - sriov_vi_flexible)``.

``sriov_max_vi_per_vf`` (default: ``0``)
  Indicates the maximum number of virtual interrupt resources assignable
  to a secondary controller. The default ``0`` resolves to
  ``(sriov_vi_flexible / sriov_max_vfs)``.

``sriov_max_vq_per_vf`` (default: ``0``)
  Indicates the maximum number of virtual queue resources assignable to
  a secondary controller. The default ``0`` resolves to
  ``(sriov_vq_flexible / sriov_max_vfs)``.
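
The resource formulas above can be evaluated for a concrete configuration.
``sriov_split`` is a hypothetical bash helper that simply applies them,
assuming integer division for the per-VF defaults. QEMU may impose further
constraints that it does not check.

.. code-block:: bash

   # Hypothetical helper applying the SR-IOV resource formulas above.
   # Usage: sriov_split MAX_IOQPAIRS MSIX_QSIZE MAX_VFS VQ_FLEX VI_FLEX
   sriov_split() {
       local ioq=$1 msix=$2 vfs=$3 vq_flex=$4 vi_flex=$5
       if (( vfs == 0 )); then
           echo "sriov_max_vfs must be non-zero"; return 1
       fi
       echo "primary private VQ: $(( ioq - vq_flex ))"
       echo "primary private VI: $(( msix - vi_flex ))"
       # Values used when sriov_max_vq_per_vf / sriov_max_vi_per_vf are 0
       # (integer division assumed).
       echo "max VQ per VF: $(( vq_flex / vfs ))"
       echo "max VI per VF: $(( vi_flex / vfs ))"
   }

   # Default max_ioqpairs (64) and msix_qsize (65), with the flexible
   # counts from the SR-IOV example invocation in this section.
   sriov_split 64 65 1 2 1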

The simplest possible invocation enables the capability to set up one VF
controller and assign an admin queue, an I/O queue, and an MSI-X interrupt.

.. code-block:: console

   -device nvme-subsys,id=subsys0
   -device nvme,serial=deadbeef,subsys=subsys0,sriov_max_vfs=1,sriov_vq_flexible=2,sriov_vi_flexible=1

The minimum steps required to configure a functional NVMe secondary
controller are:

* unbind flexible resources from the primary controller

.. code-block:: console

   nvme virt-mgmt /dev/nvme0 -c 0 -r 1 -a 1 -n 0
   nvme virt-mgmt /dev/nvme0 -c 0 -r 0 -a 1 -n 0

* perform a Function Level Reset on the primary controller to actually
  release the resources

.. code-block:: console

   echo 1 > /sys/bus/pci/devices/0000:01:00.0/reset

* enable the VF

.. code-block:: console

   echo 1 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs

* assign the flexible resources to the VF and set it ONLINE

.. code-block:: console

   nvme virt-mgmt /dev/nvme0 -c 1 -r 1 -a 8 -n 1
   nvme virt-mgmt /dev/nvme0 -c 1 -r 0 -a 8 -n 2
   nvme virt-mgmt /dev/nvme0 -c 1 -r 0 -a 9 -n 0

* bind the NVMe driver to the VF

.. code-block:: console

   echo 0000:01:00.1 > /sys/bus/pci/drivers/nvme/bind
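
The guest-side steps above can be collected into one script. This is a
sketch, not a supported tool. The PCI addresses and ``/dev/nvme0`` are the
example values used above, and ``run``/``setup_vf`` are hypothetical helpers.
The comments decode the ``-r`` and ``-a`` values using the resource type and
action codes of the NVMe Virtualization Management command. With
``DRY_RUN=1`` the commands are only printed.

.. code-block:: bash

   # Hypothetical wrapper around the steps above, using the example values.
   PF=0000:01:00.0
   VF=0000:01:00.1
   CTRL=/dev/nvme0

   # Print the command when DRY_RUN is set; otherwise execute it.
   run() {
       if [[ -n ${DRY_RUN:-} ]]; then
           printf '%s\n' "$1"
       else
           sh -c "$1"
       fi
   }

   setup_vf() {
       # -r: resource type (0 = VQ, 1 = VI); -a: action (1 = primary
       # flexible allocation, 8 = secondary assign, 9 = secondary online)
       run "nvme virt-mgmt $CTRL -c 0 -r 1 -a 1 -n 0" &&
       run "nvme virt-mgmt $CTRL -c 0 -r 0 -a 1 -n 0" &&
       run "echo 1 > /sys/bus/pci/devices/$PF/reset" &&
       run "echo 1 > /sys/bus/pci/devices/$PF/sriov_numvfs" &&
       run "nvme virt-mgmt $CTRL -c 1 -r 1 -a 8 -n 1" &&
       run "nvme virt-mgmt $CTRL -c 1 -r 0 -a 8 -n 2" &&
       run "nvme virt-mgmt $CTRL -c 1 -r 0 -a 9 -n 0" &&
       run "echo $VF > /sys/bus/pci/drivers/nvme/bind"
   }

   DRY_RUN=1 setup_vf

Without ``DRY_RUN``, the script must run as root in the guest. The ``&&``
chain stops at the first step that fails.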