way to create a vNVDIMM device at startup time is done via the
following command line options:
- -machine pc,nvdimm
+ -machine pc,nvdimm=on
-m $RAM_SIZE,slots=$N,maxmem=$MAX_SIZE
- -object memory-backend-file,id=mem1,share=on,mem-path=$PATH,size=$NVDIMM_SIZE
- -device nvdimm,id=nvdimm1,memdev=mem1
+ -object memory-backend-file,id=mem1,share=on,mem-path=$PATH,size=$NVDIMM_SIZE,readonly=off
+ -device nvdimm,id=nvdimm1,memdev=mem1,unarmed=off
Where,
of normal RAM devices and vNVDIMM devices, e.g. $MAX_SIZE should be
>= $RAM_SIZE + $NVDIMM_SIZE here.
- - "object memory-backend-file,id=mem1,share=on,mem-path=$PATH,size=$NVDIMM_SIZE"
- creates a backend storage of size $NVDIMM_SIZE on a file $PATH. All
- accesses to the virtual NVDIMM device go to the file $PATH.
+ - "object memory-backend-file,id=mem1,share=on,mem-path=$PATH,
+ size=$NVDIMM_SIZE,readonly=off" creates a backend storage of size
+ $NVDIMM_SIZE on a file $PATH. All accesses to the virtual NVDIMM device go
+ to the file $PATH.
"share=on/off" controls the visibility of guest writes. If
"share=on", then guest writes will be applied to the backend
"share=off", then guest writes won't be applied to the backend
file and thus will be invisible to other guests.
- - "device nvdimm,id=nvdimm1,memdev=mem1" creates a virtual NVDIMM
- device whose storage is provided by above memory backend device.
+ "readonly=on/off" controls whether the file $PATH is opened read-only or
+ read/write (default).
+
+ - "device nvdimm,id=nvdimm1,memdev=mem1,unarmed=off" creates a read/write
+ virtual NVDIMM device whose storage is provided by above memory backend
+ device.
+
+ "unarmed" controls the ACPI NFIT NVDIMM Region Mapping Structure "NVDIMM
+ State Flags" Bit 3 indicating that the device is "unarmed" and cannot accept
+ persistent writes. Linux guest drivers set the device to read-only when this
+ bit is present. Set unarmed to on when the memdev has readonly=on.
Multiple vNVDIMM devices can be created if multiple pairs of "-object"
and "-device" are provided.
For above command line options, if the guest OS has the proper NVDIMM
-driver, it should be able to detect a NVDIMM device which is in the
-persistent memory mode and whose size is $NVDIMM_SIZE.
+driver (e.g. "CONFIG_ACPI_NFIT=y" under Linux), it should be able to
+detect a NVDIMM device which is in the persistent memory mode and whose
+size is $NVDIMM_SIZE.
Note:
types of backends may require an alignment different than the page
size. In that case, QEMU v2.12.0 and later provide 'align' option to
memory-backend-file to allow users to specify the proper alignment.
+For device dax (e.g., /dev/dax0.0), this alignment needs to match the
+alignment requirement of the device dax. The NUM of 'align=NUM' option
+must be larger than or equal to the 'align' of device dax.
+We can use one of the following commands to show the 'align' of device dax.
+
+ ndctl list -X
+ daxctl list -R
+
+In order to get the proper 'align' of device dax, you need to install
+the library 'libdaxctl'.
For example, device dax require the 2 MB alignment, so we can use
following QEMU command line options to use it (/dev/dax0.0) as the
----------------------
Though QEMU supports multiple types of vNVDIMM backends on Linux,
-currently the only one that can guarantee the guest write persistence
-is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
-which all guest access do not involve any host-side kernel cache.
+the only backend that can guarantee the guest write persistence is:
+
+A. DAX device (e.g., /dev/dax0.0, ) or
+B. DAX file(mounted with dax option)
+
+When using B (A file supporting direct mapping of persistent memory)
+as a backend, write persistence is guaranteed if the host kernel has
+support for the MAP_SYNC flag in the mmap system call (available
+since Linux 4.15 and on certain distro kernels) and additionally
+both 'pmem' and 'share' flags are set to 'on' on the backend.
+
+If these conditions are not satisfied i.e. if either 'pmem' or 'share'
+are not set, if the backend file does not support DAX or if MAP_SYNC
+is not supported by the host kernel, write persistence is not
+guaranteed after a system crash. For compatibility reasons, these
+conditions are ignored if not satisfied. Currently, no way is
+provided to test for them.
+For more details, please reference mmap(2) man page:
+http://man7.org/linux/man-pages/man2/mmap.2.html.
When using other types of backends, it's suggested to set 'unarmed'
option of '-device nvdimm' to 'on', which sets the unarmed flag of the
accept persistent writes. In result, for example, the guest Linux
NVDIMM driver, marks such vNVDIMM device as read-only.
+Backend File Setup Example
+--------------------------
+
+Here are two examples showing how to setup these persistent backends on
+linux using the tool ndctl [3].
+
+A. DAX device
+
+Use the following command to set up /dev/dax0.0 so that the entirety of
+namespace0.0 can be exposed as an emulated NVDIMM to the guest:
+
+ ndctl create-namespace -f -e namespace0.0 -m devdax
+
+The /dev/dax0.0 could be used directly in "mem-path" option.
+
+B. DAX file
+
+Individual files on a DAX host file system can be exposed as emulated
+NVDIMMS. First an fsdax block device is created, partitioned, and then
+mounted with the "dax" mount option:
+
+ ndctl create-namespace -f -e namespace0.0 -m fsdax
+ (partition /dev/pmem0 with name pmem0p1)
+ mount -o dax /dev/pmem0p1 /mnt
+ (create or copy a disk image file with qemu-img(1), cp(1), or dd(1)
+ in /mnt)
+
+Then the new file in /mnt could be used in "mem-path" option.
+
NVDIMM Persistence
------------------
the NVDIMMs in the event of power loss. This implies that the
platform also supports flushing dirty data through the memory
controller on power loss.
+
+If the vNVDIMM backend is in host persistent memory that can be accessed in
+SNIA NVM Programming Model [1] (e.g., Intel NVDIMM), it's suggested to set
+the 'pmem' option of memory-backend-file to 'on'. When 'pmem' is 'on' and QEMU
+is built with libpmem [2] support (configured with --enable-libpmem), QEMU
+will take necessary operations to guarantee the persistence of its own writes
+to the vNVDIMM backend(e.g., in vNVDIMM label emulation and live migration).
+If 'pmem' is 'on' while there is no libpmem support, qemu will exit and report
+a "lack of libpmem support" message to ensure the persistence is available.
+For example, if we want to ensure the persistence for some backend file,
+use the QEMU command line:
+
+ -object memory-backend-file,id=nv_mem,mem-path=/XXX/yyy,size=4G,pmem=on
+
+References
+----------
+
+[1] NVM Programming Model (NPM)
+ Version 1.2
+ https://www.snia.org/sites/default/files/technical_work/final/NVMProgrammingModel_v1.2.pdf
+[2] Persistent Memory Development Kit (PMDK), formerly known as NVML project, home page:
+ http://pmem.io/pmdk/
+[3] ndctl-create-namespace - provision or reconfigure a namespace
+ http://pmem.io/ndctl/ndctl-create-namespace.html