QEMU Virtual NVDIMM
===================

This document explains the usage of the virtual NVDIMM (vNVDIMM)
feature, which has been available since QEMU v2.6.0.

Currently, QEMU implements only the persistent memory mode of the
vNVDIMM device, not the block window mode.
Basic Usage
-----------

The storage of a vNVDIMM device in QEMU is provided by a memory
backend (i.e. memory-backend-file or memory-backend-ram). A simple
way to create a vNVDIMM device at startup time is via the following
command line options:

 -machine pc,nvdimm
 -m $RAM_SIZE,slots=$N,maxmem=$MAX_SIZE
 -object memory-backend-file,id=mem1,share=on,mem-path=$PATH,size=$NVDIMM_SIZE
 -device nvdimm,id=nvdimm1,memdev=mem1

Where,

- the "nvdimm" machine option enables the vNVDIMM feature.

- "slots=$N" should be equal to or larger than the total number of
  normal RAM devices and vNVDIMM devices, e.g. $N should be >= 2 here.

- "maxmem=$MAX_SIZE" should be equal to or larger than the total size
  of normal RAM devices and vNVDIMM devices, e.g. $MAX_SIZE should be
  >= $RAM_SIZE + $NVDIMM_SIZE here.

- "object memory-backend-file,id=mem1,share=on,mem-path=$PATH,size=$NVDIMM_SIZE"
  creates a backend storage of size $NVDIMM_SIZE on a file $PATH. All
  accesses to the virtual NVDIMM device go to the file $PATH.

  "share=on/off" controls the visibility of guest writes. If
  "share=on", guest writes will be applied to the backend file, and if
  another guest uses the same backend file with "share=on", those
  writes will be visible to it as well. If "share=off", guest writes
  won't be applied to the backend file and thus will be invisible to
  other guests.

- "device nvdimm,id=nvdimm1,memdev=mem1" creates a virtual NVDIMM
  device whose storage is provided by the above memory backend device.

Multiple vNVDIMM devices can be created if multiple pairs of "-object"
and "-device" options are provided.

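For instance, a guest with two vNVDIMM devices could be started with
the following options (the paths and sizes here are only illustrative):

 -object memory-backend-file,id=mem1,share=on,mem-path=/tmp/nvdimm1.img,size=4G
 -device nvdimm,id=nvdimm1,memdev=mem1
 -object memory-backend-file,id=mem2,share=on,mem-path=/tmp/nvdimm2.img,size=4G
 -device nvdimm,id=nvdimm2,memdev=mem2
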
For the above command line options, if the guest OS has a proper NVDIMM
driver, it should be able to detect an NVDIMM device in persistent
memory mode whose size is $NVDIMM_SIZE.

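For example, on a Linux guest with the ndctl tool installed, the device
and its backing region can be inspected as follows (the exact device
names depend on the guest configuration):

 # ndctl list -u
 # ls /dev/pmem*
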
Note:

1. Prior to QEMU v2.8.0, if memory-backend-file is used and the actual
   backend file size is not equal to the size given by the "size"
   option, QEMU will truncate the backend file with ftruncate(2), which
   will corrupt the existing data in the backend file, especially when
   shrinking it.

   QEMU v2.8.0 and later check the backend file size against the "size"
   option. If they do not match, QEMU reports an error and aborts in
   order to avoid data corruption; see the example after these notes.

2. QEMU v2.6.0 only puts a basic alignment requirement on the "size"
   option of memory-backend-file, e.g. 4KB alignment on x86. However,
   QEMU v2.7.0 puts an additional alignment requirement, which may
   require a larger value than the basic one, e.g. 2MB on x86. This
   change breaks usages of memory-backend-file that only satisfy the
   basic alignment.

   QEMU v2.8.0 and later remove the additional alignment requirement on
   non-s390x architectures, so the broken memory-backend-file usages
   work again.

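As an illustration of the size check in note 1, the backend file can be
created in advance with exactly the size that will later be passed to
the "size" option, e.g. with truncate(1) (the path and size below are
only examples):

 $ truncate -s 4G /path/to/nvdimm.img
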
Label
-----

QEMU v2.7.0 and later implement label support for vNVDIMM devices. To
enable labels on vNVDIMM devices, users can simply add the
"label-size=$SZ" option to "-device nvdimm", e.g.

 -device nvdimm,id=nvdimm1,memdev=mem1,label-size=128K

Note:

1. The minimum label size is 128KB.

2. QEMU v2.7.0 and later store labels at the end of the backend storage.
   If a memory backend file that was previously used as the backend of a
   vNVDIMM device without labels is now used for a vNVDIMM device with
   labels, the data in the label area at the end of the file will be
   inaccessible to the guest. If any useful data (e.g. the metadata of
   a file system) was stored there, the latter usage may result in
   guest data corruption (e.g. breakage of the guest file system).

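With labels enabled, a Linux guest can manage namespaces on the vNVDIMM
device. As a hypothetical example, assuming the ndctl tool is installed
in the guest and the device appears as region0, a new namespace could
be created with:

 # ndctl create-namespace -r region0 -m fsdax
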
Hotplug
-------

QEMU v2.8.0 and later implement hotplug support for vNVDIMM devices.
Similarly to RAM hotplug, vNVDIMM hotplug is accomplished by the two
monitor commands "object_add" and "device_add".

For example, the following commands add another 4GB vNVDIMM device to
the guest:

 (qemu) object_add memory-backend-file,id=mem2,share=on,mem-path=new_nvdimm.img,size=4G
 (qemu) device_add nvdimm,id=nvdimm2,memdev=mem2

Note:

1. Each hotplugged vNVDIMM device consumes one memory slot. Users
   should always ensure the memory option "-m ...,slots=N" specifies
   a large enough number of slots, i.e.
     N >= number of RAM devices +
          number of statically plugged vNVDIMM devices +
          number of hotplugged vNVDIMM devices

2. A similar requirement applies to the memory option "-m ...,maxmem=M",
   i.e.
     M >= size of RAM devices +
          size of statically plugged vNVDIMM devices +
          size of hotplugged vNVDIMM devices

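For instance, following the formulas above, a hypothetical guest with
4GB of normal RAM, one statically plugged 4GB vNVDIMM device, and room
for two more hotplugged 4GB vNVDIMM devices would need at least
slots=4 (1 + 1 + 2) and maxmem=16G (4G + 4G + 2 * 4G):

 -m 4G,slots=4,maxmem=16G
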
Alignment
---------

QEMU uses mmap(2) to map vNVDIMM backends and aligns the mapping
address to the page size (getpagesize(2)) by default. However, some
types of backends may require an alignment different from the page
size. In that case, QEMU v2.12.0 and later provide the 'align' option
to memory-backend-file to allow users to specify the proper alignment.

For example, device DAX requires 2MB alignment, so the following QEMU
command line options can be used to take a device DAX instance
(/dev/dax0.0) as the backend of a vNVDIMM:

 -object memory-backend-file,id=mem1,share=on,mem-path=/dev/dax0.0,size=4G,align=2M
 -device nvdimm,id=nvdimm1,memdev=mem1
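
On the host side, such a device DAX instance can be created on a real
NVDIMM with the ndctl tool, for example (a sketch; the region name
depends on the host configuration):

 # ndctl create-namespace -r region0 -m devdax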

Guest Data Persistence
----------------------

Though QEMU supports multiple types of vNVDIMM backends on Linux,
currently the only one that can guarantee guest write persistence is
device DAX on a real NVDIMM device (e.g. /dev/dax0.0), because guest
accesses to it do not involve any host-side kernel cache.

When using other types of backends, it's suggested to set the 'unarmed'
option of '-device nvdimm' to 'on', which sets the unarmed flag of the
guest NVDIMM region mapping structure. This unarmed flag indicates to
guest software that this vNVDIMM device contains a region that cannot
accept persistent writes. As a result, the guest Linux NVDIMM driver,
for example, marks such a vNVDIMM device as read-only.

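For example, a vNVDIMM device backed by a regular file could be marked
unarmed as follows:

 -device nvdimm,id=nvdimm1,memdev=mem1,unarmed=on
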
Platform Capabilities
---------------------

ACPI 6.2 Errata A added support for a new Platform Capabilities
Structure, which allows the platform to communicate what features it
supports related to NVDIMM data durability. Users can provide a
capabilities value to a guest via the optional "nvdimm-cap" machine
command line option:

 -machine pc,accel=kvm,nvdimm,nvdimm-cap=2

The "nvdimm-cap" field is an integer, the combined value of the various
capability bits defined in Table 5-137 of the ACPI 6.2 Errata A spec.

Here is a quick summary of the three bits defined as of that spec:

 Bit[0] - CPU Cache Flush to NVDIMM Durability on Power Loss Capable.
 Bit[1] - Memory Controller Flush to NVDIMM Durability on Power Loss
          Capable.
          Note: If Bit[0] is set to 1 then this bit shall be set to 1
          as well.
 Bit[2] - Byte Addressable Persistent Memory Hardware Mirroring Capable.

So, an "nvdimm-cap" value of 2 would mean that the platform supports
Memory Controller Flush on Power Loss, a value of 3 would mean that the
platform supports CPU Cache Flush and Memory Controller Flush on Power
Loss, etc.

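For instance, a platform claiming all three capabilities would use the
value 7 (Bit[0] + Bit[1] + Bit[2] = 1 + 2 + 4):

 -machine pc,accel=kvm,nvdimm,nvdimm-cap=7
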
For a complete list of the available flags and for more detailed
descriptions, please consult the ACPI spec.