QEMU Virtual NVDIMM
===================

This document explains how to use the virtual NVDIMM (vNVDIMM) feature,
which has been available since QEMU v2.6.0.

QEMU currently implements only the persistent memory mode of the vNVDIMM
device, not the block window mode.

Basic Usage
-----------

The storage of a vNVDIMM device in QEMU is provided by a memory
backend (i.e. memory-backend-file or memory-backend-ram). A simple
way to create a vNVDIMM device at startup time is to use the
following command line options:

    -machine pc,nvdimm
    -m $RAM_SIZE,slots=$N,maxmem=$MAX_SIZE
    -object memory-backend-file,id=mem1,share=on,mem-path=$PATH,size=$NVDIMM_SIZE
    -device nvdimm,id=nvdimm1,memdev=mem1

Where,

 - the "nvdimm" machine option enables the vNVDIMM feature.

 - "slots=$N" should be equal to or larger than the total number of
   normal RAM devices and vNVDIMM devices, e.g. $N should be >= 2 here.

 - "maxmem=$MAX_SIZE" should be equal to or larger than the total size
   of normal RAM devices and vNVDIMM devices, e.g. $MAX_SIZE should be
   >= $RAM_SIZE + $NVDIMM_SIZE here.

 - "object memory-backend-file,id=mem1,share=on,mem-path=$PATH,size=$NVDIMM_SIZE"
   creates backend storage of size $NVDIMM_SIZE on the file $PATH. All
   accesses to the virtual NVDIMM device go to the file $PATH.

   "share=on/off" controls the visibility of guest writes. If
   "share=on", then guest writes will be applied to the backend
   file. If another guest uses the same backend file with option
   "share=on", then the above writes will be visible to it as well. If
   "share=off", then guest writes won't be applied to the backend
   file and thus will be invisible to other guests.

 - "device nvdimm,id=nvdimm1,memdev=mem1" creates a virtual NVDIMM
   device whose storage is provided by the above memory backend device.

Multiple vNVDIMM devices can be created if multiple pairs of "-object"
and "-device" are provided.

For the above command line options, if the guest OS has the proper NVDIMM
driver (e.g. "CONFIG_ACPI_NFIT=y" under Linux), it should be able to
detect an NVDIMM device which is in the persistent memory mode and whose
size is $NVDIMM_SIZE.
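
As an illustration of the options above, the following sketch fills in
the placeholders with concrete values and prints the resulting option
list without launching QEMU. The sizes and the backend file name are
hypothetical. Note that "$PATH" in the text is only a placeholder; a
real script should use a different variable name (BACKEND here) so that
the shell's own PATH variable is not overwritten.

```shell
#!/bin/sh
# Hypothetical sizes in MiB, chosen only for illustration.
RAM_MB=4096                          # normal guest RAM
NVDIMM_MB=2048                       # size of the single vNVDIMM device
SLOTS=2                              # >= number of RAM devices + vNVDIMM devices
MAXMEM_MB=$((RAM_MB + NVDIMM_MB))    # smallest maxmem satisfying the rule above
BACKEND=./nvdimm1.img                # hypothetical backend file path

# Print (do not run) the resulting QEMU options, one per line.
printf '%s\n' \
    "-machine pc,nvdimm" \
    "-m ${RAM_MB}M,slots=${SLOTS},maxmem=${MAXMEM_MB}M" \
    "-object memory-backend-file,id=mem1,share=on,mem-path=${BACKEND},size=${NVDIMM_MB}M" \
    "-device nvdimm,id=nvdimm1,memdev=mem1"
```

With these values the "-m" line becomes "-m 4096M,slots=2,maxmem=6144M".
Larger values for slots and maxmem are equally valid and leave room for
hotplug.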

Note:

1. Prior to QEMU v2.8.0, if memory-backend-file is used and the actual
   backend file size is not equal to the size given by the "size" option,
   QEMU will truncate the backend file by ftruncate(2), which will
   corrupt the existing data in the backend file, especially for the
   shrink case.

   QEMU v2.8.0 and later check the backend file size against the "size"
   option. If they do not match, QEMU will report errors and abort in
   order to avoid the data corruption.

2. QEMU v2.6.0 only puts a basic alignment requirement on the "size"
   option of memory-backend-file, e.g. 4KB alignment on x86. However,
   QEMU v2.7.0 puts an additional alignment requirement, which may
   require a larger value than the basic one, e.g. 2MB on x86. This
   change breaks the usage of memory-backend-file that only satisfies
   the basic alignment.

   QEMU v2.8.0 and later remove the additional alignment on non-s390x
   architectures, so the broken memory-backend-file can work again.
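
Regarding note 1, it can be useful to compare the backend file size with
the intended "size" value before starting QEMU, particularly on versions
older than v2.8.0 where a mismatch leads to truncation. The helper below
is a minimal sketch, not part of QEMU; the function name
check_backend_size is hypothetical, and it assumes a stat(1) that
supports "-c %s" (as in GNU coreutils).

```shell
#!/bin/sh
# check_backend_size FILE SIZE_BYTES   (hypothetical helper, not a QEMU tool)
# Returns 0 if FILE exists and its size in bytes equals SIZE_BYTES,
# and non-zero otherwise.
check_backend_size() {
    actual=$(stat -c %s "$1" 2>/dev/null) || return 1
    [ "$actual" -eq "$2" ]
}

# Demonstration on a temporary sparse file of 1 MiB (1048576 bytes).
img=$(mktemp)
truncate -s 1048576 "$img"
if check_backend_size "$img" 1048576; then
    echo "size matches"
else
    echo "size mismatch: do not start QEMU with this size option"
fi
rm -f "$img"
```

The check takes the size in bytes, so a value such as "4G" on the QEMU
command line has to be converted first (4 * 1024 * 1024 * 1024).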

Label
-----

QEMU v2.7.0 and later implement label support for vNVDIMM devices.
To enable labels on a vNVDIMM device, add the "label-size=$SZ" option
to "-device nvdimm", e.g.

    -device nvdimm,id=nvdimm1,memdev=mem1,label-size=128K

Note:

1. The minimal label size is 128KB.

2. QEMU v2.7.0 and later store labels at the end of backend storage.
   If a memory backend file, which was previously used as the backend
   of a vNVDIMM device without labels, is now used for a vNVDIMM
   device with labels, the data in the label area at the end of the file
   will be inaccessible to the guest. If any useful data (e.g. the
   meta-data of the file system) was stored there, the latter usage
   may result in guest data corruption (e.g. breakage of the guest file
   system).

Hotplug
-------

QEMU v2.8.0 and later implement hotplug support for vNVDIMM
devices. Similar to RAM hotplug, vNVDIMM hotplug is
accomplished by two monitor commands, "object_add" and "device_add".

For example, the following commands add another 4GB vNVDIMM device to
the guest:

    (qemu) object_add memory-backend-file,id=mem2,share=on,mem-path=new_nvdimm.img,size=4G
    (qemu) device_add nvdimm,id=nvdimm2,memdev=mem2

Note:

1. Each hotplugged vNVDIMM device consumes one memory slot. Users
   should always ensure the memory option "-m ...,slots=N" specifies
   enough slots, i.e.
     N >= number of RAM devices +
          number of statically plugged vNVDIMM devices +
          number of hotplugged vNVDIMM devices

2. A similar requirement applies to the memory option "-m ...,maxmem=M", i.e.
     M >= size of RAM devices +
          size of statically plugged vNVDIMM devices +
          size of hotplugged vNVDIMM devices
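
The two inequalities above can be worked through with a small
calculation. The sketch below uses a hypothetical guest layout (one 4 GB
RAM device, one statically plugged 4 GB vNVDIMM, and one 4 GB vNVDIMM to
be hotplugged later, as in the example commands) and prints the smallest
"-m" option that satisfies both notes.

```shell
#!/bin/sh
# Hypothetical guest layout; sizes are total sizes in GB per category.
RAM_DEVICES=1;      RAM_GB=4        # normal RAM
STATIC_NVDIMMS=1;   STATIC_GB=4     # vNVDIMMs given on the command line
HOTPLUG_NVDIMMS=1;  HOTPLUG_GB=4    # vNVDIMMs expected to be hotplugged

# Note 1: every RAM device and every vNVDIMM needs a slot.
MIN_SLOTS=$((RAM_DEVICES + STATIC_NVDIMMS + HOTPLUG_NVDIMMS))
# Note 2: maxmem must cover the sum of all sizes.
MIN_MAXMEM_GB=$((RAM_GB + STATIC_GB + HOTPLUG_GB))

echo "-m ${RAM_GB}G,slots=${MIN_SLOTS},maxmem=${MIN_MAXMEM_GB}G"
```

For this layout the script prints "-m 4G,slots=3,maxmem=12G". Because
slots and maxmem are fixed at startup, the hotplug budget has to be
decided before the guest is launched.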

Alignment
---------

QEMU uses mmap(2) to map vNVDIMM backends and aligns the mapping
address to the page size (getpagesize(2)) by default. However, some
types of backends may require an alignment different from the page
size. In that case, QEMU v2.12.0 and later provide the 'align' option of
memory-backend-file to allow users to specify the proper alignment.
For device dax (e.g., /dev/dax0.0), this alignment needs to match the
alignment requirement of the device dax. The NUM of the 'align=NUM' option
must be larger than or equal to the 'align' of the device dax.
One of the following commands can be used to show the 'align' of the
device dax:

    ndctl list -X
    daxctl list -R

In order to get the proper 'align' of the device dax, you need to install
the library 'libdaxctl'.

For example, if the device dax requires 2 MB alignment, the following
QEMU command line options use it (/dev/dax0.0) as the backend of a
vNVDIMM:

    -object memory-backend-file,id=mem1,share=on,mem-path=/dev/dax0.0,size=4G,align=2M
    -device nvdimm,id=nvdimm1,memdev=mem1

Guest Data Persistence
----------------------

Though QEMU supports multiple types of vNVDIMM backends on Linux,
the only backends that can guarantee guest write persistence are:

A. DAX device (e.g., /dev/dax0.0), or
B. DAX file (a file on a file system mounted with the dax option)

When using B (a file supporting direct mapping of persistent memory)
as a backend, write persistence is guaranteed if the host kernel has
support for the MAP_SYNC flag in the mmap system call (available
since Linux 4.15 and on certain distro kernels) and additionally
both the 'pmem' and 'share' flags are set to 'on' on the backend.

If these conditions are not satisfied, i.e. if either 'pmem' or 'share'
is not set, if the backend file does not support DAX, or if MAP_SYNC
is not supported by the host kernel, write persistence is not
guaranteed after a system crash. For compatibility reasons, these
conditions are ignored if they are not satisfied. Currently, no way is
provided to test for them.

For more details, please refer to the mmap(2) man page:
http://man7.org/linux/man-pages/man2/mmap.2.html.

When using other types of backends, it's suggested to set the 'unarmed'
option of '-device nvdimm' to 'on', which sets the unarmed flag of the
guest NVDIMM region mapping structure. This unarmed flag indicates to
guest software that this vNVDIMM device contains a region that cannot
accept persistent writes. As a result, for example, the guest Linux
NVDIMM driver marks such a vNVDIMM device as read-only.
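
The two recommendations in this section (pmem=on and share=on for a DAX
file, unarmed=on for other backends) can be summarized in a small
option generator. This is only a sketch: the function name nvdimm_opts,
the "mem-" id prefix, the "daxfile" keyword, and the file paths are all
hypothetical, and nothing here verifies that the host kernel supports
MAP_SYNC or that the file really resides on a DAX mount.

```shell
#!/bin/sh
# nvdimm_opts ID MEM_PATH SIZE KIND   (hypothetical helper)
#   KIND=daxfile : file on a host file system mounted with the dax option
#   other values : backend that cannot guarantee write persistence
nvdimm_opts() {
    id=$1 mem_path=$2 size=$3 kind=$4
    backend="memory-backend-file,id=mem-$id,share=on,mem-path=$mem_path,size=$size"
    device="nvdimm,id=$id,memdev=mem-$id"
    if [ "$kind" = daxfile ]; then
        backend="$backend,pmem=on"      # both 'pmem' and 'share' set to 'on'
    else
        device="$device,unarmed=on"     # tell the guest writes are not persistent
    fi
    printf '%s\n' "-object $backend" "-device $device"
}

# Print the options for one backend of each kind (paths are examples only).
nvdimm_opts nvdimm1 /mnt/nvdimm1.img 4G daxfile
nvdimm_opts nvdimm2 /tmp/nvdimm2.img 4G regular
```

The first call yields an "-object" line containing pmem=on, the second a
"-device" line containing unarmed=on.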

Backend File Setup Example
--------------------------

Here are two examples showing how to set up these persistent backends on
Linux using the tool ndctl [3].

A. DAX device

Use the following command to set up /dev/dax0.0 so that the entirety of
namespace0.0 can be exposed as an emulated NVDIMM to the guest:

    ndctl create-namespace -f -e namespace0.0 -m devdax

/dev/dax0.0 can then be used directly in the "mem-path" option.

B. DAX file

Individual files on a DAX host file system can be exposed as emulated
NVDIMMs. First an fsdax block device is created, partitioned, and then
mounted with the "dax" mount option:

    ndctl create-namespace -f -e namespace0.0 -m fsdax
    (partition /dev/pmem0 with name pmem0p1)
    mount -o dax /dev/pmem0p1 /mnt
    (create or copy a disk image file with qemu-img(1), cp(1), or dd(1)
     in /mnt)

Then the new file in /mnt can be used in the "mem-path" option.

NVDIMM Persistence
------------------

ACPI 6.2 Errata A added support for a new Platform Capabilities Structure
which allows the platform to communicate what features it supports related to
NVDIMM data persistence. Users can provide a persistence value to a guest via
the optional "nvdimm-persistence" machine command line option:

    -machine pc,accel=kvm,nvdimm,nvdimm-persistence=cpu

There are currently two valid values for this option:

"mem-ctrl" - The platform supports flushing dirty data from the memory
             controller to the NVDIMMs in the event of power loss.

"cpu"      - The platform supports flushing dirty data from the CPU cache to
             the NVDIMMs in the event of power loss. This implies that the
             platform also supports flushing dirty data through the memory
             controller on power loss.

If the vNVDIMM backend is in host persistent memory that can be accessed in
the SNIA NVM Programming Model [1] (e.g., Intel NVDIMM), it's suggested to set
the 'pmem' option of memory-backend-file to 'on'. When 'pmem' is 'on' and QEMU
is built with libpmem [2] support (configured with --enable-libpmem), QEMU
will take the necessary operations to guarantee the persistence of its own
writes to the vNVDIMM backend (e.g., in vNVDIMM label emulation and live
migration). If 'pmem' is 'on' while there is no libpmem support, QEMU will
exit and report a "lack of libpmem support" message rather than run without
the persistence guarantee. For example, to ensure persistence for some
backend file, use the QEMU command line:

    -object memory-backend-file,id=nv_mem,mem-path=/XXX/yyy,size=4G,pmem=on

References
----------

[1] NVM Programming Model (NPM)
    Version 1.2
    https://www.snia.org/sites/default/files/technical_work/final/NVMProgrammingModel_v1.2.pdf
[2] Persistent Memory Development Kit (PMDK), formerly known as NVML project, home page:
    http://pmem.io/pmdk/
[3] ndctl-create-namespace - provision or reconfigure a namespace
    http://pmem.io/ndctl/ndctl-create-namespace.html