.. BSD LICENSE
    Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

      * Redistributions of source code must retain the above copyright
        notice, this list of conditions and the following disclaimer.
      * Redistributions in binary form must reproduce the above copyright
        notice, this list of conditions and the following disclaimer in
        the documentation and/or other materials provided with the
        distribution.
      * Neither the name of Intel Corporation nor the names of its
        contributors may be used to endorse or promote products derived
        from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

VM Power Management Application
===============================

Introduction
------------

Applications running in Virtual Environments have an abstract view of
the underlying hardware on the Host. In particular, applications cannot see
the binding of virtual to physical hardware.
When looking at CPU resourcing, the pinning of Virtual CPUs (vCPUs) to
Host Physical CPUs (pCPUs) is not apparent to an application,
and this pinning may change over time.
Furthermore, Operating Systems on virtual machines do not have the ability
to govern their own power policy; the Model Specific Registers (MSRs)
for enabling P-State transitions are not exposed to Operating Systems
running on Virtual Machines (VMs).

The Virtual Machine Power Management solution shows an example of
how a DPDK application can indicate its processing requirements, using only
VM-local information (vCPU/lcore), to a Host-based Monitor. The Monitor is
responsible for accepting requests for frequency changes for a vCPU, translating
the vCPU to a pCPU via libvirt, and effecting the change in frequency.

The solution consists of two high-level components:

#. Example Host Application

   A Command Line Interface (CLI) for VM->Host communication channel management
   allows adding channels to the Monitor, setting and querying the vCPU to pCPU pinning,
   and inspecting and manually changing the frequency of each CPU.
   The CLI runs on a single lcore, while the thread responsible for managing
   VM requests runs on a second lcore.

   VM requests arriving on a channel for frequency changes are passed
   to the librte_power ACPI cpufreq sysfs-based library.
   The Host Application relies on both qemu-kvm and libvirt to function.

#. librte_power for Virtual Machines

   Using an alternate implementation of the librte_power API, requests for
   frequency changes are forwarded to the host monitor rather than to
   the ACPI cpufreq sysfs interface used on the host.

   The l3fwd-power application will use this implementation when deployed on a VM
   (see :doc:`l3_forward_power_man`).

.. _figure_vm_power_mgr_highlevel:

.. figure:: img/vm_power_mgr_highlevel.*

   High-level Solution


Overview
--------

VM Power Management employs qemu-kvm to provide communication channels
between the host and VMs in the form of Virtio-Serial, which appears as
a paravirtualized serial device on a VM and can be configured to use
various backends on the host. For this example, each Virtio-Serial endpoint
on the host is configured as an AF_UNIX file socket, supporting poll/select
and epoll for event notification.
In this example, each channel endpoint on the host is monitored via
epoll for EPOLLIN events.
Each channel is specified as qemu-kvm arguments or as libvirt XML for each VM,
and each VM can have up to 64 channels.
In this example, each DPDK lcore on a VM has exclusive access to a channel.
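
The host-side pattern described above, with one AF_UNIX endpoint per channel
watched by epoll for EPOLLIN, can be sketched in C as follows. This is a
minimal illustration, not the vm_power_manager source. The helper names
*channel_monitor_add* and *channel_monitor_poll* are hypothetical. For a quick
test, a *socketpair(2)* can stand in for a qemu channel socket so that no VM is
needed.

.. code-block:: c

    #include <sys/epoll.h>
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Register one channel endpoint with an epoll instance, watching for
     * incoming data (EPOLLIN). The fd is kept in the event's user data so
     * the poll loop knows which channel became readable. */
    static int channel_monitor_add(int epfd, int channel_fd)
    {
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = channel_fd };

        return epoll_ctl(epfd, EPOLL_CTL_ADD, channel_fd, &ev);
    }

    /* Wait up to timeout_ms for a registered channel to become readable,
     * then read at most len bytes from it into buf. Returns the number of
     * bytes read, 0 on timeout (or if the peer closed the channel), or -1
     * on error. */
    static ssize_t channel_monitor_poll(int epfd, void *buf, size_t len,
                                        int timeout_ms)
    {
        struct epoll_event ev;
        int n = epoll_wait(epfd, &ev, 1, timeout_ms);

        if (n <= 0)
            return n;
        if (!(ev.events & EPOLLIN))
            return -1;
        return read(ev.data.fd, buf, len);
    }

In a deployment, each registered descriptor would be a socket connected to one
of the channel paths configured for the VM. The poll loop would run on the
lcore dedicated to channel monitoring.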

To enable frequency changes from within a VM, a request made via the librte_power interface
is forwarded via Virtio-Serial to the host. Each request contains the vCPU
and the power command (scale up/down/min/max).
The librte_power API is consistent across host and guest environments,
with the VM or Host implementation selected automatically
at runtime based on the environment.
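
To illustrate what a request carries, the sketch below packs a vCPU number and
a scaling command into a fixed-size message and then unpacks it. The structure
layout, the *power_cmd* values and the function names are hypothetical. The
actual message format is defined by the librte_power guest implementation and
the host application, and it is not reproduced here.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scaling commands mirroring scale up/down/min/max. */
    enum power_cmd {
        POWER_SCALE_UP = 1,
        POWER_SCALE_DOWN,
        POWER_SCALE_MIN,
        POWER_SCALE_MAX
    };

    /* Hypothetical fixed-size request sent over a Virtio-Serial channel. */
    struct power_request {
        uint32_t vcpu;    /* VM-local vCPU (lcore) the request applies to */
        uint32_t command; /* one of enum power_cmd */
    };

    /* Copy the request into buf for writing to the channel.
     * Returns the number of bytes written, or 0 if buf is too small. */
    static size_t power_request_encode(const struct power_request *req,
                                       uint8_t *buf, size_t len)
    {
        if (len < sizeof(*req))
            return 0;
        memcpy(buf, req, sizeof(*req));
        return sizeof(*req);
    }

    /* Rebuild a request from bytes read on the host side. Returns 0 on
     * success, or -1 if too few bytes were read or the command is unknown. */
    static int power_request_decode(const uint8_t *buf, size_t len,
                                    struct power_request *req)
    {
        if (len < sizeof(*req))
            return -1;
        memcpy(req, buf, sizeof(*req));
        if (req->command < POWER_SCALE_UP || req->command > POWER_SCALE_MAX)
            return -1;
        return 0;
    }

Copying the raw structure keeps the sketch short. However, it assumes that the
host and guest agree on the structure layout and byte order.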

Upon receiving a request, the host translates the vCPU to a pCPU via
the libvirt API before forwarding the request to the host librte_power.
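
A vCPU may be pinned to more than one pCPU, so the translated request can cover
several physical cores. The sketch below makes two assumptions. First, the
pinning has already been read from libvirt (for example with
*virDomainGetVcpuPinInfo()*) and reduced to a 64-bit mask. Second, a request is
applied to every pCPU in that mask. Both assumptions and all names are
illustrative. *record_pcpu* is a stand-in for the host librte_power call.

.. code-block:: c

    #include <stdint.h>

    /* Hypothetical callback that applies a frequency command to one pCPU;
     * in the host application this role is played by librte_power. */
    typedef int (*pcpu_apply_fn)(unsigned int pcpu, int command);

    /* Apply a command to every pCPU set in pcpu_mask (bit N = pCPU N), where
     * pcpu_mask is the pinning of the requesting vCPU. Returns the number of
     * pCPUs the command was applied to, or -1 if the callback failed. */
    static int apply_to_pinned_pcpus(uint64_t pcpu_mask, int command,
                                     pcpu_apply_fn apply)
    {
        int applied = 0;

        for (unsigned int pcpu = 0; pcpu < 64; pcpu++) {
            if (!(pcpu_mask & (UINT64_C(1) << pcpu)))
                continue;
            if (apply(pcpu, command) < 0)
                return -1;
            applied++;
        }
        return applied;
    }

    /* Example callback for illustration: records which pCPUs were touched. */
    static uint64_t touched_pcpus;

    static int record_pcpu(unsigned int pcpu, int command)
    {
        (void)command;
        touched_pcpus |= UINT64_C(1) << pcpu;
        return 0;
    }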

.. _figure_vm_power_mgr_vm_request_seq:

.. figure:: img/vm_power_mgr_vm_request_seq.*

   VM request to scale frequency


Performance Considerations
~~~~~~~~~~~~~~~~~~~~~~~~~~

While the Haswell microarchitecture allows independent power control for each core,
earlier microarchitectures do not offer such fine-grained control.
When deploying on pre-Haswell platforms, greater care must be taken in selecting
which cores are assigned to a VM; for instance, a core will not scale down
until its sibling is similarly scaled.
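
Which CPUs count as siblings depends on the platform. On Linux, for example, the
logical CPUs that share a physical core with CPU *N* are listed in
*/sys/devices/system/cpu/cpuN/topology/thread_siblings_list*. This file uses the
kernel's CPU-list format (for example *0,4* or *0-1*). The C sketch below parses
that format into a bit mask so that sibling relationships can be checked before
cores are assigned to a VM. The function name is hypothetical, and the sketch
supports at most 64 CPUs.

.. code-block:: c

    #include <stdint.h>
    #include <stdlib.h>

    /* Parse a kernel CPU list such as "0-3,8" into a bit mask (bit N = CPU N).
     * A trailing newline, as read from sysfs, is accepted. Only CPUs 0..63
     * are supported. Returns 0 on success, or -1 on malformed input. */
    static int cpulist_to_mask(const char *s, uint64_t *mask)
    {
        uint64_t m = 0;

        while (*s != '\0' && *s != '\n') {
            char *end;
            unsigned long lo = strtoul(s, &end, 10);
            unsigned long hi = lo;

            if (end == s)
                return -1;             /* expected a CPU number */
            s = end;
            if (*s == '-') {           /* range "lo-hi" */
                hi = strtoul(s + 1, &end, 10);
                if (end == s + 1)
                    return -1;
                s = end;
            }
            if (hi < lo || hi > 63)
                return -1;
            for (unsigned long cpu = lo; cpu <= hi; cpu++)
                m |= UINT64_C(1) << cpu;
            if (*s == ',')
                s++;
            else if (*s != '\0' && *s != '\n')
                return -1;             /* unexpected character */
        }
        *mask = m;
        return 0;
    }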

Configuration
-------------

BIOS
~~~~

Enhanced Intel SpeedStep® Technology must be enabled in the platform BIOS
if the power management feature of DPDK is to be used.
Otherwise, the sysfs directory /sys/devices/system/cpu/cpu0/cpufreq will not exist,
and CPU frequency-based power management cannot be used.
Consult the relevant BIOS documentation to determine how these settings
can be accessed.
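
Before starting the power management applications, you can confirm that the
cpufreq interface is present by checking for the directory named above. The C
sketch below does this with *stat(2)*. The helper name *dir_exists* is
hypothetical.

.. code-block:: c

    #include <stdbool.h>
    #include <sys/stat.h>

    /* Return true if path exists and is a directory, for example
     * "/sys/devices/system/cpu/cpu0/cpufreq". */
    static bool dir_exists(const char *path)
    {
        struct stat st;

        return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
    }

A false result for */sys/devices/system/cpu/cpu0/cpufreq* indicates that
CPU frequency-based power management is not available on that host.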

Host Operating System
~~~~~~~~~~~~~~~~~~~~~

The Host OS must also have the *acpi_cpufreq* module installed; in some cases,
the *intel_pstate* driver may be the default Power Management environment.
To enable *acpi_cpufreq* and disable *intel_pstate*, add the following
to the grub Linux command line:

.. code-block:: console

    intel_pstate=disable

Upon rebooting, load the *acpi_cpufreq* module:

.. code-block:: console

    modprobe acpi_cpufreq
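
After the module is loaded, you can confirm the active cpufreq driver from the
*scaling_driver* attribute (for example
*/sys/devices/system/cpu/cpu0/cpufreq/scaling_driver*). This attribute reads
*acpi-cpufreq* when the ACPI driver is in use and *intel_pstate* when that
driver is active. The C sketch below compares the first line of such a file
with an expected driver name. The function name is hypothetical. The path is
passed as a parameter so that the check can be exercised against any file.

.. code-block:: c

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    /* Return true if the first line of the given scaling_driver file names
     * the expected cpufreq driver, e.g. "acpi-cpufreq". */
    static bool cpufreq_driver_is(const char *path, const char *expected)
    {
        char line[64];
        bool match = false;
        FILE *f = fopen(path, "r");

        if (f == NULL)
            return false;
        if (fgets(line, sizeof(line), f) != NULL) {
            line[strcspn(line, "\n")] = '\0'; /* strip trailing newline */
            match = strcmp(line, expected) == 0;
        }
        fclose(f);
        return match;
    }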

Hypervisor Channel Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Virtio-Serial channels are configured via libvirt XML:

.. code-block:: xml

    <name>{vm_name}</name>
    <controller type='virtio-serial' index='0'>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </controller>
    <channel type='unix'>
        <source mode='bind' path='/tmp/powermonitor/{vm_name}.{channel_num}'/>
        <target type='virtio' name='virtio.serial.port.poweragent.{vm_channel_num}'/>
        <address type='virtio-serial' controller='0' bus='0' port='{N}'/>
    </channel>

Here, a single controller of type *virtio-serial* is created. Up to 32 channels
can be associated with a single controller, and multiple controllers can be specified.
The convention is to use the name of the VM in the host path *{vm_name}* and
to increment *{channel_num}* for each channel; likewise, the port value *{N}*
must be incremented for each channel.
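
The *<channel>* elements differ only in their numbering, so they can be
generated rather than written by hand. The C sketch below prints one element
following the convention above. It assumes that the host *{channel_num}* and the
VM *{vm_channel_num}* are the same number. It also starts ports at 1, because
qemu reserves port 0 of a virtio-serial bus for console devices. The function
name is hypothetical.

.. code-block:: c

    #include <stdio.h>
    #include <string.h>

    /* Write the libvirt <channel> element for one power-management channel
     * into buf, following the naming convention described above:
     *   host path:  /tmp/powermonitor/{vm_name}.{channel_num}
     *   guest name: virtio.serial.port.poweragent.{channel_num}
     *   port:       channel_num + 1 (port 0 is left unused in this sketch)
     * Returns the length reported by snprintf, or -1 if vm_name contains
     * characters that would break the quoted XML attribute. */
    static int format_power_channel(char *buf, size_t len, const char *vm_name,
                                    unsigned int channel_num)
    {
        if (strpbrk(vm_name, "'<&") != NULL)
            return -1;
        return snprintf(buf, len,
            "<channel type='unix'>\n"
            "  <source mode='bind' path='/tmp/powermonitor/%s.%u'/>\n"
            "  <target type='virtio' name='virtio.serial.port.poweragent.%u'/>\n"
            "  <address type='virtio-serial' controller='0' bus='0' port='%u'/>\n"
            "</channel>\n",
            vm_name, channel_num, channel_num, channel_num + 1);
    }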

Each channel on the host will appear in *path*. The directory */tmp/powermonitor/*
must first be created and given qemu permissions:

.. code-block:: console

    mkdir /tmp/powermonitor/
    chown qemu:qemu /tmp/powermonitor

Note that files and directories within /tmp are generally removed when the
host reboots, so the above steps may need to be repeated after each reboot.

The serial device as it appears on a VM is configured with the *name* attribute
of the *target* element and must be in the form *virtio.serial.port.poweragent.{vm_channel_num}*,
where *vm_channel_num* is typically the lcore channel to be used in DPDK VM applications.

Each channel on a VM will be present at */dev/virtio-ports/virtio.serial.port.poweragent.{vm_channel_num}*.
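
Inside the VM, an application that follows the one-channel-per-lcore convention
can derive the device path from the lcore ID. The C sketch below builds the path
and opens the device. The helper names are hypothetical, and the sketch assumes
that *vm_channel_num* equals the lcore ID.

.. code-block:: c

    #include <fcntl.h>
    #include <stdio.h>

    #define POWER_CHANNEL_PREFIX \
        "/dev/virtio-ports/virtio.serial.port.poweragent."

    /* Build the guest-side device path for the channel used by lcore_id.
     * Returns 0 on success, or -1 if buf is too small. */
    static int power_channel_path(char *buf, size_t len, unsigned int lcore_id)
    {
        int n = snprintf(buf, len, POWER_CHANNEL_PREFIX "%u", lcore_id);

        return (n < 0 || (size_t)n >= len) ? -1 : 0;
    }

    /* Open the channel for lcore_id; returns a file descriptor or -1. */
    static int power_channel_open(unsigned int lcore_id)
    {
        char path[128];

        if (power_channel_path(path, sizeof(path), lcore_id) < 0)
            return -1;
        return open(path, O_RDWR);
    }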

Compiling and Running the Host Application
------------------------------------------

Compiling
~~~~~~~~~

#. export RTE_SDK=/path/to/rte_sdk
#. cd ${RTE_SDK}/examples/vm_power_manager
#. make

Running
~~~~~~~

The application does not have any specific command line options other than *EAL*:

.. code-block:: console

    ./build/vm_power_mgr [EAL options]

The application requires exactly two cores to run: one core is dedicated to the CLI,
while the other is dedicated to the channel endpoint monitor. For example, to run
on cores 0 and 1 on a system with 4 memory channels:

.. code-block:: console

    ./build/vm_power_mgr -c 0x3 -n 4
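
The EAL *-c* option takes a hexadecimal coremask in which bit N selects lcore N.
Cores 0 and 1 therefore give *0x3*, and cores 0 to 3 give *0xf*. The small C
helper below computes such a mask from a list of core IDs. Its name is
hypothetical.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Build an EAL coremask (bit N = lcore N) from a list of core IDs.
     * Core IDs above 63 are ignored in this sketch. */
    static uint64_t coremask_from_cores(const unsigned int *cores, size_t n)
    {
        uint64_t mask = 0;

        for (size_t i = 0; i < n; i++)
            if (cores[i] < 64)
                mask |= UINT64_C(1) << cores[i];
        return mask;
    }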

After successful initialization, the user is presented with the VM Power Manager CLI:

.. code-block:: console

    vm_power>

Virtual Machines can now be added to the VM Power Manager:

.. code-block:: console

    vm_power> add_vm {vm_name}

When a {vm_name} is specified with the *add_vm* command, a lookup is performed
with libvirt to ensure that the VM exists. {vm_name} is used as a unique identifier
to associate channels with a particular VM and to execute operations on a VM within the CLI.
VMs do not have to be running in order to be added.

A number of commands can be issued via the CLI in relation to VMs:

  Remove a Virtual Machine identified by {vm_name} from the VM Power Manager.

  .. code-block:: console

    rm_vm {vm_name}

  Add communication channels for the specified VM. The virtio channels must be enabled
  in the VM configuration (qemu/libvirt), and the associated VM must be active.
  {list} is a comma-separated list of channel numbers to add; using the keyword 'all'
  will attempt to add all channels for the VM:

  .. code-block:: console

    add_channels {vm_name} {list}|all

  Enable or disable the communication channels in {list} (comma-separated)
  for the specified VM; alternatively, {list} can be replaced with the keyword 'all'.
  Disabled channels will still receive packets on the host; however, the commands
  they specify will be ignored. Set the status to 'enabled' to begin processing requests again:

  .. code-block:: console

    set_channel_status {vm_name} {list}|all enabled|disabled

  Print to the CLI information on the specified VM. The information
  lists the number of vCPUs and the pinning to pCPU(s) as a bit mask, along with
  any communication channels associated with the VM and the status of each channel:

  .. code-block:: console

    show_vm {vm_name}

  Set the binding of a Virtual CPU on the VM named {vm_name} to the Physical CPU mask:

  .. code-block:: console

    set_pcpu_mask {vm_name} {vcpu} {pcpu}

  Set the binding of a Virtual CPU on the VM to the Physical CPU:

  .. code-block:: console

    set_pcpu {vm_name} {vcpu} {pcpu}
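
The behaviour described for *set_channel_status* can be sketched as below: a
disabled channel is still read on the host, but its commands are ignored. The
types and function names are hypothetical. The sketch shows only the decision
made for each request that has already been received from a channel.

.. code-block:: c

    #include <stdbool.h>

    /* Hypothetical channel states, as set by set_channel_status. */
    enum channel_status { CHANNEL_DISABLED, CHANNEL_ENABLED };

    /* Hypothetical per-channel state kept by the host monitor. */
    struct power_channel {
        enum channel_status status;
        unsigned int handled; /* requests acted upon */
        unsigned int ignored; /* requests received while disabled */
    };

    /* Decide what to do with one request that has already been read from
     * the channel: act on it only if the channel is enabled. Returns true
     * if the request should be processed. */
    static bool channel_accept_request(struct power_channel *ch)
    {
        if (ch->status != CHANNEL_ENABLED) {
            ch->ignored++;
            return false;
        }
        ch->handled++;
        return true;
    }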

Manual control and inspection can also be carried out in relation to CPU frequency scaling:

  Get the current frequency for each core specified in the mask:

  .. code-block:: console

    show_cpu_freq_mask {mask}

  Set the current frequency for the cores specified in {core_mask} by scaling each up/down/min/max:

  .. code-block:: console

    set_cpu_freq {core_mask} up|down|min|max

  Get the current frequency for the specified core:

  .. code-block:: console

    show_cpu_freq {core_num}

  Set the current frequency for the specified core by scaling up/down/min/max:

  .. code-block:: console

    set_cpu_freq {core_num} up|down|min|max
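
On Linux, drivers such as acpi-cpufreq list the frequencies a core supports in
the cpufreq attribute *scaling_available_frequencies*. The sketch below models
the up/down/min/max operations as moves through such a table, ordered from
highest to lowest frequency. This ordering is an assumption of the sketch. The
sketch is not the librte_power implementation, and its names are hypothetical.
Scaling beyond either end of the table leaves the index unchanged.

.. code-block:: c

    #include <stddef.h>

    enum scale_op { SCALE_UP, SCALE_DOWN, SCALE_MIN, SCALE_MAX };

    /* Given the current index into a frequency table sorted from highest
     * (index 0) to lowest (index n - 1), return the new index after op.
     * Requires n >= 1 and cur < n. */
    static size_t scale_index(size_t cur, size_t n, enum scale_op op)
    {
        switch (op) {
        case SCALE_UP:
            return cur > 0 ? cur - 1 : 0;
        case SCALE_DOWN:
            return cur + 1 < n ? cur + 1 : cur;
        case SCALE_MIN:
            return n - 1;
        case SCALE_MAX:
        default:
            return 0;
        }
    }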

Compiling and Running the Guest Applications
--------------------------------------------

For compiling and running l3fwd-power, see :doc:`l3_forward_power_man`.

A guest CLI is also provided for validating the setup.

For both l3fwd-power and the guest CLI, the channels for the VM must be monitored by the
host application using the *add_channels* command on the host.

Compiling
~~~~~~~~~

#. export RTE_SDK=/path/to/rte_sdk
#. cd ${RTE_SDK}/examples/vm_power_manager/guest_cli
#. make

Running
~~~~~~~

The application does not have any specific command line options other than *EAL*:

.. code-block:: console

    ./build/guest_vm_power_mgr [EAL options]

For demonstration purposes, the application uses one channel for each enabled lcore.
For example, to run on cores 0, 1, 2 and 3 on a system with 4 memory channels:

.. code-block:: console

    ./build/guest_vm_power_mgr -c 0xf -n 4

After successful initialization, the user is presented with the VM Power Manager Guest CLI:

.. code-block:: console

    vm_power(guest)>

To change the frequency of an lcore, use the *set_cpu_freq* command,
where {core_num} is the lcore and channel whose frequency is to be changed
by scaling up/down/min/max:

.. code-block:: console

    set_cpu_freq {core_num} up|down|min|max