Commit | Line | Data |
---|---|---|
7c673cae FG |
1 | .. BSD LICENSE |
2 | Copyright(c) 2010-2014 Intel Corporation. All rights reserved. | |
3 | All rights reserved. | |
4 | ||
5 | Redistribution and use in source and binary forms, with or without | |
6 | modification, are permitted provided that the following conditions | |
7 | are met: | |
8 | ||
9 | * Redistributions of source code must retain the above copyright | |
10 | notice, this list of conditions and the following disclaimer. | |
11 | * Redistributions in binary form must reproduce the above copyright | |
12 | notice, this list of conditions and the following disclaimer in | |
13 | the documentation and/or other materials provided with the | |
14 | distribution. | |
15 | * Neither the name of Intel Corporation nor the names of its | |
16 | contributors may be used to endorse or promote products derived | |
17 | from this software without specific prior written permission. | |
18 | ||
19 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS | |
20 | "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT | |
21 | LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR | |
22 | A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT | |
23 | OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, | |
24 | SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT | |
25 | LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, | |
26 | DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY | |
27 | THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT | |
28 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE | |
29 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. | |
30 | ||
31 | VM Power Management Application | |
32 | =============================== | |
33 | ||
34 | Introduction | |
35 | ------------ | |
36 | ||
37 | Applications running in Virtual Environments have an abstract view of | |
38 | the underlying hardware on the Host; in particular, applications cannot see | |
39 | the binding of virtual to physical hardware. | |
40 | When looking at CPU resourcing, the pinning of Virtual CPUs (vCPUs) to | |
41 | Host Physical CPUs (pCPUs) is not apparent to an application | |
42 | and this pinning may change over time. | |
43 | Furthermore, Operating Systems on virtual machines do not have the ability | |
44 | to govern their own power policy; the Model Specific Registers (MSRs) | |
45 | for enabling P-State transitions are not exposed to Operating Systems | |
46 | running on Virtual Machines (VMs). | |
47 | ||
48 | The Virtual Machine Power Management solution shows an example of | |
49 | how a DPDK application can indicate its processing requirements, using only | |
50 | VM-local information (vCPU/lcore), to a Host-based Monitor. The Monitor is responsible | |
51 | for accepting requests for frequency changes for a vCPU, translating the vCPU | |
52 | to a pCPU via libvirt and effecting the change in frequency. | |
53 | ||
54 | The solution comprises two high-level components: | |
55 | ||
56 | #. Example Host Application | |
57 | ||
58 | A Command Line Interface (CLI) is provided for managing VM->Host communication channels. | |
59 | It allows adding channels to the Monitor, setting and querying the vCPU to pCPU pinning, | |
60 | and inspecting and manually changing the frequency of each CPU. | |
61 | The CLI runs on a single lcore while the thread responsible for managing | |
62 | VM requests runs on a second lcore. | |
63 | ||
64 | VM requests arriving on a channel for frequency changes are passed | |
65 | to the librte_power ACPI cpufreq sysfs based library. | |
66 | The Host Application relies on both qemu-kvm and libvirt to function. | |
67 | ||
68 | #. librte_power for Virtual Machines | |
69 | ||
70 | Using an alternate implementation for the librte_power API, requests for | |
71 | frequency changes are forwarded to the host monitor rather than | |
72 | the ACPI cpufreq sysfs interface used on the host. | |
73 | ||
74 | The l3fwd-power application will use this implementation when deployed on a VM | |
75 | (see :doc:`l3_forward_power_man`). | |
76 | ||
77 | .. _figure_vm_power_mgr_highlevel: | |
78 | ||
79 | .. figure:: img/vm_power_mgr_highlevel.* | |
80 | ||
81 | High-level Solution | |
82 | ||
83 | ||
84 | Overview | |
85 | -------- | |
86 | ||
87 | VM Power Management employs qemu-kvm to provide communication channels | |
88 | between the host and VMs in the form of Virtio-Serial, which appears as | |
89 | a paravirtualized serial device on a VM and can be configured to use | |
90 | various backends on the host. For this example, each Virtio-Serial endpoint | |
91 | on the host is configured as an AF_UNIX file socket, supporting poll/select | |
92 | and epoll for event notification. | |
93 | In this example, each channel endpoint on the host is monitored via | |
94 | epoll for EPOLLIN events. | |
95 | Each channel is specified as qemu-kvm arguments or as libvirt XML for each VM, | |
96 | and each VM can have up to a maximum of 64 channels. | |
97 | In this example, each DPDK lcore on a VM has exclusive access to a channel. | |
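The monitoring loop described above can be sketched in Python. This is only an illustration of waiting for EPOLLIN on AF_UNIX endpoints, not the host application's code: the function name ``wait_for_requests``, the 64-byte read size, and the use of already-connected sockets in place of the real Virtio-Serial endpoints are all assumptions.

```python
import select
import socket


def wait_for_requests(endpoints, timeout=1.0):
    """Return {fd: payload} for every endpoint that became readable.

    `endpoints` is a list of connected AF_UNIX sockets standing in for the
    host side of each channel (hypothetical, for illustration only).
    """
    poller = select.epoll()
    by_fd = {sock.fileno(): sock for sock in endpoints}
    try:
        # Register every channel endpoint for "data available" events.
        for fd in by_fd:
            poller.register(fd, select.EPOLLIN)
        payloads = {}
        # poll() returns only the descriptors that have pending events.
        for fd, events in poller.poll(timeout):
            if events & select.EPOLLIN:
                payloads[fd] = by_fd[fd].recv(64)
        return payloads
    finally:
        poller.close()
```

Endpoints with no pending data are simply absent from the returned mapping, which is what lets one monitor thread serve many channels.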
98 | ||
99 | To enable frequency changes from within a VM, a request via the librte_power interface | |
100 | is forwarded via Virtio-Serial to the host. Each request contains the vCPU | |
101 | and power command (scale up/down/min/max). | |
102 | The API for host and guest librte_power is consistent across environments, | |
103 | with the selection of VM or Host implementation determined automatically | |
104 | at runtime based on the environment. | |
105 | ||
106 | Upon receiving a request, the host translates the vCPU to a pCPU via | |
107 | the libvirt API before forwarding to the host librte_power. | |
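The translation step can be illustrated with a small Python sketch. It assumes the pinning has already been obtained (here a plain dictionary from vCPU number to pCPU bit mask stands in for the libvirt lookup), and the names ``pcpus_for_vcpu`` and ``handle_request`` are hypothetical.

```python
VALID_COMMANDS = ("up", "down", "min", "max")


def pcpus_for_vcpu(pinning, vcpu):
    """Expand the pCPU bit mask pinned to `vcpu` into a list of pCPU ids."""
    mask = pinning[vcpu]
    return [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]


def handle_request(pinning, vcpu, command):
    """Translate a (vCPU, command) request into per-pCPU scaling actions.

    `pinning` maps vCPU number -> pCPU bit mask; it is a stand-in for the
    information the host would query through libvirt.
    """
    if command not in VALID_COMMANDS:
        raise ValueError(f"unknown power command: {command}")
    return [(pcpu, command) for pcpu in pcpus_for_vcpu(pinning, vcpu)]
```

For instance, with vCPU 1 pinned to mask 0x30, a request to scale up results in actions for pCPUs 4 and 5.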
108 | ||
109 | .. _figure_vm_power_mgr_vm_request_seq: | |
110 | ||
111 | .. figure:: img/vm_power_mgr_vm_request_seq.* | |
112 | ||
113 | VM request to scale frequency | |
114 | ||
115 | ||
116 | Performance Considerations | |
117 | ~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
118 | ||
119 | While the Haswell microarchitecture allows for independent power control for each core, | |
120 | earlier microarchitectures do not offer such fine-grained control. | |
121 | When deployed on pre-Haswell platforms, greater care must be taken in selecting | |
122 | which cores are assigned to a VM. For instance, a core will not scale down | |
123 | until its sibling is similarly scaled. | |
124 | ||
125 | Configuration | |
126 | ------------- | |
127 | ||
128 | BIOS | |
129 | ~~~~ | |
130 | ||
131 | Enhanced Intel SpeedStep® Technology must be enabled in the platform BIOS | |
132 | if the power management feature of DPDK is to be used. | |
133 | Otherwise, the sysfs directory /sys/devices/system/cpu/cpu0/cpufreq will not exist, | |
134 | and the CPU frequency-based power management cannot be used. | |
135 | Consult the relevant BIOS documentation to determine how these settings | |
136 | can be accessed. | |
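A quick way to verify this precondition is to test for the cpufreq directory before starting the application. The following Python sketch does so; the function name ``cpufreq_available`` and the ``sysfs_root`` parameter (added so the check can be pointed at another directory) are assumptions made for illustration.

```python
from pathlib import Path


def cpufreq_available(cpu=0, sysfs_root="/sys/devices/system/cpu"):
    """Return True if the cpufreq sysfs directory exists for the given CPU."""
    return (Path(sysfs_root) / f"cpu{cpu}" / "cpufreq").is_dir()
```

If the function returns ``False`` for cpu0 on the host, frequency-based power management is not available with the current BIOS or driver setup.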
137 | ||
138 | Host Operating System | |
139 | ~~~~~~~~~~~~~~~~~~~~~ | |
140 | ||
141 | The Host OS must also have the *acpi_cpufreq* module installed. In some cases | |
142 | the *intel_pstate* driver may be the default Power Management environment. | |
143 | To enable *acpi_cpufreq* and disable *intel_pstate*, add the following | |
144 | to the grub Linux command line: | |
145 | ||
146 | .. code-block:: console | |
147 | ||
148 | intel_pstate=disable | |
149 | ||
150 | Upon rebooting, load the *acpi_cpufreq* module: | |
151 | ||
152 | .. code-block:: console | |
153 | ||
154 | modprobe acpi_cpufreq | |
155 | ||
156 | Hypervisor Channel Configuration | |
157 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
158 | ||
159 | Virtio-Serial channels are configured via libvirt XML: | |
160 | ||
161 | ||
162 | .. code-block:: xml | |
163 | ||
164 | <name>{vm_name}</name> | |
165 | <controller type='virtio-serial' index='0'> | |
166 | <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/> | |
167 | </controller> | |
168 | <channel type='unix'> | |
169 | <source mode='bind' path='/tmp/powermonitor/{vm_name}.{channel_num}'/> | |
170 | <target type='virtio' name='virtio.serial.port.poweragent.{vm_channel_num}'/> | |
171 | <address type='virtio-serial' controller='0' bus='0' port='{N}'/> | |
172 | </channel> | |
173 | ||
174 | ||
175 | Here a single controller of type *virtio-serial* is created. Up to 32 channels | |
176 | can be associated with a single controller, and multiple controllers can be specified. | |
177 | The convention is to use the name of the VM in the host path *{vm_name}* and | |
178 | to increment *{channel_num}* for each channel. Likewise, the port value *{N}* | |
179 | must be incremented for each channel. | |
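Writing many near-identical channel elements by hand is error-prone, so the naming convention can be generated instead. The Python sketch below is one possible helper, not part of DPDK or libvirt: the name ``channel_elements`` is hypothetical, and it assumes that *{vm_channel_num}* equals *{channel_num}* and that port numbers start at 1.

```python
CHANNEL_TEMPLATE = """<channel type='unix'>
  <source mode='bind' path='/tmp/powermonitor/{vm_name}.{channel_num}'/>
  <target type='virtio' name='virtio.serial.port.poweragent.{channel_num}'/>
  <address type='virtio-serial' controller='0' bus='0' port='{port}'/>
</channel>"""


def channel_elements(vm_name, num_channels, first_port=1):
    """Build one <channel> element per channel for a single controller.

    Assumptions: channel numbers start at 0, the guest-side channel number
    matches the host-side one, and ports are numbered from `first_port`.
    """
    if not 1 <= num_channels <= 32:
        raise ValueError("a single controller holds between 1 and 32 channels")
    return [
        CHANNEL_TEMPLATE.format(vm_name=vm_name, channel_num=n, port=first_port + n)
        for n in range(num_channels)
    ]
```

The returned strings can be pasted into the devices section of the VM definition alongside the controller element shown above.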
180 | ||
181 | Each channel on the host will appear in *path*. The directory */tmp/powermonitor/* | |
182 | must first be created and given qemu permissions: | |
183 | ||
184 | .. code-block:: console | |
185 | ||
186 | mkdir /tmp/powermonitor/ | |
187 | chown qemu:qemu /tmp/powermonitor | |
188 | ||
189 | Note that files and directories within /tmp are generally removed upon | |
190 | rebooting the host and the above steps may need to be carried out after each reboot. | |
191 | ||
192 | The serial device as it appears on a VM is configured with the *target* element attribute *name* | |
193 | and must be in the form of *virtio.serial.port.poweragent.{vm_channel_num}*, | |
194 | where *vm_channel_num* is typically the lcore channel to be used in DPDK VM applications. | |
195 | ||
196 | Each channel on a VM will be present at */dev/virtio-ports/virtio.serial.port.poweragent.{vm_channel_num}*. | |
197 | ||
198 | Compiling and Running the Host Application | |
199 | ------------------------------------------ | |
200 | ||
201 | Compiling | |
202 | ~~~~~~~~~ | |
203 | ||
204 | #. export RTE_SDK=/path/to/rte_sdk | |
205 | #. cd ${RTE_SDK}/examples/vm_power_manager | |
206 | #. make | |
207 | ||
208 | Running | |
209 | ~~~~~~~ | |
210 | ||
211 | The application does not have any specific command line options other than *EAL*: | |
212 | ||
213 | .. code-block:: console | |
214 | ||
215 | ./build/vm_power_mgr [EAL options] | |
216 | ||
217 | The application requires exactly two cores to run: one core is dedicated to the CLI, | |
218 | while the other is dedicated to the channel endpoint monitor. For example, to run | |
219 | on cores 0 & 1 on a system with 4 memory channels: | |
220 | ||
221 | .. code-block:: console | |
222 | ||
223 | ./build/vm_power_mgr -c 0x3 -n 4 | |
224 | ||
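The value passed to ``-c`` is a hexadecimal bit mask with one bit set per core. As a worked example, the short Python helper below (a hypothetical name, not a DPDK tool) derives the mask from a list of core numbers.

```python
def core_mask(cores):
    """Return the hexadecimal core mask with one bit set per core number."""
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core numbers must be non-negative")
        mask |= 1 << core
    return hex(mask)
```

Cores 0 and 1 give ``0x3``, and cores 0 to 3 give ``0xf``.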
225 | After successful initialization the user is presented with the VM Power Manager CLI: | |
226 | ||
227 | .. code-block:: console | |
228 | ||
229 | vm_power> | |
230 | ||
231 | Virtual Machines can now be added to the VM Power Manager: | |
232 | ||
233 | .. code-block:: console | |
234 | ||
235 | vm_power> add_vm {vm_name} | |
236 | ||
237 | When a {vm_name} is specified with the *add_vm* command, a lookup is performed | |
238 | with libvirt to ensure that the VM exists. {vm_name} is used as a unique identifier | |
239 | to associate channels with a particular VM and for executing operations on a VM within the CLI. | |
240 | VMs do not have to be running in order to add them. | |
241 | ||
242 | A number of commands can be issued via the CLI in relation to VMs: | |
243 | ||
244 | Remove a Virtual Machine identified by {vm_name} from the VM Power Manager: | |
245 | ||
246 | .. code-block:: console | |
247 | ||
248 | rm_vm {vm_name} | |
249 | ||
250 | Add communication channels for the specified VM. The virtio channels must be enabled | |
251 | in the VM configuration (qemu/libvirt) and the associated VM must be active. | |
252 | {list} is a comma-separated list of channel numbers to add; using the keyword 'all' | |
253 | will attempt to add all channels for the VM: | |
254 | ||
255 | .. code-block:: console | |
256 | ||
257 | add_channels {vm_name} {list}|all | |
258 | ||
259 | Enable or disable the communication channels in {list} (comma-separated) | |
260 | for the specified VM; alternatively, {list} can be replaced with the keyword 'all'. | |
261 | Disabled channels will still receive packets on the host; however, the commands | |
262 | they specify will be ignored. Set status to 'enabled' to begin processing requests again: | |
263 | ||
264 | .. code-block:: console | |
265 | ||
266 | set_channel_status {vm_name} {list}|all enabled|disabled | |
267 | ||
268 | Print to the CLI the information on the specified VM. The information | |
269 | lists the number of vCPUs, the pinning to pCPU(s) as a bit mask, and | |
270 | any communication channels associated with the VM, along with the status of each channel: | |
271 | ||
272 | .. code-block:: console | |
273 | ||
274 | show_vm {vm_name} | |
275 | ||
276 | Set the binding of a Virtual CPU on the VM with name {vm_name} to the Physical CPU mask: | |
277 | ||
278 | .. code-block:: console | |
279 | ||
280 | set_pcpu_mask {vm_name} {vcpu} {pcpu} | |
281 | ||
282 | Set the binding of a Virtual CPU on the VM to the Physical CPU: | |
283 | ||
284 | .. code-block:: console | |
285 | ||
286 | set_pcpu {vm_name} {vcpu} {pcpu} | |
287 | ||
288 | Manual control and inspection can also be carried out in relation to CPU frequency scaling: | |
289 | ||
290 | Get the current frequency for each core specified in the mask: | |
291 | ||
292 | .. code-block:: console | |
293 | ||
294 | show_cpu_freq_mask {mask} | |
295 | ||
296 | Set the current frequency for the cores specified in {core_mask} by scaling each up/down/min/max: | |
297 | ||
298 | .. code-block:: console | |
299 | ||
300 | set_cpu_freq_mask {core_mask} up|down|min|max | |
301 | ||
302 | Get the current frequency for the specified core: | |
303 | ||
304 | .. code-block:: console | |
305 | ||
306 | show_cpu_freq {core_num} | |
307 | ||
308 | Set the current frequency for the specified core by scaling up/down/min/max: | |
309 | ||
310 | .. code-block:: console | |
311 | ||
312 | set_cpu_freq {core_num} up|down|min|max | |
313 | ||
314 | Compiling and Running the Guest Applications | |
315 | -------------------------------------------- | |
316 | ||
317 | For compiling and running l3fwd-power, see :doc:`l3_forward_power_man`. | |
318 | ||
319 | A guest CLI is also provided for validating the setup. | |
320 | ||
321 | For both l3fwd-power and guest CLI, the channels for the VM must be monitored by the | |
322 | host application using the *add_channels* command on the host. | |
323 | ||
324 | Compiling | |
325 | ~~~~~~~~~ | |
326 | ||
327 | #. export RTE_SDK=/path/to/rte_sdk | |
328 | #. cd ${RTE_SDK}/examples/vm_power_manager/guest_cli | |
329 | #. make | |
330 | ||
331 | Running | |
332 | ~~~~~~~ | |
333 | ||
334 | The application does not have any specific command line options other than *EAL*: | |
335 | ||
336 | .. code-block:: console | |
337 | ||
338 | ./build/guest_vm_power_mgr [EAL options] | |
339 | ||
340 | For example purposes, the application uses a channel for each lcore enabled. | |
341 | For example, to run on cores 0,1,2,3 on a system with 4 memory channels: | |
342 | ||
343 | .. code-block:: console | |
344 | ||
345 | ./build/guest_vm_power_mgr -c 0xf -n 4 | |
346 | ||
347 | ||
348 | After successful initialization the user is presented with the VM Power Manager Guest CLI: | |
349 | ||
350 | .. code-block:: console | |
351 | ||
352 | vm_power(guest)> | |
353 | ||
354 | To change the frequency of an lcore, use the set_cpu_freq command, | |
355 | where {core_num} is the lcore and channel whose frequency is changed by scaling up/down/min/max: | |
356 | ||
357 | .. code-block:: console | |
358 | ||
359 | set_cpu_freq {core_num} up|down|min|max |