Commit | Line | Data |
---|---|---|
a093bf00 PWJ |
1 | |
2 | HOWTO for multiqueue network device support | |
3 | =========================================== | |
4 | ||
5 | Section 1: Base driver requirements for implementing multiqueue support | |
6 | Section 2: Qdisc support for multiqueue devices | |
7 | Section 3: Brief howto using PRIO and RR for multiqueue devices | |
8 | ||
9 | ||
10 | Intro: Kernel support for multiqueue devices | |
11 | --------------------------------------------------------- | |
12 | ||
13 | Kernel support for multiqueue devices is only an API that is presented to the | |
14 | netdevice layer for base drivers to implement. This feature is part of the | |
15 | core networking stack, and all network devices will be running on the | |
16 | multiqueue-aware stack. If a base driver only has one queue, then these | |
17 | changes are transparent to that driver. | |
18 | ||
19 | ||
20 | Section 1: Base driver requirements for implementing multiqueue support | |
21 | ----------------------------------------------------------------------- | |
22 | ||
23 | Base drivers are required to use the new alloc_etherdev_mq() or | |
24 | alloc_netdev_mq() functions to allocate the subqueues for the device. The | |
25 | underlying kernel API will take care of the allocation and deallocation of | |
26 | the subqueue memory, as well as netdev configuration of where the queues | |
27 | exist in memory. | |
28 | ||
29 | The base driver will also need to manage the queues as it does the global | |
30 | netdev->queue_lock today. Therefore base drivers should use the | |
31 | netif_{start|stop|wake}_subqueue() functions to manage each queue while the | |
32 | device is still operational. netdev->queue_lock is still used when the device | |
33 | comes online or when it's completely shut down (unregister_netdev(), etc.). | |
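The per-queue flow control described above can be sketched as a small userspace model. This is not kernel code: every model_* name, the queue count and the ring size are hypothetical, and the comments only mark where a real driver would presumably call netif_stop_subqueue() and netif_wake_subqueue().

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NUM_QUEUES 4   /* hypothetical number of Tx queues */
#define MODEL_RING_SIZE  2   /* hypothetical descriptors per Tx ring */

/* Userspace stand-in for per-queue driver state; not a kernel structure. */
struct model_txq {
	int in_flight;   /* descriptors currently owned by the "hardware" */
	bool stopped;    /* models the stopped state of one subqueue */
};

static struct model_txq model_queues[MODEL_NUM_QUEUES];

/*
 * Models the transmit path for queue q. Returns false if the queue is
 * stopped. Otherwise it accepts one packet and stops only this queue
 * once its ring is full; the other queues are unaffected.
 */
static bool model_xmit(int q)
{
	struct model_txq *txq;

	assert(q >= 0 && q < MODEL_NUM_QUEUES);
	txq = &model_queues[q];

	if (txq->stopped)
		return false;
	txq->in_flight++;
	if (txq->in_flight == MODEL_RING_SIZE)
		txq->stopped = true;    /* a driver would stop the subqueue here */
	return true;
}

/* Models Tx completion for queue q: frees a descriptor and wakes the queue. */
static void model_tx_complete(int q)
{
	struct model_txq *txq;

	assert(q >= 0 && q < MODEL_NUM_QUEUES);
	txq = &model_queues[q];

	if (txq->in_flight > 0)
		txq->in_flight--;
	if (txq->stopped && txq->in_flight < MODEL_RING_SIZE)
		txq->stopped = false;   /* a driver would wake the subqueue here */
}
```

In this model, filling the ring of queue 1 leaves queue 0 able to transmit, which is the point of managing each subqueue separately rather than stopping the whole device.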
34 | ||
35 | Finally, the base driver should indicate that it is a multiqueue device. The | |
36 | feature flag NETIF_F_MULTI_QUEUE should be added to the netdev->features | |
37 | bitmap on device initialization. Below is an example from e1000: | |
38 | ||
39 | #ifdef CONFIG_E1000_MQ | |
40 | if ( (adapter->hw.mac.type == e1000_82571) || | |
41 | (adapter->hw.mac.type == e1000_82572) || | |
42 | (adapter->hw.mac.type == e1000_80003es2lan)) | |
43 | netdev->features |= NETIF_F_MULTI_QUEUE; | |
44 | #endif | |
45 | ||
46 | ||
47 | Section 2: Qdisc support for multiqueue devices | |
48 | ----------------------------------------------- | |
49 | ||
50 | Currently two qdiscs support multiqueue devices: a new round-robin qdisc, | |
51 | sch_rr, and sch_prio. The qdisc is responsible for classifying the skb's to | |
52 | bands and queues, and will store the queue mapping into skb->queue_mapping. | |
53 | Use this field in the base driver to determine which queue to send the skb | |
54 | to. | |
55 | ||
56 | sch_rr has been added for hardware that doesn't want scheduling policies from | |
57 | software, so it's a straight round-robin qdisc. It uses the same syntax and | |
58 | classification priomap that sch_prio uses, so it should be intuitive to | |
59 | configure for people who've used sch_prio. | |
60 | ||
fdd8a532 PWJ |
61 | In order to utilize the multiqueue features of the qdiscs, the network |
62 | device layer needs to enable multiple queue support. This can be done by | |
63 | selecting NETDEVICES_MULTIQUEUE under Drivers. | |
64 | ||
65 | The PRIO qdisc naturally plugs into a multiqueue device. If | |
66 | NETDEVICES_MULTIQUEUE is selected, then on qdisc load, the number of | |
67 | bands requested is compared to the number of queues on the hardware. If they | |
a093bf00 PWJ |
68 | are equal, it sets a one-to-one mapping up between the queues and bands. If |
69 | they're not equal, it will not load the qdisc. RR behaves the same | |
70 | way. Once the association is made, any skb that is classified will have | |
71 | skb->queue_mapping set, which will allow the driver to properly queue skb's | |
72 | to multiple queues. | |
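The load-time check described above can be illustrated with a minimal userspace sketch. It is not the sch_prio or sch_rr implementation: model_map_bands() and its parameters are hypothetical names that only mirror the rule stated in the text (equal counts give a one-to-one mapping, unequal counts refuse to load).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of the qdisc load-time check.
 * bands:          number of bands requested from userspace
 * num_queues:     number of Tx queues the device reports
 * band_to_queue:  output table, filled with the one-to-one mapping
 * len:            capacity of band_to_queue
 *
 * Returns 0 on success, -1 if the qdisc would refuse to load.
 */
static int model_map_bands(int bands, int num_queues,
			   int band_to_queue[], size_t len)
{
	assert(band_to_queue != NULL);

	if (bands <= 0 || (size_t)bands > len)
		return -1;
	if (bands != num_queues)
		return -1;              /* counts differ: do not load */

	for (int i = 0; i < bands; i++)
		band_to_queue[i] = i;   /* band i feeds queue i */
	return 0;
}
```

With 4 bands and 4 queues the table becomes {0, 1, 2, 3}; with 3 bands and 4 queues the function returns -1 and leaves the table untouched.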
73 | ||
74 | ||
75 | Section 3: Brief howto using PRIO and RR for multiqueue devices | |
76 | --------------------------------------------------------------- | |
77 | ||
78 | The userspace command 'tc,' part of the iproute2 package, is used to configure | |
79 | qdiscs. To add the PRIO qdisc to your network device, assuming the device is | |
80 | called eth0, run the following command: | |
81 | ||
82 | # tc qdisc add dev eth0 root handle 1: prio bands 4 multiqueue | |
83 | ||
84 | This will create 4 bands, 0 being highest priority, and associate those bands | |
85 | to the queues on your NIC. Assuming eth0 has 4 Tx queues, the band mapping | |
86 | would look like: | |
87 | ||
88 | band 0 => queue 0 | |
89 | band 1 => queue 1 | |
90 | band 2 => queue 2 | |
91 | band 3 => queue 3 | |
92 | ||
93 | Traffic will begin flowing through each queue if your TOS values are assigning | |
94 | traffic across the various bands. For example, ssh traffic will always try to | |
95 | go out band 0 based on TOS -> Linux priority conversion (realtime traffic), | |
96 | so it will be sent out queue 0. ICMP traffic (pings) falls into the "normal" | |
97 | traffic classification, which is band 1. Therefore pings will be sent out | |
98 | queue 1 on the NIC. | |
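The priority-to-band step in the example above can be sketched in userspace as follows. The priomap values are assumed to match the default priomap documented for the prio qdisc, and the two MODEL_PRIO_* values are assumed to mirror the kernel's interactive and best-effort priority constants; check both against your kernel and iproute2 versions. All model_* names are hypothetical.

```c
#include <assert.h>

/* Assumed to mirror the kernel's priority values; verify locally. */
#define MODEL_PRIO_BESTEFFORT  0    /* "normal" traffic, e.g. pings */
#define MODEL_PRIO_INTERACTIVE 6    /* interactive traffic, e.g. ssh */
#define MODEL_PRIO_MAX         15

/* Assumed default priomap: index is Linux priority, value is band. */
static const int model_priomap[MODEL_PRIO_MAX + 1] = {
	1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1
};

/* Looks up the band for a Linux priority, masking to the table size. */
static int model_band_for_priority(int priority)
{
	assert(priority >= 0);
	return model_priomap[priority & MODEL_PRIO_MAX];
}

/*
 * With the one-to-one band mapping shown above, the queue index
 * equals the band index.
 */
static int model_queue_for_priority(int priority)
{
	return model_band_for_priority(priority);
}
```

Under these assumptions, interactive priority maps to band 0 and queue 0, and best-effort priority maps to band 1 and queue 1, which matches the ssh and ping examples in the text.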
99 | ||
100 | Note the use of the multiqueue keyword. It is only available in versions of | |
101 | iproute2 that support multiqueue networking devices; if it is omitted when | |
102 | loading a qdisc onto a multiqueue device, the qdisc will load and operate the | |
103 | same as if it were loaded onto a single-queue device (i.e., it sends all traffic to | |
104 | queue 0). | |
105 | ||
106 | An alternative way to allocate bands is to use the multiqueue option and | |
107 | specify 0 bands. In that case, the qdisc will | |
108 | allocate the number of bands to equal the number of queues that the device | |
109 | reports, and bring the qdisc online. | |
110 | ||
111 | The behavior of tc filters remains the same: they override TOS priority | |
112 | classification. | |
113 | ||
114 | ||
115 | Author: Peter P. Waskiewicz Jr. <peter.p.waskiewicz.jr@intel.com> |