]>
Commit | Line | Data |
---|---|---|
985794ad SH |
1 | .TH CBQ 8 "16 December 2001" "iproute2" "Linux" |
2 | .SH NAME | |
3 | CBQ \- Class Based Queueing | |
4 | .SH SYNOPSIS | |
5 | .B tc qdisc ... dev | |
6 | dev | |
7 | .B ( parent | |
8 | classid | |
9 | .B | root) [ handle | |
10 | major: | |
11 | .B ] cbq [ allot | |
12 | bytes | |
13 | .B ] avpkt | |
14 | bytes | |
15 | .B bandwidth | |
16 | rate | |
17 | .B [ cell | |
18 | bytes | |
19 | .B ] [ ewma | |
20 | log | |
21 | .B ] [ mpu | |
22 | bytes | |
23 | .B ] | |
24 | ||
25 | .B tc class ... dev | |
26 | dev | |
27 | .B parent | |
28 | major:[minor] | |
29 | .B [ classid | |
30 | major:minor | |
31 | .B ] cbq allot | |
32 | bytes | |
33 | .B [ bandwidth | |
34 | rate | |
35 | .B ] [ rate | |
36 | rate | |
37 | .B ] prio | |
38 | priority | |
39 | .B [ weight | |
40 | weight | |
41 | .B ] [ minburst | |
42 | packets | |
43 | .B ] [ maxburst | |
44 | packets | |
45 | .B ] [ ewma | |
46 | log | |
47 | .B ] [ cell | |
48 | bytes | |
49 | .B ] avpkt | |
50 | bytes | |
51 | .B [ mpu | |
52 | bytes | |
53 | .B ] [ bounded isolated ] [ split | |
54 | handle | |
55 | .B & defmap | |
56 | defmap | |
57 | .B ] [ estimator | |
58 | interval timeconstant | |
59 | .B ] | |
60 | ||
61 | .SH DESCRIPTION | |
62 | Class Based Queueing is a classful qdisc that implements a rich | |
63 | linksharing hierarchy of classes. It contains shaping elements as | |
64 | well as prioritizing capabilities. Shaping is performed using link | |
65 | idle time calculations based on the timing of dequeue events and | |
66 | underlying link bandwidth. | |
67 | ||
68 | .SH SHAPING ALGORITHM | |
69 | When shaping a 10mbit/s connection to 1mbit/s, the link will | |
70 | be idle 90% of the time. If it isn't, it needs to be throttled so that it | |
71 | IS idle 90% of the time. | |
72 | ||
73 | During operations, the effective idletime is measured using an | |
74 | exponential weighted moving average (EWMA), which considers recent | |
75 | packets to be exponentially more important than past ones. The Unix | |
76 | loadaverage is calculated in the same way. | |
77 | ||
78 | The calculated idle time is subtracted from the EWMA measured one, | |
79 | the resulting number is called 'avgidle'. A perfectly loaded link has | |
80 | an avgidle of zero: packets arrive exactly at the calculated | |
81 | interval. | |
82 | ||
83 | An overloaded link has a negative avgidle and if it gets too negative, | |
84 | CBQ throttles and is then 'overlimit'. | |
85 | ||
86 | Conversely, an idle link might amass a huge avgidle, which would then | |
87 | allow infinite bandwidths after a few hours of silence. To prevent | |
88 | this, avgidle is capped at | |
89 | .B maxidle. | |
90 | ||
91 | If overlimit, in theory, the CBQ could throttle itself for exactly the | |
92 | amount of time that was calculated to pass between packets, and then | |
93 | pass one packet, and throttle again. Due to timer resolution constraints, | |
94 | this may not be feasible, see the | |
95 | .B minburst | |
96 | parameter below. | |
97 | ||
98 | .SH CLASSIFICATION | |
99 | Within the one CBQ instance many classes may exist. Each of these classes | |
100 | contains another qdisc, by default | |
101 | .BR tc-pfifo (8). | |
102 | ||
103 | When enqueueing a packet, CBQ starts at the root and uses various methods to | |
104 | determine which class should receive the data. | |
105 | ||
106 | In the absence of uncommon configuration options, the process is rather easy. | |
107 | At each node we look for an instruction, and then go to the class the | |
108 | instruction refers us to. If the class found is a barren leaf-node (without | |
109 | children), we enqueue the packet there. If it is not yet a leaf node, we do | |
110 | the whole thing over again starting from that node. | |
111 | ||
112 | The following actions are performed, in order at each node we visit, until one | |
113 | sends us to another node, or terminates the process. | |
114 | .TP | |
115 | (i) | |
116 | Consult filters attached to the class. If sent to a leafnode, we are done. | |
117 | Otherwise, restart. | |
118 | .TP | |
119 | (ii) | |
120 | Consult the defmap for the priority assigned to this packet, which depends | |
121 | on the TOS bits. Check if the referral is leafless, otherwise restart. | |
122 | .TP | |
123 | (iii) | |
124 | Ask the defmap for instructions for the 'best effort' priority. Check the | |
125 | answer for leafness, otherwise restart. | |
126 | .TP | |
127 | (iv) | |
128 | If none of the above returned with an instruction, enqueue at this node. | |
129 | .P | |
130 | This algorithm makes sure that a packet always ends up somewhere, even while | |
131 | you are busy building your configuration. | |
132 | ||
133 | For more details, see | |
134 | .BR tc-cbq-details(8). | |
135 | ||
136 | .SH LINK SHARING ALGORITHM | |
137 | When dequeuing for sending to the network device, CBQ decides which of its | |
138 | classes will be allowed to send. It does so with a Weighted Round Robin process | |
139 | in which each class with packets gets a chance to send in turn. The WRR process | |
140 | starts by asking the highest priority classes (lowest numerically - | |
141 | highest semantically) for packets, and will continue to do so until they | |
142 | have no more data to offer, in which case the process repeats for lower | |
143 | priorities. | |
144 | ||
145 | Classes by default borrow bandwidth from their siblings. A class can be | |
146 | prevented from doing so by declaring it 'bounded'. A class can also indicate | |
147 | its unwillingness to lend out bandwidth by being 'isolated'. | |
148 | ||
149 | .SH QDISC | |
150 | The root of a CBQ qdisc class tree has the following parameters: | |
151 | ||
152 | .TP | |
153 | parent major:minor | root | |
154 | This mandatory parameter determines the place of the CBQ instance, either at the | |
155 | .B root | |
156 | of an interface or within an existing class. | |
157 | .TP | |
158 | handle major: | |
159 | Like all other qdiscs, the CBQ can be assigned a handle. Should consist only | |
160 | of a major number, followed by a colon. Optional, but very useful if classes | |
161 | will be generated within this qdisc. | |
162 | .TP | |
163 | allot bytes | |
164 | This allotment is the 'chunkiness' of link sharing and is used for determining packet | |
165 | transmission time tables. The qdisc allot differs slightly from the class allot discussed | |
166 | below. Optional. Defaults to a reasonable value, related to avpkt. | |
167 | .TP | |
168 | avpkt bytes | |
169 | The average size of a packet is needed for calculating maxidle, and is also used | |
170 | for making sure 'allot' has a safe value. Mandatory. | |
171 | .TP | |
172 | bandwidth rate | |
173 | To determine the idle time, CBQ must know the bandwidth of your underlying | |
174 | physical interface, or parent qdisc. This is a vital parameter, more about it | |
175 | later. Mandatory. | |
176 | .TP | |
177 | cell | |
178 | The cell size determines he granularity of packet transmission time calculations. Has a sensible default. | |
179 | .TP | |
180 | mpu | |
181 | A zero sized packet may still take time to transmit. This value is the lower | |
182 | cap for packet transmission time calculations - packets smaller than this value | |
183 | are still deemed to have this size. Defaults to zero. | |
184 | .TP | |
185 | ewma log | |
186 | When CBQ needs to measure the average idle time, it does so using an | |
6274b0b7 | 187 | Exponentially Weighted Moving Average which smooths out measurements into |
985794ad SH |
188 | a moving average. The EWMA LOG determines how much smoothing occurs. Lower |
189 | values imply greater sensitivity. Must be between 0 and 31. Defaults | |
190 | to 5. | |
191 | .P | |
192 | A CBQ qdisc does not shape out of its own accord. It only needs to know certain | |
193 | parameters about the underlying link. Actual shaping is done in classes. | |
194 | ||
195 | .SH CLASSES | |
196 | Classes have a host of parameters to configure their operation. | |
197 | ||
198 | .TP | |
199 | parent major:minor | |
200 | Place of this class within the hierarchy. If attached directly to a qdisc | |
201 | and not to another class, minor can be omitted. Mandatory. | |
202 | .TP | |
203 | classid major:minor | |
204 | Like qdiscs, classes can be named. The major number must be equal to the | |
205 | major number of the qdisc to which it belongs. Optional, but needed if this | |
206 | class is going to have children. | |
207 | .TP | |
208 | weight weight | |
209 | When dequeuing to the interface, classes are tried for traffic in a | |
210 | round-robin fashion. Classes with a higher configured qdisc will generally | |
211 | have more traffic to offer during each round, so it makes sense to allow | |
212 | it to dequeue more traffic. All weights under a class are normalized, so | |
213 | only the ratios matter. Defaults to the configured rate, unless the priority | |
214 | of this class is maximal, in which case it is set to 1. | |
215 | .TP | |
216 | allot bytes | |
217 | Allot specifies how many bytes a qdisc can dequeue | |
218 | during each round of the process. This parameter is weighted using the | |
219 | renormalized class weight described above. Silently capped at a minimum of | |
220 | 3/2 avpkt. Mandatory. | |
221 | ||
222 | .TP | |
223 | prio priority | |
224 | In the round-robin process, classes with the lowest priority field are tried | |
225 | for packets first. Mandatory. | |
226 | ||
227 | .TP | |
228 | avpkt | |
229 | See the QDISC section. | |
230 | ||
231 | .TP | |
232 | rate rate | |
233 | Maximum rate this class and all its children combined can send at. Mandatory. | |
234 | ||
235 | .TP | |
236 | bandwidth rate | |
237 | This is different from the bandwidth specified when creating a CBQ disc! Only | |
238 | used to determine maxidle and offtime, which are only calculated when | |
239 | specifying maxburst or minburst. Mandatory if specifying maxburst or minburst. | |
240 | ||
241 | .TP | |
242 | maxburst | |
243 | This number of packets is used to calculate maxidle so that when | |
244 | avgidle is at maxidle, this number of average packets can be burst | |
245 | before avgidle drops to 0. Set it higher to be more tolerant of | |
246 | bursts. You can't set maxidle directly, only via this parameter. | |
247 | ||
248 | .TP | |
249 | minburst | |
250 | As mentioned before, CBQ needs to throttle in case of | |
251 | overlimit. The ideal solution is to do so for exactly the calculated | |
252 | idle time, and pass 1 packet. However, Unix kernels generally have a | |
253 | hard time scheduling events shorter than 10ms, so it is better to | |
254 | throttle for a longer period, and then pass minburst packets in one | |
255 | go, and then sleep minburst times longer. | |
256 | ||
257 | The time to wait is called the offtime. Higher values of minburst lead | |
258 | to more accurate shaping in the long term, but to bigger bursts at | |
259 | millisecond timescales. Optional. | |
260 | ||
261 | .TP | |
262 | minidle | |
263 | If avgidle is below 0, we are overlimits and need to wait until | |
264 | avgidle will be big enough to send one packet. To prevent a sudden | |
265 | burst from shutting down the link for a prolonged period of time, | |
266 | avgidle is reset to minidle if it gets too low. | |
267 | ||
268 | Minidle is specified in negative microseconds, so 10 means that | |
269 | avgidle is capped at -10us. Optional. | |
270 | ||
271 | .TP | |
272 | bounded | |
273 | Signifies that this class will not borrow bandwidth from its siblings. | |
274 | .TP | |
275 | isolated | |
276 | Means that this class will not borrow bandwidth to its siblings | |
277 | ||
278 | .TP | |
279 | split major:minor & defmap bitmap[/bitmap] | |
280 | If consulting filters attached to a class did not give a verdict, | |
281 | CBQ can also classify based on the packet's priority. There are 16 | |
282 | priorities available, numbered from 0 to 15. | |
283 | ||
284 | The defmap specifies which priorities this class wants to receive, | |
285 | specified as a bitmap. The Least Significant Bit corresponds to priority | |
286 | zero. The | |
287 | .B split | |
288 | parameter tells CBQ at which class the decision must be made, which should | |
289 | be a (grand)parent of the class you are adding. | |
290 | ||
291 | As an example, 'tc class add ... classid 10:1 cbq .. split 10:0 defmap c0' | |
292 | configures class 10:0 to send packets with priorities 6 and 7 to 10:1. | |
293 | ||
294 | The complimentary configuration would then | |
295 | be: 'tc class add ... classid 10:2 cbq ... split 10:0 defmap 3f' | |
296 | Which would send all packets 0, 1, 2, 3, 4 and 5 to 10:1. | |
297 | .TP | |
298 | estimator interval timeconstant | |
299 | CBQ can measure how much bandwidth each class is using, which tc filters | |
300 | can use to classify packets with. In order to determine the bandwidth | |
301 | it uses a very simple estimator that measures once every | |
302 | .B interval | |
303 | microseconds how much traffic has passed. This again is a EWMA, for which | |
304 | the time constant can be specified, also in microseconds. The | |
305 | .B time constant | |
306 | corresponds to the sluggishness of the measurement or, conversely, to the | |
307 | sensitivity of the average to short bursts. Higher values mean less | |
308 | sensitivity. | |
309 | ||
310 | .SH BUGS | |
311 | The actual bandwidth of the underlying link may not be known, for example | |
312 | in the case of PPoE or PPTP connections which in fact may send over a | |
313 | pipe, instead of over a physical device. CBQ is quite resilient to major | |
314 | errors in the configured bandwidth, probably a the cost of coarser shaping. | |
315 | ||
316 | Default kernels rely on coarse timing information for making decisions. These | |
317 | may make shaping precise in the long term, but inaccurate on second long scales. | |
318 | ||
319 | See | |
320 | .BR tc-cbq-details(8) | |
321 | for hints on how to improve this. | |
322 | ||
323 | .SH SOURCES | |
324 | .TP | |
325 | o | |
326 | Sally Floyd and Van Jacobson, "Link-sharing and Resource | |
327 | Management Models for Packet Networks", | |
328 | IEEE/ACM Transactions on Networking, Vol.3, No.4, 1995 | |
329 | ||
330 | .TP | |
331 | o | |
332 | Sally Floyd, "Notes on CBQ and Guaranteed Service", 1995 | |
333 | ||
334 | .TP | |
335 | o | |
336 | Sally Floyd, "Notes on Class-Based Queueing: Setting | |
337 | Parameters", 1996 | |
338 | ||
339 | .TP | |
340 | o | |
341 | Sally Floyd and Michael Speer, "Experimental Results | |
342 | for Class-Based Queueing", 1998, not published. | |
343 | ||
344 | ||
345 | ||
346 | .SH SEE ALSO | |
347 | .BR tc (8) | |
348 | ||
349 | .SH AUTHOR | |
350 | Alexey N. Kuznetsov, <kuznet@ms2.inr.ac.ru>. This manpage maintained by | |
351 | bert hubert <ahu@ds9a.nl> | |
352 | ||
353 |