<html><head><title>HTB manual - user guide</title></head>
<body>
<h1><center>HTB Linux queuing discipline manual - user guide</center></h1>
<center><address>
Martin Devera aka devik (devik@cdi.cz)<br>
Manual: devik and Don Cohen<br>
Last updated: 5.5.2002
</address></center>
<br>
New text is shown in red. The coloring is removed from new text
after 3 months. Currently it marks the HTB3 changes.<p>
<p>
<ul>
<li><a href=#intro>1. Introduction</a>
<li><a href=#sharing>2. Link sharing</a>
<li><a href=#hsharing>3. Sharing hierarchy</a>
<li><a href=#ceiling>4. Rate ceiling</a>
<li><a href=#burst>5. Burst</a>
<li><a href=#prio>6. Prioritizing bandwidth share</a>
<li><a href=#stats>7. Understanding statistics</a>
<li><a href=#err>8. Making, debugging and sending error reports</a>
</ul>
<a name=intro><h2>1. Introduction</h2>

HTB is meant as a more understandable, intuitive and faster replacement for the
CBQ qdisc in Linux. Both CBQ and HTB help you to control the
use of the outbound bandwidth on a given link. Both allow you to use
one physical link to simulate several slower links and to send different
kinds of traffic on different simulated links. In both cases, you have
to specify how to divide the physical link into simulated links and how
to decide which simulated link to use for a given packet to be sent.
<p>
This document shows you how to use HTB.
Most sections have examples, charts (with measured data) and
discussion of particular problems.
<p>
This release of HTB should also be much more scalable. See the
comparison on the HTB home page.
<p>
<b>Please read:</b> the tc tool (not only HTB) uses shortcuts to denote units
of rate. <b>kbps</b> means kilo<b>bytes</b> per second and <b>kbit</b> means
<b>kilobits</b> per second! This is the most frequently asked question about tc in Linux.
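<p>
As a quick sanity check, here is a minimal sketch of the conversion. It assumes only that a byte is eight bits; the helper name <b>kbps_to_kbit</b> is made up for this guide and is not part of tc.

```shell
# Convert a tc "kbps" value (kilobytes per second) to "kbit" (kilobits per second).
# kbps_to_kbit is a hypothetical helper for illustration, not a tc command.
kbps_to_kbit() {
    echo $(( $1 * 8 ))
}

kbps_to_kbit 100   # "rate 100kbps" means the same as "rate 800kbit"
kbps_to_kbit 40    # "rate 40kbps" means the same as "rate 320kbit"
```

<p>
So if your link is sold as "100 kilobits", writing <b>rate 100kbps</b> asks for eight times as much as you probably meant.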
<p>

<a name=sharing><h2>2. Link sharing</h2>
<img src=Ag2Leaf3flat.gif align=right>

<i>Problem: We have two customers, A and B, both connected to the
internet via eth0. We want to allocate 60 kbps to B and 40 kbps to A.
Next we want to subdivide A's bandwidth into 30 kbps for WWW and 10 kbps
for everything else. Any unused bandwidth can be used by any class
which needs it (in proportion to its allocated share).</i>
<p>
HTB ensures that <b> the amount of service provided to each class is
at least the minimum of the amount it requests and the amount assigned
to it</b>. When a class requests less than the amount assigned, the
remaining (excess) bandwidth is distributed to other classes which request
service.<p>
Also see the document about HTB internals; it
describes the goal above in greater detail.
<p>
<i>Note: In the literature this is called "borrowing" the excess bandwidth.
We use that term below to conform with the literature. We mention, however,
that this seems like a bad term since there is no obligation to repay the
resource that was "borrowed".
</i>
<p>
The different kinds of traffic above are represented by classes in
HTB. The simplest approach is shown in the picture at the right.
<br>
Let's see what commands to use:
<pre>
tc qdisc add dev eth0 root handle 1: htb default 12
</pre>
This command attaches the HTB queuing discipline to eth0 and gives it the
"handle" <b>1:</b>.
This is just a name or identifier with which to refer to it below.
The <b>default 12</b>
means that any traffic that is not otherwise classified will be assigned
to class 1:12.
<p>
<i>Note:
In general (not just for HTB but for all qdiscs and classes in tc),
handles are written x:y where x is an integer identifying a qdisc and
y is an integer identifying a class belonging to that qdisc. The handle
for a qdisc must have zero for its y value and the handle for a class
must have a non-zero value for its y value. The "1:" above is treated
as "1:0".
</i>
<p>
<pre>
tc class add dev eth0 parent 1: classid 1:1 htb rate 100kbps ceil 100kbps
tc class add dev eth0 parent 1:1 classid 1:10 htb rate 30kbps ceil 100kbps
tc class add dev eth0 parent 1:1 classid 1:11 htb rate 10kbps ceil 100kbps
tc class add dev eth0 parent 1:1 classid 1:12 htb rate 60kbps ceil 100kbps
</pre>
<p>
The first line creates a "root" class, 1:1, under the qdisc 1:.
The definition of a root class is one with the htb qdisc as its parent.
A root class, like other classes under an htb qdisc, allows its children
to borrow from each other, but one root class cannot borrow from another.
We could have created the other three classes directly under the htb qdisc,
but then the excess bandwidth from one would not be available to the others.
In this case we do want to allow borrowing, so we have to create an extra
class to serve as the root and put the classes that will carry the real data
under that. These are defined by the next three lines.
The <b>ceil</b> parameter is described below.
<p><i>Note: Sometimes people ask me why they have to repeat <b>dev eth0</b>
when they have already used <b>handle</b> or <b>parent</b>. The reason
is that handles are local to an interface, e.g., eth0 and eth1 could each
have classes with handle 1:1.</i>
<p>
We also have to describe which packets belong in which class.
This is really not related to the HTB qdisc. See the tc filter
documentation for details. The commands will look something like this:
<pre>
tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 \
   match ip src 1.2.3.4 match ip dport 80 0xffff flowid 1:10
tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 \
   match ip src 1.2.3.4 flowid 1:11
</pre>
(We identify A by its IP address, which we imagine here to be 1.2.3.4.)
<p><i>Note: The U32 classifier has an undocumented design bug which causes
duplicate entries to be listed by "tc filter show" when you use U32
classifiers with different prio values.</i>
<img src=flatnp.gif align=right>
<p>
You may notice that we didn't create a filter for the 1:12 class.
It might be clearer to do so, but this illustrates the use of the default.
Any packet not classified by the two rules above (any packet
not from source address 1.2.3.4) will be put in class 1:12.
<p>
Now we can optionally attach queuing disciplines to the leaf classes.
If none is specified the default is pfifo.
<pre>
tc qdisc add dev eth0 parent 1:10 handle 20: pfifo limit 5
tc qdisc add dev eth0 parent 1:11 handle 30: pfifo limit 5
tc qdisc add dev eth0 parent 1:12 handle 40: sfq perturb 10
</pre>
That's all the commands we need. Let's see what happens if we send
packets of each class at 90kbps and then stop sending packets of one
class at a time. Along the bottom of the graph are annotations
like "0:90k". The horizontal position at the center of the label
(in this case near the 9, also marked with a red "1") indicates the
time at which the rate of some traffic class changes.
Before the colon is an identifier for
the class (0 for class 1:10, 1 for class 1:11, 2 for class 1:12) and
after the colon is the new rate starting at the time where the
annotation appears. For example, the rate of class 0 is changed to
90k at time 0, 0 (= 0k) at time 3, and back to 90k at time 6.
<p>
Initially all classes generate 90 kbps. Since this is higher than any
of the rates specified, each class is limited to its
specified rate. At time 3 when we stop sending class 0 packets, the
rate allocated to class 0 is reallocated to the other two
classes in proportion to their allocations, 1 part class 1 to 6 parts class 2.
(The increase in class 1 is hard to see because it's only 4 kbps.)
Similarly at time 9 when class 1 traffic stops its bandwidth is
reallocated to the other two (and the increase in class 0 is similarly hard
to see.) At time 15 it's easier to see that the allocation to class 2 is
divided 3 parts for class 0 to 1 part for class 1. At time 18 both class 1 and
class 2 stop so class 0 gets all 90 kbps it requests.
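<p>
The proportions above can be checked with a little arithmetic. The following is only a sketch: it assumes the excess is split exactly in proportion to the configured rates (HTB really shares by quantum, which follows the rate by default), it rounds down to whole kbps, and the helper name <b>excess_share</b> is invented for this example.

```shell
# Split excess bandwidth among competing classes in proportion to their rates.
# Usage: excess_share EXCESS MY_RATE OTHER_RATE...   (all in kbps, integers)
# excess_share is a hypothetical helper, not part of tc or HTB.
excess_share() {
    excess=$1; mine=$2; shift 2
    total=$mine
    for r in "$@"; do total=$(( total + r )); done
    echo $(( excess * mine / total ))
}

# Time 3: class 0 stops; its 30 kbps is shared by class 1 (10) and class 2 (60).
excess_share 30 10 60   # gain of class 1, about 4 kbps
excess_share 30 60 10   # gain of class 2, about 25 kbps

# Time 15: class 2 stops; its 60 kbps is shared by class 0 (30) and class 1 (10).
excess_share 60 30 10   # gain of class 0
excess_share 60 10 30   # gain of class 1
```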
<p>
This might be a good time to touch on the concept of <b>quantums</b>. When
several classes want to borrow bandwidth, each of them is given some number of
bytes before another competing class is served. This number is called the quantum.
You should see that if several classes are competing for the parent's bandwidth
then they get it in proportion to their quantums. It is important to know
that for precise operation quantums need to be as small as possible but
larger than the MTU.
<br>
Normally you don't need to specify quantums manually, as HTB chooses precomputed
values. It computes a class's quantum (when you add or change the class) as its
rate divided by the global <b>r2q</b> parameter. The default r2q is 10,
and because a typical MTU is 1500 the default is good for rates from
15 kBps (120 kbit) up. For smaller minimal rates specify r2q 1 when
creating the qdisc; that is good for rates from 12 kbit, which should be enough. If
you need to, you can specify the quantum manually when adding or changing
the class. This lets you avoid warnings in the log when the precomputed value would be
bad. When you specify quantum on the command line, r2q is ignored for
that class.
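<p>
To see where the 15 kBps and 12 kbit limits come from, here is a small sketch of the rule "quantum = rate / r2q". The rate is taken in bytes per second, the MTU of 1500 is the typical value mentioned above, and both helper names are made up for illustration.

```shell
# Quantum that would be precomputed for a class: rate (bytes per second) / r2q.
# Usage: htb_quantum RATE_BYTES_PER_SEC [R2Q]      (r2q defaults to 10)
# htb_quantum and quantum_ok are hypothetical helpers, not tc commands.
htb_quantum() {
    echo $(( $1 / ${2:-10} ))
}

# Succeeds when the precomputed quantum is at least the MTU.
# Usage: quantum_ok RATE_BYTES_PER_SEC [R2Q] [MTU]  (MTU defaults to 1500)
quantum_ok() {
    [ "$(htb_quantum "$1" "${2:-10}")" -ge "${3:-1500}" ]
}

htb_quantum 15000      # 15 kBps with the default r2q 10
htb_quantum 1500 1     # 12 kbit (1500 bytes per second) with r2q 1
quantum_ok 5000 || echo "5000 B/s needs a smaller r2q or an explicit quantum"
```

<p>
Both calls give 1500, exactly one typical MTU. The class listing in chapter 7 shows quantum 10240 for the 100kbps class, which fits the same formula if that tc counted 100kbps as 102400 bytes per second.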
<p>
This might seem like a good solution if A and B were not different
customers. However, if A is paying for 40kbps then he would probably
prefer his unused WWW bandwidth to go to his own other service rather
than to B. This requirement is represented in HTB by the class hierarchy.

<img src=Ag2Leaf3hier.gif align=right>
<a name=hsharing><h2>3. Sharing hierarchy</h2>
The problem from the previous chapter is solved by the class hierarchy
in this picture. Customer A is now explicitly represented by its own
class. Recall from above that
<b> the amount of service provided to each class is at least the
minimum of the amount it requests and the amount assigned to it</b>.
This applies to htb classes that are not parents of other htb classes.
We call these leaf classes.
For htb classes that are parents of other htb classes, which we call
interior classes, the rule is that
<b> the amount of service is at least the minimum of the amount assigned
to it and the sum of the amount requested by its children</b>.
In this case we assign 40kbps to customer A. That means that if A
requests less than the allocated rate for WWW, the excess will be used
for A's other traffic (if there is demand for it), at least until the sum is
40kbps.
<p>
<i>Notes: Packet classification rules can assign to inner nodes too. Then
you have to attach another filter list to the inner node. Eventually you should
reach a leaf or the special 1:0 class. The rate supplied for a parent should be the sum
of the rates of its children. </i>
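<p>
The last rule in the notes is easy to check before loading a configuration. This is just a sketch with an invented helper name (<b>rates_add_up</b>); all rates must be given in the same unit.

```shell
# Check that a parent's rate equals the sum of its children's rates.
# Usage: rates_add_up PARENT_RATE CHILD_RATE...   (same unit, e.g. kbps)
# rates_add_up is a hypothetical helper, not part of tc.
rates_add_up() {
    parent=$1; shift
    sum=0
    for r in "$@"; do sum=$(( sum + r )); done
    [ "$sum" -eq "$parent" ]
}

# Customer A (40kbps) with WWW (30kbps) and other traffic (10kbps):
rates_add_up 40 30 10 && echo "A: children add up to the parent rate"
# The whole link (100kbps) with A (40kbps) and B (60kbps):
rates_add_up 100 40 60 && echo "link: children add up to the parent rate"
```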
<p>The commands are now as follows:
<pre>
tc class add dev eth0 parent 1: classid 1:1 htb rate 100kbps ceil 100kbps
tc class add dev eth0 parent 1:1 classid 1:2 htb rate 40kbps ceil 100kbps
tc class add dev eth0 parent 1:2 classid 1:10 htb rate 30kbps ceil 100kbps
tc class add dev eth0 parent 1:2 classid 1:11 htb rate 10kbps ceil 100kbps
tc class add dev eth0 parent 1:1 classid 1:12 htb rate 60kbps ceil 100kbps
</pre>
<img src=hiernp.gif align=right>
<p>
We now turn to the graph showing the results of the hierarchical solution.
When A's WWW traffic stops, its assigned bandwidth is reallocated to A's
other traffic so that A's total bandwidth is still the assigned 40kbps.<br>
If A were to request less than 40kbps in total then the excess would be given to B.

<a name=ceiling><h2>4. Rate ceiling</h2>
The <b>ceil</b> argument specifies the maximum bandwidth that a class
can use. This limits how much bandwidth that class can borrow.
The default ceil is the same as the rate. (That's why we had to specify
it in the examples above to show borrowing.)
We now change the <b>ceil 100kbps</b> for classes 1:2 (A) and 1:11
(A's other) from the previous chapter to <b>ceil 60kbps</b> and
<b>ceil 20kbps</b>.
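<p>
The change could be made with <b>tc class change</b>, in the same form as the command used in chapter 6. The sketch below only prints the commands (a dry run) so that it is safe to try; the <b>TC</b> variable and the function name are made up for this example, and the class ids and rates are the ones from chapter 3.

```shell
# Dry run by default: print the tc commands instead of executing them.
# Run with TC=tc to apply them (needs root and the chapter 3 classes).
TC="${TC:-echo tc}"

# lower_ceilings is a hypothetical wrapper for the two changes described above.
lower_ceilings() {
    $TC class change dev eth0 parent 1:1 classid 1:2 htb rate 40kbps ceil 60kbps
    $TC class change dev eth0 parent 1:2 classid 1:11 htb rate 10kbps ceil 20kbps
}

lower_ceilings
```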
<p>
The graph at right differs from the previous one at time 3 (when WWW
traffic stops) because A/other is limited to 20kbps. Therefore customer
A gets only 20kbps in total and the unused 20kbps is allocated to B.<br>
The second difference is at time 15 when B stops. Without the ceil,
all of its bandwidth was given to A, but now A is only allowed to use
60kbps, so the remaining 40kbps goes unused.
<img src=hiernpceil.gif align=right>
<p>
This feature should be useful for ISPs because they probably want to
limit the amount of service a given customer gets even when other
customers are not requesting service. (ISPs probably want customers
to pay more money for better service.)
Note that root classes are not allowed to borrow, so there's really no
point in specifying a ceil for them.
<p>
<i>Notes: The ceil for a class should always be at least as high as the rate.
Also, the ceil for a class should always be at least as high as the ceil of
any of its children.</i>

<a name=burst><h2>5. Burst</h2>

Networking hardware can only send one packet at a time and only at
a hardware dependent rate. Link sharing software can only use this
ability to approximate the effects of multiple links running at
different (lower) speeds. Therefore the rate and ceil are not really
instantaneous measures but averages over the time that it takes to send
many packets. What really happens is that the traffic from one class
is sent a few packets at a time at the maximum speed and then other
classes are served for a while.

The <b>burst</b> and <b>cburst</b> parameters control the amount of data
that can be sent at the maximum (hardware) speed without trying to serve
another class.
<p>
If <b>cburst</b> is small (ideally one packet size), it shapes bursts so that they do not exceed
the <b>ceil</b> rate, in the same way as TBF's peakrate does.<p>
When you set <b>burst</b> for a parent class smaller than for some child,
you should expect the parent class to get stuck sometimes (because
the child will drain more than the parent can handle). HTB will remember these
negative bursts for up to 1 minute.
<p>
You may ask <b>why I want bursts</b>. Well, they are a cheap and simple way
to improve response times on a congested link. For example, www traffic
is bursty. You ask for a page, get it in a burst and then read it. During
that idle period the burst will "charge" again.
<p>
<i>Note: The burst and cburst of a class should always be at least
as high as that of any of its children.</i>
<p>
<img src=hiernpburst.gif align=right>
The graph shows the case from the previous chapter, where I changed burst
for the red and yellow (agency A) classes to 20kb while cburst kept its
default (circa 2 kb).<br>
The green hill at time 13 is due to the burst setting on the SMTP class.
The class has been underlimit since time 9 and has accumulated 20 kb of burst.
The hill rises up to 20 kbps (limited by ceil because its cburst is
near packet size).<br>
A clever reader may wonder why there is no red and yellow hill at time
7. It is because yellow is already at its ceil limit, so it has no space
for further bursts.<br>
There is at least one unwanted artifact: the magenta crater at time 4. It
is there because I intentionally "forgot" to add burst to the root link (1:1) class.
That class remembered the hill from time 1, and when the blue class wanted to
borrow yellow's rate at time 4, it denied the request and compensated itself.
<p>
<b>Limitation:</b> when you operate with high rates on a computer with a low
resolution timer you need some minimal <b>burst</b> and <b>cburst</b> to
be set for all classes. Timer resolution on i386 systems is 10ms and
1ms on Alphas.
The minimal burst can be computed as max_rate*timer_resolution. So
for 10Mbit on a plain i386 you need a burst of 12kb.<p>
If you set the burst too small, you will see a lower rate than the one you configured.
The latest tc tool computes and sets the smallest possible burst when none
is specified.
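<p>
The 12kb figure follows directly from max_rate*timer_resolution. A minimal sketch of that arithmetic, with the rate in bits per second and the timer resolution in milliseconds (the helper name <b>min_burst_bytes</b> is made up here):

```shell
# Minimal burst in bytes = rate (bits per second) * timer resolution (ms) / 1000 / 8.
# Usage: min_burst_bytes RATE_BITS_PER_SEC TIMER_MS
# min_burst_bytes is a hypothetical helper, not part of tc.
min_burst_bytes() {
    echo $(( $1 * $2 / 1000 / 8 ))
}

min_burst_bytes 10000000 10   # 10Mbit with a 10ms timer (i386): 12500 bytes, about 12kb
min_burst_bytes 10000000 1    # 10Mbit with a 1ms timer (Alpha): 1250 bytes
```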

<img src=hierprio.gif align=right>
<a name=prio><h2>6. Prioritizing bandwidth share</h2>
Prioritizing traffic has two sides. First, it affects how the excess bandwidth
is distributed among siblings. Up to now we have seen that excess bandwidth
was distributed according to rate ratios. Now I used the basic configuration from
chapter 3 (hierarchy without ceiling and bursts) and changed the priority of all
classes to 1 except SMTP (green), which I set to 0 (higher).<br>
From the sharing view you can see that this class got all the excess bandwidth. The
rule is that <b>classes with higher priority are offered excess bandwidth
first</b>. But the rules about guaranteed <b>rate</b> and <b>ceil</b> are still
met.<p>
There is also a second side to the problem: the total delay of a packet. It is relatively
hard to measure on ethernet, which is too fast (the delay is negligible). But
there is a simple remedy. We can add a simple HTB with one class rate limiting to
less than 100 kbps and add a second HTB (the one we are measuring) as its child. Then we
can simulate a slower link with larger delays.<br>
For simplicity's sake I use a simple two class scenario:
<pre>
# qdisc for delay simulation
tc qdisc add dev eth0 root handle 100: htb
tc class add dev eth0 parent 100: classid 100:1 htb rate 90kbps

# real measured qdisc
tc qdisc add dev eth0 parent 100:1 handle 1: htb
AC="tc class add dev eth0 parent"
# the root class must be 1:2, the parent that both leaf classes refer to
$AC 1: classid 1:2 htb rate 100kbps
$AC 1:2 classid 1:10 htb rate 50kbps ceil 100kbps prio 1
$AC 1:2 classid 1:11 htb rate 50kbps ceil 100kbps prio 1
tc qdisc add dev eth0 parent 1:10 handle 20: pfifo limit 2
tc qdisc add dev eth0 parent 1:11 handle 21: pfifo limit 2
</pre>
<img src=priotime.gif align=right>
<i>Note: HTB as a child of another HTB is NOT the same as a class under
another class within the same HTB. This is because when a class in HTB can send,
it will send as soon as the hardware equipment can. So the delay of an underlimit
class is limited only by the equipment and not by its ancestors.<br>
In the HTB under HTB case the outer HTB simulates new hardware equipment with
all its consequences (larger delay).</i>
<p>
The simulator is set to generate 50 kbps for both classes, and at time 3s it
executes the command:
<pre>
tc class change dev eth0 parent 1:2 classid 1:10 htb \
   rate 50kbps ceil 100kbps burst 2k prio 0
</pre>
As you can see, the delay of the WWW class dropped nearly to zero while
SMTP's delay increased. When you prioritize one class to get better delay, it always
makes the delays of the other classes worse.<br>
Later (time 7s) the simulator starts to generate WWW at 60 kbps and SMTP at 40 kbps.
There you can observe another interesting behaviour. When a class (WWW) is
overlimit, HTB prioritizes the underlimit part of the bandwidth first.<p>
<b>What class should you prioritize?</b> Generally those classes where
you really need low delays. Examples are video or audio
traffic (and you will really need to use the correct <b>rate</b> here
to prevent this traffic from killing the other classes) or interactive (telnet, SSH)
traffic, which is bursty in nature and will not negatively affect
other flows.<br>
A common trick is to prioritize ICMP to get nice ping delays even on fully
utilized links (but from a technical point of view this is not what you want when
measuring connectivity).

<a name=stats><h2>7. Understanding statistics</h2>
The <b>tc</b> tool allows you to gather statistics of queuing disciplines in Linux.
Unfortunately the statistics are not explained by their authors, so you often can't
use them. Here I try to help you to understand HTB's stats.<br>
First the stats for the whole HTB. The snippet below was taken during the simulation from chapter 3.
<pre>
# tc -s -d qdisc show dev eth0
 qdisc pfifo 22: limit 5p
 Sent 0 bytes 0 pkts (dropped 0, overlimits 0)

 qdisc pfifo 21: limit 5p
 Sent 2891500 bytes 5783 pkts (dropped 820, overlimits 0)

 qdisc pfifo 20: limit 5p
 Sent 1760000 bytes 3520 pkts (dropped 3320, overlimits 0)

 qdisc htb 1: r2q 10 default 1 direct_packets_stat 0
 Sent 4651500 bytes 9303 pkts (dropped 4140, overlimits 34251)
</pre>
The first three disciplines are HTB's children. Let's ignore them, as PFIFO
stats are self explanatory.<br>
<i>overlimits</i> tells you how many times the discipline delayed a packet.
<i>direct_packets_stat</i> tells you how many packets were sent through the direct queue.
The other stats are self explanatory. Let's look at the class stats:
<pre>
# tc -s -d class show dev eth0
class htb 1:1 root prio 0 rate 800Kbit ceil 800Kbit burst 2Kb/8 mpu 0b
    cburst 2Kb/8 mpu 0b quantum 10240 level 3
 Sent 5914000 bytes 11828 pkts (dropped 0, overlimits 0)
 rate 70196bps 141pps
 lended: 6872 borrowed: 0 giants: 0

class htb 1:2 parent 1:1 prio 0 rate 320Kbit ceil 4000Kbit burst 2Kb/8 mpu 0b
    cburst 2Kb/8 mpu 0b quantum 4096 level 2
 Sent 5914000 bytes 11828 pkts (dropped 0, overlimits 0)
 rate 70196bps 141pps
 lended: 1017 borrowed: 6872 giants: 0

class htb 1:10 parent 1:2 leaf 20: prio 1 rate 224Kbit ceil 800Kbit burst 2Kb/8 mpu 0b
    cburst 2Kb/8 mpu 0b quantum 2867 level 0
 Sent 2269000 bytes 4538 pkts (dropped 4400, overlimits 36358)
 rate 14635bps 29pps
 lended: 2939 borrowed: 1599 giants: 0
</pre>
I deleted the 1:11 and 1:12 classes to make the output shorter. As you can see, the
parameters we set are there. There is also <i>level</i> and DRR <i>quantum</i>
information.<br>
<i>overlimits</i> shows how many times the class was asked to send a packet
but could not due to rate/ceil constraints (currently counted for leaves only).<br>
<i>rate, pps</i> tells you the actual (10 sec averaged) rate going through the class. It
is the same rate as used by gating.<br>
<i>lended</i> is the number of packets donated by this class (from its <b>rate</b>) and
<i>borrowed</i> is the number of packets for which it borrowed from its parent. Lends are always
computed class-local while borrows are transitive (when 1:10 borrows from 1:2, which
in turn borrows from 1:1, the borrow counters of both 1:10 and 1:2 are incremented).<br>
<i>giants</i> is the number of packets larger than the mtu set in the tc command. HTB will
work with these but rates will not be accurate at all. Add mtu to your tc command (it defaults
to 1600 bytes).<br>
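<p>
When there are many classes it can help to pull just the borrowing counters out of the listing. The filter below is only a sketch: it assumes the output layout shown above (a "class htb" line followed later by a "lended:" line), and the function name <b>htb_borrow_summary</b> is made up for this example.

```shell
# Print one line per class with its lended/borrowed counters.
# Reads `tc -s -d class show dev ...` output on standard input.
# htb_borrow_summary is a hypothetical helper; it relies on the layout shown above.
htb_borrow_summary() {
    awk '
        $1 == "class" && $2 == "htb" { classid = $3 }
        $1 == "lended:"              { print classid, "lended=" $2, "borrowed=" $4 }
    '
}

# Example with two lines taken from the listing above:
printf '%s\n' \
    'class htb 1:2 parent 1:1 prio 0 rate 320Kbit ceil 4000Kbit burst 2Kb/8 mpu 0b' \
    ' lended: 1017 borrowed: 6872 giants: 0' | htb_borrow_summary
```

<p>
On a live system you would run it as <b>tc -s -d class show dev eth0 | htb_borrow_summary</b>.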

<a name=err><h2>8. Making, debugging and sending error reports</h2>
<font color=red date=30.12.2002>
If you have kernel 2.4.20 or newer you don't need to patch it; everything
is in the vanilla tarball. The only thing you need is the <b>tc</b> tool.
Download the HTB 3.6 tarball and use tc from it.
</font><p>
You have to patch the kernel to make HTB work with older kernels. Download the kernel source and
use <b>patch -p1 -i htb3_2.X.X.diff</b> to apply the patch. Then use
<b>make menuconfig;make bzImage</b> as before. Don't forget to enable QoS and HTB.<br>
You will also have to use a patched <b>tc</b> tool. The patch is also
in the downloads, or you can download a precompiled binary.<p>
If you think that you have found an error I will appreciate an error report.
For oopses I need ksymoops output. For weird qdisc behaviour add the
parameter <b>debug 3333333</b> to your <b>tc qdisc add .... htb</b>.
It will log many megabytes to the syslog facility kern at level debug. You
will probably want to add a line like:<br>
<b>kern.debug -/var/log/debug</b><br>
to your /etc/syslog.conf. Then bzip the log and send it to me via email
(up to 10MB after bzipping) along with a description of the problem and
the time it occurred.
</body></html>