This document is slightly dated but should give you an idea of how things
work.

What is it?
-----------

An extension to the filtering/classification architecture of Linux Traffic
Control.
Up to 2.6.8 the only action that could be "attached" to a filter was policing,
i.e. you could say something like:

-----
tc filter add dev lo parent ffff: protocol ip prio 10 u32 match ip src \
127.0.0.1/32 flowid 1:1 police mtu 4000 rate 1500kbit burst 90k
-----

which implies "if a packet is seen on the ingress of the lo device with
a source IP address of 127.0.0.1/32, we give it a classification id of 1:1 and
we execute a policing action which rate limits its bandwidth utilization
to 1.5Mbps".

The new extensions allow for more than just policing actions to be added.
They are also fully backward compatible. If you have a kernel that doesn't
understand them, then the effect is null, i.e. if you have a newer tc
but an older kernel, the actions are not installed. Likewise, if you
have a newer kernel but an older tc, the tc will use its existing
syntax, which will work fine. Of course, to get the required effect you need
both a newer tc and a newer kernel. If you are reading this you have the
right tc ;->

A side effect is that we can now get stateless firewalling to work with tc.
Essentially this is now an alternative to iptables.
I won't go into details of my dislike for iptables at times, but
scalability is one of the main issues; however, if you need stateful
classification, use netfilter (for now).

This stuff works on both ingress and egress qdiscs.

Features
--------

1) New additional syntax and actions enabled. Note the old syntax is still valid.

Essentially this is still the same syntax as tc with a new construct,
"action". The syntax is of the form:
tc filter add <DEVICE> parent 1:0 protocol ip prio 10 <Filter description> \
flowid 1:1 action <ACTION description>*

You can have as many actions as you want (within reason).

In the past the only real action was the policer, i.e. you could do something
along the lines of:
tc filter add dev lo parent ffff: protocol ip prio 10 u32 \
match ip src 127.0.0.1/32 flowid 1:1 \
police mtu 4000 rate 1500kbit burst 90k

Although you can still use the same syntax, now you can say:

tc filter add dev lo parent 1:0 protocol ip prio 10 u32 \
match ip src 127.0.0.1/32 flowid 1:1 \
action police mtu 4000 rate 1500kbit burst 90k

"Generic actions" (gact) at the moment are:
{ drop, pass, reclassify, continue }
(If you have others not listed here, give me a reason and we will add them.)
+drop says to drop the packet
+pass and ok (which are equivalent) say to accept it
+reclassify requests reclassification of the packet
+continue requests that the next lookup be tried for a match

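As a minimal sketch of a generic action in use, the snippet below builds a
filter that drops everything arriving from one source address. The device
eth0, the address 10.0.0.9/32, the flowid 1:16 and the helper name drop_from
are made up for illustration. TC defaults to "echo tc" so the command is only
printed; set TC=tc (as root, with an ingress qdisc already added) to actually
install it.

```shell
# Dry run by default: print the tc command instead of executing it.
TC=${TC:-echo tc}

# Hypothetical helper: drop all ingress IP traffic from one source address.
# $1 = device, $2 = source address with prefix length.
drop_from() {
    dev=$1
    src=$2
    $TC filter add dev "$dev" parent ffff: protocol ip prio 10 u32 \
        match ip src "$src" flowid 1:16 action drop
}

drop_from eth0 10.0.0.9/32
```

Swapping "drop" for "pass", "reclassify" or "continue" selects one of the
other generic actions listed above.
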
2) In order to take advantage of some of the targets written by the
iptables people, a classifier can have a packet massaged by an
iptables target. I have only tested with mangle targets up to now
(in fact anything that is not in the mangle table is disabled right now).

In terms of hooks:
*ingress is mapped to the pre-routing hook
*egress is mapped to the post-routing hook
I don't see much value in the other hooks; if you do, email me good
reasons. The addition is trivial.

Example syntax for iptables target usage becomes:
tc filter add ..... u32 <u32 syntax> action ipt -j <iptables target syntax>

example:
tc filter add dev lo parent ffff: protocol ip prio 8 u32 \
match ip dst 127.0.0.8/32 flowid 1:12 \
action ipt -j mark --set-mark 2

NOTE: flowid 1:12 is parsed as flowid 0x1:0x12. If you want a flowid of
decimal 12, use flowid 1:c.

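Since both halves of a flowid are read as hexadecimal, it can help to let the
shell do the conversion. This is only a sketch; flowid_hex is a hypothetical
helper name, not part of tc.

```shell
# Hypothetical helper: print the flowid spelling tc expects for a decimal
# major/minor pair, since tc reads both halves as hexadecimal.
flowid_hex() {
    printf '%x:%x\n' "$1" "$2"
}

flowid_hex 1 12    # decimal minor 12 must be written as 1:c
flowid_hex 1 18    # decimal minor 18 is what "1:12" actually means
```
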
3) A feature I call pipe
The motivation is derived from the Unix pipe mechanism but applied to packets.
Essentially take a matching packet and pass it through
action1 | action2 | action3 etc.
You could do something similar to this with the tc policer and the "continue"
operator, but this rather restricts it to just the policer and requires
multiple rules (and lookups, hence quite inefficient).

As an example -- and please note that this is just an example, _not_ The
Word You've Been Waiting For (yes, I have had problems with examples
which ended up becoming dogma in documents, with people modifying them a little
to look clever):

I selected the metering rates to be small so that I can show better how
things work.

The script below does the following:
- An incoming packet from 10.0.0.21 is first given a firewall mark of 1.

- It is then metered to make sure it does not exceed its allocated rate of
1Kbps. If it doesn't exceed the rate, this is where we terminate action execution.

- If it does exceed its rate, its "color" changes to a mark of 2 and it is
then passed through a second meter.

- The second meter is shared across all flows on that device. [I am surprised
that this seems not to be a well-known feature of the policer; Bert was telling
me that someone was writing a qdisc just to do sharing across multiple devices;
it must be the summer heat again; we've had someone doing that every year around
summer. The key to sharing is to use the "index" operator in your policer
rules (example: "index 20"). All your rules have to use the same index to
share.]

- If the second meter is exceeded, the color of the flow changes further to 3.

- We then pass the packet to another meter which is shared across all devices
in the system. If this meter is exceeded, we drop the packet.

Note the mark can be used further up the system to do things like policy
routing or more interesting things on the egress.

------------------ cut here -------------------------------
#
# Add an ingress qdisc on eth0
tc qdisc add dev eth0 ingress
#
# If you see an incoming packet from 10.0.0.21:
#  1. first give it a mark of 1
#  2. then pass it through a policer which allows 1kbps; if the flow
#     doesn't exceed that rate, this is where we stop; if it exceeds we
#     pipe the packet to the next action
#  3. which marks the packet fwmark as 2 and pipes
#  4. next attempt to borrow b/width from a meter used across all flows
#     incoming on eth0 ("index 30"); if that is exceeded we pipe to the
#     next action
#  5. mark it as fwmark 3 if exceeded
#  6. and then attempt to borrow from a meter used by all devices in the
#     system ("index 20"). Should this be exceeded, drop the packet on
#     the floor.
# (Comments cannot be interleaved with the action lines: a comment line
# inside a backslash-continued command ends the command early.)
tc filter add dev eth0 parent ffff: protocol ip prio 1 \
u32 match ip src 10.0.0.21/32 flowid 1:15 \
action ipt -j mark --set-mark 1 index 2 \
action police rate 1kbit burst 9k pipe \
action ipt -j mark --set-mark 2 \
action police index 30 mtu 5000 rate 1kbit burst 10k pipe \
action ipt -j mark --set-mark 3 \
action police index 20 mtu 5000 rate 1kbit burst 90k drop
---------------------------------

Now let's see the actions installed with
"tc filter show parent ffff: dev eth0"

-------- output -----------
jroot# tc filter show parent ffff: dev eth0
filter protocol ip pref 1 u32
filter protocol ip pref 1 u32 fh 800: ht divisor 1
filter protocol ip pref 1 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:15

action order 1: tablename: mangle hook: NF_IP_PRE_ROUTING
target MARK set 0x1 index 2

action order 2: police 1 action pipe rate 1Kbit burst 9Kb mtu 2Kb

action order 3: tablename: mangle hook: NF_IP_PRE_ROUTING
target MARK set 0x2 index 1

action order 4: police 30 action pipe rate 1Kbit burst 10Kb mtu 5000b

action order 5: tablename: mangle hook: NF_IP_PRE_ROUTING
target MARK set 0x3 index 3

action order 6: police 20 action drop rate 1Kbit burst 90Kb mtu 5000b

match 0a000015/ffffffff at 12
-------------------------------

Note the ordering of the actions is based on the order in which we entered
them. In the future I will add explicit priorities.

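The "match 0a000015/ffffffff at 12" line in the output is the u32 key for
"ip src 10.0.0.21/32": the address in hexadecimal, a mask covering all 32
bits, and byte offset 12, which is where the source address sits in the IPv4
header. A small sketch to reproduce the key by hand follows; ip_to_u32_key is
a hypothetical helper name, not something tc provides.

```shell
# Hypothetical helper: print a dotted-quad IPv4 address as the 8-digit hex
# key that u32 displays. The body runs in a subshell so the IFS change does
# not leak into the caller.
ip_to_u32_key() (
    IFS=.
    set -- $1          # split the address into its four octets
    printf '%02x%02x%02x%02x\n' "$1" "$2" "$3" "$4"
)

ip_to_u32_key 10.0.0.21
```
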
196 | Now lets run a ping -f from 10.0.0.21 to this host; stop the ping after | |
197 | you see a few lines of dots | |
198 | ||
199 | ---- | |
200 | [root@jzny hadi]# ping -f 10.0.0.22 | |
201 | PING 10.0.0.22 (10.0.0.22): 56 data bytes | |
202 | .................................................................................................................................................................................................................................................................................................................................................................................................................................................... | |
203 | --- 10.0.0.22 ping statistics --- | |
204 | 2248 packets transmitted, 1811 packets received, 19% packet loss | |
205 | round-trip min/avg/max = 0.7/9.3/20.1 ms | |
206 | ----------------------------- | |
207 | ||
Now let's take a look at the stats with "tc -s filter show parent ffff: dev eth0"

--------------
jroot# tc -s filter show parent ffff: dev eth0
filter protocol ip pref 1 u32
filter protocol ip pref 1 u32 fh 800: ht divisor 1
filter protocol ip pref 1 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:15

action order 1: tablename: mangle hook: NF_IP_PRE_ROUTING
target MARK set 0x1 index 2
Sent 188832 bytes 2248 pkts (dropped 0, overlimits 0)

action order 2: police 1 action pipe rate 1Kbit burst 9Kb mtu 2Kb
Sent 188832 bytes 2248 pkts (dropped 0, overlimits 2122)

action order 3: tablename: mangle hook: NF_IP_PRE_ROUTING
target MARK set 0x2 index 1
Sent 178248 bytes 2122 pkts (dropped 0, overlimits 0)

action order 4: police 30 action pipe rate 1Kbit burst 10Kb mtu 5000b
Sent 178248 bytes 2122 pkts (dropped 0, overlimits 1945)

action order 5: tablename: mangle hook: NF_IP_PRE_ROUTING
target MARK set 0x3 index 3
Sent 163380 bytes 1945 pkts (dropped 0, overlimits 0)

action order 6: police 20 action drop rate 1Kbit burst 90Kb mtu 5000b
Sent 163380 bytes 1945 pkts (dropped 0, overlimits 437)

match 0a000015/ffffffff at 12
-------------------------------

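The counters can be cross-checked against the ping statistics with a little
shell arithmetic. The numbers are copied from the output above; the variable
names are only for illustration. Each meter's overlimits count is the number
of packets piped on to the next action, and the overlimits on the last meter
match the packets the ping reported lost.

```shell
total=2248        # packets seen by action 1 (and transmitted by ping)
over_meter1=2122  # overlimits on police 1, piped on to mark 2
over_meter2=1945  # overlimits on police 30, piped on to mark 3
dropped=437       # overlimits on police 20, whose action is drop

echo "conformed to meter 1: $((total - over_meter1))"
echo "conformed to meter 2: $((over_meter1 - over_meter2))"
echo "conformed to meter 3: $((over_meter2 - dropped))"
echo "delivered:            $((total - dropped))"
echo "loss percent:         $((dropped * 100 / total))"
```

The "delivered" figure comes out as 1811 and the loss as 19%, the same values
ping printed.
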
Neat, eh?


Want to write an action module?
-------------------------------
It's easy. Either look at the code or send me email. I will document it at
some point; I will also accept documentation.

TODO
----

Lotsa goodies/features coming. Requests also being accepted.
At the moment the focus has been on getting the architecture in place.
Expect new things in the spare time I have to work on this
(particularly around the end of the year, when I typically get time off
from work).