\documentstyle[12pt,twoside]{article}
\def\TITLE{Tunnels over IP}
\input preamble
\begin{center}
\Large\bf Tunnels over IP in Linux-2.2
\end{center}


\begin{center}
{ \large Alexey~N.~Kuznetsov } \\
\em Institute for Nuclear Research, Moscow \\
\verb|kuznet@ms2.inr.ac.ru| \\
\rm March 17, 1999
\end{center}

\vspace{5mm}

\tableofcontents


\section{Instead of introduction: micro-FAQ.}

\begin{itemize}

\item
Q: In linux-2.0.36 I used:
\begin{verbatim}
ifconfig tunl1 10.0.0.1 pointopoint 193.233.7.65
\end{verbatim}
to create a tunnel. It does not work in 2.2.0!

A: You are right, it does not work. The command above is now split into two commands.
\begin{verbatim}
ip tunnel add MY-TUNNEL mode ipip remote 193.233.7.65
\end{verbatim}
creates a tunnel device named \verb|MY-TUNNEL|. Now you may configure
it with:
\begin{verbatim}
ifconfig MY-TUNNEL 10.0.0.1
\end{verbatim}
Certainly, if you prefer the name \verb|tunl1| to \verb|MY-TUNNEL|,
you may still use it.

\item
Q: In linux-2.0.36 I used:
\begin{verbatim}
ifconfig tunl0 10.0.0.1
route add -net 10.0.0.0 gw 193.233.7.65 dev tunl0
\end{verbatim}
to tunnel net 10.0.0.0 via router 193.233.7.65. It does not
work in 2.2.0! Moreover, \verb|route| prints a funny error, something like
``network unreachable'', and afterwards I found a strange direct route
to 10.0.0.0 via \verb|tunl0| in the routing table.

A: Yes, in 2.2 the rule that a {\em normal} gateway must reside on a directly
connected network has no exceptions. You may tell the kernel that
this particular route is {\em abnormal}:
\begin{verbatim}
ifconfig tunl0 10.0.0.1 netmask 255.255.255.255
ip route add 10.0.0.0/8 via 193.233.7.65 dev tunl0 onlink
\end{verbatim}
Note the keyword \verb|onlink|: it is the magic key that orders the kernel
not to check the gateway address for consistency.
After this explanation you have probably already guessed another way
to cheat the kernel:
\begin{verbatim}
ifconfig tunl0 10.0.0.1 netmask 255.255.255.255
route add -host 193.233.7.65 dev tunl0
route add -net 10.0.0.0 netmask 255.0.0.0 gw 193.233.7.65
route del -host 193.233.7.65 dev tunl0
\end{verbatim}
Well, if you like such tricks, nobody can forbid you to use them.
Just do not forget
that between the first \verb|route add| and the final \verb|route del| host 193.233.7.65 is
unreachable.

\item
Q: In 2.0.36 I used to load the \verb|tunnel| device module and the \verb|ipip| module.
I cannot find any \verb|tunnel| in 2.2!

A: Linux-2.2 has a single module, \verb|ipip|, for both directions of tunneling
and for all IPIP tunnel devices.

\item
Q: \verb|traceroute| does not work over a tunnel! Well, stop... It works,
only it skips some number of hops.

A: Yes. By default the tunnel driver copies the \verb|ttl| value from
the inner packet to the outer one. It means that the path traversed by tunneled
packets to the other endpoint is not hidden. If you dislike this, or if you
are going to use some routing protocol which expects that packets
with ttl 1 will reach the peering host (e.g.\ RIP, OSPF or EBGP),
and you are not afraid of
tunnel loops, you may append the option \verb|ttl 64| when creating the tunnel
with \verb|ip tunnel add|.

\item
Q: ... Well, here the list of things that 2.0 was able to do ends.

\end{itemize}

\paragraph{Summary of differences between 2.2 and 2.0.}

\begin{itemize}

\item {\bf In 2.0} you could compile the tunnel device into the kernel
and get a set of 4 devices \verb|tunl0| ... \verb|tunl3| or,
alternatively, compile it as a module and load a new module
for each new tunnel. Also, the module \verb|ipip| was necessary
to receive tunneled packets.

{\bf 2.2} has {\em one\/} module, \verb|ipip|. Loading it gives you the base
tunnel device \verb|tunl0|, and other tunnels may be created with the command
\verb|ip tunnel add|. These new devices may have arbitrary names.


\item {\bf In 2.0} you set the remote tunnel endpoint address with
the command \verb|ifconfig| ... \verb|pointopoint A|.

{\bf In 2.2} this command has the same semantics on all
interfaces: it sets not the tunnel endpoint
but the address of the peering host, which is directly reachable
via this tunnel
rather than via the Internet. The actual tunnel endpoint address \verb|A|
should be set with \verb|ip tunnel add ... remote A|.

\item {\bf In 2.0} you created tunnel routes with the command:
\begin{verbatim}
route add -net 10.0.0.0 gw A dev tunl0
\end{verbatim}

{\bf 2.2} interprets this command in the same way for all device
kinds, and the gateway is required to be directly reachable via this tunnel
rather than via the Internet. You may still use \verb|ip route add ... onlink|
to override this behaviour.

\end{itemize}
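
As a sketch of how these three points combine, the following sequence
sets up in 2.2 what a single \verb|ifconfig| ... \verb|pointopoint| command
did in 2.0. The tunnel name \verb|mytun|, the peer's inner address 10.0.0.2
and the routed network 10.1.0.0/16 are assumptions made up for the
illustration; only the endpoint 193.233.7.65 comes from the examples above.
\begin{verbatim}
# Outer (Internet) endpoint of the tunnel
ip tunnel add mytun mode ipip remote 193.233.7.65
# Inner addresses: ours and that of the peer reachable via the tunnel
ifconfig mytun 10.0.0.1 pointopoint 10.0.0.2
# The gateway is the peer's inner address, so no onlink is needed
route add -net 10.1.0.0 netmask 255.255.0.0 gw 10.0.0.2
\end{verbatim}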


\section{Tunnel setup: basics}

The standard Linux-2.2 kernel supports three flavors of tunnels,
listed in the following table:
\vspace{2mm}

\begin{tabular}{lll}
\vrule depth 0.8ex width 0pt\relax
Mode & Description & Base device \\
ipip & IP over IP & tunl0 \\
sit & IPv6 over IP & sit0 \\
gre & ANY over GRE over IP & gre0
\end{tabular}

\vspace{2mm}

\noindent All kinds of tunnels are created with one command:
\begin{verbatim}
ip tunnel add <NAME> mode <MODE> [ local <S> ] [ remote <D> ]
\end{verbatim}

This command creates a new tunnel device with the name \verb|<NAME>|.
\verb|<NAME>| is an arbitrary string; in particular,
it may even be \verb|eth0|. The rest of the parameters set
various tunnel characteristics.

\begin{itemize}

\item
\verb|mode <MODE>| sets the tunnel mode. Three modes are currently available:
\verb|ipip|, \verb|sit| and \verb|gre|.

\item
\verb|remote <D>| sets the remote endpoint of the tunnel to IP
address \verb|<D>|.
\item
\verb|local <S>| sets a fixed local address for tunneled
packets. It must be an address on another interface of this host.

\end{itemize}

\let\thefootnote\oldthefootnote

Both \verb|remote| and \verb|local| may be omitted. In this case we
say that they are zero or wildcard. Two tunnels of one mode cannot
have the same \verb|remote| and \verb|local|. In particular, this means
that the base device or fallback tunnel cannot be replicated.\footnote{
This restriction is relaxed for keyed GRE tunnels.}

Tunnels are divided into two classes: {\bf pointopoint} tunnels, which
have a non-wildcard \verb|remote| address and deliver all packets
to this destination, and {\bf NBMA} (i.e.\ Non-Broadcast Multi-Access) tunnels,
which have no \verb|remote|. In particular, base devices (e.g.\ \verb|tunl0|)
are NBMA, because they have neither \verb|remote| nor
\verb|local| addresses.


After the tunnel device is created, you should configure it as you would
any other device. Certainly, the configuration of tunnels has
some peculiarities related to the fact that they work over the existing Internet
routing infrastructure and simultaneously create new virtual links,
which change this infrastructure. The danger that a careless
tunnel setup will result in the formation of tunnel loops,
a collapse of routing, or flooding the network with an exponentially
growing number of tunneled fragments is very real.


Protocol setup on pointopoint tunnels does not differ from the configuration
of other devices. You should set a protocol address with \verb|ifconfig|
and add routes with the \verb|route| utility.

NBMA tunnels are different. To route something via an NBMA tunnel
you have to tell the driver where it should deliver packets.
The only way to do this is to create special routes with a gateway
address pointing to the desired endpoint, e.g.\
\begin{verbatim}
ip route add 10.0.0.0/24 via <A> dev tunl0 onlink
\end{verbatim}
It is important to use the option \verb|onlink|; otherwise the
kernel will refuse the request to create a route via a gateway not directly
reachable over device \verb|tunl0|. With IPv6 the situation is much simpler:
when you start the device \verb|sit0|, it automatically configures itself
with all IPv4 addresses mapped to IPv6 space, so that the whole IPv4
Internet is {\em really reachable} via \verb|sit0|! Thus the command
\begin{verbatim}
ip route add 3FFE::/16 via ::193.233.7.65 dev sit0
\end{verbatim}
will route \verb|3FFE::/16| via \verb|sit0|, sending all packets
destined to this prefix to 193.233.7.65.
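
To illustrate the NBMA case, one base device can carry traffic to several
endpoints, each selected by the gateway of its route. This is only a sketch:
the second endpoint 193.233.7.66 and both 10.x prefixes are invented for the
example, and \verb|tunl0| is assumed to be up and to have a local address
already.
\begin{verbatim}
# Each route names the tunnel endpoint that serves its prefix
ip route add 10.1.0.0/16 via 193.233.7.65 dev tunl0 onlink
ip route add 10.2.0.0/16 via 193.233.7.66 dev tunl0 onlink
\end{verbatim}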

\section{Tunnel setup: options}

The command \verb|ip tunnel add| has several additional options.
\begin{itemize}

\item \verb|ttl N|: set a fixed TTL \verb|N| on tunneled packets.
\verb|N| is a number in the range 1--255. 0 is a special value,
meaning that packets inherit the TTL value.
The default value is \verb|inherit|.

\item \verb|tos T|: set a fixed TOS \verb|T| on tunneled packets.
The default value is \verb|inherit|.

\item \verb|dev DEV|: bind the tunnel to device \verb|DEV|, so that
tunneled packets will be routed only via this device and will
not be able to escape to another device when the route to the endpoint changes.

\item \verb|nopmtudisc|: disable Path MTU Discovery on this tunnel.
It is enabled by default. Note that a fixed ttl is incompatible
with this option: tunnels with a fixed ttl always perform pmtu discovery.

\end{itemize}
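
For illustration, the options above might be combined as follows. The tunnel
names, the device \verb|eth0| and the second endpoint 193.233.7.66 are
assumptions made for the example.
\begin{verbatim}
# Fixed TTL, bound to eth0; PMTU discovery stays enabled
ip tunnel add tun-a mode ipip remote 193.233.7.65 ttl 64 dev eth0
# Inherited TTL (the default) with PMTU discovery disabled
ip tunnel add tun-b mode ipip remote 193.233.7.66 nopmtudisc
\end{verbatim}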

\verb|ipip| and \verb|sit| tunnels have no more options. \verb|gre|
tunnels are more complicated:

\begin{itemize}

\item \verb|key K|: use keyed GRE with key \verb|K|. \verb|K| is
either a number or an IP address-like dotted quad.

\item \verb|csum|: checksum tunneled packets.

\item \verb|seq|: serialize packets.
\begin{NB}
I think this option does not
work. At least, I did not test it, did not debug it and
do not even understand how it is supposed to work and for what
purpose Cisco planned to use it.
\end{NB}

\end{itemize}


Actually, these GRE options can be set separately for the input and
output directions by prefixing the corresponding keywords with the letter
\verb|i| or \verb|o|. E.g.\ \verb|icsum| orders the host to accept only
packets with a correct checksum, and \verb|ocsum| means that
our host will calculate and send the checksum.
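
A possible keyed GRE setup, given only as a sketch: the tunnel name
\verb|gre-k| and the key value 42 are invented, and the addresses are
borrowed from the Cisco example later in this document.
\begin{verbatim}
# Same key in both directions; verify incoming and send outgoing checksums
ip tunnel add gre-k mode gre local 10.10.14.1 remote 10.10.13.2 \
    key 42 icsum ocsum
\end{verbatim}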

The command \verb|ip tunnel add| is not the only operation
that can be performed on tunnels. Certainly, you may get a short help page
with:
\begin{verbatim}
ip tunnel help
\end{verbatim}

Besides that, you may view the list of installed tunnels with the command:
\begin{verbatim}
ip tunnel ls
\end{verbatim}
You may also look at statistics:
\begin{verbatim}
ip -s tunnel ls Cisco
\end{verbatim}
where \verb|Cisco| is the name of the tunnel device. The command
\begin{verbatim}
ip tunnel del Cisco
\end{verbatim}
destroys the tunnel \verb|Cisco|. And, finally,
\begin{verbatim}
ip tunnel change Cisco mode sit local ME remote HE ttl 32
\end{verbatim}
changes its parameters.

\section{Differences between 2.2 and 2.0 tunnels revisited.}

Now we can discuss more subtle differences between tunneling in 2.0
and 2.2.

\begin{itemize}

\item In 2.0 all tunneled packets were received promiscuously
as soon as you loaded the module \verb|ipip|. 2.2 tries to select the best
tunnel device, and the packet looks as if it was received on that device.
E.g.\ if the host
receives an \verb|ipip| packet from host \verb|D| destined to our
local address \verb|S|, the kernel searches for a matching tunnel
in this order:

\begin{tabular}{ll}
1 & \verb|remote| is \verb|D| and \verb|local| is \verb|S| \\
2 & \verb|remote| is \verb|D| and \verb|local| is wildcard \\
3 & \verb|remote| is wildcard and \verb|local| is \verb|S| \\
4 & \verb|tunl0|
\end{tabular}

If a tunnel exists but is not in the \verb|UP| state, it is ignored.
Note that if \verb|tunl0| is \verb|UP|, it receives all IPIP packets
not claimed by more specific tunnels.
Be careful: this means that without carefully installed firewall rules
anyone on the Internet may inject into your network any packets with
source addresses indistinguishable from local ones. It is not a bad idea
to design tunnels in a way that enforces maximal route symmetry
and to enable the reverse path filter (the \verb|rp_filter| sysctl option) on
tunnel devices.

\item In 2.2 you can monitor and debug tunnels with \verb|tcpdump|.
E.g.\ \verb|tcpdump| \verb|-i Cisco| \verb|-nvv| will dump the packets
that the kernel sends out via tunnel \verb|Cisco| and the packets received on it,
as seen from the kernel's viewpoint.

\end{itemize}
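
The lookup order for incoming IPIP packets described above can be sketched
as a small shell function. It illustrates the rule only and is not kernel
code: the function \verb|match_tunnel| and the \verb|NAME:REMOTE:LOCAL|
notation are hypothetical, \verb|any| stands for a wildcard address, and
only tunnels in the \verb|UP| state are assumed to be listed.
\begin{verbatim}
# Usage: match_tunnel D S NAME:REMOTE:LOCAL ...
# Prints the name of the tunnel selected for a packet from D to S.
match_tunnel() {
    d=$1; s=$2; shift 2
    # Try the four patterns from most to least specific
    for want in "$d:$s" "$d:any" "any:$s" "any:any"; do
        for t in "$@"; do
            # ${t#*:} is REMOTE:LOCAL, ${t%%:*} is NAME
            if [ "${t#*:}" = "$want" ]; then
                echo "${t%%:*}"
                return 0
            fi
        done
    done
    return 1    # no tunnel accepts the packet
}

# Cisco has a matching remote, so it is preferred to the fallback tunl0
match_tunnel 10.1.1.1 10.2.2.2 tunl0:any:any Cisco:10.1.1.1:any
\end{verbatim}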


\section{Linux and Cisco IOS tunnels.}

Among other tunnel types, Cisco IOS supports IPIP and GRE.
Essentially, the Cisco setup is a subset of the options available in Linux.
Let us consider the simplest example:

\begin{verbatim}
interface Tunnel0
 tunnel mode gre ip
 tunnel source 10.10.14.1
 tunnel destination 10.10.13.2
\end{verbatim}


This command set translates to:

\begin{verbatim}
ip tunnel add Tunnel0 \
    mode gre \
    local 10.10.14.1 \
    remote 10.10.13.2
\end{verbatim}

Any questions? No questions.
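
One thing the translation does not show is that the new device still has to
be addressed and brought up before it carries traffic. A minimal sketch,
assuming an inner address 10.0.0.1/30 that does not appear in the Cisco
fragment above:
\begin{verbatim}
ip addr add 10.0.0.1/30 dev Tunnel0
ip link set Tunnel0 up
\end{verbatim}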

\section{Interaction of IPIP tunnels and DVMRP.}

DVMRP exploits IPIP tunnels to route multicasts via the Internet.
\verb|mrouted| automatically creates
the IPIP tunnels listed in its configuration file.
From the kernel and user viewpoints there are no differences between
tunnels created in this way and tunnels created by \verb|ip tunnel|.
I.e.\ if \verb|mrouted| has created some tunnel, it may be used to
route unicast packets, provided appropriate routes are added.
And vice versa, if the administrator has already created a tunnel,
it will be reused by \verb|mrouted| if it requests a DVMRP
tunnel with the same local and remote addresses.

Do not be surprised if your manually configured tunnel is
destroyed when \verb|mrouted| exits.


\section{Broadcast GRE ``tunnels''.}

It is possible to set \verb|remote| for a GRE tunnel to a multicast
address. Such a tunnel becomes a {\bf broadcast} tunnel (though the word
tunnel is not quite appropriate in this case; it is rather a virtual network).
\begin{verbatim}
ip tunnel add Universe mode gre local 193.233.7.65 \
    remote 224.66.66.66 ttl 16
ip addr add 10.0.0.1/16 dev Universe
ip link set Universe up
\end{verbatim}
This tunnel is a true broadcast network, and broadcast packets are
sent to multicast group 224.66.66.66. By default such a tunnel starts
to resolve both IP and IPv6 addresses via ARP/NDISC, so that
if multicast routing is supported in the surrounding network, all GRE nodes
will find one another automatically and will form a virtual Ethernet-like
broadcast network. If multicast routing does not work, it is an unpleasant
but not fatal flaw: the tunnel becomes an NBMA rather than a broadcast network.
You may disable dynamic ARPing with:
\begin{verbatim}
echo 0 > /proc/sys/net/ipv4/neigh/Universe/mcast_solicit
\end{verbatim}
and add the required information to the ARP tables manually:
\begin{verbatim}
ip neigh add 10.0.0.2 lladdr 128.6.190.2 dev Universe nud permanent
\end{verbatim}
In this case packets sent to 10.0.0.2 will be encapsulated in GRE
and sent to 128.6.190.2. It is possible to facilitate address resolution
using methods typical for other NBMA networks, e.g.\ to start a user-level
\verb|arpd| daemon, which will maintain a database of hosts attached
to the GRE virtual network, or to ask a
dedicated ARP or NHRP server for the information.


Actually, such a setup is the most natural for tunneling;
it is really flexible, scalable and easily manageable, so
it is strongly recommended for GRE tunnels instead of the ugly
hack with NBMA mode and the \verb|onlink| modifier. Unfortunately,
for historical reasons broadcast mode is not supported by IPIP tunnels,
but this will probably change in the future.



\section{Traffic control issues.}

Tunnels are devices, hence all the power of Linux traffic control
applies to them. The simplest (and the most useful in practice)
example is limiting tunnel bandwidth. The following command:
\begin{verbatim}
tc qdisc add dev tunl0 root tbf \
    rate 128Kbit burst 4K limit 10K
\end{verbatim}
will limit tunneled traffic to 128~Kbit/s with a maximal burst size of 4K
and queue no more than 10K.

However, you should remember that tunnels are {\em virtual} devices
implemented in software, and true queue management is impossible for them
simply because they have no queues. Instead, it is better to create classes
on real physical interfaces and to map tunneled packets to them.
In the general case of dynamic routing you should create such classes
on all outgoing interfaces or, alternatively,
use the option \verb|dev DEV| to bind the tunnel to a fixed physical device.
In the latter case packets will be routed only via the specified device
and you need to set up the corresponding classes only on it.
You have to pay for this convenience, though:
if routing changes, your tunnel will fail.

Suppose that the CBQ class \verb|1:ABC| has been created on device \verb|eth0|
specially for tunnel \verb|Cisco| with endpoints \verb|S| and \verb|D|.
Now you can select IPIP packets with addresses \verb|S| and \verb|D|
with some classifier and map them to class \verb|1:ABC|. E.g.\
this is easy to do with the \verb|rsvp| classifier:
\begin{verbatim}
tc filter add dev eth0 pref 100 proto ip rsvp \
    session D ipproto ipip filter S \
    classid 1:ABC
\end{verbatim}

If you want a more detailed classification of the sub-flows
transmitted via the tunnel, you can build a CBQ subtree
rooted at \verb|1:ABC| and attach to the subroot a set of rules that parse
IPIP packets more deeply.

\end{document}