\documentstyle[12pt,twoside]{article}
\def\TITLE{IP Command Reference}
\Large\bf IP Command Reference.
{ \large Alexey~N.~Kuznetsov } \\
\em Institute for Nuclear Research, Moscow \\
\verb|kuznet@ms2.inr.ac.ru| \\
\section{About this document}

This document presents a comprehensive description of the \verb|ip| utility
from the \verb|iproute2| package. It is not a tutorial or user's guide.
It is a {\em dictionary\/}: it does not explain terms,
but translates them into other terms, which may also be unknown to the reader.
However, the document is self-contained and a reader with a
basic networking background will find enough information
and examples to understand and configure Linux-2.2 IP and IPv6.

This document is split into sections explaining \verb|ip| commands
and options, decoding \verb|ip| output and containing a few examples.
More voluminous examples and some topics, which require more elaborate
discussion, are in the appendix.

The paragraphs beginning with NB contain side notes, warnings about
bugs and design drawbacks. They may be skipped at the first reading.

\section{{\tt ip} --- command syntax}

The generic form of an \verb|ip| command is:
\begin{verbatim}
ip [ OPTIONS ] OBJECT [ COMMAND [ ARGUMENTS ]]
\end{verbatim}
where \verb|OPTIONS| is a set of optional modifiers affecting the
general behaviour of the \verb|ip| utility or changing its output. All options
begin with the character \verb|'-'| and may be used in either long or abbreviated
forms. Currently, the following options are available:

\begin{itemize}
\item \verb|-V|, \verb|-Version|

--- print the version of the \verb|ip| utility and exit.

\item \verb|-s|, \verb|-stats|, \verb|-statistics|

--- output more information. If the option
appears twice or more, the amount of information increases.
As a rule, the information is statistics or some time values.

\item \verb|-f|, \verb|-family| followed by a protocol family
identifier: \verb|inet|, \verb|inet6| or \verb|link|.

--- enforce the protocol family to use. If the option is not present,
the protocol family is guessed from other arguments. If the rest of the command
line does not give enough information to guess the family, \verb|ip| falls back to the default
one, usually \verb|inet| or \verb|any|. \verb|link| is a special family
identifier meaning that no networking protocol is involved.

\item \verb|-4|

--- shortcut for \verb|-family inet|.

\item \verb|-6|

--- shortcut for \verb|-family inet6|.

\item \verb|-0|

--- shortcut for \verb|-family link|.

\item \verb|-o|, \verb|-oneline|

--- output each record on a single line, replacing line feeds
with the \verb|'\'| character. This is convenient when you want to
count records with \verb|wc| or to \verb|grep| the output. The trivial
script \verb|rtpr| converts the output back into readable form.

\item \verb|-r|, \verb|-resolve|

--- use the system's name resolver to print DNS names instead of
host addresses.

Do not use this option when reporting bugs or asking for advice.

\verb|ip| never uses DNS to resolve names to addresses.

\end{itemize}

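
The effect of the \verb|-oneline| option described above can be illustrated with a
short Python sketch. It is not the \verb|rtpr| script itself, only a guess at an
equivalent transformation: it assumes that a backslash never occurs inside a
field, so that every backslash marks a replaced line feed. The sample record is
assembled by hand from the \verb|ip link ls eth0| output shown in this document;
the function names are hypothetical.

```python
def expand_oneline(text: str) -> str:
    """Turn -oneline output back into multi-line records."""
    # Assumption: every backslash stands for a line feed that -oneline replaced.
    return text.replace("\\", "\n")


def count_records(text: str) -> int:
    """Count records: with -oneline each non-empty line is one record."""
    return sum(1 for line in text.splitlines() if line.strip())


# Hand-built sample: the two-line eth0 record with its line feed
# replaced by a backslash, as the option description states.
SAMPLE = (
    "3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100\\"
    "    link/ether 00:a0:cc:66:18:78 brd ff:ff:ff:ff:ff:ff\n"
)

print(count_records(SAMPLE))
print(expand_oneline(SAMPLE), end="")
```

Counting lines of the unexpanded text is what makes \verb|wc| and \verb|grep|
convenient here: one line always corresponds to one record.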
\verb|OBJECT| is the object to manage or to get information about.
The object types currently understood by \verb|ip| are:

\begin{itemize}
\item \verb|link| --- network device
\item \verb|address| --- protocol (IP or IPv6) address on a device
\item \verb|neighbour| --- ARP or NDISC cache entry
\item \verb|route| --- routing table entry
\item \verb|rule| --- rule in routing policy database
\item \verb|maddress| --- multicast address
\item \verb|mroute| --- multicast routing cache entry
\item \verb|tunnel| --- tunnel over IP
\end{itemize}

Again, the names of all objects may be written in full or
abbreviated form, e.g.\ \verb|address| may be abbreviated as \verb|addr|.

\verb|COMMAND| specifies the action to perform on the object.
The set of possible actions depends on the object type.
As a rule, it is possible to \verb|add|, \verb|delete| and
\verb|show| (or \verb|list|) objects, but some objects
do not allow all of these operations or have some additional commands.
The \verb|help| command is available for all objects. It prints
out a list of available commands and argument syntax conventions.

If no command is given, some default command is assumed.
Usually it is \verb|list| or, if the objects of this class
cannot be listed, \verb|help|.

\verb|ARGUMENTS| is a list of arguments to the command.
The arguments depend on the command and object. There are two types of arguments:
{\em flags\/}, consisting of a single keyword, and {\em parameters\/},
consisting of a keyword followed by a value. For convenience,
each command has some {\em default parameter\/}
whose keyword may be omitted. For example, the parameter \verb|dev| is the default
for the {\tt ip link} command, so {\tt ip link ls eth0} is equivalent
to {\tt ip link ls dev eth0}.
In the command descriptions below such parameters
are distinguished with the marker: ``(default)''.

Almost all keywords may be abbreviated to their first few letters
(or even to a single letter). The shortcuts are convenient when \verb|ip| is used interactively,
but they are not recommended in scripts or when reporting bugs
or asking for advice. ``Officially'' allowed abbreviations are listed
in the document body.

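
How an abbreviated keyword might be resolved can be sketched in Python. The
sketch assumes a ``first match wins'' policy over the object names in the order
listed above; this policy and the function name \verb|resolve| are assumptions
made for illustration, not a statement about how \verb|ip| is implemented.

```python
# Object names in the order in which this document lists them.
OBJECTS = ["link", "address", "neighbour", "route", "rule",
           "maddress", "mroute", "tunnel"]


def resolve(word, candidates=OBJECTS):
    """Return the first candidate starting with `word`, or None.

    Assumption: an abbreviation is any non-empty prefix of a keyword and
    ambiguity is settled by list order (first match wins).
    """
    if not word:
        return None
    for name in candidates:
        if name.startswith(word):
            return name
    return None


for abbreviation in ("addr", "a", "r", "ru", "x"):
    print(abbreviation, "->", resolve(abbreviation))
```

Under this policy a one-letter shortcut silently selects whichever keyword comes
first, which is one reason why full keywords are safer in scripts.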
\section{{\tt ip} --- error messages}

\verb|ip| may fail for one of the following reasons:

\begin{itemize}
\item
A syntax error on the command line: an unknown keyword, incorrectly formatted
IP address {\em et al\/}. In this case \verb|ip| prints an error message
and exits. As a rule, the error message will contain information
about the reason for the failure. Sometimes it also prints a help page.

\item
The arguments did not pass verification for self-consistency.

\item
\verb|ip| failed to compile a kernel request from the arguments
because the user didn't give enough information.

\item
The kernel returned an error to some syscall. In this case \verb|ip|
prints the error message, as it is output with \verb|perror(3)|,
prefixed with a comment and a syscall identifier.

\item
The kernel returned an error to some RTNETLINK request.
In this case \verb|ip| prints the error message, as it is output
with \verb|perror(3)| prefixed with ``RTNETLINK answers:''.

\end{itemize}

All the operations are atomic, i.e.\
if the \verb|ip| utility fails, it does not change anything
in the system. One harmful exception is the \verb|ip link| command
(Sec.\ref{IP-LINK}, p.\pageref{IP-LINK}),
which may change only some of the device parameters given
on the command line.

It is difficult to list all the error messages (especially
syntax errors). However, as a rule, their meaning is clear
from the context of the command.

The most common mistakes are:

\begin{itemize}
\item Netlink is not configured in the kernel. The message is:
\begin{verbatim}
Cannot open netlink socket: Invalid value
\end{verbatim}

\item RTNETLINK is not configured in the kernel. In this case
one of the following messages may be printed, depending on the command:
\begin{verbatim}
Cannot talk to rtnetlink: Connection refused
Cannot send dump request: Connection refused
\end{verbatim}

\item The \verb|CONFIG_IP_MULTIPLE_TABLES| option was not selected
when configuring the kernel. In this case any attempt to use the
\verb|ip| \verb|rule| command will fail, e.g.
\begin{verbatim}
kuznet@kaiser $ ip rule list
RTNETLINK error: Invalid argument
\end{verbatim}

\end{itemize}

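
A wrapper script could map these messages back to their likely causes. The
following Python sketch uses only the message texts quoted above; the table
name, the function name and the idea of matching on the beginning of the line
are assumptions made for illustration, and the last entry is meaningful only
for output of the \verb|ip rule| command.

```python
# Message prefixes quoted in this section and the cause given for each.
COMMON_MISTAKES = {
    "Cannot open netlink socket":
        "Netlink is not configured in the kernel",
    "Cannot talk to rtnetlink":
        "RTNETLINK is not configured in the kernel",
    "Cannot send dump request":
        "RTNETLINK is not configured in the kernel",
    # Only a hint, and only when the failing command was `ip rule`.
    "RTNETLINK error: Invalid argument":
        "CONFIG_IP_MULTIPLE_TABLES may not be selected",
}


def diagnose(message):
    """Return the likely cause for an error line, or None if unknown."""
    text = message.strip()
    for prefix, cause in COMMON_MISTAKES.items():
        if text.startswith(prefix):
            return cause
    return None


print(diagnose("Cannot send dump request: Connection refused"))
```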
\section{{\tt ip link} --- network device configuration}

\paragraph{Object:} A \verb|link| is a network device and the corresponding
commands display and change the state of devices.

\paragraph{Commands:} \verb|set| and \verb|show| (or \verb|list|).

\subsection{{\tt ip link set} --- change device attributes}

\paragraph{Abbreviations:} \verb|set|, \verb|s|.

\paragraph{Arguments:}

\begin{itemize}
\item \verb|dev NAME| (default)

--- \verb|NAME| specifies the network device on which to operate.

\item \verb|up| and \verb|down|

--- change the state of the device to \verb|UP| or \verb|DOWN|.

\item \verb|arp on| or \verb|arp off|

--- change the \verb|NOARP| flag on the device.

This operation is {\em not allowed\/} if the device is in state \verb|UP|,
though neither the \verb|ip| utility nor the kernel checks for this condition.
You can get unpredictable results changing this flag while the
device is running.

\item \verb|multicast on| or \verb|multicast off|

--- change the \verb|MULTICAST| flag on the device.

\item \verb|dynamic on| or \verb|dynamic off|

--- change the \verb|DYNAMIC| flag on the device.

\item \verb|name NAME|

--- change the name of the device. This operation is not
recommended if the device is running or has some addresses
configured.

\item \verb|txqueuelen NUMBER| or \verb|txqlen NUMBER|

--- change the transmit queue length of the device.

\item \verb|mtu NUMBER|

--- change the MTU of the device.

\item \verb|address LLADDRESS|

--- change the station address of the interface.

\item \verb|broadcast LLADDRESS|, \verb|brd LLADDRESS| or \verb|peer LLADDRESS|

--- change the link layer broadcast address or the peer address when
the interface is \verb|POINTOPOINT|.

For most devices (e.g.\ for Ethernet) changing the link layer
broadcast address will break networking.
Do not use it if you do not understand what this operation really does.

\end{itemize}

The \verb|PROMISC| and \verb|ALLMULTI| flags are considered
obsolete and should not be changed administratively, though
the {\tt ip} utility will allow that.

\paragraph{Warning:} If multiple parameter changes are requested,
\verb|ip| aborts as soon as any one of the changes fails,
so the earlier changes stay applied and the later ones are never attempted.
This is the only case when \verb|ip| can move the system to
an unpredictable state. The solution is to avoid changing
several parameters with one {\tt ip link set} call.

\paragraph{Examples:}
\begin{itemize}
\item \verb|ip link set dummy address 00:00:00:00:00:01|

--- change the station address of the interface \verb|dummy|.

\item \verb|ip link set dummy up|

--- start the interface \verb|dummy|.

\end{itemize}

\subsection{{\tt ip link show} --- display device attributes}

\paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|lst|, \verb|sh|, \verb|ls|,
\verb|l|.

\paragraph{Arguments:}
\begin{itemize}
\item \verb|dev NAME| (default)

--- \verb|NAME| specifies the network device to show.
If this argument is omitted, all devices are listed.

\item \verb|up|

--- only display running interfaces.

\end{itemize}

\paragraph{Output format:}

\begin{verbatim}
kuznet@alisa:~ $ ip link ls eth0
3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100
    link/ether 00:a0:cc:66:18:78 brd ff:ff:ff:ff:ff:ff
kuznet@alisa:~ $ ip link ls sit0
5: sit0@NONE: <NOARP,UP> mtu 1480 qdisc noqueue
    link/sit 0.0.0.0 brd 0.0.0.0
kuznet@alisa:~ $ ip link ls dummy
2: dummy: <BROADCAST,NOARP> mtu 1500 qdisc noop
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
\end{verbatim}

The number before each colon is an {\em interface index\/} or {\em ifindex\/}.
This number uniquely identifies the interface. This is followed by the {\em interface name\/}
(\verb|eth0|, \verb|sit0| etc.). The interface name is also
unique at every given moment. However, the interface may disappear from the
list (e.g.\ when the corresponding driver module is unloaded) and another
one with the same name may be created later. Besides that,
the administrator may change the name of any device with
\verb|ip| \verb|link| \verb|set| \verb|name|
to make it more intelligible.

The interface name may have another name or \verb|NONE| appended
after the \verb|@| sign. This means that this device is bound to some other
device, i.e.\ packets sent through it are encapsulated and sent via the ``master''
device. If the name is \verb|NONE|, the master is unknown.

Then we see the interface {\em mtu\/} (``maximal transfer unit''). This determines
the maximal size of data which can be sent as a single packet over this interface.

{\em qdisc\/} (``queuing discipline'') shows the queuing algorithm used
on the interface. In particular, \verb|noqueue| means that this interface
does not queue anything and \verb|noop| means that the interface is in blackhole
mode, i.e.\ all packets sent to it are immediately discarded.
{\em qlen\/} is the default transmit queue length of the device measured
in packets.

The interface flags are summarized in the angle brackets.

\begin{itemize}
\item \verb|UP| --- the device is turned on. It is ready to accept
packets for transmission and it may inject into the kernel packets received
from other nodes on the network.

\item \verb|LOOPBACK| --- the interface does not communicate with other
hosts. All packets sent through it will be returned
and nothing but bounced packets can be received.

\item \verb|BROADCAST| --- the device has the facility to send packets
to all hosts sharing the same link. A typical example is an Ethernet link.

\item \verb|POINTOPOINT| --- the link has only two ends with one node
attached to each end. All packets sent to this link will reach the peer
and all packets received by us came from this single peer.

If neither \verb|LOOPBACK| nor \verb|BROADCAST| nor \verb|POINTOPOINT|
is set, the interface is assumed to be NBMA (Non-Broadcast Multi-Access).
This is the most generic type of device and the most complicated one, because
a host attached to an NBMA link has no means to send to anyone
without additionally configured information.

\item \verb|MULTICAST| --- is an advisory flag indicating that the interface
is aware of multicasting, i.e.\ sending packets to some subset of neighbouring
nodes. Broadcasting is a particular case of multicasting, where the multicast
group consists of all nodes on the link. It is important to emphasize
that software {\em must not\/} interpret the absence of this flag as the inability
to use multicasting on this interface. Any \verb|POINTOPOINT| and
\verb|BROADCAST| link is multicasting by definition, because we have
direct access to all the neighbours and, hence, to any part of them.
Certainly, the use of high bandwidth multicast transfers is not recommended
on broadcast-only links because of high expense, but it is not strictly
prohibited.

\item \verb|PROMISC| --- the device listens to and feeds to the kernel all
traffic on the link even if it is not destined for us, not broadcast
and not destined for a multicast group of which we are a member. Usually
this mode exists only on broadcast links and is used by bridges and for network
monitoring.

\item \verb|ALLMULTI| --- the device receives all multicast packets
wandering on the link. This mode is used by multicast routers.

\item \verb|NOARP| --- this flag is different from the other ones. It has
no invariant value and its interpretation depends on the network protocols
involved. As a rule, it indicates that the device needs no address
resolution and that the software or hardware knows how to deliver packets
without any help from the protocol stacks.

\item \verb|DYNAMIC| --- is an advisory flag indicating that the interface is
dynamically created and destroyed.

\item \verb|SLAVE| --- this interface is bonded to some other interfaces
to share link capacities.

\end{itemize}

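
The layout of the first output line and the rule for deducing the link type
from the flags can be made concrete with a short Python sketch. The regular
expression is written against the three sample lines shown above and is an
assumption about the format rather than a specification; the function names
are hypothetical.

```python
import re

# ifindex, name, optional "@master", flags in angle brackets, then
# "keyword value" pairs such as mtu, qdisc and qlen.
HEADER = re.compile(
    r"^(?P<ifindex>\d+):\s+(?P<name>[^:@\s]+)(?:@(?P<master>[^:\s]+))?:\s+"
    r"<(?P<flags>[^>]*)>\s+(?P<attrs>.*)$"
)


def parse_link_header(line):
    """Split the first line of an `ip link ls` record into its fields."""
    match = HEADER.match(line.strip())
    if match is None:
        raise ValueError("unrecognized link header: %r" % line)
    words = match.group("attrs").split()
    attrs = dict(zip(words[0::2], words[1::2]))
    return {
        "ifindex": int(match.group("ifindex")),
        "name": match.group("name"),
        "master": match.group("master"),  # None when no "@..." suffix
        "flags": {f for f in match.group("flags").split(",") if f},
        "mtu": int(attrs["mtu"]),
        "qdisc": attrs.get("qdisc"),
        "qlen": int(attrs["qlen"]) if "qlen" in attrs else None,
    }


def link_type(flags):
    """Apply the rule from the text: no type flag at all means NBMA."""
    for kind in ("LOOPBACK", "BROADCAST", "POINTOPOINT"):
        if kind in flags:
            return kind
    return "NBMA"


for sample in (
    "3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100",
    "5: sit0@NONE: <NOARP,UP> mtu 1480 qdisc noqueue",
):
    link = parse_link_header(sample)
    print(link["name"], link["master"], link_type(link["flags"]))
```

For the \verb|sit0| sample none of the three type flags is present, so the
sketch classifies the tunnel as NBMA, in line with the rule stated above.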
There are other flags but they are either obsolete (\verb|NOTRAILERS|)
or not implemented (\verb|DEBUG|) or specific to some devices
(\verb|MASTER|, \verb|AUTOMEDIA| and \verb|PORTSEL|). We do not discuss
them here.

The second line contains information on the link layer addresses
associated with the device. The first word (\verb|ether|, \verb|sit|)
defines the interface hardware type. This type determines the format and semantics
of the addresses and is logically part of the address.
The default format of the station address and the broadcast address
(or the peer address for pointopoint links) is a
sequence of hexadecimal bytes separated by colons, but some link
types may have their natural address format, e.g.\ addresses
of tunnels over IP are printed as dotted-quad IP addresses.

NBMA links have no well-defined broadcast or peer address;
however, this field may contain useful information, e.g.\
the address of a broadcast relay or the address of the ARP server.

Multicast addresses are not shown by this command, see
\verb|ip maddr ls| in~Sec.\ref{IP-MADDR} (p.\pageref{IP-MADDR} of this
document).

\paragraph{Statistics:} With the \verb|-statistics| option, \verb|ip| also
prints interface statistics:

\begin{verbatim}
kuznet@alisa:~ $ ip -s link ls eth0
3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100
    link/ether 00:a0:cc:66:18:78 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    2449949362 2786187  0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    178558497  1783945  332     0       332     35172
\end{verbatim}

\verb|RX:| and \verb|TX:| lines summarize receiver and transmitter
statistics. They contain:

\begin{itemize}
\item \verb|bytes| --- the total number of bytes received or transmitted
on the interface. This number wraps when the maximal length of the data type
natural for the architecture is exceeded, so continuous monitoring requires
a user level daemon sampling it periodically.
\item \verb|packets| --- the total number of packets received or transmitted
on the interface.
\item \verb|errors| --- the total number of receiver or transmitter errors.
\item \verb|dropped| --- the total number of packets dropped due to lack
of resources.
\item \verb|overrun| --- the total number of receiver overruns resulting
in dropped packets. As a rule, if the interface is overrun, it means
serious problems in the kernel or that your machine is too slow
for this interface.
\item \verb|mcast| --- the total number of received multicast packets. This counter
is only supported by a few devices.
\item \verb|carrier| --- total number of link media failures, e.g.\ because
of lost carrier.
\item \verb|collsns| --- the total number of collision events
on Ethernet-like media. This number may have a different sense on other
link types.
\item \verb|compressed| --- the total number of compressed packets. This is
available only for links using VJ header compression.
\end{itemize}

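
Because the \verb|bytes| counter wraps, a monitoring daemon has to compute
differences modulo the counter range. The Python sketch below shows one way to
do this; the 32-bit default width and the assumption that the counter wraps at
most once between two samples are choices made for illustration and must be
adapted to the architecture and the sampling interval.

```python
def counter_delta(previous, current, width_bits=32):
    """Difference between two samples of a counter that wraps.

    Assumptions: the counter is `width_bits` wide and has wrapped at most
    once between the two samples.
    """
    modulus = 1 << width_bits
    return (current - previous) % modulus


def byte_rate(previous, current, interval_seconds, width_bits=32):
    """Average bytes per second between two samples."""
    return counter_delta(previous, current, width_bits) / interval_seconds


# No wrap between the samples.
print(counter_delta(100, 250))
# The counter wrapped past 2**32 between the samples.
print(counter_delta(4294967000, 200))
```

Sampling more often than the shortest possible wrap period keeps the
at-most-one-wrap assumption valid.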
If the \verb|-s| option is entered twice or more,
\verb|ip| prints more detailed statistics on receiver
and transmitter errors.

\begin{verbatim}
kuznet@alisa:~ $ ip -s -s link ls eth0
3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100
    link/ether 00:a0:cc:66:18:78 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    2449949362 2786187  0       0       0       0
    RX errors: length   crc     frame   fifo    missed
    TX: bytes  packets  errors  dropped carrier collsns
    178558497  1783945  332     0       332     35172
    TX errors: aborted  fifo    window  heartbeat
\end{verbatim}

These error names are pure Ethernetisms. Other devices
may have non-zero values in these fields, but they may be
interpreted differently.

\section{{\tt ip address} --- protocol address management}

\paragraph{Abbreviations:} \verb|address|, \verb|addr|, \verb|a|.

\paragraph{Object:} The \verb|address| is a protocol (IP or IPv6) address attached
to a network device. Each device must have at least one address
to use the corresponding protocol. It is possible to have several
different addresses attached to one device. These addresses are not
discriminated, so that the term {\em alias\/} is not quite appropriate
for them and we do not use it in this document.

The \verb|ip addr| command displays addresses and their properties,
adds new addresses and deletes old ones.

\paragraph{Commands:} \verb|add|, \verb|delete|, \verb|flush| and \verb|show|

\subsection{{\tt ip address add} --- add a new protocol address}

\paragraph{Abbreviations:} \verb|add|, \verb|a|.

\paragraph{Arguments:}

\begin{itemize}
\item \verb|dev NAME|

\noindent--- the name of the device to add the address to.

\item \verb|local ADDRESS| (default)

--- the address of the interface. The format of the address depends
on the protocol. It is a dotted quad for IP and a sequence of hexadecimal halfwords
separated by colons for IPv6. The \verb|ADDRESS| may be followed by
a slash and a decimal number which encodes the network prefix length.

\item \verb|peer ADDRESS|

--- the address of the remote endpoint for pointopoint interfaces.
Again, the \verb|ADDRESS| may be followed by a slash and a decimal number,
encoding the network prefix length. If a peer address is specified,
the local address {\em cannot\/} have a prefix length. The network prefix is associated
with the peer rather than with the local address.

\item \verb|broadcast ADDRESS|

--- the broadcast address on the interface.

It is possible to use the special symbols \verb|'+'| and \verb|'-'|
instead of the broadcast address. In this case, the broadcast address
is derived by setting/resetting the host bits of the interface prefix.

Unlike \verb|ifconfig|, the \verb|ip| utility {\em does not\/} set any broadcast
address unless explicitly requested.

\item \verb|label NAME|

--- each address may be tagged with a label string.
In order to preserve compatibility with Linux-2.0 net aliases,
this string must coincide with the name of the device or must be prefixed
with the device name followed by a colon.

\item \verb|scope SCOPE_VALUE|

--- the scope of the area where this address is valid.
The available scopes are listed in the file \verb|/etc/iproute2/rt_scopes|.
Predefined scope values are:

 \begin{itemize}
	\item \verb|global| --- the address is globally valid.
	\item \verb|site| --- (IPv6 only) the address is site local,
	i.e.\ it is valid inside this site.
	\item \verb|link| --- the address is link local, i.e.\
	it is valid only on this device.
	\item \verb|host| --- the address is valid only inside this host.
 \end{itemize}

Appendix~\ref{ADDR-SEL} (p.\pageref{ADDR-SEL} of this document)
contains more details on address scopes.

\end{itemize}

\paragraph{Examples:}
\begin{itemize}
\item \verb|ip addr add 127.0.0.1/8 dev lo brd + scope host|

--- add the usual loopback address to the loopback device.

\item \verb|ip addr add 10.0.0.1/24 brd + dev eth0 label eth0:Alias|

--- add the address 10.0.0.1 with prefix length 24 (i.e.\ netmask
\verb|255.255.255.0|), standard broadcast and label \verb|eth0:Alias|
to the interface \verb|eth0|.
\end{itemize}

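
What ``setting/resetting the host bits'' means for the \verb|+| and \verb|-|
symbols of the \verb|broadcast| argument can be checked with Python's standard
\verb|ipaddress| module. The function name \verb|derive_broadcast| is
hypothetical, and the sketch covers IPv4 only; it models the arithmetic, not
the code of \verb|ip| itself.

```python
import ipaddress


def derive_broadcast(address_with_prefix, symbol):
    """Return the broadcast address implied by `brd +` or `brd -`."""
    interface = ipaddress.ip_interface(address_with_prefix)
    if symbol == "+":
        # All host bits set.
        return str(interface.network.broadcast_address)
    if symbol == "-":
        # All host bits reset.
        return str(interface.network.network_address)
    raise ValueError("expected '+' or '-', got %r" % symbol)


print(derive_broadcast("10.0.0.1/24", "+"))
print(derive_broadcast("10.0.0.1/24", "-"))
print(derive_broadcast("127.0.0.1/8", "+"))
```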
\subsection{{\tt ip address delete} --- delete a protocol address}

\paragraph{Abbreviations:} \verb|delete|, \verb|del|, \verb|d|.

\paragraph{Arguments:} coincide with the arguments of \verb|ip addr add|.
The device name is a required argument. The rest are optional.
If no other arguments are given, the first address is deleted.

\paragraph{Examples:}
\begin{itemize}
\item \verb|ip addr del 127.0.0.1/8 dev lo|

--- deletes the loopback address from the loopback device.
It would be best not to repeat this experiment.

\item Disable IP on the interface \verb|eth0|:
\begin{verbatim}
  while ip -f inet addr del dev eth0; do
    : nothing
  done
\end{verbatim}
Another method to disable IP on an interface using {\tt ip addr flush}
may be found in sec.\ref{IP-ADDR-FLUSH}, p.\pageref{IP-ADDR-FLUSH}.

\end{itemize}

\subsection{{\tt ip address show} --- display protocol addresses}

\paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|lst|, \verb|sh|, \verb|ls|,
\verb|l|.

\paragraph{Arguments:}

\begin{itemize}
\item \verb|dev NAME| (default)

--- the name of the device.

\item \verb|scope SCOPE_VAL|

--- only list addresses with this scope.

\item \verb|to PREFIX|

--- only list addresses matching this prefix.

\item \verb|label PATTERN|

--- only list addresses with labels matching the \verb|PATTERN|.
\verb|PATTERN| is a usual shell style pattern.

\item \verb|dynamic| and \verb|permanent|

--- (IPv6 only) only list addresses installed due to stateless
address configuration or only list permanent (not dynamic) addresses.

\item \verb|tentative|

--- (IPv6 only) only list addresses which did not pass duplicate
address detection.

\item \verb|deprecated|

--- (IPv6 only) only list deprecated addresses.

\item \verb|primary| and \verb|secondary|

--- only list primary (or secondary) addresses.

\end{itemize}

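
The \verb|to PREFIX| and \verb|label PATTERN| selectors can be modelled in a few
lines of Python. The sketch assumes that ``matching this prefix'' means that the
address lies inside the prefix and that shell style patterns behave like those
of the standard \verb|fnmatch| module; both are assumptions, and the function
name is hypothetical. Note that \verb|ipaddress| requires the full form
\verb|10.0.0.0/8| rather than the abbreviated \verb|10/8| accepted by \verb|ip|.

```python
import fnmatch
import ipaddress

# (address/prefix length, label) pairs taken from the sample outputs
# in this document.
RECORDS = [
    ("10.7.7.7/16", "dummy"),
    ("10.10.7.7/16", "eth0"),
    ("10.8.7.7/16", "eth1"),
    ("193.233.7.90/24", "eth0"),
]


def selected(address, label, to=None, label_pattern=None):
    """Return True if the address passes every selector that is given."""
    if to is not None:
        host = ipaddress.ip_interface(address).ip
        if host not in ipaddress.ip_network(to):
            return False
    if label_pattern is not None:
        if not fnmatch.fnmatchcase(label, label_pattern):
            return False
    return True


print([a for a, l in RECORDS if selected(a, l, to="10.0.0.0/8")])
print([a for a, l in RECORDS if selected(a, l, label_pattern="eth*")])
```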
\paragraph{Output format:}

\begin{verbatim}
kuznet@alisa:~ $ ip addr ls eth0
3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100
    link/ether 00:a0:cc:66:18:78 brd ff:ff:ff:ff:ff:ff
    inet 193.233.7.90/24 brd 193.233.7.255 scope global eth0
    inet6 3ffe:2400:0:1:2a0:ccff:fe66:1878/64 scope global dynamic
       valid_lft forever preferred_lft 604746sec
    inet6 fe80::2a0:ccff:fe66:1878/10 scope link
\end{verbatim}

The first two lines coincide with the output of \verb|ip link ls|.
It is natural to interpret link layer addresses
as addresses of the protocol family \verb|AF_PACKET|.

Then the list of IP and IPv6 addresses follows, accompanied by
additional address attributes: scope value (see Sec.\ref{IP-ADDR-ADD},
p.\pageref{IP-ADDR-ADD} above), flags and the address label.

Address flags are set by the kernel and cannot be changed
administratively. Currently, the following flags are defined:

\begin{itemize}
\item \verb|secondary|

--- the address is not used when selecting the default source address
of outgoing packets (cf.\ Appendix~\ref{ADDR-SEL}, p.\pageref{ADDR-SEL}).
An IP address becomes secondary if another address with the same
prefix bits already exists. The first address is primary.
It is the leader of the group of all secondary addresses. When the leader
is deleted, all secondaries are purged too.
There is a tweak in \verb|/proc/sys/net/ipv4/conf/<dev>/promote_secondaries|
which activates promotion of secondaries when a primary is deleted.
To permanently enable this feature on all devices, add
\verb|net.ipv4.conf.all.promote_secondaries=1| to \verb|/etc/sysctl.conf|.
This tweak is available in Linux 2.6.15 and later.

\item \verb|dynamic|

--- the address was created due to stateless autoconfiguration~\cite{RFC-ADDRCONF}.
In this case the output also contains information on the times during which
the address is still valid. After \verb|preferred_lft| expires the address is
moved to the deprecated state. After \verb|valid_lft| expires the address
is finally invalidated.

\item \verb|deprecated|

--- the address is deprecated, i.e.\ it is still valid, but cannot
be used by newly created connections.

\item \verb|tentative|

--- the address is not used because duplicate address detection~\cite{RFC-ADDRCONF}
is still not complete or failed.

\end{itemize}

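
The primary/secondary rule and the effect of deleting a primary address can be
modelled in Python as follows. This is a simplified model of the behaviour
described above, not kernel code: ``same prefix bits'' is taken to mean the
same network and prefix length, the example addresses are hypothetical, and so
are the function names.

```python
import ipaddress


def classify(addresses):
    """Label each address primary or secondary, in configuration order."""
    seen = set()
    roles = []
    for text in addresses:
        network = ipaddress.ip_interface(text).network
        roles.append((text, "secondary" if network in seen else "primary"))
        seen.add(network)
    return roles


def delete_address(addresses, target, promote_secondaries=False):
    """Return the addresses left after deleting `target`."""
    roles = dict(classify(addresses))
    if roles[target] == "secondary" or promote_secondaries:
        # Only the target goes away; with promotion enabled the next
        # address of the same network becomes the new primary.
        return [a for a in addresses if a != target]
    # Deleting a primary purges its secondaries as well.
    network = ipaddress.ip_interface(target).network
    return [a for a in addresses
            if ipaddress.ip_interface(a).network != network]


configured = ["10.0.0.1/24", "10.0.0.2/24", "10.0.1.1/24"]
print(classify(configured))
print(delete_address(configured, "10.0.0.1/24"))
print(delete_address(configured, "10.0.0.1/24", promote_secondaries=True))
```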
\subsection{{\tt ip address flush} --- flush protocol addresses}
\label{IP-ADDR-FLUSH}

\paragraph{Abbreviations:} \verb|flush|, \verb|f|.

\paragraph{Description:} This command flushes the protocol addresses
selected by some criteria.

\paragraph{Arguments:} This command has the same arguments as \verb|show|.
The difference is that it does not run when no arguments are given.

\paragraph{Warning:} This command (and other \verb|flush| commands
described below) is pretty dangerous. If you make a mistake, it will
not forgive it, but will cruelly purge all the addresses.

\paragraph{Statistics:} With the \verb|-statistics| option, the command
becomes verbose. It prints out the number of deleted addresses and the number
of rounds made to flush the address list. If this option is given
twice, \verb|ip addr flush| also dumps all the deleted addresses
in the format described in the previous subsection.

\paragraph{Example:} Delete all the addresses from the private network
10.0.0.0/8:
\begin{verbatim}
netadm@amber:~ # ip -s -s a f to 10/8
2: dummy    inet 10.7.7.7/16 brd 10.7.255.255 scope global dummy
3: eth0    inet 10.10.7.7/16 brd 10.10.255.255 scope global eth0
4: eth1    inet 10.8.7.7/16 brd 10.8.255.255 scope global eth1

*** Round 1, deleting 3 addresses ***
*** Flush is complete after 1 round ***
\end{verbatim}
Another instructive example is disabling IP on all the Ethernets:
\begin{verbatim}
netadm@amber:~ # ip -4 addr flush label "eth*"
\end{verbatim}
And the last example shows how to flush all the IPv6 addresses
acquired by the host from stateless address autoconfiguration
after you enabled forwarding or disabled autoconfiguration.
\begin{verbatim}
netadm@amber:~ # ip -6 addr flush dynamic
\end{verbatim}

\section{{\tt ip neighbour} --- neighbour/arp tables management}

\paragraph{Abbreviations:} \verb|neighbour|, \verb|neighbor|, \verb|neigh|,
\verb|n|.

\paragraph{Object:} \verb|neighbour| objects establish bindings between protocol
addresses and link layer addresses for hosts sharing the same link.
Neighbour entries are organized into tables. The IPv4 neighbour table
is known by another name --- the ARP table.

The corresponding commands display neighbour bindings
and their properties, add new neighbour entries and delete old ones.

\paragraph{Commands:} \verb|add|, \verb|change|, \verb|replace|,
\verb|delete|, \verb|flush| and \verb|show| (or \verb|list|).

\paragraph{See also:} Appendix~\ref{PROXY-NEIGH}, p.\pageref{PROXY-NEIGH}
describes how to manage proxy ARP/NDISC with the \verb|ip| utility.

\subsection{{\tt ip neighbour add} --- add a new neighbour entry\\
{\tt ip neighbour change} --- change an existing entry\\
{\tt ip neighbour replace} --- add a new entry or change an existing one}

\paragraph{Abbreviations:} \verb|add|, \verb|a|; \verb|change|, \verb|chg|;
\verb|replace|, \verb|repl|.

\paragraph{Description:} These commands create new neighbour records
or update existing ones.

\paragraph{Arguments:}

\begin{itemize}
\item \verb|to ADDRESS| (default)

--- the protocol address of the neighbour. It is either an IPv4 or IPv6 address.

\item \verb|dev NAME|

--- the interface to which this neighbour is attached.

\item \verb|lladdr LLADDRESS|

--- the link layer address of the neighbour. \verb|LLADDRESS| can also be
\verb|null|.

\item \verb|nud NUD_STATE|

--- the state of the neighbour entry. \verb|nud| is an abbreviation for ``Neighbour
Unreachability Detection''. The state can take one of the following values:

\begin{itemize}
\item \verb|permanent| --- the neighbour entry is valid forever and can only be removed
administratively.
\item \verb|noarp| --- the neighbour entry is valid. No attempts to validate
this entry will be made but it can be removed when its lifetime expires.
\item \verb|reachable| --- the neighbour entry is valid until the reachability
timeout expires.
\item \verb|stale| --- the neighbour entry is valid but suspicious.
This option to \verb|ip neigh| does not change the neighbour state if
it was valid and the address is not changed by this command.
\end{itemize}

\end{itemize}

\paragraph{Examples:}
\begin{itemize}
\item \verb|ip neigh add 10.0.0.3 lladdr 0:0:0:0:0:1 dev eth0 nud perm|

--- add a permanent ARP entry for the neighbour 10.0.0.3 on the device \verb|eth0|.

\item \verb|ip neigh chg 10.0.0.3 dev eth0 nud reachable|

--- change its state to \verb|reachable|.
\end{itemize}

\subsection{{\tt ip neighbour delete} --- delete a neighbour entry}

\paragraph{Abbreviations:} \verb|delete|, \verb|del|, \verb|d|.

\paragraph{Description:} This command invalidates a neighbour entry.

\paragraph{Arguments:} The arguments are the same as with \verb|ip neigh add|,
except that \verb|lladdr| and \verb|nud| are ignored.

\paragraph{Example:}
\begin{itemize}
\item \verb|ip neigh del 10.0.0.3 dev eth0|

--- invalidate an ARP entry for the neighbour 10.0.0.3 on the device \verb|eth0|.

\end{itemize}

The deleted neighbour entry will not disappear from the tables
immediately. If it is in use it cannot be deleted until the last
client releases it. Otherwise it will be destroyed during
the next garbage collection.

\paragraph{Warning:} Attempts to delete or manually change
a \verb|noarp| entry created by the kernel may result in unpredictable behaviour.
In particular, the kernel may try to resolve this address even
on a \verb|NOARP| interface or if the address is multicast or broadcast.

\subsection{{\tt ip neighbour show} --- list neighbour entries}

\paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|.

\paragraph{Description:} This command displays neighbour tables.

\paragraph{Arguments:}

\item \verb|to ADDRESS| (default)

--- the prefix selecting the neighbours to list.

\item \verb|dev NAME|

--- only list the neighbours attached to this device.

--- only list neighbours which are not currently in use.

\item \verb|nud NUD_STATE|

--- only list neighbour entries in this state. \verb|NUD_STATE| takes
values listed below or the special value \verb|all| which means all states.
This option may occur more than once. If this option is absent, \verb|ip|
lists all entries except for \verb|none| and \verb|noarp|.
\paragraph{Output format:}

kuznet@alisa:~ $ ip neigh ls
:: dev lo lladdr 00:00:00:00:00:00 nud noarp
fe80::200:cff:fe76:3f85 dev eth0 lladdr 00:00:0c:76:3f:85 router \
0.0.0.0 dev lo lladdr 00:00:00:00:00:00 nud noarp
193.233.7.254 dev eth0 lladdr 00:00:0c:76:3f:85 nud reachable
193.233.7.85 dev eth0 lladdr 00:e0:1e:63:39:00 nud stale
The first word of each line is the protocol address of the neighbour.
Then the device name follows. The rest of the line describes the contents of
the neighbour entry identified by the pair (device, address).

\verb|lladdr| is the link layer address of the neighbour.

\verb|nud| is the state of the ``neighbour unreachability detection'' machine
for this entry. The detailed description of the neighbour
state machine can be found in~\cite{RFC-NDISC}. Here is the full list
of the states with short descriptions:

\item\verb|none| --- the state of the neighbour is void.
\item\verb|incomplete| --- the neighbour is in the process of resolution.
\item\verb|reachable| --- the neighbour is valid and apparently reachable.
\item\verb|stale| --- the neighbour is valid, but is probably already
unreachable, so the kernel will try to check it at the first transmission.
\item\verb|delay| --- a packet has been sent to the stale neighbour and the kernel is waiting
for confirmation.
\item\verb|probe| --- the delay timer expired but no confirmation was received.
The kernel has started to probe the neighbour with ARP/NDISC messages.
\item\verb|failed| --- resolution has failed.
\item\verb|noarp| --- the neighbour is valid. No attempts to check the entry
will be made.
\item\verb|permanent| --- it is a \verb|noarp| entry, but only the administrator
may remove the entry from the neighbour table.

The link layer address is valid in all states except for \verb|none|,
\verb|failed| and \verb|incomplete|.

IPv6 neighbours can be marked with the additional flag \verb|router|
which means that the neighbour introduced itself as an IPv6 router~\cite{RFC-NDISC}.
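
The state list and the two rules given in this subsection (in which states the
link layer address is valid, and which entries \verb|ip neigh show| lists when
no \verb|nud| argument is given) can be written down in a few lines of Python.
This is only an illustrative sketch: the names \verb|Nud|, \verb|lladdr_valid|,
\verb|shown_by_default| and \verb|parse_neigh_line| are hypothetical and are
not part of \verb|iproute2|, and the parser assumes the output format shown
in this section.

\begin{verbatim}
from enum import Enum


class Nud(Enum):
    """Neighbour states as they are spelled in this document."""
    NONE = "none"
    INCOMPLETE = "incomplete"
    REACHABLE = "reachable"
    STALE = "stale"
    DELAY = "delay"
    PROBE = "probe"
    FAILED = "failed"
    NOARP = "noarp"
    PERMANENT = "permanent"


def lladdr_valid(state):
    """The link layer address is valid except in none/failed/incomplete."""
    return state not in {Nud.NONE, Nud.FAILED, Nud.INCOMPLETE}


def shown_by_default(state):
    """Without a nud argument, everything except none and noarp is listed."""
    return state not in {Nud.NONE, Nud.NOARP}


def parse_neigh_line(line):
    """Split one output line into address, device, lladdr, state, router flag."""
    words = line.split()
    entry = {"address": words[0], "router": False}
    i = 1
    while i < len(words):
        if words[i] == "router":
            entry["router"] = True
            i += 1
        elif words[i] in ("dev", "lladdr", "nud") and i + 1 < len(words):
            entry[words[i]] = words[i + 1]
            i += 2
        else:
            i += 1          # skip anything this sketch does not know about
    if "nud" in entry:
        entry["nud"] = Nud(entry["nud"])
    return entry
\end{verbatim}

Feeding it the line for 193.233.7.254 from the listing above gives an entry
whose state is \verb|Nud.REACHABLE| and for which \verb|lladdr_valid| is true.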
\paragraph{Statistics:} The \verb|-statistics| option displays some usage
statistics, for example:

kuznet@alisa:~ $ ip -s n ls 193.233.7.254
193.233.7.254 dev eth0 lladdr 00:00:0c:76:3f:85 ref 5 used 12/13/20 \

Here \verb|ref| is the number of users of this entry
and \verb|used| is a triplet of time intervals in seconds
separated by slashes. In this case they show that:

\item the entry was used 12 seconds ago.
\item the entry was confirmed 13 seconds ago.
\item the entry was updated 20 seconds ago.
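
A script that post-processes this output could pull the counters out with a
regular expression. The sketch below is an assumption about how one might do
it, not something shipped with \verb|ip|; the function name
\verb|parse_neigh_stats| and the dictionary keys are made up for the example.

\begin{verbatim}
import re

# "ref N used A/B/C" as printed by ip -s neigh in the example above.
STATS_RE = re.compile(r"\bref (\d+) used (\d+)/(\d+)/(\d+)")


def parse_neigh_stats(line):
    """Return ref count and the used/confirmed/updated ages in seconds."""
    m = STATS_RE.search(line)
    if m is None:
        return None         # line carries no statistics
    ref, used, confirmed, updated = (int(x) for x in m.groups())
    return {"ref": ref, "used": used,
            "confirmed": confirmed, "updated": updated}
\end{verbatim}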
\subsection{{\tt ip neighbour flush} --- flush neighbour entries}

\paragraph{Abbreviations:} \verb|flush|, \verb|f|.

\paragraph{Description:} This command flushes neighbour tables, selecting
entries to flush by some criteria.

\paragraph{Arguments:} This command has the same arguments as \verb|show|.
The differences are that it does not run when no arguments are given,
and that the default neighbour states to be flushed do not include
\verb|permanent| and \verb|noarp|.

\paragraph{Statistics:} With the \verb|-statistics| option, the command
becomes verbose. It prints out the number of deleted neighbours and the number
of rounds made to flush the neighbour table. If the option is given
twice, \verb|ip neigh flush| also dumps all the deleted neighbours
in the format described in the previous subsection.

\paragraph{Example:}

netadm@alisa:~ # ip -s -s n f 193.233.7.254
193.233.7.254 dev eth0 lladdr 00:00:0c:76:3f:85 ref 5 used 12/13/20 \

*** Round 1, deleting 1 entries ***
*** Flush is complete after 1 round ***
\section{{\tt ip route} --- routing table management}

\paragraph{Abbreviations:} \verb|route|, \verb|ro|, \verb|r|.

\paragraph{Object:} \verb|route| entries in the kernel routing tables keep
information about paths to other networked nodes.

Each route entry has a {\em key\/} consisting of a {\em prefix\/}
(i.e.\ a pair containing a network address and the length of its mask) and,
optionally, the TOS value. An IP packet matches the route if the highest
bits of its destination address are equal to the route prefix at least
up to the prefix length and if the TOS of the route is zero or equal to
the TOS of the packet.

If several routes match the packet, the following pruning rules
are used to select the best one (see~\cite{RFC1812}):

\item The longest matching prefix is selected. All shorter ones
are dropped.

\item If the TOS of some route with the longest prefix is equal to the TOS
of the packet, the routes with different TOS are dropped.

If no exact TOS match was found and routes with TOS=0 exist,
the rest of the routes are pruned.

Otherwise, the route lookup fails.

\item If several routes remain after the previous steps, then
the routes with the best preference values are selected.

\item If we still have several routes, then the {\em first\/} of them
is selected.

Note the ambiguity of the last step. Unfortunately, Linux
historically allows such a bizarre situation. The sense of the
word ``first'' depends on the order of route additions and it is practically
impossible to maintain a bundle of such routes in this order.
For simplicity we will limit ourselves to the case where such a situation
is impossible and routes are uniquely identified by the triplet
\{prefix, tos, preference\}. Actually, it is impossible to create
non-unique routes with the \verb|ip| commands described in this section.

One useful exception to this rule is the default route on non-forwarding
hosts. It is ``officially'' allowed to have several fallback routes
when several routers are present on directly connected networks.
In this case, Linux-2.2 performs ``dead gateway detection''~\cite{RFC1122}
controlled by neighbour unreachability detection and by advice
from transport protocols to select a working router, so the order
of the routes is not essential. However, in this case,
fiddling with default routes manually is not recommended. Use the Router Discovery
protocol (see Appendix~\ref{EXAMPLE-SETUP}, p.~\pageref{EXAMPLE-SETUP})
instead. Actually, Linux-2.2 IPv6 does not give user level applications
any access to default routes.

Certainly, the steps above are not performed exactly
in this sequence. Instead, the routing table in the kernel is kept
in some data structure to achieve the final result
with minimal cost. However, independently of the particular
routing algorithm implemented in the kernel, we can summarize
the statements above as: a route is identified by the triplet
\{prefix, tos, preference\}. This {\em key\/} lets us locate
the route in the routing table.
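
The pruning rules listed earlier in this section can be followed literally in
a short Python function. This is a naive sketch for illustration only, not
the kernel algorithm. The record type \verb|Route| and the function
\verb|select_route| are hypothetical, and the sketch assumes that a
numerically lower preference is the ``best'' one.

\begin{verbatim}
import ipaddress
from collections import namedtuple

# Hypothetical record: prefix as text, TOS, preference, opaque nexthop.
Route = namedtuple("Route", "prefix tos preference nexthop")


def select_route(routes, dst, tos=0):
    """Return the route chosen for (dst, tos), or None if lookup fails."""
    dst = ipaddress.ip_address(dst)
    # A route matches if dst lies inside its prefix and its TOS is
    # zero or equal to the TOS of the packet.
    matching = [(ipaddress.ip_network(r.prefix), r)
                for r in routes if r.tos in (0, tos)]
    matching = [(net, r) for net, r in matching if dst in net]
    if not matching:
        return None
    # Step 1: keep the longest matching prefix, drop all shorter ones.
    longest = max(net.prefixlen for net, _ in matching)
    cand = [r for net, r in matching if net.prefixlen == longest]
    # Step 2: an exact TOS match wins; otherwise only TOS=0 routes
    # are left, because of the matching rule above.
    exact = [r for r in cand if r.tos == tos]
    if exact:
        cand = exact
    # Step 3: keep the best preference (assumption: lower is better).
    best = min(r.preference for r in cand)
    cand = [r for r in cand if r.preference == best]
    # Step 4: several routes may still remain; take the first one.
    return cand[0]
\end{verbatim}

With routes for \verb|0.0.0.0/0|, \verb|10.0.0.0/8| and two
\verb|10.0.0.0/24| entries of different preference, a lookup of
\verb|10.0.0.3| returns the \verb|/24| route with the lower preference value.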
\paragraph{Route attributes:} Each route key refers to a routing
information record containing
the data required to deliver IP packets (e.g.\ output device and
next hop router) and some optional attributes (e.g.\ the path MTU or
the preferred source address when communicating with this destination).
These attributes are described in the following subsection.

\paragraph{Route types:} \label{IP-ROUTE-TYPES}
It is important that the set
of required and optional attributes depends on the route {\em type\/}.
The most important route type
is \verb|unicast|. It describes real paths to other hosts.
As a rule, common routing tables contain only such routes. However,
there are other types of routes with different semantics. The
full list of types understood by Linux-2.2 is:

\item \verb|unicast| --- the route entry describes real paths to the
destinations covered by the route prefix.
\item \verb|unreachable| --- these destinations are unreachable. Packets
are discarded and the ICMP message {\em host unreachable\/} is generated.
The local senders get an \verb|EHOSTUNREACH| error.
\item \verb|blackhole| --- these destinations are unreachable. Packets
are discarded silently. The local senders get an \verb|EINVAL| error.
\item \verb|prohibit| --- these destinations are unreachable. Packets
are discarded and the ICMP message {\em communication administratively
prohibited\/} is generated. The local senders get an \verb|EACCES| error.
\item \verb|local| --- the destinations are assigned to this
host. The packets are looped back and delivered locally.
\item \verb|broadcast| --- the destinations are broadcast addresses.
The packets are sent as link broadcasts.
\item \verb|throw| --- a special control route used together with policy
rules (see sec.~\ref{IP-RULE}, p.~\pageref{IP-RULE}). If such a route is selected, lookup
in this table is terminated pretending that no route was found.
Without policy routing it is equivalent to the absence of the route in the routing
table. The packets are dropped and the ICMP message {\em net unreachable\/}
is generated. The local senders get an \verb|ENETUNREACH| error.
\item \verb|nat| --- a special NAT route. Destinations covered by the prefix
are considered to be dummy (or external) addresses which require translation
to real (or internal) ones before forwarding. The addresses to translate to
are selected with the attribute \verb|via|. More about NAT is
in Appendix~\ref{ROUTE-NAT}, p.~\pageref{ROUTE-NAT}.
\item \verb|anycast| --- ({\em not implemented\/}) the destinations are
{\em anycast\/} addresses assigned to this host. They are mainly equivalent
to \verb|local| with one difference: such addresses are invalid when used
as the source address of any packet.
\item \verb|multicast| --- a special type used for multicast routing.
It is not present in normal routing tables.
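
For the route types that reject traffic, the list above pairs an ICMP message
(seen by remote senders) with an error code (seen by local senders). The
pairs can be kept in a small lookup table, as in the Python sketch below.
The table name \verb|REJECT_TYPES| and the two helper functions are
hypothetical; the entry for \verb|throw| describes the case without policy
routing.

\begin{verbatim}
# route type -> (ICMP message for remote senders, errno for local senders)
REJECT_TYPES = {
    "unreachable": ("host unreachable", "EHOSTUNREACH"),
    "blackhole": (None, "EINVAL"),      # discarded silently, no ICMP
    "prohibit": ("communication administratively prohibited", "EACCES"),
    "throw": ("net unreachable", "ENETUNREACH"),
}


def local_error(route_type):
    """errno name a local sender gets, or None for non-rejecting types."""
    entry = REJECT_TYPES.get(route_type)
    return entry[1] if entry else None


def icmp_message(route_type):
    """ICMP message generated for this type, or None if there is none."""
    entry = REJECT_TYPES.get(route_type)
    return entry[0] if entry else None
\end{verbatim}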
\paragraph{Route tables:} Linux-2.2 can pack routes into several routing
tables identified by a number in the range from 1 to 255 or by
name from the file \verb|/etc/iproute2/rt_tables|. By default all normal
routes are inserted into the \verb|main| table (ID 254) and the kernel only uses
this table when calculating routes.

Actually, one other table always exists, which is invisible but
even more important. It is the \verb|local| table (ID 255). This table
consists of routes for local and broadcast addresses. The kernel maintains
this table automatically and the administrator usually need not modify it.

The multiple routing tables come into play when {\em policy routing\/}
is used. See sec.~\ref{IP-RULE}, p.~\pageref{IP-RULE}.
In this case, the table identifier effectively becomes
one more parameter, which should be added to the triplet
\{prefix, tos, preference\} to uniquely identify the route.
\subsection{{\tt ip route add} --- add a new route\\
{\tt ip route change} --- change a route\\
{\tt ip route replace} --- change a route or add a new one}
\label{IP-ROUTE-ADD}

\paragraph{Abbreviations:} \verb|add|, \verb|a|; \verb|change|, \verb|chg|;
\verb|replace|, \verb|repl|.

\paragraph{Arguments:}

\item \verb|to PREFIX| or \verb|to TYPE PREFIX| (default)

--- the destination prefix of the route. If \verb|TYPE| is omitted,
\verb|ip| assumes type \verb|unicast|. Other values of \verb|TYPE|
are listed above. \verb|PREFIX| is an IP or IPv6 address optionally followed
by a slash and the prefix length. If the length of the prefix is missing,
\verb|ip| assumes a full-length host route. There is also a special
\verb|PREFIX| --- \verb|default| --- which is equivalent to IP \verb|0/0| or
to IPv6 \verb|::/0|.

\item \verb|tos TOS| or \verb|dsfield TOS|

--- the Type Of Service (TOS) key. This key has no associated mask and
the longest match is understood as follows: first, the TOS
of the route and of the packet are compared. If they are not equal, then the packet
may still match a route with a zero TOS. \verb|TOS| is either an 8-bit hexadecimal
number or an identifier from {\tt /etc/iproute2/rt\_dsfield}.

\item \verb|metric NUMBER| or \verb|preference NUMBER|

--- the preference value of the route. \verb|NUMBER| is an arbitrary 32-bit number.

\item \verb|table TABLEID|

--- the table to add this route to.
\verb|TABLEID| may be a number or a string from the file
\verb|/etc/iproute2/rt_tables|. If this parameter is omitted,
\verb|ip| assumes the \verb|main| table, with the exception of
\verb|local|, \verb|broadcast| and \verb|nat| routes, which are
put into the \verb|local| table by default.

\item \verb|dev NAME|

--- the output device name.

\item \verb|via ADDRESS|

--- the address of the nexthop router. Actually, the sense of this field depends
on the route type. For normal \verb|unicast| routes it is either the true nexthop
router or, if it is a direct route installed in BSD compatibility mode,
it can be a local address of the interface.
For NAT routes it is the first address of the block of translated IP destinations.

\item \verb|src ADDRESS|

--- the source address to prefer when sending to the destinations
covered by the route prefix.

\item \verb|realm REALMID|

--- the realm to which this route is assigned.
\verb|REALMID| may be a number or a string from the file
\verb|/etc/iproute2/rt_realms|. Sec.~\ref{RT-REALMS} (p.~\pageref{RT-REALMS})
contains more information on realms.

\item \verb|mtu MTU| or \verb|mtu lock MTU|

--- the MTU along the path to the destination. If the modifier \verb|lock| is
not used, the MTU may be updated by the kernel due to Path MTU Discovery.
If the modifier \verb|lock| is used, no path MTU discovery will be tried:
all packets will be sent without the DF bit in the IPv4 case
or fragmented to the MTU for IPv6.

\item \verb|window NUMBER|

--- the maximal window for TCP to advertise to these destinations,
measured in bytes. It limits maximal data bursts that our TCP
peers are allowed to send to us.

\item \verb|rtt NUMBER|

--- the initial RTT (``Round Trip Time'') estimate.

\item \verb|rttvar NUMBER|

--- \threeonly the initial RTT variance estimate.

\item \verb|ssthresh NUMBER|

--- \threeonly an estimate for the initial slow start threshold.

\item \verb|cwnd NUMBER|

--- \threeonly the clamp for the congestion window. It is ignored if the \verb|lock|

\item \verb|advmss NUMBER|

--- \threeonly the MSS (``Maximal Segment Size'') to advertise to these
destinations when establishing TCP connections. If it is not given,
Linux uses a default value calculated from the first hop device MTU.

If the path to these destinations is asymmetric, this guess may be wrong.

\item \verb|reordering NUMBER|

--- \threeonly Maximal reordering on the path to this destination.
If it is not given, Linux uses the value selected with the \verb|sysctl|
variable \verb|net/ipv4/tcp_reordering|.

\item \verb|nexthop NEXTHOP|

--- the nexthop of a multipath route. \verb|NEXTHOP| is a complex value
with its own syntax similar to the top level argument lists:

\item \verb|via ADDRESS| is the nexthop router.
\item \verb|dev NAME| is the output device.
\item \verb|weight NUMBER| is a weight for this element of a multipath
route reflecting its relative bandwidth or quality.

\item \verb|scope SCOPE_VAL|

--- the scope of the destinations covered by the route prefix.
\verb|SCOPE_VAL| may be a number or a string from the file
\verb|/etc/iproute2/rt_scopes|.
If this parameter is omitted,
\verb|ip| assumes scope \verb|global| for all gatewayed \verb|unicast|
routes, scope \verb|link| for direct \verb|unicast| and \verb|broadcast| routes
and scope \verb|host| for \verb|local| routes.

\item \verb|protocol RTPROTO|

--- the routing protocol identifier of this route.
\verb|RTPROTO| may be a number or a string from the file
\verb|/etc/iproute2/rt_protos|. If the routing protocol ID is
not given, \verb|ip| assumes protocol \verb|boot| (i.e.\
it assumes the route was added by someone who doesn't
understand what they are doing). Several protocol values have a fixed interpretation.

\item \verb|redirect| --- the route was installed due to an ICMP redirect.
\item \verb|kernel| --- the route was installed by the kernel during
\item \verb|boot| --- the route was installed during the bootup sequence.
If a routing daemon starts, it will purge all of them.
\item \verb|static| --- the route was installed by the administrator
to override dynamic routing. A routing daemon will respect them
and, probably, even advertise them to its peers.
\item \verb|ra| --- the route was installed by the Router Discovery protocol.

The rest of the values are not reserved and the administrator is free
to assign (or not to assign) protocol tags. At least, routing
daemons should take care of setting some unique protocol values,
e.g.\ as they are assigned in \verb|rtnetlink.h| or in \verb|rt_protos|

--- pretend that the nexthop is directly attached to this link,
even if it does not match any interface prefix. One application of this
option may be found in~\cite{IP-TUNNELS}.

\item \verb|equalize|

--- allow packet by packet randomization on multipath routes.
Without this modifier, the route will be frozen to one selected
nexthop, so that load splitting will only occur on a per-flow basis.
\verb|equalize| only works if the kernel is patched.

Actually there are more commands: \verb|prepend| does the same
thing as classic \verb|route add|, i.e.\ adds a route, even if another
route to the same destination exists. Its opposite case is \verb|append|,
which adds the route to the end of the list. Avoid these

More sad news, IPv6 only understands the \verb|append| command correctly.
All the others are translated into \verb|append| commands. Certainly,
this will change in the future.
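
The defaults that \verb|ip| fills in for an omitted \verb|table|,
\verb|scope| and \verb|protocol|, as stated in the argument list above, can
be summarized in a few lines of Python. The sketch is illustrative only: the
function \verb|fill_defaults| and its dictionary representation of a route
are hypothetical, and it assumes that a \verb|unicast| route counts as
gatewayed exactly when a \verb|via| attribute is present. For route types
whose default scope the text does not state, the scope is left unset.

\begin{verbatim}
def fill_defaults(route):
    """Return a copy of route with omitted table/scope/protocol filled in."""
    r = dict(route)
    rtype = r.setdefault("type", "unicast")
    if "table" not in r:
        # local, broadcast and nat routes go to the local table.
        in_local = rtype in ("local", "broadcast", "nat")
        r["table"] = "local" if in_local else "main"
    if "scope" not in r:
        if rtype == "local":
            r["scope"] = "host"
        elif rtype == "broadcast":
            r["scope"] = "link"
        elif rtype == "unicast":
            # Assumption: "gatewayed" means a via attribute was given.
            r["scope"] = "global" if "via" in r else "link"
        # Other types: no default is stated in the text, leave unset.
    r.setdefault("protocol", "boot")
    return r
\end{verbatim}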

\paragraph{Examples:}

\item add a plain route to network 10.0.0/24 via gateway 193.233.7.65

ip route add 10.0.0/24 via 193.233.7.65

\item change it to a direct route via the \verb|dummy| device

ip ro chg 10.0.0/24 dev dummy

\item add a default multipath route splitting the load between \verb|ppp0|

ip route add default scope global nexthop dev ppp0 \

Note the scope value. It is not necessary but it informs the kernel
that this route is gatewayed rather than direct. Actually, if you
know the addresses of remote endpoints it would be better to use the
\verb|via| parameter.
\item announce that the address 192.203.80.144 is not a real one, but
should be translated to 193.233.7.83 before forwarding

ip route add nat 192.203.80.144 via 193.233.7.83

Backward translation is set up with policy rules described
in the following section (sec.~\ref{IP-RULE}, p.~\pageref{IP-RULE}).
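
The examples above use the abbreviated IPv4 prefix notation
(\verb|10.0.0/24|, \verb|default|). The following Python sketch shows one way
to expand such a \verb|PREFIX| into the full dotted form. It is an
illustration and not the parser used by \verb|ip|: the function
\verb|expand_prefix| is hypothetical, it handles IPv4 only, and it assumes
that omitted trailing octets are zero.

\begin{verbatim}
import ipaddress


def expand_prefix(text):
    """Expand '10.0.0/24', '193.233.7.65' or 'default' to an IPv4Network."""
    if text == "default":
        return ipaddress.ip_network("0.0.0.0/0")
    addr, slash, length = text.partition("/")
    octets = addr.split(".")
    if not 1 <= len(octets) <= 4:
        raise ValueError("bad IPv4 prefix: %r" % text)
    octets += ["0"] * (4 - len(octets))      # pad omitted octets
    # A missing length means a full-length host route.
    plen = int(length) if slash else 32
    return ipaddress.ip_network("%s/%d" % (".".join(octets), plen))
\end{verbatim}

For instance, \verb|expand_prefix("10.0.0/24")| yields the network
\verb|10.0.0.0/24|, and an address without a length yields a \verb|/32|
host prefix.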

\subsection{{\tt ip route delete} --- delete a route}

\paragraph{Abbreviations:} \verb|delete|, \verb|del|, \verb|d|.

\paragraph{Arguments:} \verb|ip route del| has the same arguments as
\verb|ip route add|, but their semantics are a bit different.

Key values (\verb|to|, \verb|tos|, \verb|preference| and \verb|table|)
select the route to delete. If optional attributes are present, \verb|ip|
verifies that they coincide with the attributes of the route to delete.
If no route with the given key and attributes was found, \verb|ip route del|
fails.

Linux-2.0 had the option to delete a route selected only by prefix address,
ignoring its length (i.e.\ netmask). This option no longer exists
because it was ambiguous. However, look at {\tt ip route flush}
(sec.~\ref{IP-ROUTE-FLUSH}, p.~\pageref{IP-ROUTE-FLUSH}) which
provides similar and even richer functionality.

\paragraph{Example:}

\item delete the multipath route created by the command in the previous subsection

ip route del default scope global nexthop dev ppp0 \

\subsection{{\tt ip route show} --- list routes}

\paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|, \verb|l|.

\paragraph{Description:} The command displays the contents of the routing tables
or the route(s) selected by some criteria.

\paragraph{Arguments:}

\item \verb|to SELECTOR| (default)

--- only select routes from the given range of destinations. \verb|SELECTOR|
consists of an optional modifier (\verb|root|, \verb|match| or \verb|exact|)
and a prefix. \verb|root PREFIX| selects routes with prefixes not shorter
than \verb|PREFIX|. For example, \verb|root 0/0| selects the entire routing table.
\verb|match PREFIX| selects routes with prefixes not longer than
\verb|PREFIX|. For example, \verb|match 10.0/16| selects \verb|10.0/16|,
\verb|10/8| and \verb|0/0|, but it does not select \verb|10.1/16| and
\verb|10.0.0/24|. And \verb|exact PREFIX| (or just \verb|PREFIX|)
selects routes with this exact prefix. If none of these options
is present, \verb|ip| assumes \verb|root 0/0| i.e.\ it lists the entire table.

\item \verb|tos TOS| or \verb|dsfield TOS|

--- only select routes with the given TOS.

\item \verb|table TABLEID|

--- show the routes from the given table(s). The default setting is to show
\verb|table| \verb|main|. \verb|TABLEID| may either be the ID of a real table
or one of the special values:

\item \verb|all| --- list all of the tables.
\item \verb|cache| --- dump the routing cache.

IPv6 has a single table. However, splitting it into \verb|main|, \verb|local|
and \verb|cache| is emulated by the \verb|ip| utility.

\item \verb|cloned| or \verb|cached|

--- list cloned routes i.e.\ routes which were dynamically forked from
other routes because some route attribute (e.g.\ MTU) was updated.
Actually, it is equivalent to \verb|table cache|.

\item \verb|from SELECTOR|

--- the same syntax as for \verb|to|, but it binds the source address range
rather than destinations. Note that the \verb|from| option only works with

\item \verb|protocol RTPROTO|

--- only list routes of this protocol.

\item \verb|scope SCOPE_VAL|

--- only list routes with this scope.

\item \verb|type TYPE|

--- only list routes of this type.

\item \verb|dev NAME|

--- only list routes going via this device.

\item \verb|via PREFIX|

--- only list routes going via the nexthop routers selected by \verb|PREFIX|.

\item \verb|src PREFIX|

--- only list routes with preferred source addresses selected
by \verb|PREFIX|.

\item \verb|realm REALMID| or \verb|realms FROMREALM/TOREALM|

--- only list routes with these realms.
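
The meaning of the three \verb|SELECTOR| modifiers of the \verb|to| argument
can be checked with Python's standard \verb|ipaddress| module. This is a
sketch of the semantics as described in the argument list above, not of the
\verb|ip| implementation; the function name \verb|selects| is hypothetical
and prefixes must be written in full (\verb|10.0.0.0/16| rather than
\verb|10.0/16|).

\begin{verbatim}
import ipaddress


def selects(modifier, selector, route_prefix):
    """Does 'modifier selector' select a route with this prefix?"""
    sel = ipaddress.ip_network(selector)
    rt = ipaddress.ip_network(route_prefix)
    if modifier == "exact":
        return rt == sel
    if modifier == "root":
        # Prefixes not shorter than the selector, lying inside it.
        return rt.subnet_of(sel)
    if modifier == "match":
        # Prefixes not longer than the selector, covering it.
        return rt.supernet_of(sel)
    raise ValueError("unknown modifier: %r" % modifier)
\end{verbatim}

With \verb|match| and the selector \verb|10.0.0.0/16| the function accepts
\verb|10.0.0.0/16|, \verb|10.0.0.0/8| and \verb|0.0.0.0/0| and rejects
\verb|10.1.0.0/16| and \verb|10.0.0.0/24|, in agreement with the example
given for \verb|match|.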

\paragraph{Examples:} Let us count routes of protocol \verb|gated/bgp|

kuznet@amber:~ $ ip ro ls proto gated/bgp | wc

To count the size of the routing cache, we have to use the \verb|-o| option
because cached attributes can take more than one line of output:

kuznet@amber:~ $ ip -o ro ls cloned | wc

\paragraph{Output format:} The output of this command consists
of per route records separated by line feeds.
However, some records may consist
of more than one line: in particular, this is the case when the route
is cloned or you requested additional statistics. If the
\verb|-o| option was given, then line feeds separating lines inside
records are replaced with the backslash sign.

The output has the same syntax as arguments given to {\tt ip route add},
so that it can be understood easily. For example:

kuznet@amber:~ $ ip ro ls 193.233.7/24
193.233.7.0/24 dev eth0 proto gated/conn scope link \
    src 193.233.7.65 realms inr.ac

If you list cloned entries, the output contains other attributes which
are evaluated during route calculation and updated during route
lifetime. An example of the output is:

kuznet@amber:~ $ ip ro ls 193.233.7.82 tab cache
193.233.7.82 from 193.233.7.82 dev eth0 src 193.233.7.65 \
    realms inr.ac/inr.ac
    cache <src-direct,redirect> mtu 1500 rtt 300 iif eth0
193.233.7.82 dev eth0 src 193.233.7.65 realms inr.ac
    cache mtu 1500 rtt 300

\label{NB-strange-route}
The route looks a bit strange, doesn't it? Did you notice that
it is a path from 193.233.7.82 back to 193.233.7.82? Well, you will
see in the section on \verb|ip route get| (p.~\pageref{NB-nature-of-strangeness})

The second line, starting with the word \verb|cache|, shows
additional attributes which normal routes do not possess.
Cached flags are summarized in angle brackets:

\item \verb|local| --- packets are delivered locally.
It stands for loopback unicast routes, for broadcast routes
and for multicast routes, if this host is a member of the corresponding
group.

\item \verb|reject| --- the path is bad. Any attempt to use it results
in an error. See attribute \verb|error| below (p.~\pageref{IP-ROUTE-GET-error}).

\item \verb|mc| --- the destination is multicast.

\item \verb|brd| --- the destination is broadcast.

\item \verb|src-direct| --- the source is on a directly connected

\item \verb|redirected| --- the route was created by an ICMP Redirect.

\item \verb|redirect| --- packets going via this route will
trigger an ICMP redirect.

\item \verb|fastroute| --- the route is eligible to be used for fastroute.

\item \verb|equalize| --- make packet by packet randomization

\item \verb|dst-nat| --- the destination address requires translation.

\item \verb|src-nat| --- the source address requires translation.

\item \verb|masq| --- the source address requires masquerading.
This feature disappeared in Linux-2.4.

\item \verb|notify| --- ({\em not implemented}) change/deletion
of this route will trigger RTNETLINK notification.

Then some optional attributes follow:

\item \verb|error| --- on \verb|reject| routes it is the error code
returned to local senders when they try to use this route.
These error codes are translated into ICMP error codes, sent to remote
senders, according to the rules described above in the subsection
devoted to route types (p.~\pageref{IP-ROUTE-TYPES}).
\label{IP-ROUTE-GET-error}

\item \verb|expires| --- this entry will expire after this timeout.

\item \verb|iif| --- the packets for this path are expected to arrive
on this interface.

\paragraph{Statistics:} With the \verb|-statistics| option, more
information about this route is shown:

\item \verb|users| --- the number of users of this entry.
\item \verb|age| --- shows when this route was last used.
\item \verb|used| --- the number of lookups of this route since its creation.
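
The \verb|cache| line of a cloned route (optional flags in angle brackets
followed by attribute/value pairs) is easy to take apart in a script. The
Python sketch below is only an illustration based on the sample output in
this subsection; the function \verb|parse_cache_line| is hypothetical and it
handles only lines that consist of simple name/value pairs.

\begin{verbatim}
def parse_cache_line(line):
    """Parse 'cache <flags> name value ...' into (flags, attributes)."""
    words = line.split()
    if not words or words[0] != "cache":
        raise ValueError("not a cache line: %r" % line)
    words = words[1:]
    flags = []
    if words and words[0].startswith("<") and words[0].endswith(">"):
        # Flags are comma separated inside the angle brackets.
        flags = [f for f in words[0][1:-1].split(",") if f]
        words = words[1:]
    # The remaining words are taken pairwise as name and value.
    attrs = dict(zip(words[0::2], words[1::2]))
    return flags, attrs
\end{verbatim}

Applied to the line
\verb|cache <src-direct,redirect> mtu 1500 rtt 300 iif eth0| it returns the
two flags and the attributes \verb|mtu|, \verb|rtt| and \verb|iif| as strings.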

\subsection{{\tt ip route flush} --- flush routing tables}
\label{IP-ROUTE-FLUSH}

\paragraph{Abbreviations:} \verb|flush|, \verb|f|.

\paragraph{Description:} This command flushes routes selected
by some criteria.

\paragraph{Arguments:} The arguments have the same syntax and semantics
as the arguments of \verb|ip route show|, but routing tables are not
listed but purged. The only difference is the default action: \verb|show|
dumps the entire IP main routing table but \verb|flush| prints the helper page.
The reason for this difference does not require any explanation, does it?

\paragraph{Statistics:} With the \verb|-statistics| option, the command
becomes verbose. It prints out the number of deleted routes and the number
of rounds made to flush the routing table. If the option is given
twice, \verb|ip route flush| also dumps all the deleted routes
in the format described in the previous subsection.

\paragraph{Examples:} The first example flushes all the
gatewayed routes from the main table (e.g.\ after a routing daemon crash).

netadm@amber:~ # ip -4 ro flush scope global type unicast

This option deserves to be put into a scriptlet \verb|routef|.

This option was described in the \verb|route(8)| man page borrowed
from BSD, but was never implemented in Linux.

The second example flushes all IPv6 cloned routes:

netadm@amber:~ # ip -6 -s -s ro flush cache
3ffe:2400::220:afff:fef4:c5d1 via 3ffe:2400::220:afff:fef4:c5d1 \
    cache used 2 age 12sec mtu 1500 rtt 300
3ffe:2400::280:adff:feb7:8034 via 3ffe:2400::280:adff:feb7:8034 \
    cache used 2 age 15sec mtu 1500 rtt 300
3ffe:2400::280:c8ff:fe59:5bcc via 3ffe:2400::280:c8ff:fe59:5bcc \
    cache users 1 used 1 age 23sec mtu 1500 rtt 300
3ffe:2400:0:1:2a0:ccff:fe66:1878 via 3ffe:2400:0:1:2a0:ccff:fe66:1878 \
    cache used 2 age 20sec mtu 1500 rtt 300
3ffe:2400:0:1:a00:20ff:fe71:fb30 via 3ffe:2400:0:1:a00:20ff:fe71:fb30 \
    cache used 2 age 33sec mtu 1500 rtt 300
ff02::1 via ff02::1 dev eth1 metric 0
    cache users 1 used 1 age 45sec mtu 1500 rtt 300

*** Round 1, deleting 6 entries ***
*** Flush is complete after 1 round ***
netadm@amber:~ # ip -6 -s -s ro flush cache

The third example flushes BGP routing tables after a \verb|gated|

netadm@amber:~ # ip ro ls proto gated/bgp | wc
netadm@amber:~ # ip -s ro f proto gated/bgp
*** Round 1, deleting 1408 entries ***
*** Flush is complete after 1 round ***
netadm@amber:~ # ip ro f proto gated/bgp
netadm@amber:~ # ip ro ls proto gated/bgp
1742 \subsection{{\tt ip route get
} --- get a single route
}
1743 \label{IP-ROUTE-GET
}
1745 \paragraph{Abbreviations:
} \verb|get|,
\verb|g|.
1747 \paragraph{Description:
} this command gets a single route to a destination
1748 and prints its contents exactly as the kernel sees it.
1750 \paragraph{Arguments:
}
1752 \item \verb|to ADDRESS| (default)
1754 --- the destination address.
1756 \item \verb|from ADDRESS|
1758 --- the source address.
1760 \item \verb|tos TOS| or
\verb|dsfield TOS|
1762 --- the Type Of Service.
1764 \item \verb|iif NAME|
1766 --- the device from which this packet is expected to arrive.
1768 \item \verb|oif NAME|
1770 --- force the output device on which this packet will be routed.
1772 \item \verb|connected|
1774 --- if no source address (option
\verb|from|) was given, relookup
1775 the route with the source set to the preferred address received from the first lookup.
1776 If policy routing is used, it may be a different route.
1780 Note that this operation is not equivalent to
\verb|ip route show|.
1781 \verb|show| shows existing routes.
\verb|get| resolves them and
1782 creates new clones if necessary. Essentially,
\verb|get|
1783 is equivalent to sending a packet along this path.
1784 If the
\verb|iif| argument is not given, the kernel creates a route
1785 to output packets towards the requested destination.
This is equivalent to pinging the destination
with a subsequent {\tt ip route ls cache}; however, no packets are
actually sent. With the
\verb|iif| argument, the kernel pretends
1789 that a packet arrived from this interface and searches for
1790 a path to forward the packet.
1792 \paragraph{Output format:
} This command outputs routes in the same
1793 format as
\verb|ip route ls|.
1795 \paragraph{Examples:
}
1797 \item Find a route to output packets to
193.233.7.82:
kuznet@amber:~ $ ip route get 193.233.7.82
193.233.7.82 dev eth0  src 193.233.7.65 realms inr.ac
    cache  mtu 1500 rtt 300
1805 \item Find a route to forward packets arriving on
\verb|eth0|
1806 from
193.233.7.82 and destined for
193.233.7.82:
kuznet@amber:~ $ ip r g 193.233.7.82 from 193.233.7.82 iif eth0
193.233.7.82 from 193.233.7.82 dev eth0  src 193.233.7.65 \
  realms inr.ac/inr.ac
    cache <src-direct,redirect>  mtu 1500 rtt 300 iif eth0
1815 \label{NB-nature-of-strangeness
}
1816 This is the command that created the funny route from
193.233.7.82
1817 looped back to
193.233.7.82 (cf.\ NB on~p.
\pageref{NB-strange-route
}).
1818 Note the
\verb|redirect| flag on it.
1821 \item Find a multicast route for packets arriving on
\verb|eth0|
1822 from host
193.233.7.82 and destined for multicast group
224.2.127.254
1823 (it is assumed that a multicast routing daemon is running.
1824 In this case, it is
\verb|pimd|)
kuznet@amber:~ $ ip r g 224.2.127.254 from 193.233.7.82 iif eth0
multicast 224.2.127.254 from 193.233.7.82 dev lo \
  src 193.233.7.65 realms inr.ac/cosmos
    cache <mc> iif eth0 Oifs: eth1 pimreg
1832 This route differs from the ones seen before. It contains a ``normal'' part
1833 and a ``multicast'' part. The normal part is used to deliver (or not to
deliver) the packet to local IP listeners. In this case the router
is not a member of this group, so the route has no \verb|local| flag and only
forwards packets. The output device for such entries is always loopback.
1838 The multicast part consists of an additional
\verb|Oifs:| list showing
1839 the output interfaces.
1843 It is time for a more complicated example. Let us add an invalid
1844 gatewayed route for a destination which is really directly connected:
netadm@alisa:~ # ip route add 193.233.7.98 via 193.233.7.254
netadm@alisa:~ # ip route get 193.233.7.98
193.233.7.98 via 193.233.7.254 dev eth0  src 193.233.7.90
    cache  mtu 1500 rtt 3072
1852 and probe it with ping:
netadm@alisa:~ # ping -n 193.233.7.98
PING 193.233.7.98 (193.233.7.98) from 193.233.7.90 : 56 data bytes
From 193.233.7.254: Redirect Host(New nexthop: 193.233.7.98)
64 bytes from 193.233.7.98: icmp_seq=0 ttl=255 time=3.5 ms
From 193.233.7.254: Redirect Host(New nexthop: 193.233.7.98)
64 bytes from 193.233.7.98: icmp_seq=1 ttl=255 time=2.2 ms
64 bytes from 193.233.7.98: icmp_seq=2 ttl=255 time=0.4 ms
64 bytes from 193.233.7.98: icmp_seq=3 ttl=255 time=0.4 ms
64 bytes from 193.233.7.98: icmp_seq=4 ttl=255 time=0.4 ms

--- 193.233.7.98 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 0.4/1.3/3.5 ms
1869 What happened? Router
193.233.7.254 understood that we have a much
1870 better path to the destination and sent us an ICMP redirect message.
We may retry \verb|ip route get| to see what we have in the routing
tables now:
netadm@alisa:~ # ip route get 193.233.7.98
193.233.7.98 dev eth0  src 193.233.7.90
    cache <redirected>  mtu 1500 rtt 3072
1882 \section{{\tt ip rule
} --- routing policy database management
}
1885 \paragraph{Abbreviations:
} \verb|rule|,
\verb|ru|.
1887 \paragraph{Object:
} \verb|rule|s in the routing policy database control
1888 the route selection algorithm.
1890 Classic routing algorithms used in the Internet make routing decisions
1891 based only on the destination address of packets (and in theory,
1892 but not in practice, on the TOS field). The seminal review of classic
1893 routing algorithms and their modifications can be found in~
\cite{RFC1812
}.
1895 In some circumstances we want to route packets differently depending not only
1896 on destination addresses, but also on other packet fields: source address,
1897 IP protocol, transport protocol ports or even packet payload.
1898 This task is called ``policy routing''.
1901 ``policy routing'' $
\neq$ ``routing policy''.
1903 \noindent ``policy routing'' $=$ ``cunning routing''.
1905 \noindent ``routing policy'' $=$ ``routing tactics'' or ``routing plan''.
1908 To solve this task, the conventional destination based routing table, ordered
1909 according to the longest match rule, is replaced with a ``routing policy
1910 database'' (or RPDB), which selects routes
1911 by executing some set of rules. The rules may have lots of keys of different
1912 natures and therefore they have no natural ordering, but one imposed
1913 by the administrator. Linux-
2.2 RPDB is a linear list of rules
1914 ordered by numeric priority value.
1915 RPDB explicitly allows matching a few packet fields:
1918 \item packet source address.
\item packet destination address.
\item TOS.
1921 \item incoming interface (which is packet metadata, rather than a packet field).
1924 Matching IP protocols and transport ports is also possible,
1925 indirectly, via
\verb|ipchains|, by exploiting their ability
1926 to mark some classes of packets with
\verb|fwmark|. Therefore,
1927 \verb|fwmark| is also included in the set of keys checked by rules.
1929 Each policy routing rule consists of a
{\em selector\/
} and an
{\em action\/
}
predicate. The RPDB is scanned in the order of increasing priority value. The selector
of each rule is applied to \{source address, destination address, incoming
interface, tos, fwmark\} and, if the selector matches the packet,
the action is performed. The action predicate may return with success.
In this case, it will give either a route or a failure indication
and the RPDB lookup is terminated. Otherwise, the RPDB program
continues with the next rule.
1938 What is the action, semantically? The natural action is to select the
1939 nexthop and the output device. This is what
1940 Cisco IOS~
\cite{IOS
} does. Let us call it ``match \& set''.
1941 The Linux-
2.2 approach is more flexible. The action includes
1942 lookups in destination-based routing tables and selecting
1943 a route from these tables according to the classic longest match algorithm.
1944 The ``match \& set'' approach is the simplest case of the Linux one. It is realized
1945 when a second level routing table contains a single default route.
1946 Recall that Linux-
2.2 supports multiple tables
1947 managed with the
\verb|ip route| command, described in the previous section.
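
The lookup procedure described above can be modelled in a few lines of Python.
This is only an illustrative sketch under stated assumptions, not the kernel
implementation: the names \verb|Rule|, \verb|lookup_table| and \verb|rpdb_lookup|
are hypothetical, the selector checks only the source and destination prefixes,
routing tables are plain dictionaries, and the nexthop addresses are invented.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Rule:
    priority: int            # lower value is checked first
    table: str               # routing table to consult when the selector matches
    src: str = "0.0.0.0/0"   # source prefix selector
    dst: str = "0.0.0.0/0"   # destination prefix selector


def lookup_table(table, dst):
    """Longest match lookup in one table (a dict: prefix -> nexthop)."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, nexthop in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, nexthop)
    return None if best is None else best[1]


def rpdb_lookup(rules, tables, src, dst):
    """Scan rules by increasing priority value until a table yields a route."""
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    for rule in sorted(rules, key=lambda r: r.priority):
        if s in ipaddress.ip_network(rule.src) and d in ipaddress.ip_network(rule.dst):
            nexthop = lookup_table(tables.get(rule.table, {}), dst)
            if nexthop is not None:
                return rule.table, nexthop   # success terminates the scan
    return None                              # no rule produced a route


# Toy data: the nexthop addresses are invented for the example.
tables = {
    "main": {"193.233.7.0/24": "direct", "0.0.0.0/0": "193.233.7.254"},
    "inr.ruhep": {"0.0.0.0/0": "192.203.80.1"},   # single default: "match & set"
}
rules = [Rule(32766, "main"), Rule(220, "inr.ruhep", src="192.203.80.0/24")]

print(rpdb_lookup(rules, tables, "192.203.80.5", "10.0.0.1"))
print(rpdb_lookup(rules, tables, "193.233.7.65", "193.233.7.82"))
```

The first call is caught by the rule with priority 220 and resolved through the
single default route of its table, which is the ``match \& set'' case. The second
call falls through to the \verb|main| table, where the longest matching prefix wins.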
At startup time the kernel configures the default RPDB consisting of three
rules:
1953 \item Priority:
0, Selector: match anything, Action: lookup routing
1954 table
\verb|local| (ID
255).
1955 The
\verb|local| table is a special routing table containing
1956 high priority control routes for local and broadcast addresses.
1958 Rule
0 is special. It cannot be deleted or overridden.
1961 \item Priority:
32766, Selector: match anything, Action: lookup routing
1962 table
\verb|main| (ID
254).
1963 The
\verb|main| table is the normal routing table containing all non-policy
1964 routes. This rule may be deleted and/or overridden with other
1965 ones by the administrator.
1967 \item Priority:
32767, Selector: match anything, Action: lookup routing
1968 table
\verb|default| (ID
253).
1969 The
\verb|default| table is empty. It is reserved for some
1970 post-processing if no previous default rules selected the packet.
1971 This rule may also be deleted.
1975 Do not confuse routing tables with rules: rules point to routing tables,
1976 several rules may refer to one routing table and some routing tables
1977 may have no rules pointing to them. If the administrator deletes all the rules
1978 referring to a table, the table is not used, but it still exists
1979 and will disappear only after all the routes contained in it are deleted.
\paragraph{Rule attributes:} Each RPDB entry has additional
attributes. F.e.\ each rule has a pointer to some routing
table. NAT and masquerading rules have an attribute to select the new IP
address to translate/masquerade to. Besides that, rules have some
optional attributes that routes also have, namely \verb|realms|.
These values do not override those contained in the routing tables. They
are only used if the route did not select any attributes.
1991 \paragraph{Rule types:
} The RPDB may contain rules of the following
types:
1994 \item \verb|unicast| --- the rule prescribes to return the route found
1995 in the routing table referenced by the rule.
1996 \item \verb|blackhole| --- the rule prescribes to silently drop the packet.
1997 \item \verb|unreachable| --- the rule prescribes to generate a ``Network
1998 is unreachable'' error.
\item \verb|prohibit| --- the rule prescribes to generate a
``Communication is administratively prohibited'' error.
2001 \item \verb|nat| --- the rule prescribes to translate the source address
2002 of the IP packet into some other value. More about NAT is
2003 in Appendix~
\ref{ROUTE-NAT
}, p.
\pageref{ROUTE-NAT
}.
2007 \paragraph{Commands:
} \verb|add|,
\verb|delete| and
\verb|show|
2010 \subsection{{\tt ip rule add
} --- insert a new rule\\
2011 {\tt ip rule delete
} --- delete a rule
}
2014 \paragraph{Abbreviations:
} \verb|add|,
\verb|a|;
\verb|delete|,
\verb|del|, \verb|d|.
2017 \paragraph{Arguments:
}
2020 \item \verb|type TYPE| (default)
--- the type of this rule. The list of valid types was given in the previous
subsection.
2025 \item \verb|from PREFIX|
2027 --- select the source prefix to match.
2029 \item \verb|to PREFIX|
2031 --- select the destination prefix to match.
2033 \item \verb|iif NAME|
2035 --- select the incoming device to match. If the interface is loopback,
2036 the rule only matches packets originating from this host. This means that you
2037 may create separate routing tables for forwarded and local packets and,
2038 hence, completely segregate them.
2040 \item \verb|tos TOS| or
\verb|dsfield TOS|
2042 --- select the TOS value to match.
2044 \item \verb|fwmark MARK|
2046 --- select the
\verb|fwmark| value to match.
2048 \item \verb|priority PREFERENCE|
2050 --- the priority of this rule. Each rule should have an explicitly
2051 set
{\em unique\/
} priority value.
Really, for historical reasons \verb|ip rule add| does not require a
priority value and allows priorities to be non-unique.
If the user does not supply a priority, it is selected by the kernel.
If the user creates a rule with a priority value that
already exists, the kernel does not reject the request. It adds
the new rule before all old rules of the same priority.

It is a mistake in design, no more. And it will be fixed one day,
so do not rely on this feature. Use explicit priorities.
2065 \item \verb|table TABLEID|
2067 --- the routing table identifier to lookup if the rule selector matches.
2069 \item \verb|realms FROM/TO|
--- Realms to select if the rule matched and the routing table lookup
succeeded. Realm \verb|TO| is only used if the route did not select
any realm.
2075 \item \verb|nat ADDRESS|
2077 --- The base of the IP address block to translate (for source addresses).
The \verb|ADDRESS| may be either the start of the block of NAT addresses
(selected by NAT routes) or, in Linux-2.2, a local host address (or even zero).
In the last case the router does not translate the packets,
but masquerades them to this address; this feature disappeared in Linux-2.4.
2082 More about NAT is in Appendix~
\ref{ROUTE-NAT
},
2083 p.
\pageref{ROUTE-NAT
}.
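
The note on \verb|priority| in the list above says that a rule whose priority
already exists is placed before all old rules of the same priority. A minimal
sketch of that ordering, assuming the RPDB is modelled as a Python list of
(priority, name) pairs kept sorted by priority; \verb|add_rule| is a hypothetical
helper, not part of \verb|ip|:

```python
import bisect


def add_rule(rpdb, priority, name):
    """Insert (priority, name) ahead of any existing rules with equal priority."""
    keys = [p for p, _ in rpdb]
    # bisect_left returns the leftmost slot for `priority`, i.e. before equals
    rpdb.insert(bisect.bisect_left(keys, priority), (priority, name))


rpdb = [(0, "local"), (32766, "main"), (32767, "default")]
add_rule(rpdb, 220, "first")
add_rule(rpdb, 220, "second")   # same priority: lands before "first"
print(rpdb)
```

The rule added last is the one consulted first, which is easy to overlook. This
is one more reason to give every rule an explicit, unique priority.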
2087 \paragraph{Warning:
} Changes to the RPDB made with these commands
2088 do not become active immediately. It is assumed that after
2089 a script finishes a batch of updates, it flushes the routing cache
2090 with
\verb|ip route flush cache|.
2092 \paragraph{Examples:
}
\item Route packets with source addresses from 192.203.80/24
according to routing table \verb|inr.ruhep|:
ip ru add from 192.203.80.0/24 table inr.ruhep prio 220
\item Translate packet source address 193.233.7.83 into 192.203.80.144
and route it according to table \#1 (actually, it is \verb|inr.ruhep|):
ip ru add from 193.233.7.83 nat 192.203.80.144 table 1 prio 320
\item Delete the unused default rule:
ip ru del prio 32767
2115 \subsection{{\tt ip rule show
} --- list rules
}
2116 \label{IP-RULE-SHOW
}
2118 \paragraph{Abbreviations:
} \verb|show|,
\verb|list|,
\verb|sh|,
\verb|ls|,
\verb|l|.
2121 \paragraph{Arguments:
} Good news, this is one command that has no arguments.
2123 \paragraph{Output format:
}
kuznet@amber:~ $ ip ru ls
0:      from all lookup local
200:    from 192.203.80.0/24 to 193.233.7.0/24 lookup main
210:    from 192.203.80.0/24 to 192.203.80.0/24 lookup main
220:    from 192.203.80.0/24 lookup inr.ruhep realms inr.ruhep/radio-msu
300:    from 193.233.7.83 to 193.233.7.0/24 lookup main
310:    from 193.233.7.83 to 192.203.80.0/24 lookup main
320:    from 193.233.7.83 lookup inr.ruhep map-to 192.203.80.144
32766:  from all lookup main
2138 In the first column is the rule priority value followed
2139 by a colon. Then the selectors follow. Each key is prefixed
2140 with the same keyword that was used to create the rule.
2142 The keyword
\verb|lookup| is followed by a routing table identifier,
2143 as it is recorded in the file
\verb|/etc/iproute2/rt_tables|.
If the rule does NAT (f.e.\ rule \#320), it is shown by the keyword
\verb|map-to| followed by the start of the block of addresses to map.
The meaning of this example is pretty simple. The prefixes
192.203.80.0/24 and 193.233.7.0/24 form the internal network, but
they are routed differently when the packets leave it.
Besides that, the host 193.233.7.83 is translated into
another prefix to look like 192.203.80.144 when talking
to the outer world.
2157 \section{{\tt ip maddress
} --- multicast addresses management
}
2160 \paragraph{Object:
} \verb|maddress| objects are multicast addresses.
2162 \paragraph{Commands:
} \verb|add|,
\verb|delete|,
\verb|show| (or
\verb|list|).
2164 \subsection{{\tt ip maddress show
} --- list multicast addresses
}
2166 \paragraph{Abbreviations:
} \verb|show|,
\verb|list|,
\verb|sh|,
\verb|ls|,
\verb|l|.
2168 \paragraph{Arguments:
}
2172 \item \verb|dev NAME| (default)
2174 --- the device name.
2178 \paragraph{Output format:
}
kuznet@alisa:~ $ ip maddr ls dummy
        link  33:33:00:00:00:01
        link  01:00:5e:00:00:01
        inet  224.0.0.1 users 2
2190 The first line of the output shows the interface index and its name.
2191 Then the multicast address list follows. Each line starts with the
protocol identifier. The word \verb|link| denotes a link layer
multicast address.
2195 If a multicast address has more than one user, the number
2196 of users is shown after the
\verb|users| keyword.
2198 One additional feature not present in the example above
2199 is the
\verb|static| flag, which indicates that the address was joined
2200 with
\verb|ip maddr add|. See the following subsection.
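
The \verb|link| lines in the listing above are the Ethernet addresses that
correspond to IP multicast groups. As a sketch of the standard mapping
(RFC 1112 for IPv4, RFC 2464 for IPv6), the following Python function derives
the link layer address from a group address. The function name is hypothetical
and the code only illustrates the mapping; it is not how \verb|ip| itself works.

```python
import ipaddress


def group_to_ether(group):
    """Map an IP multicast group to its Ethernet multicast address."""
    addr = ipaddress.ip_address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not a multicast group")
    raw = addr.packed
    if addr.version == 4:
        # 01:00:5e followed by the low 23 bits of the group address
        octets = bytes([0x01, 0x00, 0x5E, raw[1] & 0x7F, raw[2], raw[3]])
    else:
        # 33:33 followed by the low 32 bits of the group address
        octets = bytes([0x33, 0x33]) + raw[12:]
    return ":".join(f"{b:02x}" for b in octets)


print(group_to_ether("224.0.0.1"))  # 01:00:5e:00:00:01
print(group_to_ether("ff02::1"))    # 33:33:00:00:00:01
```

Both results match the \verb|link| entries shown for the \verb|dummy| device.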
2204 \subsection{{\tt ip maddress add
} --- add a multicast address\\
2205 {\tt ip maddress delete
} --- delete a multicast address
}
2207 \paragraph{Abbreviations:
} \verb|add|,
\verb|a|;
\verb|delete|,
\verb|del|,
\verb|d|.
2209 \paragraph{Description:
} these commands attach/detach
2210 a static link layer multicast address to listen on the interface.
2211 Note that it is impossible to join protocol multicast groups
2212 statically. This command only manages link layer addresses.
2215 \paragraph{Arguments:
}
2218 \item \verb|address LLADDRESS| (default)
2220 --- the link layer multicast address.
2222 \item \verb|dev NAME|
2224 --- the device to join/leave this multicast address.
2229 \paragraph{Example:
} Let us continue with the example from the previous subsection.
netadm@alisa:~ # ip maddr add 33:33:00:00:00:01 dev dummy
netadm@alisa:~ # ip -0 maddr ls dummy
        link  33:33:00:00:00:01 users 2 static
        link  01:00:5e:00:00:01
netadm@alisa:~ # ip maddr del 33:33:00:00:00:01 dev dummy
Neither \verb|ip| nor the kernel checks multicast address validity.
In particular, this means that you can try to load a unicast address
instead of a multicast address. Most drivers will ignore such addresses,
but several (f.e.\ Tulip) will load them into their on-board filter.
The effects may be strange. Namely, the addresses become additional
local link addresses and, if you loaded the address of another host
to the router, expect duplicated packets on the wire.
It is not a bug, but rather a hole in the API and intra-kernel interfaces.
This feature is really more useful for traffic monitoring, but using it
with Linux-2.2 you {\em have to\/} be sure that the host is not
a router and, especially, that it is not a transparent proxy or masquerading
agent.
2257 \section{{\tt ip mroute
} --- multicast routing cache management
}
2260 \paragraph{Abbreviations:
} \verb|mroute|,
\verb|mr|.
2262 \paragraph{Object:
} \verb|mroute| objects are multicast routing cache
2263 entries created by a user level mrouting daemon
2264 (f.e.\
\verb|pimd| or
\verb|mrouted|).
2266 Due to the limitations of the current interface to the multicast routing
2267 engine, it is impossible to change
\verb|mroute| objects administratively,
so we may only display them. This limitation will be removed
in the future.
2271 \paragraph{Commands:
} \verb|show| (or
\verb|list|).
2274 \subsection{{\tt ip mroute show
} --- list mroute cache entries
}
2276 \paragraph{Abbreviations:
} \verb|show|,
\verb|list|,
\verb|sh|,
\verb|ls|,
\verb|l|.
2278 \paragraph{Arguments:
}
2281 \item \verb|to PREFIX| (default)
2283 --- the prefix selecting the destination multicast addresses to list.
2286 \item \verb|iif NAME|
2288 --- the interface on which multicast packets are received.
2291 \item \verb|from PREFIX|
2293 --- the prefix selecting the IP source addresses of the multicast route.
2298 \paragraph{Output format:
}
kuznet@amber:~ $ ip mroute ls
(193.232.127.6, 224.0.1.39)      Iif: unresolved
(193.232.244.34, 224.0.1.40)     Iif: unresolved
(193.233.7.65, 224.66.66.66)     Iif: eth0       Oifs: pimreg
2308 Each line shows one (S,G) entry in the multicast routing cache,
2309 where S is the source address and G is the multicast group.
\verb|Iif| is
2310 the interface on which multicast packets are expected to arrive.
2311 If the word
\verb|unresolved| is there instead of the interface name,
2312 it means that the routing daemon still hasn't resolved this entry.
The keyword \verb|Oifs| is followed by a list of output interfaces, separated
by spaces. If a multicast routing entry is created with non-trivial
TTL scope, administrative distances are appended to the device names
in the \verb|Oifs| list.
2318 \paragraph{Statistics:
} The
\verb|-statistics| option also prints the
2319 number of packets and bytes forwarded along this route and
2320 the number of packets that arrived on the wrong interface, if this number is not zero.
kuznet@amber:~ $ ip -s mr ls 224.66/16
(193.233.7.65, 224.66.66.66)     Iif: eth0       Oifs: pimreg
  9383 packets, 300256 bytes
2330 \section{{\tt ip tunnel
} --- tunnel configuration
}
2333 \paragraph{Abbreviations:
} \verb|tunnel|,
\verb|tunl|.
2335 \paragraph{Object:
} \verb|tunnel| objects are tunnels, encapsulating
2336 packets in IPv4 packets and then sending them over the IP infrastructure.
2338 \paragraph{Commands:
} \verb|add|,
\verb|delete|,
\verb|change|,
\verb|show|
2341 \paragraph{See also:
} A more informal discussion of tunneling
2342 over IP and the
\verb|ip tunnel| command can be found in~
\cite{IP-TUNNELS
}.
2344 \subsection{{\tt ip tunnel add
} --- add a new tunnel\\
2345 {\tt ip tunnel change
} --- change an existing tunnel\\
2346 {\tt ip tunnel delete
} --- destroy a tunnel
}
2348 \paragraph{Abbreviations:
} \verb|add|,
\verb|a|;
\verb|change|,
\verb|chg|;
2349 \verb|delete|,
\verb|del|,
\verb|d|.
2352 \paragraph{Arguments:
}
2356 \item \verb|name NAME| (default)
2358 --- select the tunnel device name.
2360 \item \verb|mode MODE|
2362 --- set the tunnel mode. Three modes are currently available:
2363 \verb|ipip|,
\verb|sit| and
\verb|gre|.
2365 \item \verb|remote ADDRESS|
2367 --- set the remote endpoint of the tunnel.
2369 \item \verb|local ADDRESS|
2371 --- set the fixed local address for tunneled packets.
2372 It must be an address on another interface of this host.
\item \verb|ttl N|

--- set a fixed TTL \verb|N| on tunneled packets.
2377 \verb|N| is a number in the range
1--
255.
0 is a special value
2378 meaning that packets inherit the TTL value.
2379 The default value is:
\verb|inherit|.
2381 \item \verb|tos T| or
\verb|dsfield T|
2383 --- set a fixed TOS
\verb|T| on tunneled packets.
2384 The default value is:
\verb|inherit|.
2388 \item \verb|dev NAME|
2390 --- bind the tunnel to the device
\verb|NAME| so that
2391 tunneled packets will only be routed via this device and will
2392 not be able to escape to another device when the route to endpoint changes.
2394 \item \verb|nopmtudisc|
--- disable Path MTU Discovery on this tunnel.
It is enabled by default. Note that a fixed TTL is incompatible
with this option: tunneling with a fixed TTL always performs Path MTU Discovery.
2400 \item \verb|key K|,
\verb|ikey K|,
\verb|okey K|
2402 --- (only GRE tunnels) use keyed GRE with key
\verb|K|.
\verb|K| is
2403 either a number or an IP address-like dotted quad.
2404 The
\verb|key| parameter sets the key to use in both directions.
2405 The
\verb|ikey| and
\verb|okey| parameters set different keys for input and output.
2408 \item \verb|csum|,
\verb|icsum|,
\verb|ocsum|
2410 --- (only GRE tunnels) generate/require checksums for tunneled packets.
2411 The
\verb|ocsum| flag calculates checksums for outgoing packets.
2412 The
\verb|icsum| flag requires that all input packets have the correct
2413 checksum. The
\verb|csum| flag is equivalent to the combination
2414 ``
\verb|icsum|
\verb|ocsum|''.
2416 \item \verb|seq|,
\verb|iseq|,
\verb|oseq|
2418 --- (only GRE tunnels) serialize packets.
2419 The
\verb|oseq| flag enables sequencing of outgoing packets.
2420 The
\verb|iseq| flag requires that all input packets are serialized.
2421 The
\verb|seq| flag is equivalent to the combination ``
\verb|iseq|
\verb|oseq|''.
2424 I think this option does not
2425 work. At least, I did not test it, did not debug it and
2426 do not even understand how it is supposed to work or for what
2427 purpose Cisco planned to use it. Do not use it.
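
The \verb|key| argument described in the list above accepts either a number or
a dotted quad. The sketch below shows one way to read both notations as the
same 32-bit value. It assumes that a dotted quad is interpreted like an IPv4
address in network byte order; \verb|parse_gre_key| is a hypothetical name,
not a function of \verb|ip|.

```python
import ipaddress


def parse_gre_key(text):
    """Return the 32-bit key for either '1.2.3.4' or a plain number."""
    if "." in text:
        return int(ipaddress.IPv4Address(text))
    value = int(text, 0)          # base 0 accepts decimal and 0x... hex
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("GRE key must fit in 32 bits")
    return value


print(parse_gre_key("0.0.0.10"))   # 10
print(parse_gre_key("10"))         # 10
```

Under this assumption both endpoints may write the key in whichever notation
is convenient, as long as the resulting 32-bit values agree.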
2433 \paragraph{Example:
} Create a pointopoint IPv6 tunnel with maximal TTL of
32.
netadm@amber:~ # ip tunl add Cisco mode sit remote 192.31.7.104 \
    local 192.203.80.142 ttl 32
2439 \subsection{{\tt ip tunnel show
} --- list tunnels
}
2441 \paragraph{Abbreviations:
} \verb|show|,
\verb|list|,
\verb|sh|,
\verb|ls|,
\verb|l|.
2444 \paragraph{Arguments:
} None.
2446 \paragraph{Output format:
}
kuznet@amber:~ $ ip tunl ls Cisco
Cisco: ipv6/ip  remote 192.31.7.104  local 192.203.80.142  ttl 32
2452 The line starts with the tunnel device name followed by a colon.
2453 Then the tunnel mode follows. The parameters of the tunnel are listed
2454 with the same keywords that were used when creating the tunnel.
2456 \paragraph{Statistics:
}
kuznet@amber:~ $ ip -s tunl ls Cisco
Cisco: ipv6/ip  remote 192.31.7.104  local 192.203.80.142  ttl 32
RX: Packets    Bytes        Errors CsumErrs OutOfSeq Mcasts
    12566      1707516      0      0        0        0
TX: Packets    Bytes        Errors DeadLoop NoRoute  NoBufs
    13445      1879677      0      0        0        0
2467 Essentially, these numbers are the same as the numbers
2468 printed with
{\tt ip -s link show
}
2469 (sec.
\ref{IP-LINK-SHOW
}, p.
\pageref{IP-LINK-SHOW
}) but the tags are different
2470 to reflect that they are tunnel specific.
2472 \item \verb|CsumErrs| --- the total number of packets dropped
2473 because of checksum failures for a GRE tunnel with checksumming enabled.
2474 \item \verb|OutOfSeq| --- the total number of packets dropped
2475 because they arrived out of sequence for a GRE tunnel with
2476 serialization enabled.
2477 \item \verb|Mcasts| --- the total number of multicast packets
2478 received on a broadcast GRE tunnel.
2479 \item \verb|DeadLoop| --- the total number of packets which were not
2480 transmitted because the tunnel is looped back to itself.
2481 \item \verb|NoRoute| --- the total number of packets which were not
2482 transmitted because there is no IP route to the remote endpoint.
2483 \item \verb|NoBufs| --- the total number of packets which were not
2484 transmitted because the kernel failed to allocate a buffer.
2488 \section{{\tt ip monitor
} and
{\tt rtmon
} --- state monitoring
}
The \verb|ip| utility can monitor the state of devices, addresses
and routes continuously. This option has a slightly different format.
Namely, the \verb|monitor| command is the first in the command line and then
the object list follows:
ip monitor [ file FILE ] [ all | OBJECT-LIST ]
2499 \verb|OBJECT-LIST| is the list of object types that we want to monitor.
2500 It may contain
\verb|link|,
\verb|address| and
\verb|route|.
2501 If no
\verb|file| argument is given,
\verb|ip| opens RTNETLINK,
2502 listens on it and dumps state changes in the format described
2503 in previous sections.
2505 If a file name is given, it does not listen on RTNETLINK,
2506 but opens the file containing RTNETLINK messages saved in binary format
2507 and dumps them. Such a history file can be generated with the
\verb|rtmon| utility. This utility has a command line syntax similar to
\verb|ip monitor|.
2510 Ideally,
\verb|rtmon| should be started before
the first network configuration command is issued. F.e.\ if you insert:
rtmon file /var/log/rtmon.log
in a startup script, you will be able to view the full history later.
2519 Certainly, it is possible to start
\verb|rtmon| at any time.
It prepends the history with the state snapshot dumped at the moment
of starting.
2524 \section{Route realms and policy propagation,
{\tt rtacct
}}
2527 On routers using OSPF ASE or, especially, the BGP protocol, routing
2528 tables may be huge. If we want to classify or to account for the packets
2529 per route, we will have to keep lots of information. Even worse, if we
2530 want to distinguish the packets not only by their destination, but
2531 also by their source, the task gets quadratic complexity and its solution
2532 is physically impossible.
2534 One approach to propagating the policy from routing protocols
2535 to the forwarding engine has been proposed in~
\cite{IOS-BGP-PP
}.
2536 Essentially, Cisco Policy Propagation via BGP is based on the fact
2537 that dedicated routers all have the RIB (Routing Information Base)
2538 close to the forwarding engine, so policy routing rules can
2539 check all the route attributes, including ASPATH information
2540 and community strings.
2542 The Linux architecture, splitting the RIB (maintained by a user level
2543 daemon) and the kernel based FIB (Forwarding Information Base),
2544 does not allow such a simple approach.
This turns out to be fortunate, because there is another solution
which allows even more flexible policy and richer semantics.
Namely, routes can be clustered together in user space, based on their
attributes. F.e.\ a BGP router knows the route's ASPATH and its community;
an OSPF router knows the route tag or its area. The administrator, when adding
routes manually, also knows their nature. Provided that the number of such
aggregates (we call them {\em realms\/}) is low, the task of full
classification both by source and destination becomes quite manageable.
2556 So each route may be assigned to a realm. It is assumed that
2557 this identification is made by a routing daemon, but static routes
2558 can also be handled manually with
\verb|ip route| (see sec.
\ref{IP-ROUTE
},
2559 p.
\pageref{IP-ROUTE
}).
2561 There is a patch to
\verb|gated|, allowing classification of routes
2562 to realms with all the set of policy rules implemented in
\verb|gated|:
2563 by prefix, by ASPATH, by origin, by tag etc.
2566 To facilitate the construction (f.e.\ in case the routing
2567 daemon is not aware of realms), missing realms may be completed
2568 with routing policy rules, see sec.~
\ref{IP-RULE
}, p.
\pageref{IP-RULE
}.
2570 For each packet the kernel calculates a tuple of realms: source realm
2571 and destination realm, using the following algorithm:
2574 \item If the route has a realm, the destination realm of the packet is set to it.
2575 \item If the rule has a source realm, the source realm of the packet is set to it.
If the destination realm was not inherited from the route and the rule has a destination realm,
it is also set.
2578 \item If at least one of the realms is still unknown, the kernel finds
2579 the reversed route to the source of the packet.
2580 \item If the source realm is still unknown, get it from the reversed route.
2581 \item If one of the realms is still unknown, swap the realms of reversed
2582 routes and apply step
2 again.
2585 After this procedure is completed we know what realm the packet
2586 arrived from and the realm where it is going to propagate to.
2587 If some of the realms are unknown, they are initialized to zero
2588 (or realm
\verb|unknown|).
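
The steps above can be sketched in Python. This is one reading of the
procedure, not the kernel code: realm 0 stands for \verb|unknown|, the realms
found by the forward lookup and by the reversed lookup are passed in as plain
numbers, and every name is hypothetical. In particular, the last step is
interpreted here as taking the realms of the rule matched by the reversed
lookup with source and destination exchanged.

```python
UNKNOWN = 0


def packet_realms(route_realm=UNKNOWN, rule_from=UNKNOWN, rule_to=UNKNOWN,
                  rev_route_realm=UNKNOWN, rev_rule_from=UNKNOWN,
                  rev_rule_to=UNKNOWN):
    """Return (source_realm, destination_realm) for one packet."""
    dst = route_realm                     # step 1: realm of the route
    src = rule_from                       # step 2: realms of the rule
    if dst == UNKNOWN:
        dst = rule_to
    if UNKNOWN in (src, dst):             # step 3: consult the reversed route
        if src == UNKNOWN:
            src = rev_route_realm         # step 4
        if UNKNOWN in (src, dst):         # step 5: reversed rule, realms swapped
            if src == UNKNOWN:
                src = rev_rule_to
            if dst == UNKNOWN:
                dst = rev_rule_from
    return src, dst


print(packet_realms(route_realm=5, rev_route_realm=7))       # (7, 5)
print(packet_realms(route_realm=5, rule_from=3, rule_to=4))  # (3, 5)
print(packet_realms())                                       # (0, 0)
```

The second call shows that the destination realm of the rule is ignored once
the route has supplied one, and the last call shows both realms falling back
to \verb|unknown|.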
2590 The main application of realms is the TC
\verb|route| classifier~
\cite{TC-CREF
},
2591 where they are used to help assign packets to traffic classes,
to account, police and schedule them according to this
classification.
2595 A much simpler but still very useful application is incoming packet
2596 accounting by realms. The kernel gathers a packet statistics summary
2597 which can be viewed with the
\verb|rtacct| utility.
kuznet@amber:~ $ rtacct russia
Realm      BytesTo    PktsTo     BytesFrom  PktsFrom
russia     20576778   169176     47080168   153805
2604 This shows that this router received
153805 packets from
2605 the realm
\verb|russia| and forwarded
169176 packets to
\verb|russia|.
2606 The realm
\verb|russia| consists of routes with ASPATHs not leaving
Russia.
Note that locally originating packets are not accounted here:
\verb|rtacct| shows incoming packets only. Using the
\verb|route|
2611 classifier (see~
\cite{TC-CREF
}) you can get even more detailed
2612 accounting information about outgoing packets, optionally
2613 summarizing traffic not only by source or destination, but
2614 by any pair of source and destination realms.
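
The \verb|rtacct| table shown earlier in this section is easy to post-process.
A small sketch, assuming the column layout printed in that example (a header
line, then one whitespace-separated line per realm with plain integer
counters); the helper name is hypothetical and the sample text is copied from
the listing:

```python
def parse_rtacct(text):
    """Parse rtacct output into {realm: {column: int}}."""
    lines = [ln.split() for ln in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    return {row[0]: dict(zip(header[1:], map(int, row[1:]))) for row in rows}


sample = """\
Realm      BytesTo    PktsTo     BytesFrom  PktsFrom
russia     20576778   169176     47080168   153805
"""
stats = parse_rtacct(sample)
print(stats["russia"]["PktsFrom"])   # 153805
# Mean size of a packet forwarded to the realm, in whole bytes
print(stats["russia"]["BytesTo"] // stats["russia"]["PktsTo"])
```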
\begin{thebibliography}{99}
\addcontentsline{toc}{section}{References}
\bibitem{RFC-NDISC} T.~Narten, E.~Nordmark, W.~Simpson.
``Neighbor Discovery for IP Version 6 (IPv6)'', RFC-2461.

\bibitem{RFC-ADDRCONF} S.~Thomson, T.~Narten.
``IPv6 Stateless Address Autoconfiguration'', RFC-2462.

\bibitem{RFC1812} F.~Baker.
``Requirements for IP Version 4 Routers'', RFC-1812.

\bibitem{RFC1122} R.~T.~Braden.
``Requirements for Internet hosts --- communication layers'', RFC-1122.

\bibitem{IOS} ``Cisco IOS Release 12.0 Network Protocols
Command Reference, Part 1'' and
``Cisco IOS Release 12.0 Quality of Service Solutions
Configuration Guide: Configuring Policy-Based Routing'',\\
http://www.cisco.com/univercd/cc/td/doc/product/software/ios120.

\bibitem{IP-TUNNELS} A.~N.~Kuznetsov.
``Tunnels over IP in Linux-2.2'', \\
In: {\tt ftp://ftp.inr.ac.ru/ip-routing/iproute2-current.tar.gz}.

\bibitem{TC-CREF} A.~N.~Kuznetsov. ``TC Command Reference'',\\
In: {\tt ftp://ftp.inr.ac.ru/ip-routing/iproute2-current.tar.gz}.

\bibitem{IOS-BGP-PP} ``Cisco IOS Release 12.0 Quality of Service Solutions
Configuration Guide: Configuring QoS Policy Propagation via
Border Gateway Protocol'',\\
http://www.cisco.com/univercd/cc/td/doc/product/software/ios120.

\bibitem{RFC-DHCP} R.~Droms.
``Dynamic Host Configuration Protocol'', RFC-2131.

\end{thebibliography}
\addcontentsline{toc}{section}{Appendix}

\section{Source address selection}
When a host creates an IP packet, it must select some source
address. Correct source address selection is a critical procedure,
because it gives the receiver the information needed to deliver a
reply. If the source is selected incorrectly, in the best case
the backward path may differ from the forward one, which
is harmful for performance. In the worst case, when the addresses
are administratively scoped, the reply may be lost entirely.

Linux-2.2 selects source addresses using the following algorithm:
The application may select a source address explicitly with the \verb|bind(2)|
syscall or by supplying it to \verb|sendmsg(2)| via the ancillary data object
\verb|IP_PKTINFO|. In this case the kernel only checks the validity
of the address and never tries to ``improve'' an incorrect user choice,
generating an error instead.

Never say ``Never''. The sysctl option \verb|ip_dynaddr| breaks
this axiom. It was introduced deliberately, with the purpose
of automatically reselecting the address on hosts with dynamic dial-out interfaces.
However, this hack {\em must not\/} be used on multihomed hosts
and especially on routers: it would break them.
\item Otherwise, IP routing tables can contain an explicit source
address hint for this destination. The hint is set with the \verb|src| parameter
to the \verb|ip route| command, sec.~\ref{IP-ROUTE}, p.~\pageref{IP-ROUTE}.
\item Otherwise, the kernel searches through the list of addresses
attached to the interface through which the packets will be routed.
The search strategies are different for IP and IPv6. Namely:

\item IPv6 searches for the first valid, not deprecated address
with the same scope as the destination.

\item IP searches for the first valid address with a scope wider
than the scope of the destination, but it prefers addresses
which fall into the same subnet as the nexthop of the route
to the destination. Unlike IPv6, the scopes of IPv4 destinations
are not encoded in their addresses but are supplied
in routing tables instead (the \verb|scope| parameter to the \verb|ip route| command,
sec.~\ref{IP-ROUTE}, p.~\pageref{IP-ROUTE}).

\item Otherwise, if the scope of the destination is \verb|link| or \verb|host|,
the algorithm fails and returns a zero source address.

\item Otherwise, all interfaces are scanned to search for an address
with an appropriate scope. The loopback device \verb|lo| is always the first
in the search list, so that if an address with global scope (not 127.0.0.1!)
is configured on loopback, it is always preferred.
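
The per-interface step of the IPv4 search described above can be sketched in shell.
This is a simplified illustration, not the kernel code. The function names
\verb|scope_rank| and \verb|pick_source| are hypothetical. The numeric ranks follow
the usual rtnetlink scope values, where a smaller number means a wider scope.
Treating an address of equal scope as acceptable is an assumption of this sketch,
and the preference for the subnet of the nexthop as well as the final scan of
all interfaces are left out.

```shell
#!/bin/bash
# Map a scope name to its numeric rank (smaller number = wider scope).
scope_rank() {
    case "$1" in
        global) echo 0 ;;
        site)   echo 200 ;;
        link)   echo 253 ;;
        host)   echo 254 ;;
        *)      return 1 ;;
    esac
}

# pick_source DEST_SCOPE "ADDR SCOPE" ...
# Prints the first listed address whose scope is at least as wide as
# DEST_SCOPE (assumption: equal scope is accepted).
# Prints the zero address and returns 1 when nothing fits.
pick_source() {
    local want rank addr scope entry
    want=$(scope_rank "$1") || return 2
    shift
    for entry in "$@"; do
        addr=${entry%% *}      # text before the first space
        scope=${entry##* }     # text after the last space
        rank=$(scope_rank "$scope") || continue
        if [ "$rank" -le "$want" ]; then
            echo "$addr"
            return 0
        fi
    done
    echo 0.0.0.0
    return 1
}
```

For instance, \verb|pick_source global '127.0.0.1 host' '193.233.7.90 global'|
skips the host-scoped loopback address and prints the global one.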
\section{Proxy ARP/NDISC}

Routers may answer ARP/NDISC solicitations on behalf of other hosts.
In Linux-2.2 proxy ARP on an interface may be enabled
by setting the kernel \verb|sysctl| variable
\verb|/proc/sys/net/ipv4/conf/<dev>/proxy_arp| to 1. After this, the router
starts to answer ARP requests on the interface \verb|<dev>|, provided
the route to the requested destination does {\em not\/} go back via the same
interface.

The variable \verb|/proc/sys/net/ipv4/conf/all/proxy_arp| enables proxy
ARP on all the IP devices.
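
Setting the variable amounts to writing a number into the corresponding file.
The following is a minimal sketch. The function name \verb|set_proxy_arp| and
the \verb|PROC_ROOT| override are hypothetical and exist only so that the
function can be tried on a scratch directory instead of the real \verb|/proc|.

```shell
#!/bin/bash
# set_proxy_arp DEVICE VALUE
# Writes VALUE (0 or 1) to the proxy_arp variable of DEVICE.
# PROC_ROOT is a hypothetical override for testing; it defaults to /proc.
set_proxy_arp() {
    local dev=$1 value=$2
    local file="${PROC_ROOT:-/proc}/sys/net/ipv4/conf/$dev/proxy_arp"
    if [ ! -w "$file" ]; then
        echo "Cannot write $file" 1>&2
        return 1
    fi
    echo "$value" > "$file"
}
```

On a real system, and with sufficient privileges, \verb|set_proxy_arp eth0 1|
would enable proxy ARP on \verb|eth0| and \verb|set_proxy_arp all 1| on all devices.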

However, this approach fails in the case of IPv6 because the router
must join the solicited node multicast address to listen for the corresponding
NDISC queries. It means that proxy NDISC is possible only on a per destination
basis.

Logically, proxy ARP/NDISC is not a kernel task. It can easily be implemented
in user space. However, similar functionality was present in BSD kernels
and in Linux-2.0, so we have to preserve it at least to the extent that
is standardized in BSD.

Linux-2.0 ARP had a feature called {\em subnet\/} proxy ARP.
It is replaced with the sysctl flag in Linux-2.2.

The \verb|ip| utility provides a way to manage proxy ARP/NDISC
with the \verb|ip neigh| command, namely:
\begin{verbatim}
ip neigh add proxy ADDRESS [ dev NAME ]
\end{verbatim}
adds a new proxy ARP/NDISC record and
\begin{verbatim}
ip neigh del proxy ADDRESS [ dev NAME ]
\end{verbatim}
deletes it.

If the name of the device is not given, the router will answer solicitations
for address \verb|ADDRESS| on all devices; otherwise it will only serve
the device \verb|NAME|. Even if the proxy entry is created with
\verb|ip neigh|, the router {\em will not\/} answer a query if the route
to the destination goes back via the interface from which the solicitation
was received.

It is important to emphasize that proxy entries have {\em no\/}
parameters other than these (IP/IPv6 address and optional device).
In particular, the entry does not store any link layer address.
It always advertises the station address of the interface
on which it sends advertisements (i.e.\ its own station address).
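
The two \verb|ip neigh| forms above can be wrapped in a small helper.
This is only a sketch: the names \verb|run| and \verb|proxy_entry| and the
\verb|DRY_RUN| variable are hypothetical conveniences, introduced so that the
generated commands can be inspected without changing the neighbour tables.

```shell
#!/bin/bash
# run COMMAND...
# Executes the command, or only prints it when DRY_RUN is non-empty.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# proxy_entry add|del ADDRESS [DEVICE]
# Builds the matching "ip neigh ... proxy" command. Without DEVICE the
# entry applies to all devices, as described in the text.
proxy_entry() {
    local action=$1 addr=$2 dev=$3
    case "$action" in
        add|del) ;;
        *) echo "proxy_entry: action must be add or del" 1>&2
           return 2 ;;
    esac
    if [ -n "$dev" ]; then
        run ip neigh "$action" proxy "$addr" dev "$dev"
    else
        run ip neigh "$action" proxy "$addr"
    fi
}
```

With \verb|DRY_RUN=1| set, \verb|proxy_entry add 193.233.7.83 eth0| only prints
the command it would otherwise execute.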

\section{Route NAT status}

NAT (or ``Network Address Translation'') remaps some parts
of the IP address space into other ones. Linux-2.2 route NAT is supposed
to be used to facilitate policy routing by rewriting addresses
to other routing domains or to help while renumbering sites

\paragraph{What it is not:}
It is necessary to emphasize that {\em it is not supposed\/}
to be used to compress address space or to split load.
This is not missing functionality but a design principle.
Route NAT is {\em stateless\/}. It does not hold any state
about translated sessions. This means that it handles any number
of sessions flawlessly. But it also means that it is {\em static\/}.
It cannot detect the moment when the last TCP client stops
using an address. For the same reason, it will not help to split
load between several servers.

It is a pretty commonly held belief that it is useful to split load between
several servers with NAT. This is a mistake. All you get from this
is the requirement that the router keep the state of all the TCP connections
going via it. Well, if the router is so powerful, run apache on it. 8)

The second feature: it does not touch packet payload,
does not try to ``improve'' broken protocols by looking
through their data and mangling it. It mangles IP addresses,
only IP addresses and nothing but IP addresses.
This, too, is not missing functionality.

To sum up: if you need to compress address space or keep
active FTP clients happy, your choice is not route NAT but masquerading,
port forwarding, NAPT etc.

By the way, you may also want to look at
http://www.suse.com/\~{}mha/HyperNews/get/linux-ip-nat.html

\paragraph{How it works.}
Some part of the address space is reserved for dummy addresses
which will look for all the world like some host addresses
inside your network. No other hosts may use these addresses;
however, other routers may also be configured to translate them.

A great advantage of route NAT is that it may be used not
only in stub networks but in environments with arbitrarily complicated
structure. It does not firewall, it {\em forwards.}

These addresses are selected by the \verb|ip route| command
(sec.~\ref{IP-ROUTE-ADD}, p.~\pageref{IP-ROUTE-ADD}). For example,
\begin{verbatim}
ip route add nat 192.203.80.144 via 193.233.7.83
\end{verbatim}
states that the single address 192.203.80.144 is a dummy NAT address.
For all the world it looks like a host address inside our network.
For neighbouring hosts and routers it looks like the local address
of the translating router. The router answers ARP for it, advertises
this address as routed via it, {\em etc\/}. When the router
receives a packet destined for 192.203.80.144, it replaces
this address with 193.233.7.83, which is the address of some real
host, and forwards the packet. If you need to remap
blocks of addresses, you may use a command like:
\begin{verbatim}
ip route add nat 192.203.80.192/26 via 193.233.7.64
\end{verbatim}
This command will map a block of 64 addresses 192.203.80.192-255 to

When an internal host (193.233.7.83 in the example above)
sends something to the outer world and these packets are forwarded
by our router, the router should translate the source address 193.233.7.83
into 192.203.80.144. This task is solved by setting a special
policy rule (sec.~\ref{IP-RULE-ADD}, p.~\pageref{IP-RULE-ADD}):
\begin{verbatim}
ip rule add prio 320 from 193.233.7.83 nat 192.203.80.144
\end{verbatim}
This rule says that the source address 193.233.7.83
should be translated into 192.203.80.144 before forwarding.
It is important that the address after the \verb|nat| keyword
is some NAT address, declared by {\tt ip route add nat}.
If it is just a random address the router will not map to it.

The exception is when the address is a local address of this
router (or 0.0.0.0) and masquerading is configured in the Linux-2.2
kernel. In this case the router will masquerade the packets as this address.
If 0.0.0.0 is selected, the result is equivalent to one
obtained with firewalling rules. Otherwise, you have a way
to order Linux to masquerade to this fixed address.
The NAT mechanism used in Linux-2.4 is more flexible than
masquerading, so this feature has lost its meaning and has been disabled.

If the network has non-trivial internal structure, it is
useful and even necessary to add rules disabling translation
when a packet does not leave this network. Let us return to the
example from sec.~\ref{IP-RULE-SHOW} (p.~\pageref{IP-RULE-SHOW}).
\begin{verbatim}
300:    from 193.233.7.83 to 193.233.7.0/24 lookup main
310:    from 193.233.7.83 to 192.203.80.0/24 lookup main
320:    from 193.233.7.83 lookup inr.ruhep map-to 192.203.80.144
\end{verbatim}
This block of rules causes normal forwarding when
packets from 193.233.7.83 do not leave networks 193.233.7/24
and 192.203.80/24. Also, if the \verb|inr.ruhep| table does not
contain a route to the destination (which means that the routing
domain owning addresses from 192.203.80/24 is dead), no translation
will occur. Otherwise, the packets are translated.
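
The address arithmetic behind a block mapping such as
\verb|ip route add nat 192.203.80.192/26 via 193.233.7.64| can be illustrated
with a few lines of shell. This is a sketch under an assumption: the network
part of the address is replaced and the host part is preserved, which is what
mapping a block onto a block suggests. The function names \verb|ip2int|,
\verb|int2ip| and \verb|nat_map| are hypothetical, and nothing here touches
the routing tables.

```shell
#!/bin/bash
# ip2int A.B.C.D : print the address as a 32-bit integer.
ip2int() {
    local a b c d
    IFS=. read -r a b c d <<< "$1"
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# int2ip N : print a 32-bit integer as a dotted quad.
int2ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# nat_map ADDRESS NAT_PREFIX[/LEN] TARGET
# Prints the translated address. Assumption: the host bits of ADDRESS
# are kept and the network bits are taken from TARGET.
# An address outside the NAT block is printed unchanged (status 1).
nat_map() {
    local addr=$1 prefix=${2%/*} len=${2#*/} target=$3
    local mask a p t
    if [ "$len" = "$2" ]; then
        len=32                  # no slash: a single address
    fi
    mask=$(( (0xffffffff << (32 - len)) & 0xffffffff ))
    a=$(ip2int "$addr")
    p=$(ip2int "$prefix")
    t=$(ip2int "$target")
    if [ $(( a & mask )) -ne $(( p & mask )) ]; then
        echo "$addr"
        return 1
    fi
    int2ip $(( (t & mask) | (a & ~mask & 0xffffffff) ))
}
```

Under this assumption, \verb|nat_map 192.203.80.200 192.203.80.192/26 193.233.7.64|
prints 193.233.7.72, and the single-address form
\verb|nat_map 192.203.80.144 192.203.80.144 193.233.7.83| prints 193.233.7.83.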

\paragraph{How to only translate selected ports:}
If you only want to translate selected ports (e.g.\ http)
and leave the rest intact, you may use \verb|ipchains|
to \verb|fwmark| a class of packets.
Suppose you did and all the packets from 193.233.7.83
destined for port 80 are marked with marker 0x1234 in input fwchain.
In this case you may replace rule \#320 with:
\begin{verbatim}
320:    from 193.233.7.83 fwmark 1234 lookup main map-to 192.203.80.144
\end{verbatim}
and translation will only be enabled for outgoing http requests.

\section{Example: minimal host setup}
\label{EXAMPLE-SETUP}

The following script gives an example of a fail-safe
setup of IP (and IPv6, if it is compiled into the kernel)
in the common case of a node attached to a single broadcast
network. A more advanced script, which may be used both on multihomed
hosts and on routers, is described in the following
section.

The utilities used in the script may be found in the
directory ftp://ftp.inr.ac.ru/ip-routing/:

\item \verb|ip| --- package \verb|iproute2|.
\item \verb|arping| --- package \verb|iputils|.
\item \verb|rdisc| --- package \verb|iputils|.

It also refers to a DHCP client, \verb|dhcpcd|. I should refrain from
recommending a good DHCP client to use. All that I can
say is that ISC \verb|dhcp-2.0b1pl6| patched with the patch that
can be found in the \verb|dhcp.bootp.rarp| subdirectory of
the same ftp site {\em does\/} work,
at least on Ethernet and Token Ring.

\# {\bf Usage: \verb|ifone ADDRESS[/PREFIX-LENGTH] [DEVICE]|}\\
\# {\bf Parameters:}\\
\# \$1 --- Static IP address, optionally followed by prefix length.\\
\# \$2 --- Device name. If it is missing, \verb|eth0| is assumed.\\
\# E.g. \verb|ifone 193.233.7.90|

\# Parse IP address, splitting prefix length.
\begin{verbatim}
if [ "$1" != "" ]; then
  if [ "$1" != "$ipaddr" ]; then
pfx="${ipaddr}/${pfxlen}"
\end{verbatim}

\# {\bf Step 0} --- enable loopback.\\
\# This step is necessary on any networked box before attempting\\
\# to configure any other device.\\
\begin{verbatim}
ip link set up dev lo
ip addr add 127.0.0.1/8 dev lo brd + scope host
\end{verbatim}
\# IPv6 autoconfigures itself on loopback.\\
\# If the user gave loopback as the device, we add the address as an alias and exit.
\begin{verbatim}
if [ "$dev" = "lo" ]; then
  if [ "$ipaddr" != "" -a "$ipaddr" != "127.0.0.1" ]; then
    ip address add $ipaddr dev $dev
\end{verbatim}

\noindent\# {\bf Step 1} --- enable device \verb|$dev|
\begin{verbatim}
if ! ip link set up dev $dev ; then
  echo "Cannot enable interface $dev. Aborting." 1>&2
\end{verbatim}
\# The interface is \verb|UP|. IPv6 started stateless autoconfiguration itself,\\
\# and its configuration finishes here. However,\\
\# IP still needs some static preconfigured address.
\begin{verbatim}
if [ "$ipaddr" = "" ]; then
  echo "No address for $dev is configured, trying DHCP..." 1>&2
\end{verbatim}

\# {\bf Step 2} --- IP Duplicate Address Detection~\cite{RFC-DHCP}.\\
\# Send two probes and wait for result for 3 seconds.\\
\# If the interface opens slower, e.g.\ due to long media detection,\\
\# you may want to increase the timeout.\\
\begin{verbatim}
if ! arping -q -c 2 -w 3 -D -I $dev $ipaddr ; then
  echo "Address $ipaddr is busy, trying DHCP..." 1>&2
\end{verbatim}
\# OK, the address is unique, we may add it on the interface.\\

\# {\bf Step 3} --- Configure the address on the interface.
\begin{verbatim}
if ! ip address add $pfx brd + dev $dev; then
  echo "Failed to add $pfx on $dev, trying DHCP..." 1>&2
\end{verbatim}

\noindent\# {\bf Step 4} --- Announce our presence on the link.
\begin{verbatim}
arping -A -c 1 -I $dev $ipaddr
arping -U -c 1 -I $dev $ipaddr ) >& /dev/null </dev/null &
\end{verbatim}
\# {\bf Step 5} (optional) --- Add some control routes.\\
\# 1. Prohibit link local multicast addresses.\\
\# 2. Prohibit link local (alias, limited) broadcast.\\
\# 3. Add default multicast route.
\begin{verbatim}
ip route add unreachable 224.0.0.0/24
ip route add unreachable 255.255.255.255
if [ `ip link ls $dev | grep -c MULTICAST` -ge 1 ]; then
  ip route add 224.0.0.0/4 dev $dev scope global
\end{verbatim}

\# {\bf Step 6} --- Add fallback default route with huge metric.\\
\# If a proxy ARP server is present on the interface, we will be\\
\# able to talk to all the Internet without further configuration.\\
\# It is not so cheap though and we still hope that this route\\
\# will be overridden by a more correct one by rdisc.\\
\# Do not perform this step if the device is not ARPable,\\
\# because dead nexthop detection does not work on such devices.
\begin{verbatim}
if [ "$noarp" = "0" ]; then
  ip ro add default dev $dev metric 30000 scope global
\end{verbatim}
\# {\bf Step 7} --- Restart router discovery and exit.
\begin{verbatim}
killall -HUP rdisc || rdisc -fs
\end{verbatim}
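
The address parsing step at the start of the script (``Parse IP address, splitting
prefix length'') can be tried in isolation with the sketch below. The function name
\verb|split_prefix| is hypothetical, and the fallback length of 24 used when no
prefix length is given is an assumption of this sketch, not a value taken from
the text.

```shell
#!/bin/bash
# split_prefix ADDRESS[/LENGTH] [DEFAULT_LENGTH]
# Sets the variables ipaddr, pfxlen and pfx.
split_prefix() {
    ipaddr=${1%/*}               # text before the slash, if any
    pfxlen=
    if [ "$1" != "$ipaddr" ]; then
        pfxlen=${1#*/}           # text after the slash
    fi
    # Assumed fallback when no length was given: $2, else 24.
    : "${pfxlen:=${2:-24}}"
    pfx="${ipaddr}/${pfxlen}"
}
```

After \verb|split_prefix 193.233.7.90/25| the variable \verb|pfx| holds
\verb|193.233.7.90/25|; after \verb|split_prefix 193.233.7.90| it holds the
address with the assumed default length appended.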

\section{Example: {\protect\tt ifcfg} --- interface address management}
\label{EXAMPLE-IFCFG}

This is a simplistic script replacing one option of \verb|ifconfig|,
namely, IP address management. It not only adds
addresses, but also carries out Duplicate Address Detection~\cite{RFC-DHCP},
sends unsolicited ARP to update the caches of other hosts sharing
the interface, adds some control routes and restarts Router Discovery
when it is necessary.

I strongly recommend using it {\em instead\/} of \verb|ifconfig| both
on hosts and on routers.

\# {\bf Usage: \verb?ifcfg DEVICE[:ALIAS] [add|del] ADDRESS[/LENGTH] [PEER]?}\\
\# {\bf Parameters:}\\
\# ---Device name. It may have an alias suffix, separated by a colon.\\
\# ---Command: add, delete or stop.\\
\# ---IP address, optionally followed by prefix length.\\
\# ---Optional peer address for point-to-point interfaces.\\
\# E.g. \verb|ifcfg eth0 193.233.7.90/24|

\noindent\# This function determines whether this is a router or a host.\\
\# It returns 0 if the host is apparently not a router.
\begin{verbatim}
CheckForwarding () {
  sbase=/proc/sys/net/ipv4/conf
  if [ -d $sbase ]; then
    for dir in $sbase/*/forwarding; do
      fwd=$[$fwd + `cat $dir`]
\end{verbatim}

\# This function restarts Router Discovery.\\
\begin{verbatim}
killall -HUP rdisc || rdisc -fs
\end{verbatim}
\# Calculate ABC ``natural'' mask length\\
\# Arg: \$1 = dotquad address
\begin{verbatim}
if [ $class -eq 0 -o $class -ge 224 ]; then return 0
elif [ $class -ge 192 ]; then return 24
elif [ $class -ge 128 ]; then return 16
\end{verbatim}

\# Strip alias suffix separated by colon.
\begin{verbatim}
if [ "$dev" = "" -o "$1" = "help" ]; then
  echo "Usage: ifcfg DEV [[add|del [ADDR[/LEN]] [PEER] | stop]" 1>&2
  echo " add - add new address" 1>&2
  echo " del - delete address" 1>&2
  echo " stop - completely disable IP" 1>&2
\end{verbatim}
\# Parse command. If it is ``stop'', flush and exit.
\begin{verbatim}
    if [ "$ldev" != "$dev" ]; then
      echo "Cannot stop alias $ldev" 1>&2
    ip -4 addr flush dev $dev $label || exit 1
    if [ $fwd -eq 0 ]; then RestartRDISC; fi
    deleting=1; shift ;;
\end{verbatim}

\# Parse prefix, split prefix length, separated by slash.
\begin{verbatim}
if [ "$1" != "" ]; then
  if [ "$1" != "$ipaddr" ]; then
  if [ "$ipaddr" = "" ]; then
    echo "$1 is bad IP address." 1>&2
\end{verbatim}
\# If peer address is present, prefix length is 32.\\
\# Otherwise, if prefix length was not given, guess it.
\begin{verbatim}
if [ "$peer" != "" ]; then
  if [ "$pfxlen" != "" -a "$pfxlen" != "32" ]; then
    echo "Peer address with non-trivial netmask." 1>&2
  pfx="$ipaddr peer $peer"
  if [ "$pfxlen" = "" ]; then
  pfx="$ipaddr/$pfxlen"
if [ "$ldev" = "$dev" -a "$ipaddr" != "" ]; then
\end{verbatim}

\# If deletion was requested, delete the address and restart RDISC
\begin{verbatim}
if [ $deleting -ne 0 ]; then
  ip addr del $pfx dev $dev $label || exit 1
  if [ $fwd -eq 0 ]; then RestartRDISC; fi
\end{verbatim}
\# Start interface initialization.\\

\# {\bf Step 0} --- enable device \verb|$dev|
\begin{verbatim}
if ! ip link set up dev $dev ; then
  echo "Error: cannot enable interface $dev." 1>&2
if [ "$ipaddr" = "" ]; then exit 0; fi
\end{verbatim}

\# {\bf Step 1} --- IP Duplicate Address Detection~\cite{RFC-DHCP}.\\
\# Send two probes and wait for result for 3 seconds.\\
\# If the interface opens slower, e.g.\ due to long media detection,\\
\# you may want to increase the timeout.\\
\begin{verbatim}
if ! arping -q -c 2 -w 3 -D -I $dev $ipaddr ; then
  echo "Error: some host already uses address $ipaddr on $dev." 1>&2
\end{verbatim}
\# OK, the address is unique. We may add it to the interface.\\

\# {\bf Step 2} --- Configure the address on the interface.
\begin{verbatim}
if ! ip address add $pfx brd + dev $dev $label; then
  echo "Error: failed to add $pfx on $dev." 1>&2
\end{verbatim}

\noindent\# {\bf Step 3} --- Announce our presence on the link
\begin{verbatim}
arping -q -A -c 1 -I $dev $ipaddr
arping -q -U -c 1 -I $dev $ipaddr ) >& /dev/null </dev/null &
\end{verbatim}
\# {\bf Step 4} (optional) --- Add some control routes.\\
\# 1. Prohibit link local multicast addresses.\\
\# 2. Prohibit link local (alias, limited) broadcast.\\
\# 3. Add default multicast route.
\begin{verbatim}
ip route add unreachable 224.0.0.0/24 >& /dev/null
ip route add unreachable 255.255.255.255 >& /dev/null
if [ `ip link ls $dev | grep -c MULTICAST` -ge 1 ]; then
  ip route add 224.0.0.0/4 dev $dev scope global >& /dev/null
\end{verbatim}

\# {\bf Step 5} --- Add fallback default route with huge metric.\\
\# If a proxy ARP server is present on the interface, we will be\\
\# able to talk to all the Internet without further configuration.\\
\# Do not perform this step on a router or if the device is not ARPable,\\
\# because dead nexthop detection does not work on them.
\begin{verbatim}
if [ $fwd -eq 0 ]; then
  if [ $noarp -eq 0 ]; then
    ip ro append default dev $dev metric 30000 scope global
  elif [ "$peer" != "" ]; then
    if ping -q -c 2 -w 4 $peer ; then
      ip ro append default via $peer dev $dev metric 30001
\end{verbatim}
\# End of {\bf MAIN()}
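
Two of the helper steps of the script, the forwarding check and the ``natural''
mask calculation, can be tried on their own with the sketch below. Both function
names are hypothetical, the functions print their result instead of returning it,
and the optional base directory argument exists only so that the check can be
pointed at a scratch directory. The final branch, 8 bits for the remaining
class A range, follows the classful convention.

```shell
#!/bin/bash
# count_forwarding [BASE]
# Prints the sum of the per-interface "forwarding" flags under BASE
# (default /proc/sys/net/ipv4/conf). A result of 0 suggests a host.
count_forwarding() {
    local sbase=${1:-/proc/sys/net/ipv4/conf} fwd=0 file value
    if [ -d "$sbase" ]; then
        for file in "$sbase"/*/forwarding; do
            [ -r "$file" ] || continue
            read -r value < "$file"
            fwd=$(( fwd + value ))
        done
    fi
    echo "$fwd"
}

# natural_pfxlen A.B.C.D
# Prints the classful prefix length chosen by the first octet.
natural_pfxlen() {
    local class=${1%%.*}
    if [ "$class" -eq 0 ] || [ "$class" -ge 224 ]; then echo 0
    elif [ "$class" -ge 192 ]; then echo 24
    elif [ "$class" -ge 128 ]; then echo 16
    else echo 8
    fi
}
```

For example, \verb|natural_pfxlen 193.233.7.90| prints 24, and
\verb|count_forwarding| prints 0 on a box where forwarding is disabled on
every interface.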