1 \documentstyle[12pt,twoside
]{article
}
2 \def\TITLE{IP Command Reference
}
5 \Large\bf IP Command Reference.
10 { \large Alexey~N.~Kuznetsov
} \\
11 \em Institute for Nuclear Research, Moscow \\
12 \verb|kuznet@ms2.inr.ac.ru| \\
22 \section{About this
document}
24 This
document presents a comprehensive description of the
\verb|ip| utility
25 from the
\verb|iproute2| package. It is not a tutorial or user's guide.
26 It is a
{\em dictionary\/
}, not explaining terms,
27 but translating them into other terms, which may also be unknown to the reader.
28 However, the
document is self-contained and the reader, provided they have a
29 basic networking background, will find enough information
30 and examples to understand and configure Linux-
2.2 IP and IPv6
33 This
document is split into sections explaining
\verb|ip| commands
34 and options, decrypting
\verb|ip| output and containing a few examples.
35 More voluminous examples and some topics, which require more elaborate
36 discussion, are in the appendix.
38 The paragraphs beginning with NB contain side notes, warnings about
39 bugs and design drawbacks. They may be skipped at the first reading.
41 \section{{\tt ip
} --- command syntax
}
43 The generic form of an
\verb|ip| command is:
45 ip
[ OPTIONS
] OBJECT
[ COMMAND
[ ARGUMENTS
]]
47 where
\verb|OPTIONS| is a set of optional modifiers affecting the
48 general behaviour of the
\verb|ip| utility or changing its output. All options
49 begin with the character
\verb|'-'| and may be used in either long or abbreviated
50 forms. Currently, the following options are available:
53 \item \verb|-V|,
\verb|-Version|
55 --- print the version of the
\verb|ip| utility and exit.
58 \item \verb|-s|,
\verb|-stats|,
\verb|-statistics|
60 --- output more information. If the option
61 appears twice or more, the amount of information increases.
62 As a rule, the information is statistics or some time values.
65 \item \verb|-f|,
\verb|-family| followed by a protocol family
66 identifier:
\verb|inet|,
\verb|inet6| or
\verb|link|.
68 --- enforce the protocol family to use. If the option is not present,
69 the protocol family is guessed from other arguments. If the rest of the command
70 line does not give enough information to guess the family,
\verb|ip| falls back to the default
71 one, usually
\verb|inet| or
\verb|any|.
\verb|link| is a special family
72 identifier meaning that no networking protocol is involved.
76 --- shortcut for
\verb|-family inet|.
80 --- shortcut for
\verb|-family inet6|.
84 --- shortcut for
\verb|-family link|.
87 \item \verb|-o|,
\verb|-oneline|
89 --- output each record on a single line, replacing line feeds
90 with the
\verb|'\'| character. This is convenient when you want to
91 count records with
\verb|wc| or to
\verb|grep| the output. The trivial
92 script
\verb|rtpr| converts the output back into readable form.
94 \item \verb|-r|,
\verb|-resolve|
96 --- use the system's name resolver to print DNS names instead of
100 Do not use this option when reporting bugs or asking for advice.
103 \verb|ip| never uses DNS to resolve names to addresses.
108 \verb|OBJECT| is the object to manage or to get information about.
109 The object types currently understood by
\verb|ip| are:
112 \item \verb|link| --- network device
113 \item \verb|address| --- protocol (IP or IPv6) address on a device
114 \item \verb|neighbour| --- ARP or NDISC cache entry
115 \item \verb|route| --- routing table entry
116 \item \verb|rule| --- rule in routing policy database
117 \item \verb|maddress| --- multicast address
118 \item \verb|mroute| --- multicast routing cache entry
119 \item \verb|tunnel| --- tunnel over IP
122 Again, the names of all objects may be written in full or
123 abbreviated form, f.e.\
\verb|address| is abbreviated as
\verb|addr|
126 \verb|COMMAND| specifies the action to perform on the object.
127 The set of possible actions depends on the object type.
128 As a rule, it is possible to
\verb|add|,
\verb|delete| and
129 \verb|show| (or
\verb|list|) objects, but some objects
130 do not allow all of these operations or have some additional commands.
131 The
\verb|help| command is available for all objects. It prints
132 out a list of available commands and argument syntax conventions.
134 If no command is given, some default command is assumed.
135 Usually it is
\verb|list| or, if the objects of this class
136 cannot be listed,
\verb|help|.
138 \verb|ARGUMENTS| is a list of arguments to the command.
139 The arguments depend on the command and object. There are two types of arguments:
140 {\em flags\/
}, consisting of a single keyword, and
{\em parameters\/
},
141 consisting of a keyword followed by a value. For convenience,
142 each command has some
{\em default parameter\/
}
143 which may be omitted. F.e.\ parameter
\verb|dev| is the default
144 for the
{\tt ip link
} command, so
{\tt ip link ls eth0
} is equivalent
145 to
{\tt ip link ls dev eth0
}.
146 In the command descriptions below such parameters
147 are distinguished with the marker: ``(default)''.
149 Almost all keywords may be abbreviated with several first (or even single)
150 letters. The shortcuts are convenient when
\verb|ip| is used interactively,
151 but they are not recommended in scripts or when reporting bugs
152 or asking for advice. ``Officially'' allowed abbreviations are listed
153 in the
document body.
157 \section{{\tt ip
} --- error messages
}
159 \verb|ip| may fail for one of the following reasons:
163 A syntax error on the command line: an unknown keyword, incorrectly formatted
164 IP address
{\em et al\/
}. In this case
\verb|ip| prints an error message
165 and exits. As a rule, the error message will contain information
166 about the reason for the failure. Sometimes it also prints a help page.
169 The arguments did not pass verification for self-consistency.
172 \verb|ip| failed to compile a kernel request from the arguments
173 because the user didn't give enough information.
176 The kernel returned an error to some syscall. In this case
\verb|ip|
177 prints the error message, as it is output with
\verb|perror(
3)|,
178 prefixed with a comment and a syscall identifier.
181 The kernel returned an error to some RTNETLINK request.
182 In this case
\verb|ip| prints the error message, as it is output
183 with
\verb|perror(
3)| prefixed with ``RTNETLINK answers:''.
187 All the operations are atomic, i.e.\
188 if the
\verb|ip| utility fails, it does not change anything
189 in the system. One harmful exception is
\verb|ip link| command
190 (Sec.
\ref{IP-LINK
}, p.
\pageref{IP-LINK
}),
191 which may change only some of the device parameters given
194 It is difficult to list all the error messages (especially
195 syntax errors). However, as a rule, their meaning is clear
196 from the context of the command.
198 The most common mistakes are:
201 \item Netlink is not configured in the kernel. The message is:
203 Cannot open netlink socket: Invalid value
206 \item RTNETLINK is not configured in the kernel. In this case
207 one of the following messages may be printed, depending on the command:
209 Cannot talk to rtnetlink: Connection refused
210 Cannot send dump request: Connection refused
213 \item The
\verb|CONFIG_IP_MULTIPLE_TABLES| option was not selected
214 when configuring the kernel. In this case any attempt to use the
215 \verb|ip|
\verb|rule| command will fail, f.e.
217 kuznet@kaiser $ ip rule list
218 RTNETLINK error: Invalid argument
225 \section{{\tt ip link
} --- network device configuration
}
228 \paragraph{Object:
} A
\verb|link| is a network device and the corresponding
229 commands display and change the state of devices.
231 \paragraph{Commands:
} \verb|set| and
\verb|show| (or
\verb|list|).
233 \subsection{{\tt ip link set
} --- change device attributes
}
235 \paragraph{Abbreviations:
} \verb|set|,
\verb|s|.
237 \paragraph{Arguments:
}
240 \item \verb|dev NAME| (default)
242 ---
\verb|NAME| specifies the network device on which to operate.
244 \item \verb|up| and
\verb|down|
246 --- change the state of the device to
\verb|UP| or
\verb|DOWN|.
248 \item \verb|arp on| or
\verb|arp off|
250 --- change the
\verb|NOARP| flag on the device.
253 This operation is
{\em not allowed\/
} if the device is in state
\verb|UP|.
254 Though neither the
\verb|ip| utility nor the kernel check for this condition.
255 You can get unpredictable results changing this flag while the
259 \item \verb|multicast on| or
\verb|multicast off|
261 --- change the
\verb|MULTICAST| flag on the device.
263 \item \verb|dynamic on| or
\verb|dynamic off|
265 --- change the
\verb|DYNAMIC| flag on the device.
267 \item \verb|name NAME|
269 --- change the name of the device. This operation is not
270 recommended if the device is running or has some addresses
273 \item \verb|txqueuelen NUMBER| or
\verb|txqlen NUMBER|
275 --- change the transmit queue length of the device.
277 \item \verb|mtu NUMBER|
279 --- change the MTU of the device.
281 \item \verb|address LLADDRESS|
283 --- change the station address of the interface.
285 \item \verb|broadcast LLADDRESS|,
\verb|brd LLADDRESS| or
\verb|peer LLADDRESS|
287 --- change the link layer broadcast address or the peer address when
288 the interface is
\verb|POINTOPOINT|.
292 For most devices (f.e.\ for Ethernet) changing the link layer
293 broadcast address will break networking.
294 Do not use it, if you do not understand what this operation really does.
301 The
\verb|PROMISC| and
\verb|ALLMULTI| flags are considered
302 obsolete and should not be changed administratively, though
303 the
{\tt ip
} utility will allow that.
306 \paragraph{Warning:
} If multiple parameter changes are requested,
307 \verb|ip| aborts immediately after any of the changes have failed.
308 This is the only case when
\verb|ip| can move the system to
309 an unpredictable state. The solution is to avoid changing
310 several parameters with one
{\tt ip link set
} call.
312 \paragraph{Examples:
}
314 \item \verb|ip link set dummy address
00:
00:
00:
00:
00:
01|
316 --- change the station address of the interface
\verb|dummy|.
318 \item \verb|ip link set dummy up|
320 --- start the interface
\verb|dummy|.
325 \subsection{{\tt ip link show
} --- display device attributes
}
328 \paragraph{Abbreviations:
} \verb|show|,
\verb|list|,
\verb|lst|,
\verb|sh|,
\verb|ls|,
331 \paragraph{Arguments:
}
333 \item \verb|dev NAME| (default)
335 ---
\verb|NAME| specifies the network device to show.
336 If this argument is omitted all devices are listed.
340 --- only display running interfaces.
345 \paragraph{Output format:
}
348 kuznet@alisa:~ $ ip link ls eth0
349 3: eth0: <BROADCAST,MULTICAST,UP> mtu
1500 qdisc cbq qlen
100
350 link/ether
00:a0:cc:
66:
18:
78 brd ff:ff:ff:ff:ff:ff
351 kuznet@alisa:~ $ ip link ls sit0
352 5: sit0@NONE: <NOARP,UP> mtu
1480 qdisc noqueue
353 link/sit
0.0.0.0 brd
0.0.0.0
354 kuznet@alisa:~ $ ip link ls dummy
355 2: dummy: <BROADCAST,NOARP> mtu
1500 qdisc noop
356 link/ether
00:
00:
00:
00:
00:
00 brd ff:ff:ff:ff:ff:ff
361 The number before each colon is an
{\em interface index\/
} or
{\em ifindex\/
}.
362 This number uniquely identifies the interface. This is followed by the
{\em interface name\/
}
363 (
\verb|eth0|,
\verb|sit0| etc.). The interface name is also
364 unique at every given moment. However, the interface may disappear from the
365 list (f.e.\ when the corresponding driver module is unloaded) and another
366 one with the same name may be created later. Besides that,
367 the administrator may change the name of any device with
368 \verb|ip|
\verb|link|
\verb|set|
\verb|name|
369 to make it more intelligible.
371 The interface name may have another name or
\verb|NONE| appended
372 after the
\verb|@| sign. This means that this device is bound to some other
374 i.e.\ packets send through it are encapsulated and sent via the ``master''
375 device. If the name is
\verb|NONE|, the master is unknown.
377 Then we see the interface
{\em mtu\/
} (``maximal transfer unit''). This determines
378 the maximal size of data which can be sent as a single packet over this interface.
380 {\em qdisc\/
} (``queuing discipline'') shows the queuing algorithm used
381 on the interface. Particularly,
\verb|noqueue| means that this interface
382 does not queue anything and
\verb|noop| means that the interface is in blackhole
383 mode i.e.\ all packets sent to it are immediately discarded.
384 {\em qlen\/
} is the default transmit queue length of the device measured
387 The interface flags are summarized in the angle brackets.
390 \item \verb|UP| --- the device is turned on. It is ready to accept
391 packets for transmission and it may inject into the kernel packets received
392 from other nodes on the network.
394 \item \verb|LOOPBACK| --- the interface does not communicate with other
395 hosts. All packets sent through it will be returned
396 and nothing but bounced packets can be received.
398 \item \verb|BROADCAST| --- the device has the facility to send packets
399 to all hosts sharing the same link. A typical example is an Ethernet link.
401 \item \verb|POINTOPOINT| --- the link has only two ends with one node
402 attached to each end. All packets sent to this link will reach the peer
403 and all packets received by us came from this single peer.
405 If neither
\verb|LOOPBACK| nor
\verb|BROADCAST| nor
\verb|POINTOPOINT|
406 are set, the interface is assumed to be NMBA (Non-Broadcast Multi-Access).
407 This is the most generic type of device and the most complicated one, because
408 the host attached to a NBMA link has no means to send to anyone
409 without additionally configured information.
411 \item \verb|MULTICAST| --- is an advisory flag indicating that the interface
412 is aware of multicasting i.e.\ sending packets to some subset of neighbouring
413 nodes. Broadcasting is a particular case of multicasting, where the multicast
414 group consists of all nodes on the link. It is important to emphasize
415 that software
{\em must not\/
} interpret the absence of this flag as the inability
416 to use multicasting on this interface. Any
\verb|POINTOPOINT| and
417 \verb|BROADCAST| link is multicasting by definition, because we have
418 direct access to all the neighbours and, hence, to any part of them.
419 Certainly, the use of high bandwidth multicast transfers is not recommended
420 on broadcast-only links because of high expense, but it is not strictly
423 \item \verb|PROMISC| --- the device listens to and feeds to the kernel all
424 traffic on the link even if it is not destined for us, not broadcasted
425 and not destined for a multicast group of which we are member. Usually
426 this mode exists only on broadcast links and is used by bridges and for network
429 \item \verb|ALLMULTI| --- the device receives all multicast packets
430 wandering on the link. This mode is used by multicast routers.
432 \item \verb|NOARP| --- this flag is different from the other ones. It has
433 no invariant value and its interpretation depends on the network protocols
434 involved. As a rule, it indicates that the device needs no address
435 resolution and that the software or hardware knows how to deliver packets
436 without any help from the protocol stacks.
438 \item \verb|DYNAMIC| --- is an advisory flag indicating that the interface is
439 dynamically created and destroyed.
441 \item \verb|SLAVE| --- this interface is bonded to some other interfaces
442 to share link capacities.
448 There are other flags but they are either obsolete (
\verb|NOTRAILERS|)
449 or not implemented (
\verb|DEBUG|) or specific to some devices
450 (
\verb|MASTER|,
\verb|AUTOMEDIA| and
\verb|PORTSEL|). We do not discuss
455 The second line contains information on the link layer addresses
456 associated with the device. The first word (
\verb|ether|,
\verb|sit|)
457 defines the interface hardware type. This type determines the format and semantics
458 of the addresses and is logically part of the address.
459 The default format of the station address and the broadcast address
460 (or the peer address for pointopoint links) is a
461 sequence of hexadecimal bytes separated by colons, but some link
462 types may have their natural address format, f.e.\ addresses
463 of tunnels over IP are printed as dotted-quad IP addresses.
467 NBMA links have no well-defined broadcast or peer address,
468 however this field may contain useful information, f.e.\
469 about the address of broadcast relay or about the address of the ARP server.
472 Multicast addresses are not shown by this command, see
473 \verb|ip maddr ls| in~Sec.
\ref{IP-MADDR
} (p.
\pageref{IP-MADDR
} of this
478 \paragraph{Statistics:
} With the
\verb|-statistics| option,
\verb|ip| also
479 prints interface statistics:
482 kuznet@alisa:~ $ ip -s link ls eth0
483 3: eth0: <BROADCAST,MULTICAST,UP> mtu
1500 qdisc cbq qlen
100
484 link/ether
00:a0:cc:
66:
18:
78 brd ff:ff:ff:ff:ff:ff
485 RX: bytes packets errors dropped overrun mcast
486 2449949362 2786187 0 0 0 0
487 TX: bytes packets errors dropped carrier collsns
488 178558497 1783945 332 0 332 35172
491 \verb|RX:| and
\verb|TX:| lines summarize receiver and transmitter
492 statistics. They contain:
494 \item \verb|bytes| --- the total number of bytes received or transmitted
495 on the interface. This number wraps when the maximal length of the data type
496 natural for the architecture is exceeded, so continuous monitoring requires
497 a user level daemon snapping it periodically.
498 \item \verb|packets| --- the total number of packets received or transmitted
500 \item \verb|errors| --- the total number of receiver or transmitter errors.
501 \item \verb|dropped| --- the total number of packets dropped due to lack
503 \item \verb|overrun| --- the total number of receiver overruns resulting
504 in dropped packets. As a rule, if the interface is overrun, it means
505 serious problems in the kernel or that your machine is too slow
507 \item \verb|mcast| --- the total number of received multicast packets. This option
508 is only supported by a few devices.
509 \item \verb|carrier| --- total number of link media failures f.e.\ because
511 \item \verb|collsns| --- the total number of collision events
512 on Ethernet-like media. This number may have a different sense on other
514 \item \verb|compressed| --- the total number of compressed packets. This is
515 available only for links using VJ header compression.
519 If the
\verb|-s| option is entered twice or more,
520 \verb|ip| prints more detailed statistics on receiver
521 and transmitter errors.
524 kuznet@alisa:~ $ ip -s -s link ls eth0
525 3: eth0: <BROADCAST,MULTICAST,UP> mtu
1500 qdisc cbq qlen
100
526 link/ether
00:a0:cc:
66:
18:
78 brd ff:ff:ff:ff:ff:ff
527 RX: bytes packets errors dropped overrun mcast
528 2449949362 2786187 0 0 0 0
529 RX errors: length crc frame fifo missed
531 TX: bytes packets errors dropped carrier collsns
532 178558497 1783945 332 0 332 35172
533 TX errors: aborted fifo window heartbeat
537 These error names are pure Ethernetisms. Other devices
538 may have non zero values in these fields but they may be
539 interpreted differently.
542 \section{{\tt ip address
} --- protocol address management
}
544 \paragraph{Abbreviations:
} \verb|address|,
\verb|addr|,
\verb|a|.
546 \paragraph{Object:
} The
\verb|address| is a protocol (IP or IPv6) address attached
547 to a network device. Each device must have at least one address
548 to use the corresponding protocol. It is possible to have several
549 different addresses attached to one device. These addresses are not
550 discriminated, so that the term
{\em alias\/
} is not quite appropriate
551 for them and we do not use it in this
document.
553 The
\verb|ip addr| command displays addresses and their properties,
554 adds new addresses and deletes old ones.
556 \paragraph{Commands:
} \verb|add|,
\verb|delete|,
\verb|flush| and
\verb|show|
560 \subsection{{\tt ip address add
} --- add a new protocol address
}
563 \paragraph{Abbreviations:
} \verb|add|,
\verb|a|.
565 \paragraph{Arguments:
}
568 \item \verb|dev NAME|
570 \noindent--- the name of the device to add the address to.
572 \item \verb|local ADDRESS| (default)
574 --- the address of the interface. The format of the address depends
575 on the protocol. It is a dotted quad for IP and a sequence of hexadecimal halfwords
576 separated by colons for IPv6. The
\verb|ADDRESS| may be followed by
577 a slash and a decimal number which encodes the network prefix length.
580 \item \verb|peer ADDRESS|
582 --- the address of the remote endpoint for pointopoint interfaces.
583 Again, the
\verb|ADDRESS| may be followed by a slash and a decimal number,
584 encoding the network prefix length. If a peer address is specified,
585 the local address
{\em cannot\/
} have a prefix length. The network prefix is associated
586 with the peer rather than with the local address.
589 \item \verb|broadcast ADDRESS|
591 --- the broadcast address on the interface.
593 It is possible to use the special symbols
\verb|'+'| and
\verb|'-'|
594 instead of the broadcast address. In this case, the broadcast address
595 is derived by setting/resetting the host bits of the interface prefix.
599 Unlike
\verb|ifconfig|, the
\verb|ip| utility
{\em does not\/
} set any broadcast
600 address unless explicitly requested.
604 \item \verb|label NAME|
606 --- Each address may be tagged with a label string.
607 In order to preserve compatibility with Linux-
2.0 net aliases,
608 this string must coincide with the name of the device or must be prefixed
609 with the device name followed by colon.
612 \item \verb|scope SCOPE_VALUE|
614 --- the scope of the area where this address is valid.
615 The available scopes are listed in file
\verb|/etc/iproute2/rt_scopes|.
616 Predefined scope values are:
619 \item \verb|global| --- the address is globally valid.
620 \item \verb|site| --- (IPv6 only) the address is site local,
621 i.e.\ it is valid inside this site.
622 \item \verb|link| --- the address is link local, i.e.\
623 it is valid only on this device.
624 \item \verb|host| --- the address is valid only inside this host.
627 Appendix~
\ref{ADDR-SEL
} (p.
\pageref{ADDR-SEL
} of this
document)
628 contains more details on address scopes.
632 \paragraph{Examples:
}
634 \item \verb|ip addr add
127.0.0.1/
8 dev lo brd + scope host|
636 --- add the usual loopback address to the loopback device.
638 \item \verb|ip addr add
10.0.0.1/
24 brd + dev eth0 label eth0:Alias|
640 --- add the address
10.0.0.1 with prefix length
24 (i.e.\ netmask
641 \verb|
255.255.255.0|), standard broadcast and label
\verb|eth0:Alias|
642 to the interface
\verb|eth0|.
646 \subsection{{\tt ip address delete
} --- delete a protocol address
}
648 \paragraph{Abbreviations:
} \verb|delete|,
\verb|del|,
\verb|d|.
650 \paragraph{Arguments:
} coincide with the arguments of
\verb|ip addr add|.
651 The device name is a required argument. The rest are optional.
652 If no arguments are given, the first address is deleted.
654 \paragraph{Examples:
}
656 \item \verb|ip addr del
127.0.0.1/
8 dev lo|
658 --- deletes the loopback address from the loopback device.
659 It would be best not to repeat this experiment.
661 \item Disable IP on the interface
\verb|eth0|:
663 while ip -f inet addr del dev eth0; do
667 Another method to disable IP on an interface using
{\tt ip addr flush
}
668 may be found in sec.
\ref{IP-ADDR-FLUSH
}, p.
\pageref{IP-ADDR-FLUSH
}.
673 \subsection{{\tt ip address show
} --- display protocol addresses
}
675 \paragraph{Abbreviations:
} \verb|show|,
\verb|list|,
\verb|lst|,
\verb|sh|,
\verb|ls|,
678 \paragraph{Arguments:
}
681 \item \verb|dev NAME| (default)
683 --- the name of the device.
685 \item \verb|scope SCOPE_VAL|
687 --- only list addresses with this scope.
689 \item \verb|to PREFIX|
691 --- only list addresses matching this prefix.
693 \item \verb|label PATTERN|
695 --- only list addresses with labels matching the
\verb|PATTERN|.
696 \verb|PATTERN| is a usual shell style pattern.
699 \item \verb|dynamic| and
\verb|permanent|
701 --- (IPv6 only) only list addresses installed due to stateless
702 address configuration or only list permanent (not dynamic) addresses.
704 \item \verb|tentative|
706 --- (IPv6 only) only list addresses which did not pass duplicate
709 \item \verb|deprecated|
711 --- (IPv6 only) only list deprecated addresses.
714 \item \verb|primary| and
\verb|secondary|
716 --- only list primary (or secondary) addresses.
721 \paragraph{Output format:
}
724 kuznet@alisa:~ $ ip addr ls eth0
725 3: eth0: <BROADCAST,MULTICAST,UP> mtu
1500 qdisc cbq qlen
100
726 link/ether
00:a0:cc:
66:
18:
78 brd ff:ff:ff:ff:ff:ff
727 inet
193.233.7.90/
24 brd
193.233.7.255 scope global eth0
728 inet6
3ffe:
2400:
0:
1:
2a0:ccff:fe66:
1878/
64 scope global dynamic
729 valid_lft forever preferred_lft
604746sec
730 inet6 fe80::
2a0:ccff:fe66:
1878/
10 scope link
734 The first two lines coincide with the output of
\verb|ip link ls|.
735 It is natural to interpret link layer addresses
736 as addresses of the protocol family
\verb|AF_PACKET|.
738 Then the list of IP and IPv6 addresses follows, accompanied by
739 additional address attributes: scope value (see Sec.
\ref{IP-ADDR-ADD
},
740 p.
\pageref{IP-ADDR-ADD
} above), flags and the address label.
742 Address flags are set by the kernel and cannot be changed
743 administratively. Currently, the following flags are defined:
746 \item \verb|secondary|
748 --- the address is not used when selecting the default source address
749 of outgoing packets (Cf.\ Appendix~
\ref{ADDR-SEL
}, p.
\pageref{ADDR-SEL
}.).
750 An IP address becomes secondary if another address with the same
751 prefix bits already exists. The first address is primary.
752 It is the leader of the group of all secondary addresses. When the leader
753 is deleted, all secondaries are purged too.
758 --- the address was created due to stateless autoconfiguration~
\cite{RFC-ADDRCONF
}.
759 In this case the output also contains information on times, when
760 the address is still valid. After
\verb|preferred_lft| expires the address is
761 moved to the deprecated state. After
\verb|valid_lft| expires the address
762 is finally invalidated.
764 \item \verb|deprecated|
766 --- the address is deprecated, i.e.\ it is still valid, but cannot
767 be used by newly created connections.
769 \item \verb|tentative|
771 --- the address is not used because duplicate address detection~
\cite{RFC-ADDRCONF
}
772 is still not complete or failed.
777 \subsection{{\tt ip address flush
} --- flush protocol addresses
}
778 \label{IP-ADDR-FLUSH
}
780 \paragraph{Abbreviations:
} \verb|flush|,
\verb|f|.
782 \paragraph{Description:
}This command flushes the protocol addresses
783 selected by some criteria.
785 \paragraph{Arguments:
} This command has the same arguments as
\verb|show|.
786 The difference is that it does not run when no arguments are given.
788 \paragraph{Warning:
} This command (and other
\verb|flush| commands
789 described below) is pretty dangerous. If you make a mistake, it will
790 not forgive it, but will cruelly purge all the addresses.
792 \paragraph{Statistics:
} With the
\verb|-statistics| option, the command
793 becomes verbose. It prints out the number of deleted addresses and the number
794 of rounds made to flush the address list. If this option is given
795 twice,
\verb|ip addr flush| also dumps all the deleted addresses
796 in the format described in the previous subsection.
798 \paragraph{Example:
} Delete all the addresses from the private network
801 netadm@amber:~ # ip -s -s a f to
10/
8
802 2: dummy inet
10.7.7.7/
16 brd
10.7.255.255 scope global dummy
803 3: eth0 inet
10.10.7.7/
16 brd
10.10.255.255 scope global eth0
804 4: eth1 inet
10.8.7.7/
16 brd
10.8.255.255 scope global eth1
806 *** Round
1, deleting
3 addresses ***
807 *** Flush is complete after
1 round ***
810 Another instructive example is disabling IP on all the Ethernets:
812 netadm@amber:~ # ip -
4 addr flush label "eth*"
814 And the last example shows how to flush all the IPv6 addresses
815 acquired by the host from stateless address autoconfiguration
816 after you enabled forwarding or disabled autoconfiguration.
818 netadm@amber:~ # ip -
6 addr flush dynamic
823 \section{{\tt ip neighbour
} --- neighbour/arp tables management
}
825 \paragraph{Abbreviations:
} \verb|neighbour|,
\verb|neighbor|,
\verb|neigh|,
828 \paragraph{Object:
} \verb|neighbour| objects establish bindings between protocol
829 addresses and link layer addresses for hosts sharing the same link.
830 Neighbour entries are organized into tables. The IPv4 neighbour table
831 is known by another name --- the ARP table.
833 The corresponding commands display neighbour bindings
834 and their properties, add new neighbour entries and delete old ones.
836 \paragraph{Commands:
} \verb|add|,
\verb|change|,
\verb|replace|,
837 \verb|delete|,
\verb|flush| and
\verb|show| (or
\verb|list|).
839 \paragraph{See also:
} Appendix~
\ref{PROXY-NEIGH
}, p.
\pageref{PROXY-NEIGH
}
840 describes how to manage proxy ARP/NDISC with the
\verb|ip| utility.
843 \subsection{{\tt ip neighbour add
} --- add a new neighbour entry\\
844 {\tt ip neighbour change
} --- change an existing entry\\
845 {\tt ip neighbour replace
} --- add a new entry or change an existing one
}
847 \paragraph{Abbreviations:
} \verb|add|,
\verb|a|;
\verb|change|,
\verb|chg|;
848 \verb|replace|,
\verb|repl|.
850 \paragraph{Description:
} These commands create new neighbour records
851 or update existing ones.
853 \paragraph{Arguments:
}
856 \item \verb|to ADDRESS| (default)
858 --- the protocol address of the neighbour. It is either an IPv4 or IPv6 address.
860 \item \verb|dev NAME|
862 --- the interface to which this neighbour is attached.
865 \item \verb|lladdr LLADDRESS|
867 --- the link layer address of the neighbour.
\verb|LLADDRESS| can also be
870 \item \verb|nud NUD_STATE|
872 --- the state of the neighbour entry.
\verb|nud| is an abbreviation for ``Neighbour
873 Unreachability Detection''. The state can take one of the following values:
876 \item \verb|permanent| --- the neighbour entry is valid forever and can be only be removed
878 \item \verb|noarp| --- the neighbour entry is valid. No attempts to validate
879 this entry will be made but it can be removed when its lifetime expires.
880 \item \verb|reachable| --- the neighbour entry is valid until the reachability
882 \item \verb|stale| --- the neighbour entry is valid but suspicious.
883 This option to
\verb|ip neigh| does not change the neighbour state if
884 it was valid and the address is not changed by this command.
889 \paragraph{Examples:
}
891 \item \verb|ip neigh add
10.0.0.3 lladdr
0:
0:
0:
0:
0:
1 dev eth0 nud perm|
893 --- add a permanent ARP entry for the neighbour
10.0.0.3 on the device
\verb|eth0|.
895 \item \verb|ip neigh chg
10.0.0.3 dev eth0 nud reachable|
897 --- change its state to
\verb|reachable|.
901 \subsection{{\tt ip neighbour delete
} --- delete a neighbour entry
}
903 \paragraph{Abbreviations:
} \verb|delete|,
\verb|del|,
\verb|d|.
905 \paragraph{Description:
} This command invalidates a neighbour entry.
907 \paragraph{Arguments:
} The arguments are the same as with
\verb|ip neigh add|,
908 except that
\verb|lladdr| and
\verb|nud| are ignored.
913 \item \verb|ip neigh del
10.0.0.3 dev eth0|
915 --- invalidate an ARP entry for the neighbour
10.0.0.3 on the device
\verb|eth0|.
920 The deleted neighbour entry will not disappear from the tables
921 immediately. If it is in use it cannot be deleted until the last
922 client releases it. Otherwise it will be destroyed during
923 the next garbage collection.
927 \paragraph{Warning:
} Attempts to delete or manually change
928 a
\verb|noarp| entry created by the kernel may result in unpredictable behaviour.
929 Particularly, the kernel may try to resolve this address even
930 on a
\verb|NOARP| interface or if the address is multicast or broadcast.
933 \subsection{{\tt ip neighbour show
} --- list neighbour entries
}
935 \paragraph{Abbreviations:
} \verb|show|,
\verb|list|,
\verb|sh|,
\verb|ls|.
937 \paragraph{Description:
}This commands displays neighbour tables.
939 \paragraph{Arguments:
}
943 \item \verb|to ADDRESS| (default)
945 --- the prefix selecting the neighbours to list.
947 \item \verb|dev NAME|
949 --- only list the neighbours attached to this device.
953 --- only list neighbours which are not currently in use.
955 \item \verb|nud NUD_STATE|
957 --- only list neighbour entries in this state.
\verb|NUD_STATE| takes
958 values listed below or the special value
\verb|all| which means all states.
959 This option may occur more than once. If this option is absent,
\verb|ip|
960 lists all entries except for
\verb|none| and
\verb|noarp|.
965 \paragraph{Output format:
}
968 kuznet@alisa:~ $ ip neigh ls
969 :: dev lo lladdr
00:
00:
00:
00:
00:
00 nud noarp
970 fe80::
200:cff:fe76:
3f85 dev eth0 lladdr
00:
00:
0c:
76:
3f:
85 router \
972 0.0.0.0 dev lo lladdr
00:
00:
00:
00:
00:
00 nud noarp
973 193.233.7.254 dev eth0 lladdr
00:
00:
0c:
76:
3f:
85 nud reachable
974 193.233.7.85 dev eth0 lladdr
00:e0:
1e:
63:
39:
00 nud stale
978 The first word of each line is the protocol address of the neighbour.
979 Then the device name follows. The rest of the line describes the contents of
980 the neighbour entry identified by the pair (device, address).
982 \verb|lladdr| is the link layer address of the neighbour.
984 \verb|nud| is the state of the ``neighbour unreachability detection'' machine
985 for this entry. The detailed description of the neighbour
986 state machine can be found in~
\cite{RFC-NDISC
}. Here is the full list
987 of the states with short descriptions:
990 \item\verb|none| --- the state of the neighbour is void.
991 \item\verb|incomplete| --- the neighbour is in the process of resolution.
992 \item\verb|reachable| --- the neighbour is valid and apparently reachable.
993 \item\verb|stale| --- the neighbour is valid, but is probably already
994 unreachable, so the kernel will try to check it at the first transmission.
995 \item\verb|delay| --- a packet has been sent to the stale neighbour and the kernel is waiting
997 \item\verb|probe| --- the delay timer expired but no confirmation was received.
998 The kernel has started to probe the neighbour with ARP/NDISC messages.
999 \item\verb|failed| --- resolution has failed.
1000 \item\verb|noarp| --- the neighbour is valid. No attempts to check the entry
1002 \item\verb|permanent| --- it is a
\verb|noarp| entry, but only the administrator
1003 may remove the entry from the neighbour table.
1006 The link layer address is valid in all states except for
\verb|none|,
1007 \verb|failed| and
\verb|incomplete|.
1009 IPv6 neighbours can be marked with the additional flag
\verb|router|
1010 which means that the neighbour introduced itself as an IPv6 router~
\cite{RFC-NDISC
}.
1012 \paragraph{Statistics:
} The
\verb|-statistics| option displays some usage
1016 kuznet@alisa:~ $ ip -s n ls
193.233.7.254
1017 193.233.7.254 dev eth0 lladdr
00:
00:
0c:
76:
3f:
85 ref
5 used
12/
13/
20 \
1022 Here
\verb|ref| is the number of users of this entry
1023 and
\verb|used| is a triplet of time intervals in seconds
1024 separated by slashes. In this case they show that:
1027 \item the entry was used
12 seconds ago.
1028 \item the entry was confirmed
13 seconds ago.
1029 \item the entry was updated
20 seconds ago.
1032 \subsection{{\tt ip neighbour flush
} --- flush neighbour entries
}
1034 \paragraph{Abbreviations:
} \verb|flush|,
\verb|f|.
1036 \paragraph{Description:
}This command flushes neighbour tables, selecting
1037 entries to flush by some criteria.
1039 \paragraph{Arguments:
} This command has the same arguments as
\verb|show|.
1040 The differences are that it does not run when no arguments are given,
1041 and that the default neighbour states to be flushed do not include
1042 \verb|permanent| and
\verb|noarp|.
1045 \paragraph{Statistics:
} With the
\verb|-statistics| option, the command
1046 becomes verbose. It prints out the number of deleted neighbours and the number
1047 of rounds made to flush the neighbour table. If the option is given
1048 twice,
\verb|ip neigh flush| also dumps all the deleted neighbours
1049 in the format described in the previous subsection.
1051 \paragraph{Example:
}
1053 netadm@alisa:~ # ip -s -s n f
193.233.7.254
1054 193.233.7.254 dev eth0 lladdr
00:
00:
0c:
76:
3f:
85 ref
5 used
12/
13/
20 \
1057 *** Round
1, deleting
1 entries ***
1058 *** Flush is complete after
1 round ***
1063 \section{{\tt ip route
} --- routing table management
}
1066 \paragraph{Abbreviations:
} \verb|route|,
\verb|ro|,
\verb|r|.
1068 \paragraph{Object:
} \verb|route| entries in the kernel routing tables keep
1069 information about paths to other networked nodes.
1071 Each route entry has a
{\em key\/
} consisting of a
{\em prefix\/
}
1072 (i.e.\ a pair containing a network address and the length of its mask) and,
1073 optionally, the TOS value. An IP packet matches the route if the highest
1074 bits of its destination address are equal to the route prefix at least
1075 up to the prefix length and if the TOS of the route is zero or equal to
1076 the TOS of the packet.
1078 If several routes match the packet, the following pruning rules
1079 are used to select the best one (see~
\cite{RFC1812
}):
1081 \item The longest matching prefix is selected. All shorter ones
1084 \item If the TOS of some route with the longest prefix is equal to the TOS
1085 of the packet, the routes with different TOS are dropped.
1087 If no exact TOS match was found and routes with TOS=
0 exist,
1088 the rest of routes are pruned.
1090 Otherwise, the route lookup fails.
1092 \item If several routes remain after the previous steps, then
1093 the routes with the best preference values are selected.
1095 \item If we still have several routes, then the
{\em first\/
} of them
1099 Note the ambiguity of the last step. Unfortunately, Linux
1100 historically allows such a bizarre situation. The sense of the
1101 word ``first'' depends on the order of route additions and it is practically
1102 impossible to maintain a bundle of such routes in this order.
1105 For simplicity we will limit ourselves to the case where such a situation
1106 is impossible and routes are uniquely identified by the triplet
1107 \
{prefix, tos, preference\
}. Actually, it is impossible to create
1108 non-unique routes with
\verb|ip| commands described in this section.
1110 One useful exception to this rule is the default route on non-forwarding
1111 hosts. It is ``officially'' allowed to have several fallback routes
1112 when several routers are present on directly connected networks.
1113 In this case, Linux-
2.2 makes ``dead gateway detection''~
\cite{RFC1122
}
1114 controlled by neighbour unreachability detection and by advice
1115 from transport protocols to select a working router, so the order
1116 of the routes is not essential. However, in this case,
1117 fiddling with default routes manually is not recommended. Use the Router Discovery
1118 protocol (see Appendix~
\ref{EXAMPLE-SETUP
}, p.
\pageref{EXAMPLE-SETUP
})
1119 instead. Actually, Linux-
2.2 IPv6 does not give user level applications
1120 any access to default routes.
1123 Certainly, the steps above are not performed exactly
1124 in this sequence. Instead, the routing table in the kernel is kept
1125 in some data structure to achieve the final result
1126 with minimal cost. However, not depending on a particular
1127 routing algorithm implemented in the kernel, we can summarize
1128 the statements above as: a route is identified by the triplet
1129 \
{prefix, tos, preference\
}. This
{\em key\/
} lets us locate
1130 the route in the routing table.
1132 \paragraph{Route attributes:
} Each route key refers to a routing
1133 information record containing
1134 the data required to deliver IP packets (f.e.\ output device and
1135 next hop router) and some optional attributes (f.e. the path MTU or
1136 the preferred source address when communicating with this destination).
1137 These attributes are described in the following subsection.
1139 \paragraph{Route types:
} \label{IP-ROUTE-TYPES
}
1140 It is important that the set
1141 of required and optional attributes depend on the route
{\em type\/
}.
1142 The most important route type
1143 is
\verb|unicast|. It describes real paths to other hosts.
1144 As a rule, common routing tables contain only such routes. However,
1145 there are other types of routes with different semantics. The
1146 full list of types understood by Linux-
2.2 is:
1148 \item \verb|unicast| --- the route entry describes real paths to the
1149 destinations covered by the route prefix.
1150 \item \verb|unreachable| --- these destinations are unreachable. Packets
1151 are discarded and the ICMP message
{\em host unreachable\/
} is generated.
1152 The local senders get an
\verb|EHOSTUNREACH| error.
1153 \item \verb|blackhole| --- these destinations are unreachable. Packets
1154 are discarded silently. The local senders get an
\verb|EINVAL| error.
1155 \item \verb|prohibit| --- these destinations are unreachable. Packets
1156 are discarded and the ICMP message
{\em communication administratively
1157 prohibited\/
} is generated. The local senders get an
\verb|EACCES| error.
1158 \item \verb|local| --- the destinations are assigned to this
1159 host. The packets are looped back and delivered locally.
1160 \item \verb|broadcast| --- the destinations are broadcast addresses.
1161 The packets are sent as link broadcasts.
1162 \item \verb|throw| --- a special control route used together with policy
1163 rules (see sec.
\ref{IP-RULE
}, p.
\pageref{IP-RULE
}). If such a route is selected, lookup
1164 in this table is terminated pretending that no route was found.
1165 Without policy routing it is equivalent to the absence of the route in the routing
1166 table. The packets are dropped and the ICMP message
{\em net unreachable\/
}
1167 is generated. The local senders get an
\verb|ENETUNREACH| error.
1168 \item \verb|nat| --- a special NAT route. Destinations covered by the prefix
1169 are considered to be dummy (or external) addresses which require translation
1170 to real (or internal) ones before forwarding. The addresses to translate to
1171 are selected with the attribute
\verb|via|. More about NAT is
1172 in Appendix~
\ref{ROUTE-NAT
}, p.
\pageref{ROUTE-NAT
}.
1173 \item \verb|anycast| --- (
{\em not implemented\/
}) the destinations are
1174 {\em anycast\/
} addresses assigned to this host. They are mainly equivalent
1175 to
\verb|local| with one difference: such addresses are invalid when used
1176 as the source address of any packet.
1177 \item \verb|multicast| --- a special type used for multicast routing.
1178 It is not present in normal routing tables.
1181 \paragraph{Route tables:
} Linux-
2.2 can pack routes into several routing
1182 tables identified by a number in the range from
1 to
255 or by
1183 name from the file
\verb|/etc/iproute2/rt_tables|. By default all normal
1184 routes are inserted into the
\verb|main| table (ID
254) and the kernel only uses
1185 this table when calculating routes.
1187 Actually, one other table always exists, which is invisible but
1188 even more important. It is the
\verb|local| table (ID
255). This table
1189 consists of routes for local and broadcast addresses. The kernel maintains
1190 this table automatically and the administrator usually need not modify it
1193 The multiple routing tables enter the game when
{\em policy routing\/
}
1194 is used. See sec.
\ref{IP-RULE
}, p.
\pageref{IP-RULE
}.
1195 In this case, the table identifier effectively becomes
1196 one more parameter, which should be added to the triplet
1197 \
{prefix, tos, preference\
} to uniquely identify the route.
1200 \subsection{{\tt ip route add
} --- add a new route\\
1201 {\tt ip route change
} --- change a route\\
1202 {\tt ip route replace
} --- change a route or add a new one
}
1203 \label{IP-ROUTE-ADD
}
1205 \paragraph{Abbreviations:
} \verb|add|,
\verb|a|;
\verb|change|,
\verb|chg|;
1206 \verb|replace|,
\verb|repl|.
1209 \paragraph{Arguments:
}
1211 \item \verb|to PREFIX| or
\verb|to TYPE PREFIX| (default)
1213 --- the destination prefix of the route. If
\verb|TYPE| is omitted,
1214 \verb|ip| assumes type
\verb|unicast|. Other values of
\verb|TYPE|
1215 are listed above.
\verb|PREFIX| is an IP or IPv6 address optionally followed
1216 by a slash and the prefix length. If the length of the prefix is missing,
1217 \verb|ip| assumes a full-length host route. There is also a special
1218 \verb|PREFIX| ---
\verb|default| --- which is equivalent to IP
\verb|
0/
0| or
1219 to IPv6
\verb|::/
0|.
1221 \item \verb|tos TOS| or
\verb|dsfield TOS|
1223 --- the Type Of Service (TOS) key. This key has no associated mask and
1224 the longest match is understood as: First, compare the TOS
1225 of the route and of the packet. If they are not equal, then the packet
1226 may still match a route with a zero TOS.
\verb|TOS| is either an
8 bit hexadecimal
1227 number or an identifier from
{\tt /etc/iproute2/rt
\_dsfield}.
1230 \item \verb|metric NUMBER| or
\verb|preference NUMBER|
1232 --- the preference value of the route.
\verb|NUMBER| is an arbitrary
32bit number.
1234 \item \verb|table TABLEID|
1236 --- the table to add this route to.
1237 \verb|TABLEID| may be a number or a string from the file
1238 \verb|/etc/iproute2/rt_tables|. If this parameter is omitted,
1239 \verb|ip| assumes the
\verb|main| table, with the exception of
1240 \verb|local|,
\verb|broadcast| and
\verb|nat| routes, which are
1241 put into the
\verb|local| table by default.
1243 \item \verb|dev NAME|
1245 --- the output device name.
1247 \item \verb|via ADDRESS|
1249 --- the address of the nexthop router. Actually, the sense of this field depends
1250 on the route type. For normal
\verb|unicast| routes it is either the true nexthop
1251 router or, if it is a direct route installed in BSD compatibility mode,
1252 it can be a local address of the interface.
1253 For NAT routes it is the first address of the block of translated IP destinations.
1255 \item \verb|src ADDRESS|
1257 --- the source address to prefer when sending to the destinations
1258 covered by the route prefix.
1260 \item \verb|realm REALMID|
1262 --- the realm to which this route is assigned.
1263 \verb|REALMID| may be a number or a string from the file
1264 \verb|/etc/iproute2/rt_realms|. Sec.
\ref{RT-REALMS
} (p.
\pageref{RT-REALMS
})
1265 contains more information on realms.
1267 \item \verb|mtu MTU| or
\verb|mtu lock MTU|
1269 --- the MTU along the path to the destination. If the modifier
\verb|lock| is
1270 not used, the MTU may be updated by the kernel due to Path MTU Discovery.
1271 If the modifier
\verb|lock| is used, no path MTU discovery will be tried,
1272 all packets will be sent without the DF bit in IPv4 case
1273 or fragmented to MTU for IPv6.
1275 \item \verb|window NUMBER|
1277 --- the maximal window for TCP to advertise to these destinations,
1278 measured in bytes. It limits maximal data bursts that our TCP
1279 peers are allowed to send to us.
1281 \item \verb|rtt NUMBER|
1283 --- the initial RTT (``Round Trip Time'') estimate.
1286 \item \verb|rttvar NUMBER|
1288 ---
\threeonly the initial RTT variance estimate.
1291 \item \verb|ssthresh NUMBER|
1293 ---
\threeonly an estimate for the initial slow start threshold.
1296 \item \verb|cwnd NUMBER|
1298 ---
\threeonly the clamp for congestion window. It is ignored if the
\verb|lock|
1302 \item \verb|advmss NUMBER|
1304 ---
\threeonly the MSS (``Maximal Segment Size'') to advertise to these
1305 destinations when establishing TCP connections. If it is not given,
1306 Linux uses a default value calculated from the first hop device MTU.
1309 If the path to these destination is asymmetric, this guess may be wrong.
1312 \item \verb|reordering NUMBER|
1314 ---
\threeonly Maximal reordering on the path to this destination.
1315 If it is not given, Linux uses the value selected with
\verb|sysctl|
1316 variable
\verb|net/ipv4/tcp_reordering|.
1320 \item \verb|nexthop NEXTHOP|
1322 --- the nexthop of a multipath route.
\verb|NEXTHOP| is a complex value
1323 with its own syntax similar to the top level argument lists:
1325 \item \verb|via ADDRESS| is the nexthop router.
1326 \item \verb|dev NAME| is the output device.
1327 \item \verb|weight NUMBER| is a weight for this element of a multipath
1328 route reflecting its relative bandwidth or quality.
1331 \item \verb|scope SCOPE_VAL|
1333 --- the scope of the destinations covered by the route prefix.
1334 \verb|SCOPE_VAL| may be a number or a string from the file
1335 \verb|/etc/iproute2/rt_scopes|.
1336 If this parameter is omitted,
1337 \verb|ip| assumes scope
\verb|global| for all gatewayed
\verb|unicast|
1338 routes, scope
\verb|link| for direct
\verb|unicast| and
\verb|broadcast| routes
1339 and scope
\verb|host| for
\verb|local| routes.
1341 \item \verb|protocol RTPROTO|
1343 --- the routing protocol identifier of this route.
1344 \verb|RTPROTO| may be a number or a string from the file
1345 \verb|/etc/iproute2/rt_protos|. If the routing protocol ID is
1346 not given,
\verb|ip| assumes protocol
\verb|boot| (i.e.\
1347 it assumes the route was added by someone who doesn't
1348 understand what they are doing). Several protocol values have a fixed interpretation.
1351 \item \verb|redirect| --- the route was installed due to an ICMP redirect.
1352 \item \verb|kernel| --- the route was installed by the kernel during
1354 \item \verb|boot| --- the route was installed during the bootup sequence.
1355 If a routing daemon starts, it will purge all of them.
1356 \item \verb|static| --- the route was installed by the administrator
1357 to override dynamic routing. Routing daemon will respect them
1358 and, probably, even advertise them to its peers.
1359 \item \verb|ra| --- the route was installed by Router Discovery protocol.
1361 The rest of the values are not reserved and the administrator is free
1362 to assign (or not to assign) protocol tags. At least, routing
1363 daemons should take care of setting some unique protocol values,
1364 f.e.\ as they are assigned in
\verb|rtnetlink.h| or in
\verb|rt_protos|
1370 --- pretend that the nexthop is directly attached to this link,
1371 even if it does not match any interface prefix. One application of this
1372 option may be found in~
\cite{IP-TUNNELS
}.
1374 \item \verb|equalize|
1376 --- allow packet by packet randomization on multipath routes.
1377 Without this modifier, the route will be frozen to one selected
1378 nexthop, so that load splitting will only occur on per-flow base.
1379 \verb|equalize| only works if the kernel is patched.
1386 Actually there are more commands:
\verb|prepend| does the same
1387 thing as classic
\verb|route add|, i.e.\ adds a route, even if another
1388 route to the same destination exists. Its opposite case is
\verb|append|,
1389 which adds the route to the end of the list. Avoid these
1393 More sad news, IPv6 only understands the
\verb|append| command correctly.
1394 All the others are translated into
\verb|append| commands. Certainly,
1395 this will change in the future.
1398 \paragraph{Examples:
}
1400 \item add a plain route to network
10.0.0/
24 via gateway
193.233.7.65
1402 ip route add
10.0.0/
24 via
193.233.7.65
1404 \item change it to a direct route via the
\verb|dummy| device
1406 ip ro chg
10.0.0/
24 dev dummy
1408 \item add a default multipath route splitting the load between
\verb|ppp0|
1411 ip route add default scope global nexthop dev ppp0 \
1414 Note the scope value. It is not necessary but it informs the kernel
1415 that this route is gatewayed rather than direct. Actually, if you
1416 know the addresses of remote endpoints it would be better to use the
1417 \verb|via| parameter.
1418 \item announce that the address
192.203.80.144 is not a real one, but
1419 should be translated to
193.233.7.83 before forwarding
1421 ip route add nat
192.203.80.144 via
193.233.7.83
1423 Backward translation is setup with policy rules described
1424 in the following section (sec.
\ref{IP-RULE
}, p.
\pageref{IP-RULE
}).
1427 \subsection{{\tt ip route delete
} --- delete a route
}
1429 \paragraph{Abbreviations:
} \verb|delete|,
\verb|del|,
\verb|d|.
1431 \paragraph{Arguments:
} \verb|ip route del| has the same arguments as
1432 \verb|ip route add|, but their semantics are a bit different.
1434 Key values (
\verb|to|,
\verb|tos|,
\verb|preference| and
\verb|table|)
1435 select the route to delete. If optional attributes are present,
\verb|ip|
1436 verifies that they coincide with the attributes of the route to delete.
1437 If no route with the given key and attributes was found,
\verb|ip route del|
1440 Linux-
2.0 had the option to delete a route selected only by prefix address,
1441 ignoring its length (i.e.\ netmask). This option no longer exists
1442 because it was ambiguous. However, look at
{\tt ip route flush
}
1443 (sec.
\ref{IP-ROUTE-FLUSH
}, p.
\pageref{IP-ROUTE-FLUSH
}) which
1444 provides similar and even richer functionality.
1447 \paragraph{Example:
}
1449 \item delete the multipath route created by the command in previous subsection
1451 ip route del default scope global nexthop dev ppp0 \
1458 \subsection{{\tt ip route show
} --- list routes
}
1460 \paragraph{Abbreviations:
} \verb|show|,
\verb|list|,
\verb|sh|,
\verb|ls|,
\verb|l|.
1462 \paragraph{Description:
} the command displays the contents of the routing tables
1463 or the route(s) selected by some criteria.
1466 \paragraph{Arguments:
}
1468 \item \verb|to SELECTOR| (default)
1470 --- only select routes from the given range of destinations.
\verb|SELECTOR|
1471 consists of an optional modifier (
\verb|root|,
\verb|match| or
\verb|exact|)
1472 and a prefix.
\verb|root PREFIX| selects routes with prefixes not shorter
1473 than
\verb|PREFIX|. F.e.\
\verb|root
0/
0| selects the entire routing table.
1474 \verb|match PREFIX| selects routes with prefixes not longer than
1475 \verb|PREFIX|. F.e.\
\verb|match
10.0/
16| selects
\verb|
10.0/
16|,
1476 \verb|
10/
8| and
\verb|
0/
0|, but it does not select
\verb|
10.1/
16| and
1477 \verb|
10.0.0/
24|. And
\verb|exact PREFIX| (or just
\verb|PREFIX|)
1478 selects routes with this exact prefix. If neither of these options
1479 are present,
\verb|ip| assumes
\verb|root
0/
0| i.e.\ it lists the entire table.
1482 \item \verb|tos TOS| or
\verb|dsfield TOS|
1484 --- only select routes with the given TOS.
1487 \item \verb|table TABLEID|
1489 --- show the routes from this table(s). The default setting is to show
1490 \verb|table|
\verb|main|.
\verb|TABLEID| may either be the ID of a real table
1491 or one of the special values:
1493 \item \verb|all| --- list all of the tables.
1494 \item \verb|cache| --- dump the routing cache.
1497 IPv6 has a single table. However, splitting it into
\verb|main|,
\verb|local|
1498 and
\verb|cache| is emulated by the
\verb|ip| utility.
1501 \item \verb|cloned| or
\verb|cached|
1503 --- list cloned routes i.e.\ routes which were dynamically forked from
1504 other routes because some route attribute (f.e.\ MTU) was updated.
1505 Actually, it is equivalent to
\verb|table cache|.
1507 \item \verb|from SELECTOR|
1509 --- the same syntax as for
\verb|to|, but it binds the source address range
1510 rather than destinations. Note that the
\verb|from| option only works with
1513 \item \verb|protocol RTPROTO|
1515 --- only list routes of this protocol.
1518 \item \verb|scope SCOPE_VAL|
1520 --- only list routes with this scope.
1522 \item \verb|type TYPE|
1524 --- only list routes of this type.
1526 \item \verb|dev NAME|
1528 --- only list routes going via this device.
1530 \item \verb|via PREFIX|
1532 --- only list routes going via the nexthop routers selected by
\verb|PREFIX|.
1534 \item \verb|src PREFIX|
1536 --- only list routes with preferred source addresses selected
1539 \item \verb|realm REALMID| or
\verb|realms FROMREALM/TOREALM|
1541 --- only list routes with these realms.
1545 \paragraph{Examples:
} Let us count routes of protocol
\verb|gated/bgp|
1548 kuznet@amber:~ $ ip ro ls proto gated/bgp | wc
1552 To count the size of the routing cache, we have to use the
\verb|-o| option
1553 because cached attributes can take more than one line of output:
1555 kuznet@amber:~ $ ip -o ro ls cloned | wc
1561 \paragraph{Output format:
} The output of this command consists
1562 of per route records separated by line feeds.
1563 However, some records may consist
1564 of more than one line: particularly, this is the case when the route
1565 is cloned or you requested additional statistics. If the
1566 \verb|-o| option was given, then line feeds separating lines inside
1567 records are replaced with the backslash sign.
1569 The output has the same syntax as arguments given to
{\tt ip route add
},
1570 so that it can be understood easily. F.e.\
1572 kuznet@amber:~ $ ip ro ls
193.233.7/
24
1573 193.233.7.0/
24 dev eth0 proto gated/conn scope link \
1574 src
193.233.7.65 realms inr.ac
1578 If you list cloned entries, the output contains other attributes which
1579 are evaluated during route calculation and updated during route
1580 lifetime. An example of the output is:
1582 kuznet@amber:~ $ ip ro ls
193.233.7.82 tab cache
1583 193.233.7.82 from
193.233.7.82 dev eth0 src
193.233.7.65 \
1584 realms inr.ac/inr.ac
1585 cache <src-direct,redirect> mtu
1500 rtt
300 iif eth0
1586 193.233.7.82 dev eth0 src
193.233.7.65 realms inr.ac
1587 cache mtu
1500 rtt
300
1591 \label{NB-strange-route
}
1592 The route looks a bit strange, doesn't it? Did you notice that
1593 it is a path from
193.233.7.82 back to
193.233.82? Well, you will
1594 see in the section on
\verb|ip route get| (p.
\pageref{NB-nature-of-strangeness
})
1597 The second line, starting with the word
\verb|cache|, shows
1598 additional attributes which normal routes do not possess.
1599 Cached flags are summarized in angle brackets:
1601 \item \verb|local| --- packets are delivered locally.
1602 It stands for loopback unicast routes, for broadcast routes
1603 and for multicast routes, if this host is a member of the corresponding
1606 \item \verb|reject| --- the path is bad. Any attempt to use it results
1607 in an error. See attribute
\verb|error| below (p.
\pageref{IP-ROUTE-GET-error
}).
1609 \item \verb|mc| --- the destination is multicast.
1611 \item \verb|brd| --- the destination is broadcast.
1613 \item \verb|src-direct| --- the source is on a directly connected
1616 \item \verb|redirected| --- the route was created by an ICMP Redirect.
1618 \item \verb|redirect| --- packets going via this route will
1619 trigger an ICMP redirect.
1621 \item \verb|fastroute| --- the route is eligible to be used for fastroute.
1623 \item \verb|equalize| --- make packet by packet randomization
1626 \item \verb|dst-nat| --- the destination address requires translation.
1628 \item \verb|src-nat| --- the source address requires translation.
1630 \item \verb|masq| --- the source address requires masquerading.
1631 This feature disappeared in linux-
2.4.
1633 \item \verb|notify| --- (
{\em not implemented
}) change/deletion
1634 of this route will trigger RTNETLINK notification.
1637 Then some optional attributes follow:
1639 \item \verb|error| --- on
\verb|reject| routes it is error code
1640 returned to local senders when they try to use this route.
1641 These error codes are translated into ICMP error codes, sent to remote
1642 senders, according to the rules described above in the subsection
1643 devoted to route types (p.
\pageref{IP-ROUTE-TYPES
}).
1644 \label{IP-ROUTE-GET-error
}
1646 \item \verb|expires| --- this entry will expire after this timeout.
1648 \item \verb|iif| --- the packets for this path are expected to arrive
1652 \paragraph{Statistics:
} With the
\verb|-statistics| option, more
1653 information about this route is shown:
1655 \item \verb|users| --- the number of users of this entry.
1656 \item \verb|age| --- shows when this route was last used.
1657 \item \verb|used| --- the number of lookups of this route since its creation.
1661 \subsection{{\tt ip route flush
} --- flush routing tables
}
1662 \label{IP-ROUTE-FLUSH
}
1664 \paragraph{Abbreviations:
} \verb|flush|,
\verb|f|.
1666 \paragraph{Description:
} this command flushes routes selected
1669 \paragraph{Arguments:
} the arguments have the same syntax and semantics
1670 as the arguments of
\verb|ip route show|, but routing tables are not
1671 listed but purged. The only difference is the default action:
\verb|show|
1672 dumps all the IP main routing table but
\verb|flush| prints the helper page.
1673 The reason for this difference does not require any explanation, does it?
1676 \paragraph{Statistics:
} With the
\verb|-statistics| option, the command
1677 becomes verbose. It prints out the number of deleted routes and the number
1678 of rounds made to flush the routing table. If the option is given
1679 twice,
\verb|ip route flush| also dumps all the deleted routes
1680 in the format described in the previous subsection.
1682 \paragraph{Examples:
} The first example flushes all the
1683 gatewayed routes from the main table (f.e.\ after a routing daemon crash).
1685 netadm@amber:~ # ip -
4 ro flush scope global type unicast
1687 This option deserves to be put into a scriptlet
\verb|routef|.
1689 This option was described in the
\verb|route(
8)| man page borrowed
1690 from BSD, but was never implemented in Linux.
1693 The second example flushes all IPv6 cloned routes:
1695 netadm@amber:~ # ip -
6 -s -s ro flush cache
1696 3ffe:
2400::
220:afff:fef4:c5d1 via
3ffe:
2400::
220:afff:fef4:c5d1 \
1698 cache used
2 age
12sec mtu
1500 rtt
300
1699 3ffe:
2400::
280:adff:feb7:
8034 via
3ffe:
2400::
280:adff:feb7:
8034 \
1701 cache used
2 age
15sec mtu
1500 rtt
300
1702 3ffe:
2400::
280:c8ff:fe59:
5bcc via
3ffe:
2400::
280:c8ff:fe59:
5bcc \
1704 cache users
1 used
1 age
23sec mtu
1500 rtt
300
1705 3ffe:
2400:
0:
1:
2a0:ccff:fe66:
1878 via
3ffe:
2400:
0:
1:
2a0:ccff:fe66:
1878 \
1707 cache used
2 age
20sec mtu
1500 rtt
300
1708 3ffe:
2400:
0:
1:a00:
20ff:fe71:fb30 via
3ffe:
2400:
0:
1:a00:
20ff:fe71:fb30 \
1710 cache used
2 age
33sec mtu
1500 rtt
300
1711 ff02::
1 via ff02::
1 dev eth1 metric
0
1712 cache users
1 used
1 age
45sec mtu
1500 rtt
300
1714 *** Round
1, deleting
6 entries ***
1715 *** Flush is complete after
1 round ***
1716 netadm@amber:~ # ip -
6 -s -s ro flush cache
1721 The third example flushes BGP routing tables after a
\verb|gated|
1724 netadm@amber:~ # ip ro ls proto gated/bgp | wc
1726 netadm@amber:~ # ip -s ro f proto gated/bgp
1728 *** Round
1, deleting
1408 entries ***
1729 *** Flush is complete after
1 round ***
1730 netadm@amber:~ # ip ro f proto gated/bgp
1732 netadm@amber:~ # ip ro ls proto gated/bgp
1737 \subsection{{\tt ip route get
} --- get a single route
}
1738 \label{IP-ROUTE-GET
}
1740 \paragraph{Abbreviations:
} \verb|get|,
\verb|g|.
1742 \paragraph{Description:
} this command gets a single route to a destination
1743 and prints its contents exactly as the kernel sees it.
1745 \paragraph{Arguments:
}
1747 \item \verb|to ADDRESS| (default)
1749 --- the destination address.
1751 \item \verb|from ADDRESS|
1753 --- the source address.
1755 \item \verb|tos TOS| or
\verb|dsfield TOS|
1757 --- the Type Of Service.
1759 \item \verb|iif NAME|
1761 --- the device from which this packet is expected to arrive.
1763 \item \verb|oif NAME|
1765 --- force the output device on which this packet will be routed.
1767 \item \verb|connected|
1769 --- if no source address (option
\verb|from|) was given, relookup
1770 the route with the source set to the preferred address received from the first lookup.
1771 If policy routing is used, it may be a different route.
1775 Note that this operation is not equivalent to
\verb|ip route show|.
1776 \verb|show| shows existing routes.
\verb|get| resolves them and
1777 creates new clones if necessary. Essentially,
\verb|get|
1778 is equivalent to sending a packet along this path.
1779 If the
\verb|iif| argument is not given, the kernel creates a route
1780 to output packets towards the requested destination.
1781 This is equivalent to pinging the destination
1782 with a subsequent
{\tt ip route ls cache
}, however, no packets are
1783 actually sent. With the
\verb|iif| argument, the kernel pretends
1784 that a packet arrived from this interface and searches for
1785 a path to forward the packet.
1787 \paragraph{Output format:
} This command outputs routes in the same
1788 format as
\verb|ip route ls|.
1790 \paragraph{Examples:
}
1792 \item Find a route to output packets to
193.233.7.82:
1794 kuznet@amber:~ $ ip route get
193.233.7.82
1795 193.233.7.82 dev eth0 src
193.233.7.65 realms inr.ac
1796 cache mtu
1500 rtt
300
1800 \item Find a route to forward packets arriving on
\verb|eth0|
1801 from
193.233.7.82 and destined for
193.233.7.82:
1803 kuznet@amber:~ $ ip r g
193.233.7.82 from
193.233.7.82 iif eth0
1804 193.233.7.82 from
193.233.7.82 dev eth0 src
193.233.7.65 \
1805 realms inr.ac/inr.ac
1806 cache <src-direct,redirect> mtu
1500 rtt
300 iif eth0
1810 \label{NB-nature-of-strangeness
}
1811 This is the command that created the funny route from
193.233.7.82
1812 looped back to
193.233.7.82 (cf.\ NB on~p.
\pageref{NB-strange-route
}).
1813 Note the
\verb|redirect| flag on it.
1816 \item Find a multicast route for packets arriving on
\verb|eth0|
1817 from host
193.233.7.82 and destined for multicast group
224.2.127.254
1818 (it is assumed that a multicast routing daemon is running.
1819 In this case, it is
\verb|pimd|)
1821 kuznet@amber:~ $ ip r g
224.2.127.254 from
193.233.7.82 iif eth0
1822 multicast
224.2.127.254 from
193.233.7.82 dev lo \
1823 src
193.233.7.65 realms inr.ac/cosmos
1824 cache <mc> iif eth0 Oifs: eth1 pimreg
1827 This route differs from the ones seen before. It contains a ``normal'' part
1828 and a ``multicast'' part. The normal part is used to deliver (or not to
1829 deliver) the packet to local IP listeners. In this case the router
1831 of this group, so that route has no
\verb|local| flag and only
1832 forwards packets. The output device for such entries is always loopback.
1833 The multicast part consists of an additional
\verb|Oifs:| list showing
1834 the output interfaces.
1838 It is time for a more complicated example. Let us add an invalid
1839 gatewayed route for a destination which is really directly connected:
1841 netadm@alisa:~ # ip route add
193.233.7.98 via
193.233.7.254
1842 netadm@alisa:~ # ip route get
193.233.7.98
1843 193.233.7.98 via
193.233.7.254 dev eth0 src
193.233.7.90
1844 cache mtu
1500 rtt
3072
1847 and probe it with ping:
1849 netadm@alisa:~ # ping -n
193.233.7.98
1850 PING
193.233.7.98 (
193.233.7.98) from
193.233.7.90 :
56 data bytes
1851 From
193.233.7.254: Redirect Host(New nexthop:
193.233.7.98)
1852 64 bytes from
193.233.7.98: icmp_seq=
0 ttl=
255 time=
3.5 ms
1853 From
193.233.7.254: Redirect Host(New nexthop:
193.233.7.98)
1854 64 bytes from
193.233.7.98: icmp_seq=
1 ttl=
255 time=
2.2 ms
1855 64 bytes from
193.233.7.98: icmp_seq=
2 ttl=
255 time=
0.4 ms
1856 64 bytes from
193.233.7.98: icmp_seq=
3 ttl=
255 time=
0.4 ms
1857 64 bytes from
193.233.7.98: icmp_seq=
4 ttl=
255 time=
0.4 ms
1859 ---
193.233.7.98 ping statistics ---
1860 5 packets transmitted,
5 packets received,
0% packet loss
1861 round-trip min/avg/max =
0.4/
1.3/
3.5 ms
1864 What happened? Router
193.233.7.254 understood that we have a much
1865 better path to the destination and sent us an ICMP redirect message.
1866 We may retry
\verb|ip route get| to see what we have in the routing
1869 netadm@alisa:~ # ip route get
193.233.7.98
1870 193.233.7.98 dev eth0 src
193.233.7.90
1871 cache <redirected> mtu
1500 rtt
3072
1877 \section{{\tt ip rule
} --- routing policy database management
}
1880 \paragraph{Abbreviations:
} \verb|rule|,
\verb|ru|.
1882 \paragraph{Object:
} \verb|rule|s in the routing policy database control
1883 the route selection algorithm.
1885 Classic routing algorithms used in the Internet make routing decisions
1886 based only on the destination address of packets (and in theory,
1887 but not in practice, on the TOS field). The seminal review of classic
1888 routing algorithms and their modifications can be found in~
\cite{RFC1812
}.
1890 In some circumstances we want to route packets differently depending not only
1891 on destination addresses, but also on other packet fields: source address,
1892 IP protocol, transport protocol ports or even packet payload.
1893 This task is called ``policy routing''.
1896 ``policy routing'' $
\neq$ ``routing policy''.
1898 \noindent ``policy routing'' $=$ ``cunning routing''.
1900 \noindent ``routing policy'' $=$ ``routing tactics'' or ``routing plan''.
1903 To solve this task, the conventional destination based routing table, ordered
1904 according to the longest match rule, is replaced with a ``routing policy
1905 database'' (or RPDB), which selects routes
1906 by executing some set of rules. The rules may have lots of keys of different
1907 natures and therefore they have no natural ordering, but one imposed
1908 by the administrator. Linux-
2.2 RPDB is a linear list of rules
1909 ordered by numeric priority value.
1910 RPDB explicitly allows matching a few packet fields:
1913 \item packet source address.
1914 \item packet destination address.
1916 \item incoming interface (which is packet metadata, rather than a packet field).
1919 Matching IP protocols and transport ports is also possible,
1920 indirectly, via
\verb|ipchains|, by exploiting their ability
1921 to mark some classes of packets with
\verb|fwmark|. Therefore,
1922 \verb|fwmark| is also included in the set of keys checked by rules.
1924 Each policy routing rule consists of a
{\em selector\/
} and an
{\em action\/
}
1925 predicate. The RPDB is scanned in the order of increasing priority. The selector
1926 of each rule is applied to \
{source address, destination address, incoming
1927 interface, tos, fwmark\
} and, if the selector matches the packet,
1928 the action is performed. The action predicate may return with success.
1929 In this case, it will either give a route or failure indication
1930 and the RPDB lookup is terminated. Otherwise, the RPDB program
1931 continues on the next rule.
1933 What is the action, semantically? The natural action is to select the
1934 nexthop and the output device. This is what
1935 Cisco IOS~
\cite{IOS
} does. Let us call it ``match \& set''.
1936 The Linux-
2.2 approach is more flexible. The action includes
1937 lookups in destination-based routing tables and selecting
1938 a route from these tables according to the classic longest match algorithm.
1939 The ``match \& set'' approach is the simplest case of the Linux one. It is realized
1940 when a second level routing table contains a single default route.
1941 Recall that Linux-
2.2 supports multiple tables
1942 managed with the
\verb|ip route| command, described in the previous section.
1944 At startup time the kernel configures the default RPDB consisting of three
1948 \item Priority:
0, Selector: match anything, Action: lookup routing
1949 table
\verb|local| (ID
255).
1950 The
\verb|local| table is a special routing table containing
1951 high priority control routes for local and broadcast addresses.
1953 Rule
0 is special. It cannot be deleted or overridden.
1956 \item Priority:
32766, Selector: match anything, Action: lookup routing
1957 table
\verb|main| (ID
254).
1958 The
\verb|main| table is the normal routing table containing all non-policy
1959 routes. This rule may be deleted and/or overridden with other
1960 ones by the administrator.
1962 \item Priority:
32767, Selector: match anything, Action: lookup routing
1963 table
\verb|default| (ID
253).
1964 The
\verb|default| table is empty. It is reserved for some
1965 post-processing if no previous default rules selected the packet.
1966 This rule may also be deleted.
1970 Do not confuse routing tables with rules: rules point to routing tables,
1971 several rules may refer to one routing table and some routing tables
1972 may have no rules pointing to them. If the administrator deletes all the rules
1973 referring to a table, the table is not used, but it still exists
1974 and will disappear only after all the routes contained in it are deleted.
1977 \paragraph{Rule attributes:
} Each RPDB entry has additional
1978 attributes. F.e.\ each rule has a pointer to some routing
1979 table. NAT and masquerading rules have an attribute to select new IP
1980 address to translate/masquerade. Besides that, rules have some
1981 optional attributes, which routes have, namely
\verb|realms|.
1982 These values do not override those contained in the routing tables. They
1983 are only used if the route did not select any attributes.
1986 \paragraph{Rule types:
} The RPDB may contain rules of the following
1989 \item \verb|unicast| --- the rule prescribes to return the route found
1990 in the routing table referenced by the rule.
1991 \item \verb|blackhole| --- the rule prescribes to silently drop the packet.
1992 \item \verb|unreachable| --- the rule prescribes to generate a ``Network
1993 is unreachable'' error.
1994 \item \verb|prohibit| --- the rule prescribes to generate
1995 ``Communication is administratively prohibited'' error.
1996 \item \verb|nat| --- the rule prescribes to translate the source address
1997 of the IP packet into some other value. More about NAT is
1998 in Appendix~
\ref{ROUTE-NAT
}, p.
\pageref{ROUTE-NAT
}.
2002 \paragraph{Commands:
} \verb|add|,
\verb|delete| and
\verb|show|
2005 \subsection{{\tt ip rule add
} --- insert a new rule\\
2006 {\tt ip rule delete
} --- delete a rule
}
2009 \paragraph{Abbreviations:
} \verb|add|,
\verb|a|;
\verb|delete|,
\verb|del|,
2012 \paragraph{Arguments:
}
2015 \item \verb|type TYPE| (default)
2017 --- the type of this rule. The list of valid types was given in the previous
2020 \item \verb|from PREFIX|
2022 --- select the source prefix to match.
2024 \item \verb|to PREFIX|
2026 --- select the destination prefix to match.
2028 \item \verb|iif NAME|
2030 --- select the incoming device to match. If the interface is loopback,
2031 the rule only matches packets originating from this host. This means that you
2032 may create separate routing tables for forwarded and local packets and,
2033 hence, completely segregate them.
2035 \item \verb|tos TOS| or
\verb|dsfield TOS|
2037 --- select the TOS value to match.
2039 \item \verb|fwmark MARK|
2041 --- select the
\verb|fwmark| value to match.
2043 \item \verb|priority PREFERENCE|
2045 --- the priority of this rule. Each rule should have an explicitly
2046 set
{\em unique\/
} priority value.
2048 Really, for historical reasons
\verb|ip rule add| does not require a
2049 priority value and allows them to be non-unique.
2050 If the user does not supplied a priority, it is selected by the kernel.
2051 If the user creates a rule with a priority value that
2052 already exists, the kernel does not reject the request. It adds
2053 the new rule before all old rules of the same priority.
2055 It is mistake in design, no more. And it will be fixed one day,
2056 so do not rely on this feature. Use explicit priorities.
2060 \item \verb|table TABLEID|
2062 --- the routing table identifier to lookup if the rule selector matches.
2064 \item \verb|realms FROM/TO|
2066 --- Realms to select if the rule matched and the routing table lookup
2067 succeeded. Realm
\verb|TO| is only used if the route did not select
2070 \item \verb|nat ADDRESS|
2072 --- The base of the IP address block to translate (for source addresses).
2073 The
\verb|ADDRESS| may be either the start of the block of NAT addresses
2074 (selected by NAT routes) or in linux-
2.2 a local host address (or even zero).
2075 In the last case the router does not translate the packets,
2076 but masquerades them to this address; this feature disappered in
2.4.
2077 More about NAT is in Appendix~
\ref{ROUTE-NAT
},
2078 p.
\pageref{ROUTE-NAT
}.
2082 \paragraph{Warning:
} Changes to the RPDB made with these commands
2083 do not become active immediately. It is assumed that after
2084 a script finishes a batch of updates, it flushes the routing cache
2085 with
\verb|ip route flush cache|.
2087 \paragraph{Examples:
}
2089 \item Route packets with source addresses from
192.203.80/
24
2090 according to routing table
\verb|inr.ruhep|:
2092 ip ru add from
192.203.80.0/
24 table inr.ruhep prio
220
2095 \item Translate packet source address
193.233.7.83 into
192.203.80.144
2096 and route it according to table \
#1 (actually, it is
\verb|inr.ruhep|):
2098 ip ru add from
193.233.7.83 nat
192.203.80.144 table
1 prio
320
2101 \item Delete the unused default rule:
2103 ip ru del prio
32767
2110 \subsection{{\tt ip rule show
} --- list rules
}
2111 \label{IP-RULE-SHOW
}
2113 \paragraph{Abbreviations:
} \verb|show|,
\verb|list|,
\verb|sh|,
\verb|ls|,
\verb|l|.
2116 \paragraph{Arguments:
} Good news, this is one command that has no arguments.
2118 \paragraph{Output format:
}
2121 kuznet@amber:~ $ ip ru ls
2122 0: from all lookup local
2123 200: from
192.203.80.0/
24 to
193.233.7.0/
24 lookup main
2124 210: from
192.203.80.0/
24 to
192.203.80.0/
24 lookup main
2125 220: from
192.203.80.0/
24 lookup inr.ruhep realms inr.ruhep/radio-msu
2126 300: from
193.233.7.83 to
193.233.7.0/
24 lookup main
2127 310: from
193.233.7.83 to
192.203.80.0/
24 lookup main
2128 320: from
193.233.7.83 lookup inr.ruhep map-to
192.203.80.144
2129 32766: from all lookup main
2133 In the first column is the rule priority value followed
2134 by a colon. Then the selectors follow. Each key is prefixed
2135 with the same keyword that was used to create the rule.
2137 The keyword
\verb|lookup| is followed by a routing table identifier,
2138 as it is recorded in the file
\verb|/etc/iproute2/rt_tables|.
2140 If the rule does NAT (f.e.\ rule \
#320), it is shown by the keyword
2141 \verb|map-to| followed by the start of the block of addresses to map.
2143 The sense of this example is pretty simple. The prefixes
2144 192.203.80.0/
24 and
193.233.7.0/
24 form the internal network, but
2145 they are routed differently when the packets leave it.
2146 Besides that, the host
193.233.7.83 is translated into
2147 another prefix to look like
192.203.80.144 when talking
2152 \section{{\tt ip maddress
} --- multicast addresses management
}
2155 \paragraph{Object:
} \verb|maddress| objects are multicast addresses.
2157 \paragraph{Commands:
} \verb|add|,
\verb|delete|,
\verb|show| (or
\verb|list|).
2159 \subsection{{\tt ip maddress show
} --- list multicast addresses
}
2161 \paragraph{Abbreviations:
} \verb|show|,
\verb|list|,
\verb|sh|,
\verb|ls|,
\verb|l|.
2163 \paragraph{Arguments:
}
2167 \item \verb|dev NAME| (default)
2169 --- the device name.
2173 \paragraph{Output format:
}
2176 kuznet@alisa:~ $ ip maddr ls dummy
2178 link
33:
33:
00:
00:
00:
01
2179 link
01:
00:
5e:
00:
00:
01
2180 inet
224.0.0.1 users
2
2185 The first line of the output shows the interface index and its name.
2186 Then the multicast address list follows. Each line starts with the
2187 protocol identifier. The word
\verb|link| denotes a link layer
2188 multicast addresses.
2190 If a multicast address has more than one user, the number
2191 of users is shown after the
\verb|users| keyword.
2193 One additional feature not present in the example above
2194 is the
\verb|static| flag, which indicates that the address was joined
2195 with
\verb|ip maddr add|. See the following subsection.
2199 \subsection{{\tt ip maddress add
} --- add a multicast address\\
2200 {\tt ip maddress delete
} --- delete a multicast address
}
2202 \paragraph{Abbreviations:
} \verb|add|,
\verb|a|;
\verb|delete|,
\verb|del|,
\verb|d|.
2204 \paragraph{Description:
} these commands attach/detach
2205 a static link layer multicast address to listen on the interface.
2206 Note that it is impossible to join protocol multicast groups
2207 statically. This command only manages link layer addresses.
2210 \paragraph{Arguments:
}
2213 \item \verb|address LLADDRESS| (default)
2215 --- the link layer multicast address.
2217 \item \verb|dev NAME|
2219 --- the device to join/leave this multicast address.
2224 \paragraph{Example:
} Let us continue with the example from the previous subsection.
2227 netadm@alisa:~ # ip maddr add
33:
33:
00:
00:
00:
01 dev dummy
2228 netadm@alisa:~ # ip -
0 maddr ls dummy
2230 link
33:
33:
00:
00:
00:
01 users
2 static
2231 link
01:
00:
5e:
00:
00:
01
2232 netadm@alisa:~ # ip maddr del
33:
33:
00:
00:
00:
01 dev dummy
2236 Neither
\verb|ip| nor the kernel check for multicast address validity.
2237 Particularly, this means that you can try to load a unicast address
2238 instead of a multicast address. Most drivers will ignore such addresses,
2239 but several (f.e.\ Tulip) will intern it to their on-board filter.
2240 The effects may be strange. Namely, the addresses become additional
2241 local link addresses and, if you loaded the address of another host
2242 to the router, wait for duplicated packets on the wire.
2243 It is not a bug, but rather a hole in the API and intra-kernel interfaces.
2244 This feature is really more useful for traffic monitoring, but using it
2245 with Linux-
2.2 you
{\em have to\/
} be sure that the host is not
2246 a router and, especially, that it is not a transparent proxy or masquerading
2252 \section{{\tt ip mroute
} --- multicast routing cache management
}
2255 \paragraph{Abbreviations:
} \verb|mroute|,
\verb|mr|.
2257 \paragraph{Object:
} \verb|mroute| objects are multicast routing cache
2258 entries created by a user level mrouting daemon
2259 (f.e.\
\verb|pimd| or
\verb|mrouted|).
2261 Due to the limitations of the current interface to the multicast routing
2262 engine, it is impossible to change
\verb|mroute| objects administratively,
2263 so we may only display them. This limitation will be removed
2266 \paragraph{Commands:
} \verb|show| (or
\verb|list|).
2269 \subsection{{\tt ip mroute show
} --- list mroute cache entries
}
2271 \paragraph{Abbreviations:
} \verb|show|,
\verb|list|,
\verb|sh|,
\verb|ls|,
\verb|l|.
2273 \paragraph{Arguments:
}
2276 \item \verb|to PREFIX| (default)
2278 --- the prefix selecting the destination multicast addresses to list.
2281 \item \verb|iif NAME|
2283 --- the interface on which multicast packets are received.
2286 \item \verb|from PREFIX|
2288 --- the prefix selecting the IP source addresses of the multicast route.
2293 \paragraph{Output format:
}
2296 kuznet@amber:~ $ ip mroute ls
2297 (
193.232.127.6,
224.0.1.39) Iif: unresolved
2298 (
193.232.244.34,
224.0.1.40) Iif: unresolved
2299 (
193.233.7.65,
224.66.66.66) Iif: eth0 Oifs: pimreg
2303 Each line shows one (S,G) entry in the multicast routing cache,
2304 where S is the source address and G is the multicast group.
\verb|Iif| is
2305 the interface on which multicast packets are expected to arrive.
2306 If the word
\verb|unresolved| is there instead of the interface name,
2307 it means that the routing daemon still hasn't resolved this entry.
2308 The keyword
\verb|oifs| is followed by a list of output interfaces, separated
2309 by spaces. If a multicast routing entry is created with non-trivial
2310 TTL scope, administrative distances are appended to the device names
2311 in the
\verb|oifs| list.
2313 \paragraph{Statistics:
} The
\verb|-statistics| option also prints the
2314 number of packets and bytes forwarded along this route and
2315 the number of packets that arrived on the wrong interface, if this number is not zero.
2318 kuznet@amber:~ $ ip -s mr ls
224.66/
16
2319 (
193.233.7.65,
224.66.66.66) Iif: eth0 Oifs: pimreg
2320 9383 packets,
300256 bytes
2325 \section{{\tt ip tunnel
} --- tunnel configuration
}
2328 \paragraph{Abbreviations:
} \verb|tunnel|,
\verb|tunl|.
2330 \paragraph{Object:
} \verb|tunnel| objects are tunnels, encapsulating
2331 packets in IPv4 packets and then sending them over the IP infrastructure.
2333 \paragraph{Commands:
} \verb|add|,
\verb|delete|,
\verb|change|,
\verb|show|
2336 \paragraph{See also:
} A more informal discussion of tunneling
2337 over IP and the
\verb|ip tunnel| command can be found in~
\cite{IP-TUNNELS
}.
2339 \subsection{{\tt ip tunnel add
} --- add a new tunnel\\
2340 {\tt ip tunnel change
} --- change an existing tunnel\\
2341 {\tt ip tunnel delete
} --- destroy a tunnel
}
2343 \paragraph{Abbreviations:
} \verb|add|,
\verb|a|;
\verb|change|,
\verb|chg|;
2344 \verb|delete|,
\verb|del|,
\verb|d|.
2347 \paragraph{Arguments:
}
2351 \item \verb|name NAME| (default)
2353 --- select the tunnel device name.
2355 \item \verb|mode MODE|
2357 --- set the tunnel mode. Three modes are currently available:
2358 \verb|ipip|,
\verb|sit| and
\verb|gre|.
2360 \item \verb|remote ADDRESS|
2362 --- set the remote endpoint of the tunnel.
2364 \item \verb|local ADDRESS|
2366 --- set the fixed local address for tunneled packets.
2367 It must be an address on another interface of this host.
2371 --- set a fixed TTL
\verb|N| on tunneled packets.
2372 \verb|N| is a number in the range
1--
255.
0 is a special value
2373 meaning that packets inherit the TTL value.
2374 The default value is:
\verb|inherit|.
2376 \item \verb|tos T| or
\verb|dsfield T|
2378 --- set a fixed TOS
\verb|T| on tunneled packets.
2379 The default value is:
\verb|inherit|.
2383 \item \verb|dev NAME|
2385 --- bind the tunnel to the device
\verb|NAME| so that
2386 tunneled packets will only be routed via this device and will
2387 not be able to escape to another device when the route to endpoint changes.
2389 \item \verb|nopmtudisc|
2391 --- disable Path MTU Discovery on this tunnel.
2392 It is enabled by default. Note that a fixed ttl is incompatible
2393 with this option: tunnelling with a fixed ttl always makes pmtu discovery.
2395 \item \verb|key K|,
\verb|ikey K|,
\verb|okey K|
2397 --- (only GRE tunnels) use keyed GRE with key
\verb|K|.
\verb|K| is
2398 either a number or an IP address-like dotted quad.
2399 The
\verb|key| parameter sets the key to use in both directions.
2400 The
\verb|ikey| and
\verb|okey| parameters set different keys for input and output.
2403 \item \verb|csum|,
\verb|icsum|,
\verb|ocsum|
2405 --- (only GRE tunnels) generate/require checksums for tunneled packets.
2406 The
\verb|ocsum| flag calculates checksums for outgoing packets.
2407 The
\verb|icsum| flag requires that all input packets have the correct
2408 checksum. The
\verb|csum| flag is equivalent to the combination
2409 ``
\verb|icsum|
\verb|ocsum|''.
2411 \item \verb|seq|,
\verb|iseq|,
\verb|oseq|
2413 --- (only GRE tunnels) serialize packets.
2414 The
\verb|oseq| flag enables sequencing of outgoing packets.
2415 The
\verb|iseq| flag requires that all input packets are serialized.
2416 The
\verb|seq| flag is equivalent to the combination ``
\verb|iseq|
\verb|oseq|''.
2419 I think this option does not
2420 work. At least, I did not test it, did not debug it and
2421 do not even understand how it is supposed to work or for what
2422 purpose Cisco planned to use it. Do not use it.
2428 \paragraph{Example:
} Create a pointopoint IPv6 tunnel with maximal TTL of
32.
2430 netadm@amber:~ # ip tunl add Cisco mode sit remote
192.31.7.104 \
2431 local
192.203.80.142 ttl
32
2434 \subsection{{\tt ip tunnel show
} --- list tunnels
}
2436 \paragraph{Abbreviations:
} \verb|show|,
\verb|list|,
\verb|sh|,
\verb|ls|,
\verb|l|.
2439 \paragraph{Arguments:
} None.
2441 \paragraph{Output format:
}
2443 kuznet@amber:~ $ ip tunl ls Cisco
2444 Cisco: ipv6/ip remote
192.31.7.104 local
192.203.80.142 ttl
32
2447 The line starts with the tunnel device name followed by a colon.
2448 Then the tunnel mode follows. The parameters of the tunnel are listed
2449 with the same keywords that were used when creating the tunnel.
2451 \paragraph{Statistics:
}
2454 kuznet@amber:~ $ ip -s tunl ls Cisco
2455 Cisco: ipv6/ip remote
192.31.7.104 local
192.203.80.142 ttl
32
2456 RX: Packets Bytes Errors CsumErrs OutOfSeq Mcasts
2457 12566 1707516 0 0 0 0
2458 TX: Packets Bytes Errors DeadLoop NoRoute NoBufs
2459 13445 1879677 0 0 0 0
2462 Essentially, these numbers are the same as the numbers
2463 printed with
{\tt ip -s link show
}
2464 (sec.
\ref{IP-LINK-SHOW
}, p.
\pageref{IP-LINK-SHOW
}) but the tags are different
2465 to reflect that they are tunnel specific.
2467 \item \verb|CsumErrs| --- the total number of packets dropped
2468 because of checksum failures for a GRE tunnel with checksumming enabled.
2469 \item \verb|OutOfSeq| --- the total number of packets dropped
2470 because they arrived out of sequence for a GRE tunnel with
2471 serialization enabled.
2472 \item \verb|Mcasts| --- the total number of multicast packets
2473 received on a broadcast GRE tunnel.
2474 \item \verb|DeadLoop| --- the total number of packets which were not
2475 transmitted because the tunnel is looped back to itself.
2476 \item \verb|NoRoute| --- the total number of packets which were not
2477 transmitted because there is no IP route to the remote endpoint.
2478 \item \verb|NoBufs| --- the total number of packets which were not
2479 transmitted because the kernel failed to allocate a buffer.
2483 \section{{\tt ip monitor
} and
{\tt rtmon
} --- state monitoring
}
2486 The
\verb|ip| utility can monitor the state of devices, addresses
2487 and routes continuously. This option has a slightly different format.
2489 the
\verb|monitor| command is the first in the command line and then
2490 the object list follows:
2492 ip monitor
[ file FILE
] [ all | OBJECT-LIST
]
2494 \verb|OBJECT-LIST| is the list of object types that we want to monitor.
2495 It may contain
\verb|link|,
\verb|address| and
\verb|route|.
2496 If no
\verb|file| argument is given,
\verb|ip| opens RTNETLINK,
2497 listens on it and dumps state changes in the format described
2498 in previous sections.
2500 If a file name is given, it does not listen on RTNETLINK,
2501 but opens the file containing RTNETLINK messages saved in binary format
2502 and dumps them. Such a history file can be generated with the
2503 \verb|rtmon| utility. This utility has a command line syntax similar to
2505 Ideally,
\verb|rtmon| should be started before
2506 the first network configuration command is issued. F.e.\ if
2509 rtmon file /var/log/rtmon.log
2511 in a startup script, you will be able to view the full history
2514 Certainly, it is possible to start
\verb|rtmon| at any time.
2515 It prepends the history with the state snapshot dumped at the moment
2519 \section{Route realms and policy propagation,
{\tt rtacct
}}
2522 On routers using OSPF ASE or, especially, the BGP protocol, routing
2523 tables may be huge. If we want to classify or to account for the packets
2524 per route, we will have to keep lots of information. Even worse, if we
2525 want to distinguish the packets not only by their destination, but
2526 also by their source, the task gets quadratic complexity and its solution
2527 is physically impossible.
2529 One approach to propagating the policy from routing protocols
2530 to the forwarding engine has been proposed in~
\cite{IOS-BGP-PP
}.
2531 Essentially, Cisco Policy Propagation via BGP is based on the fact
2532 that dedicated routers all have the RIB (Routing Information Base)
2533 close to the forwarding engine, so policy routing rules can
2534 check all the route attributes, including ASPATH information
2535 and community strings.
2537 The Linux architecture, splitting the RIB (maintained by a user level
2538 daemon) and the kernel based FIB (Forwarding Information Base),
2539 does not allow such a simple approach.
2541 It is to our fortune because there is another solution
2542 which allows even more flexible policy and richer semantics.
2544 Namely, routes can be clustered together in user space, based on their
2545 attributes. F.e.\ a BGP router knows route ASPATH, its community;
2546 an OSPF router knows the route tag or its area. The administrator, when adding
2547 routes manually, also knows their nature. Providing that the number of such
2548 aggregates (we call them
{\em realms\/
}) is low, the task of full
2549 classification both by source and destination becomes quite manageable.
2551 So each route may be assigned to a realm. It is assumed that
2552 this identification is made by a routing daemon, but static routes
2553 can also be handled manually with
\verb|ip route| (see sec.
\ref{IP-ROUTE
},
2554 p.
\pageref{IP-ROUTE
}).
2556 There is a patch to
\verb|gated|, allowing classification of routes
2557 to realms with all the set of policy rules implemented in
\verb|gated|:
2558 by prefix, by ASPATH, by origin, by tag etc.
2561 To facilitate the construction (f.e.\ in case the routing
2562 daemon is not aware of realms), missing realms may be completed
2563 with routing policy rules, see sec.~
\ref{IP-RULE
}, p.
\pageref{IP-RULE
}.
2565 For each packet the kernel calculates a tuple of realms: source realm
2566 and destination realm, using the following algorithm:
2569 \item If the route has a realm, the destination realm of the packet is set to it.
2570 \item If the rule has a source realm, the source realm of the packet is set to it.
2571 If the destination realm was not inherited from the route and the rule has a destination realm,
2573 \item If at least one of the realms is still unknown, the kernel finds
2574 the reversed route to the source of the packet.
2575 \item If the source realm is still unknown, get it from the reversed route.
2576 \item If one of the realms is still unknown, swap the realms of reversed
2577 routes and apply step
2 again.
2580 After this procedure is completed we know what realm the packet
2581 arrived from and the realm where it is going to propagate to.
2582 If some of the realms are unknown, they are initialized to zero
2583 (or realm
\verb|unknown|).
2585 The main application of realms is the TC
\verb|route| classifier~
\cite{TC-CREF
},
2586 where they are used to help assign packets to traffic classes,
2587 to account, police and schedule them according to this
2590 A much simpler but still very useful application is incoming packet
2591 accounting by realms. The kernel gathers a packet statistics summary
2592 which can be viewed with the
\verb|rtacct| utility.
2594 kuznet@amber:~ $ rtacct russia
2595 Realm BytesTo PktsTo BytesFrom PktsFrom
2596 russia
20576778 169176 47080168 153805
2599 This shows that this router received
153805 packets from
2600 the realm
\verb|russia| and forwarded
169176 packets to
\verb|russia|.
2601 The realm
\verb|russia| consists of routes with ASPATHs not leaving
2604 Note that locally originating packets are not accounted here,
2605 \verb|rtacct| shows incoming packets only. Using the
\verb|route|
2606 classifier (see~
\cite{TC-CREF
}) you can get even more detailed
2607 accounting information about outgoing packets, optionally
2608 summarizing traffic not only by source or destination, but
2609 by any pair of source and destination realms.
2612 \begin{thebibliography
}{99}
2613 \addcontentsline{toc
}{section
}{References
}
2614 \bibitem{RFC-NDISC
} T.~Narten, E.~Nordmark, W.~Simpson.
2615 ``Neighbor Discovery for IP Version
6 (IPv6)'', RFC-
2461.
2617 \bibitem{RFC-ADDRCONF
} S.~Thomson, T.~Narten.
2618 ``IPv6 Stateless Address Autoconfiguration'', RFC-
2462.
2620 \bibitem{RFC1812
} F.~Baker.
2621 ``Requirements for IP Version
4 Routers'', RFC-
1812.
2623 \bibitem{RFC1122
} R.~T.~Braden.
2624 ``Requirements for Internet hosts --- communication layers'', RFC-
1122.
2626 \bibitem{IOS
} ``Cisco IOS Release
12.0 Network Protocols
2627 Command Reference, Part
1'' and
2628 ``Cisco IOS Release
12.0 Quality of Service Solutions
2629 Configuration Guide: Configuring Policy-Based Routing'',\\
2630 http://www.cisco.com/univercd/cc/td/doc/product/software/ios120.
2632 \bibitem{IP-TUNNELS
} A.~N.~Kuznetsov.
2633 ``Tunnels over IP in Linux-
2.2'', \\
2634 In:
{\tt ftp://ftp.inr.ac.ru/ip-routing/iproute2-current.tar.gz
}.
2636 \bibitem{TC-CREF
} A.~N.~Kuznetsov. ``TC Command Reference'',\\
2637 In:
{\tt ftp://ftp.inr.ac.ru/ip-routing/iproute2-current.tar.gz
}.
2639 \bibitem{IOS-BGP-PP
} ``Cisco IOS Release
12.0 Quality of Service Solutions
2640 Configuration Guide: Configuring QoS Policy Propagation via
2641 Border Gateway Protocol'',\\
2642 http://www.cisco.com/univercd/cc/td/doc/product/software/ios120.
2644 \bibitem{RFC-DHCP
} R.~Droms.
2645 ``Dynamic Host Configuration Protocol.'', RFC-
2131
2647 \end{thebibliography
}
2653 \addcontentsline{toc
}{section
}{Appendix
}
2655 \section{Source address selection
}
2658 When a host creates an IP packet, it must select some source
2659 address. Correct source address selection is a critical procedure,
2660 because it gives the receiver the information needed to deliver a
2661 reply. If the source is selected incorrectly, in the best case,
2662 the backward path may appear different to the forward one which
2663 is harmful for performance. In the worst case, when the addresses
2664 are administratively scoped, the reply may be lost entirely.
2666 Linux-
2.2 selects source addresses using the following algorithm:
2670 The application may select a source address explicitly with
\verb|bind(
2)|
2671 syscall or supplying it to
\verb|sendmsg(
2)| via the ancillary data object
2672 \verb|IP_PKTINFO|. In this case the kernel only checks the validity
2673 of the address and never tries to ``improve'' an incorrect user choice,
2674 generating an error instead.
2676 Never say ``Never''. The sysctl option
\verb|ip_dynaddr| breaks
2677 this axiom. It has been made deliberately with the purpose
2678 of automatically reselecting the address on hosts with dynamic dial-out interfaces.
2679 However, this hack
{\em must not\/
} be used on multihomed hosts
2680 and especially on routers: it would break them.
2684 \item Otherwise, IP routing tables can contain an explicit source
2685 address hint for this destination. The hint is set with the
\verb|src| parameter
2686 to the
\verb|ip route| command, sec.
\ref{IP-ROUTE
}, p.
\pageref{IP-ROUTE
}.
2689 \item Otherwise, the kernel searches through the list of addresses
2690 attached to the interface through which the packets will be routed.
2691 The search strategies are different for IP and IPv6. Namely:
2694 \item IPv6 searches for the first valid, not deprecated address
2695 with the same scope as the destination.
2697 \item IP searches for the first valid address with a scope wider
2698 than the scope of the destination but it prefers addresses
2699 which fall to the same subnet as the nexthop of the route
2700 to the destination. Unlike IPv6, the scopes of IPv4 destinations
2701 are not encoded in their addresses but are supplied
2702 in routing tables instead (the
\verb|scope| parameter to the
\verb|ip route| command,
2703 sec.
\ref{IP-ROUTE
}, p.
\pageref{IP-ROUTE
}).
2708 \item Otherwise, if the scope of the destination is
\verb|link| or
\verb|host|,
2709 the algorithm fails and returns a zero source address.
2711 \item Otherwise, all interfaces are scanned to search for an address
2712 with an appropriate scope. The loopback device
\verb|lo| is always the first
2713 in the search list, so that if an address with global scope (not
127.0.0.1!)
2714 is configured on loopback, it is always preferred.
2719 \section{Proxy ARP/NDISC
}
2722 Routers may answer ARP/NDISC solicitations on behalf of other hosts.
2723 In Linux-
2.2 proxy ARP on an interface may be enabled
2724 by setting the kernel
\verb|sysctl| variable
2725 \verb|/proc/sys/net/ipv4/conf/<dev>/proxy_arp| to
1. After this, the router
2726 starts to answer ARP requests on the interface
\verb|<dev>|, provided
2727 the route to the requested destination does
{\em not\/
} go back via the same
2730 The variable
\verb|/proc/sys/net/ipv4/conf/all/proxy_arp| enables proxy
2731 ARP on all the IP devices.
2733 However, this approach fails in the case of IPv6 because the router
2734 must join the solicited node multicast address to listen for the corresponding
2735 NDISC queries. It means that proxy NDISC is possible only on a per destination
2738 Logically, proxy ARP/NDISC is not a kernel task. It can easily be implemented
2739 in user space. However, similar functionality was present in BSD kernels
2740 and in Linux-
2.0, so we have to preserve it at least to the extent that
2741 is standardized in BSD.
2743 Linux-
2.0 ARP had a feature called
{\em subnet\/
} proxy ARP.
2744 It is replaced with the sysctl flag in Linux-
2.2.
2748 The
\verb|ip| utility provides a way to manage proxy ARP/NDISC
2749 with the
\verb|ip neigh| command, namely:
2751 ip neigh add proxy ADDRESS
[ dev NAME
]
2753 adds a new proxy ARP/NDISC record and
2755 ip neigh del proxy ADDRESS
[ dev NAME
]
2759 If the name of the device is not given, the router will answer solicitations
2760 for address
\verb|ADDRESS| on all devices, otherwise it will only serve
2761 the device
\verb|NAME|. Even if the proxy entry is created with
2762 \verb|ip neigh|, the router
{\em will not\/
} answer a query if the route
2763 to the destination goes back via the interface from which the solicitation
2766 It is important to emphasize that proxy entries have
{\em no\/
}
2767 parameters other than these (IP/IPv6 address and optional device).
2768 Particularly, the entry does not store any link layer address.
2769 It always advertises the station address of the interface
2770 on which it sends advertisements (i.e. it's own station address).
2772 \section{Route NAT status
}
2775 NAT (or ``Network Address Translation'') remaps some parts
2776 of the IP address space into other ones. Linux-
2.2 route NAT is supposed
2777 to be used to facilitate policy routing by rewriting addresses
2778 to other routing domains or to help while renumbering sites
2781 \paragraph{What it is not:
}
2782 It is necessary to emphasize that
{\em it is not supposed\/
}
2783 to be used to compress address space or to split load.
2784 This is not missing functionality but a design principle.
2785 Route NAT is
{\em stateless\/
}. It does not hold any state
2786 about translated sessions. This means that it handles any number
2787 of sessions flawlessly. But it also means that it is
{\em static\/
}.
2788 It cannot detect the moment when the last TCP client stops
2789 using an address. For the same reason, it will not help to split
2790 load between several servers.
2792 It is a pretty commonly held belief that it is useful to split load between
2793 several servers with NAT. This is a mistake. All you get from this
2794 is the requirement that the router keep the state of all the TCP connections
2795 going via it. Well, if the router is so powerful, run apache on it.
8)
2798 The second feature: it does not touch packet payload,
2799 does not try to ``improve'' broken protocols by looking
2800 through its data and mangling it. It mangles IP addresses,
2801 only IP addresses and nothing but IP addresses.
2802 This also, is not missing any functionality.
2804 To resume: if you need to compress address space or keep
2805 active FTP clients happy, your choice is not route NAT but masquerading,
2806 port forwarding, NAPT etc.
2808 By the way, you may also want to look at
2809 http://www.suse.com/\~mha/HyperNews/get/linux-ip-nat.html
2813 \paragraph{How it works.
}
2814 Some part of the address space is reserved for dummy addresses
2815 which will look for all the world like some host addresses
2816 inside your network. No other hosts may use these addresses,
2817 however other routers may also be configured to translate them.
2819 A great advantage of route NAT is that it may be used not
2820 only in stub networks but in environments with arbitrarily complicated
2821 structure. It does not firewall, it
{\em forwards.
}
2823 These addresses are selected by the
\verb|ip route| command
2824 (sec.
\ref{IP-ROUTE-ADD
}, p.
\pageref{IP-ROUTE-ADD
}). F.e.\
2826 ip route add nat
192.203.80.144 via
193.233.7.83
2828 states that the single address
192.203.80.144 is a dummy NAT address.
2829 For all the world it looks like a host address inside our network.
2830 For neighbouring hosts and routers it looks like the local address
2831 of the translating router. The router answers ARP for it, advertises
2832 this address as routed via it,
{\em et al\/
}. When the router
2833 receives a packet destined for
192.203.80.144, it replaces
2834 this address with
193.233.7.83 which is the address of some real
2835 host and forwards the packet. If you need to remap
2836 blocks of addresses, you may use a command like:
2838 ip route add nat
192.203.80.192/
26 via
193.233.7.64
2840 This command will map a block of
63 addresses
192.203.80.192-
255 to
2843 When an internal host (
193.233.7.83 in the example above)
2844 sends something to the outer world and these packets are forwarded
2845 by our router, it should translate the source address
193.233.7.83
2846 into
192.203.80.144. This task is solved by setting a special
2847 policy rule (sec.
\ref{IP-RULE-ADD
}, p.
\pageref{IP-RULE-ADD
}):
2849 ip rule add prio
320 from
193.233.7.83 nat
192.203.80.144
2851 This rule says that the source address
193.233.7.83
2852 should be translated into
192.203.80.144 before forwarding.
2853 It is important that the address after the
\verb|nat| keyword
2854 is some NAT address, declared by
{\tt ip route add nat
}.
2855 If it is just a random address the router will not map to it.
2857 The exception is when the address is a local address of this
2858 router (or
0.0.0.0) and masquerading is configured in the linux-
2.2
2859 kernel. In this case the router will masquerade the packets as this address.
2860 If
0.0.0.0 is selected, the result is equivalent to one
2861 obtained with firewalling rules. Otherwise, you have the way
2862 to order Linux to masquerade to this fixed address.
2863 NAT mechanism used in linux-
2.4 is more flexible than
2864 masquerading, so that this feature has lost meaning and disabled.
2867 If the network has non-trivial internal structure, it is
2868 useful and even necessary to add rules disabling translation
2869 when a packet does not leave this network. Let us return to the
2870 example from sec.
\ref{IP-RULE-SHOW
} (p.
\pageref{IP-RULE-SHOW
}).
2872 300: from
193.233.7.83 to
193.233.7.0/
24 lookup main
2873 310: from
193.233.7.83 to
192.203.80.0/
24 lookup main
2874 320: from
193.233.7.83 lookup inr.ruhep map-to
192.203.80.144
2876 This block of rules causes normal forwarding when
2877 packets from
193.233.7.83 do not leave networks
193.233.7/
24
2878 and
192.203.80/
24. Also, if the
\verb|inr.ruhep| table does not
2879 contain a route to the destination (which means that the routing
2880 domain owning addresses from
192.203.80/
24 is dead), no translation
2881 will occur. Otherwise, the packets are translated.
2883 \paragraph{How to only translate selected ports:
}
2884 If you only want to translate selected ports (f.e.\ http)
2885 and leave the rest intact, you may use
\verb|ipchains|
2886 to
\verb|fwmark| a class of packets.
2887 Suppose you did and all the packets from
193.233.7.83
2888 destined for port
80 are marked with marker
0x1234 in input fwchain.
2889 In this case you may replace rule \
#320 with:
2891 320: from
193.233.7.83 fwmark
1234 lookup main map-to
192.203.80.144
2893 and translation will only be enabled for outgoing http requests.
2895 \section{Example: minimal host setup
}
2896 \label{EXAMPLE-SETUP
}
2898 The following script gives an example of a fault safe
2899 setup of IP (and IPv6, if it is compiled into the kernel)
2900 in the common case of a node attached to a single broadcast
2901 network. A more advanced script, which may be used both on multihomed
2902 hosts and on routers, is described in the following
2905 The utilities used in the script may be found in the
2906 directory ftp://ftp.inr.ac.ru/ip-routing/:
2908 \item \verb|ip| --- package
\verb|iproute2|.
2909 \item \verb|arping| --- package
\verb|iputils|.
2910 \item \verb|rdisc| --- package
\verb|iputils|.
2913 It also refers to a DHCP client,
\verb|dhcpcd|. I should refrain from
2914 recommending a good DHCP client to use. All that I can
2915 say is that ISC
\verb|dhcp-
2.0b1pl6| patched with the patch that
2916 can be found in the
\verb|dhcp.bootp.rarp| subdirectory of
2917 the same ftp site
{\em does\/
} work,
2918 at least on Ethernet and Token Ring.
2925 \#
{\bf Usage:
\verb|ifone ADDRESS
[/PREFIX-LENGTH
] [DEVICE
]|
}\\
2926 \#
{\bf Parameters:
}\\
2927 \# \$
1 --- Static IP address, optionally followed by prefix length.\\
2928 \# \$
2 --- Device name. If it is missing,
\verb|eth0| is asssumed.\\
2929 \# F.e.
\verb|ifone
193.233.7.90|
2936 \# Parse IP address, splitting prefix length.
2938 if
[ "$
1" != ""
]; then
2940 if
[ "$
1" != "$ipaddr"
]; then
2945 pfx="$
{ipaddr
}/$
{pfxlen
}"
2949 \#
{\bf Step
0} --- enable loopback.\\
2951 \# This step is necessary on any networked box before attempt\\
2952 \# to configure any other device.\\
2955 ip link set up dev lo
2956 ip addr add
127.0.0.1/
8 dev lo brd + scope host
2959 \# IPv6 autoconfigure themself on loopback.\\
2961 \# If user gave loopback as device, we add the address as alias and exit.
2964 if
[ "$dev" = "lo"
]; then
2965 if
[ "$ipaddr" != "" -a "$ipaddr" != "
127.0.0.1"
]; then
2966 ip address add $ipaddr dev $dev
2973 \noindent\#
{\bf Step
1} --- enable device
\verb|$dev|
2976 if ! ip link set up dev $dev ; then
2977 echo "Cannot enable interface $dev. Aborting."
1>&
2
2982 \# The interface is
\verb|UP|. IPv6 started stateless autoconfiguration itself,\\
2983 \# and its configuration finishes here. However,\\
2984 \# IP still needs some static preconfigured address.
2987 if
[ "$ipaddr" = ""
]; then
2988 echo "No address for $dev is configured, trying DHCP..."
1>&
2
2995 \#
{\bf Step
2} --- IP Duplicate Address Detection~
\cite{RFC-DHCP
}.\\
2996 \# Send two probes and wait for result for
3 seconds.\\
2997 \# If the interface opens slower f.e.\ due to long media detection,\\
2998 \# you want to increase the timeout.\\
3001 if ! arping -q -c
2 -w
3 -D -I $dev $ipaddr ; then
3002 echo "Address $ipaddr is busy, trying DHCP..."
1>&
2
3008 \# OK, the address is unique, we may add it on the interface.\\
3010 \#
{\bf Step
3} --- Configure the address on the interface.
3014 if ! ip address add $pfx brd + dev $dev; then
3015 echo "Failed to add $pfx on $dev, trying DHCP..."
1>&
2
3021 \noindent\#
{\bf Step
4} --- Announce our presence on the link.
3023 arping -A -c
1 -I $dev $ipaddr
3026 arping -U -c
1 -I $dev $ipaddr ) >& /dev/null </dev/null &
3030 \#
{\bf Step
5} (optional) --- Add some control routes.\\
3032 \#
1. Prohibit link local multicast addresses.\\
3033 \#
2. Prohibit link local (alias, limited) broadcast.\\
3034 \#
3. Add default multicast route.
3037 ip route add unreachable
224.0.0.0/
24
3038 ip route add unreachable
255.255.255.255
3039 if
[ `ip link ls $dev | grep -c MULTICAST` -ge
1 ]; then
3040 ip route add
224.0.0.0/
4 dev $dev scope global
3045 \#
{\bf Step
6} --- Add fallback default route with huge metric.\\
3046 \# If a proxy ARP server is present on the interface, we will be\\
3047 \# able to talk to all the Internet without further configuration.\\
3048 \# It is not so cheap though and we still hope that this route\\
3049 \# will be overridden by more correct one by rdisc.\\
3050 \# Do not make this step if the device is not ARPable,\\
3051 \# because dead nexthop detection does not work on them.
3054 if
[ "$noarp" = "
0"
]; then
3055 ip ro add default dev $dev metric
30000 scope global
3060 \#
{\bf Step
7} --- Restart router discovery and exit.
3063 killall -HUP rdisc || rdisc -fs
3068 \section{Example:
{\protect\tt ifcfg
} --- interface address management
}
3069 \label{EXAMPLE-IFCFG
}
3071 This is a simplistic script replacing one option of
\verb|ifconfig|,
3072 namely, IP address management. It not only adds
3073 addresses, but also carries out Duplicate Address Detection~
\cite{RFC-DHCP
},
3074 sends unsolicited ARP to update the caches of other hosts sharing
3075 the interface, adds some control routes and restarts Router Discovery
3076 when it is necessary.
3078 I strongly recommend using it
{\em instead\/
} of
\verb|ifconfig| both
3079 on hosts and on routers.
3085 \#
{\bf Usage:
\verb?ifcfg DEVICE
[:ALIAS
] [add|del
] ADDRESS
[/LENGTH
] [PEER
]?
}\\
3086 \#
{\bf Parameters:
}\\
3087 \# ---Device name. It may have alias suffix, separated by colon.\\
3088 \# ---Command: add, delete or stop.\\
3089 \# ---IP address, optionally followed by prefix length.\\
3090 \# ---Optional peer address for pointopoint interfaces.\\
3091 \# F.e.
\verb|ifcfg eth0
193.233.7.90/
24|
3093 \noindent\# This function determines, whether it is router or host.\\
3094 \# It returns
0, if the host is apparently not router.
3097 CheckForwarding ()
{
3099 sbase=/proc/sys/net/ipv4/conf
3101 if
[ -d $sbase
]; then
3102 for dir in $sbase/*/forwarding; do
3103 fwd=$
[$fwd + `cat $dir`
]
3112 \# This function restarts Router Discovery.\\
3116 killall -HUP rdisc || rdisc -fs
3120 \# Calculate ABC "natural" mask length\\
3121 \# Arg: \$
1 = dotquad address
3127 if
[ $class -eq
0 -o $class -ge
224 ]; then return
0
3128 elif
[ $class -ge
192 ]; then return
24
3129 elif
[ $class -ge
128 ]; then return
16
3138 \# Strip alias suffix separated by colon.
3144 if
[ "$dev" = "" -o "$
1" = "help"
]; then
3145 echo "Usage: ifcfg DEV
[[add|del
[ADDR
[/LEN
]] [PEER
] | stop
]"
1>&
2
3146 echo " add - add new address"
1>&
2
3147 echo " del - delete address"
1>&
2
3148 echo " stop - completely disable IP"
1>&
2
3157 \# Parse command. If it is ``stop'', flush and exit.
3164 if
[ "$ldev" != "$dev"
]; then
3165 echo "Cannot stop alias $ldev"
1>&
2
3168 ip -
4 addr flush dev $dev $label || exit
1
3169 if
[ $fwd -eq
0 ]; then RestartRDISC; fi
3172 deleting=
1; shift ;;
3177 \# Parse prefix, split prefix length, separated by slash.
3182 if
[ "$
1" != ""
]; then
3184 if
[ "$
1" != "$ipaddr"
]; then
3187 if
[ "$ipaddr" = ""
]; then
3188 echo "$
1 is bad IP address."
1>&
2
3195 \# If peer address is present, prefix length is
32.\\
3196 \# Otherwise, if prefix length was not given, guess it.
3200 if
[ "$peer" != ""
]; then
3201 if
[ "$pfxlen" != "" -a "$pfxlen" != "
32"
]; then
3202 echo "Peer address with non-trivial netmask."
1>&
2
3205 pfx="$ipaddr peer $peer"
3207 if
[ "$pfxlen" = ""
]; then
3211 pfx="$ipaddr/$pfxlen"
3213 if
[ "$ldev" = "$dev" -a "$ipaddr" != ""
]; then
3218 \# If deletion was requested, delete the address and restart RDISC
3221 if
[ $deleting -ne
0 ]; then
3222 ip addr del $pfx dev $dev $label || exit
1
3223 if
[ $fwd -eq
0 ]; then RestartRDISC; fi
3228 \# Start interface initialization.\\
3230 \#
{\bf Step
0} --- enable device
\verb|$dev|
3233 if ! ip link set up dev $dev ; then
3234 echo "Error: cannot enable interface $dev."
1>&
2
3237 if
[ "$ipaddr" = ""
]; then exit
0; fi
3240 \#
{\bf Step
1} --- IP Duplicate Address Detection~
\cite{RFC-DHCP
}.\\
3241 \# Send two probes and wait for result for
3 seconds.\\
3242 \# If the interface opens slower f.e.\ due to long media detection,\\
3243 \# you want to increase the timeout.\\
3246 if ! arping -q -c
2 -w
3 -D -I $dev $ipaddr ; then
3247 echo "Error: some host already uses address $ipaddr on $dev."
1>&
2
3252 \# OK, the address is unique. We may add it to the interface.\\
3254 \#
{\bf Step
2} --- Configure the address on the interface.
3257 if ! ip address add $pfx brd + dev $dev $label; then
3258 echo "Error: failed to add $pfx on $dev."
1>&
2
3262 \noindent\#
{\bf Step
3} --- Announce our presence on the link
3264 arping -q -A -c
1 -I $dev $ipaddr
3267 arping -q -U -c
1 -I $dev $ipaddr ) >& /dev/null </dev/null &
3270 \#
{\bf Step
4} (optional) --- Add some control routes.\\
3272 \#
1. Prohibit link local multicast addresses.\\
3273 \#
2. Prohibit link local (alias, limited) broadcast.\\
3274 \#
3. Add default multicast route.
3277 ip route add unreachable
224.0.0.0/
24 >& /dev/null
3278 ip route add unreachable
255.255.255.255 >& /dev/null
3279 if
[ `ip link ls $dev | grep -c MULTICAST` -ge
1 ]; then
3280 ip route add
224.0.0.0/
4 dev $dev scope global >& /dev/null
3284 \#
{\bf Step
5} --- Add fallback default route with huge metric.\\
3285 \# If a proxy ARP server is present on the interface, we will be\\
3286 \# able to talk to all the Internet without further configuration.\\
3287 \# Do not make this step on router or if the device is not ARPable.\\
3288 \# because dead nexthop detection does not work on them.
3291 if
[ $fwd -eq
0 ]; then
3292 if
[ $noarp -eq
0 ]; then
3293 ip ro append default dev $dev metric
30000 scope global
3294 elif
[ "$peer" != ""
]; then
3295 if ping -q -c
2 -w
4 $peer ; then
3296 ip ro append default via $peer dev $dev metric
30001
3305 \# End of
{\bf MAIN()
}