doc/ip-cref.tex

   1 \documentstyle[12pt,twoside]{article}
   2 \def\TITLE{IP Command Reference}
   3 \input preamble
   4 \begin{center}
   5 \Large\bf IP Command Reference.
   6 \end{center}
   7
   8
   9 \begin{center}
  10 { \large Alexey~N.~Kuznetsov } \\
  11 \em Institute for Nuclear Research, Moscow \\
  12 \verb|kuznet@ms2.inr.ac.ru| \\
  13 \rm April 14, 1999
  14 \end{center}
  15
  16 \vspace{5mm}
  17
  18 \tableofcontents
  19
  20 \newpage
  21
  22 \section{About this document}
  23
  24 This document presents a comprehensive description of the \verb|ip| utility
  25 from the \verb|iproute2| package. It is not a tutorial or user's guide.
  26 It is a {\em dictionary\/}, not explaining terms,
  27 but translating them into other terms, which may also be unknown to the reader.
  28 However, the document is self-contained and the reader, provided they have a
  29 basic networking background, will find enough information
  30 and examples to understand and configure Linux-2.2 IP and IPv6
  31 networking.
  32
  33 This document is split into sections explaining \verb|ip| commands
  34 and options, interpreting \verb|ip| output and containing a few examples.
  35 More voluminous examples and some topics, which require more elaborate
  36 discussion, are in the appendix.
  37
  38 The paragraphs beginning with NB contain side notes, warnings about
  39 bugs and design drawbacks. They may be skipped at the first reading.
  40
  41 \section{{\tt ip} --- command syntax}
  42
  43 The generic form of an \verb|ip| command is:
  44 \begin{verbatim}
  45 ip [ OPTIONS ] OBJECT [ COMMAND [ ARGUMENTS ]]
  46 \end{verbatim}
  47 where \verb|OPTIONS| is a set of optional modifiers affecting the
  48 general behaviour of the \verb|ip| utility or changing its output. All options
  49 begin with the character \verb|'-'| and may be used in either long or abbreviated
  50 forms. Currently, the following options are available:
  51
  52 \begin{itemize}
  53 \item \verb|-V|, \verb|-Version|
  54
  55 --- print the version of the \verb|ip| utility and exit.
  56
  57
  58 \item \verb|-s|, \verb|-stats|, \verb|-statistics|
  59
  60 --- output more information. If the option
  61 appears twice or more, the amount of information increases.
  62 As a rule, the information is statistics or some time values.
  63
  64
  65 \item \verb|-f|, \verb|-family| followed by a protocol family
  66 identifier: \verb|inet|, \verb|inet6| or \verb|link|.
  67
  68 --- enforce the protocol family to use. If the option is not present,
  69 the protocol family is guessed from other arguments. If the rest of the command
  70 line does not give enough information to guess the family, \verb|ip| falls back to the default
  71 one, usually \verb|inet| or \verb|any|. \verb|link| is a special family
  72 identifier meaning that no networking protocol is involved.
  73
  74 \item \verb|-4|
  75
  76 --- shortcut for \verb|-family inet|.
  77
  78 \item \verb|-6|
  79
  80 --- shortcut for \verb|-family inet6|.
  81
  82 \item \verb|-0|
  83
  84 --- shortcut for \verb|-family link|.
  85
  86
  87 \item \verb|-o|, \verb|-oneline|
  88
  89 --- output each record on a single line, replacing line feeds
  90 with the \verb|'\'| character. This is convenient when you want to
  91 count records with \verb|wc| or to \verb|grep| the output. The trivial
  92 script \verb|rtpr| converts the output back into readable form.
  93
  94 \item \verb|-r|, \verb|-resolve|
  95
  96 --- use the system's name resolver to print DNS names instead of
  97 host addresses.
  98
  99 \begin{NB}
 100  Do not use this option when reporting bugs or asking for advice.
 101 \end{NB}
 102 \begin{NB}
 103  \verb|ip| never uses DNS to resolve names to addresses.
 104 \end{NB}
 105
 106 \end{itemize}
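The \verb|-oneline| and family options combine naturally in pipelines.
The following is only a sketch; the device name \verb|eth0| is an assumption
chosen for illustration:
\begin{verbatim}
# count network devices: with -o each device occupies exactly one line
ip -o link ls | wc -l
# show only the IPv4 addresses configured on eth0, one per line
ip -4 -o addr ls dev eth0
# the same request with both options spelled out in full
ip -family inet -oneline addr ls dev eth0
\end{verbatim}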
 107
 108 \verb|OBJECT| is the object to manage or to get information about.
 109 The object types currently understood by \verb|ip| are:
 110
 111 \begin{itemize}
 112 \item \verb|link| --- network device
 113 \item \verb|address| --- protocol (IP or IPv6) address on a device
 114 \item \verb|neighbour| --- ARP or NDISC cache entry
 115 \item \verb|route| --- routing table entry
 116 \item \verb|rule| --- rule in routing policy database
 117 \item \verb|maddress| --- multicast address
 118 \item \verb|mroute| --- multicast routing cache entry
 119 \item \verb|tunnel| --- tunnel over IP
 120 \end{itemize}
 121
 122 Again, the names of all objects may be written in full or
 123 abbreviated form, e.g.\ \verb|address| is abbreviated as \verb|addr|
 124 or just \verb|a|.
 125
 126 \verb|COMMAND| specifies the action to perform on the object.
 127 The set of possible actions depends on the object type.
 128 As a rule, it is possible to \verb|add|, \verb|delete| and
 129 \verb|show| (or \verb|list|) objects, but some objects
 130 do not allow all of these operations or have some additional commands.
 131 The \verb|help| command is available for all objects. It prints
 132 out a list of available commands and argument syntax conventions.
 133
 134 If no command is given, some default command is assumed.
 135 Usually it is \verb|list| or, if the objects of this class
 136 cannot be listed, \verb|help|.
 137
 138 \verb|ARGUMENTS| is a list of arguments to the command.
 139 The arguments depend on the command and object. There are two types of arguments:
 140 {\em flags\/}, consisting of a single keyword, and {\em parameters\/},
 141 consisting of a keyword followed by a value. For convenience,
 142 each command has some {\em default parameter\/}
 143 which may be omitted. For example, the parameter \verb|dev| is the default
 144 for the {\tt ip link} command, so {\tt ip link ls eth0} is equivalent
 145 to {\tt ip link ls dev eth0}.
 146 In the command descriptions below such parameters
 147 are distinguished with the marker: ``(default)''.
 148
 149 Almost all keywords may be abbreviated to their first few letters (or even a
 150 single letter). The shortcuts are convenient when \verb|ip| is used interactively,
 151 but they are not recommended in scripts or when reporting bugs
 152 or asking for advice. ``Officially'' allowed abbreviations are listed
 153 in the document body.
 154
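To illustrate the abbreviation rules, the two commands below should be
equivalent. This is a sketch in which \verb|eth0| is an assumed device name;
the first, unabbreviated form is the one to prefer in scripts and bug reports:
\begin{verbatim}
ip -statistics address show dev eth0
ip -s a ls eth0
\end{verbatim}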
 155
 156
 157 \section{{\tt ip} --- error messages}
 158
 159 \verb|ip| may fail for one of the following reasons:
 160
 161 \begin{itemize}
 162 \item
 163 A syntax error on the command line: an unknown keyword, an incorrectly formatted
 164 IP address, etc. In this case \verb|ip| prints an error message
 165 and exits. As a rule, the error message will contain information
 166 about the reason for the failure. Sometimes it also prints a help page.
 167
 168 \item
 169 The arguments did not pass verification for self-consistency.
 170
 171 \item
 172 \verb|ip| failed to compile a kernel request from the arguments
 173 because the user didn't give enough information.
 174
 175 \item
 176 The kernel returned an error to some syscall. In this case \verb|ip|
 177 prints the error message, as it is output with \verb|perror(3)|,
 178 prefixed with a comment and a syscall identifier.
 179
 180 \item
 181 The kernel returned an error to some RTNETLINK request.
 182 In this case \verb|ip| prints the error message, as it is output
 183 with \verb|perror(3)| prefixed with ``RTNETLINK answers:''.
 184
 185 \end{itemize}
 186
 187 All the operations are atomic, i.e.\
 188 if the \verb|ip| utility fails, it does not change anything
 189 in the system. One harmful exception is the \verb|ip link| command
 190 (Sec.\ref{IP-LINK}, p.\pageref{IP-LINK}),
 191 which may change only some of the device parameters given
 192 on the command line.
 193
 194 It is difficult to list all the error messages (especially
 195 syntax errors). However, as a rule, their meaning is clear
 196 from the context of the command.
 197
 198 The most common mistakes are:
 199
 200 \begin{enumerate}
 201 \item Netlink is not configured in the kernel. The message is:
 202 \begin{verbatim}
 203 Cannot open netlink socket: Invalid value
 204 \end{verbatim}
 205
 206 \item RTNETLINK is not configured in the kernel. In this case
 207 one of the following messages may be printed, depending on the command:
 208 \begin{verbatim}
 209 Cannot talk to rtnetlink: Connection refused
 210 Cannot send dump request: Connection refused
 211 \end{verbatim}
 212
 213 \item The \verb|CONFIG_IP_MULTIPLE_TABLES| option was not selected
 214 when configuring the kernel. In this case any attempt to use the
 215 \verb|ip| \verb|rule| command will fail, e.g.
 216 \begin{verbatim}
 217 kuznet@kaiser $ ip rule list
 218 RTNETLINK error: Invalid argument
 219 dump terminated
 220 \end{verbatim}
 221
 222 \end{enumerate}
 223
 224
 225 \section{{\tt ip link} --- network device configuration}
 226 \label{IP-LINK}
 227
 228 \paragraph{Object:} A \verb|link| is a network device and the corresponding
 229 commands display and change the state of devices.
 230
 231 \paragraph{Commands:} \verb|set| and \verb|show| (or \verb|list|).
 232
 233 \subsection{{\tt ip link set} --- change device attributes}
 234
 235 \paragraph{Abbreviations:} \verb|set|, \verb|s|.
 236
 237 \paragraph{Arguments:}
 238
 239 \begin{itemize}
 240 \item \verb|dev NAME| (default)
 241
 242 --- \verb|NAME| specifies the network device on which to operate.
 243
 244 \item \verb|up| and \verb|down|
 245
 246 --- change the state of the device to \verb|UP| or \verb|DOWN|.
 247
 248 \item \verb|arp on| or \verb|arp off|
 249
 250 --- change the \verb|NOARP| flag on the device.
 251
 252 \begin{NB}
 253 This operation is {\em not allowed\/} if the device is in state \verb|UP|,
 254 although neither the \verb|ip| utility nor the kernel checks for this condition.
 255 You can get unpredictable results changing this flag while the
 256 device is running.
 257 \end{NB}
 258
 259 \item \verb|multicast on| or \verb|multicast off|
 260
 261 --- change the \verb|MULTICAST| flag on the device.
 262
 263 \item \verb|dynamic on| or \verb|dynamic off|
 264
 265 --- change the \verb|DYNAMIC| flag on the device.
 266
 267 \item \verb|name NAME|
 268
 269 --- change the name of the device. This operation is not
 270 recommended if the device is running or has some addresses
 271 already configured.
 272
 273 \item \verb|txqueuelen NUMBER| or \verb|txqlen NUMBER|
 274
 275 --- change the transmit queue length of the device.
 276
 277 \item \verb|mtu NUMBER|
 278
 279 --- change the MTU of the device.
 280
 281 \item \verb|address LLADDRESS|
 282
 283 --- change the station address of the interface.
 284
 285 \item \verb|broadcast LLADDRESS|, \verb|brd LLADDRESS| or \verb|peer LLADDRESS|
 286
 287 --- change the link layer broadcast address or the peer address when
 288 the interface is \verb|POINTOPOINT|.
 289
 290 \vskip 1mm
 291 \begin{NB}
 292 For most devices (e.g.\ for Ethernet) changing the link layer
 293 broadcast address will break networking.
 294 Do not use it unless you understand what this operation really does.
 295 \end{NB}
 296
 297 \end{itemize}
 298
 299 \vskip 1mm
 300 \begin{NB}
 301 The \verb|PROMISC| and \verb|ALLMULTI| flags are considered
 302 obsolete and should not be changed administratively, though
 303 the {\tt ip} utility will allow that.
 304 \end{NB}
 305
 306 \paragraph{Warning:} If multiple parameter changes are requested,
 307 \verb|ip| aborts immediately after any of the changes has failed.
 308 This is the only case when \verb|ip| can move the system to
 309 an unpredictable state. The solution is to avoid changing
 310 several parameters with one {\tt ip link set} call.
 311
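A minimal sketch of this practice, assuming a device named \verb|eth0| and
purely illustrative values. Each attribute is changed by its own call, so it
is always clear which change failed. The sketch also assumes that \verb|ip|
returns a nonzero exit status on failure:
\begin{verbatim}
ip link set dev eth0 mtu 1400 || echo "failed to set mtu" >&2
ip link set dev eth0 txqueuelen 200 || echo "failed to set txqueuelen" >&2
\end{verbatim}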
 312 \paragraph{Examples:}
 313 \begin{itemize}
 314 \item \verb|ip link set dummy address 00:00:00:00:00:01|
 315
 316 --- change the station address of the interface \verb|dummy|.
 317
 318 \item \verb|ip link set dummy up|
 319
 320 --- start the interface \verb|dummy|.
 321
 322 \end{itemize}
 323
 324
 325 \subsection{{\tt ip link show} --- display device attributes}
 326 \label{IP-LINK-SHOW}
 327
 328 \paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|lst|, \verb|sh|, \verb|ls|,
 329 \verb|l|.
 330
 331 \paragraph{Arguments:}
 332 \begin{itemize}
 333 \item \verb|dev NAME| (default)
 334
 335 --- \verb|NAME| specifies the network device to show.
 336 If this argument is omitted, all devices are listed.
 337
 338 \item \verb|up|
 339
 340 --- only display running interfaces.
 341
 342 \end{itemize}
 343
 344
 345 \paragraph{Output format:}
 346
 347 \begin{verbatim}
 348 kuznet@alisa:~ $ ip link ls eth0
 349 3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100
 350     link/ether 00:a0:cc:66:18:78 brd ff:ff:ff:ff:ff:ff
 351 kuznet@alisa:~ $ ip link ls sit0
 352 5: sit0@NONE: <NOARP,UP> mtu 1480 qdisc noqueue
 353     link/sit 0.0.0.0 brd 0.0.0.0
 354 kuznet@alisa:~ $ ip link ls dummy
 355 2: dummy: <BROADCAST,NOARP> mtu 1500 qdisc noop
 356     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
 357 kuznet@alisa:~ $
 358 \end{verbatim}
 359
 360
 361 The number before each colon is an {\em interface index\/} or {\em ifindex\/}.
 362 This number uniquely identifies the interface. This is followed by the {\em interface name\/}
 363 (\verb|eth0|, \verb|sit0| etc.). The interface name is also
 364 unique at every given moment. However, the interface may disappear from the
 365 list (e.g.\ when the corresponding driver module is unloaded) and another
 366 one with the same name may be created later. Besides that,
 367 the administrator may change the name of any device with
 368 \verb|ip| \verb|link| \verb|set| \verb|name|
 369 to make it more intelligible.
 370
 371 The interface name may have another name or \verb|NONE| appended
 372 after the \verb|@| sign. This means that this device is bound to some other
 373 device,
 374 i.e.\ packets sent through it are encapsulated and sent via the ``master''
 375 device. If the name is \verb|NONE|, the master is unknown.
 376
 377 Then we see the interface {\em mtu\/} (``maximum transmission unit''). This determines
 378 the maximum size of data which can be sent as a single packet over this interface.
 379
 380 {\em qdisc\/} (``queuing discipline'') shows the queuing algorithm used
 381 on the interface. In particular, \verb|noqueue| means that this interface
 382 does not queue anything and \verb|noop| means that the interface is in blackhole
 383 mode, i.e.\ all packets sent to it are immediately discarded.
 384 {\em qlen\/} is the default transmit queue length of the device measured
 385 in packets.
 386
 387 The interface flags are summarized in the angle brackets.
 388
 389 \begin{itemize}
 390 \item \verb|UP| --- the device is turned on. It is ready to accept
 391 packets for transmission and may pass packets received
 392 from other nodes on the network up to the kernel.
 393
 394 \item \verb|LOOPBACK| --- the interface does not communicate with other
 395 hosts. All packets sent through it will be returned
 396 and nothing but bounced packets can be received.
 397
 398 \item \verb|BROADCAST| --- the device has the facility to send packets
 399 to all hosts sharing the same link. A typical example is an Ethernet link.
 400
 401 \item \verb|POINTOPOINT| --- the link has only two ends with one node
 402 attached to each end. All packets sent to this link will reach the peer
 403 and all packets received by us come from this single peer.
 404
 405 If neither \verb|LOOPBACK| nor \verb|BROADCAST| nor \verb|POINTOPOINT|
 406 is set, the interface is assumed to be NBMA (Non-Broadcast Multi-Access).
 407 This is the most generic type of device and the most complicated one, because
 408 a host attached to an NBMA link has no means to send to anyone
 409 without additionally configured information.
 410
 411 \item \verb|MULTICAST| --- is an advisory flag indicating that the interface
 412 is aware of multicasting, i.e.\ sending packets to some subset of neighbouring
 413 nodes. Broadcasting is a particular case of multicasting, where the multicast
 414 group consists of all nodes on the link. It is important to emphasize
 415 that software {\em must not\/} interpret the absence of this flag as the inability
 416 to use multicasting on this interface. Any \verb|POINTOPOINT| or
 417 \verb|BROADCAST| link is multicast-capable by definition, because we have
 418 direct access to all the neighbours and, hence, to any subset of them.
 419 Certainly, the use of high bandwidth multicast transfers is not recommended
 420 on broadcast-only links because of high expense, but it is not strictly
 421 prohibited.
 422
 423 \item \verb|PROMISC| --- the device listens to and feeds to the kernel all
 424 traffic on the link even if it is not destined for us, not broadcast
 425 and not destined for a multicast group of which we are a member. Usually
 426 this mode exists only on broadcast links and is used by bridges and for network
 427 monitoring.
 428
 429 \item \verb|ALLMULTI| --- the device receives all multicast packets
 430 wandering on the link. This mode is used by multicast routers.
 431
 432 \item \verb|NOARP| --- this flag is different from the other ones. It has
 433 no fixed meaning and its interpretation depends on the network protocols
 434 involved. As a rule, it indicates that the device needs no address
 435 resolution and that the software or hardware knows how to deliver packets
 436 without any help from the protocol stacks.
 437
 438 \item \verb|DYNAMIC| --- is an advisory flag indicating that the interface is
 439 dynamically created and destroyed.
 440
 441 \item \verb|SLAVE| --- this interface is bonded to some other interfaces
 442 to share link capacities.
 443
 444 \end{itemize}
 445
 446 \vskip 1mm
 447 \begin{NB}
 448 There are other flags but they are either obsolete (\verb|NOTRAILERS|)
 449 or not implemented (\verb|DEBUG|) or specific to some devices
 450 (\verb|MASTER|, \verb|AUTOMEDIA| and \verb|PORTSEL|). We do not discuss
 451 them here.
 452 \end{NB}
 453
 454
 455 The second line contains information on the link layer addresses
 456 associated with the device. The first word (\verb|ether|, \verb|sit|)
 457 defines the interface hardware type. This type determines the format and semantics
 458 of the addresses and is logically part of the address.
 459 The default format of the station address and the broadcast address
 460 (or the peer address for point-to-point links) is a
 461 sequence of hexadecimal bytes separated by colons, but some link
 462 types may have their own natural address format, e.g.\ addresses
 463 of tunnels over IP are printed as dotted-quad IP addresses.
 464
 465 \vskip 1mm
 466 \begin{NB}
 467   NBMA links have no well-defined broadcast or peer address;
 468   however, this field may contain useful information, e.g.\
 469   the address of a broadcast relay or the address of the ARP server.
 470 \end{NB}
 471 \begin{NB}
 472 Multicast addresses are not shown by this command, see
 473 \verb|ip maddr ls| in~Sec.\ref{IP-MADDR} (p.\pageref{IP-MADDR} of this
 474 document).
 475 \end{NB}
 476
 477
 478 \paragraph{Statistics:} With the \verb|-statistics| option, \verb|ip| also
 479 prints interface statistics:
 480
 481 \begin{verbatim}
 482 kuznet@alisa:~ $ ip -s link ls eth0
 483 3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100
 484     link/ether 00:a0:cc:66:18:78 brd ff:ff:ff:ff:ff:ff
 485     RX: bytes  packets  errors  dropped overrun mcast
 486     2449949362 2786187  0       0       0       0
 487     TX: bytes  packets  errors  dropped carrier collsns
 488     178558497  1783945  332     0       332     35172
 489 kuznet@alisa:~ $
 490 \end{verbatim}
 491 \verb|RX:| and \verb|TX:| lines summarize receiver and transmitter
 492 statistics. They contain:
 493 \begin{itemize}
 494 \item \verb|bytes| --- the total number of bytes received or transmitted
 495 on the interface. This number wraps when the maximum value of the data type
 496 natural for the architecture is exceeded, so continuous monitoring requires
 497 a user level daemon sampling it periodically.
 498 \item \verb|packets| --- the total number of packets received or transmitted
 499 on the interface.
 500 \item \verb|errors| --- the total number of receiver or transmitter errors.
 501 \item \verb|dropped| --- the total number of packets dropped due to lack
 502 of resources.
 503 \item \verb|overrun| --- the total number of receiver overruns resulting
 504 in dropped packets. As a rule, if the interface is overrun, it means
 505 serious problems in the kernel or that your machine is too slow
 506 for this interface.
 507 \item \verb|mcast| --- the total number of received multicast packets. This counter
 508 is only supported by a few devices.
 509 \item \verb|carrier| --- the total number of link media failures, e.g.\ because
 510 of lost carrier.
 511 \item \verb|collsns| --- the total number of collision events
 512 on Ethernet-like media. This number may have a different sense on other
 513 link types.
 514 \item \verb|compressed| --- the total number of compressed packets. This is
 515 available only for links using VJ header compression.
 516 \end{itemize}
 517
 518
 519 If the \verb|-s| option is entered twice or more,
 520 \verb|ip| prints more detailed statistics on receiver
 521 and transmitter errors.
 522
 523 \begin{verbatim}
 524 kuznet@alisa:~ $ ip -s -s link ls eth0
 525 3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100
 526     link/ether 00:a0:cc:66:18:78 brd ff:ff:ff:ff:ff:ff
 527     RX: bytes  packets  errors  dropped overrun mcast
 528     2449949362 2786187  0       0       0       0
 529     RX errors: length   crc     frame   fifo    missed
 530                0        0       0       0       0
 531     TX: bytes  packets  errors  dropped carrier collsns
 532     178558497  1783945  332     0       332     35172
 533     TX errors: aborted  fifo    window  heartbeat
 534                0        0       0       332
 535 kuznet@alisa:~ $
 536 \end{verbatim}
 537 These error names are pure Ethernetisms. Other devices
 538 may have nonzero values in these fields but they may be
 539 interpreted differently.
 540
 541
 542 \section{{\tt ip address} --- protocol address management}
 543
 544 \paragraph{Abbreviations:} \verb|address|, \verb|addr|, \verb|a|.
 545
 546 \paragraph{Object:} The \verb|address| is a protocol (IP or IPv6) address attached
 547 to a network device. Each device must have at least one address
 548 to use the corresponding protocol. It is possible to have several
 549 different addresses attached to one device. These addresses are not
 550 distinguished from one another, so the term {\em alias\/} is not quite appropriate
 551 for them and we do not use it in this document.
 552
 553 The \verb|ip addr| command displays addresses and their properties,
 554 adds new addresses and deletes old ones.
 555
 556 \paragraph{Commands:} \verb|add|, \verb|delete|, \verb|flush| and \verb|show|
 557 (or \verb|list|).
 558
 559
 560 \subsection{{\tt ip address add} --- add a new protocol address}
 561 \label{IP-ADDR-ADD}
 562
 563 \paragraph{Abbreviations:} \verb|add|, \verb|a|.
 564
 565 \paragraph{Arguments:}
 566
 567 \begin{itemize}
 568 \item \verb|dev NAME|
 569
 570 \noindent--- the name of the device to add the address to.
 571
 572 \item \verb|local ADDRESS| (default)
 573
 574 --- the address of the interface. The format of the address depends
 575 on the protocol. It is a dotted quad for IP and a sequence of hexadecimal halfwords
 576 separated by colons for IPv6. The \verb|ADDRESS| may be followed by
 577 a slash and a decimal number which encodes the network prefix length.
 578
 579
 580 \item \verb|peer ADDRESS|
 581
 582 --- the address of the remote endpoint for point-to-point interfaces.
 583 Again, the \verb|ADDRESS| may be followed by a slash and a decimal number,
 584 encoding the network prefix length. If a peer address is specified,
 585 the local address {\em cannot\/} have a prefix length. The network prefix is associated
 586 with the peer rather than with the local address.
 587
 588
 589 \item \verb|broadcast ADDRESS|
 590
 591 --- the broadcast address on the interface.
 592
 593 It is possible to use the special symbols \verb|'+'| and \verb|'-'|
 594 instead of the broadcast address. In this case, the broadcast address
 595 is derived by setting (\verb|'+'|) or clearing (\verb|'-'|) the host bits of the interface prefix.
 596
 597 \vskip 1mm
 598 \begin{NB}
 599 Unlike \verb|ifconfig|, the \verb|ip| utility {\em does not\/} set any broadcast
 600 address unless explicitly requested.
 601 \end{NB}
 602
 603
 604 \item \verb|label NAME|
 605
 606 --- Each address may be tagged with a label string.
 607 In order to preserve compatibility with Linux-2.0 net aliases,
 608 this string must coincide with the name of the device or must be prefixed
 609 with the device name followed by a colon.
 610
 611
 612 \item \verb|scope SCOPE_VALUE|
 613
 614 --- the scope of the area where this address is valid.
 615 The available scopes are listed in file \verb|/etc/iproute2/rt_scopes|.
 616 Predefined scope values are:
 617
 618  \begin{itemize}
 619         \item \verb|global| --- the address is globally valid.
 620         \item \verb|site| --- (IPv6 only) the address is site local,
 621         i.e.\ it is valid inside this site.
 622         \item \verb|link| --- the address is link local, i.e.\
 623         it is valid only on this device.
 624         \item \verb|host| --- the address is valid only inside this host.
 625  \end{itemize}
 626
 627 Appendix~\ref{ADDR-SEL} (p.\pageref{ADDR-SEL} of this document)
 628 contains more details on address scopes.
 629
 630 \end{itemize}
 631
 632 \paragraph{Examples:}
 633 \begin{itemize}
 634 \item \verb|ip addr add 127.0.0.1/8 dev lo brd + scope host|
 635
 636 --- add the usual loopback address to the loopback device.
 637
 638 \item \verb|ip addr add 10.0.0.1/24 brd + dev eth0 label eth0:Alias|
 639
 640 --- add the address 10.0.0.1 with prefix length 24 (i.e.\ netmask
 641 \verb|255.255.255.0|), standard broadcast and label \verb|eth0:Alias|
 642 to the interface \verb|eth0|.
 643 \end{itemize}
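As a worked illustration of the \verb|'+'| and \verb|'-'| symbols: for the
address 10.0.0.1/24 the host part is the last 8 bits, so \verb|brd +| gives
10.0.0.255 (all host bits set) and \verb|brd -| gives 10.0.0.0 (all host bits
cleared). The next sketch shows the peer form described above, where the
prefix length is attached to the peer rather than to the local address. The
device name \verb|ppp0| and both addresses are assumptions chosen for
illustration:
\begin{verbatim}
# local address without a prefix length; the prefix belongs to the peer
ip addr add 10.1.1.1 peer 10.1.1.2/32 dev ppp0
\end{verbatim}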
 644
 645
 646 \subsection{{\tt ip address delete} --- delete a protocol address}
 647
 648 \paragraph{Abbreviations:} \verb|delete|, \verb|del|, \verb|d|.
 649
 650 \paragraph{Arguments:} coincide with the arguments of \verb|ip addr add|.
 651 The device name is a required argument. The rest are optional.
 652 If no arguments are given, the first address is deleted.
 653
 654 \paragraph{Examples:}
 655 \begin{itemize}
 656 \item \verb|ip addr del 127.0.0.1/8 dev lo|
 657
 658 --- deletes the loopback address from the loopback device.
 659 It would be best not to repeat this experiment.
 660
 661 \item Disable IP on the interface \verb|eth0|:
 662 \begin{verbatim}
 663   while ip -f inet addr del dev eth0; do
 664     : nothing
 665   done
 666 \end{verbatim}
 667 Another method to disable IP on an interface using {\tt ip addr flush}
 668 may be found in Sec.\ref{IP-ADDR-FLUSH}, p.\pageref{IP-ADDR-FLUSH}.
 669
 670 \end{itemize}
 671
 672
 673 \subsection{{\tt ip address show} --- display protocol addresses}
 674
 675 \paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|lst|, \verb|sh|, \verb|ls|,
 676 \verb|l|.
 677
 678 \paragraph{Arguments:}
 679
 680 \begin{itemize}
 681 \item \verb|dev NAME| (default)
 682
 683 --- the name of the device.
 684
 685 \item \verb|scope SCOPE_VAL|
 686
 687 --- only list addresses with this scope.
 688
 689 \item \verb|to PREFIX|
 690
 691 --- only list addresses matching this prefix.
 692
 693 \item \verb|label PATTERN|
 694
 695 --- only list addresses with labels matching the \verb|PATTERN|.
 696 \verb|PATTERN| is an ordinary shell-style pattern.
 697
 698
 699 \item \verb|dynamic| and \verb|permanent|
 700
 701 --- (IPv6 only) only list addresses installed due to stateless
 702 address configuration or only list permanent (not dynamic) addresses.
 703
 704 \item \verb|tentative|
 705
 706 --- (IPv6 only) only list addresses which did not pass duplicate
 707 address detection.
 708
 709 \item \verb|deprecated|
 710
 711 --- (IPv6 only) only list deprecated addresses.
 712
 713
 714 \item  \verb|primary| and \verb|secondary|
 715
 716 --- only list primary (or secondary) addresses.
 717
 718 \end{itemize}
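The selectors above may be sketched as follows. The device name \verb|eth0|
and the prefix are assumptions chosen for illustration:
\begin{verbatim}
# global-scope addresses on eth0 only
ip addr ls dev eth0 scope global
# addresses within 10.0.0.0/8 on any device
ip addr ls to 10/8
# addresses whose label matches a shell pattern
# (quoted so that the shell does not expand it)
ip addr ls label "eth*"
# IPv6 addresses which did not pass duplicate address detection
ip -6 addr ls tentative
\end{verbatim}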
 719
 720
 721 \paragraph{Output format:}
 722
 723 \begin{verbatim}
 724 kuznet@alisa:~ $ ip addr ls eth0
 725 3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100
 726     link/ether 00:a0:cc:66:18:78 brd ff:ff:ff:ff:ff:ff
 727     inet 193.233.7.90/24 brd 193.233.7.255 scope global eth0
 728     inet6 3ffe:2400:0:1:2a0:ccff:fe66:1878/64 scope global dynamic
 729        valid_lft forever preferred_lft 604746sec
 730     inet6 fe80::2a0:ccff:fe66:1878/10 scope link
 731 kuznet@alisa:~ $
 732 \end{verbatim}
 733
 734 The first two lines coincide with the output of \verb|ip link ls|.
 735 It is natural to interpret link layer addresses
 736 as addresses of the protocol family \verb|AF_PACKET|.
 737
 738 Then the list of IP and IPv6 addresses follows, accompanied by
 739 additional address attributes: scope value (see Sec.\ref{IP-ADDR-ADD},
 740 p.\pageref{IP-ADDR-ADD} above), flags and the address label.
 741
 742 Address flags are set by the kernel and cannot be changed
 743 administratively. Currently, the following flags are defined:
 744
 745 \begin{enumerate}
 746 \item \verb|secondary|
 747
 748 --- the address is not used when selecting the default source address
 749 of outgoing packets (cf.\ Appendix~\ref{ADDR-SEL}, p.\pageref{ADDR-SEL}).
 750 An IP address becomes secondary if another address with the same
 751 prefix bits already exists. The first address is primary.
 752 It is the leader of the group of all secondary addresses. When the leader
 753 is deleted, all secondaries are purged too.
 754 There is a tweak in \verb|/proc/sys/net/ipv4/conf/<dev>/promote_secondaries|
 755 which activates promotion of secondaries when a primary is deleted.
 756 To permanently enable this feature on all devices, add
 757 \verb|net.ipv4.conf.all.promote_secondaries=1| to \verb|/etc/sysctl.conf|.
 758 This tweak is available in Linux 2.6.15 and later.
 759
 760
 761 \item \verb|dynamic|
 762
 763 --- the address was created due to stateless autoconfiguration~\cite{RFC-ADDRCONF}.
 764 In this case the output also contains information on how long
 765 the address remains valid. After \verb|preferred_lft| expires the address is
 766 moved to the deprecated state. After \verb|valid_lft| expires the address
 767 is finally invalidated.
 768
 769 \item \verb|deprecated|
 770
 771 --- the address is deprecated, i.e.\ it is still valid, but cannot
 772 be used by newly created connections.
 773
 774 \item \verb|tentative|
 775
 776 --- the address is not used because duplicate address detection~\cite{RFC-ADDRCONF}
 777 has not yet completed or has failed.
 778
 779 \end{enumerate}
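The promotion setting mentioned under \verb|secondary| can also be changed
at run time. A sketch, assuming a kernel that provides the setting and the
standard \verb|sysctl| utility; both commands require root privileges:
\begin{verbatim}
# enable promotion of secondaries on all devices until the next reboot
sysctl -w net.ipv4.conf.all.promote_secondaries=1
# the equivalent write through /proc
echo 1 > /proc/sys/net/ipv4/conf/all/promote_secondaries
\end{verbatim}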
 780
 781
 782 \subsection{{\tt ip address flush} --- flush protocol addresses}
 783 \label{IP-ADDR-FLUSH}
 784
 785 \paragraph{Abbreviations:} \verb|flush|, \verb|f|.
 786
 787 \paragraph{Description:}This command flushes the protocol addresses
 788 selected by some criteria.
 789
 790 \paragraph{Arguments:} This command has the same arguments as \verb|show|.
 791 The difference is that it does not run when no arguments are given.
 792
 793 \paragraph{Warning:} This command (and other \verb|flush| commands
 794 described below) is pretty dangerous. If you make a mistake, it will
 795 not forgive it, but will cruelly purge all the addresses.
 796
 797 \paragraph{Statistics:} With the \verb|-statistics| option, the command
 798 becomes verbose. It prints out the number of deleted addresses and the number
 799 of rounds made to flush the address list. If this option is given
 800 twice, \verb|ip addr flush| also dumps all the deleted addresses
 801 in the format described in the previous subsection.
 802
 803 \paragraph{Example:} Delete all the addresses from the private network
 804 10.0.0.0/8:
 805 \begin{verbatim}
 806 netadm@amber:~ # ip -s -s a f to 10/8
 807 2: dummy    inet 10.7.7.7/16 brd 10.7.255.255 scope global dummy
 808 3: eth0    inet 10.10.7.7/16 brd 10.10.255.255 scope global eth0
 809 4: eth1    inet 10.8.7.7/16 brd 10.8.255.255 scope global eth1
 810
 811 *** Round 1, deleting 3 addresses ***
 812 *** Flush is complete after 1 round ***
 813 netadm@amber:~ #
 814 \end{verbatim}
 815 Another instructive example is disabling IP on all the Ethernets:
 816 \begin{verbatim}
 817 netadm@amber:~ # ip -4 addr flush label "eth*"
 818 \end{verbatim}
And the last example shows how to flush all the IPv6 addresses
acquired by the host from stateless address autoconfiguration
after you have enabled forwarding or disabled autoconfiguration:
 822 \begin{verbatim}
 823 netadm@amber:~ # ip -6 addr flush dynamic
 824 \end{verbatim}
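
Since \verb|flush| takes the same arguments as \verb|show|, one cautious
habit is to run the selector through \verb|show| first and to flush only
after checking the list. The following is a sketch of that habit rather
than a required procedure; the label pattern is the one used in the
example above:
\begin{verbatim}
netadm@amber:~ # ip -4 addr show label "eth*"
netadm@amber:~ # ip -s -4 addr flush label "eth*"
\end{verbatim}
The first command only lists the addresses that match. The second one
deletes them and, because of \verb|-s|, reports how many addresses were
removed.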
 825
 826
 827
 828 \section{{\tt ip neighbour} --- neighbour/arp tables management}
 829
 830 \paragraph{Abbreviations:} \verb|neighbour|, \verb|neighbor|, \verb|neigh|,
 831 \verb|n|.
 832
 833 \paragraph{Object:} \verb|neighbour| objects establish bindings between protocol
 834 addresses and link layer addresses for hosts sharing the same link.
 835 Neighbour entries are organized into tables. The IPv4 neighbour table
 836 is known by another name --- the ARP table.
 837
 838 The corresponding commands display neighbour bindings
 839 and their properties, add new neighbour entries and delete old ones.
 840
 841 \paragraph{Commands:} \verb|add|, \verb|change|, \verb|replace|,
 842 \verb|delete|, \verb|flush| and \verb|show| (or \verb|list|).
 843
 844 \paragraph{See also:} Appendix~\ref{PROXY-NEIGH}, p.\pageref{PROXY-NEIGH}
 845 describes how to manage proxy ARP/NDISC with the \verb|ip| utility.
 846
 847
 848 \subsection{{\tt ip neighbour add} --- add a new neighbour entry\\
 849         {\tt ip neighbour change} --- change an existing entry\\
 850         {\tt ip neighbour replace} --- add a new entry or change an existing one}
 851
 852 \paragraph{Abbreviations:} \verb|add|, \verb|a|; \verb|change|, \verb|chg|;
 853 \verb|replace|, \verb|repl|.
 854
 855 \paragraph{Description:} These commands create new neighbour records
 856 or update existing ones.
 857
 858 \paragraph{Arguments:}
 859
 860 \begin{itemize}
 861 \item \verb|to ADDRESS| (default)
 862
 863 --- the protocol address of the neighbour. It is either an IPv4 or IPv6 address.
 864
 865 \item \verb|dev NAME|
 866
 867 --- the interface to which this neighbour is attached.
 868
 869
 870 \item \verb|lladdr LLADDRESS|
 871
 872 --- the link layer address of the neighbour. \verb|LLADDRESS| can also be
 873 \verb|null|.
 874
 875 \item \verb|nud NUD_STATE|
 876
 877 --- the state of the neighbour entry. \verb|nud| is an abbreviation for ``Neighbour
 878 Unreachability Detection''. The state can take one of the following values:
 879
 880 \begin{enumerate}
\item \verb|permanent| --- the neighbour entry is valid forever and can only be removed
administratively.
 883 \item \verb|noarp| --- the neighbour entry is valid. No attempts to validate
 884 this entry will be made but it can be removed when its lifetime expires.
 885 \item \verb|reachable| --- the neighbour entry is valid until the reachability
 886 timeout expires.
 887 \item \verb|stale| --- the neighbour entry is valid but suspicious.
 888 This option to \verb|ip neigh| does not change the neighbour state if
 889 it was valid and the address is not changed by this command.
 890 \end{enumerate}
 891
 892 \end{itemize}
 893
 894 \paragraph{Examples:}
 895 \begin{itemize}
 896 \item \verb|ip neigh add 10.0.0.3 lladdr 0:0:0:0:0:1 dev eth0 nud perm|
 897
 898 --- add a permanent ARP entry for the neighbour 10.0.0.3 on the device \verb|eth0|.
 899
 900 \item \verb|ip neigh chg 10.0.0.3 dev eth0 nud reachable|
 901
 902 --- change its state to \verb|reachable|.
 903 \end{itemize}
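
The \verb|replace| form is convenient in scripts that should not care
whether the entry already exists. As a sketch, reusing the illustrative
address and device from the examples above (the link layer address is
likewise made up):
\begin{verbatim}
ip neigh replace 10.0.0.3 lladdr 0:0:0:0:0:2 dev eth0 nud permanent
\end{verbatim}
If there is no entry for 10.0.0.3 on \verb|eth0|, it is created. Otherwise
the existing entry is updated to the new link layer address and state.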
 904
 905
 906 \subsection{{\tt ip neighbour delete} --- delete a neighbour entry}
 907
 908 \paragraph{Abbreviations:} \verb|delete|, \verb|del|, \verb|d|.
 909
 910 \paragraph{Description:} This command invalidates a neighbour entry.
 911
 912 \paragraph{Arguments:} The arguments are the same as with \verb|ip neigh add|,
 913 except that \verb|lladdr| and \verb|nud| are ignored.
 914
 915
 916 \paragraph{Example:}
 917 \begin{itemize}
 918 \item \verb|ip neigh del 10.0.0.3 dev eth0|
 919
 920 --- invalidate an ARP entry for the neighbour 10.0.0.3 on the device \verb|eth0|.
 921
 922 \end{itemize}
 923
 924 \begin{NB}
 925  The deleted neighbour entry will not disappear from the tables
 926  immediately. If it is in use it cannot be deleted until the last
 927  client releases it. Otherwise it will be destroyed during
 928  the next garbage collection.
 929 \end{NB}
 930
 931
 932 \paragraph{Warning:} Attempts to delete or manually change
 933 a \verb|noarp| entry created by the kernel may result in unpredictable behaviour.
 934 Particularly, the kernel may try to resolve this address even
 935 on a \verb|NOARP| interface or if the address is multicast or broadcast.
 936
 937
 938 \subsection{{\tt ip neighbour show} --- list neighbour entries}
 939
 940 \paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|.
 941
\paragraph{Description:}This command displays neighbour tables.
 943
 944 \paragraph{Arguments:}
 945
 946 \begin{itemize}
 947
 948 \item \verb|to ADDRESS| (default)
 949
 950 --- the prefix selecting the neighbours to list.
 951
 952 \item \verb|dev NAME|
 953
 954 --- only list the neighbours attached to this device.
 955
 956 \item \verb|unused|
 957
 958 --- only list neighbours which are not currently in use.
 959
 960 \item \verb|nud NUD_STATE|
 961
 962 --- only list neighbour entries in this state. \verb|NUD_STATE| takes
 963 values listed below or the special value \verb|all| which means all states.
 964 This option may occur more than once. If this option is absent, \verb|ip|
 965 lists all entries except for \verb|none| and \verb|noarp|.
 966
 967 \end{itemize}
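
For instance, because \verb|nud| may be repeated, several states can be
selected at once. The commands below are a sketch; \verb|eth0| is only an
assumed device name:
\begin{verbatim}
ip neigh show dev eth0 nud stale nud reachable
ip neigh show nud all
\end{verbatim}
The first command lists only the \verb|stale| and \verb|reachable| entries
on \verb|eth0|. The second one lists entries in every state, including
\verb|none| and \verb|noarp|, which are hidden by default.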
 968
 969
 970 \paragraph{Output format:}
 971
 972 \begin{verbatim}
 973 kuznet@alisa:~ $ ip neigh ls
 974 :: dev lo lladdr 00:00:00:00:00:00 nud noarp
 975 fe80::200:cff:fe76:3f85 dev eth0 lladdr 00:00:0c:76:3f:85 router \
 976     nud stale
 977 0.0.0.0 dev lo lladdr 00:00:00:00:00:00 nud noarp
 978 193.233.7.254 dev eth0 lladdr 00:00:0c:76:3f:85 nud reachable
 979 193.233.7.85 dev eth0 lladdr 00:e0:1e:63:39:00 nud stale
 980 kuznet@alisa:~ $
 981 \end{verbatim}
 982
 983 The first word of each line is the protocol address of the neighbour.
 984 Then the device name follows. The rest of the line describes the contents of
 985 the neighbour entry identified by the pair (device, address).
 986
 987 \verb|lladdr| is the link layer address of the neighbour.
 988
 989 \verb|nud| is the state of the ``neighbour unreachability detection'' machine
 990 for this entry. The detailed description of the neighbour
 991 state machine can be found in~\cite{RFC-NDISC}. Here is the full list
 992 of the states with short descriptions:
 993
 994 \begin{enumerate}
 995 \item\verb|none| --- the state of the neighbour is void.
 996 \item\verb|incomplete| --- the neighbour is in the process of resolution.
 997 \item\verb|reachable| --- the neighbour is valid and apparently reachable.
 998 \item\verb|stale| --- the neighbour is valid, but is probably already
 999 unreachable, so the kernel will try to check it at the first transmission.
1000 \item\verb|delay| --- a packet has been sent to the stale neighbour and the kernel is waiting
1001 for confirmation.
1002 \item\verb|probe| --- the delay timer expired but no confirmation was received.
1003 The kernel has started to probe the neighbour with ARP/NDISC messages.
1004 \item\verb|failed| --- resolution has failed.
1005 \item\verb|noarp| --- the neighbour is valid. No attempts to check the entry
1006 will be made.
1007 \item\verb|permanent| --- it is a \verb|noarp| entry, but only the administrator
1008 may remove the entry from the neighbour table.
1009 \end{enumerate}
1010
1011 The link layer address is valid in all states except for \verb|none|,
1012 \verb|failed| and \verb|incomplete|.
1013
1014 IPv6 neighbours can be marked with the additional flag \verb|router|
1015 which means that the neighbour introduced itself as an IPv6 router~\cite{RFC-NDISC}.
1016
1017 \paragraph{Statistics:} The \verb|-statistics| option displays some usage
1018 statistics, f.e.\
1019
1020 \begin{verbatim}
1021 kuznet@alisa:~ $ ip -s n ls 193.233.7.254
1022 193.233.7.254 dev eth0 lladdr 00:00:0c:76:3f:85 ref 5 used 12/13/20 \
1023     nud reachable
1024 kuznet@alisa:~ $
1025 \end{verbatim}
1026
1027 Here \verb|ref| is the number of users of this entry
1028 and \verb|used| is a triplet of time intervals in seconds
1029 separated by slashes. In this case they show that:
1030
1031 \begin{enumerate}
1032 \item the entry was used 12 seconds ago.
1033 \item the entry was confirmed 13 seconds ago.
1034 \item the entry was updated 20 seconds ago.
1035 \end{enumerate}
1036
1037 \subsection{{\tt ip neighbour flush} --- flush neighbour entries}
1038
1039 \paragraph{Abbreviations:} \verb|flush|, \verb|f|.
1040
1041 \paragraph{Description:}This command flushes neighbour tables, selecting
1042 entries to flush by some criteria.
1043
1044 \paragraph{Arguments:} This command has the same arguments as \verb|show|.
1045 The differences are that it does not run when no arguments are given,
1046 and that the default neighbour states to be flushed do not include
1047 \verb|permanent| and \verb|noarp|.
1048
1049
1050 \paragraph{Statistics:} With the \verb|-statistics| option, the command
1051 becomes verbose. It prints out the number of deleted neighbours and the number
1052 of rounds made to flush the neighbour table. If the option is given
1053 twice, \verb|ip neigh flush| also dumps all the deleted neighbours
1054 in the format described in the previous subsection.
1055
1056 \paragraph{Example:}
1057 \begin{verbatim}
1058 netadm@alisa:~ # ip -s -s n f 193.233.7.254
1059 193.233.7.254 dev eth0 lladdr 00:00:0c:76:3f:85 ref 5 used 12/13/20 \
1060     nud reachable
1061
1062 *** Round 1, deleting 1 entries ***
1063 *** Flush is complete after 1 round ***
1064 netadm@alisa:~ #
1065 \end{verbatim}
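
Because \verb|permanent| and \verb|noarp| entries are skipped by default,
they have to be requested explicitly. A sketch, assuming a device
named \verb|eth0|:
\begin{verbatim}
ip neigh show dev eth0 nud permanent
ip -s neigh flush dev eth0 nud permanent
\end{verbatim}
Listing the entries with \verb|show| first makes it easier to see what
the \verb|flush| is about to remove.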
1066
1067
1068 \section{{\tt ip route} --- routing table management}
1069 \label{IP-ROUTE}
1070
1071 \paragraph{Abbreviations:} \verb|route|, \verb|ro|, \verb|r|.
1072
1073 \paragraph{Object:} \verb|route| entries in the kernel routing tables keep
1074 information about paths to other networked nodes.
1075
1076 Each route entry has a {\em key\/} consisting of a {\em prefix\/}
1077 (i.e.\ a pair containing a network address and the length of its mask) and,
1078 optionally, the TOS value. An IP packet matches the route if the highest
1079 bits of its destination address are equal to the route prefix at least
1080 up to the prefix length and if the TOS of the route is zero or equal to
1081 the TOS of the packet.
1082
1083 If several routes match the packet, the following pruning rules
1084 are used to select the best one (see~\cite{RFC1812}):
1085 \begin{enumerate}
1086 \item The longest matching prefix is selected. All shorter ones
1087 are dropped.
1088
1089 \item If the TOS of some route with the longest prefix is equal to the TOS
1090 of the packet, the routes with different TOS are dropped.
1091
If no exact TOS match was found and routes with TOS=0 exist,
the rest of the routes are pruned.
1094
1095 Otherwise, the route lookup fails.
1096
1097 \item If several routes remain after the previous steps, then
1098 the routes with the best preference values are selected.
1099
1100 \item If we still have several routes, then the {\em first\/} of them
1101 is selected.
1102
1103 \begin{NB}
1104  Note the ambiguity of the last step. Unfortunately, Linux
1105  historically allows such a bizarre situation. The sense of the
1106 word ``first'' depends on the order of route additions and it is practically
1107 impossible to maintain a bundle of such routes in this order.
1108 \end{NB}
1109
1110 For simplicity we will limit ourselves to the case where such a situation
1111 is impossible and routes are uniquely identified by the triplet
1112 \{prefix, tos, preference\}. Actually, it is impossible to create
1113 non-unique routes with \verb|ip| commands described in this section.
1114
1115 One useful exception to this rule is the default route on non-forwarding
1116 hosts. It is ``officially'' allowed to have several fallback routes
1117 when several routers are present on directly connected networks.
In this case, Linux-2.2 performs ``dead gateway detection''~\cite{RFC1122}
1119 controlled by neighbour unreachability detection and by advice
1120 from transport protocols to select a working router, so the order
1121 of the routes is not essential. However, in this case,
1122 fiddling with default routes manually is not recommended. Use the Router Discovery
1123 protocol (see Appendix~\ref{EXAMPLE-SETUP}, p.\pageref{EXAMPLE-SETUP})
1124 instead. Actually, Linux-2.2 IPv6 does not give user level applications
1125 any access to default routes.
1126 \end{enumerate}
1127
Certainly, the steps above are not performed exactly
in this sequence. Instead, the kernel keeps the routing table
in a data structure chosen to achieve the final result
at minimal cost. However, regardless of the particular
routing algorithm implemented in the kernel, we can summarize
the statements above as: a route is identified by the triplet
\{prefix, tos, preference\}. This {\em key\/} lets us locate
the route in the routing table.
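
To illustrate the pruning rules, consider the following hypothetical set
of routes. The prefixes, gateways and metrics are invented for this
sketch, and it assumes that a smaller \verb|metric| counts as the better
preference, which is how Linux orders routes:
\begin{verbatim}
ip route add 10.0.0.0/8  via 193.233.7.65
ip route add 10.1.0.0/16 via 193.233.7.65
ip route add 10.1.0.0/16 tos 0x10 via 193.233.7.66
ip route add 10.1.0.0/16 tos 0x10 via 193.233.7.67 metric 5
\end{verbatim}
For a packet to 10.1.2.3 with TOS 0x10, the first step drops the route
to 10/8, the second step drops the /16 route with zero TOS, and the third
step prefers the route via 193.233.7.66 (metric 0) to the one with
metric 5. For a packet to the same destination with TOS 0x08 there is no
exact TOS match, so the /16 route with zero TOS is used. A packet to
10.2.0.1 matches only the route to 10/8.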
1136
1137 \paragraph{Route attributes:} Each route key refers to a routing
1138 information record containing
1139 the data required to deliver IP packets (f.e.\ output device and
next hop router) and some optional attributes (f.e.\ the path MTU or
1141 the preferred source address when communicating with this destination).
1142 These attributes are described in the following subsection.
1143
1144 \paragraph{Route types:} \label{IP-ROUTE-TYPES}
It is important that the set
of required and optional attributes depends on the route {\em type\/}.
1147 The most important route type
1148 is \verb|unicast|. It describes real paths to other hosts.
1149 As a rule, common routing tables contain only such routes. However,
1150 there are other types of routes with different semantics. The
1151 full list of types understood by Linux-2.2 is:
1152 \begin{itemize}
1153 \item \verb|unicast| --- the route entry describes real paths to the
1154 destinations covered by the route prefix.
1155 \item \verb|unreachable| --- these destinations are unreachable. Packets
1156 are discarded and the ICMP message {\em host unreachable\/} is generated.
1157 The local senders get an \verb|EHOSTUNREACH| error.
1158 \item \verb|blackhole| --- these destinations are unreachable. Packets
1159 are discarded silently. The local senders get an \verb|EINVAL| error.
1160 \item \verb|prohibit| --- these destinations are unreachable. Packets
1161 are discarded and the ICMP message {\em communication administratively
1162 prohibited\/} is generated. The local senders get an \verb|EACCES| error.
1163 \item \verb|local| --- the destinations are assigned to this
1164 host. The packets are looped back and delivered locally.
1165 \item \verb|broadcast| --- the destinations are broadcast addresses.
1166 The packets are sent as link broadcasts.
1167 \item \verb|throw| --- a special control route used together with policy
1168 rules (see sec.\ref{IP-RULE}, p.\pageref{IP-RULE}). If such a route is selected, lookup
1169 in this table is terminated pretending that no route was found.
1170 Without policy routing it is equivalent to the absence of the route in the routing
1171 table. The packets are dropped and the ICMP message {\em net unreachable\/}
1172 is generated. The local senders get an \verb|ENETUNREACH| error.
1173 \item \verb|nat| --- a special NAT route. Destinations covered by the prefix
1174 are considered to be dummy (or external) addresses which require translation
1175 to real (or internal) ones before forwarding. The addresses to translate to
1176 are selected with the attribute \verb|via|. More about NAT is
1177 in Appendix~\ref{ROUTE-NAT}, p.\pageref{ROUTE-NAT}.
1178 \item \verb|anycast| --- ({\em not implemented\/}) the destinations are
1179 {\em anycast\/} addresses assigned to this host. They are mainly equivalent
1180 to \verb|local| with one difference: such addresses are invalid when used
1181 as the source address of any packet.
1182 \item \verb|multicast| --- a special type used for multicast routing.
1183 It is not present in normal routing tables.
1184 \end{itemize}
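
As a sketch of how the three ``unreachable'' flavours are installed (the
\verb|TYPE PREFIX| syntax is described in sec.\ref{IP-ROUTE-ADD},
p.\pageref{IP-ROUTE-ADD}; the prefixes are invented for illustration):
\begin{verbatim}
ip route add unreachable 10.1.0.0/16
ip route add blackhole 10.2.0.0/16
ip route add prohibit 10.3.0.0/16
\end{verbatim}
All three routes discard the packets. They differ only in what the sender
learns: {\em host unreachable\/} for the first, nothing at all for the
second, and {\em communication administratively prohibited\/} for the third.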
1185
1186 \paragraph{Route tables:} Linux-2.2 can pack routes into several routing
1187 tables identified by a number in the range from 1 to 255 or by
1188 name from the file \verb|/etc/iproute2/rt_tables|. By default all normal
1189 routes are inserted into the \verb|main| table (ID 254) and the kernel only uses
1190 this table when calculating routes.
1191
1192 Actually, one other table always exists, which is invisible but
1193 even more important. It is the \verb|local| table (ID 255). This table
1194 consists of routes for local and broadcast addresses. The kernel maintains
1195 this table automatically and the administrator usually need not modify it
1196 or even look at it.
1197
1198 The multiple routing tables enter the game when {\em policy routing\/}
1199 is used. See sec.\ref{IP-RULE}, p.\pageref{IP-RULE}.
1200 In this case, the table identifier effectively becomes
1201 one more parameter, which should be added to the triplet
1202 \{prefix, tos, preference\} to uniquely identify the route.
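
The tables can be inspected and filled with the \verb|table| argument
of the \verb|ip route| commands described in the following subsections.
The commands below are a sketch; the table number 100 is an arbitrary
choice and the \verb|dummy| device is assumed to exist:
\begin{verbatim}
ip route show table main
ip route show table local
ip route add 10.0.0.0/24 dev dummy table 100
ip route show table 100
\end{verbatim}
A route placed in table 100 does not affect normal lookups until a policy
rule refers to that table.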
1203
1204
1205 \subsection{{\tt ip route add} --- add a new route\\
1206         {\tt ip route change} --- change a route\\
1207         {\tt ip route replace} --- change a route or add a new one}
1208 \label{IP-ROUTE-ADD}
1209
1210 \paragraph{Abbreviations:} \verb|add|, \verb|a|; \verb|change|, \verb|chg|;
1211         \verb|replace|, \verb|repl|.
1212
1213
1214 \paragraph{Arguments:}
1215 \begin{itemize}
1216 \item \verb|to PREFIX| or \verb|to TYPE PREFIX| (default)
1217
1218 --- the destination prefix of the route. If \verb|TYPE| is omitted,
1219 \verb|ip| assumes type \verb|unicast|. Other values of \verb|TYPE|
1220 are listed above. \verb|PREFIX| is an IP or IPv6 address optionally followed
1221 by a slash and the prefix length. If the length of the prefix is missing,
1222 \verb|ip| assumes a full-length host route. There is also a special
1223 \verb|PREFIX| --- \verb|default| --- which is equivalent to IP \verb|0/0| or
1224 to IPv6 \verb|::/0|.
1225
1226 \item \verb|tos TOS| or \verb|dsfield TOS|
1227
1228 --- the Type Of Service (TOS) key. This key has no associated mask and
1229 the longest match is understood as: First, compare the TOS
1230 of the route and of the packet. If they are not equal, then the packet
may still match a route with a zero TOS. \verb|TOS| is either an 8-bit hexadecimal
1232 number or an identifier from {\tt /etc/iproute2/rt\_dsfield}.
1233
1234
1235 \item \verb|metric NUMBER| or \verb|preference NUMBER|
1236
--- the preference value of the route. \verb|NUMBER| is an arbitrary 32-bit number.
1238
1239 \item \verb|table TABLEID|
1240
1241 --- the table to add this route to.
1242 \verb|TABLEID| may be a number or a string from the file
1243 \verb|/etc/iproute2/rt_tables|. If this parameter is omitted,
1244 \verb|ip| assumes the \verb|main| table, with the exception of
1245 \verb|local|, \verb|broadcast| and \verb|nat| routes, which are
1246 put into the \verb|local| table by default.
1247
1248 \item \verb|dev NAME|
1249
1250 --- the output device name.
1251
1252 \item \verb|via ADDRESS|
1253
1254 --- the address of the nexthop router. Actually, the sense of this field depends
1255 on the route type. For normal \verb|unicast| routes it is either the true nexthop
1256 router or, if it is a direct route installed in BSD compatibility mode,
1257 it can be a local address of the interface.
1258 For NAT routes it is the first address of the block of translated IP destinations.
1259
1260 \item \verb|src ADDRESS|
1261
1262 --- the source address to prefer when sending to the destinations
1263 covered by the route prefix.
1264
1265 \item \verb|realm REALMID|
1266
1267 --- the realm to which this route is assigned.
1268 \verb|REALMID| may be a number or a string from the file
1269 \verb|/etc/iproute2/rt_realms|. Sec.\ref{RT-REALMS} (p.\pageref{RT-REALMS})
1270 contains more information on realms.
1271
1272 \item \verb|mtu MTU| or \verb|mtu lock MTU|
1273
--- the MTU along the path to the destination. If the modifier \verb|lock| is
not used, the MTU may be updated by the kernel due to Path MTU Discovery.
If the modifier \verb|lock| is used, no path MTU discovery will be tried:
all packets will be sent without the DF bit in the IPv4 case
or fragmented to the MTU for IPv6.
1279
1280 \item \verb|window NUMBER|
1281
1282 --- the maximal window for TCP to advertise to these destinations,
1283 measured in bytes. It limits maximal data bursts that our TCP
1284 peers are allowed to send to us.
1285
1286 \item \verb|rtt NUMBER|
1287
1288 --- the initial RTT (``Round Trip Time'') estimate.
1289
1290
1291 \item \verb|rttvar NUMBER|
1292
1293 --- \threeonly the initial RTT variance estimate.
1294
1295
1296 \item \verb|ssthresh NUMBER|
1297
1298 --- \threeonly an estimate for the initial slow start threshold.
1299
1300
1301 \item \verb|cwnd NUMBER|
1302
--- \threeonly the clamp for the congestion window. It is ignored if the \verb|lock|
1304     flag is not used.
1305
1306
1307 \item \verb|advmss NUMBER|
1308
--- \threeonly the MSS (``Maximum Segment Size'') to advertise to these
1310     destinations when establishing TCP connections. If it is not given,
1311     Linux uses a default value calculated from the first hop device MTU.
1312
1313 \begin{NB}
  If the path to these destinations is asymmetric, this guess may be wrong.
1315 \end{NB}
1316
1317 \item \verb|reordering NUMBER|
1318
--- \threeonly the maximal reordering on the path to this destination.
    If it is not given, Linux uses the value selected with the \verb|sysctl|
    variable \verb|net/ipv4/tcp_reordering|.
1322
1323
1324
1325 \item \verb|nexthop NEXTHOP|
1326
1327 --- the nexthop of a multipath route. \verb|NEXTHOP| is a complex value
1328 with its own syntax similar to the top level argument lists:
1329 \begin{itemize}
1330 \item \verb|via ADDRESS| is the nexthop router.
1331 \item \verb|dev NAME| is the output device.
1332 \item \verb|weight NUMBER| is a weight for this element of a multipath
1333 route reflecting its relative bandwidth or quality.
1334 \end{itemize}
1335
1336 \item \verb|scope SCOPE_VAL|
1337
1338 --- the scope of the destinations covered by the route prefix.
1339 \verb|SCOPE_VAL| may be a number or a string from the file
1340 \verb|/etc/iproute2/rt_scopes|.
1341 If this parameter is omitted,
1342 \verb|ip| assumes scope \verb|global| for all gatewayed \verb|unicast|
1343 routes, scope \verb|link| for direct \verb|unicast| and \verb|broadcast| routes
1344 and scope \verb|host| for \verb|local| routes.
1345
1346 \item \verb|protocol RTPROTO|
1347
1348 --- the routing protocol identifier of this route.
1349 \verb|RTPROTO| may be a number or a string from the file
1350 \verb|/etc/iproute2/rt_protos|. If the routing protocol ID is
1351 not given, \verb|ip| assumes protocol \verb|boot| (i.e.\
1352 it assumes the route was added by someone who doesn't
1353 understand what they are doing). Several protocol values have a fixed interpretation.
1354 Namely:
1355 \begin{itemize}
1356 \item \verb|redirect| --- the route was installed due to an ICMP redirect.
1357 \item \verb|kernel| --- the route was installed by the kernel during
1358 autoconfiguration.
1359 \item \verb|boot| --- the route was installed during the bootup sequence.
1360 If a routing daemon starts, it will purge all of them.
\item \verb|static| --- the route was installed by the administrator
to override dynamic routing. A routing daemon will respect such routes
and, probably, even advertise them to its peers.
\item \verb|ra| --- the route was installed by the Router Discovery protocol.
1365 \end{itemize}
1366 The rest of the values are not reserved and the administrator is free
1367 to assign (or not to assign) protocol tags. At least, routing
1368 daemons should take care of setting some unique protocol values,
f.e.\ as they are assigned in \verb|rtnetlink.h| or in the \verb|rt_protos|
database.
1371
1372
1373 \item \verb|onlink|
1374
1375 --- pretend that the nexthop is directly attached to this link,
1376 even if it does not match any interface prefix. One application of this
1377 option may be found in~\cite{IP-TUNNELS}.
1378
1379 \item \verb|equalize|
1380
1381 --- allow packet by packet randomization on multipath routes.
1382 Without this modifier, the route will be frozen to one selected
nexthop, so that load splitting will only occur on a per-flow basis.
1384 \verb|equalize| only works if the kernel is patched.
1385
1386
1387 \end{itemize}
1388
1389
1390 \begin{NB}
1391   Actually there are more commands: \verb|prepend| does the same
1392   thing as classic \verb|route add|, i.e.\ adds a route, even if another
1393   route to the same destination exists. Its opposite case is \verb|append|,
1394   which adds the route to the end of the list. Avoid these
1395   features.
1396 \end{NB}
1397 \begin{NB}
  More sad news: IPv6 only understands the \verb|append| command correctly.
1399   All the others are translated into \verb|append| commands. Certainly,
1400   this will change in the future.
1401 \end{NB}
1402
1403 \paragraph{Examples:}
1404 \begin{itemize}
1405 \item add a plain route to network 10.0.0/24 via gateway 193.233.7.65
1406 \begin{verbatim}
1407   ip route add 10.0.0/24 via 193.233.7.65
1408 \end{verbatim}
1409 \item change it to a direct route via the \verb|dummy| device
1410 \begin{verbatim}
1411   ip ro chg 10.0.0/24 dev dummy
1412 \end{verbatim}
1413 \item add a default multipath route splitting the load between \verb|ppp0|
1414 and \verb|ppp1|
1415 \begin{verbatim}
1416   ip route add default scope global nexthop dev ppp0 \
1417                                     nexthop dev ppp1
1418 \end{verbatim}
1419 Note the scope value. It is not necessary but it informs the kernel
1420 that this route is gatewayed rather than direct. Actually, if you
1421 know the addresses of remote endpoints it would be better to use the
1422 \verb|via| parameter.
1423 \item announce that the address 192.203.80.144 is not a real one, but
1424 should be translated to 193.233.7.83 before forwarding
1425 \begin{verbatim}
1426   ip route add nat 192.203.80.144 via 193.233.7.83
1427 \end{verbatim}
Backward translation is set up with policy rules described
1429 in the following section (sec.\ref{IP-RULE}, p.\pageref{IP-RULE}).
1430 \end{itemize}
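
Two further sketches combine several of the arguments described above.
The weights, the MTU value and the prefix 10.0.1.0/24 are invented for
illustration; the devices and the gateway are the ones used in the
examples above:
\begin{verbatim}
  ip route replace default scope global nexthop dev ppp0 weight 1 \
                                        nexthop dev ppp1 weight 2
  ip route add 10.0.1.0/24 via 193.233.7.65 mtu lock 1400 \
                                        protocol static
\end{verbatim}
The first command creates the multipath default route or changes an
existing one, giving \verb|ppp1| twice the weight of \verb|ppp0|.
The second one adds a route with a locked MTU of 1400 bytes and tags it as
\verb|static|, so that a routing daemon should leave it alone.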
1431
1432 \subsection{{\tt ip route delete} --- delete a route}
1433
1434 \paragraph{Abbreviations:} \verb|delete|, \verb|del|, \verb|d|.
1435
1436 \paragraph{Arguments:} \verb|ip route del| has the same arguments as
1437 \verb|ip route add|, but their semantics are a bit different.
1438
1439 Key values (\verb|to|, \verb|tos|, \verb|preference| and \verb|table|)
1440 select the route to delete. If optional attributes are present, \verb|ip|
1441 verifies that they coincide with the attributes of the route to delete.
1442 If no route with the given key and attributes was found, \verb|ip route del|
1443 fails.
1444 \begin{NB}
1445 Linux-2.0 had the option to delete a route selected only by prefix address,
1446 ignoring its length (i.e.\ netmask). This option no longer exists
1447 because it was ambiguous. However, look at {\tt ip route flush}
1448 (sec.\ref{IP-ROUTE-FLUSH}, p.\pageref{IP-ROUTE-FLUSH}) which
1449 provides similar and even richer functionality.
1450 \end{NB}
1451
1452 \paragraph{Example:}
1453 \begin{itemize}
\item delete the multipath route created by the command in the previous subsection
1455 \begin{verbatim}
1456   ip route del default scope global nexthop dev ppp0 \
1457                                     nexthop dev ppp1
1458 \end{verbatim}
1459 \end{itemize}
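
Because optional attributes are verified as well, they can serve as a
safety check. As a sketch, using the plain route from the examples of the
previous subsection:
\begin{verbatim}
  ip route del 10.0.0/24 via 193.233.7.65
\end{verbatim}
This removes the route to 10.0.0/24 only if it still goes via the gateway
193.233.7.65. If the route has meanwhile been changed to a direct route
via the \verb|dummy| device, the command fails and the route stays in place.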
1460
1461
1462
1463 \subsection{{\tt ip route show} --- list routes}
1464
1465 \paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|, \verb|l|.
1466
\paragraph{Description:} The command displays the contents of the routing tables
1468 or the route(s) selected by some criteria.
1469
1470
1471 \paragraph{Arguments:}
1472 \begin{itemize}
1473 \item \verb|to SELECTOR| (default)
1474
1475 --- only select routes from the given range of destinations. \verb|SELECTOR|
1476 consists of an optional modifier (\verb|root|, \verb|match| or \verb|exact|)
1477 and a prefix. \verb|root PREFIX| selects routes with prefixes not shorter
1478 than \verb|PREFIX|. F.e.\ \verb|root 0/0| selects the entire routing table.
1479 \verb|match PREFIX| selects routes with prefixes not longer than
1480 \verb|PREFIX|. F.e.\ \verb|match 10.0/16| selects \verb|10.0/16|,
1481 \verb|10/8| and \verb|0/0|, but it does not select \verb|10.1/16| and
1482 \verb|10.0.0/24|. And \verb|exact PREFIX| (or just \verb|PREFIX|)
selects routes with this exact prefix. If none of these options
is present, \verb|ip| assumes \verb|root 0/0| i.e.\ it lists the entire table.
1485
1486
1487 \item \verb|tos TOS| or \verb|dsfield TOS|
1488
1489  --- only select routes with the given TOS.
1490
1491
1492 \item \verb|table TABLEID|
1493
 --- show the routes from the given table(s). The default setting is to show
1495 \verb|table| \verb|main|. \verb|TABLEID| may either be the ID of a real table
1496 or one of the special values:
1497   \begin{itemize}
1498   \item \verb|all| --- list all of the tables.
1499   \item \verb|cache| --- dump the routing cache.
1500   \end{itemize}
1501 \begin{NB}
1502   IPv6 has a single table. However, splitting it into \verb|main|, \verb|local|
1503   and \verb|cache| is emulated by the \verb|ip| utility.
1504 \end{NB}
1505
1506 \item \verb|cloned| or \verb|cached|
1507
1508 --- list cloned routes i.e.\ routes which were dynamically forked from
1509 other routes because some route attribute (f.e.\ MTU) was updated.
1510 Actually, it is equivalent to \verb|table cache|.
1511
1512 \item \verb|from SELECTOR|
1513
1514 --- the same syntax as for \verb|to|, but it binds the source address range
1515 rather than destinations. Note that the \verb|from| option only works with
1516 cloned routes.
1517
1518 \item \verb|protocol RTPROTO|
1519
1520 --- only list routes of this protocol.
1521
1522
1523 \item \verb|scope SCOPE_VAL|
1524
1525 --- only list routes with this scope.
1526
1527 \item \verb|type TYPE|
1528
1529 --- only list routes of this type.
1530
1531 \item \verb|dev NAME|
1532
1533 --- only list routes going via this device.
1534
1535 \item \verb|via PREFIX|
1536
1537 --- only list routes going via the nexthop routers selected by \verb|PREFIX|.
1538
1539 \item \verb|src PREFIX|
1540
1541 --- only list routes with preferred source addresses selected
1542 by \verb|PREFIX|.
1543
1544 \item \verb|realm REALMID| or \verb|realms FROMREALM/TOREALM|
1545
1546 --- only list routes with these realms.
1547
1548 \end{itemize}
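
The three selector modifiers are easiest to tell apart side by side.
The prefixes below are chosen only for illustration:
\begin{verbatim}
ip route show root 10/8
ip route show match 10.0.0.1
ip route show exact 10.0/16
ip route show table all type local
\end{verbatim}
The first command lists every route that lies within 10/8. The second one
lists every route that covers the host 10.0.0.1, from a host route
up to the default route. The third one lists only the route to 10.0/16
itself. The last command combines two of the other filters and lists the
\verb|local| routes from all tables.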
1549
1550 \paragraph{Examples:} Let us count routes of protocol \verb|gated/bgp|
1551 on a router:
1552 \begin{verbatim}
1553 kuznet@amber:~ $ ip ro ls proto gated/bgp | wc
1554    1413    9891    79010
1555 kuznet@amber:~ $
1556 \end{verbatim}
1557 To count the size of the routing cache, we have to use the \verb|-o| option
1558 because cached attributes can take more than one line of output:
1559 \begin{verbatim}
1560 kuznet@amber:~ $ ip -o ro ls cloned | wc
1561    159    2543    18707
1562 kuznet@amber:~ $
1563 \end{verbatim}
1564
1565
\paragraph{Output format:} The output of this command consists
of per-route records separated by line feeds.
1568 However, some records may consist
1569 of more than one line: particularly, this is the case when the route
1570 is cloned or you requested additional statistics. If the
1571 \verb|-o| option was given, then line feeds separating lines inside
1572 records are replaced with the backslash sign.
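
Because a record may span several lines, a plain line count overstates
the number of routes. As a rough sketch, records can also be counted
without \verb|-o| by assuming, as in the listings of this section, that
every continuation line begins with white space. The function name
\verb|count_route_records| and the sample text are only illustrative.
\begin{verbatim}
#!/bin/sh
# Sketch: count route records on standard input.
# Assumption: every record starts in column one and every
# continuation line starts with a blank or a tab.
count_route_records() {
    awk '/^[^ \t]/ { n++ } END { print n + 0 }'
}

# Sample text in the style of a cloned-route listing.
sample='193.233.7.82 dev eth0  src 193.233.7.65 realms inr.ac
    cache  mtu 1500 rtt 300
193.233.7.98 dev eth0  src 193.233.7.90
    cache <redirected>  mtu 1500 rtt 3072'

printf '%s\n' "$sample" | wc -l                # counts 4 lines
printf '%s\n' "$sample" | count_route_records  # counts 2 records
\end{verbatim}
On a live system the function would be fed by a listing command, f.e.\
\verb+ip ro ls cloned | count_route_records+. The \verb|-o| option remains
the more robust choice, since it does not depend on indentation.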
1573
1574 The output has the same syntax as arguments given to {\tt ip route add},
1575 so that it can be understood easily. F.e.\
1576 \begin{verbatim}
1577 kuznet@amber:~ $ ip ro ls 193.233.7/24
1578 193.233.7.0/24 dev eth0  proto gated/conn  scope link \
1579     src 193.233.7.65 realms inr.ac
1580 kuznet@amber:~ $
1581 \end{verbatim}
1582
1583 If you list cloned entries, the output contains other attributes which
1584 are evaluated during route calculation and updated during route
1585 lifetime. An example of the output is:
1586 \begin{verbatim}
1587 kuznet@amber:~ $ ip ro ls 193.233.7.82 tab cache
1588 193.233.7.82 from 193.233.7.82 dev eth0  src 193.233.7.65 \
1589   realms inr.ac/inr.ac
1590     cache <src-direct,redirect>  mtu 1500 rtt 300 iif eth0
1591 193.233.7.82 dev eth0  src 193.233.7.65 realms inr.ac
1592     cache  mtu 1500 rtt 300
1593 kuznet@amber:~ $
1594 \end{verbatim}
1595 \begin{NB}
1596   \label{NB-strange-route}
1597   The route looks a bit strange, doesn't it? Did you notice that
  it is a path from 193.233.7.82 back to 193.233.7.82? Well, you will
1599   see in the section on \verb|ip route get| (p.\pageref{NB-nature-of-strangeness})
1600   how it appeared.
1601 \end{NB}
1602 The second line, starting with the word \verb|cache|, shows
1603 additional attributes which normal routes do not possess.
1604 Cached flags are summarized in angle brackets:
1605 \begin{itemize}
1606 \item \verb|local| --- packets are delivered locally.
1607 It stands for loopback unicast routes, for broadcast routes
1608 and for multicast routes, if this host is a member of the corresponding
1609 group.
1610
1611 \item \verb|reject| --- the path is bad. Any attempt to use it results
1612 in an error. See attribute \verb|error| below (p.\pageref{IP-ROUTE-GET-error}).
1613
1614 \item \verb|mc| --- the destination is multicast.
1615
1616 \item \verb|brd| --- the destination is broadcast.
1617
1618 \item \verb|src-direct| --- the source is on a directly connected
1619 interface.
1620
1621 \item \verb|redirected| --- the route was created by an ICMP Redirect.
1622
1623 \item \verb|redirect| --- packets going via this route will
1624 trigger an ICMP redirect.
1625
1626 \item \verb|fastroute| --- the route is eligible to be used for fastroute.
1627
1628 \item \verb|equalize| --- make packet by packet randomization
1629 along this path.
1630
1631 \item \verb|dst-nat| --- the destination address requires translation.
1632
1633 \item \verb|src-nat| --- the source address requires translation.
1634
1635 \item \verb|masq| --- the source address requires masquerading.
1636 This feature disappeared in linux-2.4.
1637
1638 \item \verb|notify| --- ({\em not implemented}) change/deletion
1639 of this route will trigger RTNETLINK notification.
1640 \end{itemize}
1641
1642 Then some optional attributes follow:
1643 \begin{itemize}
\item \verb|error| --- on \verb|reject| routes it is the error code
1645 returned to local senders when they try to use this route.
1646 These error codes are translated into ICMP error codes, sent to remote
1647 senders, according to the rules described above in the subsection
1648 devoted to route types (p.\pageref{IP-ROUTE-TYPES}).
1649 \label{IP-ROUTE-GET-error}
1650
1651 \item \verb|expires| --- this entry will expire after this timeout.
1652
1653 \item \verb|iif| --- the packets for this path are expected to arrive
1654 on this interface.
1655 \end{itemize}
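
The cache flags described above are easy to pick out of the output in a
script. The following is a minimal sketch, assuming the flags always
appear as \verb|cache <flag,flag,...>| exactly as in the listings of this
section; the function name \verb|cache_flags| is hypothetical.
\begin{verbatim}
#!/bin/sh
# Sketch: print the cache flags of each record, one per line.
# Assumption: flags are written as "cache <flag,flag,...>".
cache_flags() {
    awk '{
        if (match($0, /cache <[^>]*>/)) {
            # skip the leading "cache <" (7 characters)
            # and drop the trailing ">"
            flags = substr($0, RSTART + 7, RLENGTH - 8)
            n = split(flags, f, ",")
            for (i = 1; i <= n; i++) print f[i]
        }
    }'
}

line='    cache <src-direct,redirect>  mtu 1500 rtt 300 iif eth0'
printf '%s\n' "$line" | cache_flags
\end{verbatim}
For the sample line this prints \verb|src-direct| and \verb|redirect| on
separate lines. Records without angle brackets produce no output, so the
result can be piped to \verb|grep -x redirect| to test for one flag.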
1656
1657 \paragraph{Statistics:} With the \verb|-statistics| option, more
1658 information about this route is shown:
1659 \begin{itemize}
1660 \item \verb|users| --- the number of users of this entry.
1661 \item \verb|age| --- shows when this route was last used.
1662 \item \verb|used| --- the number of lookups of this route since its creation.
1663 \end{itemize}
1664
1665
1666 \subsection{{\tt ip route flush} --- flush routing tables}
1667 \label{IP-ROUTE-FLUSH}
1668
1669 \paragraph{Abbreviations:} \verb|flush|, \verb|f|.
1670
1671 \paragraph{Description:} this command flushes routes selected
1672 by some criteria.
1673
\paragraph{Arguments:} the arguments have the same syntax and semantics
as the arguments of \verb|ip route show|, except that the selected routes
are purged rather than listed. The only other difference is the default
action: \verb|show| dumps the whole IP main routing table, whereas
\verb|flush| prints the help page.
The reason for this difference does not require any explanation, does it?
1679
1680
1681 \paragraph{Statistics:} With the \verb|-statistics| option, the command
1682 becomes verbose. It prints out the number of deleted routes and the number
1683 of rounds made to flush the routing table. If the option is given
1684 twice, \verb|ip route flush| also dumps all the deleted routes
1685 in the format described in the previous subsection.
1686
1687 \paragraph{Examples:} The first example flushes all the
1688 gatewayed routes from the main table (f.e.\ after a routing daemon crash).
1689 \begin{verbatim}
1690 netadm@amber:~ # ip -4 ro flush scope global type unicast
1691 \end{verbatim}
1692 This option deserves to be put into a scriptlet \verb|routef|.
1693 \begin{NB}
1694 This option was described in the \verb|route(8)| man page borrowed
1695 from BSD, but was never implemented in Linux.
1696 \end{NB}
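
A minimal sketch of such a scriptlet is given below. It is only an
illustration, not necessarily the form in which any distributed
\verb|routef| is written. The \verb|IP| variable is an assumption added
here so that the command can be previewed with \verb|IP=echo| instead of
being executed.
\begin{verbatim}
#!/bin/sh
# routef (sketch): flush all gatewayed IPv4 routes.
# IP is a hypothetical override; IP=echo only prints the arguments.
IP=${IP:-ip}

routef() {
    if [ "$#" -ne 0 ]; then
        echo "Usage: routef" >&2
        return 1
    fi
    "$IP" -4 ro flush scope global type unicast
}
\end{verbatim}
A real script would end with \verb|routef "$@"| and has to be run by
the superuser. Refusing all arguments is a deliberate choice: a flush
command should not silently ignore a mistyped selector.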
1697
1698 The second example flushes all IPv6 cloned routes:
1699 \begin{verbatim}
1700 netadm@amber:~ # ip -6 -s -s ro flush cache
1701 3ffe:2400::220:afff:fef4:c5d1 via 3ffe:2400::220:afff:fef4:c5d1 \
1702   dev eth0  metric 0
1703     cache  used 2 age 12sec mtu 1500 rtt 300
1704 3ffe:2400::280:adff:feb7:8034 via 3ffe:2400::280:adff:feb7:8034 \
1705   dev eth0  metric 0
1706     cache  used 2 age 15sec mtu 1500 rtt 300
1707 3ffe:2400::280:c8ff:fe59:5bcc via 3ffe:2400::280:c8ff:fe59:5bcc \
1708   dev eth0  metric 0
1709     cache  users 1 used 1 age 23sec mtu 1500 rtt 300
1710 3ffe:2400:0:1:2a0:ccff:fe66:1878 via 3ffe:2400:0:1:2a0:ccff:fe66:1878 \
1711   dev eth1  metric 0
1712     cache  used 2 age 20sec mtu 1500 rtt 300
1713 3ffe:2400:0:1:a00:20ff:fe71:fb30 via 3ffe:2400:0:1:a00:20ff:fe71:fb30 \
1714   dev eth1  metric 0
1715     cache  used 2 age 33sec mtu 1500 rtt 300
1716 ff02::1 via ff02::1 dev eth1  metric 0
1717     cache  users 1 used 1 age 45sec mtu 1500 rtt 300
1718
1719 *** Round 1, deleting 6 entries ***
1720 *** Flush is complete after 1 round ***
1721 netadm@amber:~ # ip -6 -s -s ro flush cache
1722 Nothing to flush.
1723 netadm@amber:~ #
1724 \end{verbatim}
1725
1726 The third example flushes BGP routing tables after a \verb|gated|
1727 death.
1728 \begin{verbatim}
1729 netadm@amber:~ # ip ro ls proto gated/bgp | wc
1730    1408    9856    78730
1731 netadm@amber:~ # ip -s ro f proto gated/bgp
1732
1733 *** Round 1, deleting 1408 entries ***
1734 *** Flush is complete after 1 round ***
1735 netadm@amber:~ # ip ro f proto gated/bgp
1736 Nothing to flush.
1737 netadm@amber:~ # ip ro ls proto gated/bgp
1738 netadm@amber:~ #
1739 \end{verbatim}
1740
1741
1742 \subsection{{\tt ip route get} --- get a single route}
1743 \label{IP-ROUTE-GET}
1744
1745 \paragraph{Abbreviations:} \verb|get|, \verb|g|.
1746
1747 \paragraph{Description:} this command gets a single route to a destination
1748 and prints its contents exactly as the kernel sees it.
1749
1750 \paragraph{Arguments:}
1751 \begin{itemize}
1752 \item \verb|to ADDRESS| (default)
1753
1754 --- the destination address.
1755
1756 \item \verb|from ADDRESS|
1757
1758 --- the source address.
1759
1760 \item \verb|tos TOS| or \verb|dsfield TOS|
1761
1762 --- the Type Of Service.
1763
1764 \item \verb|iif NAME|
1765
1766 --- the device from which this packet is expected to arrive.
1767
1768 \item \verb|oif NAME|
1769
1770 --- force the output device on which this packet will be routed.
1771
1772 \item \verb|connected|
1773
1774 --- if no source address (option \verb|from|) was given, relookup
1775 the route with the source set to the preferred address received from the first lookup.
1776 If policy routing is used, it may be a different route.
1777
1778 \end{itemize}
1779
1780 Note that this operation is not equivalent to \verb|ip route show|.
1781 \verb|show| shows existing routes. \verb|get| resolves them and
1782 creates new clones if necessary. Essentially, \verb|get|
1783 is equivalent to sending a packet along this path.
1784 If the \verb|iif| argument is not given, the kernel creates a route
1785 to output packets towards the requested destination.
1786 This is equivalent to pinging the destination
with a subsequent {\tt ip route ls cache}; however, no packets are
1788 actually sent. With the \verb|iif| argument, the kernel pretends
1789 that a packet arrived from this interface and searches for
1790 a path to forward the packet.
1791
1792 \paragraph{Output format:} This command outputs routes in the same
1793 format as \verb|ip route ls|.
1794
1795 \paragraph{Examples:}
1796 \begin{itemize}
1797 \item Find a route to output packets to 193.233.7.82:
1798 \begin{verbatim}
1799 kuznet@amber:~ $ ip route get 193.233.7.82
1800 193.233.7.82 dev eth0  src 193.233.7.65 realms inr.ac
1801     cache  mtu 1500 rtt 300
1802 kuznet@amber:~ $
1803 \end{verbatim}
1804
1805 \item Find a route to forward packets arriving on \verb|eth0|
1806 from 193.233.7.82 and destined for 193.233.7.82:
1807 \begin{verbatim}
1808 kuznet@amber:~ $ ip r g 193.233.7.82 from 193.233.7.82 iif eth0
1809 193.233.7.82 from 193.233.7.82 dev eth0  src 193.233.7.65 \
1810   realms inr.ac/inr.ac
1811     cache <src-direct,redirect>  mtu 1500 rtt 300 iif eth0
1812 kuznet@amber:~ $
1813 \end{verbatim}
1814 \begin{NB}
1815   \label{NB-nature-of-strangeness}
1816   This is the command that created the funny route from 193.233.7.82
1817   looped back to 193.233.7.82 (cf.\ NB on~p.\pageref{NB-strange-route}).
1818   Note the \verb|redirect| flag on it.
1819 \end{NB}
1820
1821 \item Find a multicast route for packets arriving on \verb|eth0|
1822 from host 193.233.7.82 and destined for multicast group 224.2.127.254
(it is assumed that a multicast routing daemon is running;
in this case, it is \verb|pimd|):
1825 \begin{verbatim}
1826 kuznet@amber:~ $ ip r g 224.2.127.254 from 193.233.7.82 iif eth0
1827 multicast 224.2.127.254 from 193.233.7.82 dev lo  \
1828   src 193.233.7.65 realms inr.ac/cosmos
1829     cache <mc> iif eth0 Oifs: eth1 pimreg
1830 kuznet@amber:~ $
1831 \end{verbatim}
1832 This route differs from the ones seen before. It contains a ``normal'' part
1833 and a ``multicast'' part. The normal part is used to deliver (or not to
1834 deliver) the packet to local IP listeners. In this case the router
1835 is not a member
of this group, so the route has no \verb|local| flag and only
1837 forwards packets. The output device for such entries is always loopback.
1838 The multicast part consists of an additional \verb|Oifs:| list showing
1839 the output interfaces.
1840 \end{itemize}
1841
1842
1843 It is time for a more complicated example. Let us add an invalid
1844 gatewayed route for a destination which is really directly connected:
1845 \begin{verbatim}
1846 netadm@alisa:~ # ip route add 193.233.7.98 via 193.233.7.254
1847 netadm@alisa:~ # ip route get 193.233.7.98
1848 193.233.7.98 via 193.233.7.254 dev eth0  src 193.233.7.90
1849     cache  mtu 1500 rtt 3072
1850 netadm@alisa:~ #
1851 \end{verbatim}
1852 and probe it with ping:
1853 \begin{verbatim}
1854 netadm@alisa:~ # ping -n 193.233.7.98
1855 PING 193.233.7.98 (193.233.7.98) from 193.233.7.90 : 56 data bytes
1856 From 193.233.7.254: Redirect Host(New nexthop: 193.233.7.98)
1857 64 bytes from 193.233.7.98: icmp_seq=0 ttl=255 time=3.5 ms
1858 From 193.233.7.254: Redirect Host(New nexthop: 193.233.7.98)
1859 64 bytes from 193.233.7.98: icmp_seq=1 ttl=255 time=2.2 ms
1860 64 bytes from 193.233.7.98: icmp_seq=2 ttl=255 time=0.4 ms
1861 64 bytes from 193.233.7.98: icmp_seq=3 ttl=255 time=0.4 ms
1862 64 bytes from 193.233.7.98: icmp_seq=4 ttl=255 time=0.4 ms
1863 ^C
1864 --- 193.233.7.98 ping statistics ---
1865 5 packets transmitted, 5 packets received, 0% packet loss
1866 round-trip min/avg/max = 0.4/1.3/3.5 ms
1867 netadm@alisa:~ #
1868 \end{verbatim}
1869 What happened? Router 193.233.7.254 understood that we have a much
1870 better path to the destination and sent us an ICMP redirect message.
1871 We may retry \verb|ip route get| to see what we have in the routing
1872 tables now:
1873 \begin{verbatim}
1874 netadm@alisa:~ # ip route get 193.233.7.98
1875 193.233.7.98 dev eth0  src 193.233.7.90
1876     cache <redirected>  mtu 1500 rtt 3072
1877 netadm@alisa:~ #
1878 \end{verbatim}
1879
1880
1881
1882 \section{{\tt ip rule} --- routing policy database management}
1883 \label{IP-RULE}
1884
1885 \paragraph{Abbreviations:} \verb|rule|, \verb|ru|.
1886
1887 \paragraph{Object:} \verb|rule|s in the routing policy database control
1888 the route selection algorithm.
1889
1890 Classic routing algorithms used in the Internet make routing decisions
1891 based only on the destination address of packets (and in theory,
1892 but not in practice, on the TOS field). The seminal review of classic
1893 routing algorithms and their modifications can be found in~\cite{RFC1812}.
1894
1895 In some circumstances we want to route packets differently depending not only
1896 on destination addresses, but also on other packet fields: source address,
1897 IP protocol, transport protocol ports or even packet payload.
1898 This task is called ``policy routing''.
1899
1900 \begin{NB}
1901   ``policy routing'' $\neq$ ``routing policy''.
1902
1903 \noindent       ``policy routing'' $=$ ``cunning routing''.
1904
1905 \noindent       ``routing policy'' $=$ ``routing tactics'' or ``routing plan''.
1906 \end{NB}
1907
1908 To solve this task, the conventional destination based routing table, ordered
1909 according to the longest match rule, is replaced with a ``routing policy
1910 database'' (or RPDB), which selects routes
1911 by executing some set of rules. The rules may have lots of keys of different
1912 natures and therefore they have no natural ordering, but one imposed
1913 by the administrator. Linux-2.2 RPDB is a linear list of rules
1914 ordered by numeric priority value.
1915 RPDB explicitly allows matching a few packet fields:
1916
1917 \begin{itemize}
1918 \item packet source address.
1919 \item packet destination address.
1920 \item TOS.
1921 \item incoming interface (which is packet metadata, rather than a packet field).
1922 \end{itemize}
1923
1924 Matching IP protocols and transport ports is also possible,
1925 indirectly, via \verb|ipchains|, by exploiting their ability
1926 to mark some classes of packets with \verb|fwmark|. Therefore,
1927 \verb|fwmark| is also included in the set of keys checked by rules.
1928
1929 Each policy routing rule consists of a {\em selector\/} and an {\em action\/}
1930 predicate. The RPDB is scanned in the order of increasing priority. The selector
1931 of each rule is applied to \{source address, destination address, incoming
1932 interface, tos, fwmark\} and, if the selector matches the packet,
the action is performed. If the action predicate returns with success,
it yields either a route or a failure indication
and the RPDB lookup is terminated. Otherwise, the RPDB program
continues with the next rule.
1937
1938 What is the action, semantically? The natural action is to select the
1939 nexthop and the output device. This is what
1940 Cisco IOS~\cite{IOS} does. Let us call it ``match \& set''.
1941 The Linux-2.2 approach is more flexible. The action includes
1942 lookups in destination-based routing tables and selecting
1943 a route from these tables according to the classic longest match algorithm.
1944 The ``match \& set'' approach is the simplest case of the Linux one. It is realized
1945 when a second level routing table contains a single default route.
1946 Recall that Linux-2.2 supports multiple tables
1947 managed with the \verb|ip route| command, described in the previous section.
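
The ``match \& set'' special case can be sketched with the commands
described in this document. The table number 100, the gateway
192.203.80.1 and the priority 220 are hypothetical values chosen for
illustration, and the \verb|IP| variable is an assumption that allows a
preview with \verb|IP=echo|.
\begin{verbatim}
#!/bin/sh
# Sketch of "match & set" expressed with rules and tables.
# Hypothetical values: table 100, gateway 192.203.80.1, priority 220.
# IP is a hypothetical override; IP=echo only prints the arguments.
IP=${IP:-ip}

match_and_set() {
    # "set": the second level table holds a single default route,
    # so every lookup in it yields the same nexthop.
    "$IP" route add default via 192.203.80.1 table 100 &&
    # "match": packets from this source prefix use that table.
    "$IP" rule add from 192.203.80.0/24 table 100 prio 220 &&
    # RPDB changes become active once the routing cache is flushed.
    "$IP" route flush cache
}
\end{verbatim}
Adding more routes to table 100 turns the fixed nexthop into an ordinary
longest match lookup, which is where the Linux approach becomes more
flexible than plain ``match \& set''.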
1948
1949 At startup time the kernel configures the default RPDB consisting of three
1950 rules:
1951
1952 \begin{enumerate}
1953 \item Priority: 0, Selector: match anything, Action: lookup routing
1954 table \verb|local| (ID 255).
1955 The \verb|local| table is a special routing table containing
1956 high priority control routes for local and broadcast addresses.
1957
1958 Rule 0 is special. It cannot be deleted or overridden.
1959
1960
1961 \item Priority: 32766, Selector: match anything, Action: lookup routing
1962 table \verb|main| (ID 254).
1963 The \verb|main| table is the normal routing table containing all non-policy
1964 routes. This rule may be deleted and/or overridden with other
1965 ones by the administrator.
1966
1967 \item Priority: 32767, Selector: match anything, Action: lookup routing
1968 table \verb|default| (ID 253).
1969 The \verb|default| table is empty. It is reserved for some
1970 post-processing if no previous default rules selected the packet.
1971 This rule may also be deleted.
1972
1973 \end{enumerate}
1974
1975 Do not confuse routing tables with rules: rules point to routing tables,
1976 several rules may refer to one routing table and some routing tables
1977 may have no rules pointing to them. If the administrator deletes all the rules
1978 referring to a table, the table is not used, but it still exists
1979 and will disappear only after all the routes contained in it are deleted.
1980
1981
1982 \paragraph{Rule attributes:} Each RPDB entry has additional
1983 attributes. F.e.\ each rule has a pointer to some routing
table. NAT and masquerading rules have an attribute to select the new IP
address to translate/masquerade to. Besides that, rules have some of the
optional attributes that routes have, namely \verb|realms|.
1987 These values do not override those contained in the routing tables. They
1988 are only used if the route did not select any attributes.
1989
1990
1991 \paragraph{Rule types:} The RPDB may contain rules of the following
1992 types:
1993 \begin{itemize}
1994 \item \verb|unicast| --- the rule prescribes to return the route found
1995 in the routing table referenced by the rule.
1996 \item \verb|blackhole| --- the rule prescribes to silently drop the packet.
1997 \item \verb|unreachable| --- the rule prescribes to generate a ``Network
1998 is unreachable'' error.
\item \verb|prohibit| --- the rule prescribes to generate a
``Communication is administratively prohibited'' error.
2001 \item \verb|nat| --- the rule prescribes to translate the source address
2002 of the IP packet into some other value. More about NAT is
2003 in Appendix~\ref{ROUTE-NAT}, p.\pageref{ROUTE-NAT}.
2004 \end{itemize}
2005
2006
2007 \paragraph{Commands:} \verb|add|, \verb|delete| and \verb|show|
2008 (or \verb|list|).
2009
2010 \subsection{{\tt ip rule add} --- insert a new rule\\
2011         {\tt ip rule delete} --- delete a rule}
2012 \label{IP-RULE-ADD}
2013
2014 \paragraph{Abbreviations:} \verb|add|, \verb|a|; \verb|delete|, \verb|del|,
2015         \verb|d|.
2016
2017 \paragraph{Arguments:}
2018
2019 \begin{itemize}
2020 \item \verb|type TYPE| (default)
2021
--- the type of this rule. The list of valid types was given above,
in the paragraph on rule types.
2024
2025 \item \verb|from PREFIX|
2026
2027 --- select the source prefix to match.
2028
2029 \item \verb|to PREFIX|
2030
2031 --- select the destination prefix to match.
2032
2033 \item \verb|iif NAME|
2034
2035 --- select the incoming device to match. If the interface is loopback,
2036 the rule only matches packets originating from this host. This means that you
2037 may create separate routing tables for forwarded and local packets and,
2038 hence, completely segregate them.
2039
2040 \item \verb|tos TOS| or \verb|dsfield TOS|
2041
2042 --- select the TOS value to match.
2043
2044 \item \verb|fwmark MARK|
2045
2046 --- select the \verb|fwmark| value to match.
2047
2048 \item \verb|priority PREFERENCE|
2049
2050 --- the priority of this rule. Each rule should have an explicitly
2051 set {\em unique\/} priority value.
2052 \begin{NB}
2053   Really, for historical reasons \verb|ip rule add| does not require a
2054   priority value and allows them to be non-unique.
  If the user does not supply a priority, it is selected by the kernel.
2056   If the user creates a rule with a priority value that
2057   already exists, the kernel does not reject the request. It adds
2058   the new rule before all old rules of the same priority.
2059
  It is a mistake in design, no more. And it will be fixed one day,
2061   so do not rely on this feature. Use explicit priorities.
2062 \end{NB}
2063
2064
2065 \item \verb|table TABLEID|
2066
--- the routing table identifier to look up if the rule selector matches.
2068
2069 \item \verb|realms FROM/TO|
2070
2071 --- Realms to select if the rule matched and the routing table lookup
2072 succeeded. Realm \verb|TO| is only used if the route did not select
2073 any realm.
2074
2075 \item \verb|nat ADDRESS|
2076
2077 --- The base of the IP address block to translate (for source addresses).
2078 The \verb|ADDRESS| may be either the start of the block of NAT addresses
(selected by NAT routes) or, in linux-2.2, a local host address (or even zero).
In the last case the router does not translate the packets,
but masquerades them to this address; this feature disappeared in 2.4.
2082 More about NAT is in Appendix~\ref{ROUTE-NAT},
2083 p.\pageref{ROUTE-NAT}.
2084
2085 \end{itemize}
2086
2087 \paragraph{Warning:} Changes to the RPDB made with these commands
2088 do not become active immediately. It is assumed that after
2089 a script finishes a batch of updates, it flushes the routing cache
2090 with \verb|ip route flush cache|.
2091
2092 \paragraph{Examples:}
2093 \begin{itemize}
2094 \item Route packets with source addresses from 192.203.80/24
2095 according to routing table \verb|inr.ruhep|:
2096 \begin{verbatim}
2097 ip ru add from 192.203.80.0/24 table inr.ruhep prio 220
2098 \end{verbatim}
2099
2100 \item Translate packet source address 193.233.7.83 into 192.203.80.144
2101 and route it according to table \#1 (actually, it is \verb|inr.ruhep|):
2102 \begin{verbatim}
2103 ip ru add from 193.233.7.83 nat 192.203.80.144 table 1 prio 320
2104 \end{verbatim}
2105
2106 \item Delete the unused default rule:
2107 \begin{verbatim}
2108 ip ru del prio 32767
2109 \end{verbatim}
2110
2111 \end{itemize}
2112
2113
2114
2115 \subsection{{\tt ip rule show} --- list rules}
2116 \label{IP-RULE-SHOW}
2117
2118 \paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|, \verb|l|.
2119
2120
2121 \paragraph{Arguments:} Good news, this is one command that has no arguments.
2122
2123 \paragraph{Output format:}
2124
2125 \begin{verbatim}
2126 kuznet@amber:~ $ ip ru ls
2127 0:      from all lookup local
2128 200:    from 192.203.80.0/24 to 193.233.7.0/24 lookup main
2129 210:    from 192.203.80.0/24 to 192.203.80.0/24 lookup main
2130 220:    from 192.203.80.0/24 lookup inr.ruhep realms inr.ruhep/radio-msu
2131 300:    from 193.233.7.83 to 193.233.7.0/24 lookup main
2132 310:    from 193.233.7.83 to 192.203.80.0/24 lookup main
2133 320:    from 193.233.7.83 lookup inr.ruhep map-to 192.203.80.144
2134 32766:  from all lookup main
2135 kuznet@amber:~ $
2136 \end{verbatim}
2137
2138 In the first column is the rule priority value followed
2139 by a colon. Then the selectors follow. Each key is prefixed
2140 with the same keyword that was used to create the rule.
2141
2142 The keyword \verb|lookup| is followed by a routing table identifier,
2143 as it is recorded in the file \verb|/etc/iproute2/rt_tables|.
2144
2145 If the rule does NAT (f.e.\ rule \#320), it is shown by the keyword
2146 \verb|map-to| followed by the start of the block of addresses to map.
2147
2148 The sense of this example is pretty simple. The prefixes
2149 192.203.80.0/24 and 193.233.7.0/24 form the internal network, but
2150 they are routed differently when the packets leave it.
Besides that, the address of the host 193.233.7.83 is translated
so that it looks like 192.203.80.144 when talking
to the outer world.
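
Since every rule occupies one line with a fixed keyword layout, the
listing is convenient to post-process. The sketch below assumes the
layout shown above and reduces each rule to its priority, its table and,
for NAT rules, the \verb|map-to| address. The function name
\verb|rule_summary| is hypothetical.
\begin{verbatim}
#!/bin/sh
# Sketch: summarise a rule listing as "priority table [nat-address]".
# Assumption: the layout matches the listing shown above.
rule_summary() {
    awk 'NF >= 2 {
        prio = $1; sub(/:$/, "", prio)
        table = ""; nat = ""
        for (i = 2; i < NF; i++) {
            if ($i == "lookup") table = $(i + 1)
            if ($i == "map-to") nat = $(i + 1)
        }
        out = prio " " table
        if (nat != "") out = out " " nat
        print out
    }'
}

sample='0:      from all lookup local
220:    from 192.203.80.0/24 lookup inr.ruhep realms inr.ruhep/radio-msu
320:    from 193.233.7.83 lookup inr.ruhep map-to 192.203.80.144
32766:  from all lookup main'

printf '%s\n' "$sample" | rule_summary
\end{verbatim}
On a live system the input would come from \verb|ip ru ls|. A script
that adds rules can use the first column of this summary to verify
that a priority is still unused before choosing it.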
2154
2155
2156
2157 \section{{\tt ip maddress} --- multicast addresses management}
2158 \label{IP-MADDR}
2159
2160 \paragraph{Object:} \verb|maddress| objects are multicast addresses.
2161
2162 \paragraph{Commands:} \verb|add|, \verb|delete|, \verb|show| (or \verb|list|).
2163
2164 \subsection{{\tt ip maddress show} --- list multicast addresses}
2165
2166 \paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|, \verb|l|.
2167
2168 \paragraph{Arguments:}
2169
2170 \begin{itemize}
2171
2172 \item \verb|dev NAME| (default)
2173
2174 --- the device name.
2175
2176 \end{itemize}
2177
2178 \paragraph{Output format:}
2179
2180 \begin{verbatim}
2181 kuznet@alisa:~ $ ip maddr ls dummy
2182 2:  dummy
2183     link  33:33:00:00:00:01
2184     link  01:00:5e:00:00:01
2185     inet  224.0.0.1 users 2
2186     inet6 ff02::1
2187 kuznet@alisa:~ $
2188 \end{verbatim}
2189
2190 The first line of the output shows the interface index and its name.
2191 Then the multicast address list follows. Each line starts with the
protocol identifier. The word \verb|link| denotes a link layer
multicast address.
2194
2195 If a multicast address has more than one user, the number
2196 of users is shown after the \verb|users| keyword.
2197
2198 One additional feature not present in the example above
2199 is the \verb|static| flag, which indicates that the address was joined
2200 with \verb|ip maddr add|. See the following subsection.
2201
2202
2203
2204 \subsection{{\tt ip maddress add} --- add a multicast address\\
2205             {\tt ip maddress delete} --- delete a multicast address}
2206
2207 \paragraph{Abbreviations:} \verb|add|, \verb|a|; \verb|delete|, \verb|del|, \verb|d|.
2208
2209 \paragraph{Description:} these commands attach/detach
2210 a static link layer multicast address to listen on the interface.
2211 Note that it is impossible to join protocol multicast groups
2212 statically. This command only manages link layer addresses.
2213
2214
2215 \paragraph{Arguments:}
2216
2217 \begin{itemize}
2218 \item \verb|address LLADDRESS| (default)
2219
2220 --- the link layer multicast address.
2221
2222 \item \verb|dev NAME|
2223
2224 --- the device to join/leave this multicast address.
2225
2226 \end{itemize}
2227
2228
2229 \paragraph{Example:} Let us continue with the example from the previous subsection.
2230
2231 \begin{verbatim}
2232 netadm@alisa:~ # ip maddr add 33:33:00:00:00:01 dev dummy
2233 netadm@alisa:~ # ip -0 maddr ls dummy
2234 2:  dummy
2235     link  33:33:00:00:00:01 users 2 static
2236     link  01:00:5e:00:00:01
2237 netadm@alisa:~ # ip maddr del 33:33:00:00:00:01 dev dummy
2238 \end{verbatim}
2239
2240 \begin{NB}
 Neither \verb|ip| nor the kernel checks for multicast address validity.
 Particularly, this means that you can try to load a unicast address
 instead of a multicast address. Most drivers will ignore such addresses,
 but several (f.e.\ Tulip) will load them into their on-board filter.
2245  The effects may be strange. Namely, the addresses become additional
2246  local link addresses and, if you loaded the address of another host
2247  to the router, wait for duplicated packets on the wire.
2248  It is not a bug, but rather a hole in the API and intra-kernel interfaces.
2249  This feature is really more useful for traffic monitoring, but using it
2250  with Linux-2.2 you {\em have to\/} be sure that the host is not
2251  a router and, especially, that it is not a transparent proxy or masquerading
2252  agent.
2253 \end{NB}
2254
2255
2256
2257 \section{{\tt ip mroute} --- multicast routing cache management}
2258 \label{IP-MROUTE}
2259
2260 \paragraph{Abbreviations:} \verb|mroute|, \verb|mr|.
2261
2262 \paragraph{Object:} \verb|mroute| objects are multicast routing cache
2263 entries created by a user level mrouting daemon
2264 (f.e.\ \verb|pimd| or \verb|mrouted|).
2265
2266 Due to the limitations of the current interface to the multicast routing
2267 engine, it is impossible to change \verb|mroute| objects administratively,
2268 so we may only display them. This limitation will be removed
2269 in the future.
2270
2271 \paragraph{Commands:} \verb|show| (or \verb|list|).
2272
2273
2274 \subsection{{\tt ip mroute show} --- list mroute cache entries}
2275
2276 \paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|, \verb|l|.
2277
2278 \paragraph{Arguments:}
2279
2280 \begin{itemize}
2281 \item \verb|to PREFIX| (default)
2282
2283 --- the prefix selecting the destination multicast addresses to list.
2284
2285
2286 \item \verb|iif NAME|
2287
2288 --- the interface on which multicast packets are received.
2289
2290
2291 \item \verb|from PREFIX|
2292
2293 --- the prefix selecting the IP source addresses of the multicast route.
2294
2295
2296 \end{itemize}
2297
2298 \paragraph{Output format:}
2299
2300 \begin{verbatim}
2301 kuznet@amber:~ $ ip mroute ls
2302 (193.232.127.6, 224.0.1.39)      Iif: unresolved
2303 (193.232.244.34, 224.0.1.40)     Iif: unresolved
2304 (193.233.7.65, 224.66.66.66)     Iif: eth0       Oifs: pimreg
2305 kuznet@amber:~ $
2306 \end{verbatim}
2307
2308 Each line shows one (S,G) entry in the multicast routing cache,
2309 where S is the source address and G is the multicast group. \verb|Iif| is
2310 the interface on which multicast packets are expected to arrive.
2311 If the word \verb|unresolved| is there instead of the interface name,
2312 it means that the routing daemon still hasn't resolved this entry.
The keyword \verb|Oifs| is followed by a list of output interfaces, separated
by spaces. If a multicast routing entry is created with non-trivial
TTL scope, administrative distances are appended to the device names
in the \verb|Oifs| list.
2317
2318 \paragraph{Statistics:} The \verb|-statistics| option also prints the
2319 number of packets and bytes forwarded along this route and
2320 the number of packets that arrived on the wrong interface, if this number is not zero.
2321
2322 \begin{verbatim}
2323 kuznet@amber:~ $ ip -s mr ls 224.66/16
2324 (193.233.7.65, 224.66.66.66)     Iif: eth0       Oifs: pimreg
2325   9383 packets, 300256 bytes
2326 kuznet@amber:~ $
2327 \end{verbatim}
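
The (S,G) lines can be reduced to plain columns for further processing.
This is a sketch only: it assumes entries of the form
\verb|(S, G)  Iif: NAME| as in the listings above and skips every other
line, such as the statistics. The function name \verb|mroute_entries|
is hypothetical.
\begin{verbatim}
#!/bin/sh
# Sketch: print "source group iif" for each (S,G) line.
# Assumption: entries look like "(S, G)  Iif: NAME ...";
# lines of any other shape (f.e. statistics) are skipped.
mroute_entries() {
    awk '/^\(/ {
        s = $1; g = $2
        gsub(/[(,]/, "", s)
        gsub(/[)]/, "", g)
        iif = ""
        for (i = 3; i < NF; i++)
            if ($i == "Iif:") iif = $(i + 1)
        print s, g, iif
    }'
}

sample='(193.232.127.6, 224.0.1.39)      Iif: unresolved
(193.233.7.65, 224.66.66.66)     Iif: eth0       Oifs: pimreg
  9383 packets, 300256 bytes'

printf '%s\n' "$sample" | mroute_entries
\end{verbatim}
Unresolved entries keep the word \verb|unresolved| in the third column,
so they can be selected or filtered out with an ordinary \verb|grep|.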
2328
2329
2330 \section{{\tt ip tunnel} --- tunnel configuration}
2331 \label{IP-TUNNEL}
2332
2333 \paragraph{Abbreviations:} \verb|tunnel|, \verb|tunl|.
2334
2335 \paragraph{Object:} \verb|tunnel| objects are tunnels, encapsulating
2336 packets in IPv4 packets and then sending them over the IP infrastructure.
2337
2338 \paragraph{Commands:} \verb|add|, \verb|delete|, \verb|change|, \verb|show|
2339 (or \verb|list|).
2340
2341 \paragraph{See also:} A more informal discussion of tunneling
2342 over IP and the \verb|ip tunnel| command can be found in~\cite{IP-TUNNELS}.
2343
2344 \subsection{{\tt ip tunnel add} --- add a new tunnel\\
2345         {\tt ip tunnel change} --- change an existing tunnel\\
2346         {\tt ip tunnel delete} --- destroy a tunnel}
2347
2348 \paragraph{Abbreviations:} \verb|add|, \verb|a|; \verb|change|, \verb|chg|;
2349 \verb|delete|, \verb|del|, \verb|d|.
2350
2351
2352 \paragraph{Arguments:}
2353
2354 \begin{itemize}
2355
2356 \item \verb|name NAME| (default)
2357
2358 --- select the tunnel device name.
2359
2360 \item \verb|mode MODE|
2361
2362 --- set the tunnel mode. Three modes are currently available:
2363         \verb|ipip|, \verb|sit| and \verb|gre|.
2364
2365 \item \verb|remote ADDRESS|
2366
2367 --- set the remote endpoint of the tunnel.
2368
2369 \item \verb|local ADDRESS|
2370
2371 --- set the fixed local address for tunneled packets.
2372 It must be an address on another interface of this host.
2373
2374 \item \verb|ttl N|
2375
2376 --- set a fixed TTL \verb|N| on tunneled packets.
2377         \verb|N| is a number in the range 1--255. 0 is a special value
2378         meaning that packets inherit the TTL value.
2379                 The default value is: \verb|inherit|.
2380
2381 \item \verb|tos T| or \verb|dsfield T|
2382
2383 --- set a fixed TOS \verb|T| on tunneled packets.
2384                 The default value is: \verb|inherit|.
2385
2386
2387
2388 \item \verb|dev NAME|
2389
--- bind the tunnel to the device \verb|NAME| so that
        tunneled packets will only be routed via this device and will
        not be able to escape to another device when the route to the endpoint changes.
2393
2394 \item \verb|nopmtudisc|
2395
--- disable Path MTU Discovery on this tunnel.
        It is enabled by default. Note that a fixed TTL is incompatible
        with this option: tunneling with a fixed TTL always performs PMTU discovery.
2399
2400 \item \verb|key K|, \verb|ikey K|, \verb|okey K|
2401
2402 --- (only GRE tunnels) use keyed GRE with key \verb|K|. \verb|K| is
2403         either a number or an IP address-like dotted quad.
2404    The \verb|key| parameter sets the key to use in both directions.
2405    The \verb|ikey| and \verb|okey| parameters set different keys for input and output.
2406
2407
2408 \item \verb|csum|, \verb|icsum|, \verb|ocsum|
2409
2410 --- (only GRE tunnels) generate/require checksums for tunneled packets.
2411    The \verb|ocsum| flag calculates checksums for outgoing packets.
2412    The \verb|icsum| flag requires that all input packets have the correct
2413    checksum. The \verb|csum| flag is equivalent to the combination
2414   ``\verb|icsum| \verb|ocsum|''.
2415
2416 \item \verb|seq|, \verb|iseq|, \verb|oseq|
2417
2418 --- (only GRE tunnels) serialize packets.
2419    The \verb|oseq| flag enables sequencing of outgoing packets.
2420    The \verb|iseq| flag requires that all input packets are serialized.
2421    The \verb|seq| flag is equivalent to the combination ``\verb|iseq| \verb|oseq|''.
2422
2423 \begin{NB}
2424  I think this option does not
2425         work. At least, I did not test it, did not debug it and
2426         do not even understand how it is supposed to work or for what
2427         purpose Cisco planned to use it. Do not use it.
2428 \end{NB}
2429
2430
2431 \end{itemize}
2432
\paragraph{Example:} Create a point-to-point IPv6 tunnel with a fixed TTL of 32.
2434 \begin{verbatim}
2435 netadm@amber:~ # ip tunl add Cisco mode sit remote 192.31.7.104 \
2436     local 192.203.80.142 ttl 32
2437 \end{verbatim}
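\paragraph{Example:} A sketch of a keyed GRE tunnel with checksums, built only
from the arguments listed above. The device name \verb|gre1| and the key value
are assumptions made for illustration; the addresses are reused from the
previous example.
\begin{verbatim}
# Create a GRE tunnel; key 1234 and checksums apply in both directions.
ip tunnel add gre1 mode gre remote 192.31.7.104 \
    local 192.203.80.142 ttl 64 key 1234 csum
# Change one parameter of the existing tunnel.
ip tunnel change gre1 ttl 32
# Destroy the tunnel when it is no longer needed.
ip tunnel del gre1
\end{verbatim}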
2438
2439 \subsection{{\tt ip tunnel show} --- list tunnels}
2440
2441 \paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|, \verb|l|.
2442
2443
\paragraph{Arguments:} None are required. A tunnel name may be given to select a single tunnel, as in the example below.
2445
2446 \paragraph{Output format:}
2447 \begin{verbatim}
2448 kuznet@amber:~ $ ip tunl ls Cisco
2449 Cisco: ipv6/ip  remote 192.31.7.104  local 192.203.80.142  ttl 32
2450 kuznet@amber:~ $
2451 \end{verbatim}
2452 The line starts with the tunnel device name followed by a colon.
2453 Then the tunnel mode follows. The parameters of the tunnel are listed
2454 with the same keywords that were used when creating the tunnel.
2455
2456 \paragraph{Statistics:}
2457
2458 \begin{verbatim}
2459 kuznet@amber:~ $ ip -s tunl ls Cisco
2460 Cisco: ipv6/ip  remote 192.31.7.104  local 192.203.80.142  ttl 32
2461 RX: Packets    Bytes        Errors CsumErrs OutOfSeq Mcasts
2462     12566      1707516      0      0        0        0
2463 TX: Packets    Bytes        Errors DeadLoop NoRoute  NoBufs
2464     13445      1879677      0      0        0        0
2465 kuznet@amber:~ $
2466 \end{verbatim}
2467 Essentially, these numbers are the same as the numbers
2468 printed with {\tt ip -s link show}
2469 (sec.\ref{IP-LINK-SHOW}, p.\pageref{IP-LINK-SHOW}) but the tags are different
2470 to reflect that they are tunnel specific.
2471 \begin{itemize}
2472 \item \verb|CsumErrs| --- the total number of packets dropped
2473 because of checksum failures for a GRE tunnel with checksumming enabled.
2474 \item \verb|OutOfSeq| --- the total number of packets dropped
2475 because they arrived out of sequence for a GRE tunnel with
2476 serialization enabled.
2477 \item \verb|Mcasts| --- the total number of multicast packets
2478 received on a broadcast GRE tunnel.
2479 \item \verb|DeadLoop| --- the total number of packets which were not
2480 transmitted because the tunnel is looped back to itself.
2481 \item \verb|NoRoute| --- the total number of packets which were not
2482 transmitted because there is no IP route to the remote endpoint.
2483 \item \verb|NoBufs| --- the total number of packets which were not
2484 transmitted because the kernel failed to allocate a buffer.
2485 \end{itemize}
2486
2487
2488 \section{{\tt ip monitor} and {\tt rtmon} --- state monitoring}
2489 \label{IP-MONITOR}
2490
The \verb|ip| utility can monitor the state of devices, addresses
and routes continuously. This command has a slightly different format:
\verb|monitor| comes first on the command line and
the object list follows:
2496 \begin{verbatim}
2497   ip monitor [ file FILE ] [ all | OBJECT-LIST ]
2498 \end{verbatim}
2499 \verb|OBJECT-LIST| is the list of object types that we want to monitor.
2500 It may contain \verb|link|, \verb|address| and \verb|route|.
2501 If no \verb|file| argument is given, \verb|ip| opens RTNETLINK,
2502 listens on it and dumps state changes in the format described
2503 in previous sections.
2504
If a file name is given, \verb|ip| does not listen on RTNETLINK,
2506 but opens the file containing RTNETLINK messages saved in binary format
2507 and dumps them. Such a history file can be generated with the
2508 \verb|rtmon| utility. This utility has a command line syntax similar to
2509 \verb|ip monitor|.
2510 Ideally, \verb|rtmon| should be started before
2511 the first network configuration command is issued. F.e.\ if
2512 you insert:
2513 \begin{verbatim}
2514   rtmon file /var/log/rtmon.log
2515 \end{verbatim}
2516 in a startup script, you will be able to view the full history
2517 later.
2518
Of course, it is possible to start \verb|rtmon| at any time.
It prepends to the history a snapshot of the state dumped at the moment
it starts.
2522
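For instance, the two modes might be used as follows. The log file name is
the one from the example above; everything else is plain \verb|ip monitor|
syntax as described in this section.
\begin{verbatim}
# Watch link and route changes live (runs until interrupted).
ip monitor link route
# Replay a history previously saved by rtmon.
ip monitor file /var/log/rtmon.log
\end{verbatim}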
2523
2524 \section{Route realms and policy propagation, {\tt rtacct}}
2525 \label{RT-REALMS}
2526
2527 On routers using OSPF ASE or, especially, the BGP protocol, routing
2528 tables may be huge. If we want to classify or to account for the packets
2529 per route, we will have to keep lots of information. Even worse, if we
2530 want to distinguish the packets not only by their destination, but
2531 also by their source, the task gets quadratic complexity and its solution
2532 is physically impossible.
2533
2534 One approach to propagating the policy from routing protocols
2535 to the forwarding engine has been proposed in~\cite{IOS-BGP-PP}.
2536 Essentially, Cisco Policy Propagation via BGP is based on the fact
2537 that dedicated routers all have the RIB (Routing Information Base)
2538 close to the forwarding engine, so policy routing rules can
2539 check all the route attributes, including ASPATH information
2540 and community strings.
2541
2542 The Linux architecture, splitting the RIB (maintained by a user level
2543 daemon) and the kernel based FIB (Forwarding Information Base),
2544 does not allow such a simple approach.
2545
Fortunately, there is another solution
which allows even more flexible policy and richer semantics.
2548
Namely, routes can be clustered together in user space, based on their
attributes.  F.e.\ a BGP router knows a route's ASPATH and its community;
an OSPF router knows the route tag or its area. The administrator, when adding
routes manually, also knows their nature. Provided that the number of such
aggregates (we call them {\em realms\/}) is low, the task of full
classification both by source and destination becomes quite manageable.
2555
2556 So each route may be assigned to a realm. It is assumed that
2557 this identification is made by a routing daemon, but static routes
2558 can also be handled manually with \verb|ip route| (see sec.\ref{IP-ROUTE},
2559 p.\pageref{IP-ROUTE}).
2560 \begin{NB}
  There is a patch to \verb|gated|, allowing classification of routes
  to realms with the full set of policy rules implemented in \verb|gated|:
  by prefix, by ASPATH, by origin, by tag etc.
2564 \end{NB}
2565
2566 To facilitate the construction (f.e.\ in case the routing
2567 daemon is not aware of realms), missing realms may be completed
2568 with routing policy rules, see sec.~\ref{IP-RULE}, p.\pageref{IP-RULE}.
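As a hedged illustration, realms could be attached to a static route and
completed by a policy rule as follows. The prefixes, the gateway and the
numeric realm identifiers 10 and 20 are assumptions chosen for this sketch.
\begin{verbatim}
# Routes to this prefix belong to realm 10, which becomes the
# destination realm of packets using the route.
ip route add 192.203.80.0/24 via 193.233.7.65 realm 10
# Packets matching the rule get source realm 20; destination realm 10
# is taken from the rule only if the route did not supply one.
ip rule add from 193.233.7.0/24 realms 20/10
\end{verbatim}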
2569
2570 For each packet the kernel calculates a tuple of realms: source realm
2571 and destination realm, using the following algorithm:
2572
2573 \begin{enumerate}
2574 \item If the route has a realm, the destination realm of the packet is set to it.
2575 \item If the rule has a source realm, the source realm of the packet is set to it.
2576 If the destination realm was not inherited from the route and the rule has a destination realm,
2577 it is also set.
2578 \item If at least one of the realms is still unknown, the kernel finds
2579 the reversed route to the source of the packet.
2580 \item If the source realm is still unknown, get it from the reversed route.
2581 \item If one of the realms is still unknown, swap the realms of reversed
2582 routes and apply step 2 again.
2583 \end{enumerate}
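The steps above can be sketched as a small shell function. This is only an
illustration of the algorithm as described here, not the kernel code: the
function name \verb|resolve_realms| and its six arguments are hypothetical,
0 stands for an unknown realm, and step 5 is read as ``the realms of the rule
matched by the reverse lookup, swapped, fill whatever is still unknown''.
\begin{verbatim}
#! /bin/bash
# Arguments (hypothetical inputs of this sketch, 0 = unknown):
#   $1     realm of the forward route
#   $2 $3  source and destination realm of the forward rule
#   $4     realm of the reversed route
#   $5 $6  source and destination realm of the reverse rule
resolve_realms () {
  local route=$1 rule_src=$2 rule_dst=$3
  local rev_route=$4 rev_src=$5 rev_dst=$6
  local src=0 dst=0
  # Step 1: the route supplies the destination realm.
  if [ "$route" -ne 0 ]; then dst=$route; fi
  # Step 2: the rule supplies the source realm and, if the route
  # gave none, the destination realm.
  if [ "$rule_src" -ne 0 ]; then src=$rule_src; fi
  if [ "$dst" -eq 0 ] && [ "$rule_dst" -ne 0 ]; then dst=$rule_dst; fi
  # Steps 3-4: consult the reversed route only if something is missing.
  if [ "$src" -eq 0 ] || [ "$dst" -eq 0 ]; then
    if [ "$src" -eq 0 ]; then src=$rev_route; fi
    # Step 5 (assumed reading): swap the realms of the reverse rule
    # and let them fill whatever is still unknown.
    if [ "$src" -eq 0 ]; then src=$rev_dst; fi
    if [ "$dst" -eq 0 ]; then dst=$rev_src; fi
  fi
  echo "$src $dst"
}

resolve_realms 10 20 30 0 0 0    # prints "20 10"
resolve_realms 0 0 0 5 7 8       # prints "5 7"
\end{verbatim}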
2584
After this procedure is completed we know which realm the packet
arrived from and which realm it is going to.
2587 If some of the realms are unknown, they are initialized to zero
2588 (or realm \verb|unknown|).
2589
2590 The main application of realms is the TC \verb|route| classifier~\cite{TC-CREF},
2591 where they are used to help assign packets to traffic classes,
2592 to account, police and schedule them according to this
2593 classification.
2594
2595 A much simpler but still very useful application is incoming packet
2596 accounting by realms. The kernel gathers a packet statistics summary
2597 which can be viewed with the \verb|rtacct| utility.
2598 \begin{verbatim}
2599 kuznet@amber:~ $ rtacct russia
2600 Realm      BytesTo    PktsTo     BytesFrom  PktsFrom
2601 russia     20576778   169176     47080168   153805
2602 kuznet@amber:~ $
2603 \end{verbatim}
2604 This shows that this router received 153805 packets from
2605 the realm \verb|russia| and forwarded 169176 packets to \verb|russia|.
2606 The realm \verb|russia| consists of routes with ASPATHs not leaving
2607 Russia.
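From the same counters one can derive, for instance, the average packet
size in each direction. The helper below is an illustrative sketch (the name
\verb|avg_pkt_size| is hypothetical); it reads data lines in the
\verb|rtacct| format shown above, without the header line.
\begin{verbatim}
avg_pkt_size () {
  # Fields: realm, bytes to, packets to, bytes from, packets from.
  awk '$3 > 0 && $5 > 0 {
    printf "%s: to %d B/pkt, from %d B/pkt\n", $1, $2/$3, $4/$5 }'
}
echo "russia 20576778 169176 47080168 153805" | avg_pkt_size
# prints "russia: to 121 B/pkt, from 306 B/pkt"
\end{verbatim}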
2608
Note that locally originating packets are not accounted for here;
2610 \verb|rtacct| shows incoming packets only. Using the \verb|route|
2611 classifier (see~\cite{TC-CREF}) you can get even more detailed
2612 accounting information about outgoing packets, optionally
2613 summarizing traffic not only by source or destination, but
2614 by any pair of source and destination realms.
2615
2616
2617 \begin{thebibliography}{99}
2618 \addcontentsline{toc}{section}{References}
2619 \bibitem{RFC-NDISC} T.~Narten, E.~Nordmark, W.~Simpson.
2620 ``Neighbor Discovery for IP Version 6 (IPv6)'', RFC-2461.
2621
2622 \bibitem{RFC-ADDRCONF} S.~Thomson, T.~Narten.
2623 ``IPv6 Stateless Address Autoconfiguration'', RFC-2462.
2624
2625 \bibitem{RFC1812} F.~Baker.
2626 ``Requirements for IP Version 4 Routers'', RFC-1812.
2627
2628 \bibitem{RFC1122} R.~T.~Braden.
2629 ``Requirements for Internet hosts --- communication layers'', RFC-1122.
2630
2631 \bibitem{IOS} ``Cisco IOS Release 12.0 Network Protocols
2632 Command Reference, Part 1'' and
2633 ``Cisco IOS Release 12.0 Quality of Service Solutions
2634 Configuration Guide: Configuring Policy-Based Routing'',\\
2635 http://www.cisco.com/univercd/cc/td/doc/product/software/ios120.
2636
2637 \bibitem{IP-TUNNELS} A.~N.~Kuznetsov.
2638 ``Tunnels over IP in Linux-2.2'', \\
2639 In: {\tt ftp://ftp.inr.ac.ru/ip-routing/iproute2-current.tar.gz}.
2640
2641 \bibitem{TC-CREF} A.~N.~Kuznetsov. ``TC Command Reference'',\\
2642 In: {\tt ftp://ftp.inr.ac.ru/ip-routing/iproute2-current.tar.gz}.
2643
2644 \bibitem{IOS-BGP-PP} ``Cisco IOS Release 12.0 Quality of Service Solutions
2645 Configuration Guide: Configuring QoS Policy Propagation via
2646 Border Gateway Protocol'',\\
2647 http://www.cisco.com/univercd/cc/td/doc/product/software/ios120.
2648
2649 \bibitem{RFC-DHCP} R.~Droms.
``Dynamic Host Configuration Protocol'', RFC-2131.
2651
2652 \end{thebibliography}
2653
2654
2655
2656
2657 \appendix
2658 \addcontentsline{toc}{section}{Appendix}
2659
2660 \section{Source address selection}
2661 \label{ADDR-SEL}
2662
2663 When a host creates an IP packet, it must select some source
2664 address. Correct source address selection is a critical procedure,
2665 because it gives the receiver the information needed to deliver a
2666 reply. If the source is selected incorrectly, in the best case,
the backward path may differ from the forward one, which
2668 is harmful for performance. In the worst case, when the addresses
2669 are administratively scoped, the reply may be lost entirely.
2670
2671 Linux-2.2 selects source addresses using the following algorithm:
2672
2673 \begin{itemize}
2674 \item
The application may select a source address explicitly with the \verb|bind(2)|
syscall or by supplying it to \verb|sendmsg(2)| via the ancillary data object
2677 \verb|IP_PKTINFO|. In this case the kernel only checks the validity
2678 of the address and never tries to ``improve'' an incorrect user choice,
2679 generating an error instead.
2680 \begin{NB}
2681  Never say ``Never''. The sysctl option \verb|ip_dynaddr| breaks
 this axiom. It was introduced deliberately with the purpose
2683  of automatically reselecting the address on hosts with dynamic dial-out interfaces.
2684  However, this hack {\em must not\/} be used on multihomed hosts
2685  and especially on routers: it would break them.
2686 \end{NB}
2687
2688
2689 \item Otherwise, IP routing tables can contain an explicit source
2690 address hint for this destination. The hint is set with the \verb|src| parameter
2691 to the \verb|ip route| command, sec.\ref{IP-ROUTE}, p.\pageref{IP-ROUTE}.
2692
2693
2694 \item Otherwise, the kernel searches through the list of addresses
2695 attached to the interface through which the packets will be routed.
2696 The search strategies are different for IP and IPv6. Namely:
2697
2698 \begin{itemize}
2699 \item IPv6 searches for the first valid, not deprecated address
2700 with the same scope as the destination.
2701
\item IP searches for the first valid address with a scope wider
than the scope of the destination, but it prefers addresses
which fall in the same subnet as the nexthop of the route
2705 to the destination. Unlike IPv6, the scopes of IPv4 destinations
2706 are not encoded in their addresses but are supplied
2707 in routing tables instead (the \verb|scope| parameter to the \verb|ip route| command,
2708 sec.\ref{IP-ROUTE}, p.\pageref{IP-ROUTE}).
2709
2710 \end{itemize}
2711
2712
2713 \item Otherwise, if the scope of the destination is \verb|link| or \verb|host|,
2714 the algorithm fails and returns a zero source address.
2715
2716 \item Otherwise, all interfaces are scanned to search for an address
2717 with an appropriate scope. The loopback device \verb|lo| is always the first
2718 in the search list, so that if an address with global scope (not 127.0.0.1!)
2719 is configured on loopback, it is always preferred.
2720
2721 \end{itemize}
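As a hedged illustration of the second step, a source address hint can be
attached to a route and the resulting choice inspected with
\verb|ip route get|. The prefix, gateway and source address below are
assumptions; the source must be an address configured on this host.
\begin{verbatim}
# Prefer 193.233.7.90 as the source for packets to this prefix.
ip route add 192.203.80.0/24 via 193.233.7.65 src 193.233.7.90
# Show the route, including the selected source, for one destination.
ip route get 192.203.80.1
\end{verbatim}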
2722
2723
2724 \section{Proxy ARP/NDISC}
2725 \label{PROXY-NEIGH}
2726
2727 Routers may answer ARP/NDISC solicitations on behalf of other hosts.
2728 In Linux-2.2 proxy ARP on an interface may be enabled
2729 by setting the kernel \verb|sysctl| variable
2730 \verb|/proc/sys/net/ipv4/conf/<dev>/proxy_arp| to 1. After this, the router
2731 starts to answer ARP requests on the interface \verb|<dev>|, provided
2732 the route to the requested destination does {\em not\/} go back via the same
2733 device.
2734
2735 The variable \verb|/proc/sys/net/ipv4/conf/all/proxy_arp| enables proxy
2736 ARP on all the IP devices.
2737
2738 However, this approach fails in the case of IPv6 because the router
2739 must join the solicited node multicast address to listen for the corresponding
2740 NDISC queries. It means that proxy NDISC is possible only on a per destination
2741 basis.
2742
2743 Logically, proxy ARP/NDISC is not a kernel task. It can easily be implemented
2744 in user space. However, similar functionality was present in BSD kernels
2745 and in Linux-2.0, so we have to preserve it at least to the extent that
2746 is standardized in BSD.
2747 \begin{NB}
2748   Linux-2.0 ARP had a feature called {\em subnet\/} proxy ARP.
2749   It is replaced with the sysctl flag in Linux-2.2.
2750 \end{NB}
2751
2752
2753 The \verb|ip| utility provides a way to manage proxy ARP/NDISC
2754 with the \verb|ip neigh| command, namely:
2755 \begin{verbatim}
2756   ip neigh add proxy ADDRESS [ dev NAME ]
2757 \end{verbatim}
2758 adds a new proxy ARP/NDISC record and
2759 \begin{verbatim}
2760   ip neigh del proxy ADDRESS [ dev NAME ]
2761 \end{verbatim}
2762 deletes it.
2763
2764 If the name of the device is not given, the router will answer solicitations
2765 for address \verb|ADDRESS| on all devices, otherwise it will only serve
2766 the device \verb|NAME|. Even if the proxy entry is created with
2767 \verb|ip neigh|, the router {\em will not\/} answer a query if the route
2768 to the destination goes back via the interface from which the solicitation
2769 was received.
2770
2771 It is important to emphasize that proxy entries have {\em no\/}
2772 parameters other than these (IP/IPv6 address and optional device).
2773 Particularly, the entry does not store any link layer address.
2774 It always advertises the station address of the interface
on which it sends advertisements (i.e.\ its own station address).
2776
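A short sketch of both mechanisms follows. The device name \verb|eth0| and
the address are assumptions chosen for illustration.
\begin{verbatim}
# Enable proxy ARP on eth0 for destinations routed via other devices.
echo 1 > /proc/sys/net/ipv4/conf/eth0/proxy_arp
# Alternatively, answer for one address only, and only on eth0.
ip neigh add proxy 193.233.7.90 dev eth0
# Remove the proxy record again.
ip neigh del proxy 193.233.7.90 dev eth0
\end{verbatim}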
2777 \section{Route NAT status}
2778 \label{ROUTE-NAT}
2779
2780 NAT (or ``Network Address Translation'') remaps some parts
2781 of the IP address space into other ones. Linux-2.2 route NAT is supposed
2782 to be used to facilitate policy routing by rewriting addresses
2783 to other routing domains or to help while renumbering sites
2784 to another prefix.
2785
2786 \paragraph{What it is not:}
2787 It is necessary to emphasize that {\em it is not supposed\/}
2788 to be used to compress address space or to split load.
2789 This is not missing functionality but a design principle.
2790 Route NAT is {\em stateless\/}. It does not hold any state
2791 about translated sessions. This means that it handles any number
2792 of sessions flawlessly. But it also means that it is {\em static\/}.
2793 It cannot detect the moment when the last TCP client stops
2794 using an address. For the same reason, it will not help to split
2795 load between several servers.
2796 \begin{NB}
2797 It is a pretty commonly held belief that it is useful to split load between
2798 several servers with NAT. This is a mistake. All you get from this
2799 is the requirement that the router keep the state of all the TCP connections
2800 going via it. Well, if the router is so powerful, run apache on it. 8)
2801 \end{NB}
2802
2803 The second feature: it does not touch packet payload,
2804 does not try to ``improve'' broken protocols by looking
2805 through its data and mangling it. It mangles IP addresses,
2806 only IP addresses and nothing but IP addresses.
This, too, is not missing functionality but a design principle.
2808
To summarize: if you need to compress address space or keep
2810 active FTP clients happy, your choice is not route NAT but masquerading,
2811 port forwarding, NAPT etc.
2812 \begin{NB}
2813 By the way, you may also want to look at
http://www.suse.com/\~{}mha/HyperNews/get/linux-ip-nat.html
2815 \end{NB}
2816
2817
\paragraph{How it works:}
2819 Some part of the address space is reserved for dummy addresses
2820 which will look for all the world like some host addresses
inside your network. No other hosts may use these addresses;
however, other routers may also be configured to translate them.
2823 \begin{NB}
2824 A great advantage of route NAT is that it may be used not
2825 only in stub networks but in environments with arbitrarily complicated
2826 structure. It does not firewall, it {\em forwards.}
2827 \end{NB}
2828 These addresses are selected by the \verb|ip route| command
2829 (sec.\ref{IP-ROUTE-ADD}, p.\pageref{IP-ROUTE-ADD}). F.e.\
2830 \begin{verbatim}
2831   ip route add nat 192.203.80.144 via 193.233.7.83
2832 \end{verbatim}
2833 states that the single address 192.203.80.144 is a dummy NAT address.
2834 For all the world it looks like a host address inside our network.
2835 For neighbouring hosts and routers it looks like the local address
2836 of the translating router. The router answers ARP for it, advertises
this address as routed via it, etc. When the router
2838 receives a packet destined for 192.203.80.144, it replaces
2839 this address with 193.233.7.83 which is the address of some real
2840 host and forwards the packet. If you need to remap
2841 blocks of addresses, you may use a command like:
2842 \begin{verbatim}
2843   ip route add nat 192.203.80.192/26 via 193.233.7.64
2844 \end{verbatim}
This command will map a block of 64 addresses 192.203.80.192--255 to
193.233.7.64--127.
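The arithmetic behind this mapping can be sketched in shell: the prefix bits
are replaced and the host bits are kept. This is an illustration only, not the
kernel implementation; the function names \verb|ip2int|, \verb|int2ip| and
\verb|nat_map| are hypothetical.
\begin{verbatim}
#! /bin/bash
# Convert a dotted quad to a 32-bit number and back.
ip2int () {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}
int2ip () {
  printf '%d.%d.%d.%d\n' $(( ($1 >> 24) & 255 )) $(( ($1 >> 16) & 255 )) \
      $(( ($1 >> 8) & 255 )) $(( $1 & 255 ))
}
# nat_map ADDRESS NAT-PREFIX REAL-PREFIX LENGTH
nat_map () {
  local addr mask
  addr=$(ip2int "$1")
  mask=$(( (0xFFFFFFFF << (32 - $4)) & 0xFFFFFFFF ))
  # Addresses outside the NAT prefix are left untouched.
  if [ $(( addr & mask )) -ne $(( $(ip2int "$2") & mask )) ]; then
    echo "$1"; return
  fi
  # Replace the prefix bits and keep the host bits.
  int2ip $(( ($(ip2int "$3") & mask) | (addr & ~mask & 0xFFFFFFFF) ))
}
nat_map 192.203.80.200 192.203.80.192 193.233.7.64 26  # prints 193.233.7.72
\end{verbatim}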
2847
2848 When an internal host (193.233.7.83 in the example above)
2849 sends something to the outer world and these packets are forwarded
2850 by our router, it should translate the source address 193.233.7.83
2851 into 192.203.80.144. This task is solved by setting a special
2852 policy rule (sec.\ref{IP-RULE-ADD}, p.\pageref{IP-RULE-ADD}):
2853 \begin{verbatim}
2854   ip rule add prio 320 from 193.233.7.83 nat 192.203.80.144
2855 \end{verbatim}
2856 This rule says that the source address 193.233.7.83
2857 should be translated into 192.203.80.144 before forwarding.
2858 It is important that the address after the \verb|nat| keyword
2859 is some NAT address, declared by {\tt ip route add nat}.
2860 If it is just a random address the router will not map to it.
2861 \begin{NB}
2862 The exception is when the address is a local address of this
2863 router (or 0.0.0.0) and masquerading is configured in the linux-2.2
2864 kernel. In this case the router will masquerade the packets as this address.
2865 If 0.0.0.0 is selected, the result is equivalent to one
obtained with firewalling rules. Otherwise, this gives you a way
to make Linux masquerade to this fixed address.
The NAT mechanism used in linux-2.4 is more flexible than
masquerading, so this feature has lost its meaning and has been disabled.
2870 \end{NB}
2871
2872 If the network has non-trivial internal structure, it is
2873 useful and even necessary to add rules disabling translation
2874 when a packet does not leave this network. Let us return to the
2875 example from sec.\ref{IP-RULE-SHOW} (p.\pageref{IP-RULE-SHOW}).
2876 \begin{verbatim}
2877 300:    from 193.233.7.83 to 193.233.7.0/24 lookup main
2878 310:    from 193.233.7.83 to 192.203.80.0/24 lookup main
2879 320:    from 193.233.7.83 lookup inr.ruhep map-to 192.203.80.144
2880 \end{verbatim}
2881 This block of rules causes normal forwarding when
2882 packets from 193.233.7.83 do not leave networks 193.233.7/24
2883 and 192.203.80/24. Also, if the \verb|inr.ruhep| table does not
2884 contain a route to the destination (which means that the routing
2885 domain owning addresses from 192.203.80/24 is dead), no translation
2886 will occur. Otherwise, the packets are translated.
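Such a block could be created with commands like the following. This is a
sketch: it assumes that the table name \verb|inr.ruhep| is known to
\verb|ip| (f.e.\ via \verb|/etc/iproute2/rt_tables|) and that 192.203.80.144
has been declared with {\tt ip route add nat} as shown above.
\begin{verbatim}
ip rule add prio 300 from 193.233.7.83 to 193.233.7.0/24 table main
ip rule add prio 310 from 193.233.7.83 to 192.203.80.0/24 table main
ip rule add prio 320 from 193.233.7.83 table inr.ruhep nat 192.203.80.144
\end{verbatim}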
2887
2888 \paragraph{How to only translate selected ports:}
2889 If you only want to translate selected ports (f.e.\ http)
2890 and leave the rest intact, you may use \verb|ipchains|
2891 to \verb|fwmark| a class of packets.
Suppose you have done so and all the packets from 193.233.7.83
destined for port 80 are marked with marker 0x1234 in the input firewall chain.
2894 In this case you may replace rule \#320 with:
2895 \begin{verbatim}
2896 320:    from 193.233.7.83 fwmark 1234 lookup main map-to 192.203.80.144
2897 \end{verbatim}
2898 and translation will only be enabled for outgoing http requests.
2899
2900 \section{Example: minimal host setup}
2901 \label{EXAMPLE-SETUP}
2902
The following script gives an example of a fail-safe
2904 setup of IP (and IPv6, if it is compiled into the kernel)
2905 in the common case of a node attached to a single broadcast
2906 network. A more advanced script, which may be used both on multihomed
2907 hosts and on routers, is described in the following
2908 section.
2909
2910 The utilities used in the script may be found in the
2911 directory ftp://ftp.inr.ac.ru/ip-routing/:
2912 \begin{enumerate}
2913 \item \verb|ip| --- package \verb|iproute2|.
2914 \item \verb|arping| --- package \verb|iputils|.
2915 \item \verb|rdisc| --- package \verb|iputils|.
2916 \end{enumerate}
2917 \begin{NB}
2918 It also refers to a DHCP client, \verb|dhcpcd|. I should refrain from
2919 recommending a good DHCP client to use. All that I can
2920 say is that ISC \verb|dhcp-2.0b1pl6| patched with the patch that
2921 can be found in the \verb|dhcp.bootp.rarp| subdirectory of
2922 the same ftp site {\em does\/} work,
2923 at least on Ethernet and Token Ring.
2924 \end{NB}
2925
2926 \begin{verbatim}
2927 #! /bin/bash
2928 \end{verbatim}
2929 \begin{flushleft}
2930 \# {\bf Usage: \verb|ifone ADDRESS[/PREFIX-LENGTH] [DEVICE]|}\\
2931 \# {\bf Parameters:}\\
2932 \# \$1 --- Static IP address, optionally followed by prefix length.\\
\# \$2 --- Device name. If it is missing, \verb|eth0| is assumed.\\
2934 \# F.e. \verb|ifone 193.233.7.90|
2935 \end{flushleft}
2936 \begin{verbatim}
2937 dev=$2
2938 : ${dev:=eth0}
2939 ipaddr=
2940 \end{verbatim}
2941 \# Parse IP address, splitting prefix length.
2942 \begin{verbatim}
2943 if [ "$1" != "" ]; then
2944   ipaddr=${1%/*}
2945   if [ "$1" != "$ipaddr" ]; then
2946     pfxlen=${1#*/}
2947   fi
2948   : ${pfxlen:=24}
2949 fi
2950 pfx="${ipaddr}/${pfxlen}"
2951 \end{verbatim}
2952
2953 \begin{flushleft}
2954 \# {\bf Step 0} --- enable loopback.\\
2955 \#\\
\# This step is necessary on any networked box before attempting\\
\# to configure any other device.\\
2958 \end{flushleft}
2959 \begin{verbatim}
2960 ip link set up dev lo
2961 ip addr add 127.0.0.1/8 dev lo brd + scope host
2962 \end{verbatim}
2963 \begin{flushleft}
\# IPv6 autoconfigures itself on loopback.\\
\#\\
\# If the user gave loopback as the device, we add the address as an alias and exit.
2967 \end{flushleft}
2968 \begin{verbatim}
2969 if [ "$dev" = "lo" ]; then
2970   if [ "$ipaddr" != "" -a  "$ipaddr" != "127.0.0.1" ]; then
2971     ip address add $ipaddr dev $dev
2972     exit $?
2973   fi
2974   exit 0
2975 fi
2976 \end{verbatim}
2977
2978 \noindent\# {\bf Step 1} --- enable device \verb|$dev|
2979
2980 \begin{verbatim}
2981 if ! ip link set up dev $dev ; then
2982   echo "Cannot enable interface $dev. Aborting." 1>&2
2983   exit 1
2984 fi
2985 \end{verbatim}
2986 \begin{flushleft}
2987 \# The interface is \verb|UP|. IPv6 started stateless autoconfiguration itself,\\
2988 \# and its configuration finishes here. However,\\
2989 \# IP still needs some static preconfigured address.
2990 \end{flushleft}
2991 \begin{verbatim}
2992 if [ "$ipaddr" = "" ]; then
2993   echo "No address for $dev is configured, trying DHCP..." 1>&2
2994   dhcpcd
2995   exit $?
2996 fi
2997 \end{verbatim}
2998
2999 \begin{flushleft}
3000 \# {\bf Step 2} --- IP Duplicate Address Detection~\cite{RFC-DHCP}.\\
\# Send two probes and wait up to 3 seconds for the result.\\
\# If the interface comes up slowly, f.e.\ due to long media detection,\\
\# you may want to increase the timeout.\\
3004 \end{flushleft}
3005 \begin{verbatim}
3006 if ! arping -q -c 2 -w 3 -D -I $dev $ipaddr ; then
3007   echo "Address $ipaddr is busy, trying DHCP..." 1>&2
3008   dhcpcd
3009   exit $?
3010 fi
3011 \end{verbatim}
3012 \begin{flushleft}
3013 \# OK, the address is unique, we may add it on the interface.\\
3014 \#\\
3015 \# {\bf Step 3} --- Configure the address on the interface.
3016 \end{flushleft}
3017
3018 \begin{verbatim}
3019 if ! ip address add $pfx brd + dev $dev; then
3020   echo "Failed to add $pfx on $dev, trying DHCP..." 1>&2
3021   dhcpcd
3022   exit $?
3023 fi
3024 \end{verbatim}
3025
3026 \noindent\# {\bf Step 4} --- Announce our presence on the link.
3027 \begin{verbatim}
3028 arping -A -c 1 -I $dev $ipaddr
3029 noarp=$?
3030 ( sleep 2;
3031   arping -U -c 1 -I $dev $ipaddr ) >& /dev/null </dev/null &
3032 \end{verbatim}
3033
3034 \begin{flushleft}
3035 \# {\bf Step 5} (optional) --- Add some control routes.\\
3036 \#\\
3037 \# 1. Prohibit link local multicast addresses.\\
3038 \# 2. Prohibit link local (alias, limited) broadcast.\\
3039 \# 3. Add default multicast route.
3040 \end{flushleft}
3041 \begin{verbatim}
3042 ip route add unreachable 224.0.0.0/24
3043 ip route add unreachable 255.255.255.255
3044 if [ `ip link ls $dev | grep -c MULTICAST` -ge 1 ]; then
3045   ip route add 224.0.0.0/4 dev $dev scope global
3046 fi
3047 \end{verbatim}
3048
3049 \begin{flushleft}
3050 \# {\bf Step 6} --- Add fallback default route with huge metric.\\
3051 \# If a proxy ARP server is present on the interface, we will be\\
3052 \# able to talk to all the Internet without further configuration.\\
3053 \# It is not so cheap though and we still hope that this route\\
\# will be overridden by a more correct one from rdisc.\\
\# Do not perform this step if the device is not ARPable,\\
\# because dead nexthop detection does not work on such devices.
3057 \end{flushleft}
3058 \begin{verbatim}
3059 if [ "$noarp" = "0" ]; then
3060   ip ro add default dev $dev metric 30000 scope global
3061 fi
3062 \end{verbatim}
3063
3064 \begin{flushleft}
3065 \# {\bf Step 7} --- Restart router discovery and exit.
3066 \end{flushleft}
3067 \begin{verbatim}
3068 killall -HUP rdisc || rdisc -fs
3069 exit 0
3070 \end{verbatim}
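The address parsing at the top of the script can be tried in isolation.
The function below is an illustrative sketch, not part of the script
(the name \verb|split_prefix| is hypothetical); it uses the same parameter
expansions and the same default prefix length of 24.
\begin{verbatim}
#! /bin/bash
# split_prefix ADDRESS[/PREFIX-LENGTH]
# Prints "ADDRESS LENGTH"; the length defaults to 24 as in the script.
split_prefix () {
  local ipaddr pfxlen=
  ipaddr=${1%/*}                 # strip the shortest "/..." suffix
  if [ "$1" != "$ipaddr" ]; then
    pfxlen=${1#*/}               # strip everything up to the first "/"
  fi
  echo "$ipaddr ${pfxlen:-24}"
}
split_prefix 193.233.7.90/26     # prints "193.233.7.90 26"
split_prefix 193.233.7.90        # prints "193.233.7.90 24"
\end{verbatim}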
3071
3072
3073 \section{Example: {\protect\tt ifcfg} --- interface address management}
3074 \label{EXAMPLE-IFCFG}
3075
3076 This is a simplistic script replacing one option of \verb|ifconfig|,
3077 namely, IP address management. It not only adds
3078 addresses, but also carries out Duplicate Address Detection~\cite{RFC-DHCP},
3079 sends unsolicited ARP to update the caches of other hosts sharing
3080 the interface, adds some control routes and restarts Router Discovery
3081 when it is necessary.
3082
3083 I strongly recommend using it {\em instead\/} of \verb|ifconfig| both
3084 on hosts and on routers.
3085
3086 \begin{verbatim}
3087 #! /bin/bash
3088 \end{verbatim}
3089 \begin{flushleft}
3090 \# {\bf Usage: \verb?ifcfg DEVICE[:ALIAS] [add|del] ADDRESS[/LENGTH] [PEER]?}\\
3091 \# {\bf Parameters:}\\
3092 \# ---Device name. It may have alias suffix, separated by colon.\\
3093 \# ---Command: add, delete or stop.\\
3094 \# ---IP address, optionally followed by prefix length.\\
\# ---Optional peer address for point-to-point interfaces.\\
3096 \# F.e. \verb|ifcfg eth0 193.233.7.90/24|
3097
\noindent\# This function determines whether this machine is a router or a host.\\
\# It returns 0 if the machine is apparently not a router.
3100 \end{flushleft}
3101 \begin{verbatim}
CheckForwarding () {
  local sbase fwd dir
  sbase=/proc/sys/net/ipv4/conf
  fwd=0
  if [ -d $sbase ]; then
    # Sum the forwarding flags of all interfaces.
    for dir in $sbase/*/forwarding; do
      fwd=$(( fwd + $(cat $dir) ))
    done
  else
    fwd=2
  fi
  return $fwd
}
3115 \end{verbatim}
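
\noindent An aside, not part of the script: the return convention of
\verb|CheckForwarding| can be tried without touching \verb|/proc|.
The sketch below is only an illustration. The function \verb|SumForwarding|
and its directory argument are hypothetical, introduced here so that the
same summing logic can run against a scratch directory.
\begin{verbatim}
#! /bin/bash
# Hypothetical variant of CheckForwarding: the base directory is
# passed as $1 instead of being fixed to /proc/sys/net/ipv4/conf.
SumForwarding () {
  local sbase=$1 fwd=0 f
  if [ -d "$sbase" ]; then
    for f in "$sbase"/*/forwarding; do
      [ -r "$f" ] && fwd=$((fwd + $(cat "$f")))
    done
  else
    fwd=2
  fi
  return $fwd
}

# Scratch tree with two interfaces, forwarding disabled on both.
tmp=$(mktemp -d)
mkdir -p "$tmp/eth0" "$tmp/lo"
echo 0 > "$tmp/eth0/forwarding"
echo 0 > "$tmp/lo/forwarding"
SumForwarding "$tmp"; host_rc=$?
# Enable forwarding on one interface: the sum becomes non-zero.
echo 1 > "$tmp/eth0/forwarding"
SumForwarding "$tmp"; router_rc=$?
# Missing directory: the function falls back to 2.
SumForwarding "$tmp/missing"; nodir_rc=$?
rm -rf "$tmp"
echo "host=$host_rc router=$router_rc nodir=$nodir_rc"
\end{verbatim}
With the files created above it prints \verb|host=0 router=1 nodir=2|.
The caller must read \verb|$?| immediately after the call, as the main
part of the script does.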
3116 \begin{flushleft}
3117 \# This function restarts Router Discovery.
3118 \end{flushleft}
3119 \begin{verbatim}
3120 RestartRDISC () {
3121   killall -HUP rdisc || rdisc -fs
3122 }
3123 \end{verbatim}
3124 \begin{flushleft}
3125 \# Calculate the ABC ``natural'' mask length.\\
3126 \# Arg: \$1 = dotted-quad address
3127 \end{flushleft}
3128 \begin{verbatim}
3129 ABCMaskLen () {
3130   local class;
3131   class=${1%%.*}
3132   if [ $class -eq 0 -o $class -ge 224 ]; then return 0
3133   elif [ $class -ge 192 ]; then return 24
3134   elif [ $class -ge 128 ]; then return 16
3135   else  return 8 ; fi
3136 }
3137 \end{verbatim}
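
\noindent An aside, not part of the script: \verb|ABCMaskLen| hands its
result back as the exit status, so it has to be collected from \verb|$?|
right after the call. A minimal sketch of that calling convention follows.
The wrapper \verb|natural_prefix| is a hypothetical helper used only for
this illustration.
\begin{verbatim}
#! /bin/bash
ABCMaskLen () {
  local class
  class=${1%%.*}          # first octet of the dotted-quad address
  if [ "$class" -eq 0 -o "$class" -ge 224 ]; then return 0
  elif [ "$class" -ge 192 ]; then return 24
  elif [ "$class" -ge 128 ]; then return 16
  else return 8; fi
}

# Hypothetical helper: print ADDRESS/LENGTH using the classful guess.
natural_prefix () {
  ABCMaskLen "$1"
  echo "$1/$?"            # $? still holds the status of ABCMaskLen
}

natural_prefix 10.1.2.3        # prints 10.1.2.3/8
natural_prefix 172.16.0.1      # prints 172.16.0.1/16
natural_prefix 193.233.7.90    # prints 193.233.7.90/24
natural_prefix 224.0.0.1       # prints 224.0.0.1/0
\end{verbatim}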
3138
3139
3140 \begin{flushleft}
3141 \# {\bf MAIN()}\\
3142 \#\\
3143 \# Strip alias suffix separated by colon.
3144 \end{flushleft}
3145 \begin{verbatim}
3146 label="label $1"
3147 ldev=$1
3148 dev=${1%:*}
3149 if [ "$dev" = "" -o "$1" = "help" ]; then
3150   echo "Usage: ifcfg DEV [[add|del] [ADDR[/LEN]] [PEER] | stop]" 1>&2
3151   echo "       add - add new address" 1>&2
3152   echo "       del - delete address" 1>&2
3153   echo "       stop - completely disable IP" 1>&2
3154   exit 1
3155 fi
3156 shift
3157
3158 CheckForwarding
3159 fwd=$?
3160 \end{verbatim}
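
\noindent An aside, not part of the script: the alias suffix is removed
with the shell expansion \verb|${1%:*}|, which deletes the shortest trailing
match of a colon followed by anything. The sketch below shows the effect.
The function name \verb|StripAlias| is hypothetical.
\begin{verbatim}
#! /bin/bash
# Hypothetical helper: print the device name without its alias suffix.
StripAlias () {
  echo "${1%:*}"
}

for ldev in eth0 eth0:1 ppp0:backup; do
  dev=$(StripAlias "$ldev")
  # The script treats the argument as an alias when the two differ.
  if [ "$ldev" != "$dev" ]; then kind=alias; else kind=device; fi
  echo "$ldev -> $dev ($kind)"
done
\end{verbatim}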
3161 \begin{flushleft}
3162 \# Parse command. If it is ``stop'', flush and exit.
3163 \end{flushleft}
3164 \begin{verbatim}
3165 deleting=0
3166 case "$1" in
3167 add) shift ;;
3168 stop)
3169   if [ "$ldev" != "$dev" ]; then
3170     echo "Cannot stop alias $ldev" 1>&2
3171     exit 1;
3172   fi
3173   ip -4 addr flush dev $dev $label || exit 1
3174   if [ $fwd -eq 0 ]; then RestartRDISC; fi
3175   exit 0 ;;
3176 del*)
3177   deleting=1; shift ;;
3178 *)
3179 esac
3180 \end{verbatim}
3181 \begin{flushleft}
3182 \# Parse the prefix and split off the prefix length, separated by a slash.
3183 \end{flushleft}
3184 \begin{verbatim}
3185 ipaddr=
3186 pfxlen=
3187 if [ "$1" != "" ]; then
3188   ipaddr=${1%/*}
3189   if [ "$1" != "$ipaddr" ]; then
3190     pfxlen=${1#*/}
3191   fi
3192   if [ "$ipaddr" = "" ]; then
3193     echo "$1 is bad IP address." 1>&2
3194     exit 1
3195   fi
3196 fi
3197 shift
3198 \end{verbatim}
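
\noindent An aside, not part of the script: the split relies on two shell
expansions, \verb|${1%/*}| for the address and \verb|${1#*/}| for the length.
The sketch below wraps them in a hypothetical function \verb|SplitPrefix|
to show what each form yields. The \verb|-| placeholder for a missing
length is an assumption of this example only.
\begin{verbatim}
#! /bin/bash
# Hypothetical helper: print "ADDRESS LENGTH" for its argument.
# LENGTH is shown as "-" when no slash was given.
SplitPrefix () {
  local ipaddr pfxlen=
  ipaddr=${1%/*}          # drop the shortest trailing "/..." match
  if [ "$1" != "$ipaddr" ]; then
    pfxlen=${1#*/}        # drop the shortest leading ".../" match
  fi
  echo "$ipaddr ${pfxlen:--}"
}

SplitPrefix 193.233.7.90/24   # prints: 193.233.7.90 24
SplitPrefix 193.233.7.90      # prints: 193.233.7.90 -
\end{verbatim}
An argument such as \verb|/24| leaves the address part empty, which is the
case the script rejects as a bad IP address.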
3199 \begin{flushleft}
3200 \# If peer address is present, prefix length is 32.\\
3201 \# Otherwise, if prefix length was not given, guess it.
3202 \end{flushleft}
3203 \begin{verbatim}
3204 peer=$1
3205 if [ "$peer" != "" ]; then
3206   if [ "$pfxlen" != "" -a "$pfxlen" != "32" ]; then
3207     echo "Peer address with non-trivial netmask." 1>&2
3208     exit 1
3209   fi
3210   pfx="$ipaddr peer $peer"
3211 else
3212   if [ "$pfxlen" = "" ]; then
3213     ABCMaskLen $ipaddr
3214     pfxlen=$?
3215   fi
3216   pfx="$ipaddr/$pfxlen"
3217 fi
3218 if [ "$ldev" = "$dev" -a "$ipaddr" != "" ]; then
3219   label=
3220 fi
3221 \end{verbatim}
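
\noindent An aside, not part of the script: the rule above can be read as
a small function from (address, length, peer) to the prefix argument passed
to \verb|ip addr|. The sketch below is an illustration under assumptions:
\verb|BuildPrefix| is a hypothetical name, and a fixed default of 24 stands
in for the call to \verb|ABCMaskLen|.
\begin{verbatim}
#! /bin/bash
# Hypothetical helper mirroring the logic above.
# Args: $1 = address, $2 = prefix length or "", $3 = peer or "".
BuildPrefix () {
  local ipaddr=$1 pfxlen=$2 peer=$3
  if [ "$peer" != "" ]; then
    # With a peer, only an empty length or /32 is accepted.
    if [ "$pfxlen" != "" -a "$pfxlen" != "32" ]; then
      echo "Peer address with non-trivial netmask." 1>&2
      return 1
    fi
    echo "$ipaddr peer $peer"
  else
    # Assumption: 24 replaces the ABCMaskLen guess in this sketch.
    echo "$ipaddr/${pfxlen:-24}"
  fi
}

BuildPrefix 10.0.0.1 "" 10.0.0.2     # prints: 10.0.0.1 peer 10.0.0.2
BuildPrefix 193.233.7.90 26 ""       # prints: 193.233.7.90/26
BuildPrefix 10.0.0.1 24 10.0.0.2 || echo "rejected"
\end{verbatim}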
3222 \begin{flushleft}
3223 \# If deletion was requested, delete the address, restart RDISC on a host, and exit.
3224 \end{flushleft}
3225 \begin{verbatim}
3226 if [ $deleting -ne 0 ]; then
3227   ip addr del $pfx dev $dev $label || exit 1
3228   if [ $fwd -eq 0 ]; then RestartRDISC; fi
3229   exit 0
3230 fi
3231 \end{verbatim}
3232 \begin{flushleft}
3233 \# Start interface initialization.\\
3234 \#\\
3235 \# {\bf Step 0} --- enable device \verb|$dev|
3236 \end{flushleft}
3237 \begin{verbatim}
3238 if ! ip link set up dev $dev ; then
3239   echo "Error: cannot enable interface $dev." 1>&2
3240   exit 1
3241 fi
3242 if [ "$ipaddr" = "" ]; then exit 0; fi
3243 \end{verbatim}
3244 \begin{flushleft}
3245 \# {\bf Step 1} --- IP Duplicate Address Detection~\cite{RFC-DHCP}.\\
3246 \# Send two probes and wait up to 3 seconds for a result.\\
3247 \# If the interface comes up slowly, e.g.\ due to long media detection,\\
3248 \# you may want to increase the timeout.
3249 \end{flushleft}
3250 \begin{verbatim}
3251 if ! arping -q -c 2 -w 3 -D -I $dev $ipaddr ; then
3252   echo "Error: some host already uses address $ipaddr on $dev." 1>&2
3253   exit 1
3254 fi
3255 \end{verbatim}
3256 \begin{flushleft}
3257 \# OK, the address is unique. We may add it to the interface.\\
3258 \#\\
3259 \# {\bf Step 2} --- Configure the address on the interface.
3260 \end{flushleft}
3261 \begin{verbatim}
3262 if ! ip address add $pfx brd + dev $dev $label; then
3263   echo "Error: failed to add $pfx on $dev." 1>&2
3264   exit 1
3265 fi
3266 \end{verbatim}
3267 \noindent\# {\bf Step 3} --- Announce our presence on the link
3268 \begin{verbatim}
3269 arping -q -A -c 1 -I $dev $ipaddr
3270 noarp=$?
3271 ( sleep 2 ;
3272   arping -q -U -c 1 -I $dev $ipaddr ) >& /dev/null </dev/null &
3273 \end{verbatim}
3274 \begin{flushleft}
3275 \# {\bf Step 4} (optional) --- Add some control routes.\\
3276 \#\\
3277 \# 1. Prohibit link local multicast addresses.\\
3278 \# 2. Prohibit link local (alias, limited) broadcast.\\
3279 \# 3. Add default multicast route.
3280 \end{flushleft}
3281 \begin{verbatim}
3282 ip route add unreachable 224.0.0.0/24 >& /dev/null
3283 ip route add unreachable 255.255.255.255 >& /dev/null
3284 if [ `ip link ls $dev | grep -c MULTICAST` -ge 1 ]; then
3285   ip route add 224.0.0.0/4 dev $dev scope global >& /dev/null
3286 fi
3287 \end{verbatim}
3288 \begin{flushleft}
3289 \# {\bf Step 5} --- Add fallback default route with huge metric.\\
3290 \# If a proxy ARP server is present on the interface, we will be\\
3291 \# able to talk to all the Internet without further configuration.\\
3292 \# Do not make this step on router or if the device is not ARPable.\\
3293 \# because dead nexthop detection does not work on them.
3294 \end{flushleft}
3295 \begin{verbatim}
3296 if [ $fwd -eq 0 ]; then
3297   if [ $noarp -eq 0 ]; then
3298     ip ro append default dev $dev metric 30000 scope global
3299   elif [ "$peer" != "" ]; then
3300     if ping -q -c 2 -w 4 $peer ; then
3301       ip ro append default via $peer dev $dev metric 30001
3302     fi
3303   fi
3304   RestartRDISC
3305 fi
3306
3307 exit 0
3308 \end{verbatim}
3309 \begin{flushleft}
3310 \# End of {\bf MAIN()}
3311 \end{flushleft}
3312
3313
3314 \end{document}