git.proxmox.com Git - mirror

ptm-integration.patch

Integrates Prescriptive Topology Module(ptm) into quagga.

If this module is enabled, link ups are notified only after the link is verified
as being connected to the neighbor specified. The neighbor specification and
checking is done by the ptm daemon.

iquagga-faster-compile.patch

Avoid cleaning up the source tree and running reconf every time. Allows
recompilation of only those files that have been modified since last
run. Relies on the existence of config.status file to decide if we've
run the first time or subsequent times.

Do not allow a program outside Quagga to delete a Quagga route from the kernel.
To delete a Quagga route, do it inside Quagga.

bgpd-maxmed-administrative-onstartup.patch

COMMAND:

Possible forms of the command configuration:

[no] bgp max-med administrative
[no] bgp max-med administrative <max-med-value>
[no] bgp max-med on-startup <period>
[no] bgp max-med on-startup <period> <max-med-value>

DESCRIPTION:

'administrative' takes effect from the time of the config until the config is
removed.

'on-startup' is effective only at the startup time for the given '<period>'
after the first peer is established.

'<max-med-value>' is used as the MED value to be sent out when the max-med
is effective. Default max-med value is 4294967294.

NOTE:
When max-med is active, MED is changed only in the outgoing attributes to the
peers, it doesn't modify any MED specific state of the attributes in BGP on
the local node.

Signed-off-by: Vipin Kumar <vipin@cumulusnetworks.com>
Reviewed-by: Daniel Walton <dwalton@cumulusnetworks.com>

Reduce the amount of time it takes to bring up a large number of peers

Signed-off-by: Daniel Walton <dwalton@cumulusnetworks.com>

bgpd-scale-update-delay-packing.patch

ISSUE:
During startup, BGP update prefix packing wasnt optimal and route installation
was found to be spread over.

SOLUTION:
With this patch, update-delay post processing is serialized to achieve:
a. better peer update packing
    (which helps in reducing total number of BGP update packets)
b. installation of the resulting routes in zebra as close to each others
    as possible.
    (which can help zebra batch its processing and updates to Kernel better)

bgpd: bgpd-ibgp-policy-out-allow-mods.patch

BGPd: Allow route-map policy modifications to also affect route reflectors.

By default, attribute modification via route-map policy out is ignored on
reflected routes. This patch provides an option to allow this modification
to occur. Once enabled, it affects all reflected routes.

Signed-off-by: Dinesh G Dutt <ddutt@cumulusnetworks.com>

bgpd: bgpd-routemap-match-localpref.patch

BGPd: Support matching on local preference in route-map

This patch adds support for matching on local preference in
BGP route-map.

'neighbor <if-name> interface' config support in BGP including RA/Zebra changes.

Signed-off-by: Vipin Kumar <vipin@cumulusnetworks.com>
Reviewed-by: Pradosh Mohapatra <pmohapat@cumulusnetworks.com>
Dinesh Dutt <ddutt@cumulusnetworks.com>

Save the last message from a peer that caused us to send a NOTIFICATION

Signed-off-by: Daniel Walton <dwalton@cumulusnetworks.com>

BGP doesn't count a route with an unreachable nexthop in PfxRcd

Signed-off-by: Daniel Walton <dwalton@cumulusnetworks.com>

bgpd: bgpd-fsm-fix.patch

BGP: Fix FSM to handle active/passive connections better

The existing code didn't work well when dual connections resulted between
peers during session bringup. This patch fixes that.

Signed-off-by: Dinesh G Dutt <ddutt@cumulusnetworks.com>

bgpd: bgpd-mrai.patch

BGP: Event-driven route announcement taking into account min route advertisement interval

ISSUE

BGP starts the routeadv timer (peer->t_routeadv) to expire in 1 sec
when a peer is established. From then on, the timer expires
periodically based on the configured MRAI value (default: 30sec for
EBGP, 5sec for IBGP).  At the expiry, the write thread is triggered
that takes the routes from peer's sync FIFO (adj-rib-out) and sends
UPDATEs. This has a few drawbacks:

(1) Delay in new route announcement: Even when the last UPDATE message
    was sent a while back, the next route change will necessarily have
    to wait for routeadv expiry
(2) CPU usage: The timer is always armed. If the operator chooses to
    configure a lower value of MRAI (zero second is a preferred choice
    in many deployments) for better convergence, it leads to high CPU
    usage for BGP process, even at the times of no network churn.

PATCH

Make the route advertisement event-driven - When routes are added to
peer's sync FIFO, check if the routeadv timer needs to be adjusted (or
started). Conversely, do not arm the routeadv timer unconditionally.

The patch also addresses route announcements during read-only mode
(update-delay).  During read-only mode operation, the routeadv timer
is not started. When BGP comes out of read-only mode and all the
routes are processed, the timer is started for all peers with zero
expiry, so that the UPDATEs can be sent all at once. This leads to
(near-)optimal UPDATE packing.

Finally, the patch makes the "max # packets to write to peer socket at
a time" configurable. Currently it is hard-coded to 10. The command is
at the top router-bgp mode and is called "write-quanta <number>". It
is a useful convergence parameter to tweak.

Signed-off-by: Pradosh Mohapatra <pmohapat@cumulusnetworks.com>
Reviewed-by: Daniel Walton <dwalton@cumulusnetworks.com>

bgpd: bgpd-peer-outq.patch

BGP: Show more meaningful outq value in 'show ip bgp summary' output.

'outq' field in 'show ip bgp sum' displays the number of formatted packets
to a peer. Since the route announcement follows an input-buffered pattern
(i.e. adj-rib-out is a separate queue of routes per peer and packets are
formatted from the routes at the time of TCP write), the outq field doesn't
show any interesting data worth watching.

The patch is to display the adj-rib-out queue depth instead.

signed-off-by: pmohapat@cumulusnetworks.com
reviewed-by: dwalton@cumulusnetworks.com

bgpd: bgpd-fix-ipv6-afi-parser-node.patch

BGPd: Make ipv6 unicast/multicast address-family work

In the absence of this patch, attempting to type "address-family ipv6 unicast"
would result in an "Ambiguous command" error and in the case of
"address-family ipv6 multicast", the command would silently fail, without the
prompt dropping into the address-family mode.

The cause is how the parse tree is constructed for ipv6 address family. There
was an error in extract.pl.in script and in vtysh.c files which assumed that
there was only address family ipv6 command, without unicast or multicast and
so the command was failing.

Add a null check in bgp_address_del() function when connected
addresses are removed.

The peer-groups parser is missing advertisement-interval and 'timers connect'

Signed-off-by: Daniel Walton <dwalton@cumulusnetworks.com>

zebra: zebra-use-fixed-metric-cost.patch

Zebra: Use a fixed route metric when populating kernel

The route metric is not used by the Linux kernel and is irrelevant to
the forwarding decision made by the kernel. Metric is a parameter used
only by a routing protocol to compute best path(s) and to communicate this
info to its peers. Consequently, there is no value in pushing the metric
provided by a protocol daemon to the kernel.

There is a significant advantage, at least on the Linux kernel, in pushing
a constant metric with a route populated by zebra. The metric is used as a
priority field in the kernel and modifying the metric due to say topology
changes causes multiple routes to be inserted into the kernel, with differing
priorities instead of replacing the existing one. This prevents us from
using replace semantic when a route changes.

So, this patch pushes a constant metric with a route populated by zebra.

bgpd: bgpd-table-map.patch

COMMAND:

table-map <route-map-name>

DESCRIPTION:

This feature is used to apply a route-map on route updates from BGP to Zebra.
All the applicable match operations are allowed, such as match on prefix,
next-hop, communities, etc. Set operations for this attach-point are limited
to metric and next-hop only. Any operation of this feature does not affect
BGPs internal RIB.

Supported for ipv4 and ipv6 address families. It works on multi-paths as well,
however, metric setting is based on the best-path only.

IMPLEMENTATION NOTES:

The route-map application at this point is not supposed to modify any of BGP
route's attributes (anything in bgp_info for that matter). To achieve that,
creating a copy of the bgp_attr was inevitable. Implementation tries to keep
the memory footprint low, code comments do point out the rationale behind a
few choices made.

bgp_zebra_announce() was already a big routine, adding this feature would
extend it further. Patch has created a few smaller routines/macros whereever
possible to keep the size of the routine in check without compromising on the
readability of the code/flow inside this routine.

For updating a partially filtered route (with its nexthops), BGP to Zebra
replacement semantic of the next-hops serves the purpose well. However, with
this patch there could be some redundant withdraws each time BGP announces a
route thats (all the nexthops) gets denied by the route-map application.
Handling of this case could be optimized by keeping state with the prefix and
the nexthops in BGP. The patch doesn't optimizing that case, as even with the
redundant withdraws the total number of updates to zebra are still be capped
by the total number of routes in the table.

Signed-off-by: Vipin Kumar <vipin@cumulusnetworks.com>
Reviewed-by: Pradosh Mohapatra <pmohapat@cumulusnetworks.com>

Patch to produce output of BGP commands in csv format. Useful for easier scripting.

nexthop-tracking.patch

quagga: nexthop-tracking.patch

Add next hop tracking support to Quagga. Complete documentation in doc/next-hop-tracking.txt.

Signed-off-by: Pradosh Mohapatra <pmohapat@cumulusnetworks.com>
Signed-off-by: Daniel Walton <dwalton@cumulusnetworks.com>
Signed-off-by: Dinesh Dutt <ddutt@cumulusnetworks.com>

bgpd: bgpd-update-delay.patch

COMMAND:

'update-delay <max-delay in seconds> [<establish-wait in seconds>]'

DESCRIPTION:

This feature is used to enable read-only mode on BGP process restart or when
BGP process is cleared using 'clear ip bgp *'. When applicable, read-only mode
would begin as soon as the first peer reaches Established state and a timer
for <max-delay> seconds is started.

During this mode BGP doesn't run any best-path or generate any updates to its
peers. This mode continues until:

1. All the configured peers, except the shutdown peers, have sent explicit EOR
(End-Of-RIB) or an implicit-EOR. The first keep-alive after BGP has reached
Established is considered an implicit-EOR.
   If the <establish-wait> optional value is given, then BGP will wait for
   peers to reach establish from the begining of the update-delay till the
   establish-wait period is over, i.e. the minimum set of established peers for
   which EOR is expected would be peers established during the establish-wait
   window, not necessarily all the configured neighbors.
2. max-delay period is over.

On hitting any of the above two conditions, BGP resumes the decision process
and generates updates to its peers.

Default <max-delay> is 0, i.e. the feature is off by default.

This feature can be useful in reducing CPU/network used as BGP restarts/clears.
Particularly useful in the topologies where BGP learns a prefix from many peers.
Intermediate bestpaths are possible for the same prefix as peers get established
and start receiving updates at different times. This feature should offer a
value-add if the network has a high number of such prefixes.

IMPLEMENTATION OBJECTIVES:

Given this is an optional feature, minimized the code-churn. Used existing
constructs wherever possible (existing queue-plug/unplug were used to achieve
delay and resume of best-paths/update-generation). As a result, no new
data-structure(s) had to be defined and allocated. When the feature is disabled,
the new node is not exercised for the most part.

Signed-off-by: Vipin Kumar <vipin@cumulusnetworks.com>
Reviewed-by: Pradosh Mohapatra <pmohapat@cumulusnetworks.com>
             Dinesh Dutt <ddutt@cumulusnetworks.com>

bgpd: bgpd-restart-bit-fix.patch

ISSUE:

Quagga BGP doesn't send or use the restart-bit via the Graceful-Restart(GR)
capability. GR capability implementation isn't complete as per the RFC.

PATCH:

Patch uses BGP instance creation as the beginning of the startup period,
and 'restart_time' is taken as the startup period. As a result, BGP will
set the restart bit in the GR capability of the OPEN messages during the
startup period.

As an indication of quagga implementation's capability of sending End-Of-RIB,
helping a restarting neighbor, quagga BGP will now send global GR capability
irrespective of the graceful-restart config in BGP and the address-family
specific GR capability will be sent only if the GR config is present.
Forwarding bit is not set assuming its not preserved.

Signed-off-by: Vipin Kumar <vipin@cumulusnetworks.com>
Reviewed-by: Pradosh Mohapatra <pmohapat@cumulusnetworks.com>

Remove incorrect SNMP error when installing debian package

ospfd: ospfv2-fix-interface-mode-cmd.patch

SYMPTOM:

Interface mode OSPF area configuration is not retained after restarting quagga.

Example -

quagga(config)# interface swp49
quagga(config-if)# ip ospf area 0.0.0.0

quagga# sh run
<snip>
interface swp49
ip ospf area 0.0.0.0
ipv6 nd suppress-ra
link-detect
!

quagga# write memory

* Restart quagga at this point*

quagga# sh run
<snip>
interface swp49
ipv6 nd suppress-ra
link-detect
!

ISSUE:

The issue is that the interface mode commands can reach the OSPF process even
before 'router ospf' command that initializes the default OSPF instance, this
is not getting handled properly in OSPF process.

FIX:

Initialize the default OSPF instance during OSPF process initializations, which
is before 'router ospf' command is received in OSPF process. So, when interface
mode command is received, it is guaranteed to have ospf instance to work with.

Other way could be to call ospf_get() instead of ospf_lookup() while processing
the config command callbacks, although OSPF needs to have at least one instance
structure anyways, therefore calling it unconditionally in OSPF initializations
should be fine too.

There could be more elaborate fix(es) possible to handle this, like adding some
ordering mechanism for commands as they are read by a process, or storing the
received command and applying it after the commands its dependent upon are
processed. For the issue at hand, initializing the default instance in main()
serves the purpose well.

Signed-off-by: Vipin Kumar <vipin@cumulusnetworks.com>
Reviewed-by: Dinesh Dutt <ddutt@cumulusnetworks.com>

cluster-id length equality for multipath

A fat tree topology running IBGP gets into two issues with anycast address
routing. Consider the following topology:

        R9   R10
          x x
  R3   R4     R7   R8
     x           x
  R1   R2     R5   R6
  |    |      |    |
10/8 10/8  10/8   S

Let's remind ourselves of BGP decision process steps:

1. Highest Local Preference
2. Shortest AS Path Length
3. Lowest Origin Type
4. Lowest MED (Multi-Exit Discriminator)
5. Prefer External to Internal
6. Closest Egress (Lowest IGP Distance)
7. Tie Breaking (Lowest-Router-ID)
8. Tie Breaking (Lowest-cluster-list length)
9. Tie Breaking (Lowest-neighbor-address)

Without any policies, steps 1-6 will almost always evaluate identically for
all paths received on any router in the above topology. Let's assume that
the router-ids follow the following inequality: R1 < R2 < R5 < R6. Owing to
the 7th step above, all routers will now choose R1's path as the best. This
is undesirable. As an example, traffic from S to 10/8 will follow the path
S -> R6 -> R7 -> R9 -> R4 -> R2 -> 10/8 instead of S -> R6 -> R7 -> R5 -> 10/8.
Furthermore, once R7 (& R8) chooses R1's path as the best, it would withdraw
its path learned through (R5, R6) from (R9, R10). This leads to inefficient
load balancing - e.g. R9 can't do ECMP across all available egresses -
(R1, R2, R5).

The patch addresses these issues by noting that that cluster list is always
carried along with the routes and its length is a good indicator of IBGP
hops. It thus makes sense to compare that as an extension to metric after
step 6. That automatically ensures correct multipath computation.

Unfortunately a partial deployment of this in a generic topology (note:
fat-tree/clos topologies work fine) may lead to potential loops. It needs
to be looked into.

Signed-off-by: Pradosh Mohapatra <pmohapat@cumulusnetworks.com>
Reviewed-by: Dinesh G Dutt <ddutt@cumulusnetworks.com>

Add set ipv6 next-hop peer-address command.

IPv4 has the ability to specify the peer address with the keyword peer-address.
IPv6 mandates the use of a specific global or local address only in setting the
next-hop in routemaps. This makes it cumbersome to configure some large networks
with BGP and IPv6. This patch fixes that deficiency.

Signed-off-by: Dinesh G Dutt <ddutt@cumulusnetworks.com>

IPv6 multipath is broken in BGP if nexthop contains only global address.
IPv6 always uses both nextop IPv6 address and ifIndex in sending routes down to
zebra. In cases where only the global IPv6 address is present in the nexthop
information, the existing code doesn't set the ifIndex. An example of such a
case is when a route-map isused with "set ipv6 next-hop" and only global
address is specified. This code causes the ifIndex to be determined and
set thereby fixing the multipath programming.

Signed-off-by: Dinesh G Dutt <ddutt@cumulusnetworks.com>
Reviewed-by: Shrijeet Mukherjee <shm@cumulusnetworks.com>

Enable support for BGP IPV6 multipath.

Signed-off-by: Ayan Banerjee <ayan@cumulusnetworks.com>
Signed-off-by: Dinesh G Dutt <ddutt@cumulusnetworks.com>
Reviewed-by: Scott Feldman <sfeldma@cumulusnetworks.com>

When an LSA is flushed we need to update the timestamps for them. This
allows for the node to give the neighbor sufficient time to send back
an acknowledgement before retransmission kicks in.

Signed-off-by: Dinesh G Dutt <ddutt@cumulusnetworks.com>
Reviewed-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Reviewed-by: James Li <jli@cumulusnetworks.com>

Section 16.0 of rfc2328 (OSPF) specifies that the short-path
calculation to a node should be constructed with the sum of all path
costs (metrics) to the node (pretty simple huh).  There is a usage of
metric typified by the "max-metric router-lsa" command in many
networking stacks that allows a router to gracefully "remove" itself
from a topology by advertising the maximum value of metric in it's
router LSAs (16 bits of "1").  In this case, the router will continue
to forward any traffic sent to it while these "max-metric" LSAs are
propagated through the network; at which point, the router can be
taken out of service.

The correct handling of this in ospfd would use this metric as part of
the calculation, disuading other routers from using it for transit
traffic (assuming a better path exits).  Unfortunately, the ospfd
behavior is to remove these links from the SPF calculation.  This
patch changes the behavior to omit this exception handling.

Signed-off-by: JR Rivers <jrrivers@cumulusnetworks.com>

enable autoreconf so that Makefile.in is regenerated in the cumulus build.
This is necessary for the added .c files and modified Makefile.am files
in our patches.

This patch enables support for multipath for IPV6. The nexthop information
from the protocols have ifindices and nexthop addresses in two different
structures. This patch combines them to ensure that the correct APIs can
be called. Also, given that IPV6 Linux implementation does not support the
rta_XXX APIs for multipath, the communication with the kernel is in terms
of a single nh/ifindex pair.

Modified the number of writes to service at most
20 interfaces.

Signed-off-by: Ayan Banerjee <ayan@cumulusnetworks.com>
Reviewed-by: Dinesh G Dutt <ddutt@cumulusnetworks.com>

In the "write" direction added a command to ensure that Quagga
is able to send out K (=3 by default) packets per thread-write.

Signed-off-by: Ayan Banerjee <ayan@cumulusnetworks.com>
Reviewed-by: JR Rivers <jrrivers@cumulusnetworks.com>

Set the RCVBUF and SNDBUF sizes to the maximum possible amount to
handle traffic under duress.

Signed-off-by: Dinesh G Dutt <ddutt@cumulusnetworks.com>
Reviewed-by: Ayan Banerjee <ayan@cumulusnetworks.com>

Ensuring that all interfaces are served in a round robin fashion during
write.

Signed-off-by: Ayan Banerjee <ayan@cumulusnetworks.com>
Reviewed-by: Dinesh G Dutt <ddutt@cumulusnetworks.com>
Reviewed-by: Scott Feldman <sfeldma@cumulusnetworks.com>

Add a null check to ensure that another thread
does not access the address in the deletion path
of the interface.

Signed-off-by: Ayan Banerjee <ayan@cumulusnetworks.com>
Reviewed-by: Dinesh G Dutt <ddutt@cumulusnetworks.com>

Send ARP requests proactively during OSPF Adjacency formation.

Signed-off-by: Ayan Banerjee <ayan@cumulusnetworks.com>
Reviewed-by: JR Rivers <jrrivers@cumulusnetworks.com>
Reviewed-by: Shrijeet Mukherjee <shm@cumulusnetworks.com>

Added show command to display only the routes in the RIB that
does not count ECMPs

Signed-off-by: Ayan Banerjee <ayan@cumulusnetworks.com>
Reviewed-by: JR Rivers <jrrivers@cumulusnetworks.com>

Bump default netlink socket buffer size to 4M.

Signed-off-by: Nolan Leake <nolan@cumulusnetworks.com>
Reviewed-by: Roopa Prabhu <roopa@cumulusnetworks.com>

zebra-non-default-table-for-static-routes.patch

Make static IPv6 routes respect non-default routing tables.

Signed-off-by: Nolan Leake <nolan@cumulusnetworks.com>
Reviewed-by: Shrijeet Mukherjee <shm@cumulusnetworks.com>

ospfd-spf-stats.patch

Compute and display SPF execution statistics

Detailed SPF statistics, all around time spent executing various pieces of SPF
such as the SPF algorithm itself, installing routes, pruning unreachable networks
etc.

Reason codes for firing up SPF are:
R - Router LSA, N - Network LSA, S - Summary LSA, ABR - ABR status change,
ASBR - ASBR Status Change, AS - ASBR Summary, M - MaxAge

Signed-off-by: Dinesh G Dutt <ddutt@cumulusnetworks.com>
Reviewed-by: JR Rivers <jrrivers@cumulusnetworks.com>
Reviewed-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Reviewed-by: Ayan Banerjee <ayan@cumulusnetworks.com>

zebra-enable-link-detect-by-default.patch

zebra: Set link-detect on by default

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: Dinesh G Dutt <ddutt at cumulusnetworks.com>
Reviewed-by: Scott Feldman <sfeldma at cumulusnetworks.com>

logrotate-add-size.patch

rotate logs by size

conditional-quagga-pdf.patch

The building of quagga.pdf requires the convert program out of the imagemagick
package. Getting this to run correctly in the scratchbox2 environment is
painful. Conditionally generate documentation during native compilation.

Debian Packaging Files

This is the collection of Debian packaging files used to compile
our code

*: merge branch stable/0.99.23

Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

lib: unset ZEBRA_IFA_PEER if no dst addr present (BZ#801)

On OpenBSD, carp interfaces claim to be PtP interfaces with a 0.0.0.0/0
peer address.  We process those in zebra and try to send them to
clients, at which point they get encoded as all-0.  The client code,
however, decodes that to a NULL pointer instead of 0.0.0.0.  This later
turns into a SEGV when CONNECTED_PREFIX sees that ZEBRA_IFA_PEER is set
and tries to access the peer prefix.

This is a band-aid fix for stable/0.99.23, a long-term solution needs
some conceptual improvements on the entire thing.

(The usefulness of a PtP-to-0.0.0.0/0 is a separate question;  at this
point dropping the peer prefix seems the least intrusive solution.)

Reported-by: Laurent Lavaud <laurent.lavaud@ladtech.fr>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

isisd: type mix-up in 28a8cfc "don't require IPv4"

Whoops, these are in6_addrs, not prefix_ipv6... funnily enough, it does the
right thing either way, if it compiles, which it only does on Linux because
IN6_IS_ADDR_LINKLOCAL contains a cast to the right type. On BSD there is no
such cast, hence it explodes on trying to compile, trying to access struct
members of in6_addrs while operating on prefix_ipv6...

Fixes: 28a8cfc ("isisd: don't require IPv4 for adjacency")
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

bgpd: memmove needed in community_del_val

In bgpd/bgp_community_del_val memcpy is used for potentially overlapping
regions which is *not* safe. It may "work" in some cases but is not
guaranteed to work in all cases. The case that I saw fail was on an
x86_64 architecture with the number of bytes being moved/copied equal to
8.

The way the code is written the uint32_t pointers will always differ by
1, which is equivalent to a memcpy/memmove of regions that are 4 bytes
away from one another. So the code failed while copying an 8 byte region
to an address that is 4 bytes lower i.e. overlapping regions.

Interestingly, the same architecture had no problems with a 12 byte
copy.

When the code failed the communities were [200,300,400] and a call was
made to delete the 200 community. The result of this was an array that
looked like [400,400] which was uniquified to [400]. Of course the
expected result should have been [300, 400].

One additional point - in our production environment memmove would not
*link* without including <string.h> but in an isolated quagga git repo
this #include does not seem to be required and I see memmove is used in
vtysh.c without this #include either.

Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

build: do not assume glibc on linux

The whole IPv6 stack detection could need refactoring. But this
fixes the linux check to not assume glibc. Fixes build against
musl c-library.

Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

zebra: fix struct msghdr initializers

struct msghdr field orders are not strictly specified in POSIX.
Improve portability by using designated initializer. This fixes
build against musl c-library where struct msghdr is POSIX
compliant (Linux kernel and glibc definitions are non-conforming).

As the result is also more readable, struct iovec initilizers
were also converted.

Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

*: fix detection and usage of sys/cdefs.h

This header is non-standard (though present on many systems) and
there is no standard for what it should or should not define.

Remove it where it is not really needed. But add also a configure
check, so it can be used if available but otherwise fallback to
defining the needed macroes.

Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

isisd: don't require IPv4 for adjacency

This was precluding isisd from IPv6-only operation; no adjacency would
come up unless there was IPv4 in parallel.

Reported-by: Martin Winter <mwinter@netdef.org>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

tests/bgpd: don't hardcode error number (fix f57000c)

f57000c ("bgpd: don't send NOTIFY twice for malformed attrs") introduces
BGP_ATTR_PARSE_ERROR_NOTIFYPLS as additional error code that implies the
caller should sent a NOTIFY and convert it to BGP_ATTR_PARSE_ERROR.
Sadly, the latter was hardcoded in bgp_mp_attr_test.c, which now didn't
consider the new value to be an error.

Make the testcase treat all nonzero values as error without discern.

Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

ripd: use only one constant for derivation

RIP_MAX_RTE is defined in ripd.h as 25 but is in fact the
result of a formula. More over it is not used in the code:
the code itself includes the fomula. This makes it un-clear
for maintenance.

Signed-off-by: Feng Lu <lu.feng@6wind.com>
Reviewed-by: Alain Ritoux <alain.ritoux@6wind.com>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

*: merge branch stable/0.99.23

bgp extcommunity fixes from stable branch

Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

bgpd: don't send NOTIFY twice for malformed attrs

Most of the attribute parsing functions were already sending a notify,
let's clean up the code to make it happen only once.

Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

bgpd: fix IP endianness in debug message

inet_ntop expects network byte order.

Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

bgpd: fix memory leak on malformed attribute

When bgp_attr_parse returns BGP_ATTR_PARSE_ERROR, it may already have
parsed and allocated some attributes before hitting that error. Free
the attr's data before returning.

Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

bgpd: fix double free after extcommunity set (BZ#799)

The route-map extcommunity set code was incorrectly assuming that it
owns the intern'd struct ecommunity reference.  In reality, the intern'd
reference belongs to bgp_update_receive() and we're not supposed to
touch it in the route-map code.

Instead, like all the other set commands, we use a on-heap but
non-intern'd ecommunity to set the new value.  This is then either
intern'd in bgp_update_main/_rsclient() through bgp_attr_intern(), or
free'd through bgp_attr_flush().

This fixes Bugzilla #799, which is that bgpd otherwise crashes with a
double free.  The ecommunity got unintern'd first in the route-map set
command, then in bgp_update_receive().

Debugged-by: Milan Kocian <milon@wq.cz>
Reported-by: Florian S <florian@herrenlohe.de>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

bgpd: remove duplicate route-map extcommunity code

route_set_ecommunity_rt and _soo share almost all of their code.
Let's remove one of the redundant copies.

Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

bgpd: fix some bgp_update_main() attribute leaks

bgp_update_main() wasn't doing anything to release attribute values
set from route maps for two of its error paths. To fix, pull up the
appropriate cleanup from further down and apply it here.

bgp_update_rsclient() doesn't have the issue since it immediately
does bgp_attr_intern() on the results from bgp_{export,import}_modifier.

Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

bgpd: merge branch tteras/master~3 (b304dcb)

route-map tidying + next-hop-self all

Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

bgpd: route-map: share aspath object compilation code where possible

Signed-off-by: Timo Teräs <timo.teras@iki.fi>

bgpd: implement "next-hop-self all"

As specified in:
http://www.cisco.com/c/en/us/td/docs/ios-xml/ios/iproute_bgp/command/irg-cr-book/bgp-m1.html#wp4972925610

This allows overriding next-hop for ibgp learned routes on an
RR for reflected routes.

Especially useful for using iBGP in DMVPN setups. See:
http://blog.ipspace.net/2014/04/changes-in-ibgp-next-hop-processing.html

Signed-off-by: Timo Teräs <timo.teras@iki.fi>

bgpd: fix route-map comments

Signed-off-by: Timo Teräs <timo.teras@iki.fi>

release: 0.99.23

doc: update NEWS for 0.99.23 changes

*: nuke ^L (page feed)

Quagga sources have inherited a slew of Page Feed (^L, \xC) characters
from ancient history. Among other things, these break patchwork's
XML-RPC API because \xC is not a valid character in XML documents.

Nuke them from high orbit.

Patches can be adapted simply by:
sed -e 's%^L%%' -i filename.patch
(you can type page feeds in some environments with Ctrl-V Ctrl-L)

Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

build: Quagga 0.99.23-rc1

this is not a full release version, so neither release notes nor
documentation are updated yet. Also, signing the tag with my private
GPG key instead of the Quagga one.

isisd: ignore the unrecognized TLVs

When processing LSPDUs, the unrecognized TLVs/sub-TLVs should be
silently ignored.

In parse_tlvs(), ISIS_WARNING is returned once an unrecognized TLV
exists. It breaks the processing in lsp_authentication_check() and
lsp_update_data(). So remove it.

Signed-off-by: Feng Lu <lu.feng@6wind.com>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

ripd: fix "show ip rip status" documentation

The command was mis-named in the documentation as "show ip protocols".

Signed-off-by: Joachim Nilsson <troglobit@gmail.com>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

ripd & ripngd: avoid the zero interface metric

The interface metric is initialized to 0 in the commit db19c85:
zebra: set metric for directly connected routes via netlink to 0

Ripd and ripngd must be aware of it and avoid increase the
route metric by 0.

Signed-off-by: Feng Lu <lu.feng@6wind.com>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

lib: remove redundant and incorrect sys/fcntl.h include

POSIX defines <fcntl.h>, <sys/fcntl.h> is the same thing. However,
it should not be used as it's existence can depend on C-library
implementation. E.g. musl gives warning if <sys/fcntl.h> is used.

Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

bgpd: fix crash when allowas-in is done on inactive peer

When allowas-in is changed on a peer that is not up, BGP would crash
trying to do route_refresh. If peer is not up, there is no need
to do notification or send.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Feng Lu <lu.feng@6wind.com>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

bgpd: efficient NLRI packing for AFs != ipv4-unicast

ISSUE:

  Currently, for non-ipv4-unicast address families where prefixes are
  encoded in MP_REACH/MP_UNREACH attributes, BGP ends up sending one
  prefix per UPDATE message. This is quite inefficient. The patch
  addresses the issue.

PATCH:

  We introduce a scratch buffer in the peer structure that stores the
  MP_REACH/MP_UNREACH attributes for non-ipv4-unicast families. This
  enables us to encode multiple prefixes. In the end, the two buffers
  are merged to create the UPDATE packet.

Signed-off-by: Pradosh Mohapatra <pmohapat@cumulusnetworks.com>
Reviewed-by: Daniel Walton <dwalton@cumulusnetworks.com>
[DL: removed no longer existing bgp_packet_withdraw prototype]
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

bgpd: don't compare next-hop to router-id

While announcing a path to a peer, the code currently compares the path's
next-hop with the peer's router-id. This can lead to problems as the router
IDs are unique only within an AS. Suppose AS 1 sends route with next-hop
10.1.1.1. It is possible that the speaker has an established BGP peering
with a router in AS 2 with router ID 10.1.1.1. The route will not be
advertised to that peer in AS 2.

The patch removes this check.

Signed-off-by: Pradosh Mohapatra <pmohapat@cumulusnetworks.com>
Reviewed-by: Dinesh G Dutt <ddutt@cumulusnetworks.com>
Reviewed-by: Shrijeet Mukherjee <shm@cumulusnetworks.com>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

vtysh: fix build against readline 6.3

readline 6.3 removes some old deprecated funnily-named types. This
updates vtysh to use the new types so it builds again.

Reported-by: Joel Teichroeb <klusark@archlinux.invalid>
References: https://bugs.archlinux.org/task/39495
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

zebra: raise the privileges before calling socket()

Because of recent changes when creating AF_NETLINK socket, kernel will
cache capabilities of the caller and if file descriptor is used or
otherwise handed to another process it will check that current user has
necessary capabilities to use the socket. Hence we need to ensure we
have necessary capabilities when creating the socket and at the time we
use the socket.

See: http://www.spinics.net/lists/netdev/msg280198.html

Signed-off-by: Michal Sekletar <msekleta@redhat.com>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

bgpd: Fix condition allowas-in in rsclient code

Currently when you set neighbour's 'allowas-in' option on route server side
you get redistribution of the prefixes from this neighbour's table into all
neighbour's tables which have the same AS number. I think that wanted behaviour
is to allow import prefixes from neighbour's tables with the same AS num
into neighbour which has 'allowas-in' option set.

Signed-off-by: Milan Kocian <milon@wq.cz>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

bgpd: support TTL-security with iBGP

Traditionally, ttl-security feature has been associated with EBGP
sessions as those identify directly connected external peers. The
GTSM RFC (rfc 5082) does not make any restrictions on type of
peering. In fact, it is beneficial to support ttl-security for both
EBGP and IBGP sessions. Specifically, in data centers, there are
directly connected IBGP peerings that will benefit from the protection
ttl-security provides.

Signed-off-by: Dinesh G Dutt <ddutt@cumulusnetworks.com>
Reviewed-by: Pradosh Mohapatra <pmohapat@cumulusnetworks.com>
[DL: function refactoring split out into previous 2 patches.  changes:
     - bgp_set_socket_ttl(): ret type int -> void
     - is_ebgp_multihop_configured(): stripped peer == NULL check
     - comments/whitespace]
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

bgpd: factor out eBGP multihop check

The check for an eBGP multihop configuration is unwieldy; factor it out
into a separate function.

[DL: originally by Dinesh G Dutt <ddutt@cumulusnetworks.com>,
split off from the next commit]
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

bgpd: factor out TTL setting

TTL/min TTL are set from both bgp_accept() and bgp_connect(). Factor
them out so the following change to enable iBGP GTSM becomes more
readable.

[DL: originally by Dinesh G Dutt <ddutt@cumulusnetworks.com>,
split off from the next commit]
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

bgpd: fix fast external fallover behavior

ISSUES

1. When an interface goes down, the zclient callbacks are invoked
   in the following order: (a) address_delete() that removes the
   connected address list: ifp->connected, (b) interface_down()
   that performs "fast external fallover" operation. The operation
   relies on ifp->connected to look for peers that should be brought
   down. That's a cyclic dependency.

2. 'ttl-security' configuration handler sets peer->ttl to
   MAXTTL (so that BGP packets are sent with TTL=255, as per the
   requirement of ttl-security). This, however, is incompatible
   with 'fast external fallover' as the fallover operation checks
   for (ttl == 1) to determine directly connected peers.

3. The current fallover operation does not work for IPv6 address family.

PATCH

1. The patch removes the dependency on 'ifp->connected' list for fast
   fallover. The peer already contains a nexthop structure that reflects
   the peering address. The nexthop structure has a pointer to the
   interface (ifp) that peering address resolves to. Everytime the TCP
   connection succeeds, the ifp is updated. The patch uses this ifp in
   the interface_down() callback for a match for the peers that should be
   brought down.

2. The evaluation for directly connected peering is enhanced as
   'peer->ttl == 1' OR 'peer->gtsm_hops == 1'. Thus a ttl-security
   configuration on the peer with one hop is directly connected and
   should be brought down under 'fast external fallover'.

3. Because of fix (1), IPv6 address family works automatically.

Signed-off-by: Pradosh Mohapatra <pmohapat@cumulusnetworks.com>
Reviewed-by: Dinesh G Dutt <ddutt@cumulusnetworks.com>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

bgpd: increase TCP socket buffer size

BGP does not respond fairly in high scale. As the number of BGP peers
and prefixes increase, triggers like interface flaps which lead to BGP
peer flaps, cause blockage in bgp_write.

BGP does handle the cases of TCP socket buffer full by queuing a write
event back, there is no functional issue there as such. Still,
increasing the peer socket buffer size should help reduce event queueing
in BGP.

Signed-off-by: Vipin Kumar <vipin@cumulusnetworks.com>
Reviewed-by: Pradosh Mohapatra <pmohapat@cumulusnetworks.com>
Reviewed-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
[DL: patch split, this is item 3.]
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

bgpd: fix O_NONBLOCK on outgoing connects

BGP was setting sockets to be non-blocking only for the accepted passive
peers. As a fix, setting the BGP sockets to be non-blocking even for
the active peers.

Signed-off-by: Vipin Kumar <vipin@cumulusnetworks.com>
Reviewed-by: Pradosh Mohapatra <pmohapat@cumulusnetworks.com>
Reviewed-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
[DL: patch split, this is item 1.]
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

bgpd: send notify in OpenSent when stopping manually

The issue it fixes is that the notification message is not sent to a
second peer when bgp is stopped manually.

According to BGP RFC4271, section 8.2.2, regarding the FSM transitions,
in OpenSent state:

If a ManualStop event (Event 2) is issued in the OpenSent state, the
local system:

  * sends the NOTIFICATION with a Cease,
  * sets the ConnectRetryTimer to zero,
  * releases all BGP resources,
  * drops the TCP connection,
  * sets the ConnectRetryCounter to zero, and
  * changes its state to Idle.

I've added a check for OpenSent state when the notification is sent from
the functions which are called from the CLI commands which
directly/indirectly stop/restart BGP.

Acked-by: Pradosh Mohapatra <pmohapat@cumulusnetworks.com>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

bgpd: display multipath status in "show ip bgp"

The output of "show ip bg" does not show whether and which routes are
installed as multipath routes along the best route:

BGP table version is 0, local router ID is 10.10.100.209
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, R Removed
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*>i1.0.0.0/24       10.10.100.1              1    111      0 15169 i
* i                 10.10.100.2              1    111      0 15169 i
* i                 10.10.100.3              1    111      0 65100 15169 i

This patch adds a new status code that is showing exactly which routes
are used as multipath:

BGP table version is 0, local router ID is 10.10.100.209
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
              i internal, r RIB-failure, S Stale, R Removed
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*>i1.0.0.0/24       10.10.100.1              1    111      0 15169 i
*=i                 10.10.100.2              1    111      0 15169 i
* i                 10.10.100.3              1    111      0 65100 15169 i

The inconsistency in the status code legend ("i - internal" vs. "i internal")
inherent from old IOS was fixed. It had to be touched anyways.

Signed-off-by: Boian Bonev <bbonev at ipacct.com>
[DL: rewrap long line, clean whitespace in same chunk]
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

bgpd: track correct originator-id in reflected routes

ISSUE:

Suppose route1 and route2 received from route-reflector-client1 and client2
respectively have identical attributes. The current logic of creating the
adj-rib-out for a peer threads the 'adv' structures for both routes against
the same attribute. This results in 'bgp_update_packet()' to pack those
routes in the same UPDATE message with one attr structure formatted. The
originator-id is thus set according to the first route's received router id.
This is incorrect.

PATCH:

Fix bgp_announce_check() function to set the originator-id in the
advertising attr structure. Also, fix the attribute hash function and
compare function to consider originator-id. Otherwise attributes where all
fields except the originator-id are identical get merged into one memory
location.

Signed-off-by: Pradosh Mohapatra <pmohapat at cumulusnetworks.com>
Reviewed-by: Scott Feldman <sfeldma at cumulusnetworks.com>
Reviewed-by: Ken Yin <kyin at cumulusnetworks.com>
[DL: whitespace changes dropped]
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

bgpd: add 'bgp bestpath as-path multipath-relax'

Compute multipath in BGP based on AS_PATH hop count match. If the knob
is turned on, it is not required to have an exact match of AS_PATHs
(provided other multipath conditions are met, of course).

Signed-off-by: Pradosh Mohapatra <pmohapat at cumulusnetworks.com>
Reviewed-by: Dinesh G Dutt <ddutt at cumulusnetworks.com>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

bgpd: use ATTR_FLAG_BIT() for BGP_ATTR_ values

* bgp_attr.c: this UNSET_FLAG()s are bogus. I did a quick review and
I think that they could not cause any bug anyway.

Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net>
Acked-by: Feng Lu <lu.feng@6wind.com>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

ospfd: add debug messages for router lsa-generation

Add log messages to lsa_link_broadcast_set so it becomes more
apparent why a particular broadcast interface was added as
transit or stub interface.

Signed-off-by: Christian Franke <chris@opensourcerouting.org>
Acked-by: Feng Lu <lu.feng@6wind.com>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

ospfd: For an ABR, ensure the right LSID is MaxAge'd

PROBLEM:

Accurate garbage collection of maxage LSAs. The global OSPF structure has
a maxage_lsa tree - the key to the tree is <ls-id, adv-router> tuple. Suppose
the ABR has multiple areas and has originated some intra-area LSAs. The
key for all those LSAs is the same. The code then ends up in a state where
all but the first LSA do not get cleaned up from the areas' LSDB. A subsequent
event would readvertise those LSAs.

PATCH:

Since the LSA is going to stick around till it actually gets cleaned up by
the maxage_walker, make the LSA pointer as the key. Each distinct LSA that
gets maxage'd then gets added to the tree and will get cleaned up correctly.

Signed-off-by: Pradosh Mohapatra <pmohapat@cumulusnetworks.com>
[CF: Use CHAR_BIT; use uintptr_t; use sizeof(field) instead of sizeof(type)]
Signed-off-by: Christian Franke <chris@opensourcerouting.org>
[DL: this must remain a temporary fix! needs to be redone after 0.99.23]
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

ospfd: clarify indentation and comments in ospf_lsa_maxage_delete

Signed-off-by: Christian Franke <chris@opensourcerouting.org>
Acked-by: Feng Lu <lu.feng@6wind.com>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

ospfd: fix a reference counting issue introduced by commit 4de8bf0011

Commit 4de8bf0011 added a return statement to a loop iterating over a
route_table. That loop uses route_top/route_next.

As commit 4de8bf0011 failed to add a route_node_unlock before the
return statement, a reference is leaked when this codepath is taken.

Signed-off-by: Christian Franke <chris@opensourcerouting.org>
Acked-by: Feng Lu <lu.feng@6wind.com>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

ospfd: check the LS-Ack's recentness instead of only comparing the #seq

ISSUE:

RTA(DR)-----(BackupDR)RTB

RTA advertises a new LSA to RTB, and then flushes the LSA (with setting
the age of the LSA to MaxAge) within 1 second. Then the LSA is deleted
from RTA, while it still exists on RTB with non-MaxAge and can not be
flushed any more.

FIX:

The reason can be explained in below:

a) RTA -- new LSA, #seq=1 --> RTB (RTB will send the delayed Ack in 1s)
b) RTA -- MaxAge LSA, #seq=1 --> RTB (RTB discards it for the MIN_LS_ARRIVAL)
c) RTA <-- Ack for the new LSA, #seq=1 -- RTB (RTA accepts it)

In the step c), ospf_ls_ack() compares the #seq of the entry in the LS-Ack
with that of local MaxAge LSA. The #seq of the two entries are same. So
the Ack is accepted and the LSA is removed from the retransmit-list (while
it should not).

In RFC2328, section  13.7.  Receiving link state acknowledgments:

o   If the acknowledgment is for the same instance that is  <==
    contained on the list, remove the item from the list and
    examine the next acknowledgment.  Otherwise:

where "same instance" does not mean the same #seq. We must call
ospf_lsa_more_recent() to check whether the two instances are same.

Signed-off-by: Feng Lu <lu.feng@6wind.com>
Signed-off-by: Christian Franke <chris@opensourcerouting.org>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

ospfd: don't allow to set network type on loopback interfaces

OSPFd only allocates some stub information for loopback interfaces.
This causes a crash when the interface state machine is started on
that interface by configuring a different network type.

It doesn't make much sense to configure the network type of a loopback
interface, therefore, just forbid it.

See also bugzilla #670.

Signed-off-by: Christian Franke <chris@opensourcerouting.org>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>

ospfd: run DR election prior to LSA regeneration

The results from DR election are used when constructing router-LSAs.
E.g. they are used to determine whether a broadcast interface should
be added with a link type of stub interface or transit interface.

Therefore, we should run DR election prior before regenerating LSAs.

Before commit c363d3861b5384a31465a72ddc3b0f6ff007a95a the DR election
was called synchronously prior to router-LSA regeneration which was run
asynchronously.

This fixes bug #761 on the Quagga bugzilla.

Signed-off-by: Christian Franke <chris@opensourcerouting.org>
Acked-by: Feng Lu <lu.feng@6wind.com>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>