]> git.proxmox.com Git - mirror_frr.git/blame - doc/developer/next-hop-tracking.rst
Merge pull request #13430 from opensourcerouting/feature/rip_allow-ecmp_limit
[mirror_frr.git] / doc / developer / next-hop-tracking.rst
CommitLineData
b22ba015
QY
1Next Hop Tracking
2==================
3
4Next hop tracking is an optimization feature that reduces the processing time
5involved in the BGP bestpath algorithm by monitoring changes to the routing
6table.
7
8Background
9-----------
10
11Recursive routes are of the form:
12
13::
14
15 p/m --> n
16 [Ex: 1.1.0.0/16 --> 2.2.2.2]
17
18where 'n' itself is resolved through another route as follows:
19
20::
21
22 p2/m --> h, interface
23 [Ex: 2.2.2.0/24 --> 3.3.3.3, eth0]
24
25Usually, BGP routes are recursive in nature and BGP nexthops get resolved
26through an IGP route. IGP usually adds its routes pointing to an interface
27(these are called non-recursive routes).
28
29When BGP receives a recursive route from a peer, it needs to validate the
30nexthop. The path is marked valid or invalid based on the reachability status
31of the nexthop. Nexthop validation is also important for BGP decision process
32as the metric to reach the nexthop is a parameter to best path selection
33process.
34
35As it goes with routing, this is a dynamic process. Route to the nexthop can
36change. The nexthop can become unreachable or reachable. In the current BGP
37implementation, the nexthop validation is done periodically in the scanner run.
38The default scanner run interval is one minute. Every minute, the scanner task
39walks the entire BGP table. It checks the validity of each nexthop with Zebra
40(the routing table manager) through a request and response message exchange
41between BGP and Zebra process. BGP process is blocked for that duration. The
42mechanism has two major drawbacks:
43
44- The scanner task runs to completion. That can potentially starve the other
45 tasks for long periods of time, based on the BGP table size and number of
46 nexthops.
47
48- Convergence around routing changes that affect the nexthops can be long
49 (around a minute with the default intervals). The interval can be shortened
50 to achieve faster reaction time, but it makes the first problem worse, with
51 the scanner task consuming most of the CPU resources.
52
53The next-hop tracking feature makes this process event-driven. It eliminates
54periodic nexthop validation and introduces an asynchronous communication path
55between BGP and Zebra for route change notifications that can then be acted
56upon.
57
58Goal
59----
60
61Stating the obvious, the main goal is to remove the two limitations we
62discussed in the previous section. The goals, in a constructive tone,
63are the following:
64
65- **Fairness**: the scanner run should not consume an unjustly high amount of
66 CPU time. This should give an overall good performance and response time to
67 other events (route changes, session events, IO/user interface).
68
69- **Convergence**: BGP must react to nexthop changes instantly and provide
70 sub-second convergence. This may involve diverting the routes from one
71 nexthop to another.
72
73Overview of changes
74------------------------
75
76The changes are in both BGP and Zebra modules. The short summary is
77the following:
78
79- Zebra implements a registration mechanism by which clients can
80 register for next hop notification. Consequently, it maintains a
81 separate table, per (VRF, AF) pair, of next hops and interested
82 client-list per next hop.
83
84- When the main routing table changes in Zebra, it evaluates the next
85 hop table: for each next hop, it checks if the route table
86 modifications have changed its state. If so, it notifies the
87 interested clients.
88
89- BGP is one such client. It registers the next hops corresponding to
90 all of its received routes/paths. It also threads the paths against
91 each nexthop structure.
92
93- When BGP receives a next hop notification from Zebra, it walks the
94 corresponding path list. It makes them valid or invalid depending
95 on the next hop notification. It then re-computes best path for the
96 corresponding destination. This may result in re-announcing those
97 destinations to peers.
98
99Design
100------
101
102Modules
75ca3b11 103^^^^^^^
b22ba015
QY
104
105The core design introduces an "nht" (next hop tracking) module in BGP
106and "rnh" (recursive nexthop) module in Zebra. The "nht" module
107provides the following APIs:
108
109+----------------------------+--------------------------------------------------+
110| Function | Action |
111+============================+==================================================+
112| bgp_find_or_add_nexthop() | find or add a nexthop in BGP nexthop table |
113+----------------------------+--------------------------------------------------+
b22ba015
QY
114| bgp_parse_nexthop_update() | parse a nexthop update message coming from zebra |
115+----------------------------+--------------------------------------------------+
116
117The "rnh" module provides the following APIs:
118
119+----------------------------+----------------------------------------------------------------------------------------------------------+
120| Function | Action |
121+============================+==========================================================================================================+
122| zebra_add_rnh() | add a recursive nexthop |
123+----------------------------+----------------------------------------------------------------------------------------------------------+
124| zebra_delete_rnh() | delete a recursive nexthop |
125+----------------------------+----------------------------------------------------------------------------------------------------------+
126| zebra_lookup_rnh() | lookup a recursive nexthop |
127+----------------------------+----------------------------------------------------------------------------------------------------------+
128| zebra_add_rnh_client() | register a client for nexthop notifications against a recursive nexthop |
129+----------------------------+----------------------------------------------------------------------------------------------------------+
130| zebra_remove_rnh_client() | remove the client registration for a recursive nexthop |
131+----------------------------+----------------------------------------------------------------------------------------------------------+
132| zebra_evaluate_rnh_table() | (re)evaluate the recursive nexthop table (most probably because the main routing table has changed). |
133+----------------------------+----------------------------------------------------------------------------------------------------------+
134| zebra_cleanup_rnh_client() | Cleanup a client from the "rnh" module data structures (most probably because the client is going away). |
135+----------------------------+----------------------------------------------------------------------------------------------------------+
136
1374.2. Control flow
138
139The next hop registration control flow is the following:
140
141::
142
143 <==== BGP Process ====>|<==== Zebra Process ====>
144 |
145 receive module nht module | zserv module rnh module
146 ----------------------------------------------------------------------
147 | | |
148 bgp_update_ | | |
149 main() | bgp_find_or_add_ | |
150 | nexthop() | |
151 | | |
152 | | zserv_nexthop_ |
153 | | register() |
154 | | | zebra_add_rnh()
155 | | |
76bd1499 156
b22ba015
QY
157
158The next hop notification control flow is the following:
159
160::
161
162 <==== Zebra Process ====>|<==== BGP Process ====>
163 |
164 rib module rnh module | zebra module nht module
165 ----------------------------------------------------------------------
166 | | |
167 meta_queue_ | | |
168 process() | zebra_evaluate_ | |
169 | rnh_table() | |
170 | | |
171 | | bgp_read_nexthop_ |
172 | | update() |
173 | | | bgp_parse_
174 | | | nexthop_update()
175 | | |
176
177
178zclient message format
75ca3b11 179^^^^^^^^^^^^^^^^^^^^^^
b22ba015
QY
180
181ZEBRA_NEXTHOP_REGISTER and ZEBRA_NEXTHOP_UNREGISTER messages are
182encoded in the following way:
183
184::
185
186 . 0 1 2 3
187 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
188 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
189 | AF | prefix len |
190 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
191 . Nexthop prefix .
192 . .
193 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
194 . .
195 . .
196 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
197 | AF | prefix len |
198 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
199 . Nexthop prefix .
200 . .
201 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
76bd1499 202
b22ba015
QY
203
204``ZEBRA_NEXTHOP_UPDATE`` message is encoded as follows:
205
206::
207
208 . 0 1 2 3
209 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
210 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
211 | AF | prefix len |
212 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
213 . Nexthop prefix getting resolved .
214 . .
215 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
216 | metric |
217 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
218 | #nexthops |
219 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
220 | nexthop type |
221 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
222 . resolving Nexthop details .
223 . .
224 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
225 . .
226 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
227 | nexthop type |
228 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
229 . resolving Nexthop details .
230 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
231
232
233BGP data structure
75ca3b11 234^^^^^^^^^^^^^^^^^^
b22ba015
QY
235Legend:
236
237::
238
239 /\ struct bgp_node: a BGP destination/route/prefix
240 \/
76bd1499 241
4b7e6066 242 [ ] struct bgp_path_info: a BGP path (e.g. route received from a peer)
76bd1499 243
b22ba015
QY
244 _
245 (_) struct bgp_nexthop_cache: a BGP nexthop
76bd1499 246
b22ba015
QY
247 /\ NULL
248 \/--+ ^
249 | :
250 +--[ ]--[ ]--[ ]--> NULL
251 /\ :
252 \/--+ :
253 | :
254 +--[ ]--[ ]--> NULL
255 :
256 _ :
257 (_)...........
258
259
260Zebra data structure
75ca3b11 261^^^^^^^^^^^^^^^^^^^^
b22ba015
QY
262
263RNH table::
264
d7c0a89a
QY
265 . O
266 / \
267 O O
268 / \
269 O O
4c97fd1a 270
d7c0a89a
QY
271 struct rnh
272 {
273 uint8_t flags;
274 struct route_entry *state;
275 struct list *client_list;
276 struct route_node *node;
277 };
b22ba015
QY
278
279User interface changes
75ca3b11 280^^^^^^^^^^^^^^^^^^^^^^
b22ba015
QY
281
282::
283
284 frr# show ip nht
285 3.3.3.3
286 resolved via kernel
287 via 11.0.0.6, swp1
288 Client list: bgp(fd 12)
289 11.0.0.10
290 resolved via connected
291 is directly connected, swp2
292 Client list: bgp(fd 12)
293 11.0.0.18
294 resolved via connected
295 is directly connected, swp4
296 Client list: bgp(fd 12)
297 11.11.11.11
298 resolved via kernel
299 via 10.0.1.2, eth0
300 Client list: bgp(fd 12)
76bd1499 301
b22ba015
QY
302 frr# show ip bgp nexthop
303 Current BGP nexthop cache:
304 3.3.3.3 valid [IGP metric 0], #paths 3
305 Last update: Wed Oct 16 04:43:49 2013
76bd1499 306
b22ba015
QY
307 11.0.0.10 valid [IGP metric 1], #paths 1
308 Last update: Wed Oct 16 04:43:51 2013
76bd1499 309
b22ba015
QY
310 11.0.0.18 valid [IGP metric 1], #paths 2
311 Last update: Wed Oct 16 04:43:47 2013
76bd1499 312
b22ba015
QY
313 11.11.11.11 valid [IGP metric 0], #paths 1
314 Last update: Wed Oct 16 04:43:47 2013
76bd1499 315
b22ba015
QY
316 frr# show ipv6 nht
317 frr# show ip bgp nexthop detail
76bd1499 318
b22ba015
QY
319 frr# debug bgp nht
320 frr# debug zebra nht
76bd1499 321
b22ba015 322 6. Sample test cases
76bd1499 323
b22ba015
QY
324 r2----r3
325 / \ /
326 r1----r4
76bd1499 327
b22ba015
QY
328 - Verify that a change in IGP cost triggers NHT
329 + shutdown the r1-r4 and r2-r4 links
330 + no shut the r1-r4 and r2-r4 links and wait for OSPF to come back
331 up
332 + We should be back to the original nexthop via r4 now
333 - Verify that a NH becoming unreachable triggers NHT
334 + Shutdown all links to r4
335 - Verify that a NH becoming reachable triggers NHT
336 + no shut all links to r4
337
338Future work
75ca3b11 339^^^^^^^^^^^
b22ba015
QY
340
341- route-policy for next hop validation (e.g. ignore default route)
342- damping for rapid next hop changes
343- prioritized handling of nexthop changes ((un)reachability vs. metric
344 changes)
345- handling recursion loop, e.g::
346
347 11.11.11.11/32 -> 12.12.12.12
348 12.12.12.12/32 -> 11.11.11.11
349 11.0.0.0/8 -> <interface>
350- better statistics