]> git.proxmox.com Git - mirror_frr.git/blob - doc/developer/next-hop-tracking.rst
zebra: Cleanup lines over 80 columns
[mirror_frr.git] / doc / developer / next-hop-tracking.rst
1 Next Hop Tracking
2 ==================
3
4 Next hop tracking is an optimization feature that reduces the processing time
5 involved in the BGP bestpath algorithm by monitoring changes to the routing
6 table.
7
8 Background
9 -----------
10
11 Recursive routes are of the form:
12
13 ::
14
15 p/m --> n
16 [Ex: 1.1.0.0/16 --> 2.2.2.2]
17
18 where 'n' itself is resolved through another route as follows:
19
20 ::
21
22 p2/m --> h, interface
23 [Ex: 2.2.2.0/24 --> 3.3.3.3, eth0]
24
25 Usually, BGP routes are recursive in nature and BGP nexthops get resolved
26 through an IGP route. IGP usually adds its routes pointing to an interface
27 (these are called non-recursive routes).
28
29 When BGP receives a recursive route from a peer, it needs to validate the
30 nexthop. The path is marked valid or invalid based on the reachability status
31 of the nexthop. Nexthop validation is also important for BGP decision process
32 as the metric to reach the nexthop is a parameter to best path selection
33 process.
34
35 As it goes with routing, this is a dynamic process. Route to the nexthop can
36 change. The nexthop can become unreachable or reachable. In the current BGP
37 implementation, the nexthop validation is done periodically in the scanner run.
38 The default scanner run interval is one minute. Every minute, the scanner task
39 walks the entire BGP table. It checks the validity of each nexthop with Zebra
40 (the routing table manager) through a request and response message exchange
41 between BGP and Zebra process. BGP process is blocked for that duration. The
42 mechanism has two major drawbacks:
43
44 - The scanner task runs to completion. That can potentially starve the other
45 tasks for long periods of time, based on the BGP table size and number of
46 nexthops.
47
48 - Convergence around routing changes that affect the nexthops can be long
49 (around a minute with the default intervals). The interval can be shortened
50 to achieve faster reaction time, but it makes the first problem worse, with
51 the scanner task consuming most of the CPU resources.
52
53 The next-hop tracking feature makes this process event-driven. It eliminates
54 periodic nexthop validation and introduces an asynchronous communication path
55 between BGP and Zebra for route change notifications that can then be acted
56 upon.
57
58 Goal
59 ----
60
61 Stating the obvious, the main goal is to remove the two limitations we
62 discussed in the previous section. The goals, in a constructive tone,
63 are the following:
64
65 - **Fairness**: the scanner run should not consume an unjustly high amount of
66 CPU time. This should give an overall good performance and response time to
67 other events (route changes, session events, IO/user interface).
68
69 - **Convergence**: BGP must react to nexthop changes instantly and provide
70 sub-second convergence. This may involve diverting the routes from one
71 nexthop to another.
72
73 Overview of changes
74 ------------------------
75
76 The changes are in both BGP and Zebra modules. The short summary is
77 the following:
78
79 - Zebra implements a registration mechanism by which clients can
80 register for next hop notification. Consequently, it maintains a
81 separate table, per (VRF, AF) pair, of next hops and interested
82 client-list per next hop.
83
84 - When the main routing table changes in Zebra, it evaluates the next
85 hop table: for each next hop, it checks if the route table
86 modifications have changed its state. If so, it notifies the
87 interested clients.
88
89 - BGP is one such client. It registers the next hops corresponding to
90 all of its received routes/paths. It also threads the paths against
91 each nexthop structure.
92
93 - When BGP receives a next hop notification from Zebra, it walks the
94 corresponding path list. It makes them valid or invalid depending
95 on the next hop notification. It then re-computes best path for the
96 corresponding destination. This may result in re-announcing those
97 destinations to peers.
98
99 Design
100 ------
101
102 Modules
103 ~~~~~~~
104
105 The core design introduces an "nht" (next hop tracking) module in BGP
106 and "rnh" (recursive nexthop) module in Zebra. The "nht" module
107 provides the following APIs:
108
109 +----------------------------+--------------------------------------------------+
110 | Function | Action |
111 +============================+==================================================+
112 | bgp_find_or_add_nexthop() | find or add a nexthop in BGP nexthop table |
113 +----------------------------+--------------------------------------------------+
114 | bgp_find_nexthop() | find a nexthop in BGP nexthop table |
115 +----------------------------+--------------------------------------------------+
116 | bgp_parse_nexthop_update() | parse a nexthop update message coming from zebra |
117 +----------------------------+--------------------------------------------------+
118
119 The "rnh" module provides the following APIs:
120
121 +----------------------------+----------------------------------------------------------------------------------------------------------+
122 | Function | Action |
123 +============================+==========================================================================================================+
124 | zebra_add_rnh() | add a recursive nexthop |
125 +----------------------------+----------------------------------------------------------------------------------------------------------+
126 | zebra_delete_rnh() | delete a recursive nexthop |
127 +----------------------------+----------------------------------------------------------------------------------------------------------+
128 | zebra_lookup_rnh() | lookup a recursive nexthop |
129 +----------------------------+----------------------------------------------------------------------------------------------------------+
130 | zebra_add_rnh_client() | register a client for nexthop notifications against a recursive nexthop |
131 +----------------------------+----------------------------------------------------------------------------------------------------------+
132 | zebra_remove_rnh_client() | remove the client registration for a recursive nexthop |
133 +----------------------------+----------------------------------------------------------------------------------------------------------+
134 | zebra_evaluate_rnh_table() | (re)evaluate the recursive nexthop table (most probably because the main routing table has changed). |
135 +----------------------------+----------------------------------------------------------------------------------------------------------+
136 | zebra_cleanup_rnh_client() | Cleanup a client from the "rnh" module data structures (most probably because the client is going away). |
137 +----------------------------+----------------------------------------------------------------------------------------------------------+
138
139 4.2. Control flow
140
141 The next hop registration control flow is the following:
142
143 ::
144
145 <==== BGP Process ====>|<==== Zebra Process ====>
146 |
147 receive module nht module | zserv module rnh module
148 ----------------------------------------------------------------------
149 | | |
150 bgp_update_ | | |
151 main() | bgp_find_or_add_ | |
152 | nexthop() | |
153 | | |
154 | | zserv_nexthop_ |
155 | | register() |
156 | | | zebra_add_rnh()
157 | | |
158
159
160 The next hop notification control flow is the following:
161
162 ::
163
164 <==== Zebra Process ====>|<==== BGP Process ====>
165 |
166 rib module rnh module | zebra module nht module
167 ----------------------------------------------------------------------
168 | | |
169 meta_queue_ | | |
170 process() | zebra_evaluate_ | |
171 | rnh_table() | |
172 | | |
173 | | bgp_read_nexthop_ |
174 | | update() |
175 | | | bgp_parse_
176 | | | nexthop_update()
177 | | |
178
179
180 zclient message format
181 ~~~~~~~~~~~~~~~~~~~~~~
182
183 ZEBRA_NEXTHOP_REGISTER and ZEBRA_NEXTHOP_UNREGISTER messages are
184 encoded in the following way:
185
186 ::
187
188 . 0 1 2 3
189 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
190 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
191 | AF | prefix len |
192 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
193 . Nexthop prefix .
194 . .
195 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
196 . .
197 . .
198 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
199 | AF | prefix len |
200 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
201 . Nexthop prefix .
202 . .
203 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
204
205
206 ``ZEBRA_NEXTHOP_UPDATE`` message is encoded as follows:
207
208 ::
209
210 . 0 1 2 3
211 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
212 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
213 | AF | prefix len |
214 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
215 . Nexthop prefix getting resolved .
216 . .
217 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
218 | metric |
219 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
220 | #nexthops |
221 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
222 | nexthop type |
223 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
224 . resolving Nexthop details .
225 . .
226 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
227 . .
228 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
229 | nexthop type |
230 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
231 . resolving Nexthop details .
232 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
233
234
235 BGP data structure
236 ~~~~~~~~~~~~~~~~~~
237 Legend:
238
239 ::
240
241 /\ struct bgp_node: a BGP destination/route/prefix
242 \/
243
244 [ ] struct bgp_info: a BGP path (e.g. route received from a peer)
245
246 _
247 (_) struct bgp_nexthop_cache: a BGP nexthop
248
249 /\ NULL
250 \/--+ ^
251 | :
252 +--[ ]--[ ]--[ ]--> NULL
253 /\ :
254 \/--+ :
255 | :
256 +--[ ]--[ ]--> NULL
257 :
258 _ :
259 (_)...........
260
261
262 Zebra data structure
263 ~~~~~~~~~~~~~~~~~~~~
264
265 RNH table::
266
267 . O
268 / \
269 O O
270 / \
271 O O
272
273 struct rnh
274 {
275 uint8_t flags;
276 struct route_entry *state;
277 struct list *client_list;
278 struct route_node *node;
279 };
280
281 User interface changes
282 ~~~~~~~~~~~~~~~~~~~~~~
283
284 ::
285
286 frr# show ip nht
287 3.3.3.3
288 resolved via kernel
289 via 11.0.0.6, swp1
290 Client list: bgp(fd 12)
291 11.0.0.10
292 resolved via connected
293 is directly connected, swp2
294 Client list: bgp(fd 12)
295 11.0.0.18
296 resolved via connected
297 is directly connected, swp4
298 Client list: bgp(fd 12)
299 11.11.11.11
300 resolved via kernel
301 via 10.0.1.2, eth0
302 Client list: bgp(fd 12)
303
304 frr# show ip bgp nexthop
305 Current BGP nexthop cache:
306 3.3.3.3 valid [IGP metric 0], #paths 3
307 Last update: Wed Oct 16 04:43:49 2013
308
309 11.0.0.10 valid [IGP metric 1], #paths 1
310 Last update: Wed Oct 16 04:43:51 2013
311
312 11.0.0.18 valid [IGP metric 1], #paths 2
313 Last update: Wed Oct 16 04:43:47 2013
314
315 11.11.11.11 valid [IGP metric 0], #paths 1
316 Last update: Wed Oct 16 04:43:47 2013
317
318 frr# show ipv6 nht
319 frr# show ip bgp nexthop detail
320
321 frr# debug bgp nht
322 frr# debug zebra nht
323
324 6. Sample test cases
325
326 r2----r3
327 / \ /
328 r1----r4
329
330 - Verify that a change in IGP cost triggers NHT
331 + shutdown the r1-r4 and r2-r4 links
332 + no shut the r1-r4 and r2-r4 links and wait for OSPF to come back
333 up
334 + We should be back to the original nexthop via r4 now
335 - Verify that a NH becoming unreachable triggers NHT
336 + Shutdown all links to r4
337 - Verify that a NH becoming reachable triggers NHT
338 + no shut all links to r4
339
340 Future work
341 ~~~~~~~~~~~
342
343 - route-policy for next hop validation (e.g. ignore default route)
344 - damping for rapid next hop changes
345 - prioritized handling of nexthop changes ((un)reachability vs. metric
346 changes)
347 - handling recursion loop, e.g::
348
349 11.11.11.11/32 -> 12.12.12.12
350 12.12.12.12/32 -> 11.11.11.11
351 11.0.0.0/8 -> <interface>
352 - better statistics