]>
Commit | Line | Data |
---|---|---|
b22ba015 QY |
1 | Next Hop Tracking |
2 | ================== | |
3 | ||
4 | Next hop tracking is an optimization feature that reduces the processing time | |
5 | involved in the BGP bestpath algorithm by monitoring changes to the routing | |
6 | table. | |
7 | ||
8 | Background | |
9 | ----------- | |
10 | ||
11 | Recursive routes are of the form: | |
12 | ||
13 | :: | |
14 | ||
15 | p/m --> n | |
16 | [Ex: 1.1.0.0/16 --> 2.2.2.2] | |
17 | ||
18 | where 'n' itself is resolved through another route as follows: | |
19 | ||
20 | :: | |
21 | ||
22 | p2/m --> h, interface | |
23 | [Ex: 2.2.2.0/24 --> 3.3.3.3, eth0] | |
24 | ||
25 | Usually, BGP routes are recursive in nature and BGP nexthops get resolved | |
26 | through an IGP route. IGP usually adds its routes pointing to an interface | |
27 | (these are called non-recursive routes). | |
28 | ||
29 | When BGP receives a recursive route from a peer, it needs to validate the | |
30 | nexthop. The path is marked valid or invalid based on the reachability status | |
31 | of the nexthop. Nexthop validation is also important for BGP decision process | |
32 | as the metric to reach the nexthop is a parameter to best path selection | |
33 | process. | |
34 | ||
35 | As it goes with routing, this is a dynamic process. Route to the nexthop can | |
36 | change. The nexthop can become unreachable or reachable. In the current BGP | |
37 | implementation, the nexthop validation is done periodically in the scanner run. | |
38 | The default scanner run interval is one minute. Every minute, the scanner task | |
39 | walks the entire BGP table. It checks the validity of each nexthop with Zebra | |
40 | (the routing table manager) through a request and response message exchange | |
41 | between BGP and Zebra process. BGP process is blocked for that duration. The | |
42 | mechanism has two major drawbacks: | |
43 | ||
44 | - The scanner task runs to completion. That can potentially starve the other | |
45 | tasks for long periods of time, based on the BGP table size and number of | |
46 | nexthops. | |
47 | ||
48 | - Convergence around routing changes that affect the nexthops can be long | |
49 | (around a minute with the default intervals). The interval can be shortened | |
50 | to achieve faster reaction time, but it makes the first problem worse, with | |
51 | the scanner task consuming most of the CPU resources. | |
52 | ||
53 | The next-hop tracking feature makes this process event-driven. It eliminates | |
54 | periodic nexthop validation and introduces an asynchronous communication path | |
55 | between BGP and Zebra for route change notifications that can then be acted | |
56 | upon. | |
57 | ||
58 | Goal | |
59 | ---- | |
60 | ||
61 | Stating the obvious, the main goal is to remove the two limitations we | |
62 | discussed in the previous section. The goals, in a constructive tone, | |
63 | are the following: | |
64 | ||
65 | - **Fairness**: the scanner run should not consume an unjustly high amount of | |
66 | CPU time. This should give an overall good performance and response time to | |
67 | other events (route changes, session events, IO/user interface). | |
68 | ||
69 | - **Convergence**: BGP must react to nexthop changes instantly and provide | |
70 | sub-second convergence. This may involve diverting the routes from one | |
71 | nexthop to another. | |
72 | ||
73 | Overview of changes | |
74 | ------------------------ | |
75 | ||
76 | The changes are in both BGP and Zebra modules. The short summary is | |
77 | the following: | |
78 | ||
79 | - Zebra implements a registration mechanism by which clients can | |
80 | register for next hop notification. Consequently, it maintains a | |
81 | separate table, per (VRF, AF) pair, of next hops and interested | |
82 | client-list per next hop. | |
83 | ||
84 | - When the main routing table changes in Zebra, it evaluates the next | |
85 | hop table: for each next hop, it checks if the route table | |
86 | modifications have changed its state. If so, it notifies the | |
87 | interested clients. | |
88 | ||
89 | - BGP is one such client. It registers the next hops corresponding to | |
90 | all of its received routes/paths. It also threads the paths against | |
91 | each nexthop structure. | |
92 | ||
93 | - When BGP receives a next hop notification from Zebra, it walks the | |
94 | corresponding path list. It makes them valid or invalid depending | |
95 | on the next hop notification. It then re-computes best path for the | |
96 | corresponding destination. This may result in re-announcing those | |
97 | destinations to peers. | |
98 | ||
99 | Design | |
100 | ------ | |
101 | ||
102 | Modules | |
75ca3b11 | 103 | ^^^^^^^ |
b22ba015 QY |
104 | |
105 | The core design introduces an "nht" (next hop tracking) module in BGP | |
106 | and "rnh" (recursive nexthop) module in Zebra. The "nht" module | |
107 | provides the following APIs: | |
108 | ||
109 | +----------------------------+--------------------------------------------------+ | |
110 | | Function | Action | | |
111 | +============================+==================================================+ | |
112 | | bgp_find_or_add_nexthop() | find or add a nexthop in BGP nexthop table | | |
113 | +----------------------------+--------------------------------------------------+ | |
b22ba015 QY |
114 | | bgp_parse_nexthop_update() | parse a nexthop update message coming from zebra | |
115 | +----------------------------+--------------------------------------------------+ | |
116 | ||
117 | The "rnh" module provides the following APIs: | |
118 | ||
119 | +----------------------------+----------------------------------------------------------------------------------------------------------+ | |
120 | | Function | Action | | |
121 | +============================+==========================================================================================================+ | |
122 | | zebra_add_rnh() | add a recursive nexthop | | |
123 | +----------------------------+----------------------------------------------------------------------------------------------------------+ | |
124 | | zebra_delete_rnh() | delete a recursive nexthop | | |
125 | +----------------------------+----------------------------------------------------------------------------------------------------------+ | |
126 | | zebra_lookup_rnh() | lookup a recursive nexthop | | |
127 | +----------------------------+----------------------------------------------------------------------------------------------------------+ | |
128 | | zebra_add_rnh_client() | register a client for nexthop notifications against a recursive nexthop | | |
129 | +----------------------------+----------------------------------------------------------------------------------------------------------+ | |
130 | | zebra_remove_rnh_client() | remove the client registration for a recursive nexthop | | |
131 | +----------------------------+----------------------------------------------------------------------------------------------------------+ | |
132 | | zebra_evaluate_rnh_table() | (re)evaluate the recursive nexthop table (most probably because the main routing table has changed). | | |
133 | +----------------------------+----------------------------------------------------------------------------------------------------------+ | |
134 | | zebra_cleanup_rnh_client() | Cleanup a client from the "rnh" module data structures (most probably because the client is going away). | | |
135 | +----------------------------+----------------------------------------------------------------------------------------------------------+ | |
136 | ||
137 | 4.2. Control flow | |
138 | ||
139 | The next hop registration control flow is the following: | |
140 | ||
141 | :: | |
142 | ||
143 | <==== BGP Process ====>|<==== Zebra Process ====> | |
144 | | | |
145 | receive module nht module | zserv module rnh module | |
146 | ---------------------------------------------------------------------- | |
147 | | | | | |
148 | bgp_update_ | | | | |
149 | main() | bgp_find_or_add_ | | | |
150 | | nexthop() | | | |
151 | | | | | |
152 | | | zserv_nexthop_ | | |
153 | | | register() | | |
154 | | | | zebra_add_rnh() | |
155 | | | | | |
76bd1499 | 156 | |
b22ba015 QY |
157 | |
158 | The next hop notification control flow is the following: | |
159 | ||
160 | :: | |
161 | ||
162 | <==== Zebra Process ====>|<==== BGP Process ====> | |
163 | | | |
164 | rib module rnh module | zebra module nht module | |
165 | ---------------------------------------------------------------------- | |
166 | | | | | |
167 | meta_queue_ | | | | |
168 | process() | zebra_evaluate_ | | | |
169 | | rnh_table() | | | |
170 | | | | | |
171 | | | bgp_read_nexthop_ | | |
172 | | | update() | | |
173 | | | | bgp_parse_ | |
174 | | | | nexthop_update() | |
175 | | | | | |
176 | ||
177 | ||
178 | zclient message format | |
75ca3b11 | 179 | ^^^^^^^^^^^^^^^^^^^^^^ |
b22ba015 QY |
180 | |
181 | ZEBRA_NEXTHOP_REGISTER and ZEBRA_NEXTHOP_UNREGISTER messages are | |
182 | encoded in the following way: | |
183 | ||
184 | :: | |
185 | ||
186 | . 0 1 2 3 | |
187 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |
188 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |
189 | | AF | prefix len | | |
190 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |
191 | . Nexthop prefix . | |
192 | . . | |
193 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |
194 | . . | |
195 | . . | |
196 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |
197 | | AF | prefix len | | |
198 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |
199 | . Nexthop prefix . | |
200 | . . | |
201 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |
76bd1499 | 202 | |
b22ba015 QY |
203 | |
204 | ``ZEBRA_NEXTHOP_UPDATE`` message is encoded as follows: | |
205 | ||
206 | :: | |
207 | ||
208 | . 0 1 2 3 | |
209 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |
210 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |
211 | | AF | prefix len | | |
212 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |
213 | . Nexthop prefix getting resolved . | |
214 | . . | |
215 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |
216 | | metric | | |
217 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |
218 | | #nexthops | | |
219 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |
220 | | nexthop type | | |
221 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |
222 | . resolving Nexthop details . | |
223 | . . | |
224 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |
225 | . . | |
226 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |
227 | | nexthop type | | |
228 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |
229 | . resolving Nexthop details . | |
230 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |
231 | ||
232 | ||
233 | BGP data structure | |
75ca3b11 | 234 | ^^^^^^^^^^^^^^^^^^ |
b22ba015 QY |
235 | Legend: |
236 | ||
237 | :: | |
238 | ||
239 | /\ struct bgp_node: a BGP destination/route/prefix | |
240 | \/ | |
76bd1499 | 241 | |
4b7e6066 | 242 | [ ] struct bgp_path_info: a BGP path (e.g. route received from a peer) |
76bd1499 | 243 | |
b22ba015 QY |
244 | _ |
245 | (_) struct bgp_nexthop_cache: a BGP nexthop | |
76bd1499 | 246 | |
b22ba015 QY |
247 | /\ NULL |
248 | \/--+ ^ | |
249 | | : | |
250 | +--[ ]--[ ]--[ ]--> NULL | |
251 | /\ : | |
252 | \/--+ : | |
253 | | : | |
254 | +--[ ]--[ ]--> NULL | |
255 | : | |
256 | _ : | |
257 | (_)........... | |
258 | ||
259 | ||
260 | Zebra data structure | |
75ca3b11 | 261 | ^^^^^^^^^^^^^^^^^^^^ |
b22ba015 QY |
262 | |
263 | RNH table:: | |
264 | ||
d7c0a89a QY |
265 | . O |
266 | / \ | |
267 | O O | |
268 | / \ | |
269 | O O | |
4c97fd1a | 270 | |
d7c0a89a QY |
271 | struct rnh |
272 | { | |
273 | uint8_t flags; | |
274 | struct route_entry *state; | |
275 | struct list *client_list; | |
276 | struct route_node *node; | |
277 | }; | |
b22ba015 QY |
278 | |
279 | User interface changes | |
75ca3b11 | 280 | ^^^^^^^^^^^^^^^^^^^^^^ |
b22ba015 QY |
281 | |
282 | :: | |
283 | ||
284 | frr# show ip nht | |
285 | 3.3.3.3 | |
286 | resolved via kernel | |
287 | via 11.0.0.6, swp1 | |
288 | Client list: bgp(fd 12) | |
289 | 11.0.0.10 | |
290 | resolved via connected | |
291 | is directly connected, swp2 | |
292 | Client list: bgp(fd 12) | |
293 | 11.0.0.18 | |
294 | resolved via connected | |
295 | is directly connected, swp4 | |
296 | Client list: bgp(fd 12) | |
297 | 11.11.11.11 | |
298 | resolved via kernel | |
299 | via 10.0.1.2, eth0 | |
300 | Client list: bgp(fd 12) | |
76bd1499 | 301 | |
b22ba015 QY |
302 | frr# show ip bgp nexthop |
303 | Current BGP nexthop cache: | |
304 | 3.3.3.3 valid [IGP metric 0], #paths 3 | |
305 | Last update: Wed Oct 16 04:43:49 2013 | |
76bd1499 | 306 | |
b22ba015 QY |
307 | 11.0.0.10 valid [IGP metric 1], #paths 1 |
308 | Last update: Wed Oct 16 04:43:51 2013 | |
76bd1499 | 309 | |
b22ba015 QY |
310 | 11.0.0.18 valid [IGP metric 1], #paths 2 |
311 | Last update: Wed Oct 16 04:43:47 2013 | |
76bd1499 | 312 | |
b22ba015 QY |
313 | 11.11.11.11 valid [IGP metric 0], #paths 1 |
314 | Last update: Wed Oct 16 04:43:47 2013 | |
76bd1499 | 315 | |
b22ba015 QY |
316 | frr# show ipv6 nht |
317 | frr# show ip bgp nexthop detail | |
76bd1499 | 318 | |
b22ba015 QY |
319 | frr# debug bgp nht |
320 | frr# debug zebra nht | |
76bd1499 | 321 | |
b22ba015 | 322 | 6. Sample test cases |
76bd1499 | 323 | |
b22ba015 QY |
324 | r2----r3 |
325 | / \ / | |
326 | r1----r4 | |
76bd1499 | 327 | |
b22ba015 QY |
328 | - Verify that a change in IGP cost triggers NHT |
329 | + shutdown the r1-r4 and r2-r4 links | |
330 | + no shut the r1-r4 and r2-r4 links and wait for OSPF to come back | |
331 | up | |
332 | + We should be back to the original nexthop via r4 now | |
333 | - Verify that a NH becoming unreachable triggers NHT | |
334 | + Shutdown all links to r4 | |
335 | - Verify that a NH becoming reachable triggers NHT | |
336 | + no shut all links to r4 | |
337 | ||
338 | Future work | |
75ca3b11 | 339 | ^^^^^^^^^^^ |
b22ba015 QY |
340 | |
341 | - route-policy for next hop validation (e.g. ignore default route) | |
342 | - damping for rapid next hop changes | |
343 | - prioritized handling of nexthop changes ((un)reachability vs. metric | |
344 | changes) | |
345 | - handling recursion loop, e.g:: | |
346 | ||
347 | 11.11.11.11/32 -> 12.12.12.12 | |
348 | 12.12.12.12/32 -> 11.11.11.11 | |
349 | 11.0.0.0/8 -> <interface> | |
350 | - better statistics |