]> git.proxmox.com Git - mirror_frr.git/blob - doc/developer/logging.rst
Merge pull request #7764 from pguibert6WIND/nhrp_shortcut_routes
[mirror_frr.git] / doc / developer / logging.rst
1 .. _logging:
2
3 Logging
4 =======
5
6 One of the most frequent decisions to make while writing code for FRR is what
7 to log, what level to log it at, and when to log it. Here is a list of
8 recommendations for these decisions.
9
10
11 printfrr()
12 ----------
13
14 ``printfrr()`` is FRR's modified version of ``printf()``, designed to make
15 life easier when printing nontrivial datastructures. The following variants
16 are available:
17
18 .. c:function:: ssize_t snprintfrr(char *buf, size_t len, const char *fmt, ...)
19 .. c:function:: ssize_t vsnprintfrr(char *buf, size_t len, const char *fmt, va_list)
20
21 These correspond to ``snprintf``/``vsnprintf``. If you pass NULL for buf
22 or 0 for len, no output is written but the return value is still calculated.
23
24 The return value is always the full length of the output, unconstrained by
25 `len`. It does **not** include the terminating ``\0`` character. A
26 malformed format string can result in a ``-1`` return value.
27
28 .. c:function:: ssize_t csnprintfrr(char *buf, size_t len, const char *fmt, ...)
29 .. c:function:: ssize_t vcsnprintfrr(char *buf, size_t len, const char *fmt, va_list)
30
31 Same as above, but the ``c`` stands for "continue" or "concatenate". The
32 output is appended to the string instead of overwriting it.
33
34 .. c:function:: char *asprintfrr(struct memtype *mt, const char *fmt, ...)
35 .. c:function:: char *vasprintfrr(struct memtype *mt, const char *fmt, va_list)
36
37 These functions allocate a dynamic buffer (using MTYPE `mt`) and print to
38 that. If the format string is malformed, they return a copy of the format
39 string, so the return value is always non-NULL and always dynamically
40 allocated with `mt`.
41
42 .. c:function:: char *asnprintfrr(struct memtype *mt, char *buf, size_t len, const char *fmt, ...)
43 .. c:function:: char *vasnprintfrr(struct memtype *mt, char *buf, size_t len, const char *fmt, va_list)
44
45 This variant tries to use the static buffer provided, but falls back to
46 dynamic allocation if it is insufficient.
47
48 The return value can be either `buf` or a newly allocated string using
49 `mt`. You MUST free it like this::
50
51 char *ret = asnprintfrr(MTYPE_FOO, buf, sizeof(buf), ...);
52 if (ret != buf)
53 XFREE(MTYPE_FOO, ret);
54
55 Extensions
56 ^^^^^^^^^^
57
58 ``printfrr()`` format strings can be extended with suffixes after `%p` or
59 `%d`. The following extended format specifiers are available:
60
61 +-----------+--------------------------+----------------------------------------------+
62 | Specifier | Argument | Output |
63 +===========+==========================+==============================================+
64 | ``%Lu`` | ``uint64_t`` | ``12345`` |
65 +-----------+--------------------------+----------------------------------------------+
66 | ``%Ld`` | ``int64_t`` | ``-12345`` |
67 +-----------+--------------------------+----------------------------------------------+
68 | ``%pI4`` | ``struct in_addr *`` | ``1.2.3.4`` |
69 | | | |
70 | | ``in_addr_t *`` | |
71 +-----------+--------------------------+----------------------------------------------+
72 | ``%pI6`` | ``struct in6_addr *`` | ``fe80::1234`` |
73 +-----------+--------------------------+----------------------------------------------+
74 | ``%pFX`` | ``struct prefix *`` | ``fe80::1234/64`` |
75 +-----------+--------------------------+----------------------------------------------+
76 | ``%pSG4`` | ``struct prefix_sg *`` | ``(*,1.2.3.4)`` |
77 +-----------+--------------------------+----------------------------------------------+
78 | ``%pRN`` | ``struct route_node *`` | ``192.168.1.0/24`` (dst-only node) |
79 | | | |
80 | | | ``2001:db8::/32 from fe80::/64`` (SADR node) |
81 +-----------+--------------------------+----------------------------------------------+
82 | ``%pNHv`` | ``struct nexthop *`` | ``1.2.3.4, via eth0`` |
83 +-----------+--------------------------+----------------------------------------------+
84 | ``%pNHs`` | ``struct nexthop *`` | ``1.2.3.4 if 15`` |
85 +-----------+--------------------------+----------------------------------------------+
86 | ``%pFX`` | ``struct bgp_dest *`` | ``fe80::1234/64`` (available in BGP only) |
87 +-----------+--------------------------+----------------------------------------------+
88
89 Printf features like field lengths can be used normally with these extensions,
90 e.g. ``%-15pI4`` works correctly.
91
92 The extension specifier after ``%p`` or ``%d`` is always an uppercase letter;
93 by means of established pattern uppercase letters and numbers form the type
94 identifier which may be followed by lowercase flags.
95
96 You can grep the FRR source for ``printfrr_ext_autoreg`` to see all extended
97 printers and what exactly they do. More printers are likely to be added as
98 needed/useful, so the list above may become outdated.
99
100 ``%Ld`` is not an "extension" for printfrr; it's wired directly into the main
101 printf logic.
102
103 .. note::
104
105 The ``zlog_*``/``flog_*`` and ``vty_out`` functions all use printfrr
106 internally, so these extensions are available there. However, they are
107 **not** available when calling ``snprintf`` directly. You need to call
108 ``snprintfrr`` instead.
109
110 AS-Safety
111 ^^^^^^^^^
112
113 ``printfrr()`` are AS-Safe under the following conditions:
114
115 * the ``[v]as[n]printfrr`` variants are not AS-Safe (allocating memory)
116 * floating point specifiers are not AS-Safe (system printf is used for these)
117 * the positional ``%1$d`` syntax should not be used (8 arguments are supported
118 while AS-Safe)
119 * extensions are only AS-Safe if their printer is AS-Safe
120
121 Log levels
122 ----------
123
124 Errors and warnings
125 ^^^^^^^^^^^^^^^^^^^
126
127 If it is something that the user will want to look at and maybe do
128 something, it is either an **error** or a **warning**.
129
130 We're expecting that warnings and errors are in some way visible to the
131 user (in the worst case by looking at the log after the network broke, but
132 maybe by a syslog collector from all routers.) Therefore, anything that
133 needs to get the user in the loop—and only these things—are warnings or
134 errors.
135
136 Note that this doesn't necessarily mean the user needs to fix something in
137 the FRR instance. It also includes when we detect something else needs
138 fixing, for example another router, the system we're running on, or the
139 configuration. The common point is that the user should probably do
140 *something*.
141
142 Deciding between a warning and an error is slightly less obvious; the rule
143 of thumb here is that an error will cause considerable fallout beyond its
144 direct effect. Closing a BGP session due to a malformed update is an error
145 since all routes from the peer are dropped; discarding one route because
146 its attributes don't make sense is a warning.
147
148 This also loosely corresponds to the kind of reaction we're expecting from
149 the user. An error is likely to need immediate response while a warning
150 might be snoozed for a bit and addressed as part of general maintenance.
151 If a problem will self-repair (e.g. by retransmits), it should be a
152 warning—unless the impact until that self-repair is very harsh.
153
154 Examples for warnings:
155
156 * a BGP update, LSA or LSP could not be processed, but operation is
157 proceeding and the broken pieces are likely to self-fix later
158 * some kind of controller cannot be reached, but we can work without it
159 * another router is using some unknown or unsupported capability
160
161 Examples for errors:
162
163 * dropping a BGP session due to malformed data
164 * a socket for routing protocol operation cannot be opened
165 * desynchronization from network state because something went wrong
166 * *everything that we as developers would really like to be notified about,
167 i.e. some assumption in the code isn't holding up*
168
169
170 Informational messages
171 ^^^^^^^^^^^^^^^^^^^^^^
172
173 Anything that provides introspection to the user during normal operation
174 is an **info** message.
175
176 This includes all kinds of operational state transitions and events,
177 especially if they might be interesting to the user during the course of
178 figuring out a warning or an error.
179
180 By itself, these messages should mostly be statements of fact. They might
181 indicate the order and relationship in which things happened. Also covered
182 are conditions that might be "operational issues" like a link failure due
183 to an unplugged cable. If it's pretty much the point of running a routing
184 daemon for, it's not a warning or an error, just business as usual.
185
186 The user should be able to see the state of these bits from operational
187 state output, i.e. `show interface` or `show foobar neighbors`. The log
188 message indicating the change may have been printed weeks ago, but the
189 state can always be viewed. (If some state change has an info message but
190 no "show" command, maybe that command needs to be added.)
191
192 Examples:
193
194 * all kinds of up/down state changes
195
196 * interface coming up or going down
197 * addresses being added or deleted
198 * peers and neighbors coming up or going down
199
200 * rejection of some routes due to user-configured route maps
201 * backwards compatibility handling because another system on the network
202 has a different or smaller feature set
203
204 .. note::
205 The previously used **notify** priority is replaced with *info* in all
206 cases. We don't currently have a well-defined use case for it.
207
208
209 Debug messages and asserts
210 ^^^^^^^^^^^^^^^^^^^^^^^^^^
211
212 Everything that is only interesting on-demand, or only while developing,
213 is a **debug** message. It might be interesting to the user for a
214 particularly evasive issue, but in general these are details that an
215 average user might not even be able to make sense of.
216
217 Most (or all?) debug messages should be behind a `debug foobar` category
218 switch that controls which subset of these messages is currently
219 interesting and thus printed. If a debug message doesn't have such a
220 guard, there should be a good explanation as to why.
221
222 Conversely, debug messages are the only thing that should be guarded by
223 these switches. Neither info nor warning or error messages should be
224 hidden in this way.
225
226 **Asserts** should only be used as pretty crashes. We are expecting that
227 asserts remain enabled in production builds, but please try to not use
228 asserts in a way that would cause a security problem if the assert wasn't
229 there (i.e. don't use them for length checks.)
230
231 The purpose of asserts is mainly to help development and bug hunting. If
232 the daemon crashes, then having some more information is nice, and the
233 assert can provide crucial hints that cut down on the time needed to track
234 an issue. That said, if the issue can be reasonably handled and/or isn't
235 going to crash the daemon, it shouldn't be an assert.
236
237 For anything else where internal constraints are violated but we're not
238 breaking due to it, it's an error instead (not a debug.) These require
239 "user action" of notifying the developers.
240
241 Examples:
242
243 * mismatched :code:`prev`/:code:`next` pointers in lists
244 * some field that is absolutely needed is :code:`NULL`
245 * any other kind of data structure corruption that will cause the daemon
246 to crash sooner or later, one way or another
247
248 Thread-local buffering
249 ----------------------
250
251 The core logging code in :file:`lib/zlog.c` allows setting up per-thread log
252 message buffers in order to improve logging performance. The following rules
253 apply for this buffering:
254
255 * Only messages of priority *DEBUG* or *INFO* are buffered.
256 * Any higher-priority message causes the thread's entire buffer to be flushed,
257 thus message ordering is preserved on a per-thread level.
258 * There is no guarantee on ordering between different threads; in most cases
259 this is arbitrary to begin with since the threads essentially race each
260 other in printing log messages. If an order is established with some
261 synchronization primitive, add calls to :c:func:`zlog_tls_buffer_flush()`.
262 * The buffers are only ever accessed by the thread they are created by. This
263 means no locking is necessary.
264
265 Both the main/default thread and additional threads created by
266 :c:func:`frr_pthread_new()` with the default :c:func:`frr_run()` handler will
267 initialize thread-local buffering and call :c:func:`zlog_tls_buffer_flush()`
268 when idle.
269
270 If some piece of code runs for an extended period, it may be useful to insert
271 calls to :c:func:`zlog_tls_buffer_flush()` in appropriate places:
272
273 .. c:function:: void zlog_tls_buffer_flush(void)
274
275 Write out any pending log messages that the calling thread may have in its
276 buffer. This function is safe to call regardless of the per-thread log
277 buffer being set up / in use or not.
278
279 When working with threads that do not use the :c:type:`struct thread_master`
280 event loop, per-thread buffers can be managed with:
281
282 .. c:function:: void zlog_tls_buffer_init(void)
283
284 Set up thread-local buffering for log messages. This function may be
285 called repeatedly without adverse effects, but remember to call
286 :c:func:`zlog_tls_buffer_fini()` at thread exit.
287
288 .. warning::
289
290 If this function is called, but :c:func:`zlog_tls_buffer_flush()` is
291 not used, log message output will lag behind since messages will only be
292 written out when the buffer is full.
293
294 Exiting the thread without calling :c:func:`zlog_tls_buffer_fini()`
295 will cause buffered log messages to be lost.
296
297 .. c:function:: void zlog_tls_buffer_fini(void)
298
299 Flush pending messages and tear down thread-local log message buffering.
300 This function may be called repeatedly regardless of whether
301 :c:func:`zlog_tls_buffer_init()` was ever called.
302
303 Log targets
304 -----------
305
306 The actual logging subsystem (in :file:`lib/zlog.c`) is heavily separated
307 from the actual log writers. It uses an atomic linked-list (`zlog_targets`)
308 with RCU to maintain the log targets to be called. This list is intended to
309 function as "backend" only, it **is not used for configuration**.
310
311 Logging targets provide their configuration layer on top of this and maintain
312 their own capability to enumerate and store their configuration. Some targets
313 (e.g. syslog) are inherently single instance and just stuff their config in
314 global variables. Others (e.g. file/fd output) are multi-instance capable.
315 There is another layer boundary here between these and the VTY configuration
316 that they use.
317
318 Basic internals
319 ^^^^^^^^^^^^^^^
320
321 .. c:type:: struct zlog_target
322
323 This struct needs to be filled in by any log target and then passed to
324 :c:func:`zlog_target_replace()`. After it has been registered,
325 **RCU semantics apply**. Most changes to associated data should make a
326 copy, change that, and then replace the entire struct.
327
328 Additional per-target data should be "appended" by embedding this struct
329 into a larger one, for use with `containerof()`, and
330 :c:func:`zlog_target_clone()` and :c:func:`zlog_target_free()` should be
331 used to allocate/free the entire container struct.
332
333 Do not use this structure to maintain configuration. It should only
334 contain (a copy of) the data needed to perform the actual logging. For
335 example, the syslog target uses this:
336
337 .. code-block:: c
338
339 struct zlt_syslog {
340 struct zlog_target zt;
341 int syslog_facility;
342 };
343
344 static void zlog_syslog(struct zlog_target *zt, struct zlog_msg *msgs[], size_t nmsgs)
345 {
346 struct zlt_syslog *zte = container_of(zt, struct zlt_syslog, zt);
347 size_t i;
348
349 for (i = 0; i < nmsgs; i++)
350 if (zlog_msg_prio(msgs[i]) <= zt->prio_min)
351 syslog(zlog_msg_prio(msgs[i]) | zte->syslog_facility, "%s",
352 zlog_msg_text(msgs[i], NULL));
353 }
354
355
356 .. c:function:: struct zlog_target *zlog_target_clone(struct memtype *mt, struct zlog_target *oldzt, size_t size)
357
358 Allocates a logging target struct. Note that the ``oldzt`` argument may be
359 ``NULL`` to allocate a "from scratch". If ``oldzt`` is not ``NULL``, the
360 generic bits in :c:type:`struct zlog_target` are copied. **Target specific
361 bits are not copied.**
362
363 .. c:function:: struct zlog_target *zlog_target_replace(struct zlog_target *oldzt, struct zlog_target *newzt)
364
365 Adds, replaces or deletes a logging target (either ``oldzt`` or ``newzt`` may be ``NULL``.)
366
367 Returns ``oldzt`` for freeing. The target remains possibly in use by
368 other threads until the RCU cycle ends. This implies you cannot release
369 resources (e.g. memory, file descriptors) immediately.
370
371 The replace operation is not atomic; for a brief period it is possible that
372 messages are delivered on both ``oldzt`` and ``newzt``.
373
374 .. warning::
375
376 ``oldzt`` must remain **functional** until the RCU cycle ends.
377
378 .. c:function:: void zlog_target_free(struct memtype *mt, struct zlog_target *zt)
379
380 Counterpart to :c:func:`zlog_target_clone()`, frees a target (using RCU.)
381
382 .. c:member:: void (*zlog_target.logfn)(struct zlog_target *zt, struct zlog_msg *msgs[], size_t nmsg)
383
384 Called on a target to deliver "normal" logging messages. ``msgs`` is an
385 array of opaque structs containing the actual message. Use ``zlog_msg_*``
386 functions to access message data (this is done to allow some optimizations,
387 e.g. lazy formatting the message text and timestamp as needed.)
388
389 .. note::
390
391 ``logfn()`` must check each individual message's priority value against
392 the configured ``prio_min``. While the ``prio_min`` field is common to
393 all targets and used by the core logging code to early-drop unneeded log
394 messages, the array is **not** filtered for each ``logfn()`` call.
395
396 .. c:member:: void (*zlog_target.logfn_sigsafe)(struct zlog_target *zt, const char *text, size_t len)
397
398 Called to deliver "exception" logging messages (i.e. SEGV messages.)
399 Must be Async-Signal-Safe (may not allocate memory or call "complicated"
400 libc functions.) May be ``NULL`` if the log target cannot handle this.
401
402 Standard targets
403 ^^^^^^^^^^^^^^^^
404
405 :file:`lib/zlog_targets.c` provides the standard file / fd / syslog targets.
406 The syslog target is single-instance while file / fd targets can be
407 instantiated as needed. There are 3 built-in targets that are fully
408 autonomous without any config:
409
410 - startup logging to `stderr`, until either :c:func:`zlog_startup_end()` or
411 :c:func:`zlog_aux_init()` is called.
412 - stdout logging for non-daemon programs using :c:func:`zlog_aux_init()`
413 - crashlogs written to :file:`/var/tmp/frr.daemon.crashlog`
414
415 The regular CLI/command-line logging setup is handled by :file:`lib/log_vty.c`
416 which makes the appropriate instantiations of syslog / file / fd targets.
417
418 .. todo::
419
420 :c:func:`zlog_startup_end()` should do an explicit switchover from
421 startup stderr logging to configured logging. Currently, configured logging
422 starts in parallel as soon as the respective setup is executed. This results
423 in some duplicate logging.