]> git.proxmox.com Git - mirror_frr.git/blob - doc/developer/logging.rst
Merge pull request #10391 from gromit1811/bug_9720_ecmp_inter_area_topotest_3paths
[mirror_frr.git] / doc / developer / logging.rst
1 .. _logging:
2
3 .. highlight:: c
4
5 Logging
6 =======
7
8 One of the most frequent decisions to make while writing code for FRR is what
9 to log, what level to log it at, and when to log it. Here is a list of
10 recommendations for these decisions.
11
12
13 printfrr()
14 ----------
15
16 ``printfrr()`` is FRR's modified version of ``printf()``, designed to make
17 life easier when printing nontrivial datastructures. The following variants
18 are available:
19
20 .. c:function:: ssize_t snprintfrr(char *buf, size_t len, const char *fmt, ...)
21 .. c:function:: ssize_t vsnprintfrr(char *buf, size_t len, const char *fmt, va_list)
22
23 These correspond to ``snprintf``/``vsnprintf``. If you pass NULL for buf
24 or 0 for len, no output is written but the return value is still calculated.
25
26 The return value is always the full length of the output, unconstrained by
27 `len`. It does **not** include the terminating ``\0`` character. A
28 malformed format string can result in a ``-1`` return value.
29
30 .. c:function:: ssize_t csnprintfrr(char *buf, size_t len, const char *fmt, ...)
31 .. c:function:: ssize_t vcsnprintfrr(char *buf, size_t len, const char *fmt, va_list)
32
33 Same as above, but the ``c`` stands for "continue" or "concatenate". The
34 output is appended to the string instead of overwriting it.
35
36 .. c:function:: char *asprintfrr(struct memtype *mt, const char *fmt, ...)
37 .. c:function:: char *vasprintfrr(struct memtype *mt, const char *fmt, va_list)
38
39 These functions allocate a dynamic buffer (using MTYPE `mt`) and print to
40 that. If the format string is malformed, they return a copy of the format
41 string, so the return value is always non-NULL and always dynamically
42 allocated with `mt`.
43
44 .. c:function:: char *asnprintfrr(struct memtype *mt, char *buf, size_t len, const char *fmt, ...)
45 .. c:function:: char *vasnprintfrr(struct memtype *mt, char *buf, size_t len, const char *fmt, va_list)
46
47 This variant tries to use the static buffer provided, but falls back to
48 dynamic allocation if it is insufficient.
49
50 The return value can be either `buf` or a newly allocated string using
51 `mt`. You MUST free it like this::
52
53 char *ret = asnprintfrr(MTYPE_FOO, buf, sizeof(buf), ...);
54 if (ret != buf)
55 XFREE(MTYPE_FOO, ret);
56
57 .. c:function:: ssize_t bprintfrr(struct fbuf *fb, const char *fmt, ...)
58 .. c:function:: ssize_t vbprintfrr(struct fbuf *fb, const char *fmt, va_list)
59
60 These are the "lowest level" functions, which the other variants listed
61 above use to implement their functionality on top. Mainly useful for
62 implementing printfrr extensions since those get a ``struct fbuf *`` to
63 write their output to.
64
65 .. c:macro:: FMT_NSTD(expr)
66
67 This macro turns off/on format warnings as needed when non-ISO-C
68 compatible printfrr extensions are used (e.g. ``%.*p`` or ``%Ld``.)::
69
70 vty_out(vty, "standard compatible %pI4\n", &addr);
71 FMT_NSTD(vty_out(vty, "non-standard %-47.*pHX\n", (int)len, buf));
72
73 When the frr-format plugin is in use, this macro is a no-op since the
74 frr-format plugin supports all printfrr extensions. Since the FRR CI
75 includes a system with the plugin enabled, this means format errors will
76 not slip by undetected even with FMT_NSTD.
77
78 .. note::
79
80 ``printfrr()`` does not support the ``%n`` format.
81
82 AS-Safety
83 ^^^^^^^^^
84
85 ``printfrr()`` are AS-Safe under the following conditions:
86
87 * the ``[v]as[n]printfrr`` variants are not AS-Safe (allocating memory)
88 * floating point specifiers are not AS-Safe (system printf is used for these)
89 * the positional ``%1$d`` syntax should not be used (8 arguments are supported
90 while AS-Safe)
91 * extensions are only AS-Safe if their printer is AS-Safe
92
93 printfrr Extensions
94 -------------------
95
96 ``printfrr()`` format strings can be extended with suffixes after `%p` or `%d`.
97 Printf features like field lengths can be used normally with these extensions,
98 e.g. ``%-15pI4`` works correctly, **except if the extension consumes the
99 width or precision**. Extensions that do so are listed below as ``%*pXX``
100 rather than ``%pXX``.
101
102 The extension specifier after ``%p`` or ``%d`` is always an uppercase letter;
103 by means of established pattern uppercase letters and numbers form the type
104 identifier which may be followed by lowercase flags.
105
106 You can grep the FRR source for ``printfrr_ext_autoreg`` to see all extended
107 printers and what exactly they do. More printers are likely to be added as
108 needed/useful, so the list here may be outdated.
109
110 .. note::
111
112 The ``zlog_*``/``flog_*`` and ``vty_out`` functions all use printfrr
113 internally, so these extensions are available there. However, they are
114 **not** available when calling ``snprintf`` directly. You need to call
115 ``snprintfrr`` instead.
116
117 Networking data types
118 ^^^^^^^^^^^^^^^^^^^^^
119
120 .. role:: frrfmtout(code)
121
122 .. frrfmt:: %pI4 (struct in_addr *, in_addr_t *)
123
124 :frrfmtout:`1.2.3.4`
125
126 ``%pI4s``: :frrfmtout:`*` — print star instead of ``0.0.0.0`` (for multicast)
127
128 .. frrfmt:: %pI6 (struct in6_addr *)
129
130 :frrfmtout:`fe80::1234`
131
132 ``%pI6s``: :frrfmtout:`*` — print star instead of ``::`` (for multicast)
133
134 .. frrfmt:: %pEA (struct ethaddr *)
135
136 :frrfmtout:`01:23:45:67:89:ab`
137
138 .. frrfmt:: %pIA (struct ipaddr *)
139
140 :frrfmtout:`1.2.3.4` / :frrfmtout:`fe80::1234`
141
142 ``%pIAs``: — print star instead of zero address (for multicast)
143
144 .. frrfmt:: %pFX (struct prefix *)
145
146 :frrfmtout:`1.2.3.0/24` / :frrfmtout:`fe80::1234/64`
147
148 This accepts the following types:
149
150 - :c:struct:`prefix`
151 - :c:struct:`prefix_ipv4`
152 - :c:struct:`prefix_ipv6`
153 - :c:struct:`prefix_eth`
154 - :c:struct:`prefix_evpn`
155 - :c:struct:`prefix_fs`
156
157 It does **not** accept the following types:
158
159 - :c:struct:`prefix_ls`
160 - :c:struct:`prefix_rd`
161 - :c:struct:`prefix_ptr`
162 - :c:struct:`prefix_sg` (use :frrfmt:`%pPSG4`)
163 - :c:union:`prefixptr` (dereference to get :c:struct:`prefix`)
164 - :c:union:`prefixconstptr` (dereference to get :c:struct:`prefix`)
165
166 .. frrfmt:: %pPSG4 (struct prefix_sg *)
167
168 :frrfmtout:`(*,1.2.3.4)`
169
170 This is *(S,G)* output for use in pimd. (Note prefix_sg is not a prefix
171 "subclass" like the other prefix_* structs.)
172
173 .. frrfmt:: %pSU (union sockunion *)
174
175 ``%pSU``: :frrfmtout:`1.2.3.4` / :frrfmtout:`fe80::1234`
176
177 ``%pSUs``: :frrfmtout:`1.2.3.4` / :frrfmtout:`fe80::1234%89`
178 (adds IPv6 scope ID as integer)
179
180 ``%pSUp``: :frrfmtout:`1.2.3.4:567` / :frrfmtout:`[fe80::1234]:567`
181 (adds port)
182
183 ``%pSUps``: :frrfmtout:`1.2.3.4:567` / :frrfmtout:`[fe80::1234%89]:567`
184 (adds port and scope ID)
185
186 .. frrfmt:: %pRN (struct route_node *, struct bgp_node *, struct agg_node *)
187
188 :frrfmtout:`192.168.1.0/24` (dst-only node)
189
190 :frrfmtout:`2001:db8::/32 from fe80::/64` (SADR node)
191
192 .. frrfmt:: %pNH (struct nexthop *)
193
194 ``%pNHvv``: :frrfmtout:`via 1.2.3.4, eth0` — verbose zebra format
195
196 ``%pNHv``: :frrfmtout:`1.2.3.4, via eth0` — slightly less verbose zebra format
197
198 ``%pNHs``: :frrfmtout:`1.2.3.4 if 15` — same as :c:func:`nexthop2str()`
199
200 ``%pNHcg``: :frrfmtout:`1.2.3.4` — compact gateway only
201
202 ``%pNHci``: :frrfmtout:`eth0` — compact interface only
203
204 .. frrfmt:: %pBD (struct bgp_dest *)
205
206 :frrfmtout:`fe80::1234/64`
207
208 (only available in bgpd.)
209
210 .. frrfmt:: %dPF (int)
211
212 :frrfmtout:`AF_INET`
213
214 Prints an `AF_*` / `PF_*` constant. ``PF`` is used here to avoid confusion
215 with `AFI` constants, even though the FRR codebase prefers `AF_INET` over
216 `PF_INET` & co.
217
218 .. frrfmt:: %dSO (int)
219
220 :frrfmtout:`SOCK_STREAM`
221
222 Time/interval formats
223 ^^^^^^^^^^^^^^^^^^^^^
224
225 .. frrfmt:: %pTS (struct timespec *)
226
227 .. frrfmt:: %pTV (struct timeval *)
228
229 .. frrfmt:: %pTT (time_t *)
230
231 Above 3 options internally result in the same code being called, support
232 the same flags and produce equal output with one exception: ``%pTT``
233 has no sub-second precision and the formatter will never print a
234 (nonsensical) ``.000``.
235
236 Exactly one of ``I``, ``M`` or ``R`` must immediately follow after
237 ``TS``/``TV``/``TT`` to specify whether the input is an interval, monotonic
238 timestamp or realtime timestamp:
239
240 ``%pTVI``: input is an interval, not a timestamp. Print interval.
241
242 ``%pTVIs``: input is an interval, convert to wallclock by subtracting it
243 from current time (i.e. interval has passed **s**\ ince.)
244
245 ``%pTVIu``: input is an interval, convert to wallclock by adding it to
246 current time (i.e. **u**\ ntil interval has passed.)
247
248 ``%pTVM`` - input is a timestamp on CLOCK_MONOTONIC, convert to wallclock
249 time (by grabbing current CLOCK_MONOTONIC and CLOCK_REALTIME and doing the
250 math) and print calendaric date.
251
252 ``%pTVMs`` - input is a timestamp on CLOCK_MONOTONIC, print interval
253 **s**\ ince that timestamp (elapsed.)
254
255 ``%pTVMu`` - input is a timestamp on CLOCK_MONOTONIC, print interval
256 **u**\ ntil that timestamp (deadline.)
257
258 ``%pTVR`` - input is a timestamp on CLOCK_REALTIME, print calendaric date.
259
260 ``%pTVRs`` - input is a timestamp on CLOCK_REALTIME, print interval
261 **s**\ ince that timestamp.
262
263 ``%pTVRu`` - input is a timestamp on CLOCK_REALTIME, print interval
264 **u**\ ntil that timestamp.
265
266 ``%pTVA`` - reserved for CLOCK_TAI in case a PTP implementation is
267 interfaced to FRR. Not currently implemented.
268
269 .. note::
270
271 If ``%pTVRs`` or ``%pTVRu`` are used, this is generally an indication
272 that a CLOCK_MONOTONIC timestamp should be used instead (or added in
273 parallel.) CLOCK_REALTIME might be adjusted by NTP, PTP or similar
274 procedures, causing bogus intervals to be printed.
275
276 ``%pTVM`` on first look might be assumed to have the same problem, but
277 on closer thought the assumption is always that current system time is
278 correct. And since a CLOCK_MONOTONIC interval is also quite safe to
279 assume to be correct, the (past) absolute timestamp to be printed from
280 this can likely be correct even if it doesn't match what CLOCK_REALTIME
281 would have indicated at that point in the past. This logic does,
282 however, not quite work for *future* times.
283
284 Generally speaking, almost all use cases in FRR should (and do) use
285 CLOCK_MONOTONIC (through :c:func:`monotime()`.)
286
287 Flags common to printing calendar times and intervals:
288
289 ``p``: include spaces in appropriate places (depends on selected format.)
290
291 ``%p.3TV...``: specify sub-second resolution (use with ``FMT_NSTD`` to
292 suppress gcc warning.) As noted above, ``%pTT`` will never print sub-second
293 digits since there are none. Only some formats support printing sub-second
294 digits and the default may vary.
295
296 The following flags are available for printing calendar times/dates:
297
298 (no flag): :frrfmtout:`Sat Jan 1 00:00:00 2022` - print output from
299 ``ctime()``, in local time zone. Since FRR does not currently use/enable
300 locale support, this is always the C locale. (Locale support getting added
301 is unlikely for the time being and would likely break other things worse
302 than this.)
303
304 ``i``: :frrfmtout:`2022-01-01T00:00:00.123` - ISO8601 timestamp in local
305 time zone (note there is no ``Z`` or ``+00:00`` suffix.) Defaults to
306 millisecond precision.
307
308 ``ip``: :frrfmtout:`2022-01-01 00:00:00.123` - use readable form of ISO8601
309 with space instead of ``T`` separator.
310
311 The following flags are available for printing intervals:
312
313 (no flag): :frrfmtout:`9w9d09:09:09.123` - does not match any
314 preexisting format; added because it does not lose precision (like ``t``)
315 for longer intervals without printing huge numbers (like ``h``/``m``).
316 Defaults to millisecond precision. The week/day fields are left off if
317 they're zero, ``p`` adds a space after the respective letter.
318
319 ``t``: :frrfmtout:`9w9d09h`, :frrfmtout:`9d09h09m`, :frrfmtout:`09:09:09` -
320 this replaces :c:func:`frrtime_to_interval()`. ``p`` adds spaces after
321 week/day/hour letters.
322
323 ``d``: print decimal number of seconds. Defaults to millisecond precision.
324
325 ``x`` / ``tx`` / ``dx``: Like no flag / ``t`` / ``d``, but print
326 :frrfmtout:`-` for zero or negative intervals (for use with unset timers.)
327
328 ``h``: :frrfmtout:`09:09:09`
329
330 ``hx``: :frrfmtout:`09:09:09`, :frrfmtout:`--:--:--` - this replaces
331 :c:func:`pim_time_timer_to_hhmmss()`.
332
333 ``m``: :frrfmtout:`09:09`
334
335 ``mx``: :frrfmtout:`09:09`, :frrfmtout:`--:--` - this replaces
336 :c:func:`pim_time_timer_to_mmss()`.
337
338 FRR library helper formats
339 ^^^^^^^^^^^^^^^^^^^^^^^^^^
340
341 .. frrfmt:: %pTH (struct thread *)
342
343 Print remaining time on timer thread. Interval-printing flag characters
344 listed above for ``%pTV`` can be added, e.g. ``%pTHtx``.
345
346 ``NULL`` pointers are printed as ``-``.
347
348 .. frrfmt:: %pTHD (struct thread *)
349
350 Print debugging information for given thread. Sample output:
351
352 .. code-block:: none
353
354 {(thread *)NULL}
355 {(thread *)0x55a3b5818910 arg=0x55a3b5827c50 timer r=7.824 mld_t_query() &mld_ifp->t_query from pimd/pim6_mld.c:1369}
356 {(thread *)0x55a3b5827230 arg=0x55a3b5827c50 read fd=16 mld_t_recv() &mld_ifp->t_recv from pimd/pim6_mld.c:1186}
357
358 (The output is aligned to some degree.)
359
360 General utility formats
361 ^^^^^^^^^^^^^^^^^^^^^^^
362
363 .. frrfmt:: %m (no argument)
364
365 :frrfmtout:`Permission denied`
366
367 Prints ``strerror(errno)``. Does **not** consume any input argument, don't
368 pass ``errno``!
369
370 (This is a GNU extension not specific to FRR. FRR guarantees it is
371 available on all systems in printfrr, though BSDs support it in printf too.)
372
373 .. frrfmt:: %pSQ (char *)
374
375 ([S]tring [Q]uote.) Like ``%s``, but produce a quoted string. Options:
376
377 ``n`` - treat ``NULL`` as empty string instead.
378
379 ``q`` - include ``""`` quotation marks. Note: ``NULL`` is printed as
380 ``(null)``, not ``"(null)"`` unless ``n`` is used too. This is
381 intentional.
382
383 ``s`` - use escaping suitable for RFC5424 syslog. This means ``]`` is
384 escaped too.
385
386 If a length is specified (``%*pSQ`` or ``%.*pSQ``), null bytes in the input
387 string do not end the string and are just printed as ``\x00``.
388
389 .. frrfmt:: %pSE (char *)
390
391 ([S]tring [E]scape.) Like ``%s``, but escape special characters.
392 Options:
393
394 ``n`` - treat ``NULL`` as empty string instead.
395
396 Unlike :frrfmt:`%pSQ`, this escapes many more characters that are fine for
397 a quoted string but not on their own.
398
399 If a length is specified (``%*pSE`` or ``%.*pSE``), null bytes in the input
400 string do not end the string and are just printed as ``\x00``.
401
402 .. frrfmt:: %pVA (struct va_format *)
403
404 Recursively invoke printfrr, with arguments passed in through:
405
406 .. c:struct:: va_format
407
408 .. c:member:: const char *fmt
409
410 Format string to use for the recursive printfrr call.
411
412 .. c:member:: va_list *va
413
414 Formatting arguments. Note this is passed as a pointer, not - as in
415 most other places - a direct struct reference. Internally uses
416 ``va_copy()`` so repeated calls can be made (e.g. for determining
417 output length.)
418
419 .. frrfmt:: %pFB (struct fbuf *)
420
421 Insert text from a ``struct fbuf *``, i.e. the output of a call to
422 :c:func:`bprintfrr()`.
423
424 .. frrfmt:: %*pHX (void *, char *, unsigned char *)
425
426 ``%pHX``: :frrfmtout:`12 34 56 78`
427
428 ``%pHXc``: :frrfmtout:`12:34:56:78` (separate with [c]olon)
429
430 ``%pHXn``: :frrfmtout:`12345678` (separate with [n]othing)
431
432 Insert hexdump. This specifier requires a precision or width to be
433 specified. A precision (``%.*pHX``) takes precedence, but generates a
434 compiler warning since precisions are undefined for ``%p`` in ISO C. If
435 no precision is given, the width is used instead (and normal handling of
436 the width is suppressed).
437
438 Note that width and precision are ``int`` arguments, not ``size_t``. Use
439 like::
440
441 char *buf;
442 size_t len;
443
444 snprintfrr(out, sizeof(out), "... %*pHX ...", (int)len, buf);
445
446 /* with padding to width - would generate a warning due to %.*p */
447 FMT_NSTD(snprintfrr(out, sizeof(out), "... %-47.*pHX ...", (int)len, buf));
448
449 .. frrfmt:: %*pHS (void *, char *, unsigned char *)
450
451 ``%pHS``: :frrfmtout:`hex.dump`
452
453 This is a complementary format for :frrfmt:`%*pHX` to print the text
454 representation for a hexdump. Non-printable characters are replaced with
455 a dot.
456
457 Integer formats
458 ^^^^^^^^^^^^^^^
459
460 .. note::
461
462 These formats currently only exist for advanced type checking with the
463 ``frr-format`` GCC plugin. They should not be used directly since they will
464 cause compiler warnings when used without the plugin. Use with
465 :c:macro:`FMT_NSTD` if necessary.
466
467 It is possible ISO C23 may introduce another format for these, possibly
468 ``%w64d`` discussed in `JTC 1/SC 22/WG 14/N2680 <http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2680.pdf>`_.
469
470 .. frrfmt:: %Lu (uint64_t)
471
472 :frrfmtout:`12345`
473
474 .. frrfmt:: %Ld (int64_t)
475
476 :frrfmtout:`-12345`
477
478 Log levels
479 ----------
480
481 Errors and warnings
482 ^^^^^^^^^^^^^^^^^^^
483
484 If it is something that the user will want to look at and maybe do
485 something, it is either an **error** or a **warning**.
486
487 We're expecting that warnings and errors are in some way visible to the
488 user (in the worst case by looking at the log after the network broke, but
489 maybe by a syslog collector from all routers.) Therefore, anything that
490 needs to get the user in the loop—and only these things—are warnings or
491 errors.
492
493 Note that this doesn't necessarily mean the user needs to fix something in
494 the FRR instance. It also includes when we detect something else needs
495 fixing, for example another router, the system we're running on, or the
496 configuration. The common point is that the user should probably do
497 *something*.
498
499 Deciding between a warning and an error is slightly less obvious; the rule
500 of thumb here is that an error will cause considerable fallout beyond its
501 direct effect. Closing a BGP session due to a malformed update is an error
502 since all routes from the peer are dropped; discarding one route because
503 its attributes don't make sense is a warning.
504
505 This also loosely corresponds to the kind of reaction we're expecting from
506 the user. An error is likely to need immediate response while a warning
507 might be snoozed for a bit and addressed as part of general maintenance.
508 If a problem will self-repair (e.g. by retransmits), it should be a
509 warning—unless the impact until that self-repair is very harsh.
510
511 Examples for warnings:
512
513 * a BGP update, LSA or LSP could not be processed, but operation is
514 proceeding and the broken pieces are likely to self-fix later
515 * some kind of controller cannot be reached, but we can work without it
516 * another router is using some unknown or unsupported capability
517
518 Examples for errors:
519
520 * dropping a BGP session due to malformed data
521 * a socket for routing protocol operation cannot be opened
522 * desynchronization from network state because something went wrong
523 * *everything that we as developers would really like to be notified about,
524 i.e. some assumption in the code isn't holding up*
525
526
527 Informational messages
528 ^^^^^^^^^^^^^^^^^^^^^^
529
530 Anything that provides introspection to the user during normal operation
531 is an **info** message.
532
533 This includes all kinds of operational state transitions and events,
534 especially if they might be interesting to the user during the course of
535 figuring out a warning or an error.
536
537 By itself, these messages should mostly be statements of fact. They might
538 indicate the order and relationship in which things happened. Also covered
539 are conditions that might be "operational issues" like a link failure due
540 to an unplugged cable. If it's pretty much the point of running a routing
541 daemon for, it's not a warning or an error, just business as usual.
542
543 The user should be able to see the state of these bits from operational
544 state output, i.e. `show interface` or `show foobar neighbors`. The log
545 message indicating the change may have been printed weeks ago, but the
546 state can always be viewed. (If some state change has an info message but
547 no "show" command, maybe that command needs to be added.)
548
549 Examples:
550
551 * all kinds of up/down state changes
552
553 * interface coming up or going down
554 * addresses being added or deleted
555 * peers and neighbors coming up or going down
556
557 * rejection of some routes due to user-configured route maps
558 * backwards compatibility handling because another system on the network
559 has a different or smaller feature set
560
561 .. note::
562 The previously used **notify** priority is replaced with *info* in all
563 cases. We don't currently have a well-defined use case for it.
564
565
566 Debug messages and asserts
567 ^^^^^^^^^^^^^^^^^^^^^^^^^^
568
569 Everything that is only interesting on-demand, or only while developing,
570 is a **debug** message. It might be interesting to the user for a
571 particularly evasive issue, but in general these are details that an
572 average user might not even be able to make sense of.
573
574 Most (or all?) debug messages should be behind a `debug foobar` category
575 switch that controls which subset of these messages is currently
576 interesting and thus printed. If a debug message doesn't have such a
577 guard, there should be a good explanation as to why.
578
579 Conversely, debug messages are the only thing that should be guarded by
580 these switches. Neither info nor warning or error messages should be
581 hidden in this way.
582
583 **Asserts** should only be used as pretty crashes. We are expecting that
584 asserts remain enabled in production builds, but please try to not use
585 asserts in a way that would cause a security problem if the assert wasn't
586 there (i.e. don't use them for length checks.)
587
588 The purpose of asserts is mainly to help development and bug hunting. If
589 the daemon crashes, then having some more information is nice, and the
590 assert can provide crucial hints that cut down on the time needed to track
591 an issue. That said, if the issue can be reasonably handled and/or isn't
592 going to crash the daemon, it shouldn't be an assert.
593
594 For anything else where internal constraints are violated but we're not
595 breaking due to it, it's an error instead (not a debug.) These require
596 "user action" of notifying the developers.
597
598 Examples:
599
600 * mismatched :code:`prev`/:code:`next` pointers in lists
601 * some field that is absolutely needed is :code:`NULL`
602 * any other kind of data structure corruption that will cause the daemon
603 to crash sooner or later, one way or another
604
605 Thread-local buffering
606 ----------------------
607
608 The core logging code in :file:`lib/zlog.c` allows setting up per-thread log
609 message buffers in order to improve logging performance. The following rules
610 apply for this buffering:
611
612 * Only messages of priority *DEBUG* or *INFO* are buffered.
613 * Any higher-priority message causes the thread's entire buffer to be flushed,
614 thus message ordering is preserved on a per-thread level.
615 * There is no guarantee on ordering between different threads; in most cases
616 this is arbitrary to begin with since the threads essentially race each
617 other in printing log messages. If an order is established with some
618 synchronization primitive, add calls to :c:func:`zlog_tls_buffer_flush()`.
619 * The buffers are only ever accessed by the thread they are created by. This
620 means no locking is necessary.
621
622 Both the main/default thread and additional threads created by
623 :c:func:`frr_pthread_new()` with the default :c:func:`frr_run()` handler will
624 initialize thread-local buffering and call :c:func:`zlog_tls_buffer_flush()`
625 when idle.
626
627 If some piece of code runs for an extended period, it may be useful to insert
628 calls to :c:func:`zlog_tls_buffer_flush()` in appropriate places:
629
630 .. c:function:: void zlog_tls_buffer_flush(void)
631
632 Write out any pending log messages that the calling thread may have in its
633 buffer. This function is safe to call regardless of the per-thread log
634 buffer being set up / in use or not.
635
636 When working with threads that do not use the :c:struct:`thread_master`
637 event loop, per-thread buffers can be managed with:
638
639 .. c:function:: void zlog_tls_buffer_init(void)
640
641 Set up thread-local buffering for log messages. This function may be
642 called repeatedly without adverse effects, but remember to call
643 :c:func:`zlog_tls_buffer_fini()` at thread exit.
644
645 .. warning::
646
647 If this function is called, but :c:func:`zlog_tls_buffer_flush()` is
648 not used, log message output will lag behind since messages will only be
649 written out when the buffer is full.
650
651 Exiting the thread without calling :c:func:`zlog_tls_buffer_fini()`
652 will cause buffered log messages to be lost.
653
654 .. c:function:: void zlog_tls_buffer_fini(void)
655
656 Flush pending messages and tear down thread-local log message buffering.
657 This function may be called repeatedly regardless of whether
658 :c:func:`zlog_tls_buffer_init()` was ever called.
659
660 Log targets
661 -----------
662
663 The actual logging subsystem (in :file:`lib/zlog.c`) is heavily separated
664 from the actual log writers. It uses an atomic linked-list (`zlog_targets`)
665 with RCU to maintain the log targets to be called. This list is intended to
666 function as "backend" only, it **is not used for configuration**.
667
668 Logging targets provide their configuration layer on top of this and maintain
669 their own capability to enumerate and store their configuration. Some targets
670 (e.g. syslog) are inherently single instance and just stuff their config in
671 global variables. Others (e.g. file/fd output) are multi-instance capable.
672 There is another layer boundary here between these and the VTY configuration
673 that they use.
674
675 Basic internals
676 ^^^^^^^^^^^^^^^
677
678 .. c:struct:: zlog_target
679
680 This struct needs to be filled in by any log target and then passed to
681 :c:func:`zlog_target_replace()`. After it has been registered,
682 **RCU semantics apply**. Most changes to associated data should make a
683 copy, change that, and then replace the entire struct.
684
685 Additional per-target data should be "appended" by embedding this struct
686 into a larger one, for use with `containerof()`, and
687 :c:func:`zlog_target_clone()` and :c:func:`zlog_target_free()` should be
688 used to allocate/free the entire container struct.
689
690 Do not use this structure to maintain configuration. It should only
691 contain (a copy of) the data needed to perform the actual logging. For
692 example, the syslog target uses this:
693
694 .. code-block:: c
695
696 struct zlt_syslog {
697 struct zlog_target zt;
698 int syslog_facility;
699 };
700
701 static void zlog_syslog(struct zlog_target *zt, struct zlog_msg *msgs[], size_t nmsgs)
702 {
703 struct zlt_syslog *zte = container_of(zt, struct zlt_syslog, zt);
704 size_t i;
705
706 for (i = 0; i < nmsgs; i++)
707 if (zlog_msg_prio(msgs[i]) <= zt->prio_min)
708 syslog(zlog_msg_prio(msgs[i]) | zte->syslog_facility, "%s",
709 zlog_msg_text(msgs[i], NULL));
710 }
711
712
713 .. c:function:: struct zlog_target *zlog_target_clone(struct memtype *mt, struct zlog_target *oldzt, size_t size)
714
715 Allocates a logging target struct. Note that the ``oldzt`` argument may be
716 ``NULL`` to allocate a "from scratch". If ``oldzt`` is not ``NULL``, the
717 generic bits in :c:struct:`zlog_target` are copied. **Target specific
718 bits are not copied.**
719
720 .. c:function:: struct zlog_target *zlog_target_replace(struct zlog_target *oldzt, struct zlog_target *newzt)
721
722 Adds, replaces or deletes a logging target (either ``oldzt`` or ``newzt`` may be ``NULL``.)
723
724 Returns ``oldzt`` for freeing. The target remains possibly in use by
725 other threads until the RCU cycle ends. This implies you cannot release
726 resources (e.g. memory, file descriptors) immediately.
727
728 The replace operation is not atomic; for a brief period it is possible that
729 messages are delivered on both ``oldzt`` and ``newzt``.
730
731 .. warning::
732
733 ``oldzt`` must remain **functional** until the RCU cycle ends.
734
735 .. c:function:: void zlog_target_free(struct memtype *mt, struct zlog_target *zt)
736
737 Counterpart to :c:func:`zlog_target_clone()`, frees a target (using RCU.)
738
739 .. c:member:: void (*zlog_target.logfn)(struct zlog_target *zt, struct zlog_msg *msgs[], size_t nmsg)
740
741 Called on a target to deliver "normal" logging messages. ``msgs`` is an
742 array of opaque structs containing the actual message. Use ``zlog_msg_*``
743 functions to access message data (this is done to allow some optimizations,
744 e.g. lazy formatting the message text and timestamp as needed.)
745
746 .. note::
747
748 ``logfn()`` must check each individual message's priority value against
749 the configured ``prio_min``. While the ``prio_min`` field is common to
750 all targets and used by the core logging code to early-drop unneeded log
751 messages, the array is **not** filtered for each ``logfn()`` call.
752
753 .. c:member:: void (*zlog_target.logfn_sigsafe)(struct zlog_target *zt, const char *text, size_t len)
754
755 Called to deliver "exception" logging messages (i.e. SEGV messages.)
756 Must be Async-Signal-Safe (may not allocate memory or call "complicated"
757 libc functions.) May be ``NULL`` if the log target cannot handle this.
758
759 Standard targets
760 ^^^^^^^^^^^^^^^^
761
762 :file:`lib/zlog_targets.c` provides the standard file / fd / syslog targets.
763 The syslog target is single-instance while file / fd targets can be
764 instantiated as needed. There are 3 built-in targets that are fully
765 autonomous without any config:
766
767 - startup logging to `stderr`, until either :c:func:`zlog_startup_end()` or
768 :c:func:`zlog_aux_init()` is called.
769 - stdout logging for non-daemon programs using :c:func:`zlog_aux_init()`
770 - crashlogs written to :file:`/var/tmp/frr.daemon.crashlog`
771
772 The regular CLI/command-line logging setup is handled by :file:`lib/log_vty.c`
773 which makes the appropriate instantiations of syslog / file / fd targets.
774
775 .. todo::
776
777 :c:func:`zlog_startup_end()` should do an explicit switchover from
778 startup stderr logging to configured logging. Currently, configured logging
779 starts in parallel as soon as the respective setup is executed. This results
780 in some duplicate logging.