doc/developer/logging.rst

   1 .. _logging:
   2
   3 .. highlight:: c
   4
   5 Logging
   6 =======
   7
   8 One of the most frequent decisions to make while writing code for FRR is what
   9 to log, what level to log it at, and when to log it.  Here is a list of
  10 recommendations for these decisions.
  11
  12
  13 printfrr()
  14 ----------
  15
  16 ``printfrr()`` is FRR's modified version of ``printf()``, designed to make
  17 life easier when printing nontrivial datastructures.  The following variants
  18 are available:
  19
  20 .. c:function:: ssize_t snprintfrr(char *buf, size_t len, const char *fmt, ...)
  21 .. c:function:: ssize_t vsnprintfrr(char *buf, size_t len, const char *fmt, va_list)
  22
  23    These correspond to ``snprintf``/``vsnprintf``.  If you pass NULL for buf
  24    or 0 for len, no output is written but the return value is still calculated.
  25
  26    The return value is always the full length of the output, unconstrained by
  27    `len`.  It does **not** include the terminating ``\0`` character.  A
  28    malformed format string can result in a ``-1`` return value.
  29
  30 .. c:function:: ssize_t csnprintfrr(char *buf, size_t len, const char *fmt, ...)
  31 .. c:function:: ssize_t vcsnprintfrr(char *buf, size_t len, const char *fmt, va_list)
  32
  33    Same as above, but the ``c`` stands for "continue" or "concatenate".  The
  34    output is appended to the string instead of overwriting it.
  35
  36 .. c:function:: char *asprintfrr(struct memtype *mt, const char *fmt, ...)
  37 .. c:function:: char *vasprintfrr(struct memtype *mt, const char *fmt, va_list)
  38
  39    These functions allocate a dynamic buffer (using MTYPE `mt`) and print to
  40    that.  If the format string is malformed, they return a copy of the format
  41    string, so the return value is always non-NULL and always dynamically
  42    allocated with `mt`.
  43
  44 .. c:function:: char *asnprintfrr(struct memtype *mt, char *buf, size_t len, const char *fmt, ...)
  45 .. c:function:: char *vasnprintfrr(struct memtype *mt, char *buf, size_t len, const char *fmt, va_list)
  46
  47    This variant tries to use the static buffer provided, but falls back to
  48    dynamic allocation if it is insufficient.
  49
  50    The return value can be either `buf` or a newly allocated string using
  51    `mt`.  You MUST free it like this::
  52
  53       char *ret = asnprintfrr(MTYPE_FOO, buf, sizeof(buf), ...);
  54       if (ret != buf)
  55          XFREE(MTYPE_FOO, ret);
  56
  57 .. c:function:: ssize_t bprintfrr(struct fbuf *fb, const char *fmt, ...)
  58 .. c:function:: ssize_t vbprintfrr(struct fbuf *fb, const char *fmt, va_list)
  59
  60    These are the "lowest level" functions, which the other variants listed
  61    above use to implement their functionality on top.  Mainly useful for
  62    implementing printfrr extensions since those get a ``struct fbuf *`` to
  63    write their output to.
  64
  65 .. c:macro:: FMT_NSTD(expr)
  66
  67    This macro turns off/on format warnings as needed when non-ISO-C
  68    compatible printfrr extensions are used (e.g. ``%.*p`` or ``%Ld``.)::
  69
  70       vty_out(vty, "standard compatible %pI4\n", &addr);
  71       FMT_NSTD(vty_out(vty, "non-standard %-47.*pHX\n", (int)len, buf));
  72
  73    When the frr-format plugin is in use, this macro is a no-op since the
  74    frr-format plugin supports all printfrr extensions.  Since the FRR CI
  75    includes a system with the plugin enabled, this means format errors will
  76    not slip by undetected even with FMT_NSTD.
  77
  78 .. note::
  79
  80    ``printfrr()`` does not support the ``%n`` format.
  81
  82 AS-Safety
  83 ^^^^^^^^^
  84
  85 ``printfrr()`` are AS-Safe under the following conditions:
  86
  87 * the ``[v]as[n]printfrr`` variants are not AS-Safe (allocating memory)
  88 * floating point specifiers are not AS-Safe (system printf is used for these)
  89 * the positional ``%1$d`` syntax should not be used (8 arguments are supported
  90   while AS-Safe)
  91 * extensions are only AS-Safe if their printer is AS-Safe
  92
  93 printfrr Extensions
  94 -------------------
  95
  96 ``printfrr()`` format strings can be extended with suffixes after `%p` or `%d`.
  97 Printf features like field lengths can be used normally with these extensions,
  98 e.g. ``%-15pI4`` works correctly, **except if the extension consumes the
  99 width or precision**.  Extensions that do so are listed below as ``%*pXX``
 100 rather than ``%pXX``.
 101
 102 The extension specifier after ``%p`` or ``%d`` is always an uppercase letter;
 103 by means of established pattern uppercase letters and numbers form the type
 104 identifier which may be followed by lowercase flags.
 105
 106 You can grep the FRR source for ``printfrr_ext_autoreg`` to see all extended
 107 printers and what exactly they do.  More printers are likely to be added as
 108 needed/useful, so the list here may be outdated.
 109
 110 .. note::
 111
 112    The ``zlog_*``/``flog_*`` and ``vty_out`` functions all use printfrr
 113    internally, so these extensions are available there.  However, they are
 114    **not** available when calling ``snprintf`` directly.  You need to call
 115    ``snprintfrr`` instead.
 116
 117 Networking data types
 118 ^^^^^^^^^^^^^^^^^^^^^
 119
 120 .. role:: frrfmtout(code)
 121
 122 .. frrfmt:: %pI4 (struct in_addr *, in_addr_t *)
 123
 124    :frrfmtout:`1.2.3.4`
 125
 126    ``%pI4s``: :frrfmtout:`*` — print star instead of ``0.0.0.0`` (for multicast)
 127
 128 .. frrfmt:: %pI6 (struct in6_addr *)
 129
 130    :frrfmtout:`fe80::1234`
 131
 132    ``%pI6s``: :frrfmtout:`*` — print star instead of ``::`` (for multicast)
 133
 134 .. frrfmt:: %pEA (struct ethaddr *)
 135
 136    :frrfmtout:`01:23:45:67:89:ab`
 137
 138 .. frrfmt:: %pIA (struct ipaddr *)
 139
 140    :frrfmtout:`1.2.3.4` / :frrfmtout:`fe80::1234`
 141
 142    ``%pIAs``: — print star instead of zero address (for multicast)
 143
 144 .. frrfmt:: %pFX (struct prefix *)
 145
 146    :frrfmtout:`1.2.3.0/24` / :frrfmtout:`fe80::1234/64`
 147
 148    This accepts the following types:
 149
 150    - :c:struct:`prefix`
 151    - :c:struct:`prefix_ipv4`
 152    - :c:struct:`prefix_ipv6`
 153    - :c:struct:`prefix_eth`
 154    - :c:struct:`prefix_evpn`
 155    - :c:struct:`prefix_fs`
 156
 157    It does **not** accept the following types:
 158
 159    - :c:struct:`prefix_ls`
 160    - :c:struct:`prefix_rd`
 161    - :c:struct:`prefix_ptr`
 162    - :c:struct:`prefix_sg` (use :frrfmt:`%pPSG4`)
 163    - :c:union:`prefixptr` (dereference to get :c:struct:`prefix`)
 164    - :c:union:`prefixconstptr` (dereference to get :c:struct:`prefix`)
 165
 166 .. frrfmt:: %pPSG4 (struct prefix_sg *)
 167
 168    :frrfmtout:`(*,1.2.3.4)`
 169
 170    This is *(S,G)* output for use in pimd.  (Note prefix_sg is not a prefix
 171    "subclass" like the other prefix_* structs.)
 172
 173 .. frrfmt:: %pSU (union sockunion *)
 174
 175    ``%pSU``: :frrfmtout:`1.2.3.4` / :frrfmtout:`fe80::1234`
 176
 177    ``%pSUs``: :frrfmtout:`1.2.3.4` / :frrfmtout:`fe80::1234%89`
 178    (adds IPv6 scope ID as integer)
 179
 180    ``%pSUp``: :frrfmtout:`1.2.3.4:567` / :frrfmtout:`[fe80::1234]:567`
 181    (adds port)
 182
 183    ``%pSUps``: :frrfmtout:`1.2.3.4:567` / :frrfmtout:`[fe80::1234%89]:567`
 184    (adds port and scope ID)
 185
 186 .. frrfmt:: %pRN (struct route_node *, struct bgp_node *, struct agg_node *)
 187
 188    :frrfmtout:`192.168.1.0/24` (dst-only node)
 189
 190    :frrfmtout:`2001:db8::/32 from fe80::/64` (SADR node)
 191
 192 .. frrfmt:: %pNH (struct nexthop *)
 193
 194    ``%pNHvv``: :frrfmtout:`via 1.2.3.4, eth0` — verbose zebra format
 195
 196    ``%pNHv``: :frrfmtout:`1.2.3.4, via eth0` — slightly less verbose zebra format
 197
 198    ``%pNHs``: :frrfmtout:`1.2.3.4 if 15` — same as :c:func:`nexthop2str()`
 199
 200    ``%pNHcg``: :frrfmtout:`1.2.3.4` — compact gateway only
 201
 202    ``%pNHci``: :frrfmtout:`eth0` — compact interface only
 203
 204 .. frrfmt:: %pBD (struct bgp_dest *)
 205
 206    :frrfmtout:`fe80::1234/64`
 207
 208    (only available in bgpd.)
 209
 210 .. frrfmt:: %dPF (int)
 211
 212    :frrfmtout:`AF_INET`
 213
 214    Prints an `AF_*` / `PF_*` constant.  ``PF`` is used here to avoid confusion
 215    with `AFI` constants, even though the FRR codebase prefers `AF_INET` over
 216    `PF_INET` & co.
 217
 218 .. frrfmt:: %dSO (int)
 219
 220    :frrfmtout:`SOCK_STREAM`
 221
 222 Time/interval formats
 223 ^^^^^^^^^^^^^^^^^^^^^
 224
 225 .. frrfmt:: %pTS (struct timespec *)
 226
 227 .. frrfmt:: %pTV (struct timeval *)
 228
 229 .. frrfmt:: %pTT (time_t *)
 230
 231    Above 3 options internally result in the same code being called, support
 232    the same flags and produce equal output with one exception:  ``%pTT``
 233    has no sub-second precision and the formatter will never print a
 234    (nonsensical) ``.000``.
 235
 236    Exactly one of ``I``, ``M`` or ``R`` must immediately follow after
 237    ``TS``/``TV``/``TT`` to specify whether the input is an interval, monotonic
 238    timestamp or realtime timestamp:
 239
 240    ``%pTVI``: input is an interval, not a timestamp.  Print interval.
 241
 242    ``%pTVIs``: input is an interval, convert to wallclock by subtracting it
 243    from current time (i.e. interval has passed **s**\ ince.)
 244
 245    ``%pTVIu``: input is an interval, convert to wallclock by adding it to
 246    current time (i.e. **u**\ ntil interval has passed.)
 247
 248    ``%pTVM`` - input is a timestamp on CLOCK_MONOTONIC, convert to wallclock
 249    time (by grabbing current CLOCK_MONOTONIC and CLOCK_REALTIME and doing the
 250    math) and print calendaric date.
 251
 252    ``%pTVMs`` - input is a timestamp on CLOCK_MONOTONIC, print interval
 253    **s**\ ince that timestamp (elapsed.)
 254
 255    ``%pTVMu`` - input is a timestamp on CLOCK_MONOTONIC, print interval
 256    **u**\ ntil that timestamp (deadline.)
 257
 258    ``%pTVR`` - input is a timestamp on CLOCK_REALTIME, print calendaric date.
 259
 260    ``%pTVRs`` - input is a timestamp on CLOCK_REALTIME, print interval
 261    **s**\ ince that timestamp.
 262
 263    ``%pTVRu`` - input is a timestamp on CLOCK_REALTIME, print interval
 264    **u**\ ntil that timestamp.
 265
 266    ``%pTVA`` - reserved for CLOCK_TAI in case a PTP implementation is
 267    interfaced to FRR.  Not currently implemented.
 268
 269    .. note::
 270
 271       If ``%pTVRs`` or ``%pTVRu`` are used, this is generally an indication
 272       that a CLOCK_MONOTONIC timestamp should be used instead (or added in
 273       parallel.) CLOCK_REALTIME might be adjusted by NTP, PTP or similar
 274       procedures, causing bogus intervals to be printed.
 275
 276       ``%pTVM`` on first look might be assumed to have the same problem, but
 277       on closer thought the assumption is always that current system time is
 278       correct.  And since a CLOCK_MONOTONIC interval is also quite safe to
 279       assume to be correct, the (past) absolute timestamp to be printed from
 280       this can likely be correct even if it doesn't match what CLOCK_REALTIME
 281       would have indicated at that point in the past.  This logic does,
 282       however, not quite work for *future* times.
 283
 284       Generally speaking, almost all use cases in FRR should (and do) use
 285       CLOCK_MONOTONIC (through :c:func:`monotime()`.)
 286
 287    Flags common to printing calendar times and intervals:
 288
 289    ``p``: include spaces in appropriate places (depends on selected format.)
 290
 291    ``%p.3TV...``: specify sub-second resolution (use with ``FMT_NSTD`` to
 292    suppress gcc warning.)  As noted above, ``%pTT`` will never print sub-second
 293    digits since there are none.  Only some formats support printing sub-second
 294    digits and the default may vary.
 295
 296    The following flags are available for printing calendar times/dates:
 297
 298    (no flag): :frrfmtout:`Sat Jan  1 00:00:00 2022` - print output from
 299    ``ctime()``, in local time zone.  Since FRR does not currently use/enable
 300    locale support, this is always the C locale.  (Locale support getting added
 301    is unlikely for the time being and would likely break other things worse
 302    than this.)
 303
 304    ``i``: :frrfmtout:`2022-01-01T00:00:00.123` - ISO8601 timestamp in local
 305    time zone (note there is no ``Z`` or ``+00:00`` suffix.)  Defaults to
 306    millisecond precision.
 307
 308    ``ip``: :frrfmtout:`2022-01-01 00:00:00.123` - use readable form of ISO8601
 309    with space instead of ``T`` separator.
 310
 311    The following flags are available for printing intervals:
 312
 313    (no flag): :frrfmtout:`9w9d09:09:09.123` - does not match any
 314    preexisting format;  added because it does not lose precision (like ``t``)
 315    for longer intervals without printing huge numbers (like ``h``/``m``).
 316    Defaults to millisecond precision.  The week/day fields are left off if
 317    they're zero, ``p`` adds a space after the respective letter.
 318
 319    ``t``: :frrfmtout:`9w9d09h`, :frrfmtout:`9d09h09m`, :frrfmtout:`09:09:09` -
 320    this replaces :c:func:`frrtime_to_interval()`.  ``p`` adds spaces after
 321    week/day/hour letters.
 322
 323    ``d``: print decimal number of seconds.  Defaults to millisecond precision.
 324
 325    ``x`` / ``tx`` / ``dx``: Like no flag / ``t`` / ``d``, but print
 326    :frrfmtout:`-` for zero or negative intervals (for use with unset timers.)
 327
 328    ``h``: :frrfmtout:`09:09:09`
 329
 330    ``hx``: :frrfmtout:`09:09:09`, :frrfmtout:`--:--:--` - this replaces
 331    :c:func:`pim_time_timer_to_hhmmss()`.
 332
 333    ``m``: :frrfmtout:`09:09`
 334
 335    ``mx``: :frrfmtout:`09:09`, :frrfmtout:`--:--` - this replaces
 336    :c:func:`pim_time_timer_to_mmss()`.
 337
 338 General utility formats
 339 ^^^^^^^^^^^^^^^^^^^^^^^
 340
 341 .. frrfmt:: %m (no argument)
 342
 343    :frrfmtout:`Permission denied`
 344
 345    Prints ``strerror(errno)``.  Does **not** consume any input argument, don't
 346    pass ``errno``!
 347
 348    (This is a GNU extension not specific to FRR.  FRR guarantees it is
 349    available on all systems in printfrr, though BSDs support it in printf too.)
 350
 351 .. frrfmt:: %pSQ (char *)
 352
 353    ([S]tring [Q]uote.)  Like ``%s``, but produce a quoted string.  Options:
 354
 355       ``n`` - treat ``NULL`` as empty string instead.
 356
 357       ``q`` - include ``""`` quotation marks.  Note: ``NULL`` is printed as
 358       ``(null)``, not ``"(null)"`` unless ``n`` is used too.  This is
 359       intentional.
 360
 361       ``s`` - use escaping suitable for RFC5424 syslog.  This means ``]`` is
 362       escaped too.
 363
 364    If a length is specified (``%*pSQ`` or ``%.*pSQ``), null bytes in the input
 365    string do not end the string and are just printed as ``\x00``.
 366
 367 .. frrfmt:: %pSE (char *)
 368
 369    ([S]tring [E]scape.)  Like ``%s``, but escape special characters.
 370    Options:
 371
 372       ``n`` - treat ``NULL`` as empty string instead.
 373
 374    Unlike :frrfmt:`%pSQ`, this escapes many more characters that are fine for
 375    a quoted string but not on their own.
 376
 377    If a length is specified (``%*pSE`` or ``%.*pSE``), null bytes in the input
 378    string do not end the string and are just printed as ``\x00``.
 379
 380 .. frrfmt:: %pVA (struct va_format *)
 381
 382    Recursively invoke printfrr, with arguments passed in through:
 383
 384    .. c:struct:: va_format
 385
 386       .. c:member:: const char *fmt
 387
 388          Format string to use for the recursive printfrr call.
 389
 390       .. c:member:: va_list *va
 391
 392          Formatting arguments.  Note this is passed as a pointer, not - as in
 393          most other places - a direct struct reference.  Internally uses
 394          ``va_copy()`` so repeated calls can be made (e.g. for determining
 395          output length.)
 396
 397 .. frrfmt:: %pFB (struct fbuf *)
 398
 399    Insert text from a ``struct fbuf *``, i.e. the output of a call to
 400    :c:func:`bprintfrr()`.
 401
 402 .. frrfmt:: %*pHX (void *, char *, unsigned char *)
 403
 404    ``%pHX``: :frrfmtout:`12 34 56 78`
 405
 406    ``%pHXc``: :frrfmtout:`12:34:56:78` (separate with [c]olon)
 407
 408    ``%pHXn``: :frrfmtout:`12345678` (separate with [n]othing)
 409
 410    Insert hexdump.  This specifier requires a precision or width to be
 411    specified.  A precision (``%.*pHX``) takes precedence, but generates a
 412    compiler warning since precisions are undefined for ``%p`` in ISO C.  If
 413    no precision is given, the width is used instead (and normal handling of
 414    the width is suppressed).
 415
 416    Note that width and precision are ``int`` arguments, not ``size_t``.  Use
 417    like::
 418
 419      char *buf;
 420      size_t len;
 421
 422      snprintfrr(out, sizeof(out), "... %*pHX ...", (int)len, buf);
 423
 424      /* with padding to width - would generate a warning due to %.*p */
 425      FMT_NSTD(snprintfrr(out, sizeof(out), "... %-47.*pHX ...", (int)len, buf));
 426
 427 .. frrfmt:: %*pHS (void *, char *, unsigned char *)
 428
 429    ``%pHS``: :frrfmtout:`hex.dump`
 430
 431    This is a complementary format for :frrfmt:`%*pHX` to print the text
 432    representation for a hexdump.  Non-printable characters are replaced with
 433    a dot.
 434
 435 Integer formats
 436 ^^^^^^^^^^^^^^^
 437
 438 .. note::
 439
 440    These formats currently only exist for advanced type checking with the
 441    ``frr-format`` GCC plugin.  They should not be used directly since they will
 442    cause compiler warnings when used without the plugin.  Use with
 443    :c:macro:`FMT_NSTD` if necessary.
 444
 445    It is possible ISO C23 may introduce another format for these, possibly
 446    ``%w64d`` discussed in `JTC 1/SC 22/WG 14/N2680 <http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2680.pdf>`_.
 447
 448 .. frrfmt:: %Lu (uint64_t)
 449
 450    :frrfmtout:`12345`
 451
 452 .. frrfmt:: %Ld (int64_t)
 453
 454    :frrfmtout:`-12345`
 455
 456 Log levels
 457 ----------
 458
 459 Errors and warnings
 460 ^^^^^^^^^^^^^^^^^^^
 461
 462 If it is something that the user will want to look at and maybe do
 463 something, it is either an **error** or a **warning**.
 464
 465 We're expecting that warnings and errors are in some way visible to the
 466 user (in the worst case by looking at the log after the network broke, but
 467 maybe by a syslog collector from all routers.)  Therefore, anything that
 468 needs to get the user in the loop—and only these things—are warnings or
 469 errors.
 470
 471 Note that this doesn't necessarily mean the user needs to fix something in
 472 the FRR instance.  It also includes when we detect something else needs
 473 fixing, for example another router, the system we're running on, or the
 474 configuration.  The common point is that the user should probably do
 475 *something*.
 476
 477 Deciding between a warning and an error is slightly less obvious; the rule
 478 of thumb here is that an error will cause considerable fallout beyond its
 479 direct effect.  Closing a BGP session due to a malformed update is an error
 480 since all routes from the peer are dropped; discarding one route because
 481 its attributes don't make sense is a warning.
 482
 483 This also loosely corresponds to the kind of reaction we're expecting from
 484 the user.  An error is likely to need immediate response while a warning
 485 might be snoozed for a bit and addressed as part of general maintenance.
 486 If a problem will self-repair (e.g. by retransmits), it should be a
 487 warning—unless the impact until that self-repair is very harsh.
 488
 489 Examples for warnings:
 490
 491 * a BGP update, LSA or LSP could not be processed, but operation is
 492   proceeding and the broken pieces are likely to self-fix later
 493 * some kind of controller cannot be reached, but we can work without it
 494 * another router is using some unknown or unsupported capability
 495
 496 Examples for errors:
 497
 498 * dropping a BGP session due to malformed data
 499 * a socket for routing protocol operation cannot be opened
 500 * desynchronization from network state because something went wrong
 501 * *everything that we as developers would really like to be notified about,
 502   i.e. some assumption in the code isn't holding up*
 503
 504
 505 Informational messages
 506 ^^^^^^^^^^^^^^^^^^^^^^
 507
 508 Anything that provides introspection to the user during normal operation
 509 is an **info** message.
 510
 511 This includes all kinds of operational state transitions and events,
 512 especially if they might be interesting to the user during the course of
 513 figuring out a warning or an error.
 514
 515 By itself, these messages should mostly be statements of fact.  They might
 516 indicate the order and relationship in which things happened.  Also covered
 517 are conditions that might be "operational issues" like a link failure due
 518 to an unplugged cable.  If it's pretty much the point of running a routing
 519 daemon for, it's not a warning or an error, just business as usual.
 520
 521 The user should be able to see the state of these bits from operational
 522 state output, i.e. `show interface` or `show foobar neighbors`.  The log
 523 message indicating the change may have been printed weeks ago, but the
 524 state can always be viewed.  (If some state change has an info message but
 525 no "show" command, maybe that command needs to be added.)
 526
 527 Examples:
 528
 529 * all kinds of up/down state changes
 530
 531   * interface coming up or going down
 532   * addresses being added or deleted
 533   * peers and neighbors coming up or going down
 534
 535 * rejection of some routes due to user-configured route maps
 536 * backwards compatibility handling because another system on the network
 537   has a different or smaller feature set
 538
 539 .. note::
 540    The previously used **notify** priority is replaced with *info* in all
 541    cases.  We don't currently have a well-defined use case for it.
 542
 543
 544 Debug messages and asserts
 545 ^^^^^^^^^^^^^^^^^^^^^^^^^^
 546
 547 Everything that is only interesting on-demand, or only while developing,
 548 is a **debug** message.  It might be interesting to the user for a
 549 particularly evasive issue, but in general these are details that an
 550 average user might not even be able to make sense of.
 551
 552 Most (or all?) debug messages should be behind a `debug foobar` category
 553 switch that controls which subset of these messages is currently
 554 interesting and thus printed.  If a debug message doesn't have such a
 555 guard, there should be a good explanation as to why.
 556
 557 Conversely, debug messages are the only thing that should be guarded by
 558 these switches.  Neither info nor warning or error messages should be
 559 hidden in this way.
 560
 561 **Asserts** should only be used as pretty crashes.  We are expecting that
 562 asserts remain enabled in production builds, but please try to not use
 563 asserts in a way that would cause a security problem if the assert wasn't
 564 there (i.e. don't use them for length checks.)
 565
 566 The purpose of asserts is mainly to help development and bug hunting.  If
 567 the daemon crashes, then having some more information is nice, and the
 568 assert can provide crucial hints that cut down on the time needed to track
 569 an issue.  That said, if the issue can be reasonably handled and/or isn't
 570 going to crash the daemon, it shouldn't be an assert.
 571
 572 For anything else where internal constraints are violated but we're not
 573 breaking due to it, it's an error instead (not a debug.)  These require
 574 "user action" of notifying the developers.
 575
 576 Examples:
 577
 578 * mismatched :code:`prev`/:code:`next` pointers in lists
 579 * some field that is absolutely needed is :code:`NULL`
 580 * any other kind of data structure corruption that will cause the daemon
 581   to crash sooner or later, one way or another
 582
 583 Thread-local buffering
 584 ----------------------
 585
 586 The core logging code in :file:`lib/zlog.c` allows setting up per-thread log
 587 message buffers in order to improve logging performance.  The following rules
 588 apply for this buffering:
 589
 590 * Only messages of priority *DEBUG* or *INFO* are buffered.
 591 * Any higher-priority message causes the thread's entire buffer to be flushed,
 592   thus message ordering is preserved on a per-thread level.
 593 * There is no guarantee on ordering between different threads;  in most cases
 594   this is arbitrary to begin with since the threads essentially race each
 595   other in printing log messages.  If an order is established with some
 596   synchronization primitive, add calls to :c:func:`zlog_tls_buffer_flush()`.
 597 * The buffers are only ever accessed by the thread they are created by.  This
 598   means no locking is necessary.
 599
 600 Both the main/default thread and additional threads created by
 601 :c:func:`frr_pthread_new()` with the default :c:func:`frr_run()` handler will
 602 initialize thread-local buffering and call :c:func:`zlog_tls_buffer_flush()`
 603 when idle.
 604
 605 If some piece of code runs for an extended period, it may be useful to insert
 606 calls to :c:func:`zlog_tls_buffer_flush()` in appropriate places:
 607
 608 .. c:function:: void zlog_tls_buffer_flush(void)
 609
 610    Write out any pending log messages that the calling thread may have in its
 611    buffer.  This function is safe to call regardless of the per-thread log
 612    buffer being set up / in use or not.
 613
 614 When working with threads that do not use the :c:struct:`thread_master`
 615 event loop, per-thread buffers can be managed with:
 616
 617 .. c:function:: void zlog_tls_buffer_init(void)
 618
 619    Set up thread-local buffering for log messages.  This function may be
 620    called repeatedly without adverse effects, but remember to call
 621    :c:func:`zlog_tls_buffer_fini()` at thread exit.
 622
 623    .. warning::
 624
 625       If this function is called, but :c:func:`zlog_tls_buffer_flush()` is
 626       not used, log message output will lag behind since messages will only be
 627       written out when the buffer is full.
 628
 629       Exiting the thread without calling :c:func:`zlog_tls_buffer_fini()`
 630       will cause buffered log messages to be lost.
 631
 632 .. c:function:: void zlog_tls_buffer_fini(void)
 633
 634    Flush pending messages and tear down thread-local log message buffering.
 635    This function may be called repeatedly regardless of whether
 636    :c:func:`zlog_tls_buffer_init()` was ever called.
 637
 638 Log targets
 639 -----------
 640
 641 The actual logging subsystem (in :file:`lib/zlog.c`) is heavily separated
 642 from the actual log writers.  It uses an atomic linked-list (`zlog_targets`)
 643 with RCU to maintain the log targets to be called.  This list is intended to
 644 function as "backend" only, it **is not used for configuration**.
 645
 646 Logging targets provide their configuration layer on top of this and maintain
 647 their own capability to enumerate and store their configuration.  Some targets
 648 (e.g. syslog) are inherently single instance and just stuff their config in
 649 global variables.  Others (e.g. file/fd output) are multi-instance capable.
 650 There is another layer boundary here between these and the VTY configuration
 651 that they use.
 652
 653 Basic internals
 654 ^^^^^^^^^^^^^^^
 655
 656 .. c:struct:: zlog_target
 657
 658    This struct needs to be filled in by any log target and then passed to
 659    :c:func:`zlog_target_replace()`.  After it has been registered,
 660    **RCU semantics apply**.  Most changes to associated data should make a
 661    copy, change that, and then replace the entire struct.
 662
 663    Additional per-target data should be "appended" by embedding this struct
 664    into a larger one, for use with `containerof()`, and
 665    :c:func:`zlog_target_clone()` and :c:func:`zlog_target_free()` should be
 666    used to allocate/free the entire container struct.
 667
 668    Do not use this structure to maintain configuration.  It should only
 669    contain (a copy of) the data needed to perform the actual logging.  For
 670    example, the syslog target uses this:
 671
 672    .. code-block:: c
 673
 674       struct zlt_syslog {
 675           struct zlog_target zt;
 676           int syslog_facility;
 677       };
 678
 679       static void zlog_syslog(struct zlog_target *zt, struct zlog_msg *msgs[], size_t nmsgs)
 680       {
 681           struct zlt_syslog *zte = container_of(zt, struct zlt_syslog, zt);
 682           size_t i;
 683
 684           for (i = 0; i < nmsgs; i++)
 685               if (zlog_msg_prio(msgs[i]) <= zt->prio_min)
 686                   syslog(zlog_msg_prio(msgs[i]) | zte->syslog_facility, "%s",
 687                          zlog_msg_text(msgs[i], NULL));
 688       }
 689
 690
 691 .. c:function:: struct zlog_target *zlog_target_clone(struct memtype *mt, struct zlog_target *oldzt, size_t size)
 692
 693    Allocates a logging target struct.  Note that the ``oldzt`` argument may be
 694    ``NULL`` to allocate a "from scratch".  If ``oldzt`` is not ``NULL``, the
 695    generic bits in :c:struct:`zlog_target` are copied.  **Target specific
 696    bits are not copied.**
 697
 698 .. c:function:: struct zlog_target *zlog_target_replace(struct zlog_target *oldzt, struct zlog_target *newzt)
 699
 700    Adds, replaces or deletes a logging target (either ``oldzt`` or ``newzt`` may be ``NULL``.)
 701
 702    Returns ``oldzt`` for freeing.  The target remains possibly in use by
 703    other threads until the RCU cycle ends.  This implies you cannot release
 704    resources (e.g. memory, file descriptors) immediately.
 705
 706    The replace operation is not atomic; for a brief period it is possible that
 707    messages are delivered on both ``oldzt`` and ``newzt``.
 708
 709    .. warning::
 710
 711       ``oldzt`` must remain **functional** until the RCU cycle ends.
 712
 713 .. c:function:: void zlog_target_free(struct memtype *mt, struct zlog_target *zt)
 714
 715    Counterpart to :c:func:`zlog_target_clone()`, frees a target (using RCU.)
 716
 717 .. c:member:: void (*zlog_target.logfn)(struct zlog_target *zt, struct zlog_msg *msgs[], size_t nmsg)
 718
 719    Called on a target to deliver "normal" logging messages.  ``msgs`` is an
 720    array of opaque structs containing the actual message.  Use ``zlog_msg_*``
 721    functions to access message data (this is done to allow some optimizations,
 722    e.g.  lazy formatting the message text and timestamp as needed.)
 723
 724    .. note::
 725
 726       ``logfn()`` must check each individual message's priority value against
 727       the configured ``prio_min``.  While the ``prio_min`` field is common to
 728       all targets and used by the core logging code to early-drop unneeded log
 729       messages, the array is **not** filtered for each ``logfn()`` call.
 730
 731 .. c:member:: void (*zlog_target.logfn_sigsafe)(struct zlog_target *zt, const char *text, size_t len)
 732
 733    Called to deliver "exception" logging messages (i.e. SEGV messages.)
 734    Must be Async-Signal-Safe (may not allocate memory or call "complicated"
 735    libc functions.)  May be ``NULL`` if the log target cannot handle this.
 736
 737 Standard targets
 738 ^^^^^^^^^^^^^^^^
 739
 740 :file:`lib/zlog_targets.c` provides the standard file / fd / syslog targets.
 741 The syslog target is single-instance while file / fd targets can be
 742 instantiated as needed.  There are 3 built-in targets that are fully
 743 autonomous without any config:
 744
 745 - startup logging to `stderr`, until either :c:func:`zlog_startup_end()` or
 746   :c:func:`zlog_aux_init()` is called.
 747 - stdout logging for non-daemon programs using :c:func:`zlog_aux_init()`
 748 - crashlogs written to :file:`/var/tmp/frr.daemon.crashlog`
 749
 750 The regular CLI/command-line logging setup is handled by :file:`lib/log_vty.c`
 751 which makes the appropriate instantiations of syslog / file / fd targets.
 752
 753 .. todo::
 754
 755   :c:func:`zlog_startup_end()` should do an explicit switchover from
 756   startup stderr logging to configured logging.  Currently, configured logging
 757   starts in parallel as soon as the respective setup is executed.  This results
 758   in some duplicate logging.