doc/developer/logging.rst

   1 .. _logging:
   2
   3 .. highlight:: c
   4
   5 Logging
   6 =======
   7
   8 One of the most frequent decisions to make while writing code for FRR is what
   9 to log, what level to log it at, and when to log it.  Here is a list of
  10 recommendations for these decisions.
  11
  12
  13 printfrr()
  14 ----------
  15
  16 ``printfrr()`` is FRR's modified version of ``printf()``, designed to make
  17 life easier when printing nontrivial data structures.  The following variants
  18 are available:
  19
  20 .. c:function:: ssize_t snprintfrr(char *buf, size_t len, const char *fmt, ...)
  21 .. c:function:: ssize_t vsnprintfrr(char *buf, size_t len, const char *fmt, va_list)
  22
  23    These correspond to ``snprintf``/``vsnprintf``.  If you pass NULL for buf
  24    or 0 for len, no output is written but the return value is still calculated.
  25
  26    The return value is always the full length of the output, unconstrained by
  27    `len`.  It does **not** include the terminating ``\0`` character.  A
  28    malformed format string can result in a ``-1`` return value.
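
   A common use of this return value is detecting truncation.  The sketch
   below is a minimal illustration, not code from the FRR tree: the helper
   names ``output_complete()`` and ``format_peer()`` are hypothetical, and
   ``snprintfrr`` is mapped onto plain ``snprintf()`` only so that the example
   compiles outside FRR (inside the tree, include ``printfrr.h`` instead).

   .. code-block:: c

      #include <stdbool.h>
      #include <stdio.h>
      #include <sys/types.h>

      /* Stand-in for building outside FRR; drop this when using printfrr.h. */
      #define snprintfrr snprintf

      /* True if the whole output plus the terminating \0 fit into len bytes. */
      static bool output_complete(ssize_t ret, size_t len)
      {
          return ret >= 0 && (size_t)ret < len;
      }

      /* Hypothetical caller: format a label, report whether it is complete. */
      static bool format_peer(char *buf, size_t len, const char *name,
                              unsigned int id)
      {
          ssize_t ret = snprintfrr(buf, len, "peer %s id %u", name, id);

          return output_complete(ret, len);
      }

   Because the return value does not count the terminating ``\0``, the output
   is complete only if the return value is strictly less than `len`.  A first
   call with a NULL buffer can be used in the same way to size an allocation.
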
  29
  30 .. c:function:: ssize_t csnprintfrr(char *buf, size_t len, const char *fmt, ...)
  31 .. c:function:: ssize_t vcsnprintfrr(char *buf, size_t len, const char *fmt, va_list)
  32
  33    Same as above, but the ``c`` stands for "continue" or "concatenate".  The
  34    output is appended to the string instead of overwriting it.
  35
  36 .. c:function:: char *asprintfrr(struct memtype *mt, const char *fmt, ...)
  37 .. c:function:: char *vasprintfrr(struct memtype *mt, const char *fmt, va_list)
  38
  39    These functions allocate a dynamic buffer (using MTYPE `mt`) and print to
  40    that.  If the format string is malformed, they return a copy of the format
  41    string, so the return value is always non-NULL and always dynamically
  42    allocated with `mt`.
  43
  44 .. c:function:: char *asnprintfrr(struct memtype *mt, char *buf, size_t len, const char *fmt, ...)
  45 .. c:function:: char *vasnprintfrr(struct memtype *mt, char *buf, size_t len, const char *fmt, va_list)
  46
  47    This variant tries to use the static buffer provided, but falls back to
  48    dynamic allocation if it is insufficient.
  49
  50    The return value can be either `buf` or a newly allocated string using
  51    `mt`.  You MUST free it like this::
  52
  53       char *ret = asnprintfrr(MTYPE_FOO, buf, sizeof(buf), ...);
  54       if (ret != buf)
  55          XFREE(MTYPE_FOO, ret);
  56
  57 .. c:function:: ssize_t bprintfrr(struct fbuf *fb, const char *fmt, ...)
  58 .. c:function:: ssize_t vbprintfrr(struct fbuf *fb, const char *fmt, va_list)
  59
  60    These are the "lowest level" functions, on top of which the other variants
  61    listed above are implemented.  They are mainly useful for implementing
  62    printfrr extensions, since those get a ``struct fbuf *`` to write their
  63    output to.
  64
  65 .. c:macro:: FMT_NSTD(expr)
  66
  67    This macro turns off/on format warnings as needed when non-ISO-C
  68    compatible printfrr extensions are used (e.g. ``%.*p`` or ``%Ld``.)::
  69
  70       vty_out(vty, "standard compatible %pI4\n", &addr);
  71       FMT_NSTD(vty_out(vty, "non-standard %-47.*pHX\n", (int)len, buf));
  72
  73    When the frr-format plugin is in use, this macro is a no-op since the
  74    frr-format plugin supports all printfrr extensions.  Since the FRR CI
  75    includes a system with the plugin enabled, this means format errors will
  76    not slip by undetected even with FMT_NSTD.
  77
  78 .. note::
  79
  80    ``printfrr()`` does not support the ``%n`` format.
  81
  82 AS-Safety
  83 ^^^^^^^^^
  84
  85 The ``printfrr()`` functions are AS-Safe (async-signal-safe), with the
  86 following restrictions:
  87
  88 * the ``[v]as[n]printfrr`` variants are not AS-Safe (they allocate memory)
  89 * floating point specifiers are not AS-Safe (system printf is used for these)
  90 * the positional ``%1$d`` syntax should not be used (only up to 8 arguments
  91   are supported while remaining AS-Safe)
  92 * extensions are only AS-Safe if their printer is AS-Safe
  92
  93 printfrr Extensions
  94 -------------------
  95
  96 ``printfrr()`` format strings can be extended with suffixes after ``%p`` or ``%d``.
  97 Printf features like field lengths can be used normally with these extensions,
  98 e.g. ``%-15pI4`` works correctly, **except if the extension consumes the
  99 width or precision**.  Extensions that do so are listed below as ``%*pXX``
 100 rather than ``%pXX``.
 101
 102 The extension specifier after ``%p`` or ``%d`` always starts with an uppercase
 103 letter; by established convention, uppercase letters and digits form the type
 104 identifier, which may be followed by lowercase flags.
 105
 106 You can grep the FRR source for ``printfrr_ext_autoreg`` to see all extended
 107 printers and what exactly they do.  More printers are likely to be added as
 108 needed/useful, so the list here may be outdated.
 109
 110 .. note::
 111
 112    The ``zlog_*``/``flog_*`` and ``vty_out`` functions all use printfrr
 113    internally, so these extensions are available there.  However, they are
 114    **not** available when calling ``snprintf`` directly.  You need to call
 115    ``snprintfrr`` instead.
 116
 117 Networking data types
 118 ^^^^^^^^^^^^^^^^^^^^^
 119
 120 .. role:: frrfmtout(code)
 121
 122 .. frrfmt:: %pI4 (struct in_addr *, in_addr_t *)
 123
 124    :frrfmtout:`1.2.3.4`
 125
 126    ``%pI4s``: :frrfmtout:`*` — print star instead of ``0.0.0.0`` (for multicast)
 127
 128 .. frrfmt:: %pI6 (struct in6_addr *)
 129
 130    :frrfmtout:`fe80::1234`
 131
 132    ``%pI6s``: :frrfmtout:`*` — print star instead of ``::`` (for multicast)
 133
 134 .. frrfmt:: %pEA (struct ethaddr *)
 135
 136    :frrfmtout:`01:23:45:67:89:ab`
 137
 138 .. frrfmt:: %pIA (struct ipaddr *)
 139
 140    :frrfmtout:`1.2.3.4` / :frrfmtout:`fe80::1234`
 141
 142    ``%pIAs``: :frrfmtout:`*` - print star instead of zero address (for multicast)
 143
 144 .. frrfmt:: %pFX (struct prefix *)
 145
 146    :frrfmtout:`1.2.3.0/24` / :frrfmtout:`fe80::1234/64`
 147
 148    This accepts the following types:
 149
 150    - :c:struct:`prefix`
 151    - :c:struct:`prefix_ipv4`
 152    - :c:struct:`prefix_ipv6`
 153    - :c:struct:`prefix_eth`
 154    - :c:struct:`prefix_evpn`
 155    - :c:struct:`prefix_fs`
 156
 157    It does **not** accept the following types:
 158
 159    - :c:struct:`prefix_ls`
 160    - :c:struct:`prefix_rd`
 161    - :c:struct:`prefix_ptr`
 162    - :c:struct:`prefix_sg` (use :frrfmt:`%pPSG4`)
 163    - :c:union:`prefixptr` (dereference to get :c:struct:`prefix`)
 164    - :c:union:`prefixconstptr` (dereference to get :c:struct:`prefix`)
 165
 166    Options:
 167
 168    ``%pFXh``: (address only) :frrfmtout:`1.2.3.0` / :frrfmtout:`fe80::1234`
 169
 170 .. frrfmt:: %pPSG4 (struct prefix_sg *)
 171
 172    :frrfmtout:`(*,1.2.3.4)`
 173
 174    This is *(S,G)* output for use in pimd.  (Note prefix_sg is not a prefix
 175    "subclass" like the other prefix_* structs.)
 176
 177 .. frrfmt:: %pSU (union sockunion *)
 178
 179    ``%pSU``: :frrfmtout:`1.2.3.4` / :frrfmtout:`fe80::1234`
 180
 181    ``%pSUs``: :frrfmtout:`1.2.3.4` / :frrfmtout:`fe80::1234%89`
 182    (adds IPv6 scope ID as integer)
 183
 184    ``%pSUp``: :frrfmtout:`1.2.3.4:567` / :frrfmtout:`[fe80::1234]:567`
 185    (adds port)
 186
 187    ``%pSUps``: :frrfmtout:`1.2.3.4:567` / :frrfmtout:`[fe80::1234%89]:567`
 188    (adds port and scope ID)
 189
 190 .. frrfmt:: %pRN (struct route_node *, struct bgp_node *, struct agg_node *)
 191
 192    :frrfmtout:`192.168.1.0/24` (dst-only node)
 193
 194    :frrfmtout:`2001:db8::/32 from fe80::/64` (SADR node)
 195
 196 .. frrfmt:: %pNH (struct nexthop *)
 197
 198    ``%pNHvv``: :frrfmtout:`via 1.2.3.4, eth0` — verbose zebra format
 199
 200    ``%pNHv``: :frrfmtout:`1.2.3.4, via eth0` — slightly less verbose zebra format
 201
 202    ``%pNHs``: :frrfmtout:`1.2.3.4 if 15` — same as :c:func:`nexthop2str()`
 203
 204    ``%pNHcg``: :frrfmtout:`1.2.3.4` — compact gateway only
 205
 206    ``%pNHci``: :frrfmtout:`eth0` — compact interface only
 207
 208 .. frrfmt:: %pBD (struct bgp_dest *)
 209
 210    :frrfmtout:`fe80::1234/64`
 211
 212    (only available in bgpd.)
 213
 214 .. frrfmt:: %dPF (int)
 215
 216    :frrfmtout:`AF_INET`
 217
 218    Prints an `AF_*` / `PF_*` constant.  ``PF`` is used here to avoid confusion
 219    with `AFI` constants, even though the FRR codebase prefers `AF_INET` over
 220    `PF_INET` & co.
 221
 222 .. frrfmt:: %dSO (int)
 223
 224    :frrfmtout:`SOCK_STREAM`
 225
 226 Time/interval formats
 227 ^^^^^^^^^^^^^^^^^^^^^
 228
 229 .. frrfmt:: %pTS (struct timespec *)
 230
 231 .. frrfmt:: %pTV (struct timeval *)
 232
 233 .. frrfmt:: %pTT (time_t *)
 234
 235    The 3 formats above internally result in the same code being called, support
 236    the same flags and produce equal output, with one exception:  ``%pTT``
 237    has no sub-second precision and the formatter will never print a
 238    (nonsensical) ``.000``.
 239
 240    Exactly one of ``I``, ``M`` or ``R`` must immediately follow
 241    ``TS``/``TV``/``TT`` to specify whether the input is an interval, monotonic
 242    timestamp or realtime timestamp:
 243
 244    ``%pTVI``: input is an interval, not a timestamp.  Print interval.
 245
 246    ``%pTVIs``: input is an interval, convert to wallclock by subtracting it
 247    from current time (i.e. interval has passed **s**\ ince.)
 248
 249    ``%pTVIu``: input is an interval, convert to wallclock by adding it to
 250    current time (i.e. **u**\ ntil interval has passed.)
 251
 252    ``%pTVM``: input is a timestamp on CLOCK_MONOTONIC, convert to wallclock
 253    time (by grabbing current CLOCK_MONOTONIC and CLOCK_REALTIME and doing the
 254    math) and print calendar date.
 255
 256    ``%pTVMs``: input is a timestamp on CLOCK_MONOTONIC, print interval
 257    **s**\ ince that timestamp (elapsed.)
 258
 259    ``%pTVMu``: input is a timestamp on CLOCK_MONOTONIC, print interval
 260    **u**\ ntil that timestamp (deadline.)
 261
 262    ``%pTVR``: input is a timestamp on CLOCK_REALTIME, print calendar date.
 263
 264    ``%pTVRs``: input is a timestamp on CLOCK_REALTIME, print interval
 265    **s**\ ince that timestamp.
 266
 267    ``%pTVRu``: input is a timestamp on CLOCK_REALTIME, print interval
 268    **u**\ ntil that timestamp.
 269
 270    ``%pTVA``: reserved for CLOCK_TAI in case a PTP implementation is
 271    interfaced to FRR.  Not currently implemented.
 272
 273    .. note::
 274
 275       If ``%pTVRs`` or ``%pTVRu`` are used, this is generally an indication
 276       that a CLOCK_MONOTONIC timestamp should be used instead (or added in
 277       parallel.) CLOCK_REALTIME might be adjusted by NTP, PTP or similar
 278       procedures, causing bogus intervals to be printed.
 279
 280       At first glance ``%pTVM`` might seem to have the same problem, but the
 281       underlying assumption is always that the current system time is correct.
 282       Since a CLOCK_MONOTONIC interval can also safely be assumed to be
 283       correct, the (past) absolute timestamp derived from the two is likely
 284       correct even if it doesn't match what CLOCK_REALTIME would have
 285       indicated at that point in the past.  This logic does not, however,
 286       quite work for *future* times.
 287
 288       Generally speaking, almost all use cases in FRR should (and do) use
 289       CLOCK_MONOTONIC (through :c:func:`monotime()`.)
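
   The ``%pTVM`` conversion described above ("doing the math") amounts to
   subtracting the age of the monotonic timestamp from the current realtime
   clock.  The following is a minimal sketch of that arithmetic only; the
   helper names are hypothetical and this is not FRR's implementation.

   .. code-block:: c

      #include <stdint.h>
      #include <time.h>

      #define NSEC_PER_SEC 1000000000LL

      static int64_t ts_to_ns(struct timespec ts)
      {
          return (int64_t)ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec;
      }

      static struct timespec ns_to_ts(int64_t ns)
      {
          struct timespec ts;

          ts.tv_sec = (time_t)(ns / NSEC_PER_SEC);
          ts.tv_nsec = (long)(ns % NSEC_PER_SEC);
          if (ts.tv_nsec < 0) {
              /* keep tv_nsec in [0, 1s) for negative totals */
              ts.tv_sec -= 1;
              ts.tv_nsec += NSEC_PER_SEC;
          }
          return ts;
      }

      /* wallclock = now_real - (now_mono - ts_mono) */
      static struct timespec mono_to_wallclock(struct timespec ts_mono,
                                               struct timespec now_mono,
                                               struct timespec now_real)
      {
          int64_t age_ns = ts_to_ns(now_mono) - ts_to_ns(ts_mono);

          return ns_to_ts(ts_to_ns(now_real) - age_ns);
      }

   In practice ``now_mono`` and ``now_real`` would come from two back-to-back
   ``clock_gettime()`` calls with ``CLOCK_MONOTONIC`` and ``CLOCK_REALTIME``.
   A timestamp in the future simply yields a negative age, which is where the
   caveat about *future* times in the note above applies.
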
 290
 291    Flags common to printing calendar times and intervals:
 292
 293    ``p``: include spaces in appropriate places (depends on selected format.)
 294
 295    ``%p.3TV...``: specify sub-second resolution (use with ``FMT_NSTD`` to
 296    suppress gcc warning.)  As noted above, ``%pTT`` will never print sub-second
 297    digits since there are none.  Only some formats support printing sub-second
 298    digits and the default may vary.
 299
 300    The following flags are available for printing calendar times/dates:
 301
 302    (no flag): :frrfmtout:`Sat Jan  1 00:00:00 2022` - print output from
 303    ``ctime()``, in local time zone.  Since FRR does not currently use/enable
 304    locale support, this is always the C locale.  (Locale support getting added
 305    is unlikely for the time being and would likely break other things worse
 306    than this.)
 307
 308    ``i``: :frrfmtout:`2022-01-01T00:00:00.123` - ISO8601 timestamp in local
 309    time zone (note there is no ``Z`` or ``+00:00`` suffix.)  Defaults to
 310    millisecond precision.
 311
 312    ``ip``: :frrfmtout:`2022-01-01 00:00:00.123` - use readable form of ISO8601
 313    with space instead of ``T`` separator.
 314
 315    The following flags are available for printing intervals:
 316
 317    (no flag): :frrfmtout:`9w9d09:09:09.123` - does not match any
 318    preexisting format;  added because, unlike ``t``, it does not lose precision
 319    for longer intervals and, unlike ``h``/``m``, it does not print huge numbers.
 320    Defaults to millisecond precision.  The week/day fields are left off if
 321    they're zero; ``p`` adds a space after the respective letter.
 322
 323    ``t``: :frrfmtout:`9w9d09h`, :frrfmtout:`9d09h09m`, :frrfmtout:`09:09:09` -
 324    this replaces :c:func:`frrtime_to_interval()`.  ``p`` adds spaces after
 325    week/day/hour letters.
 326
 327    ``d``: print decimal number of seconds.  Defaults to millisecond precision.
 328
 329    ``x`` / ``tx`` / ``dx``: Like no flag / ``t`` / ``d``, but print
 330    :frrfmtout:`-` for zero or negative intervals (for use with unset timers.)
 331
 332    ``h``: :frrfmtout:`09:09:09`
 333
 334    ``hx``: :frrfmtout:`09:09:09`, :frrfmtout:`--:--:--` - this replaces
 335    :c:func:`pim_time_timer_to_hhmmss()`.
 336
 337    ``m``: :frrfmtout:`09:09`
 338
 339    ``mx``: :frrfmtout:`09:09`, :frrfmtout:`--:--` - this replaces
 340    :c:func:`pim_time_timer_to_mmss()`.
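
   To make the three shapes of the ``t`` interval format concrete, here is a
   small stand-alone sketch.  It is an approximation for illustration only:
   the function name is hypothetical, and the switch-over points (one day and
   one week) are an assumption rather than something taken from the printfrr
   source.  It expects a non-negative number of seconds.

   .. code-block:: c

      #include <stdio.h>

      #define SECS_PER_MIN  60L
      #define SECS_PER_HOUR (60L * SECS_PER_MIN)
      #define SECS_PER_DAY  (24L * SECS_PER_HOUR)
      #define SECS_PER_WEEK (7L * SECS_PER_DAY)

      /* Assumed thresholds: < 1 day hh:mm:ss, < 1 week d/h/m, else w/d/h. */
      static int format_interval_t(char *buf, size_t len, long secs)
      {
          long w = secs / SECS_PER_WEEK;
          long d = (secs % SECS_PER_WEEK) / SECS_PER_DAY;
          long h = (secs % SECS_PER_DAY) / SECS_PER_HOUR;
          long m = (secs % SECS_PER_HOUR) / SECS_PER_MIN;
          long s = secs % SECS_PER_MIN;

          if (secs < SECS_PER_DAY)
              return snprintf(buf, len, "%02ld:%02ld:%02ld", h, m, s);
          if (secs < SECS_PER_WEEK)
              return snprintf(buf, len, "%ldd%02ldh%02ldm", d, h, m);
          return snprintf(buf, len, "%ldw%ldd%02ldh", w, d, h);
      }

   The point of the format is visible in the sketch: the longer the interval,
   the coarser the smallest unit printed, so the output stays short at the
   cost of precision.
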
 341
 342 FRR library helper formats
 343 ^^^^^^^^^^^^^^^^^^^^^^^^^^
 344
 345 .. frrfmt:: %pTH (struct thread *)
 346
 347    Print remaining time on timer thread. Interval-printing flag characters
 348    listed above for ``%pTV`` can be added, e.g. ``%pTHtx``.
 349
 350    ``NULL`` pointers are printed as ``-``.
 351
 352 .. frrfmt:: %pTHD (struct thread *)
 353
 354    Print debugging information for given thread.  Sample output:
 355
 356    .. code-block:: none
 357
 358       {(thread *)NULL}
 359       {(thread *)0x55a3b5818910 arg=0x55a3b5827c50 timer  r=7.824      mld_t_query() &mld_ifp->t_query from pimd/pim6_mld.c:1369}
 360       {(thread *)0x55a3b5827230 arg=0x55a3b5827c50 read   fd=16        mld_t_recv() &mld_ifp->t_recv from pimd/pim6_mld.c:1186}
 361
 362    (The output is aligned to some degree.)
 363
 364 General utility formats
 365 ^^^^^^^^^^^^^^^^^^^^^^^
 366
 367 .. frrfmt:: %m (no argument)
 368
 369    :frrfmtout:`Permission denied`
 370
 371    Prints ``strerror(errno)``.  Does **not** consume any input argument; do not
 372    pass ``errno``!
 373
 374    (This is a GNU extension not specific to FRR.  FRR guarantees it is
 375    available on all systems in printfrr, though BSDs support it in printf too.)
 376
 377 .. frrfmt:: %pSQ (char *)
 378
 379    ([S]tring [Q]uote.)  Like ``%s``, but produce a quoted string.  Options:
 380
 381       ``n`` - treat ``NULL`` as empty string instead.
 382
 383       ``q`` - include ``""`` quotation marks.  Note: ``NULL`` is printed as
 384       ``(null)``, not ``"(null)"`` unless ``n`` is used too.  This is
 385       intentional.
 386
 387       ``s`` - use escaping suitable for RFC5424 syslog.  This means ``]`` is
 388       escaped too.
 389
 390    If a length is specified (``%*pSQ`` or ``%.*pSQ``), null bytes in the input
 391    string do not end the string and are just printed as ``\x00``.
 392
 393 .. frrfmt:: %pSE (char *)
 394
 395    ([S]tring [E]scape.)  Like ``%s``, but escape special characters.
 396    Options:
 397
 398       ``n`` - treat ``NULL`` as empty string instead.
 399
 400    Unlike :frrfmt:`%pSQ`, this escapes many more characters that are fine for
 401    a quoted string but not on their own.
 402
 403    If a length is specified (``%*pSE`` or ``%.*pSE``), null bytes in the input
 404    string do not end the string and are just printed as ``\x00``.
 405
 406 .. frrfmt:: %pVA (struct va_format *)
 407
 408    Recursively invoke printfrr, with arguments passed in through:
 409
 410    .. c:struct:: va_format
 411
 412       .. c:member:: const char *fmt
 413
 414          Format string to use for the recursive printfrr call.
 415
 416       .. c:member:: va_list *va
 417
 418          Formatting arguments.  Note this is passed as a pointer, not - as in
 419          most other places - a direct struct reference.  Internally uses
 420          ``va_copy()`` so repeated calls can be made (e.g. for determining
 421          output length.)
 422
 423 .. frrfmt:: %pFB (struct fbuf *)
 424
 425    Insert text from a ``struct fbuf *``, i.e. the output of a call to
 426    :c:func:`bprintfrr()`.
 427
 428 .. frrfmt:: %*pHX (void *, char *, unsigned char *)
 429
 430    ``%pHX``: :frrfmtout:`12 34 56 78`
 431
 432    ``%pHXc``: :frrfmtout:`12:34:56:78` (separate with [c]olon)
 433
 434    ``%pHXn``: :frrfmtout:`12345678` (separate with [n]othing)
 435
 436    Insert hexdump.  This specifier requires a precision or width to be
 437    specified.  A precision (``%.*pHX``) takes precedence, but generates a
 438    compiler warning since precisions are undefined for ``%p`` in ISO C.  If
 439    no precision is given, the width is used instead (and normal handling of
 440    the width is suppressed).
 441
 442    Note that width and precision are ``int`` arguments, not ``size_t``.  Use
 443    like::
 444
 445      char *buf;
 446      size_t len;
 447
 448      snprintfrr(out, sizeof(out), "... %*pHX ...", (int)len, buf);
 449
 450      /* with padding to width - would generate a warning due to %.*p */
 451      FMT_NSTD(snprintfrr(out, sizeof(out), "... %-47.*pHX ...", (int)len, buf));
 452
 453 .. frrfmt:: %*pHS (void *, char *, unsigned char *)
 454
 455    ``%pHS``: :frrfmtout:`hex.dump`
 456
 457    This is a complementary format for :frrfmt:`%*pHX` to print the text
 458    representation for a hexdump.  Non-printable characters are replaced with
 459    a dot.
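
   The replacement rule can be modelled in a few lines.  This is a sketch
   under assumptions, not the printer used by FRR: the function name is
   hypothetical, and "printable" is taken here to mean ASCII ``0x20`` to
   ``0x7e``.

   .. code-block:: c

      #include <stddef.h>

      /* Write the text column for len input bytes; out needs len + 1 bytes. */
      static void hexdump_text(char *out, const unsigned char *in, size_t len)
      {
          size_t i;

          for (i = 0; i < len; i++) {
              /* Assumption: printable means ASCII 0x20..0x7e. */
              if (in[i] >= 0x20 && in[i] <= 0x7e)
                  out[i] = (char)in[i];
              else
                  out[i] = '.';
          }
          out[len] = '\0';
      }

   Since both specifiers take the length the same way, a classic two-column
   hexdump line would presumably use a format string along the lines of
   ``"%*pHX  %*pHS"`` with ``(int)len, buf`` passed twice.
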
 460
 461 Integer formats
 462 ^^^^^^^^^^^^^^^
 463
 464 .. note::
 465
 466    These formats currently only exist for advanced type checking with the
 467    ``frr-format`` GCC plugin.  They should not be used directly since they will
 468    cause compiler warnings when used without the plugin.  Use with
 469    :c:macro:`FMT_NSTD` if necessary.
 470
 471    It is possible ISO C23 may introduce another format for these, possibly
 472    ``%w64d`` discussed in `JTC 1/SC 22/WG 14/N2680 <http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2680.pdf>`_.
 473
 474 .. frrfmt:: %Lu (uint64_t)
 475
 476    :frrfmtout:`12345`
 477
 478 .. frrfmt:: %Ld (int64_t)
 479
 480    :frrfmtout:`-12345`
 481
 482 Log levels
 483 ----------
 484
 485 Errors and warnings
 486 ^^^^^^^^^^^^^^^^^^^
 487
 488 If it is something that the user will want to look at and maybe act on,
 489 it is either an **error** or a **warning**.
 490
 491 We're expecting that warnings and errors are in some way visible to the
 492 user (in the worst case by looking at the log after the network broke, but
 493 maybe by a syslog collector from all routers.)  Therefore, anything that
 494 needs to get the user in the loop (and only such things) is a warning or an
 495 error.
 496
 497 Note that this doesn't necessarily mean the user needs to fix something in
 498 the FRR instance.  It also includes when we detect something else needs
 499 fixing, for example another router, the system we're running on, or the
 500 configuration.  The common point is that the user should probably do
 501 *something*.
 502
 503 Deciding between a warning and an error is slightly less obvious; the rule
 504 of thumb here is that an error will cause considerable fallout beyond its
 505 direct effect.  Closing a BGP session due to a malformed update is an error
 506 since all routes from the peer are dropped; discarding one route because
 507 its attributes don't make sense is a warning.
 508
 509 This also loosely corresponds to the kind of reaction we're expecting from
 510 the user.  An error is likely to need immediate response while a warning
 511 might be snoozed for a bit and addressed as part of general maintenance.
 512 If a problem will self-repair (e.g. by retransmits), it should be a
 513 warning—unless the impact until that self-repair is very harsh.
 514
 515 Examples for warnings:
 516
 517 * a BGP update, LSA or LSP could not be processed, but operation is
 518   proceeding and the broken pieces are likely to self-fix later
 519 * some kind of controller cannot be reached, but we can work without it
 520 * another router is using some unknown or unsupported capability
 521
 522 Examples for errors:
 523
 524 * dropping a BGP session due to malformed data
 525 * a socket for routing protocol operation cannot be opened
 526 * desynchronization from network state because something went wrong
 527 * *everything that we as developers would really like to be notified about,
 528   i.e. some assumption in the code isn't holding up*
 529
 530
 531 Informational messages
 532 ^^^^^^^^^^^^^^^^^^^^^^
 533
 534 Anything that provides introspection to the user during normal operation
 535 is an **info** message.
 536
 537 This includes all kinds of operational state transitions and events,
 538 especially if they might be interesting to the user during the course of
 539 figuring out a warning or an error.
 540
 541 By themselves, these messages should mostly be statements of fact.  They might
 542 indicate the order and relationship in which things happened.  Also covered
 543 are conditions that might be "operational issues" like a link failure due
 544 to an unplugged cable.  If dealing with it is pretty much the point of running
 545 a routing daemon, it's not a warning or an error, just business as usual.
 546
 547 The user should be able to see the state of these bits from operational
 548 state output, i.e. `show interface` or `show foobar neighbors`.  The log
 549 message indicating the change may have been printed weeks ago, but the
 550 state can always be viewed.  (If some state change has an info message but
 551 no "show" command, maybe that command needs to be added.)
 552
 553 Examples:
 554
 555 * all kinds of up/down state changes
 556
 557   * interface coming up or going down
 558   * addresses being added or deleted
 559   * peers and neighbors coming up or going down
 560
 561 * rejection of some routes due to user-configured route maps
 562 * backwards compatibility handling because another system on the network
 563   has a different or smaller feature set
 564
 565 .. note::
 566    The previously used **notice** priority is replaced with *info* in all
 567    cases.  We don't currently have a well-defined use case for it.
 568
 569
 570 Debug messages and asserts
 571 ^^^^^^^^^^^^^^^^^^^^^^^^^^
 572
 573 Everything that is only interesting on-demand, or only while developing,
 574 is a **debug** message.  It might be interesting to the user for a
 575 particularly evasive issue, but in general these are details that an
 576 average user might not even be able to make sense of.
 577
 578 Most (or all?) debug messages should be behind a `debug foobar` category
 579 switch that controls which subset of these messages is currently
 580 interesting and thus printed.  If a debug message doesn't have such a
 581 guard, there should be a good explanation as to why.
 582
 583 Conversely, debug messages are the only thing that should be guarded by
 584 these switches.  Info, warning and error messages should never be
 585 hidden in this way.
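
The guard rule can be illustrated with a small stand-alone sketch.  Everything
in it is hypothetical: the ``foobar`` daemon, its flag names, and the
``demo_debug()``/``demo_info()`` functions, which merely stand in for
``zlog_debug()``/``zlog_info()`` so that the example compiles outside the FRR
tree.

.. code-block:: c

   #include <stdarg.h>
   #include <stdio.h>

   /* Hypothetical categories toggled by "debug foobar ..." commands. */
   #define FOOBAR_DEBUG_EVENTS  (1u << 0)
   #define FOOBAR_DEBUG_PACKETS (1u << 1)

   static unsigned int foobar_debug_flags;

   #define IS_FOOBAR_DEBUG(cat) ((foobar_debug_flags & (cat)) != 0)

   /* Stand-ins for the zlog functions: count and print to stderr. */
   static unsigned int debug_count, info_count;

   static void demo_log(unsigned int *counter, const char *level,
                        const char *fmt, va_list ap)
   {
       (*counter)++;
       fprintf(stderr, "%s: ", level);
       vfprintf(stderr, fmt, ap);
       fputc('\n', stderr);
   }

   static void demo_debug(const char *fmt, ...)
   {
       va_list ap;

       va_start(ap, fmt);
       demo_log(&debug_count, "DEBUG", fmt, ap);
       va_end(ap);
   }

   static void demo_info(const char *fmt, ...)
   {
       va_list ap;

       va_start(ap, fmt);
       demo_log(&info_count, "INFO", fmt, ap);
       va_end(ap);
   }

   static void foobar_neighbor_up(const char *name)
   {
       /* State transition: an info message, never hidden by a switch. */
       demo_info("neighbor %s is up", name);

       /* Developer-level detail: only printed when the category is on. */
       if (IS_FOOBAR_DEBUG(FOOBAR_DEBUG_EVENTS))
           demo_debug("neighbor %s: hello timer armed", name);
   }

With all flags cleared, a call to ``foobar_neighbor_up()`` produces only the
info message; setting ``FOOBAR_DEBUG_EVENTS`` adds the debug line.
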
 586
 587 **Asserts** should only be used as pretty crashes.  We are expecting that
 588 asserts remain enabled in production builds, but please try to not use
 589 asserts in a way that would cause a security problem if the assert wasn't
 590 there (i.e. don't use them for length checks.)
 591
 592 The purpose of asserts is mainly to help development and bug hunting.  If
 593 the daemon crashes, then having some more information is nice, and the
 594 assert can provide crucial hints that cut down on the time needed to track
 595 an issue.  That said, if the issue can be reasonably handled and/or isn't
 596 going to crash the daemon, it shouldn't be an assert.
 597
 598 For anything else where internal constraints are violated but we're not
 599 breaking due to it, it's an error instead (not a debug.)  These require
 600 "user action" of notifying the developers.
 601
 602 Examples:
 603
 604 * mismatched :code:`prev`/:code:`next` pointers in lists
 605 * some field that is absolutely needed is :code:`NULL`
 606 * any other kind of data structure corruption that will cause the daemon
 607   to crash sooner or later, one way or another
 608
 609 Thread-local buffering
 610 ----------------------
 611
 612 The core logging code in :file:`lib/zlog.c` allows setting up per-thread log
 613 message buffers in order to improve logging performance.  The following rules
 614 apply for this buffering:
 615
 616 * Only messages of priority *DEBUG* or *INFO* are buffered.
 617 * Any higher-priority message causes the thread's entire buffer to be flushed,
 618   thus message ordering is preserved on a per-thread level.
 619 * There is no guarantee on ordering between different threads;  in most cases
 620   this is arbitrary to begin with since the threads essentially race each
 621   other in printing log messages.  If an order is established with some
 622   synchronization primitive, add calls to :c:func:`zlog_tls_buffer_flush()`.
 623 * The buffers are only ever accessed by the thread they are created by.  This
 624   means no locking is necessary.
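
The first two rules can be pictured with a toy model.  The sketch below is not
the code in :file:`lib/zlog.c`; all names are hypothetical, messages are
reduced to their syslog priority, and "writing out" just appends to an array
so that the resulting order can be inspected.

.. code-block:: c

   #include <stddef.h>
   #include <syslog.h>

   #define MODEL_BUF_MAX 8
   #define MODEL_OUT_MAX 32

   struct model_tls_buf {
       int pending[MODEL_BUF_MAX]; /* buffered, not yet written */
       size_t npending;
       int written[MODEL_OUT_MAX]; /* priorities in output order */
       size_t nwritten;
   };

   static void model_write(struct model_tls_buf *b, int prio)
   {
       if (b->nwritten < MODEL_OUT_MAX)
           b->written[b->nwritten++] = prio;
   }

   static void model_flush(struct model_tls_buf *b)
   {
       size_t i;

       for (i = 0; i < b->npending; i++)
           model_write(b, b->pending[i]);
       b->npending = 0;
   }

   static void model_log(struct model_tls_buf *b, int prio)
   {
       /* syslog priorities: a smaller number is a higher priority. */
       if (prio < LOG_INFO) {
           /* flush older messages first so ordering is preserved */
           model_flush(b);
           model_write(b, prio);
           return;
       }
       /* DEBUG and INFO are buffered; a full buffer is written out. */
       if (b->npending == MODEL_BUF_MAX)
           model_flush(b);
       b->pending[b->npending++] = prio;
   }

Logging a debug and an info message leaves the output untouched; a following
warning writes out all three in their original order.
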
 625
 626 Both the main/default thread and additional threads created by
 627 :c:func:`frr_pthread_new()` with the default :c:func:`frr_run()` handler will
 628 initialize thread-local buffering and call :c:func:`zlog_tls_buffer_flush()`
 629 when idle.
 630
 631 If some piece of code runs for an extended period, it may be useful to insert
 632 calls to :c:func:`zlog_tls_buffer_flush()` in appropriate places:
 633
 634 .. c:function:: void zlog_tls_buffer_flush(void)
 635
 636    Write out any pending log messages that the calling thread may have in its
 637    buffer.  This function is safe to call regardless of the per-thread log
 638    buffer being set up / in use or not.
 639
 640 When working with threads that do not use the :c:struct:`thread_master`
 641 event loop, per-thread buffers can be managed with:
 642
 643 .. c:function:: void zlog_tls_buffer_init(void)
 644
 645    Set up thread-local buffering for log messages.  This function may be
 646    called repeatedly without adverse effects, but remember to call
 647    :c:func:`zlog_tls_buffer_fini()` at thread exit.
 648
 649    .. warning::
 650
 651       If this function is called, but :c:func:`zlog_tls_buffer_flush()` is
 652       not used, log message output will lag behind since messages will only be
 653       written out when the buffer is full.
 654
 655       Exiting the thread without calling :c:func:`zlog_tls_buffer_fini()`
 656       will cause buffered log messages to be lost.
 657
 658 .. c:function:: void zlog_tls_buffer_fini(void)
 659
 660    Flush pending messages and tear down thread-local log message buffering.
 661    This function may be called repeatedly regardless of whether
 662    :c:func:`zlog_tls_buffer_init()` was ever called.
 663
 664 Log targets
 665 -----------
 666
 667 The actual logging subsystem (in :file:`lib/zlog.c`) is heavily separated
 668 from the actual log writers.  It uses an atomic linked-list (`zlog_targets`)
 669 with RCU to maintain the log targets to be called.  This list is intended to
 670 function as "backend" only; it **is not used for configuration**.
 671
 672 Logging targets provide their configuration layer on top of this and maintain
 673 their own capability to enumerate and store their configuration.  Some targets
 674 (e.g. syslog) are inherently single instance and just stuff their config in
 675 global variables.  Others (e.g. file/fd output) are multi-instance capable.
 676 There is another layer boundary here between these and the VTY configuration
 677 that they use.
 678
 679 Basic internals
 680 ^^^^^^^^^^^^^^^
 681
 682 .. c:struct:: zlog_target
 683
 684    This struct needs to be filled in by any log target and then passed to
 685    :c:func:`zlog_target_replace()`.  After it has been registered,
 686    **RCU semantics apply**.  Most changes to associated data should make a
 687    copy, change that, and then replace the entire struct.
 688
 689    Additional per-target data should be "appended" by embedding this struct
 690    into a larger one, for use with ``container_of()``, and
 691    :c:func:`zlog_target_clone()` and :c:func:`zlog_target_free()` should be
 692    used to allocate/free the entire container struct.
 693
 694    Do not use this structure to maintain configuration.  It should only
 695    contain (a copy of) the data needed to perform the actual logging.  For
 696    example, the syslog target uses this:
 697
 698    .. code-block:: c
 699
 700       struct zlt_syslog {
 701           struct zlog_target zt;
 702           int syslog_facility;
 703       };
 704
 705       static void zlog_syslog(struct zlog_target *zt, struct zlog_msg *msgs[], size_t nmsgs)
 706       {
 707           struct zlt_syslog *zte = container_of(zt, struct zlt_syslog, zt);
 708           size_t i;
 709
 710           for (i = 0; i < nmsgs; i++)
 711               if (zlog_msg_prio(msgs[i]) <= zt->prio_min)
 712                   syslog(zlog_msg_prio(msgs[i]) | zte->syslog_facility, "%s",
 713                          zlog_msg_text(msgs[i], NULL));
 714       }
 715
 716
 717 .. c:function:: struct zlog_target *zlog_target_clone(struct memtype *mt, struct zlog_target *oldzt, size_t size)
 718
 719    Allocates a logging target struct.  Note that the ``oldzt`` argument may be
 720    ``NULL`` to allocate a target "from scratch".  If ``oldzt`` is not ``NULL``, the
 721    generic bits in :c:struct:`zlog_target` are copied.  **Target specific
 722    bits are not copied.**
 723
 724 .. c:function:: struct zlog_target *zlog_target_replace(struct zlog_target *oldzt, struct zlog_target *newzt)
 725
 726    Adds, replaces or deletes a logging target (either ``oldzt`` or ``newzt`` may be ``NULL``.)
 727
 728    Returns ``oldzt`` for freeing.  The target remains possibly in use by
 729    other threads until the RCU cycle ends.  This implies you cannot release
 730    resources (e.g. memory, file descriptors) immediately.
 731
 732    The replace operation is not atomic; for a brief period it is possible that
 733    messages are delivered on both ``oldzt`` and ``newzt``.
 734
 735    .. warning::
 736
 737       ``oldzt`` must remain **functional** until the RCU cycle ends.
 738
 739 .. c:function:: void zlog_target_free(struct memtype *mt, struct zlog_target *zt)
 740
 741    Counterpart to :c:func:`zlog_target_clone()`, frees a target (using RCU.)
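
The copy, modify and replace cycle that these three functions support can be
sketched in isolation.  The model below is only an analogy: it uses a C11
atomic pointer exchange instead of FRR's RCU list, and every ``demo_*`` name is
hypothetical.  Real targets would go through :c:func:`zlog_target_clone()`,
:c:func:`zlog_target_replace()` and :c:func:`zlog_target_free()` instead.

.. code-block:: c

   #include <stdatomic.h>
   #include <stddef.h>
   #include <stdlib.h>

   #define demo_container_of(ptr, type, member) \
       ((type *)((char *)(ptr) - offsetof(type, member)))

   struct demo_target {       /* plays the role of struct zlog_target */
       int prio_min;
   };

   struct demo_syslog {       /* container with target-specific data */
       struct demo_target zt;
       int syslog_facility;
   };

   static _Atomic(struct demo_target *) demo_active;

   /* Build a modified copy, publish it, and hand back the old target. */
   static struct demo_target *demo_set_facility(int facility)
   {
       struct demo_target *oldzt = atomic_load(&demo_active);
       struct demo_syslog *newzte = calloc(1, sizeof(*newzte));

       if (!newzte)
           abort(); /* keeps the sketch simple */
       if (oldzt)
           newzte->zt = *oldzt; /* generic bits are copied ... */
       /* ... target-specific bits must be filled in by the caller */
       newzte->syslog_facility = facility;

       /*
        * The old target may still be in use by readers; it must stay
        * functional and may only be freed once they are done with it.
        */
       return atomic_exchange(&demo_active, &newzte->zt);
   }

The published struct is never modified in place: a change always produces a
new container, and the previous one is returned so that its release can be
deferred.
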
 742
 743 .. c:member:: void (*zlog_target.logfn)(struct zlog_target *zt, struct zlog_msg *msgs[], size_t nmsg)
 744
 745    Called on a target to deliver "normal" logging messages.  ``msgs`` is an
 746    array of opaque structs containing the actual message.  Use ``zlog_msg_*``
 747    functions to access message data (this is done to allow some optimizations,
 748    e.g. lazily formatting the message text and timestamp as needed.)
 749
 750    .. note::
 751
 752       ``logfn()`` must check each individual message's priority value against
 753       the configured ``prio_min``.  While the ``prio_min`` field is common to
 754       all targets and used by the core logging code to early-drop unneeded log
 755       messages, the array is **not** filtered for each ``logfn()`` call.
 756
 757 .. c:member:: void (*zlog_target.logfn_sigsafe)(struct zlog_target *zt, const char *text, size_t len)
 758
 759    Called to deliver "exception" logging messages (i.e. SEGV messages.)
 760    Must be Async-Signal-Safe (may not allocate memory or call "complicated"
 761    libc functions.)  May be ``NULL`` if the log target cannot handle this.
 762
 763 Standard targets
 764 ^^^^^^^^^^^^^^^^
 765
 766 :file:`lib/zlog_targets.c` provides the standard file / fd / syslog targets.
 767 The syslog target is single-instance while file / fd targets can be
 768 instantiated as needed.  There are 3 built-in targets that are fully
 769 autonomous without any config:
 770
 771 - startup logging to `stderr`, until either :c:func:`zlog_startup_end()` or
 772   :c:func:`zlog_aux_init()` is called.
 773 - stdout logging for non-daemon programs using :c:func:`zlog_aux_init()`
 774 - crashlogs written to :file:`/var/tmp/frr.daemon.crashlog`
 775
 776 The regular CLI/command-line logging setup is handled by :file:`lib/log_vty.c`
 777 which makes the appropriate instantiations of syslog / file / fd targets.
 778
 779 .. todo::
 780
 781   :c:func:`zlog_startup_end()` should do an explicit switchover from
 782   startup stderr logging to configured logging.  Currently, configured logging
 783   starts in parallel as soon as the respective setup is executed.  This results
 784   in some duplicate logging.