doc/developer/logging.rst

   1 .. _logging:
   2
   3 .. highlight:: c
   4
   5 Logging
   6 =======
   7
   8 One of the most frequent decisions to make while writing code for FRR is what
   9 to log, what level to log it at, and when to log it.  Here is a list of
  10 recommendations for these decisions.
  11
  12
  13 printfrr()
  14 ----------
  15
  16 ``printfrr()`` is FRR's modified version of ``printf()``, designed to make
  17 life easier when printing nontrivial datastructures.  The following variants
  18 are available:
  19
  20 .. c:function:: ssize_t snprintfrr(char *buf, size_t len, const char *fmt, ...)
  21 .. c:function:: ssize_t vsnprintfrr(char *buf, size_t len, const char *fmt, va_list)
  22
  23    These correspond to ``snprintf``/``vsnprintf``.  If you pass NULL for buf
  24    or 0 for len, no output is written but the return value is still calculated.
  25
  26    The return value is always the full length of the output, unconstrained by
  27    `len`.  It does **not** include the terminating ``\0`` character.  A
  28    malformed format string can result in a ``-1`` return value.
  29
  30 .. c:function:: ssize_t csnprintfrr(char *buf, size_t len, const char *fmt, ...)
  31 .. c:function:: ssize_t vcsnprintfrr(char *buf, size_t len, const char *fmt, va_list)
  32
  33    Same as above, but the ``c`` stands for "continue" or "concatenate".  The
  34    output is appended to the string instead of overwriting it.
  35
  36 .. c:function:: char *asprintfrr(struct memtype *mt, const char *fmt, ...)
  37 .. c:function:: char *vasprintfrr(struct memtype *mt, const char *fmt, va_list)
  38
  39    These functions allocate a dynamic buffer (using MTYPE `mt`) and print to
  40    that.  If the format string is malformed, they return a copy of the format
  41    string, so the return value is always non-NULL and always dynamically
  42    allocated with `mt`.
  43
  44 .. c:function:: char *asnprintfrr(struct memtype *mt, char *buf, size_t len, const char *fmt, ...)
  45 .. c:function:: char *vasnprintfrr(struct memtype *mt, char *buf, size_t len, const char *fmt, va_list)
  46
  47    This variant tries to use the static buffer provided, but falls back to
  48    dynamic allocation if it is insufficient.
  49
  50    The return value can be either `buf` or a newly allocated string using
  51    `mt`.  You MUST free it like this::
  52
  53       char *ret = asnprintfrr(MTYPE_FOO, buf, sizeof(buf), ...);
  54       if (ret != buf)
  55          XFREE(MTYPE_FOO, ret);
  56
  57 .. c:function:: ssize_t bprintfrr(struct fbuf *fb, const char *fmt, ...)
  58 .. c:function:: ssize_t vbprintfrr(struct fbuf *fb, const char *fmt, va_list)
  59
  60    These are the "lowest level" functions, which the other variants listed
  61    above use to implement their functionality on top.  Mainly useful for
  62    implementing printfrr extensions since those get a ``struct fbuf *`` to
  63    write their output to.
  64
  65 .. c:macro:: FMT_NSTD(expr)
  66
  67    This macro turns off/on format warnings as needed when non-ISO-C
  68    compatible printfrr extensions are used (e.g. ``%.*p`` or ``%Ld``.)::
  69
  70       vty_out(vty, "standard compatible %pI4\n", &addr);
  71       FMT_NSTD(vty_out(vty, "non-standard %-47.*pHX\n", (int)len, buf));
  72
  73    When the frr-format plugin is in use, this macro is a no-op since the
  74    frr-format plugin supports all printfrr extensions.  Since the FRR CI
  75    includes a system with the plugin enabled, this means format errors will
  76    not slip by undetected even with FMT_NSTD.
  77
  78 .. note::
  79
  80    ``printfrr()`` does not support the ``%n`` format.
  81
  82 AS-Safety
  83 ^^^^^^^^^
  84
  85 ``printfrr()`` are AS-Safe under the following conditions:
  86
  87 * the ``[v]as[n]printfrr`` variants are not AS-Safe (allocating memory)
  88 * floating point specifiers are not AS-Safe (system printf is used for these)
  89 * the positional ``%1$d`` syntax should not be used (8 arguments are supported
  90   while AS-Safe)
  91 * extensions are only AS-Safe if their printer is AS-Safe
  92
  93 printfrr Extensions
  94 -------------------
  95
  96 ``printfrr()`` format strings can be extended with suffixes after `%p` or `%d`.
  97 Printf features like field lengths can be used normally with these extensions,
  98 e.g. ``%-15pI4`` works correctly, **except if the extension consumes the
  99 width or precision**.  Extensions that do so are listed below as ``%*pXX``
 100 rather than ``%pXX``.
 101
 102 The extension specifier after ``%p`` or ``%d`` is always an uppercase letter;
 103 by means of established pattern uppercase letters and numbers form the type
 104 identifier which may be followed by lowercase flags.
 105
 106 You can grep the FRR source for ``printfrr_ext_autoreg`` to see all extended
 107 printers and what exactly they do.  More printers are likely to be added as
 108 needed/useful, so the list here may be outdated.
 109
 110 .. note::
 111
 112    The ``zlog_*``/``flog_*`` and ``vty_out`` functions all use printfrr
 113    internally, so these extensions are available there.  However, they are
 114    **not** available when calling ``snprintf`` directly.  You need to call
 115    ``snprintfrr`` instead.
 116
 117 Networking data types
 118 ^^^^^^^^^^^^^^^^^^^^^
 119
 120 .. role:: frrfmtout(code)
 121
 122 .. frrfmt:: %pI4 (struct in_addr *, in_addr_t *)
 123
 124    :frrfmtout:`1.2.3.4`
 125
 126    ``%pI4s``: :frrfmtout:`*` — print star instead of ``0.0.0.0`` (for multicast)
 127
 128 .. frrfmt:: %pI6 (struct in6_addr *)
 129
 130    :frrfmtout:`fe80::1234`
 131
 132    ``%pI6s``: :frrfmtout:`*` — print star instead of ``::`` (for multicast)
 133
 134 .. frrfmt:: %pEA (struct ethaddr *)
 135
 136    :frrfmtout:`01:23:45:67:89:ab`
 137
 138 .. frrfmt:: %pIA (struct ipaddr *)
 139
 140    :frrfmtout:`1.2.3.4` / :frrfmtout:`fe80::1234`
 141
 142    ``%pIAs``: — print star instead of zero address (for multicast)
 143
 144 .. frrfmt:: %pFX (struct prefix *)
 145
 146    :frrfmtout:`1.2.3.0/24` / :frrfmtout:`fe80::1234/64`
 147
 148    This accepts the following types:
 149
 150    - :c:struct:`prefix`
 151    - :c:struct:`prefix_ipv4`
 152    - :c:struct:`prefix_ipv6`
 153    - :c:struct:`prefix_eth`
 154    - :c:struct:`prefix_evpn`
 155    - :c:struct:`prefix_fs`
 156
 157    It does **not** accept the following types:
 158
 159    - :c:struct:`prefix_ls`
 160    - :c:struct:`prefix_rd`
 161    - :c:struct:`prefix_ptr`
 162    - :c:struct:`prefix_sg` (use :frrfmt:`%pPSG4`)
 163    - :c:union:`prefixptr` (dereference to get :c:struct:`prefix`)
 164    - :c:union:`prefixconstptr` (dereference to get :c:struct:`prefix`)
 165
 166 .. frrfmt:: %pPSG4 (struct prefix_sg *)
 167
 168    :frrfmtout:`(*,1.2.3.4)`
 169
 170    This is *(S,G)* output for use in pimd.  (Note prefix_sg is not a prefix
 171    "subclass" like the other prefix_* structs.)
 172
 173 .. frrfmt:: %pSU (union sockunion *)
 174
 175    ``%pSU``: :frrfmtout:`1.2.3.4` / :frrfmtout:`fe80::1234`
 176
 177    ``%pSUs``: :frrfmtout:`1.2.3.4` / :frrfmtout:`fe80::1234%89`
 178    (adds IPv6 scope ID as integer)
 179
 180    ``%pSUp``: :frrfmtout:`1.2.3.4:567` / :frrfmtout:`[fe80::1234]:567`
 181    (adds port)
 182
 183    ``%pSUps``: :frrfmtout:`1.2.3.4:567` / :frrfmtout:`[fe80::1234%89]:567`
 184    (adds port and scope ID)
 185
 186 .. frrfmt:: %pRN (struct route_node *, struct bgp_node *, struct agg_node *)
 187
 188    :frrfmtout:`192.168.1.0/24` (dst-only node)
 189
 190    :frrfmtout:`2001:db8::/32 from fe80::/64` (SADR node)
 191
 192 .. frrfmt:: %pNH (struct nexthop *)
 193
 194    ``%pNHvv``: :frrfmtout:`via 1.2.3.4, eth0` — verbose zebra format
 195
 196    ``%pNHv``: :frrfmtout:`1.2.3.4, via eth0` — slightly less verbose zebra format
 197
 198    ``%pNHs``: :frrfmtout:`1.2.3.4 if 15` — same as :c:func:`nexthop2str()`
 199
 200    ``%pNHcg``: :frrfmtout:`1.2.3.4` — compact gateway only
 201
 202    ``%pNHci``: :frrfmtout:`eth0` — compact interface only
 203
 204 .. frrfmt:: %pBD (struct bgp_dest *)
 205
 206    :frrfmtout:`fe80::1234/64`
 207
 208    (only available in bgpd.)
 209
 210 .. frrfmt:: %dPF (int)
 211
 212    :frrfmtout:`AF_INET`
 213
 214    Prints an `AF_*` / `PF_*` constant.  ``PF`` is used here to avoid confusion
 215    with `AFI` constants, even though the FRR codebase prefers `AF_INET` over
 216    `PF_INET` & co.
 217
 218 .. frrfmt:: %dSO (int)
 219
 220    :frrfmtout:`SOCK_STREAM`
 221
 222 Time/interval formats
 223 ^^^^^^^^^^^^^^^^^^^^^
 224
 225 .. frrfmt:: %pTS (struct timespec *)
 226
 227 .. frrfmt:: %pTV (struct timeval *)
 228
 229 .. frrfmt:: %pTT (time_t *)
 230
 231    Above 3 options internally result in the same code being called, support
 232    the same flags and produce equal output with one exception:  ``%pTT``
 233    has no sub-second precision and the formatter will never print a
 234    (nonsensical) ``.000``.
 235
 236    Exactly one of ``I``, ``M`` or ``R`` must immediately follow after
 237    ``TS``/``TV``/``TT`` to specify whether the input is an interval, monotonic
 238    timestamp or realtime timestamp:
 239
 240    ``%pTVI``: input is an interval, not a timestamp.  Print interval.
 241
 242    ``%pTVIs``: input is an interval, convert to wallclock by subtracting it
 243    from current time (i.e. interval has passed **s**\ ince.)
 244
 245    ``%pTVIu``: input is an interval, convert to wallclock by adding it to
 246    current time (i.e. **u**\ ntil interval has passed.)
 247
 248    ``%pTVM`` - input is a timestamp on CLOCK_MONOTONIC, convert to wallclock
 249    time (by grabbing current CLOCK_MONOTONIC and CLOCK_REALTIME and doing the
 250    math) and print calendaric date.
 251
 252    ``%pTVMs`` - input is a timestamp on CLOCK_MONOTONIC, print interval
 253    **s**\ ince that timestamp (elapsed.)
 254
 255    ``%pTVMu`` - input is a timestamp on CLOCK_MONOTONIC, print interval
 256    **u**\ ntil that timestamp (deadline.)
 257
 258    ``%pTVR`` - input is a timestamp on CLOCK_REALTIME, print calendaric date.
 259
 260    ``%pTVRs`` - input is a timestamp on CLOCK_REALTIME, print interval
 261    **s**\ ince that timestamp.
 262
 263    ``%pTVRu`` - input is a timestamp on CLOCK_REALTIME, print interval
 264    **u**\ ntil that timestamp.
 265
 266    ``%pTVA`` - reserved for CLOCK_TAI in case a PTP implementation is
 267    interfaced to FRR.  Not currently implemented.
 268
 269    .. note::
 270
 271       If ``%pTVRs`` or ``%pTVRu`` are used, this is generally an indication
 272       that a CLOCK_MONOTONIC timestamp should be used instead (or added in
 273       parallel.) CLOCK_REALTIME might be adjusted by NTP, PTP or similar
 274       procedures, causing bogus intervals to be printed.
 275
 276       ``%pTVM`` on first look might be assumed to have the same problem, but
 277       on closer thought the assumption is always that current system time is
 278       correct.  And since a CLOCK_MONOTONIC interval is also quite safe to
 279       assume to be correct, the (past) absolute timestamp to be printed from
 280       this can likely be correct even if it doesn't match what CLOCK_REALTIME
 281       would have indicated at that point in the past.  This logic does,
 282       however, not quite work for *future* times.
 283
 284       Generally speaking, almost all use cases in FRR should (and do) use
 285       CLOCK_MONOTONIC (through :c:func:`monotime()`.)
 286
 287    Flags common to printing calendar times and intervals:
 288
 289    ``p``: include spaces in appropriate places (depends on selected format.)
 290
 291    ``%p.3TV...``: specify sub-second resolution (use with ``FMT_NSTD`` to
 292    suppress gcc warning.)  As noted above, ``%pTT`` will never print sub-second
 293    digits since there are none.  Only some formats support printing sub-second
 294    digits and the default may vary.
 295
 296    The following flags are available for printing calendar times/dates:
 297
 298    (no flag): :frrfmtout:`Sat Jan  1 00:00:00 2022` - print output from
 299    ``ctime()``, in local time zone.  Since FRR does not currently use/enable
 300    locale support, this is always the C locale.  (Locale support getting added
 301    is unlikely for the time being and would likely break other things worse
 302    than this.)
 303
 304    ``i``: :frrfmtout:`2022-01-01T00:00:00.123` - ISO8601 timestamp in local
 305    time zone (note there is no ``Z`` or ``+00:00`` suffix.)  Defaults to
 306    millisecond precision.
 307
 308    ``ip``: :frrfmtout:`2022-01-01 00:00:00.123` - use readable form of ISO8601
 309    with space instead of ``T`` separator.
 310
 311    The following flags are available for printing intervals:
 312
 313    (no flag): :frrfmtout:`9w9d09:09:09.123` - does not match any
 314    preexisting format;  added because it does not lose precision (like ``t``)
 315    for longer intervals without printing huge numbers (like ``h``/``m``).
 316    Defaults to millisecond precision.  The week/day fields are left off if
 317    they're zero, ``p`` adds a space after the respective letter.
 318
 319    ``t``: :frrfmtout:`9w9d09h`, :frrfmtout:`9d09h09m`, :frrfmtout:`09:09:09` -
 320    this replaces :c:func:`frrtime_to_interval()`.  ``p`` adds spaces after
 321    week/day/hour letters.
 322
 323    ``d``: print decimal number of seconds.  Defaults to millisecond precision.
 324
 325    ``x`` / ``tx`` / ``dx``: Like no flag / ``t`` / ``d``, but print
 326    :frrfmtout:`-` for zero or negative intervals (for use with unset timers.)
 327
 328    ``h``: :frrfmtout:`09:09:09`
 329
 330    ``hx``: :frrfmtout:`09:09:09`, :frrfmtout:`--:--:--` - this replaces
 331    :c:func:`pim_time_timer_to_hhmmss()`.
 332
 333    ``m``: :frrfmtout:`09:09`
 334
 335    ``mx``: :frrfmtout:`09:09`, :frrfmtout:`--:--` - this replaces
 336    :c:func:`pim_time_timer_to_mmss()`.
 337
 338 FRR library helper formats
 339 ^^^^^^^^^^^^^^^^^^^^^^^^^^
 340
 341 .. frrfmt:: %pTH (struct thread *)
 342
 343    Print remaining time on timer thread. Interval-printing flag characters
 344    listed above for ``%pTV`` can be added, e.g. ``%pTHtx``.
 345
 346    ``NULL`` pointers are printed as ``-``.
 347
 348 .. frrfmt:: %pTHD (struct thread *)
 349
 350    Print debugging information for given thread.  Sample output:
 351
 352    .. code-block:: none
 353
 354       {(thread *)NULL}
 355       {(thread *)0x55a3b5818910 arg=0x55a3b5827c50 timer  r=7.824      mld_t_query() &mld_ifp->t_query from pimd/pim6_mld.c:1369}
 356       {(thread *)0x55a3b5827230 arg=0x55a3b5827c50 read   fd=16        mld_t_recv() &mld_ifp->t_recv from pimd/pim6_mld.c:1186}
 357
 358    (The output is aligned to some degree.)
 359
 360 General utility formats
 361 ^^^^^^^^^^^^^^^^^^^^^^^
 362
 363 .. frrfmt:: %m (no argument)
 364
 365    :frrfmtout:`Permission denied`
 366
 367    Prints ``strerror(errno)``.  Does **not** consume any input argument, don't
 368    pass ``errno``!
 369
 370    (This is a GNU extension not specific to FRR.  FRR guarantees it is
 371    available on all systems in printfrr, though BSDs support it in printf too.)
 372
 373 .. frrfmt:: %pSQ (char *)
 374
 375    ([S]tring [Q]uote.)  Like ``%s``, but produce a quoted string.  Options:
 376
 377       ``n`` - treat ``NULL`` as empty string instead.
 378
 379       ``q`` - include ``""`` quotation marks.  Note: ``NULL`` is printed as
 380       ``(null)``, not ``"(null)"`` unless ``n`` is used too.  This is
 381       intentional.
 382
 383       ``s`` - use escaping suitable for RFC5424 syslog.  This means ``]`` is
 384       escaped too.
 385
 386    If a length is specified (``%*pSQ`` or ``%.*pSQ``), null bytes in the input
 387    string do not end the string and are just printed as ``\x00``.
 388
 389 .. frrfmt:: %pSE (char *)
 390
 391    ([S]tring [E]scape.)  Like ``%s``, but escape special characters.
 392    Options:
 393
 394       ``n`` - treat ``NULL`` as empty string instead.
 395
 396    Unlike :frrfmt:`%pSQ`, this escapes many more characters that are fine for
 397    a quoted string but not on their own.
 398
 399    If a length is specified (``%*pSE`` or ``%.*pSE``), null bytes in the input
 400    string do not end the string and are just printed as ``\x00``.
 401
 402 .. frrfmt:: %pVA (struct va_format *)
 403
 404    Recursively invoke printfrr, with arguments passed in through:
 405
 406    .. c:struct:: va_format
 407
 408       .. c:member:: const char *fmt
 409
 410          Format string to use for the recursive printfrr call.
 411
 412       .. c:member:: va_list *va
 413
 414          Formatting arguments.  Note this is passed as a pointer, not - as in
 415          most other places - a direct struct reference.  Internally uses
 416          ``va_copy()`` so repeated calls can be made (e.g. for determining
 417          output length.)
 418
 419 .. frrfmt:: %pFB (struct fbuf *)
 420
 421    Insert text from a ``struct fbuf *``, i.e. the output of a call to
 422    :c:func:`bprintfrr()`.
 423
 424 .. frrfmt:: %*pHX (void *, char *, unsigned char *)
 425
 426    ``%pHX``: :frrfmtout:`12 34 56 78`
 427
 428    ``%pHXc``: :frrfmtout:`12:34:56:78` (separate with [c]olon)
 429
 430    ``%pHXn``: :frrfmtout:`12345678` (separate with [n]othing)
 431
 432    Insert hexdump.  This specifier requires a precision or width to be
 433    specified.  A precision (``%.*pHX``) takes precedence, but generates a
 434    compiler warning since precisions are undefined for ``%p`` in ISO C.  If
 435    no precision is given, the width is used instead (and normal handling of
 436    the width is suppressed).
 437
 438    Note that width and precision are ``int`` arguments, not ``size_t``.  Use
 439    like::
 440
 441      char *buf;
 442      size_t len;
 443
 444      snprintfrr(out, sizeof(out), "... %*pHX ...", (int)len, buf);
 445
 446      /* with padding to width - would generate a warning due to %.*p */
 447      FMT_NSTD(snprintfrr(out, sizeof(out), "... %-47.*pHX ...", (int)len, buf));
 448
 449 .. frrfmt:: %*pHS (void *, char *, unsigned char *)
 450
 451    ``%pHS``: :frrfmtout:`hex.dump`
 452
 453    This is a complementary format for :frrfmt:`%*pHX` to print the text
 454    representation for a hexdump.  Non-printable characters are replaced with
 455    a dot.
 456
 457 Integer formats
 458 ^^^^^^^^^^^^^^^
 459
 460 .. note::
 461
 462    These formats currently only exist for advanced type checking with the
 463    ``frr-format`` GCC plugin.  They should not be used directly since they will
 464    cause compiler warnings when used without the plugin.  Use with
 465    :c:macro:`FMT_NSTD` if necessary.
 466
 467    It is possible ISO C23 may introduce another format for these, possibly
 468    ``%w64d`` discussed in `JTC 1/SC 22/WG 14/N2680 <http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2680.pdf>`_.
 469
 470 .. frrfmt:: %Lu (uint64_t)
 471
 472    :frrfmtout:`12345`
 473
 474 .. frrfmt:: %Ld (int64_t)
 475
 476    :frrfmtout:`-12345`
 477
 478 Log levels
 479 ----------
 480
 481 Errors and warnings
 482 ^^^^^^^^^^^^^^^^^^^
 483
 484 If it is something that the user will want to look at and maybe do
 485 something, it is either an **error** or a **warning**.
 486
 487 We're expecting that warnings and errors are in some way visible to the
 488 user (in the worst case by looking at the log after the network broke, but
 489 maybe by a syslog collector from all routers.)  Therefore, anything that
 490 needs to get the user in the loop—and only these things—are warnings or
 491 errors.
 492
 493 Note that this doesn't necessarily mean the user needs to fix something in
 494 the FRR instance.  It also includes when we detect something else needs
 495 fixing, for example another router, the system we're running on, or the
 496 configuration.  The common point is that the user should probably do
 497 *something*.
 498
 499 Deciding between a warning and an error is slightly less obvious; the rule
 500 of thumb here is that an error will cause considerable fallout beyond its
 501 direct effect.  Closing a BGP session due to a malformed update is an error
 502 since all routes from the peer are dropped; discarding one route because
 503 its attributes don't make sense is a warning.
 504
 505 This also loosely corresponds to the kind of reaction we're expecting from
 506 the user.  An error is likely to need immediate response while a warning
 507 might be snoozed for a bit and addressed as part of general maintenance.
 508 If a problem will self-repair (e.g. by retransmits), it should be a
 509 warning—unless the impact until that self-repair is very harsh.
 510
 511 Examples for warnings:
 512
 513 * a BGP update, LSA or LSP could not be processed, but operation is
 514   proceeding and the broken pieces are likely to self-fix later
 515 * some kind of controller cannot be reached, but we can work without it
 516 * another router is using some unknown or unsupported capability
 517
 518 Examples for errors:
 519
 520 * dropping a BGP session due to malformed data
 521 * a socket for routing protocol operation cannot be opened
 522 * desynchronization from network state because something went wrong
 523 * *everything that we as developers would really like to be notified about,
 524   i.e. some assumption in the code isn't holding up*
 525
 526
 527 Informational messages
 528 ^^^^^^^^^^^^^^^^^^^^^^
 529
 530 Anything that provides introspection to the user during normal operation
 531 is an **info** message.
 532
 533 This includes all kinds of operational state transitions and events,
 534 especially if they might be interesting to the user during the course of
 535 figuring out a warning or an error.
 536
 537 By itself, these messages should mostly be statements of fact.  They might
 538 indicate the order and relationship in which things happened.  Also covered
 539 are conditions that might be "operational issues" like a link failure due
 540 to an unplugged cable.  If it's pretty much the point of running a routing
 541 daemon for, it's not a warning or an error, just business as usual.
 542
 543 The user should be able to see the state of these bits from operational
 544 state output, i.e. `show interface` or `show foobar neighbors`.  The log
 545 message indicating the change may have been printed weeks ago, but the
 546 state can always be viewed.  (If some state change has an info message but
 547 no "show" command, maybe that command needs to be added.)
 548
 549 Examples:
 550
 551 * all kinds of up/down state changes
 552
 553   * interface coming up or going down
 554   * addresses being added or deleted
 555   * peers and neighbors coming up or going down
 556
 557 * rejection of some routes due to user-configured route maps
 558 * backwards compatibility handling because another system on the network
 559   has a different or smaller feature set
 560
 561 .. note::
 562    The previously used **notify** priority is replaced with *info* in all
 563    cases.  We don't currently have a well-defined use case for it.
 564
 565
 566 Debug messages and asserts
 567 ^^^^^^^^^^^^^^^^^^^^^^^^^^
 568
 569 Everything that is only interesting on-demand, or only while developing,
 570 is a **debug** message.  It might be interesting to the user for a
 571 particularly evasive issue, but in general these are details that an
 572 average user might not even be able to make sense of.
 573
 574 Most (or all?) debug messages should be behind a `debug foobar` category
 575 switch that controls which subset of these messages is currently
 576 interesting and thus printed.  If a debug message doesn't have such a
 577 guard, there should be a good explanation as to why.
 578
 579 Conversely, debug messages are the only thing that should be guarded by
 580 these switches.  Neither info nor warning or error messages should be
 581 hidden in this way.
 582
 583 **Asserts** should only be used as pretty crashes.  We are expecting that
 584 asserts remain enabled in production builds, but please try to not use
 585 asserts in a way that would cause a security problem if the assert wasn't
 586 there (i.e. don't use them for length checks.)
 587
 588 The purpose of asserts is mainly to help development and bug hunting.  If
 589 the daemon crashes, then having some more information is nice, and the
 590 assert can provide crucial hints that cut down on the time needed to track
 591 an issue.  That said, if the issue can be reasonably handled and/or isn't
 592 going to crash the daemon, it shouldn't be an assert.
 593
 594 For anything else where internal constraints are violated but we're not
 595 breaking due to it, it's an error instead (not a debug.)  These require
 596 "user action" of notifying the developers.
 597
 598 Examples:
 599
 600 * mismatched :code:`prev`/:code:`next` pointers in lists
 601 * some field that is absolutely needed is :code:`NULL`
 602 * any other kind of data structure corruption that will cause the daemon
 603   to crash sooner or later, one way or another
 604
 605 Thread-local buffering
 606 ----------------------
 607
 608 The core logging code in :file:`lib/zlog.c` allows setting up per-thread log
 609 message buffers in order to improve logging performance.  The following rules
 610 apply for this buffering:
 611
 612 * Only messages of priority *DEBUG* or *INFO* are buffered.
 613 * Any higher-priority message causes the thread's entire buffer to be flushed,
 614   thus message ordering is preserved on a per-thread level.
 615 * There is no guarantee on ordering between different threads;  in most cases
 616   this is arbitrary to begin with since the threads essentially race each
 617   other in printing log messages.  If an order is established with some
 618   synchronization primitive, add calls to :c:func:`zlog_tls_buffer_flush()`.
 619 * The buffers are only ever accessed by the thread they are created by.  This
 620   means no locking is necessary.
 621
 622 Both the main/default thread and additional threads created by
 623 :c:func:`frr_pthread_new()` with the default :c:func:`frr_run()` handler will
 624 initialize thread-local buffering and call :c:func:`zlog_tls_buffer_flush()`
 625 when idle.
 626
 627 If some piece of code runs for an extended period, it may be useful to insert
 628 calls to :c:func:`zlog_tls_buffer_flush()` in appropriate places:
 629
 630 .. c:function:: void zlog_tls_buffer_flush(void)
 631
 632    Write out any pending log messages that the calling thread may have in its
 633    buffer.  This function is safe to call regardless of the per-thread log
 634    buffer being set up / in use or not.
 635
 636 When working with threads that do not use the :c:struct:`thread_master`
 637 event loop, per-thread buffers can be managed with:
 638
 639 .. c:function:: void zlog_tls_buffer_init(void)
 640
 641    Set up thread-local buffering for log messages.  This function may be
 642    called repeatedly without adverse effects, but remember to call
 643    :c:func:`zlog_tls_buffer_fini()` at thread exit.
 644
 645    .. warning::
 646
 647       If this function is called, but :c:func:`zlog_tls_buffer_flush()` is
 648       not used, log message output will lag behind since messages will only be
 649       written out when the buffer is full.
 650
 651       Exiting the thread without calling :c:func:`zlog_tls_buffer_fini()`
 652       will cause buffered log messages to be lost.
 653
 654 .. c:function:: void zlog_tls_buffer_fini(void)
 655
 656    Flush pending messages and tear down thread-local log message buffering.
 657    This function may be called repeatedly regardless of whether
 658    :c:func:`zlog_tls_buffer_init()` was ever called.
 659
 660 Log targets
 661 -----------
 662
 663 The actual logging subsystem (in :file:`lib/zlog.c`) is heavily separated
 664 from the actual log writers.  It uses an atomic linked-list (`zlog_targets`)
 665 with RCU to maintain the log targets to be called.  This list is intended to
 666 function as "backend" only, it **is not used for configuration**.
 667
 668 Logging targets provide their configuration layer on top of this and maintain
 669 their own capability to enumerate and store their configuration.  Some targets
 670 (e.g. syslog) are inherently single instance and just stuff their config in
 671 global variables.  Others (e.g. file/fd output) are multi-instance capable.
 672 There is another layer boundary here between these and the VTY configuration
 673 that they use.
 674
 675 Basic internals
 676 ^^^^^^^^^^^^^^^
 677
 678 .. c:struct:: zlog_target
 679
 680    This struct needs to be filled in by any log target and then passed to
 681    :c:func:`zlog_target_replace()`.  After it has been registered,
 682    **RCU semantics apply**.  Most changes to associated data should make a
 683    copy, change that, and then replace the entire struct.
 684
 685    Additional per-target data should be "appended" by embedding this struct
 686    into a larger one, for use with `containerof()`, and
 687    :c:func:`zlog_target_clone()` and :c:func:`zlog_target_free()` should be
 688    used to allocate/free the entire container struct.
 689
 690    Do not use this structure to maintain configuration.  It should only
 691    contain (a copy of) the data needed to perform the actual logging.  For
 692    example, the syslog target uses this:
 693
 694    .. code-block:: c
 695
 696       struct zlt_syslog {
 697           struct zlog_target zt;
 698           int syslog_facility;
 699       };
 700
 701       static void zlog_syslog(struct zlog_target *zt, struct zlog_msg *msgs[], size_t nmsgs)
 702       {
 703           struct zlt_syslog *zte = container_of(zt, struct zlt_syslog, zt);
 704           size_t i;
 705
 706           for (i = 0; i < nmsgs; i++)
 707               if (zlog_msg_prio(msgs[i]) <= zt->prio_min)
 708                   syslog(zlog_msg_prio(msgs[i]) | zte->syslog_facility, "%s",
 709                          zlog_msg_text(msgs[i], NULL));
 710       }
 711
 712
 713 .. c:function:: struct zlog_target *zlog_target_clone(struct memtype *mt, struct zlog_target *oldzt, size_t size)
 714
 715    Allocates a logging target struct.  Note that the ``oldzt`` argument may be
 716    ``NULL`` to allocate a "from scratch".  If ``oldzt`` is not ``NULL``, the
 717    generic bits in :c:struct:`zlog_target` are copied.  **Target specific
 718    bits are not copied.**
 719
 720 .. c:function:: struct zlog_target *zlog_target_replace(struct zlog_target *oldzt, struct zlog_target *newzt)
 721
 722    Adds, replaces or deletes a logging target (either ``oldzt`` or ``newzt`` may be ``NULL``.)
 723
 724    Returns ``oldzt`` for freeing.  The target remains possibly in use by
 725    other threads until the RCU cycle ends.  This implies you cannot release
 726    resources (e.g. memory, file descriptors) immediately.
 727
 728    The replace operation is not atomic; for a brief period it is possible that
 729    messages are delivered on both ``oldzt`` and ``newzt``.
 730
 731    .. warning::
 732
 733       ``oldzt`` must remain **functional** until the RCU cycle ends.
 734
 735 .. c:function:: void zlog_target_free(struct memtype *mt, struct zlog_target *zt)
 736
 737    Counterpart to :c:func:`zlog_target_clone()`, frees a target (using RCU.)
 738
 739 .. c:member:: void (*zlog_target.logfn)(struct zlog_target *zt, struct zlog_msg *msgs[], size_t nmsg)
 740
 741    Called on a target to deliver "normal" logging messages.  ``msgs`` is an
 742    array of opaque structs containing the actual message.  Use ``zlog_msg_*``
 743    functions to access message data (this is done to allow some optimizations,
 744    e.g.  lazy formatting the message text and timestamp as needed.)
 745
 746    .. note::
 747
 748       ``logfn()`` must check each individual message's priority value against
 749       the configured ``prio_min``.  While the ``prio_min`` field is common to
 750       all targets and used by the core logging code to early-drop unneeded log
 751       messages, the array is **not** filtered for each ``logfn()`` call.
 752
 753 .. c:member:: void (*zlog_target.logfn_sigsafe)(struct zlog_target *zt, const char *text, size_t len)
 754
 755    Called to deliver "exception" logging messages (i.e. SEGV messages.)
 756    Must be Async-Signal-Safe (may not allocate memory or call "complicated"
 757    libc functions.)  May be ``NULL`` if the log target cannot handle this.
 758
 759 Standard targets
 760 ^^^^^^^^^^^^^^^^
 761
 762 :file:`lib/zlog_targets.c` provides the standard file / fd / syslog targets.
 763 The syslog target is single-instance while file / fd targets can be
 764 instantiated as needed.  There are 3 built-in targets that are fully
 765 autonomous without any config:
 766
 767 - startup logging to `stderr`, until either :c:func:`zlog_startup_end()` or
 768   :c:func:`zlog_aux_init()` is called.
 769 - stdout logging for non-daemon programs using :c:func:`zlog_aux_init()`
 770 - crashlogs written to :file:`/var/tmp/frr.daemon.crashlog`
 771
 772 The regular CLI/command-line logging setup is handled by :file:`lib/log_vty.c`
 773 which makes the appropriate instantiations of syslog / file / fd targets.
 774
 775 .. todo::
 776
 777   :c:func:`zlog_startup_end()` should do an explicit switchover from
 778   startup stderr logging to configured logging.  Currently, configured logging
 779   starts in parallel as soon as the respective setup is executed.  This results
 780   in some duplicate logging.