doc/developer/logging.rst

   1 .. _logging:
   2
   3 .. highlight:: c
   4
   5 Logging
   6 =======
   7
   8 One of the most frequent decisions to make while writing code for FRR is what
   9 to log, what level to log it at, and when to log it.  Here is a list of
  10 recommendations for these decisions.
  11
  12
  13 printfrr()
  14 ----------
  15
  16 ``printfrr()`` is FRR's modified version of ``printf()``, designed to make
  17 life easier when printing nontrivial datastructures.  The following variants
  18 are available:
  19
  20 .. c:function:: ssize_t snprintfrr(char *buf, size_t len, const char *fmt, ...)
  21 .. c:function:: ssize_t vsnprintfrr(char *buf, size_t len, const char *fmt, va_list)
  22
  23    These correspond to ``snprintf``/``vsnprintf``.  If you pass NULL for buf
  24    or 0 for len, no output is written but the return value is still calculated.
  25
  26    The return value is always the full length of the output, unconstrained by
  27    `len`.  It does **not** include the terminating ``\0`` character.  A
  28    malformed format string can result in a ``-1`` return value.
  29
  30 .. c:function:: ssize_t csnprintfrr(char *buf, size_t len, const char *fmt, ...)
  31 .. c:function:: ssize_t vcsnprintfrr(char *buf, size_t len, const char *fmt, va_list)
  32
  33    Same as above, but the ``c`` stands for "continue" or "concatenate".  The
  34    output is appended to the string instead of overwriting it.
  35
  36 .. c:function:: char *asprintfrr(struct memtype *mt, const char *fmt, ...)
  37 .. c:function:: char *vasprintfrr(struct memtype *mt, const char *fmt, va_list)
  38
  39    These functions allocate a dynamic buffer (using MTYPE `mt`) and print to
  40    that.  If the format string is malformed, they return a copy of the format
  41    string, so the return value is always non-NULL and always dynamically
  42    allocated with `mt`.
  43
  44 .. c:function:: char *asnprintfrr(struct memtype *mt, char *buf, size_t len, const char *fmt, ...)
  45 .. c:function:: char *vasnprintfrr(struct memtype *mt, char *buf, size_t len, const char *fmt, va_list)
  46
  47    This variant tries to use the static buffer provided, but falls back to
  48    dynamic allocation if it is insufficient.
  49
  50    The return value can be either `buf` or a newly allocated string using
  51    `mt`.  You MUST free it like this::
  52
  53       char *ret = asnprintfrr(MTYPE_FOO, buf, sizeof(buf), ...);
  54       if (ret != buf)
  55          XFREE(MTYPE_FOO, ret);
  56
  57 .. c:function:: ssize_t bprintfrr(struct fbuf *fb, const char *fmt, ...)
  58 .. c:function:: ssize_t vbprintfrr(struct fbuf *fb, const char *fmt, va_list)
  59
  60    These are the "lowest level" functions, which the other variants listed
  61    above use to implement their functionality on top.  Mainly useful for
  62    implementing printfrr extensions since those get a ``struct fbuf *`` to
  63    write their output to.
  64
  65 .. c:macro:: FMT_NSTD(expr)
  66
  67    This macro turns off/on format warnings as needed when non-ISO-C
  68    compatible printfrr extensions are used (e.g. ``%.*p`` or ``%Ld``.)::
  69
  70       vty_out(vty, "standard compatible %pI4\n", &addr);
  71       FMT_NSTD(vty_out(vty, "non-standard %-47.*pHX\n", (int)len, buf));
  72
  73    When the frr-format plugin is in use, this macro is a no-op since the
  74    frr-format plugin supports all printfrr extensions.  Since the FRR CI
  75    includes a system with the plugin enabled, this means format errors will
  76    not slip by undetected even with FMT_NSTD.
  77
  78 .. note::
  79
  80    ``printfrr()`` does not support the ``%n`` format.
  81
  82 AS-Safety
  83 ^^^^^^^^^
  84
  85 ``printfrr()`` are AS-Safe under the following conditions:
  86
  87 * the ``[v]as[n]printfrr`` variants are not AS-Safe (allocating memory)
  88 * floating point specifiers are not AS-Safe (system printf is used for these)
  89 * the positional ``%1$d`` syntax should not be used (8 arguments are supported
  90   while AS-Safe)
  91 * extensions are only AS-Safe if their printer is AS-Safe
  92
  93 printfrr Extensions
  94 -------------------
  95
  96 ``printfrr()`` format strings can be extended with suffixes after `%p` or `%d`.
  97 Printf features like field lengths can be used normally with these extensions,
  98 e.g. ``%-15pI4`` works correctly, **except if the extension consumes the
  99 width or precision**.  Extensions that do so are listed below as ``%*pXX``
 100 rather than ``%pXX``.
 101
 102 The extension specifier after ``%p`` or ``%d`` is always an uppercase letter;
 103 by means of established pattern uppercase letters and numbers form the type
 104 identifier which may be followed by lowercase flags.
 105
 106 You can grep the FRR source for ``printfrr_ext_autoreg`` to see all extended
 107 printers and what exactly they do.  More printers are likely to be added as
 108 needed/useful, so the list here may be outdated.
 109
 110 .. note::
 111
 112    The ``zlog_*``/``flog_*`` and ``vty_out`` functions all use printfrr
 113    internally, so these extensions are available there.  However, they are
 114    **not** available when calling ``snprintf`` directly.  You need to call
 115    ``snprintfrr`` instead.
 116
 117 Networking data types
 118 ^^^^^^^^^^^^^^^^^^^^^
 119
 120 .. role:: frrfmtout(code)
 121
 122 .. frrfmt:: %pI4 (struct in_addr *, in_addr_t *)
 123
 124    :frrfmtout:`1.2.3.4`
 125
 126    ``%pI4s``: :frrfmtout:`*` — print star instead of ``0.0.0.0`` (for multicast)
 127
 128 .. frrfmt:: %pI6 (struct in6_addr *)
 129
 130    :frrfmtout:`fe80::1234`
 131
 132    ``%pI6s``: :frrfmtout:`*` — print star instead of ``::`` (for multicast)
 133
 134 .. frrfmt:: %pEA (struct ethaddr *)
 135
 136    :frrfmtout:`01:23:45:67:89:ab`
 137
 138 .. frrfmt:: %pIA (struct ipaddr *)
 139
 140    :frrfmtout:`1.2.3.4` / :frrfmtout:`fe80::1234`
 141
 142    ``%pIAs``: — print star instead of zero address (for multicast)
 143
 144 .. frrfmt:: %pFX (struct prefix *)
 145
 146    :frrfmtout:`1.2.3.0/24` / :frrfmtout:`fe80::1234/64`
 147
 148    This accepts the following types:
 149
 150    - :c:struct:`prefix`
 151    - :c:struct:`prefix_ipv4`
 152    - :c:struct:`prefix_ipv6`
 153    - :c:struct:`prefix_eth`
 154    - :c:struct:`prefix_evpn`
 155    - :c:struct:`prefix_fs`
 156
 157    It does **not** accept the following types:
 158
 159    - :c:struct:`prefix_ls`
 160    - :c:struct:`prefix_rd`
 161    - :c:struct:`prefix_ptr`
 162    - :c:struct:`prefix_sg` (use :frrfmt:`%pPSG4`)
 163    - :c:union:`prefixptr` (dereference to get :c:struct:`prefix`)
 164    - :c:union:`prefixconstptr` (dereference to get :c:struct:`prefix`)
 165
 166    Options:
 167
 168    ``%pFXh``: (address only) :frrfmtout:`1.2.3.0` / :frrfmtout:`fe80::1234`
 169
 170 .. frrfmt:: %pPSG4 (struct prefix_sg *)
 171
 172    :frrfmtout:`(*,1.2.3.4)`
 173
 174    This is *(S,G)* output for use in zebra.  (Note prefix_sg is not a prefix
 175    "subclass" like the other prefix_* structs.)
 176
 177 .. frrfmt:: %pSU (union sockunion *)
 178
 179    ``%pSU``: :frrfmtout:`1.2.3.4` / :frrfmtout:`fe80::1234`
 180
 181    ``%pSUs``: :frrfmtout:`1.2.3.4` / :frrfmtout:`fe80::1234%89`
 182    (adds IPv6 scope ID as integer)
 183
 184    ``%pSUp``: :frrfmtout:`1.2.3.4:567` / :frrfmtout:`[fe80::1234]:567`
 185    (adds port)
 186
 187    ``%pSUps``: :frrfmtout:`1.2.3.4:567` / :frrfmtout:`[fe80::1234%89]:567`
 188    (adds port and scope ID)
 189
 190 .. frrfmt:: %pRN (struct route_node *, struct bgp_node *, struct agg_node *)
 191
 192    :frrfmtout:`192.168.1.0/24` (dst-only node)
 193
 194    :frrfmtout:`2001:db8::/32 from fe80::/64` (SADR node)
 195
 196 .. frrfmt:: %pNH (struct nexthop *)
 197
 198    ``%pNHvv``: :frrfmtout:`via 1.2.3.4, eth0` — verbose zebra format
 199
 200    ``%pNHv``: :frrfmtout:`1.2.3.4, via eth0` — slightly less verbose zebra format
 201
 202    ``%pNHs``: :frrfmtout:`1.2.3.4 if 15` — same as :c:func:`nexthop2str()`
 203
 204    ``%pNHcg``: :frrfmtout:`1.2.3.4` — compact gateway only
 205
 206    ``%pNHci``: :frrfmtout:`eth0` — compact interface only
 207
 208 .. frrfmt:: %dPF (int)
 209
 210    :frrfmtout:`AF_INET`
 211
 212    Prints an `AF_*` / `PF_*` constant.  ``PF`` is used here to avoid confusion
 213    with `AFI` constants, even though the FRR codebase prefers `AF_INET` over
 214    `PF_INET` & co.
 215
 216 .. frrfmt:: %dSO (int)
 217
 218    :frrfmtout:`SOCK_STREAM`
 219
 220 Time/interval formats
 221 ^^^^^^^^^^^^^^^^^^^^^
 222
 223 .. frrfmt:: %pTS (struct timespec *)
 224
 225 .. frrfmt:: %pTV (struct timeval *)
 226
 227 .. frrfmt:: %pTT (time_t *)
 228
 229    Above 3 options internally result in the same code being called, support
 230    the same flags and produce equal output with one exception:  ``%pTT``
 231    has no sub-second precision and the formatter will never print a
 232    (nonsensical) ``.000``.
 233
 234    Exactly one of ``I``, ``M`` or ``R`` must immediately follow after
 235    ``TS``/``TV``/``TT`` to specify whether the input is an interval, monotonic
 236    timestamp or realtime timestamp:
 237
 238    ``%pTVI``: input is an interval, not a timestamp.  Print interval.
 239
 240    ``%pTVIs``: input is an interval, convert to wallclock by subtracting it
 241    from current time (i.e. interval has passed **s**\ ince.)
 242
 243    ``%pTVIu``: input is an interval, convert to wallclock by adding it to
 244    current time (i.e. **u**\ ntil interval has passed.)
 245
 246    ``%pTVM`` - input is a timestamp on CLOCK_MONOTONIC, convert to wallclock
 247    time (by grabbing current CLOCK_MONOTONIC and CLOCK_REALTIME and doing the
 248    math) and print calendaric date.
 249
 250    ``%pTVMs`` - input is a timestamp on CLOCK_MONOTONIC, print interval
 251    **s**\ ince that timestamp (elapsed.)
 252
 253    ``%pTVMu`` - input is a timestamp on CLOCK_MONOTONIC, print interval
 254    **u**\ ntil that timestamp (deadline.)
 255
 256    ``%pTVR`` - input is a timestamp on CLOCK_REALTIME, print calendaric date.
 257
 258    ``%pTVRs`` - input is a timestamp on CLOCK_REALTIME, print interval
 259    **s**\ ince that timestamp.
 260
 261    ``%pTVRu`` - input is a timestamp on CLOCK_REALTIME, print interval
 262    **u**\ ntil that timestamp.
 263
 264    ``%pTVA`` - reserved for CLOCK_TAI in case a PTP implementation is
 265    interfaced to FRR.  Not currently implemented.
 266
 267    .. note::
 268
 269       If ``%pTVRs`` or ``%pTVRu`` are used, this is generally an indication
 270       that a CLOCK_MONOTONIC timestamp should be used instead (or added in
 271       parallel.) CLOCK_REALTIME might be adjusted by NTP, PTP or similar
 272       procedures, causing bogus intervals to be printed.
 273
 274       ``%pTVM`` on first look might be assumed to have the same problem, but
 275       on closer thought the assumption is always that current system time is
 276       correct.  And since a CLOCK_MONOTONIC interval is also quite safe to
 277       assume to be correct, the (past) absolute timestamp to be printed from
 278       this can likely be correct even if it doesn't match what CLOCK_REALTIME
 279       would have indicated at that point in the past.  This logic does,
 280       however, not quite work for *future* times.
 281
 282       Generally speaking, almost all use cases in FRR should (and do) use
 283       CLOCK_MONOTONIC (through :c:func:`monotime()`.)
 284
 285    Flags common to printing calendar times and intervals:
 286
 287    ``p``: include spaces in appropriate places (depends on selected format.)
 288
 289    ``%p.3TV...``: specify sub-second resolution (use with ``FMT_NSTD`` to
 290    suppress gcc warning.)  As noted above, ``%pTT`` will never print sub-second
 291    digits since there are none.  Only some formats support printing sub-second
 292    digits and the default may vary.
 293
 294    The following flags are available for printing calendar times/dates:
 295
 296    (no flag): :frrfmtout:`Sat Jan  1 00:00:00 2022` - print output from
 297    ``ctime()``, in local time zone.  Since FRR does not currently use/enable
 298    locale support, this is always the C locale.  (Locale support getting added
 299    is unlikely for the time being and would likely break other things worse
 300    than this.)
 301
 302    ``i``: :frrfmtout:`2022-01-01T00:00:00.123` - ISO8601 timestamp in local
 303    time zone (note there is no ``Z`` or ``+00:00`` suffix.)  Defaults to
 304    millisecond precision.
 305
 306    ``ip``: :frrfmtout:`2022-01-01 00:00:00.123` - use readable form of ISO8601
 307    with space instead of ``T`` separator.
 308
 309    The following flags are available for printing intervals:
 310
 311    (no flag): :frrfmtout:`9w9d09:09:09.123` - does not match any
 312    preexisting format;  added because it does not lose precision (like ``t``)
 313    for longer intervals without printing huge numbers (like ``h``/``m``).
 314    Defaults to millisecond precision.  The week/day fields are left off if
 315    they're zero, ``p`` adds a space after the respective letter.
 316
 317    ``t``: :frrfmtout:`9w9d09h`, :frrfmtout:`9d09h09m`, :frrfmtout:`09:09:09` -
 318    this replaces :c:func:`frrtime_to_interval()`.  ``p`` adds spaces after
 319    week/day/hour letters.
 320
 321    ``d``: print decimal number of seconds.  Defaults to millisecond precision.
 322
 323    ``x`` / ``tx`` / ``dx``: Like no flag / ``t`` / ``d``, but print
 324    :frrfmtout:`-` for zero or negative intervals (for use with unset timers.)
 325
 326    ``h``: :frrfmtout:`09:09:09`
 327
 328    ``hx``: :frrfmtout:`09:09:09`, :frrfmtout:`--:--:--` - this replaces
 329    :c:func:`pim_time_timer_to_hhmmss()`.
 330
 331    ``m``: :frrfmtout:`09:09`
 332
 333    ``mx``: :frrfmtout:`09:09`, :frrfmtout:`--:--` - this replaces
 334    :c:func:`pim_time_timer_to_mmss()`.
 335
 336 FRR library helper formats
 337 ^^^^^^^^^^^^^^^^^^^^^^^^^^
 338
 339 .. frrfmt:: %pTH (struct thread *)
 340
 341    Print remaining time on timer thread. Interval-printing flag characters
 342    listed above for ``%pTV`` can be added, e.g. ``%pTHtx``.
 343
 344    ``NULL`` pointers are printed as ``-``.
 345
 346 .. frrfmt:: %pTHD (struct thread *)
 347
 348    Print debugging information for given thread.  Sample output:
 349
 350    .. code-block:: none
 351
 352       {(thread *)NULL}
 353       {(thread *)0x55a3b5818910 arg=0x55a3b5827c50 timer  r=7.824      mld_t_query() &mld_ifp->t_query from pimd/pim6_mld.c:1369}
 354       {(thread *)0x55a3b5827230 arg=0x55a3b5827c50 read   fd=16        mld_t_recv() &mld_ifp->t_recv from pimd/pim6_mld.c:1186}
 355
 356    (The output is aligned to some degree.)
 357
 358 FRR daemon specific formats
 359 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
 360
 361 The following formats are only available in specific daemons, as the code
 362 implementing them is part of the daemon, not the library.
 363
 364 zebra
 365 """""
 366
 367 .. frrfmt:: %pZN (struct route_node *)
 368
 369    Print information for a RIB node, including zebra-specific data.
 370
 371    :frrfmtout:`::/0 src fe80::/64 (MRIB)` (``%pZN``)
 372
 373    :frrfmtout:`1234` (``%pZNt`` - table number)
 374
 375 bgpd
 376 """"
 377
 378 .. frrfmt:: %pBD (struct bgp_dest *)
 379
 380    Print prefix for a BGP destination.
 381
 382    :frrfmtout:`fe80::1234/64`
 383
 384 .. frrfmt:: %pBP (struct peer *)
 385
 386    :frrfmtout:`192.168.1.1(leaf1.frrouting.org)`
 387
 388    Print BGP peer's IP and hostname together.
 389
 390 pimd/pim6d
 391 """"""""""
 392
 393 .. frrfmt:: %pPA (pim_addr *)
 394
 395    Format IP address according to IP version (pimd vs. pim6d) being compiled.
 396
 397    :frrfmtout:`fe80::1234` / :frrfmtout:`10.0.0.1`
 398
 399    :frrfmtout:`*` (``%pPAs`` - replace 0.0.0.0/:: with star)
 400
 401 .. frrfmt:: %pSG (pim_sgaddr *)
 402
 403    Format S,G pair according to IP version (pimd vs. pim6d) being compiled.
 404    Braces are included.
 405
 406    :frrfmtout:`(*,224.0.0.0)`
 407
 408
 409 General utility formats
 410 ^^^^^^^^^^^^^^^^^^^^^^^
 411
 412 .. frrfmt:: %m (no argument)
 413
 414    :frrfmtout:`Permission denied`
 415
 416    Prints ``strerror(errno)``.  Does **not** consume any input argument, don't
 417    pass ``errno``!
 418
 419    (This is a GNU extension not specific to FRR.  FRR guarantees it is
 420    available on all systems in printfrr, though BSDs support it in printf too.)
 421
 422 .. frrfmt:: %pSQ (char *)
 423
 424    ([S]tring [Q]uote.)  Like ``%s``, but produce a quoted string.  Options:
 425
 426       ``n`` - treat ``NULL`` as empty string instead.
 427
 428       ``q`` - include ``""`` quotation marks.  Note: ``NULL`` is printed as
 429       ``(null)``, not ``"(null)"`` unless ``n`` is used too.  This is
 430       intentional.
 431
 432       ``s`` - use escaping suitable for RFC5424 syslog.  This means ``]`` is
 433       escaped too.
 434
 435    If a length is specified (``%*pSQ`` or ``%.*pSQ``), null bytes in the input
 436    string do not end the string and are just printed as ``\x00``.
 437
 438 .. frrfmt:: %pSE (char *)
 439
 440    ([S]tring [E]scape.)  Like ``%s``, but escape special characters.
 441    Options:
 442
 443       ``n`` - treat ``NULL`` as empty string instead.
 444
 445    Unlike :frrfmt:`%pSQ`, this escapes many more characters that are fine for
 446    a quoted string but not on their own.
 447
 448    If a length is specified (``%*pSE`` or ``%.*pSE``), null bytes in the input
 449    string do not end the string and are just printed as ``\x00``.
 450
 451 .. frrfmt:: %pVA (struct va_format *)
 452
 453    Recursively invoke printfrr, with arguments passed in through:
 454
 455    .. c:struct:: va_format
 456
 457       .. c:member:: const char *fmt
 458
 459          Format string to use for the recursive printfrr call.
 460
 461       .. c:member:: va_list *va
 462
 463          Formatting arguments.  Note this is passed as a pointer, not - as in
 464          most other places - a direct struct reference.  Internally uses
 465          ``va_copy()`` so repeated calls can be made (e.g. for determining
 466          output length.)
 467
 468 .. frrfmt:: %pFB (struct fbuf *)
 469
 470    Insert text from a ``struct fbuf *``, i.e. the output of a call to
 471    :c:func:`bprintfrr()`.
 472
 473 .. frrfmt:: %*pHX (void *, char *, unsigned char *)
 474
 475    ``%pHX``: :frrfmtout:`12 34 56 78`
 476
 477    ``%pHXc``: :frrfmtout:`12:34:56:78` (separate with [c]olon)
 478
 479    ``%pHXn``: :frrfmtout:`12345678` (separate with [n]othing)
 480
 481    Insert hexdump.  This specifier requires a precision or width to be
 482    specified.  A precision (``%.*pHX``) takes precedence, but generates a
 483    compiler warning since precisions are undefined for ``%p`` in ISO C.  If
 484    no precision is given, the width is used instead (and normal handling of
 485    the width is suppressed).
 486
 487    Note that width and precision are ``int`` arguments, not ``size_t``.  Use
 488    like::
 489
 490      char *buf;
 491      size_t len;
 492
 493      snprintfrr(out, sizeof(out), "... %*pHX ...", (int)len, buf);
 494
 495      /* with padding to width - would generate a warning due to %.*p */
 496      FMT_NSTD(snprintfrr(out, sizeof(out), "... %-47.*pHX ...", (int)len, buf));
 497
 498 .. frrfmt:: %*pHS (void *, char *, unsigned char *)
 499
 500    ``%pHS``: :frrfmtout:`hex.dump`
 501
 502    This is a complementary format for :frrfmt:`%*pHX` to print the text
 503    representation for a hexdump.  Non-printable characters are replaced with
 504    a dot.
 505
 506 Integer formats
 507 ^^^^^^^^^^^^^^^
 508
 509 .. note::
 510
 511    These formats currently only exist for advanced type checking with the
 512    ``frr-format`` GCC plugin.  They should not be used directly since they will
 513    cause compiler warnings when used without the plugin.  Use with
 514    :c:macro:`FMT_NSTD` if necessary.
 515
 516    It is possible ISO C23 may introduce another format for these, possibly
 517    ``%w64d`` discussed in `JTC 1/SC 22/WG 14/N2680 <http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2680.pdf>`_.
 518
 519 .. frrfmt:: %Lu (uint64_t)
 520
 521    :frrfmtout:`12345`
 522
 523 .. frrfmt:: %Ld (int64_t)
 524
 525    :frrfmtout:`-12345`
 526
 527 Log levels
 528 ----------
 529
 530 Errors and warnings
 531 ^^^^^^^^^^^^^^^^^^^
 532
 533 If it is something that the user will want to look at and maybe do
 534 something, it is either an **error** or a **warning**.
 535
 536 We're expecting that warnings and errors are in some way visible to the
 537 user (in the worst case by looking at the log after the network broke, but
 538 maybe by a syslog collector from all routers.)  Therefore, anything that
 539 needs to get the user in the loop—and only these things—are warnings or
 540 errors.
 541
 542 Note that this doesn't necessarily mean the user needs to fix something in
 543 the FRR instance.  It also includes when we detect something else needs
 544 fixing, for example another router, the system we're running on, or the
 545 configuration.  The common point is that the user should probably do
 546 *something*.
 547
 548 Deciding between a warning and an error is slightly less obvious; the rule
 549 of thumb here is that an error will cause considerable fallout beyond its
 550 direct effect.  Closing a BGP session due to a malformed update is an error
 551 since all routes from the peer are dropped; discarding one route because
 552 its attributes don't make sense is a warning.
 553
 554 This also loosely corresponds to the kind of reaction we're expecting from
 555 the user.  An error is likely to need immediate response while a warning
 556 might be snoozed for a bit and addressed as part of general maintenance.
 557 If a problem will self-repair (e.g. by retransmits), it should be a
 558 warning—unless the impact until that self-repair is very harsh.
 559
 560 Examples for warnings:
 561
 562 * a BGP update, LSA or LSP could not be processed, but operation is
 563   proceeding and the broken pieces are likely to self-fix later
 564 * some kind of controller cannot be reached, but we can work without it
 565 * another router is using some unknown or unsupported capability
 566
 567 Examples for errors:
 568
 569 * dropping a BGP session due to malformed data
 570 * a socket for routing protocol operation cannot be opened
 571 * desynchronization from network state because something went wrong
 572 * *everything that we as developers would really like to be notified about,
 573   i.e. some assumption in the code isn't holding up*
 574
 575
 576 Informational messages
 577 ^^^^^^^^^^^^^^^^^^^^^^
 578
 579 Anything that provides introspection to the user during normal operation
 580 is an **info** message.
 581
 582 This includes all kinds of operational state transitions and events,
 583 especially if they might be interesting to the user during the course of
 584 figuring out a warning or an error.
 585
 586 By itself, these messages should mostly be statements of fact.  They might
 587 indicate the order and relationship in which things happened.  Also covered
 588 are conditions that might be "operational issues" like a link failure due
 589 to an unplugged cable.  If it's pretty much the point of running a routing
 590 daemon for, it's not a warning or an error, just business as usual.
 591
 592 The user should be able to see the state of these bits from operational
 593 state output, i.e. `show interface` or `show foobar neighbors`.  The log
 594 message indicating the change may have been printed weeks ago, but the
 595 state can always be viewed.  (If some state change has an info message but
 596 no "show" command, maybe that command needs to be added.)
 597
 598 Examples:
 599
 600 * all kinds of up/down state changes
 601
 602   * interface coming up or going down
 603   * addresses being added or deleted
 604   * peers and neighbors coming up or going down
 605
 606 * rejection of some routes due to user-configured route maps
 607 * backwards compatibility handling because another system on the network
 608   has a different or smaller feature set
 609
 610 .. note::
 611    The previously used **notify** priority is replaced with *info* in all
 612    cases.  We don't currently have a well-defined use case for it.
 613
 614
 615 Debug messages and asserts
 616 ^^^^^^^^^^^^^^^^^^^^^^^^^^
 617
 618 Everything that is only interesting on-demand, or only while developing,
 619 is a **debug** message.  It might be interesting to the user for a
 620 particularly evasive issue, but in general these are details that an
 621 average user might not even be able to make sense of.
 622
 623 Most (or all?) debug messages should be behind a `debug foobar` category
 624 switch that controls which subset of these messages is currently
 625 interesting and thus printed.  If a debug message doesn't have such a
 626 guard, there should be a good explanation as to why.
 627
 628 Conversely, debug messages are the only thing that should be guarded by
 629 these switches.  Neither info nor warning or error messages should be
 630 hidden in this way.
 631
 632 **Asserts** should only be used as pretty crashes.  We are expecting that
 633 asserts remain enabled in production builds, but please try to not use
 634 asserts in a way that would cause a security problem if the assert wasn't
 635 there (i.e. don't use them for length checks.)
 636
 637 The purpose of asserts is mainly to help development and bug hunting.  If
 638 the daemon crashes, then having some more information is nice, and the
 639 assert can provide crucial hints that cut down on the time needed to track
 640 an issue.  That said, if the issue can be reasonably handled and/or isn't
 641 going to crash the daemon, it shouldn't be an assert.
 642
 643 For anything else where internal constraints are violated but we're not
 644 breaking due to it, it's an error instead (not a debug.)  These require
 645 "user action" of notifying the developers.
 646
 647 Examples:
 648
 649 * mismatched :code:`prev`/:code:`next` pointers in lists
 650 * some field that is absolutely needed is :code:`NULL`
 651 * any other kind of data structure corruption that will cause the daemon
 652   to crash sooner or later, one way or another
 653
 654 Thread-local buffering
 655 ----------------------
 656
 657 The core logging code in :file:`lib/zlog.c` allows setting up per-thread log
 658 message buffers in order to improve logging performance.  The following rules
 659 apply for this buffering:
 660
 661 * Only messages of priority *DEBUG* or *INFO* are buffered.
 662 * Any higher-priority message causes the thread's entire buffer to be flushed,
 663   thus message ordering is preserved on a per-thread level.
 664 * There is no guarantee on ordering between different threads;  in most cases
 665   this is arbitrary to begin with since the threads essentially race each
 666   other in printing log messages.  If an order is established with some
 667   synchronization primitive, add calls to :c:func:`zlog_tls_buffer_flush()`.
 668 * The buffers are only ever accessed by the thread they are created by.  This
 669   means no locking is necessary.
 670
 671 Both the main/default thread and additional threads created by
 672 :c:func:`frr_pthread_new()` with the default :c:func:`frr_run()` handler will
 673 initialize thread-local buffering and call :c:func:`zlog_tls_buffer_flush()`
 674 when idle.
 675
 676 If some piece of code runs for an extended period, it may be useful to insert
 677 calls to :c:func:`zlog_tls_buffer_flush()` in appropriate places:
 678
 679 .. c:function:: void zlog_tls_buffer_flush(void)
 680
 681    Write out any pending log messages that the calling thread may have in its
 682    buffer.  This function is safe to call regardless of the per-thread log
 683    buffer being set up / in use or not.
 684
 685 When working with threads that do not use the :c:struct:`thread_master`
 686 event loop, per-thread buffers can be managed with:
 687
 688 .. c:function:: void zlog_tls_buffer_init(void)
 689
 690    Set up thread-local buffering for log messages.  This function may be
 691    called repeatedly without adverse effects, but remember to call
 692    :c:func:`zlog_tls_buffer_fini()` at thread exit.
 693
 694    .. warning::
 695
 696       If this function is called, but :c:func:`zlog_tls_buffer_flush()` is
 697       not used, log message output will lag behind since messages will only be
 698       written out when the buffer is full.
 699
 700       Exiting the thread without calling :c:func:`zlog_tls_buffer_fini()`
 701       will cause buffered log messages to be lost.
 702
 703 .. c:function:: void zlog_tls_buffer_fini(void)
 704
 705    Flush pending messages and tear down thread-local log message buffering.
 706    This function may be called repeatedly regardless of whether
 707    :c:func:`zlog_tls_buffer_init()` was ever called.
 708
 709 Log targets
 710 -----------
 711
 712 The actual logging subsystem (in :file:`lib/zlog.c`) is heavily separated
 713 from the actual log writers.  It uses an atomic linked-list (`zlog_targets`)
 714 with RCU to maintain the log targets to be called.  This list is intended to
 715 function as "backend" only, it **is not used for configuration**.
 716
 717 Logging targets provide their configuration layer on top of this and maintain
 718 their own capability to enumerate and store their configuration.  Some targets
 719 (e.g. syslog) are inherently single instance and just stuff their config in
 720 global variables.  Others (e.g. file/fd output) are multi-instance capable.
 721 There is another layer boundary here between these and the VTY configuration
 722 that they use.
 723
 724 Basic internals
 725 ^^^^^^^^^^^^^^^
 726
 727 .. c:struct:: zlog_target
 728
 729    This struct needs to be filled in by any log target and then passed to
 730    :c:func:`zlog_target_replace()`.  After it has been registered,
 731    **RCU semantics apply**.  Most changes to associated data should make a
 732    copy, change that, and then replace the entire struct.
 733
 734    Additional per-target data should be "appended" by embedding this struct
 735    into a larger one, for use with `containerof()`, and
 736    :c:func:`zlog_target_clone()` and :c:func:`zlog_target_free()` should be
 737    used to allocate/free the entire container struct.
 738
 739    Do not use this structure to maintain configuration.  It should only
 740    contain (a copy of) the data needed to perform the actual logging.  For
 741    example, the syslog target uses this:
 742
 743    .. code-block:: c
 744
 745       struct zlt_syslog {
 746           struct zlog_target zt;
 747           int syslog_facility;
 748       };
 749
 750       static void zlog_syslog(struct zlog_target *zt, struct zlog_msg *msgs[], size_t nmsgs)
 751       {
 752           struct zlt_syslog *zte = container_of(zt, struct zlt_syslog, zt);
 753           size_t i;
 754
 755           for (i = 0; i < nmsgs; i++)
 756               if (zlog_msg_prio(msgs[i]) <= zt->prio_min)
 757                   syslog(zlog_msg_prio(msgs[i]) | zte->syslog_facility, "%s",
 758                          zlog_msg_text(msgs[i], NULL));
 759       }
 760
 761
 762 .. c:function:: struct zlog_target *zlog_target_clone(struct memtype *mt, struct zlog_target *oldzt, size_t size)
 763
 764    Allocates a logging target struct.  Note that the ``oldzt`` argument may be
 765    ``NULL`` to allocate a "from scratch".  If ``oldzt`` is not ``NULL``, the
 766    generic bits in :c:struct:`zlog_target` are copied.  **Target specific
 767    bits are not copied.**
 768
 769 .. c:function:: struct zlog_target *zlog_target_replace(struct zlog_target *oldzt, struct zlog_target *newzt)
 770
 771    Adds, replaces or deletes a logging target (either ``oldzt`` or ``newzt`` may be ``NULL``.)
 772
 773    Returns ``oldzt`` for freeing.  The target remains possibly in use by
 774    other threads until the RCU cycle ends.  This implies you cannot release
 775    resources (e.g. memory, file descriptors) immediately.
 776
 777    The replace operation is not atomic; for a brief period it is possible that
 778    messages are delivered on both ``oldzt`` and ``newzt``.
 779
 780    .. warning::
 781
 782       ``oldzt`` must remain **functional** until the RCU cycle ends.
 783
 784 .. c:function:: void zlog_target_free(struct memtype *mt, struct zlog_target *zt)
 785
 786    Counterpart to :c:func:`zlog_target_clone()`, frees a target (using RCU.)
 787
 788 .. c:member:: void (*zlog_target.logfn)(struct zlog_target *zt, struct zlog_msg *msgs[], size_t nmsg)
 789
 790    Called on a target to deliver "normal" logging messages.  ``msgs`` is an
 791    array of opaque structs containing the actual message.  Use ``zlog_msg_*``
 792    functions to access message data (this is done to allow some optimizations,
 793    e.g.  lazy formatting the message text and timestamp as needed.)
 794
 795    .. note::
 796
 797       ``logfn()`` must check each individual message's priority value against
 798       the configured ``prio_min``.  While the ``prio_min`` field is common to
 799       all targets and used by the core logging code to early-drop unneeded log
 800       messages, the array is **not** filtered for each ``logfn()`` call.
 801
 802 .. c:member:: void (*zlog_target.logfn_sigsafe)(struct zlog_target *zt, const char *text, size_t len)
 803
 804    Called to deliver "exception" logging messages (i.e. SEGV messages.)
 805    Must be Async-Signal-Safe (may not allocate memory or call "complicated"
 806    libc functions.)  May be ``NULL`` if the log target cannot handle this.
 807
 808 Standard targets
 809 ^^^^^^^^^^^^^^^^
 810
 811 :file:`lib/zlog_targets.c` provides the standard file / fd / syslog targets.
 812 The syslog target is single-instance while file / fd targets can be
 813 instantiated as needed.  There are 3 built-in targets that are fully
 814 autonomous without any config:
 815
 816 - startup logging to `stderr`, until either :c:func:`zlog_startup_end()` or
 817   :c:func:`zlog_aux_init()` is called.
 818 - stdout logging for non-daemon programs using :c:func:`zlog_aux_init()`
 819 - crashlogs written to :file:`/var/tmp/frr.daemon.crashlog`
 820
 821 The regular CLI/command-line logging setup is handled by :file:`lib/log_vty.c`
 822 which makes the appropriate instantiations of syslog / file / fd targets.
 823
 824 .. todo::
 825
 826   :c:func:`zlog_startup_end()` should do an explicit switchover from
 827   startup stderr logging to configured logging.  Currently, configured logging
 828   starts in parallel as soon as the respective setup is executed.  This results
 829   in some duplicate logging.