]> git.proxmox.com Git - mirror_frr.git/blame - doc/developer/logging.rst
doc: Add some basic how-to for lua
[mirror_frr.git] / doc / developer / logging.rst
CommitLineData
3e5d2f71
DL
1Developer's Guide to Logging
2============================
3
4One of the most frequent decisions to make while writing code for FRR is what
5to log, what level to log it at, and when to log it. Here is a list of
6recommendations for these decisions.
7
8
2a37d6c5
DL
9printfrr()
10----------
11
12``printfrr()`` is FRR's modified version of ``printf()``, designed to make
13life easier when printing nontrivial datastructures. The following variants
14are available:
15
16.. c:function:: ssize_t snprintfrr(char *buf, size_t len, const char *fmt, ...)
17.. c:function:: ssize_t vsnprintfrr(char *buf, size_t len, const char *fmt, va_list)
18
19 These correspond to ``snprintf``/``vsnprintf``. If you pass NULL for buf
20 or 0 for len, no output is written but the return value is still calculated.
21
22 The return value is always the full length of the output, unconstrained by
23 `len`. It does **not** include the terminating ``\0`` character. A
24 malformed format string can result in a ``-1`` return value.
25
26.. c:function:: ssize_t csnprintfrr(char *buf, size_t len, const char *fmt, ...)
27.. c:function:: ssize_t vcsnprintfrr(char *buf, size_t len, const char *fmt, va_list)
28
29 Same as above, but the ``c`` stands for "continue" or "concatenate". The
30 output is appended to the string instead of overwriting it.
31
32.. c:function:: char *asprintfrr(struct memtype *mt, const char *fmt, ...)
33.. c:function:: char *vasprintfrr(struct memtype *mt, const char *fmt, va_list)
34
35 These functions allocate a dynamic buffer (using MTYPE `mt`) and print to
36 that. If the format string is malformed, they return a copy of the format
37 string, so the return value is always non-NULL and always dynamically
38 allocated with `mt`.
39
40.. c:function:: char *asnprintfrr(struct memtype *mt, char *buf, size_t len, const char *fmt, ...)
41.. c:function:: char *vasnprintfrr(struct memtype *mt, char *buf, size_t len, const char *fmt, va_list)
42
43 This variant tries to use the static buffer provided, but falls back to
44 dynamic allocation if it is insufficient.
45
46 The return value can be either `buf` or a newly allocated string using
47 `mt`. You MUST free it like this::
48
49 char *ret = asnprintfrr(MTYPE_FOO, buf, sizeof(buf), ...);
50 if (ret != buf)
51 XFREE(MTYPE_FOO, ret);
52
53Extensions
54^^^^^^^^^^
55
56``printfrr()`` format strings can be extended with suffixes after `%p` or
57`%d`. The following extended format specifiers are available:
58
59+-----------+--------------------------+----------------------------------------------+
60| Specifier | Argument | Output |
61+===========+==========================+==============================================+
62| ``%Lu`` | ``uint64_t`` | ``12345`` |
63+-----------+--------------------------+----------------------------------------------+
64| ``%Ld`` | ``int64_t`` | ``-12345`` |
65+-----------+--------------------------+----------------------------------------------+
66| ``%pI4`` | ``struct in_addr *`` | ``1.2.3.4`` |
67| | | |
68| | ``in_addr_t *`` | |
69+-----------+--------------------------+----------------------------------------------+
70| ``%pI6`` | ``struct in6_addr *`` | ``fe80::1234`` |
71+-----------+--------------------------+----------------------------------------------+
72| ``%pFX`` | ``struct prefix *`` | ``fe80::1234/64`` |
73+-----------+--------------------------+----------------------------------------------+
74| ``%pSG4`` | ``struct prefix_sg *`` | ``(*,1.2.3.4)`` |
75+-----------+--------------------------+----------------------------------------------+
76| ``%pRN`` | ``struct route_node *`` | ``192.168.1.0/24`` (dst-only node) |
77| | | |
78| | | ``2001:db8::/32 from fe80::/64`` (SADR node) |
79+-----------+--------------------------+----------------------------------------------+
80| ``%pNHv`` | ``struct nexthop *`` | ``1.2.3.4, via eth0`` |
81+-----------+--------------------------+----------------------------------------------+
82| ``%pNHs`` | ``struct nexthop *`` | ``1.2.3.4 if 15`` |
83+-----------+--------------------------+----------------------------------------------+
84
85Printf features like field lengths can be used normally with these extensions,
86e.g. ``%-15pI4`` works correctly.
87
88The extension specifier after ``%p`` or ``%d`` is always an uppercase letter;
89by means of established pattern uppercase letters and numbers form the type
90identifier which may be followed by lowercase flags.
91
92You can grep the FRR source for ``printfrr_ext_autoreg`` to see all extended
93printers and what exactly they do. More printers are likely to be added as
94needed/useful, so the list above may become outdated.
95
96``%Ld`` is not an "extension" for printfrr; it's wired directly into the main
97printf logic.
98
99.. note::
100
101 The ``zlog_*``/``flog_*`` and ``vty_out`` functions all use printfrr
102 internally, so these extensions are available there. However, they are
103 **not** available when calling ``snprintf`` directly. You need to call
104 ``snprintfrr`` instead.
105
106AS-Safety
107^^^^^^^^^
108
109``printfrr()`` are AS-Safe under the following conditions:
110
111* the ``[v]as[n]printfrr`` variants are not AS-Safe (allocating memory)
112* floating point specifiers are not AS-Safe (system printf is used for these)
113* the positional ``%1$d`` syntax should not be used (8 arguments are supported
114 while AS-Safe)
115* extensions are only AS-Safe if their printer is AS-Safe
116
3e5d2f71
DL
117Errors and warnings
118-------------------
119
120If it is something that the user will want to look at and maybe do
121something, it is either an **error** or a **warning**.
122
123We're expecting that warnings and errors are in some way visible to the
124user (in the worst case by looking at the log after the network broke, but
125maybe by a syslog collector from all routers.) Therefore, anything that
126needs to get the user in the loop—and only these things—are warnings or
127errors.
128
56f0bea7 129Note that this doesn't necessarily mean the user needs to fix something in
3e5d2f71
DL
130the FRR instance. It also includes when we detect something else needs
131fixing, for example another router, the system we're running on, or the
132configuration. The common point is that the user should probably do
133*something*.
134
135Deciding between a warning and an error is slightly less obvious; the rule
136of thumb here is that an error will cause considerable fallout beyond its
137direct effect. Closing a BGP session due to a malformed update is an error
138since all routes from the peer are dropped; discarding one route because
139its attributes don't make sense is a warning.
140
141This also loosely corresponds to the kind of reaction we're expecting from
142the user. An error is likely to need immediate response while a warning
143might be snoozed for a bit and addressed as part of general maintenance.
144If a problem will self-repair (e.g. by retransmits), it should be a
145warning—unless the impact until that self-repair is very harsh.
146
147Examples for warnings:
148
149* a BGP update, LSA or LSP could not be processed, but operation is
150 proceeding and the broken pieces are likely to self-fix later
151* some kind of controller cannot be reached, but we can work without it
152* another router is using some unknown or unsupported capability
153
154Examples for errors:
155
156* dropping a BGP session due to malformed data
157* a socket for routing protocol operation cannot be opened
158* desynchronization from network state because something went wrong
159* *everything that we as developers would really like to be notified about,
160 i.e. some assumption in the code isn't holding up*
161
162
163Informational messages
164----------------------
165
166Anything that provides introspection to the user during normal operation
167is an **info** message.
168
169This includes all kinds of operational state transitions and events,
170especially if they might be interesting to the user during the course of
171figuring out a warning or an error.
172
173By itself, these messages should mostly be statements of fact. They might
174indicate the order and relationship in which things happened. Also covered
175are conditions that might be "operational issues" like a link failure due
176to an unplugged cable. If it's pretty much the point of running a routing
177daemon for, it's not a warning or an error, just business as usual.
178
179The user should be able to see the state of these bits from operational
180state output, i.e. `show interface` or `show foobar neighbors`. The log
181message indicating the change may have been printed weeks ago, but the
182state can always be viewed. (If some state change has an info message but
183no "show" command, maybe that command needs to be added.)
184
185Examples:
186
187* all kinds of up/down state changes
188
189 * interface coming up or going down
190 * addresses being added or deleted
191 * peers and neighbors coming up or going down
192
193* rejection of some routes due to user-configured route maps
194* backwards compatibility handling because another system on the network
195 has a different or smaller feature set
196
197.. note::
198 The previously used **notify** priority is replaced with *info* in all
199 cases. We don't currently have a well-defined use case for it.
200
201
202Debug messages and asserts
203--------------------------
204
205Everything that is only interesting on-demand, or only while developing,
206is a **debug** message. It might be interesting to the user for a
207particularly evasive issue, but in general these are details that an
208average user might not even be able to make sense of.
209
210Most (or all?) debug messages should be behind a `debug foobar` category
211switch that controls which subset of these messages is currently
212interesting and thus printed. If a debug message doesn't have such a
213guard, there should be a good explanation as to why.
214
215Conversely, debug messages are the only thing that should be guarded by
216these switches. Neither info nor warning or error messages should be
217hidden in this way.
218
219**Asserts** should only be used as pretty crashes. We are expecting that
220asserts remain enabled in production builds, but please try to not use
221asserts in a way that would cause a security problem if the assert wasn't
222there (i.e. don't use them for length checks.)
223
224The purpose of asserts is mainly to help development and bug hunting. If
225the daemon crashes, then having some more information is nice, and the
226assert can provide crucial hints that cut down on the time needed to track
227an issue. That said, if the issue can be reasonably handled and/or isn't
228going to crash the daemon, it shouldn't be an assert.
229
230For anything else where internal constraints are violated but we're not
231breaking due to it, it's an error instead (not a debug.) These require
232"user action" of notifying the developers.
233
234Examples:
235
236* mismatched :code:`prev`/:code:`next` pointers in lists
237* some field that is absolutely needed is :code:`NULL`
238* any other kind of data structure corruption that will cause the daemon
239 to crash sooner or later, one way or another