.. _xrefs:

Introspection (xrefs)
=====================

The FRR library provides an introspection facility called "xrefs." The intent
is to provide structured access to annotated entities in the compiled binary,
such as log messages and thread scheduling calls.

Enabling and use
----------------

Support for emitting an xref is included in the macros for the specific
entities, e.g. :c:func:`zlog_info` contains the relevant statements. The only
requirement for the system to work is a GNU-compatible linker that supports
section start/end symbols. (The only known linker on any system FRR supports
that does not do this is the Solaris linker.)

To verify that xrefs have been included in a binary or dynamic library, run
``readelf -n binary``. For individual object files, use
``readelf -S object.o | grep xref_array`` instead.

Structure and contents
----------------------

As a slight improvement to security and fault detection, xrefs are divided into
a ``const struct xref *`` and an optional ``struct xrefdata *``. The required
const part contains:

.. c:member:: enum xref_type xref.type

   Identifies what kind of object the xref points to.

.. c:member:: int xref.line
.. c:member:: const char *xref.file
.. c:member:: const char *xref.func

   Source code location of the xref. ``func`` will be ``<global>`` for
   xrefs outside of a function.

.. c:member:: struct xrefdata *xref.xrefdata

   The optional writable part of the xref. NULL if no non-const part exists.

The optional non-const part has:

.. c:member:: const struct xref *xrefdata.xref

   Pointer back to the constant part. Since circular pointers are close to
   impossible to emit from inside a function body's static variables, this
   is initialized at startup.

.. c:member:: char xrefdata.uid[16]

   Unique identifier, see below.

.. c:member:: const char *xrefdata.hashstr
.. c:member:: uint32_t xrefdata.hashu32[2]

   Input to unique identifier calculation. These should encompass all
   details needed to make an xref unique. If more than one string should
   be considered, use string concatenation for the initializer.

Both structures can be extended by embedding them in a larger type-specific
struct, e.g. ``struct xref_logmsg *``.
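External tooling that reads xrefs out of a binary or process image needs some mirror of both halves. The sketch below expresses them with Python's ``ctypes``. The field order (``xrefdata`` pointer first, then ``type``, ``line``, ``file``, ``func``) and an ``int``-sized ``enum xref_type`` are assumptions, not taken from this document; check them against the FRR headers or ``python/xrefstructs.json`` before relying on them.

.. code-block:: python

   import ctypes

   # Hypothetical ctypes mirror of the two xref halves. ASSUMPTIONS: field
   # order and an int-sized enum; verify against the FRR headers or
   # python/xrefstructs.json before reading real binaries with it.


   class XrefData(ctypes.Structure):
       pass  # fields are set below, because the two structs point at each other


   class Xref(ctypes.Structure):
       _fields_ = [
           ("xrefdata", ctypes.POINTER(XrefData)),  # NULL if no writable part
           ("type", ctypes.c_int),                  # enum xref_type
           ("line", ctypes.c_int),
           ("file", ctypes.c_char_p),
           ("func", ctypes.c_char_p),               # b"<global>" outside functions
       ]


   XrefData._fields_ = [
       ("xref", ctypes.POINTER(Xref)),              # back-pointer, set at startup
       ("uid", ctypes.c_char * 16),                 # unique identifier text
       ("hashstr", ctypes.c_char_p),                # unique ID input
       ("hashu32", ctypes.c_uint32 * 2),            # unique ID input
   ]

   # Usage: build a linked pair the way the startup code links them.
   data = XrefData(hashstr=b"example %s")
   xref = Xref(type=1, line=42, file=b"lib/example.c", func=b"example_fn")
   xref.xrefdata = ctypes.pointer(data)
   data.xref = ctypes.pointer(xref)  # what the startup constructor fills in

Because the two structs reference each other, ``XrefData`` is declared empty first and completed afterwards, which is the standard ``ctypes`` pattern for mutually recursive types.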

Unique identifiers
------------------

All xrefs that have a writable ``struct xrefdata *`` part are assigned a
unique identifier, derived from a base32 (Crockford) encoding of a SHA256
hash over:

- the source filename
- the ``hashstr`` field
- the ``hashu32`` fields

.. note::

   Function names and line numbers are intentionally not included to allow
   moving items within a file without affecting the identifier.

For running executables, this hash is calculated once at startup. When
reading directly from an ELF file with external tooling, the tool must
calculate the value itself.
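As a rough sketch of that calculation for external tooling, the Python function below hashes the three inputs listed above and encodes the result in the ``AXXXX-XXXXX`` form described in the next paragraph. The byte encoding of the inputs (UTF-8 strings without terminators, big-endian ``uint32_t`` values), the absence of any filename normalization, and the order in which digest bits are consumed are all assumptions. Its output is therefore not guaranteed to match the identifiers FRR itself assigns. ``xref_uid`` is a hypothetical name.

.. code-block:: python

   import hashlib
   import struct

   # Crockford base32 alphabet: 0-9 and A-Z without I, L, O, U.
   CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


   def xref_uid(filename: str, hashstr: str, hashu32=(0, 0)) -> str:
       """Sketch of the unique-ID derivation (input encoding is ASSUMED).

       Function name and line number are deliberately not parameters, so
       moving an xref within its file leaves the result unchanged.
       """
       sha = hashlib.sha256()
       sha.update(filename.encode("utf-8"))
       sha.update(hashstr.encode("utf-8"))
       sha.update(struct.pack(">II", *hashu32))  # assumed big-endian

       # Keep the top 49 bits of the digest: 4 + 9 * 5.
       value = int.from_bytes(sha.digest(), "big") >> (256 - 49)

       # First character: 4 bits mapped onto alphabet indices 16..31 (G..Z),
       # so the identifier always starts with a letter.
       chars = [CROCKFORD[16 + (value >> 45)]]
       # Remaining 9 characters: 5 bits each, most significant first.
       for shift in range(40, -1, -5):
           chars.append(CROCKFORD[(value >> shift) & 0x1F])

       return "".join(chars[:5]) + "-" + "".join(chars[5:])


   print(xref_uid("lib/example.c", "example %s"))

The upper half of the Crockford alphabet (indices 16 to 31) is exactly ``G-Z`` minus ``I``, ``L``, ``O`` and ``U``, which is why 4 bits suffice for the leading letter.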

The identifiers have the form ``AXXXX-XXXXX`` where ``X`` is
``0-9, A-Z except I,L,O,U`` and ``A`` is ``G-Z except I,L,O,U`` (i.e. the
identifiers always start with a letter). When reading identifiers from user
input, ``I`` and ``L`` should be replaced with ``1`` and ``O`` should be
replaced with ``0``. There are 49 bits of entropy in this identifier: 4 bits
from the first character (16 possible letters) plus 5 bits from each of the
remaining 9 characters.
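Accepting identifiers from user input could look like the Python sketch below, which applies the substitutions above and then checks the result against the ``AXXXX-XXXXX`` pattern. Uppercasing the input first and rejecting anything that does not match are choices made for this sketch, not behavior specified here; ``normalize_uid`` is a hypothetical name.

.. code-block:: python

   import math
   import re

   # First character: G-Z except I, L, O, U. Others: 0-9, A-Z except I, L, O, U.
   UID_RE = re.compile(r"[GHJKMNP-TV-Z][0-9A-HJKMNP-TV-Z]{4}-[0-9A-HJKMNP-TV-Z]{5}")

   # Entropy: 16 choices for the first character, 32 for each of the other 9.
   ENTROPY_BITS = math.log2(16) + 9 * math.log2(32)  # 4 + 45 = 49

   _CONFUSABLES = str.maketrans("ILO", "110")  # I, L -> 1 and O -> 0


   def normalize_uid(text: str):
       """Return the canonical identifier for user input, or None if invalid."""
       uid = text.strip().upper().translate(_CONFUSABLES)
       return uid if UID_RE.fullmatch(uid) else None


   print(normalize_uid(" hO1lI-23456 "))  # prints H0111-23456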

Underlying machinery
--------------------

Xrefs are nothing more than global variables with some extra glue to make
them findable from the outside by looking at the binary. The first
non-obvious part is that they can occur inside functions, since they're
defined as ``static`` (function-local ``static`` variables have the same
static storage duration as globals). They don't have a visible name; they
don't need one.

To make finding these variables possible, another global variable, a pointer
to the xref, is created in the same way. However, it is put in a special
ELF section through ``__attribute__((section("xref_array")))``. This is the
section you can see with ``readelf -S``.

Finally, at the level of a whole executable or library, the linker places
the individual pointers next to each other since they're in the same
section, hence the array. The start and end of this array are given by the
linker-autogenerated ``__start_xref_array`` and ``__stop_xref_array`` symbols.
These are used to create both a constructor that runs at startup and an ELF
note.

The ELF note is the entry point for externally retrieving xrefs from a binary
without having to run it. It can be found by walking through the ELF data
structures even if the binary has been fully stripped of debug and section
information. SystemTap's SDT probes and LTTng's tracepoints work the same
way (though they emit one note per probe, while xrefs emit only a single note
in total, which refers to the array). Using xrefs does not affect SystemTap
or LTTng; the notes carry identifiers by which they can be distinguished.

The ELF structure of a linked binary (library or executable) will look like
this::

   $ readelf --wide -l -n lib/.libs/libfrr.so

   Elf file type is DYN (Shared object file)
   Entry point 0x67d21
   There are 12 program headers, starting at offset 64

   Program Headers:
     Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
     PHDR           0x000040 0x0000000000000040 0x0000000000000040 0x0002a0 0x0002a0 R   0x8
     INTERP         0x125560 0x0000000000125560 0x0000000000125560 0x00001c 0x00001c R   0x10
         [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
     LOAD           0x000000 0x0000000000000000 0x0000000000000000 0x02aff0 0x02aff0 R   0x1000
     LOAD           0x02b000 0x000000000002b000 0x000000000002b000 0x0b2889 0x0b2889 R E 0x1000
     LOAD           0x0de000 0x00000000000de000 0x00000000000de000 0x070048 0x070048 R   0x1000
     LOAD           0x14e428 0x000000000014f428 0x000000000014f428 0x00fb70 0x01a2b8 RW  0x1000
     DYNAMIC        0x157a40 0x0000000000158a40 0x0000000000158a40 0x000270 0x000270 RW  0x8
     NOTE           0x0002e0 0x00000000000002e0 0x00000000000002e0 0x00004c 0x00004c R   0x4
     TLS            0x14e428 0x000000000014f428 0x000000000014f428 0x000000 0x000008 R   0x8
     GNU_EH_FRAME   0x12557c 0x000000000012557c 0x000000000012557c 0x00819c 0x00819c R   0x4
     GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0x10
     GNU_RELRO      0x14e428 0x000000000014f428 0x000000000014f428 0x009bd8 0x009bd8 R   0x1

   (...)

   Displaying notes found in: .note.gnu.build-id
     Owner                Data size    Description
     GNU                  0x00000014   NT_GNU_BUILD_ID (unique build ID bitstring)    Build ID: 6a1f66be38b523095ebd6ec13cc15820cede903d

   Displaying notes found in: .note.FRR
     Owner                Data size    Description
     FRRouting            0x00000010   Unknown note type: (0x46455258)    description data: 6c eb 15 00 00 00 00 00 74 ec 15 00 00 00 00 00

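Here, 0x15eb6c and 0x15ec74 (the description data, read as two little-endian
64-bit values) are the start and end offsets, relative to the note itself, of
the xref array in the file. Also note that the owner is clearly marked as
"FRRouting", and the note type 0x46455258 spells "XREF" in ASCII when stored
as a little-endian 32-bit value.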
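Decoding the note takes only a few lines of Python. The sketch below walks a buffer of ELF notes (for example, the contents of the NOTE segment in the readelf output above) and decodes the FRR descriptor into the start and end offsets of the xref array. It assumes 4-byte note header words with 4-byte padding, a little-endian file, and a 64-bit target with 8-byte offsets and pointers, matching the ``libfrr.so`` output above. ``iter_notes`` and ``frr_xref_range`` are hypothetical helper names.

.. code-block:: python

   import struct

   NT_FRR_XREF = 0x46455258  # note type shown by readelf above


   def iter_notes(buf: bytes, endian: str = "<"):
       """Yield (owner, type, descriptor) for each ELF note in buf.

       Each note is namesz, descsz and type as 32-bit words, followed by the
       owner name and the descriptor, each padded to a 4-byte boundary.
       """
       pos = 0
       while pos + 12 <= len(buf):
           namesz, descsz, ntype = struct.unpack_from(endian + "III", buf, pos)
           pos += 12
           owner = buf[pos:pos + namesz].rstrip(b"\0")
           pos += (namesz + 3) & ~3
           desc = buf[pos:pos + descsz]
           pos += (descsz + 3) & ~3
           yield owner, ntype, desc


   def frr_xref_range(desc: bytes, endian: str = "<"):
       """Decode the FRR descriptor into (start, end) offsets of the xref array.

       ASSUMES a 64-bit target: two signed 8-byte values relative to the note.
       """
       return struct.unpack(endian + "qq", desc)


   # Usage: rebuild the FRRouting note from the readelf output and decode it.
   desc = bytes.fromhex("6ceb150000000000" "74ec150000000000")
   note = struct.pack("<III", 10, len(desc), NT_FRR_XREF) + b"FRRouting\0\0\0" + desc
   for owner, ntype, payload in iter_notes(note):
       if owner == b"FRRouting" and ntype == NT_FRR_XREF:
           start, end = frr_xref_range(payload)
           # 8-byte pointers assumed when counting array entries
           print(hex(start), hex(end), (end - start) // 8, "pointers")
   # prints: 0x15eb6c 0x15ec74 33 pointers

Matching on both the owner name and the note type is what keeps FRR's note apart from SystemTap or LTTng notes in the same binary.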

For SystemTap's use of ELF notes, refer to
https://libstapsdt.readthedocs.io/en/latest/how-it-works/internals.html as an
entry point.

.. note::

   Due to GCC bug 41091, the "xref_array" section is not correctly generated
   for C++ code when compiled by GCC. A workaround is present for runtime
   functionality, but to extract the xrefs from a C++ source file, it needs
   to be built with clang (or a future fixed version of GCC) instead.

Extraction tool
---------------

The FRR source contains a matching tool to extract xref data from compiled ELF
binaries in ``python/xrelfo.py``. This tool uses CPython extensions
implemented in ``clippy`` and must therefore be run with ``clippy`` rather
than a standard Python interpreter.

``xrelfo.py`` processes input from one or more ELF files (.o, .so, executable),
libtool objects (.lo, .la, executable wrapper script) or JSON files (output
from ``xrelfo.py``) and generates an output JSON file. During a standard FRR
build, it is invoked on all binaries and libraries and the results are
combined into ``frr.json``.

ELF files from any operating system, CPU architecture and endianness can be
processed on any host. Any issues with this are bugs in ``xrelfo.py``
(or clippy's ELF code).

``xrelfo.py`` also performs some sanity checking, particularly on log
messages. The following options are available:

.. option:: -o OUTPUT

   Filename to write JSON output to. As a convention, a ``.xref`` filename
   extension is used.

.. option:: -Wlog-format

   Performs extra checks on log message format strings, particularly checks
   for ``\t`` and ``\n`` characters (which should not be used in log messages).

.. option:: -Wlog-args

   Generates cleanup hints for format string arguments where
   :c:func:`printfrr()` extensions could be used, e.g. replacing ``inet_ntoa``
   with ``%pI4``.

.. option:: --profile

   Runs the Python profiler to identify hotspots in the ``xrelfo.py`` code.
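The core of the ``-Wlog-format`` check described above amounts to scanning each format string for the unwanted characters. The following is a hypothetical Python sketch of that idea, not code from ``xrelfo.py``; the ``(location, format)`` input pairs and the warning wording are assumptions made for illustration.

.. code-block:: python

   # Characters that should not appear in log message format strings,
   # mapped to the escape sequence used when reporting them.
   FORBIDDEN = {"\t": "\\t", "\n": "\\n"}


   def check_log_formats(messages):
       """Yield (location, warning) for each format string with \\t or \\n.

       `messages` is an iterable of (location, format_string) pairs.
       """
       for location, fmt in messages:
           for char, escaped in FORBIDDEN.items():
               if char in fmt:
                   yield location, f"log format string contains {escaped}"


   # Usage: one clean message and one that trips both checks.
   msgs = [
       ("lib/a.c:10", "peer %pI4 up"),
       ("lib/b.c:20", "first line\n\tsecond line"),
   ]
   for location, warning in check_log_formats(msgs):
       print(f"{location}: {warning}")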

``xrelfo.py`` uses information about C structure definitions saved in
``python/xrefstructs.json``. This file is included with the FRR sources and
only needs to be regenerated when some of the ``struct xref_*`` definitions
are changed (which should be almost never). The file is written by
``python/tiabwarfo.py``, which uses ``pahole`` to extract the necessary data
from DWARF information.