.. SPDX-License-Identifier: GPL-2.0

============================
BPF_PROG_TYPE_FLOW_DISSECTOR
============================

Overview
========

The flow dissector is a routine that parses metadata out of packets. It is
used in various places in the networking subsystem (RFS, flow hash, etc.).

The BPF flow dissector is an attempt to reimplement the C-based flow dissector
logic in BPF to gain the benefits of the BPF verifier (namely, limits on the
number of instructions and tail calls).

API
===

BPF flow dissector programs operate on an ``__sk_buff``. However, only a
limited set of fields is accessible: ``data``, ``data_end`` and ``flow_keys``.
``flow_keys`` points to a ``struct bpf_flow_keys``, which holds the flow
dissector's input and output arguments.

The inputs are:
  * ``nhoff`` - initial offset of the networking header
  * ``thoff`` - initial offset of the transport header, initialized to nhoff
  * ``n_proto`` - L3 protocol type, parsed out of L2 header
  * ``flags`` - optional flags

A flow dissector BPF program should fill out the rest of the ``struct
bpf_flow_keys`` fields. The input arguments ``nhoff/thoff/n_proto`` should
also be adjusted accordingly.

The return code of the BPF program is either ``BPF_OK`` to indicate successful
dissection, or ``BPF_DROP`` to indicate a parsing error.
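
As a rough illustration of this contract, the user-space sketch below mimics
what a dissector does for a plain IPv4 packet: it checks bounds, fills in
output fields, advances ``thoff`` and returns a status. ``struct
flow_keys_sketch``, ``SKETCH_OK``/``SKETCH_DROP`` and ``dissect_ipv4_sketch``
are hypothetical names invented for this example. A real program works on
``struct bpf_flow_keys``, returns ``BPF_OK``/``BPF_DROP`` and bounds-checks
against ``data_end``; unlike here, the real ``n_proto`` is in network byte
order.

.. code:: c

  #include <stddef.h>
  #include <stdint.h>
  #include <string.h>

  /* Hypothetical stand-ins for BPF_OK / BPF_DROP. */
  enum { SKETCH_OK = 0, SKETCH_DROP = 2 };

  /* Hypothetical, trimmed-down stand-in for struct bpf_flow_keys. */
  struct flow_keys_sketch {
      uint16_t nhoff;    /* in: offset of the network header */
      uint16_t thoff;    /* in/out: offset of the transport header */
      uint16_t n_proto;  /* in: L3 protocol, host byte order in this sketch */
      uint8_t  ip_proto; /* out: L4 protocol number */
      uint8_t  src[4];   /* out: IPv4 source address bytes */
      uint8_t  dst[4];   /* out: IPv4 destination address bytes */
  };

  /* Dissect the IPv4 header located at data + keys->nhoff. */
  static int dissect_ipv4_sketch(const uint8_t *data, size_t len,
                                 struct flow_keys_sketch *keys)
  {
      const uint8_t *iph;
      size_t ihl;

      if (keys->n_proto != 0x0800)  /* only IPv4 is handled here */
          return SKETCH_DROP;

      /* Bounds check, analogous to comparing against data_end. */
      if ((size_t)keys->nhoff + 20 > len)
          return SKETCH_DROP;

      iph = data + keys->nhoff;
      ihl = (size_t)(iph[0] & 0x0f) * 4;  /* header length in bytes */
      if ((iph[0] >> 4) != 4 || ihl < 20 || keys->nhoff + ihl > len)
          return SKETCH_DROP;

      /* Fill in the outputs and move thoff past the IPv4 header. */
      keys->ip_proto = iph[9];
      memcpy(keys->src, iph + 12, 4);
      memcpy(keys->dst, iph + 16, 4);
      keys->thoff = (uint16_t)(keys->nhoff + ihl);
      return SKETCH_OK;
  }

For an untagged Ethernet frame the caller would pass ``nhoff = 14``; with a
20-byte IPv4 header the sketch leaves ``thoff`` at 34.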
37 | ||
38 | __sk_buff->data | |
39 | =============== | |
40 | ||
41 | In the VLAN-less case, this is what the initial state of the BPF flow | |
42 | dissector looks like:: | |
43 | ||
44 | +------+------+------------+-----------+ | |
45 | | DMAC | SMAC | ETHER_TYPE | L3_HEADER | | |
46 | +------+------+------------+-----------+ | |
47 | ^ | |
48 | | | |
49 | +-- flow dissector starts here | |
50 | ||
51 | ||
52 | .. code:: c | |
53 | ||
54 | skb->data + flow_keys->nhoff point to the first byte of L3_HEADER | |
55 | flow_keys->thoff = nhoff | |
56 | flow_keys->n_proto = ETHER_TYPE | |

If a VLAN tag is present, the flow dissector can be called in one of two
states.

Pre-VLAN parsing::

  +------+------+------+-----+-----------+-----------+
  | DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
  +------+------+------+-----+-----------+-----------+
                        ^
                        |
                        +-- flow dissector starts here

.. code:: c

  skb->data + flow_keys->nhoff point to the first byte of TCI
  flow_keys->thoff = nhoff
  flow_keys->n_proto = TPID

Please note that TPID can be 802.1AD and, hence, the BPF program would
have to parse VLAN information twice for double-tagged packets.


Post-VLAN parsing::

  +------+------+------+-----+-----------+-----------+
  | DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
  +------+------+------+-----+-----------+-----------+
                                          ^
                                          |
                                          +-- flow dissector starts here

.. code:: c

  skb->data + flow_keys->nhoff point to the first byte of L3_HEADER
  flow_keys->thoff = nhoff
  flow_keys->n_proto = ETHER_TYPE

In this case VLAN information has been processed before the flow dissector
is called, and the BPF flow dissector is not required to handle it.


The takeaway here is as follows: a BPF flow dissector program can be called
with or without a VLAN header in front of the L3 header and should handle
every case gracefully: a single VLAN tag, a double VLAN tag, or no tag at all.
The same program is invoked in all of these cases, so it has to be written
carefully.
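
One way to cope with all three entry states is to skip VLAN tags only while
``n_proto`` still holds a VLAN TPID. The user-space sketch below illustrates
the idea under simplifying assumptions: ``skip_vlan_sketch`` and the ``SK_``
constants are hypothetical names, ``n_proto`` is kept in host byte order, and
the buffer length stands in for ``data_end``. It is not the logic of the
reference implementation.

.. code:: c

  #include <stddef.h>
  #include <stdint.h>

  #define SK_ETH_P_8021Q  0x8100  /* single VLAN tag (802.1Q) */
  #define SK_ETH_P_8021AD 0x88A8  /* outer tag of a double-tagged frame */

  static int is_vlan_tpid(uint16_t proto)
  {
      return proto == SK_ETH_P_8021Q || proto == SK_ETH_P_8021AD;
  }

  /*
   * Skip up to two VLAN tags.
   *
   * Pre-VLAN state: *nhoff points at the TCI and *n_proto holds the TPID.
   * VLAN-less or post-VLAN state: *n_proto is already an L3 EtherType and
   * nothing is skipped.
   *
   * Returns 0 on success, -1 on a truncated header or a third VLAN tag.
   */
  static int skip_vlan_sketch(const uint8_t *data, size_t len,
                              uint16_t *nhoff, uint16_t *n_proto)
  {
      int depth;

      for (depth = 0; depth < 2; depth++) {
          if (!is_vlan_tpid(*n_proto))
              return 0;

          /* TCI (2 bytes) followed by the encapsulated EtherType (2 bytes). */
          if ((size_t)*nhoff + 4 > len)
              return -1;

          *n_proto = (uint16_t)((data[*nhoff + 2] << 8) | data[*nhoff + 3]);
          *nhoff = (uint16_t)(*nhoff + 4);
      }

      /* More than two tags is treated as a parse error in this sketch. */
      return is_vlan_tpid(*n_proto) ? -1 : 0;
  }

After the call, ``nhoff`` points at the L3 header in every case, so the rest
of the dissector does not need to know which state it was entered in.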


Flags
=====

``flow_keys->flags`` might contain optional input flags that work as follows:

* ``BPF_FLOW_DISSECTOR_F_PARSE_1ST_FRAG`` - tells the BPF flow dissector to
  continue parsing the first fragment; the default expected behavior is that
  the flow dissector returns as soon as it finds out that the packet is
  fragmented; used by ``eth_get_headlen`` to estimate the length of all
  headers for GRO.
* ``BPF_FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL`` - tells the BPF flow dissector
  to stop parsing as soon as it reaches the IPv6 flow label; used by
  ``___skb_get_hash`` and ``__skb_get_hash_symmetric`` to get the flow hash.
* ``BPF_FLOW_DISSECTOR_F_STOP_AT_ENCAP`` - tells the BPF flow dissector to
  stop parsing as soon as it reaches encapsulated headers; used by the routing
  infrastructure.
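
The fragment rule in particular can be read as a small decision function.
The sketch below is a user-space illustration only: ``SK_F_PARSE_1ST_FRAG``
is a local copy whose value is assumed to match
``BPF_FLOW_DISSECTOR_F_PARSE_1ST_FRAG`` in ``linux/bpf.h``, and
``continue_past_fragment`` is a hypothetical helper name.

.. code:: c

  #include <stdbool.h>
  #include <stdint.h>

  /* Local copy for this sketch; assumed to mirror the UAPI flag value. */
  #define SK_F_PARSE_1ST_FRAG (1U << 0)

  /*
   * Decide whether dissection should continue past the IP header.
   *
   * Unfragmented packets are always parsed further. For fragmented
   * packets, only the first fragment is parsed further, and only when
   * the caller explicitly asked for it via the flag.
   */
  static bool continue_past_fragment(uint32_t flags, bool is_frag,
                                     bool is_first_frag)
  {
      if (!is_frag)
          return true;

      return is_first_frag && (flags & SK_F_PARSE_1ST_FRAG) != 0;
  }

With ``flags == 0`` a first fragment stops dissection right after the IP
header, which is the default behavior described above.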


Reference Implementation
========================

See ``tools/testing/selftests/bpf/progs/bpf_flow.c`` for the reference
implementation and ``tools/testing/selftests/bpf/flow_dissector_load.[hc]``
for the loader. ``bpftool`` can be used to load a BPF flow dissector program
as well.

The reference implementation is organized as follows:
  * ``jmp_table`` map that contains sub-programs for each supported L3 protocol
  * ``_dissect`` routine - entry point; it does input ``n_proto`` parsing and
    does ``bpf_tail_call`` to the appropriate L3 handler

Since BPF at this point doesn't support looping (or any jumping back),
``jmp_table`` is used instead to handle multiple levels of encapsulation (and
IPv6 options).
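
The dispatch pattern can be pictured with an ordinary function-pointer table.
This is a user-space analogy only, not the reference code: all names below
are hypothetical, and a real ``bpf_tail_call`` does not return to the caller
when it succeeds, whereas a C function call does.

.. code:: c

  #include <stddef.h>
  #include <stdint.h>

  /* Hypothetical slot indices, one per supported L3 protocol. */
  enum sketch_proto { SK_IP, SK_IPV6, SK_MPLS, SK_PROTO_MAX };

  typedef int (*sketch_handler)(void);

  /* Placeholder handlers; real ones would parse the L3 header. */
  static int handle_ip(void)   { return 4; }
  static int handle_ipv6(void) { return 6; }

  /* Analogue of the jmp_table map. The SK_MPLS slot is left empty. */
  static const sketch_handler sketch_jmp_table[SK_PROTO_MAX] = {
      [SK_IP]   = handle_ip,
      [SK_IPV6] = handle_ipv6,
  };

  /* Analogue of the entry point: map n_proto to a slot and jump to it. */
  static int dissect_sketch(uint16_t n_proto)
  {
      enum sketch_proto slot;

      switch (n_proto) {
      case 0x0800: slot = SK_IP;   break;  /* IPv4 */
      case 0x86DD: slot = SK_IPV6; break;  /* IPv6 */
      case 0x8847: slot = SK_MPLS; break;  /* MPLS unicast */
      default:
          return -1;  /* unsupported protocol */
      }

      /* An empty slot is reported as an error, like a failed tail call. */
      if (sketch_jmp_table[slot] == NULL)
          return -1;

      return sketch_jmp_table[slot]();
  }

Each handler can dispatch again through the same table, which is how nested
encapsulation can be followed without a backward jump.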


Current Limitations
===================

The BPF flow dissector doesn't support exporting all the metadata that the
in-kernel C-based implementation can export. A notable example is single VLAN
(802.1Q) and double VLAN (802.1AD) tags. Please refer to ``struct
bpf_flow_keys`` for the set of information that can currently be exported from
the BPF context.

When a BPF flow dissector is attached to the root network namespace
(machine-wide policy), users can't override it in their child network
namespaces.