Commit | Line | Data |
---|---|---|
1 | Netfilter's flowtable infrastructure |
2 | ==================================== | |
3 | ||
4 | This documentation describes the software flowtable infrastructure available in | |
5 | Netfilter since Linux kernel 4.16. | |
6 | ||
7 | Overview | |
8 | -------- | |
9 | ||
10 | Initial packets follow the classic forwarding path. Once the flow enters the | |
11 | established state according to the conntrack semantics (i.e. traffic has been | |
12 | seen in both directions), you can decide to offload the flow to the flowtable | |
13 | from the forward chain via the 'flow offload' action available in nftables. | |
14 | ||
15 | Packets that find an entry in the flowtable (i.e. a flowtable hit) are sent to | |
16 | the output netdevice via neigh_xmit(), so they bypass the classic forwarding | |
17 | path (the visible effect is that you do not see these packets from any of the | |
18 | Netfilter hooks coming after ingress). In case of a flowtable miss, the packet | |
19 | follows the classic forwarding path. | |
20 | ||
21 | The flowtable uses a resizable hashtable. Lookups are based on the following | |
22 | 7-tuple selectors: source and destination addresses, layer 3 and layer 4 | |
23 | protocols, source and destination ports, and the input interface (useful in | |
24 | case there are several conntrack zones in place). | |
25 | ||
26 | Flowtables are populated via the 'flow offload' nftables action, so the user can | |
27 | selectively specify which flows are placed into the flowtable. Hence, packets | |
28 | follow the classic forwarding path unless the user explicitly instructs them | |
29 | to use this new alternative forwarding path via nftables policy. | |
30 | ||
31 | This is represented in Fig.1, which describes the classic forwarding path | |
32 | including the Netfilter hooks and the flowtable fastpath bypass. | |
33 | ||
34 |                                         userspace process | |
35 |                                          ^              | | |
36 |                                          |              | | |
37 |                                      ____|_____     ____\/___ | |
38 |                                     /          \   /         \ | |
39 |                                     |   input  |   | output  | | |
40 |                                     \__________/   \_________/ | |
41 |                                          ^              | | |
42 |                                          |              | | |
43 |       _________      __________      ----------    _____\/______ | |
44 |      /         \    /          \     |Routing |   /             \ | |
45 |   -->  ingress  ---> prerouting ---> |decision|   | postrouting |--> neigh_xmit | |
46 |      \_________/    \__________/     ----------   \_____________/        ^ | |
47 |        |      ^                          |                  ^            | | |
48 |    flowtable  |                       ___\/____             |            | | |
49 |        |      |                      /         \            |            | | |
50 |     __\/___   |                      | forward |-------------            | | |
51 |     |-----|   |                      \_________/                         | | |
52 |     |-----|   |                  'flow offload' rule                     | | |
53 |     |-----|   |                     adds entry to                        | | |
54 |     |_____|   |                       flowtable                          | | |
55 |        |      |                                                          | | |
56 |       / \     |                                                          | | |
57 |      /hit\_no_|                                                          | | |
58 |      \ ? /                                                               | | |
59 |       \ /                                                                | | |
60 |        |__yes_________________fastpath bypass ___________________________| | |
61 | ||
62 | Fig.1 Netfilter hooks and flowtable interactions | |
63 | ||
64 | The flowtable entry also stores the NAT configuration, so all packets are | |
65 | mangled according to the NAT policy that matches the initial packets that went | |
66 | through the classic forwarding path. The TTL is decremented before calling | |
67 | neigh_xmit(). Fragmented traffic is passed up to follow the classic forwarding | |
68 | path because the transport selectors are missing, so a flowtable lookup is not | |
69 | possible. | |
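
As a rough illustration of the NAT case, the sketch below pairs a flowtable with a source NAT rule. The table, chain and flowtable names, and the role of eth1 as the masqueraded uplink, are assumptions made for this example only. Under these assumptions, the first packets of a flow traverse the postrouting chain and set up the masquerade mapping in conntrack; once the flow is offloaded, later packets would be mangled from the flowtable entry instead.

```
table ip x {
	flowtable f {
		hook ingress priority 0; devices = { eth0, eth1 };
	}
	chain y {
		type filter hook forward priority 0; policy accept;
		# Offload TCP flows once conntrack considers them established.
		ip protocol tcp flow offload @f
	}
	chain z {
		type nat hook postrouting priority 100; policy accept;
		# Hypothetical uplink: source NAT for traffic leaving via eth1.
		oifname "eth1" masquerade
	}
}
```
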
70 | ||
71 | Example configuration | |
72 | --------------------- | |
73 | ||
74 | Enabling the flowtable bypass is relatively easy: you only need to create a | |
75 | flowtable and add one rule to your forward chain. | |
76 | ||
77 |     table inet x { | |
78 |         flowtable f { | |
79 |             hook ingress priority 0; devices = { eth0, eth1 }; | |
80 |         } | |
81 |         chain y { | |
82 |             type filter hook forward priority 0; policy accept; | |
83 |             ip protocol tcp flow offload @f | |
84 |             counter packets 0 bytes 0 | |
85 |         } | |
86 |     } | |
87 | ||
88 | This example adds the flowtable 'f' to the ingress hook of the eth0 and eth1 | |
89 | netdevices. You can create as many flowtables as you want in case you need to | |
90 | perform resource partitioning. The flowtable priority defines the order in which | |
91 | hooks are run in the pipeline. This is convenient in case you already have an | |
92 | nftables ingress chain (make sure the flowtable priority is smaller than that of | |
93 | the nftables ingress chain, so that the flowtable runs first in the pipeline). | |
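
To make the priority ordering concrete, here is a sketch that places a netdev-family ingress chain on eth0 next to a flowtable. The table names 'x' and 'z', the chain name 'c' and the priority value 10 are assumptions chosen for illustration; the only point is that the flowtable priority (0) is smaller than the chain priority (10), so the flowtable lookup runs first.

```
table inet x {
	flowtable f {
		# Priority 0: runs before the ingress chain below.
		hook ingress priority 0; devices = { eth0, eth1 };
	}
	chain y {
		type filter hook forward priority 0; policy accept;
		ip protocol tcp flow offload @f
	}
}

table netdev z {
	chain c {
		# Priority 10: evaluated after the flowtable on eth0.
		type filter hook ingress device eth0 priority 10; policy accept;
		counter
	}
}
```

With this ordering, one would expect the counter in chain 'c' to account mainly for packets that miss the flowtable, since flowtable hits leave the pipeline at the earlier hook.
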
94 | ||
95 | The 'flow offload' action from the forward chain 'y' adds an entry to the | |
96 | flowtable for the TCP SYN-ACK packet coming in the reply direction. Once the | |
97 | flow is offloaded, you will observe that the counter rule in the example above | |
98 | does not get updated for the packets that are being forwarded through the | |
99 | forwarding bypass. | |
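
Since the user chooses which flows are offloaded, the rule can be narrowed. The variant below is a minimal sketch, assuming you only want to offload web traffic; the port set is purely illustrative. All other flows keep following the classic forwarding path and continue to hit the counter rule.

```
table inet x {
	flowtable f {
		hook ingress priority 0; devices = { eth0, eth1 };
	}
	chain y {
		type filter hook forward priority 0; policy accept;
		# Only flows to these (example) destination ports are offloaded.
		tcp dport { 80, 443 } flow offload @f
		# Still counts non-offloaded flows and the initial packets of
		# flows that are later offloaded.
		counter
	}
}
```
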
100 | ||
101 | More reading | |
102 | ------------ | |
103 | ||
104 | This documentation is based on the LWN.net articles [1][2]. Rafal Milecki also | |
105 | wrote a comprehensive summary called "A state of network acceleration" that | |
106 | describes how things were before this infrastructure was mainlined [3], as | |
107 | well as a rough summary of this work [4]. | |
108 | ||
109 | [1] https://lwn.net/Articles/738214/ | |
110 | [2] https://lwn.net/Articles/742164/ | |
111 | [3] http://lists.infradead.org/pipermail/lede-dev/2018-January/010830.html | |
112 | [4] http://lists.infradead.org/pipermail/lede-dev/2018-January/010829.html |