[ceph.git] / ceph / src / seastar / dpdk / doc / guides / prog_guide / lpm_lib.rst

..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2010-2014 Intel Corporation.

.. _LPM_Library:

LPM Library
===========

The DPDK LPM library component implements the Longest Prefix Match (LPM) table search method for 32-bit keys
that is typically used to find the best route match in IP forwarding applications.

LPM API Overview
----------------

The main configuration parameter for LPM component instances is the maximum number of rules to support.
An LPM prefix is represented by a pair of parameters (32- bit key, depth), with depth in the range of 1 to 32.
An LPM rule is represented by an LPM prefix and some user data associated with the prefix.
The prefix serves as the unique identifier of the LPM rule.
In this implementation, the user data is 1-byte long and is called next hop,
in correlation with its main use of storing the ID of the next hop in a routing table entry.

The main methods exported by the LPM component are:

*   Add LPM rule: The LPM rule is provided as input.
    If there is no rule with the same prefix present in the table, then the new rule is added to the LPM table.
    If a rule with the same prefix is already present in the table, the next hop of the rule is updated.
    An error is returned when there is no available rule space left.

*   Delete LPM rule: The prefix of the LPM rule is provided as input.
    If a rule with the specified prefix is present in the LPM table, then it is removed.

*   Lookup LPM key: The 32-bit key is provided as input.
    The algorithm selects the rule that represents the best match for the given key and returns the next hop of that rule.
    In the case that there are multiple rules present in the LPM table that have the same 32-bit key,
    the algorithm picks the rule with the highest depth as the best match rule,
    which means that the rule has the highest number of most significant bits matching between the input key and the rule key.

.. _lpm4_details:

Implementation Details
----------------------

The current implementation uses a variation of the DIR-24-8 algorithm that trades memory usage for improved LPM lookup speed.
The algorithm allows the lookup operation to be performed with typically a single memory read access.
In the statistically rare case when the best match rule is having a depth bigger than 24,
the lookup operation requires two memory read accesses.
Therefore, the performance of the LPM lookup operation is greatly influenced by
whether the specific memory location is present in the processor cache or not.

The main data structure is built using the following elements:

*   A table with 2^24 entries.

*   A number of tables (RTE_LPM_TBL8_NUM_GROUPS) with 2^8 entries.

The first table, called tbl24, is indexed using the first 24 bits of the IP address to be looked up,
while the second table(s), called tbl8, is indexed using the last 8 bits of the IP address.
This means that depending on the outcome of trying to match the IP address of an incoming packet to the rule stored in the tbl24
we might need to continue the lookup process in the second level.

Since every entry of the tbl24 can potentially point to a tbl8, ideally, we would have 2^24 tbl8s,
which would be the same as having a single table with 2^32 entries.
This is not feasible due to resource restrictions.
Instead, this approach takes advantage of the fact that rules longer than 24 bits are very rare.
By splitting the process in two different tables/levels and limiting the number of tbl8s,
we can greatly reduce memory consumption while maintaining a very good lookup speed (one memory access, most of the times).


.. figure:: img/tbl24_tbl8.*

   Table split into different levels


An entry in tbl24 contains the following fields:

*   next hop / index to the tbl8

*   valid flag

*   external entry flag

*   depth of the rule (length)

The first field can either contain a number indicating the tbl8 in which the lookup process should continue
or the next hop itself if the longest prefix match has already been found.
The two flags are used to determine whether the entry is valid or not and
whether the search process have finished or not respectively.
The depth or length of the rule is the number of bits of the rule that is stored in a specific entry.

An entry in a tbl8 contains the following fields:

*   next hop

*   valid

*   valid group

*   depth

Next hop and depth contain the same information as in the tbl24.
The two flags show whether the entry and the table are valid respectively.

The other main data structure is a table containing the main information about the rules (IP and next hop).
This is a higher level table, used for different things:

*   Check whether a rule already exists or not, prior to addition or deletion,
    without having to actually perform a lookup.

*   When deleting, to check whether there is a rule containing the one that is to be deleted.
    This is important, since the main data structure will have to be updated accordingly.

Addition
~~~~~~~~

When adding a rule, there are different possibilities.
If the rule's depth is exactly 24 bits, then:

*   Use the rule (IP address) as an index to the tbl24.

*   If the entry is invalid (i.e. it doesn't already contain a rule) then set its next hop to its value,
    the valid flag to 1 (meaning this entry is in use),
    and the external entry flag to 0
    (meaning the lookup process ends at this point, since this is the longest prefix that matches).

If the rule's depth is exactly 32 bits, then:

*   Use the first 24 bits of the rule as an index to the tbl24.

*   If the entry is invalid (i.e. it doesn't already contain a rule) then look for a free tbl8,
    set the index to the tbl8 to this value,
    the valid flag to 1 (meaning this entry is in use), and the external entry flag to 1
    (meaning the lookup process must continue since the rule hasn't been explored completely).

If the rule's depth is any other value, prefix expansion must be performed.
This means the rule is copied to all the entries (as long as they are not in use) which would also cause a match.

As a simple example, let's assume the depth is 20 bits.
This means that there are 2^(24 - 20) = 16 different combinations of the first 24 bits of an IP address that
would cause a match.
Hence, in this case, we copy the exact same entry to every position indexed by one of these combinations.

By doing this we ensure that during the lookup process, if a rule matching the IP address exists,
it is found in either one or two memory accesses,
depending on whether we need to move to the next table or not.
Prefix expansion is one of the keys of this algorithm,
since it improves the speed dramatically by adding redundancy.

Lookup
~~~~~~

The lookup process is much simpler and quicker. In this case:

*   Use the first 24 bits of the IP address as an index to the tbl24.
    If the entry is not in use, then it means we don't have a rule matching this IP.
    If it is valid and the external entry flag is set to 0, then the next hop is returned.

*   If it is valid and the external entry flag is set to 1,
    then we use the tbl8 index to find out the tbl8 to be checked,
    and the last 8 bits of the IP address as an index to this table.
    Similarly, if the entry is not in use, then we don't have a rule matching this IP address.
    If it is valid then the next hop is returned.

Limitations in the Number of Rules
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are different things that limit the number of rules that can be added.
The first one is the maximum number of rules, which is a parameter passed through the API.
Once this number is reached,
it is not possible to add any more rules to the routing table unless one or more are removed.

The second reason is an intrinsic limitation of the algorithm.
As explained before, to avoid high memory consumption, the number of tbl8s is limited in compilation time
(this value is by default 256).
If we exhaust tbl8s, we won't be able to add any more rules.
How many of them are necessary for a specific routing table is hard to determine in advance.

A tbl8 is consumed whenever we have a new rule with depth bigger than 24,
and the first 24 bits of this rule are not the same as the first 24 bits of a rule previously added.
If they are, then the new rule will share the same tbl8 than the previous one,
since the only difference between the two rules is within the last byte.

With the default value of 256, we can have up to 256 rules longer than 24 bits that differ on their first three bytes.
Since routes longer than 24 bits are unlikely, this shouldn't be a problem in most setups.
Even if it is, however, the number of tbl8s can be modified.

Use Case: IPv4 Forwarding
~~~~~~~~~~~~~~~~~~~~~~~~~

The LPM algorithm is used to implement Classless Inter-Domain Routing (CIDR) strategy used by routers implementing IPv4 forwarding.

References
~~~~~~~~~~

*   RFC1519 Classless Inter-Domain Routing (CIDR): an Address Assignment and Aggregation Strategy,
    `http://www.ietf.org/rfc/rfc1519 <http://www.ietf.org/rfc/rfc1519>`_

*   Pankaj Gupta, Algorithms for Routing Lookups and Packet Classification, PhD Thesis, Stanford University,
    2000  (`http://klamath.stanford.edu/~pankaj/thesis/thesis_1sided.pdf <http://klamath.stanford.edu/~pankaj/thesis/thesis_1sided.pdf>`_ )
Commit	Line	Data
9f95a23c TL	1	.. SPDX-License-Identifier: BSD-3-Clause
9f95a23c TL	2	Copyright(c) 2010-2014 Intel Corporation.
7c673cae FG	3
	4	.. _LPM_Library:
	5
	6	LPM Library
	7	===========
	8
	9	The DPDK LPM library component implements the Longest Prefix Match (LPM) table search method for 32-bit keys
	10	that is typically used to find the best route match in IP forwarding applications.
	11
	12	LPM API Overview
	13	----------------
	14
	15	The main configuration parameter for LPM component instances is the maximum number of rules to support.
	16	An LPM prefix is represented by a pair of parameters (32- bit key, depth), with depth in the range of 1 to 32.
	17	An LPM rule is represented by an LPM prefix and some user data associated with the prefix.
	18	The prefix serves as the unique identifier of the LPM rule.
	19	In this implementation, the user data is 1-byte long and is called next hop,
	20	in correlation with its main use of storing the ID of the next hop in a routing table entry.
	21
	22	The main methods exported by the LPM component are:
	23
	24	* Add LPM rule: The LPM rule is provided as input.
	25	If there is no rule with the same prefix present in the table, then the new rule is added to the LPM table.
	26	If a rule with the same prefix is already present in the table, the next hop of the rule is updated.
	27	An error is returned when there is no available rule space left.
	28
	29	* Delete LPM rule: The prefix of the LPM rule is provided as input.
	30	If a rule with the specified prefix is present in the LPM table, then it is removed.
	31
	32	* Lookup LPM key: The 32-bit key is provided as input.
	33	The algorithm selects the rule that represents the best match for the given key and returns the next hop of that rule.
	34	In the case that there are multiple rules present in the LPM table that have the same 32-bit key,
	35	the algorithm picks the rule with the highest depth as the best match rule,
	36	which means that the rule has the highest number of most significant bits matching between the input key and the rule key.
	37
	38	.. _lpm4_details:
	39
	40	Implementation Details
	41	----------------------
	42
	43	The current implementation uses a variation of the DIR-24-8 algorithm that trades memory usage for improved LPM lookup speed.
	44	The algorithm allows the lookup operation to be performed with typically a single memory read access.
	45	In the statistically rare case when the best match rule is having a depth bigger than 24,
	46	the lookup operation requires two memory read accesses.
	47	Therefore, the performance of the LPM lookup operation is greatly influenced by
	48	whether the specific memory location is present in the processor cache or not.
	49
	50	The main data structure is built using the following elements:
	51
	52	* A table with 2^24 entries.
	53
	54	* A number of tables (RTE_LPM_TBL8_NUM_GROUPS) with 2^8 entries.
	55
	56	The first table, called tbl24, is indexed using the first 24 bits of the IP address to be looked up,
	57	while the second table(s), called tbl8, is indexed using the last 8 bits of the IP address.
	58	This means that depending on the outcome of trying to match the IP address of an incoming packet to the rule stored in the tbl24
	59	we might need to continue the lookup process in the second level.
	60
	61	Since every entry of the tbl24 can potentially point to a tbl8, ideally, we would have 2^24 tbl8s,
	62	which would be the same as having a single table with 2^32 entries.
	63	This is not feasible due to resource restrictions.
	64	Instead, this approach takes advantage of the fact that rules longer than 24 bits are very rare.
	65	By splitting the process in two different tables/levels and limiting the number of tbl8s,
	66	we can greatly reduce memory consumption while maintaining a very good lookup speed (one memory access, most of the times).
67
68
69	.. figure:: img/tbl24_tbl8.*
70
71	Table split into different levels
72
73
74	An entry in tbl24 contains the following fields:
75
76	* next hop / index to the tbl8
77
78	* valid flag
79
80	* external entry flag
81
82	* depth of the rule (length)
83
84	The first field can either contain a number indicating the tbl8 in which the lookup process should continue
85	or the next hop itself if the longest prefix match has already been found.
86	The two flags are used to determine whether the entry is valid or not and
87	whether the search process have finished or not respectively.
88	The depth or length of the rule is the number of bits of the rule that is stored in a specific entry.
89
90	An entry in a tbl8 contains the following fields:
91
92	* next hop
93
94	* valid
95
96	* valid group
97
98	* depth
99
100	Next hop and depth contain the same information as in the tbl24.
101	The two flags show whether the entry and the table are valid respectively.
102
103	The other main data structure is a table containing the main information about the rules (IP and next hop).
104	This is a higher level table, used for different things:
105
106	* Check whether a rule already exists or not, prior to addition or deletion,
107	without having to actually perform a lookup.
108
109	* When deleting, to check whether there is a rule containing the one that is to be deleted.
110	This is important, since the main data structure will have to be updated accordingly.
111
112	Addition
113	~~~~~~~~
114
115	When adding a rule, there are different possibilities.
116	If the rule's depth is exactly 24 bits, then:
117
118	* Use the rule (IP address) as an index to the tbl24.
119
120	* If the entry is invalid (i.e. it doesn't already contain a rule) then set its next hop to its value,
121	the valid flag to 1 (meaning this entry is in use),
122	and the external entry flag to 0
123	(meaning the lookup process ends at this point, since this is the longest prefix that matches).
124
125	If the rule's depth is exactly 32 bits, then:
126
127	* Use the first 24 bits of the rule as an index to the tbl24.
128
129	* If the entry is invalid (i.e. it doesn't already contain a rule) then look for a free tbl8,
130	set the index to the tbl8 to this value,
131	the valid flag to 1 (meaning this entry is in use), and the external entry flag to 1
132	(meaning the lookup process must continue since the rule hasn't been explored completely).
133
134	If the rule's depth is any other value, prefix expansion must be performed.
135	This means the rule is copied to all the entries (as long as they are not in use) which would also cause a match.
136
137	As a simple example, let's assume the depth is 20 bits.
138	This means that there are 2^(24 - 20) = 16 different combinations of the first 24 bits of an IP address that
139	would cause a match.
140	Hence, in this case, we copy the exact same entry to every position indexed by one of these combinations.
141
142	By doing this we ensure that during the lookup process, if a rule matching the IP address exists,
143	it is found in either one or two memory accesses,
144	depending on whether we need to move to the next table or not.
145	Prefix expansion is one of the keys of this algorithm,
146	since it improves the speed dramatically by adding redundancy.
147
148	Lookup
149	~~~~~~
150
151	The lookup process is much simpler and quicker. In this case:
152
153	* Use the first 24 bits of the IP address as an index to the tbl24.
154	If the entry is not in use, then it means we don't have a rule matching this IP.
155	If it is valid and the external entry flag is set to 0, then the next hop is returned.
156
157	* If it is valid and the external entry flag is set to 1,
158	then we use the tbl8 index to find out the tbl8 to be checked,
159	and the last 8 bits of the IP address as an index to this table.
160	Similarly, if the entry is not in use, then we don't have a rule matching this IP address.
161	If it is valid then the next hop is returned.
162
163	Limitations in the Number of Rules
164	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
165
166	There are different things that limit the number of rules that can be added.
167	The first one is the maximum number of rules, which is a parameter passed through the API.
168	Once this number is reached,
169	it is not possible to add any more rules to the routing table unless one or more are removed.
170
171	The second reason is an intrinsic limitation of the algorithm.
172	As explained before, to avoid high memory consumption, the number of tbl8s is limited in compilation time
173	(this value is by default 256).
174	If we exhaust tbl8s, we won't be able to add any more rules.
175	How many of them are necessary for a specific routing table is hard to determine in advance.
176
177	A tbl8 is consumed whenever we have a new rule with depth bigger than 24,
178	and the first 24 bits of this rule are not the same as the first 24 bits of a rule previously added.
179	If they are, then the new rule will share the same tbl8 than the previous one,
180	since the only difference between the two rules is within the last byte.
181
182	With the default value of 256, we can have up to 256 rules longer than 24 bits that differ on their first three bytes.
183	Since routes longer than 24 bits are unlikely, this shouldn't be a problem in most setups.
184	Even if it is, however, the number of tbl8s can be modified.
185
186	Use Case: IPv4 Forwarding
187	~~~~~~~~~~~~~~~~~~~~~~~~~
188
189	The LPM algorithm is used to implement Classless Inter-Domain Routing (CIDR) strategy used by routers implementing IPv4 forwarding.
190
191	References
192	~~~~~~~~~~
193
194	* RFC1519 Classless Inter-Domain Routing (CIDR): an Address Assignment and Aggregation Strategy,
195	`http://www.ietf.org/rfc/rfc1519 <http://www.ietf.org/rfc/rfc1519>`_
196
197	* Pankaj Gupta, Algorithms for Routing Lookups and Packet Classification, PhD Thesis, Stanford University,
9f95a23c	198	2000 (`http://klamath.stanford.edu/~pankaj/thesis/thesis_1sided.pdf <http://klamath.stanford.edu/~pankaj/thesis/thesis_1sided.pdf>`_ )