..  BSD LICENSE
    Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in
      the documentation and/or other materials provided with the
      distribution.
    * Neither the name of Intel Corporation nor the names of its
      contributors may be used to endorse or promote products derived
      from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

.. _Mempool_Library:

Mempool Library
===============

A memory pool is an allocator of fixed-size objects.
In the DPDK, it is identified by name and uses a mempool handler to store free objects.
The default mempool handler is ring based.
It provides some other optional services such as a per-core object cache and
an alignment helper to ensure that objects are padded to spread them equally across all DRAM or DDR3 channels.

This library is used by the :ref:`Mbuf Library <Mbuf_Library>`.

Cookies
-------

In debug mode (CONFIG_RTE_LIBRTE_MEMPOOL_DEBUG is enabled), cookies are added at the beginning and end of allocated blocks.
The allocated objects then contain overwrite protection fields to help debugging buffer overflows.

Stats
-----

In debug mode (CONFIG_RTE_LIBRTE_MEMPOOL_DEBUG is enabled),
statistics about gets from and puts in the pool are stored in the mempool structure.
Statistics are per-lcore to avoid concurrent access to statistics counters.

Memory Alignment Constraints
----------------------------

Depending on hardware memory configuration, performance can be greatly improved by adding a specific padding between objects.
The objective is to ensure that the beginning of each object starts on a different channel and rank in memory so that all channels are equally loaded.

This is particularly true for packet buffers when doing L3 forwarding or flow classification.
Only the first 64 bytes are accessed, so performance can be increased by spreading the start addresses of objects among the different channels.

The number of ranks on any DIMM is the number of independent sets of DRAMs that can be accessed for the full data bit-width of the DIMM.
The ranks cannot be accessed simultaneously since they share the same data path.
The physical layout of the DRAM chips on the DIMM itself does not necessarily relate to the number of ranks.

When running an application, the EAL command line options provide the ability to specify the number of memory channels and ranks.

.. note::

    The command line must always have the number of memory channels specified for the processor.

Examples of alignment for different DIMM architectures are shown in
:numref:`figure_memory-management` and :numref:`figure_memory-management2`.

.. _figure_memory-management:

.. figure:: img/memory-management.*

   Two Channels and Quad-ranked DIMM Example


In this case, the assumption is that a packet is 16 blocks of 64 bytes, which does not hold in the general case.

The Intel® 5520 chipset has three channels, so in most cases,
no padding is required between objects (except for objects whose size is n x 3 x 64 bytes blocks).

.. _figure_memory-management2:

.. figure:: img/memory-management2.*

   Three Channels and Two Dual-ranked DIMM Example


When creating a new pool, the user can specify whether to use this feature or not.
.. _mempool_local_cache:

Local Cache
-----------

In terms of CPU usage, the cost of multiple cores accessing a memory pool's ring of free buffers may be high
since each access requires a compare-and-set (CAS) operation.
To avoid having too many access requests to the memory pool's ring,
the memory pool allocator can maintain a per-core cache and do bulk requests to the memory pool's ring,
via the cache, with many fewer locks on the actual memory pool structure.
In this way, each core has full access to its own cache of free objects (with no locking required), and
only when the cache fills does the core need to shuffle some of the free objects back to the pool's ring, or
obtain more objects when the cache is empty.

While this may mean a number of buffers may sit idle in some cores' caches,
the speed at which a core can access its own cache for a specific memory pool, without locks, provides performance gains.

The cache is composed of a small, per-core table of pointers and its length (used as a stack).
This internal cache can be enabled or disabled at creation of the pool.

The maximum size of the cache is static and is defined at compilation time (CONFIG_RTE_MEMPOOL_CACHE_MAX_SIZE).

:numref:`figure_mempool` shows a cache in operation.

.. _figure_mempool:

.. figure:: img/mempool.*

   A mempool in Memory with its Associated Ring

As an alternative to the internal default per-lcore local cache, an application can create and manage external caches through the ``rte_mempool_cache_create()``, ``rte_mempool_cache_free()`` and ``rte_mempool_cache_flush()`` calls.
These user-owned caches can be explicitly passed to ``rte_mempool_generic_put()`` and ``rte_mempool_generic_get()``.
The ``rte_mempool_default_cache()`` call returns the default internal cache, if any.
In contrast to the default caches, user-owned caches can be used by non-EAL threads too.

Mempool Handlers
----------------

Mempool handlers allow external memory subsystems, such as external hardware memory
management systems and software-based memory allocators, to be used with DPDK.

There are two aspects to a mempool handler:

* Adding the code for your new mempool operations (ops). This is achieved by
  adding a new mempool ops code, and using the ``MEMPOOL_REGISTER_OPS`` macro.

* Using the new API to call ``rte_mempool_create_empty()`` and
  ``rte_mempool_set_ops_byname()`` to create a new mempool and specify which
  ops to use.

Several different mempool handlers may be used in the same application. A new
mempool can be created by using the ``rte_mempool_create_empty()`` function,
then using ``rte_mempool_set_ops_byname()`` to point the mempool to the
relevant mempool handler callback (ops) structure.

Legacy applications may continue to use the old ``rte_mempool_create()`` API
call, which uses a ring-based mempool handler by default. These applications
will need to be modified if they are to use a new mempool handler.

For applications that use ``rte_pktmbuf_pool_create()``, there is a config setting
(``RTE_MBUF_DEFAULT_MEMPOOL_OPS``) that allows the application to make use of
an alternative mempool handler.


Use Cases
---------

All allocations that require a high level of performance should use a pool-based memory allocator.
Below are some examples:

* :ref:`Mbuf Library <Mbuf_Library>`

* :ref:`Environment Abstraction Layer <Environment_Abstraction_Layer>`, for logging service

* Any application that needs to allocate fixed-size objects in the data plane that will be continuously utilized by the system.