doc/developer/rcu.rst

   1 .. highlight:: c
   2
   3 RCU
   4 ===
   5
   6 Introduction
   7 ------------
   8
   9 RCU (Read-Copy-Update) is, fundamentally, a paradigm of multithreaded
  10 operation (and not a set of APIs.)  The core ideas are:
  11
  12 * longer, complicated updates to structures are made only on private,
  13   "invisible" copies.  Other threads, when they access the structure, see an
  14   older (but consistent) copy.
  15
  16 * once done, the updated copy is swapped in a single operation so that
  17   other threads see either the old or the new data but no inconsistent state
  18   between.
  19
  20 * the old instance is only released after making sure that it is impossible
  21   any other thread might still be reading it.
  22
  23 For more information, please search for general or Linux kernel RCU
  24 documentation; there is no way this doc can be comprehensive in explaining the
  25 interactions:
  26
  27 * https://en.wikipedia.org/wiki/Read-copy-update
  28 * https://www.kernel.org/doc/html/latest/kernel-hacking/locking.html#avoiding-locks-read-copy-update
  29 * https://lwn.net/Articles/262464/
  30 * http://www.rdrop.com/users/paulmck/RCU/rclock_OLS.2001.05.01c.pdf
  31 * http://lse.sourceforge.net/locking/rcupdate.html
  32
  33 RCU, the TL;DR
  34 ^^^^^^^^^^^^^^
  35
  36 #. data structures are always consistent for reading.  That's the "R" part.
  37 #. reading never blocks / takes a lock.
  38 #. rcu_read_lock is not a lock in the traditional sense.  Think of it as a
  39    "reservation";  it notes what the *oldest* possible thing the thread might
  40    be seeing is, and which thus can't be deleted yet.
  41 #. you create some object, finish it up, and then publish it.
  42 #. publishing is an ``atomic_*`` call with ``memory_order_release``, which
  43    tells the compiler to make sure prior memory writes have completed before
  44    doing the atomic op.
  45 #. ``ATOMLIST_*`` ``add`` operations do the ``memory_order_release`` for you.
  46 #. you can't touch the object after it is published, except with atomic ops.
  47 #. because you can't touch it, if you want to change it you make a new copy,
  48    work on that, and then publish the new copy.  That's the "CU" part.
  49 #. deleting the object is also an atomic op.
  50 #. other threads that started working before you published / deleted an object
  51    might not see the new object / still see the deleted object.
  52 #. because other threads may still see deleted objects, the ``free()`` needs
  53    to be delayed.  That's what :c:func:`rcu_free()` is for.
  54
  55
  56 When (not) to use RCU
  57 ^^^^^^^^^^^^^^^^^^^^^
  58
  59 RCU is designed for read-heavy workloads where objects are updated relatively
  60 rarely, but frequently accessed.  Do *not* indiscriminately replace locking by
  61 RCU patterns.
  62
  63 The "copy" part of RCU implies that, while updating, several copies of a given
  64 object exist in parallel.  Even after the updated copy is swapped in, the old
  65 object remains queued for freeing until all other threads are guaranteed to
  66 not be accessing it anymore, due to passing a sequence point.  In addition to
  67 the increased memory usage, there may be some bursted (due to batching) malloc
  68 contention when the RCU cleanup thread does its thing and frees memory.
  69
  70 Other useful patterns
  71 ^^^^^^^^^^^^^^^^^^^^^
  72
  73 In addition to the full "copy object, apply changes, atomically update"
  74 approach, there are 2 "reduced" usage cases that can be done:
  75
  76 * atomically updating single pieces of a particular object, e.g. some flags
  77   or configuration piece
  78
  79 * straight up read-only / immutable objects
  80
  81 Both of these cases can be considered RCU "subsets".  For example, when
  82 maintaining an atomic list of items, but these items only have a single
  83 integer value that needs to be updated, that value can be atomically updated
  84 without copying the entire object.  However, the object still needs to be
  85 free'd through :c:func:`rcu_free()` since reading/updating and deleting might
  86 be happening concurrently.  The same applies for immutable objects;  deletion
  87 might still race with reading so they need to be free'd through RCU.
  88
  89 FRR API
  90 -------
  91
  92 Before diving into detail on the provided functions, it is important to note
  93 that the FRR RCU API covers the **cleanup part of RCU, not the read-copy-update
  94 paradigm itself**.  These parts are handled by standard C11 atomic operations,
  95 and by extension through the atomic data structures (ATOMLIST, ATOMSORT & co.)
  96
  97 The ``rcu_*`` functions only make sense in conjunction with these RCU access
  98 patterns.  If you're calling the RCU API but not using these, something is
  99 wrong.  The other way around is not necessarily true;  it is possible to use
 100 atomic ops & datastructures with other types of locking, e.g. rwlocks.
 101
 102 .. c:function:: void rcu_read_lock()
 103 .. c:function:: void rcu_read_unlock()
 104
 105    These functions acquire / release the RCU read-side lock.  All access to
 106    RCU-guarded data must be inside a block guarded by these.  Any number of
 107    threads may hold the RCU read-side lock at a given point in time, including
 108    both no threads at all and all threads.
 109
 110    The functions implement a depth counter, i.e. can be nested.  The nested
 111    calls are cheap, since they only increment/decrement the counter.
 112    Therefore, any place that uses RCU data and doesn't have a guarantee that
 113    the caller holds RCU (e.g. ``lib/`` code) should just have its own
 114    rcu_read_lock/rcu_read_unlock pair.
 115
 116    At the "root" level (e.g. un-nested), these calls can incur the cost of one
 117    syscall (to ``futex()``).  That puts them on about the same cost as a
 118    mutex lock/unlock.
 119
 120    The ``thread_master`` code currently always holds RCU everywhere, except
 121    while doing the actual ``poll()`` syscall.  This is both an optimization as
 122    well as an "easement" into getting RCU going.  The current implementation
 123    contract is that any ``struct event *`` callback is called with a RCU
 124    holding depth of 1, and that this is owned by the thread so it may (should)
 125    drop and reacquire it when doing some longer-running work.
 126
 127    .. warning::
 128
 129       The RCU read-side lock must be held **continuously** for the entire time
 130       any piece of RCU data is used.  This includes any access to RCU data
 131       after the initial ``atomic_load``.  If the RCU read-side lock is
 132       released, any RCU-protected pointers as well as the data they refer to
 133       become invalid, as another thread may have called :c:func:`rcu_free` on
 134       them.
 135
 136 .. c:struct:: rcu_head
 137 .. c:struct:: rcu_head_close
 138 .. c:struct:: rcu_action
 139
 140    The ``rcu_head`` structures are small (16-byte) bits that contain the
 141    queueing machinery for the RCU sweeper/cleanup mechanisms.
 142
 143    Any piece of data that is cleaned up by RCU needs to have a matching
 144    ``rcu_head`` embedded in it.  If there is more than one cleanup operation
 145    to be done (e.g. closing a file descriptor), more than one ``rcu_head`` may
 146    be embedded.
 147
 148    .. warning::
 149
 150       It is not possible to reuse a ``rcu_head``.  It is owned by the RCU code
 151       as soon as ``rcu_*`` is called on it.
 152
 153    The ``_close`` variant carries an extra ``int fd`` field to store the fd to
 154    be closed.
 155
 156    To minimize the amount of memory used for ``rcu_head``, details about the
 157    RCU operation to be performed are moved into the ``rcu_action`` structure.
 158    It contains e.g. the MTYPE for :c:func:`rcu_free` calls.  The pointer to be
 159    freed is stored as an offset relative to the ``rcu_head``, which means it
 160    must be embedded as a struct field so the offset is constant.
 161
 162    The ``rcu_action`` structure is an implementation detail.  Using
 163    ``rcu_free`` or ``rcu_close`` will set it up correctly without further
 164    code needed.
 165
 166    The ``rcu_head`` may be put in an union with other data if the other data
 167    is only used during "life" of the data, since the ``rcu_head`` is used only
 168    for the "death" of data.  But note that other threads may still be reading
 169    a piece of data while a thread is working to free it.
 170
 171 .. c:function:: void rcu_free(struct memtype *mtype, struct X *ptr, field)
 172
 173    Free a block of memory after RCU has ensured no other thread can be
 174    accessing it anymore.  The pointer remains valid for any other thread that
 175    has called :c:func:`rcu_read_lock` before the ``rcu_free`` call.
 176
 177    .. warning::
 178
 179       In some other RCU implementations, the pointer remains valid to the
 180       *calling* thread if it is holding the RCU read-side lock.  This is not
 181       the case in FRR, particularly when running single-threaded.  Enforcing
 182       this rule also allows static analysis to find use-after-free issues.
 183
 184    ``mtype`` is the libfrr ``MTYPE_FOO`` allocation type to pass to
 185    :c:func:`XFREE`.
 186
 187    ``field`` must be the name of a ``struct rcu_head`` member field in ``ptr``.
 188    The offset of this field (which must be constant) is used to reduce the
 189    memory size of ``struct rcu_head``.
 190
 191    .. note::
 192
 193       ``rcu_free`` (and ``rcu_close``) calls are more efficient if they are
 194       put close to each other.  When freeing several RCU'd resources, try to
 195       move the calls next to each other (even if the data structures do not
 196       directly point to each other.)
 197
 198       Having the calls bundled reduces the cost of adding the ``rcu_head`` to
 199       the RCU queue;  the RCU queue is an atomic data structure whose usage
 200       will require the CPU to acquire an exclusive hold on relevant cache
 201       lines.
 202
 203 .. c:function:: void rcu_close(struct rcu_head_close *head, int fd)
 204
 205    Close a file descriptor after ensuring no other thread might be using it
 206    anymore.  Same as :c:func:`rcu_free`, except it calls ``close`` instead of
 207    ``free``.
 208
 209 Internals
 210 ^^^^^^^^^
 211
 212 .. c:struct:: rcu_thread
 213
 214    Per-thread state maintained by the RCU code, set up by the following
 215    functions.  A pointer to a thread's own ``rcu_thread`` is saved in
 216    thread-local storage.
 217
 218 .. c:function:: struct rcu_thread *rcu_thread_prepare(void)
 219 .. c:function:: void rcu_thread_unprepare(struct rcu_thread *rcu_thread)
 220 .. c:function:: void rcu_thread_start(struct rcu_thread *rcu_thread)
 221
 222    Since the RCU code needs to have a list of all active threads, these
 223    functions are used by the ``frr_pthread`` code to set up threads.  Teardown
 224    is automatic.  It should not be necessary to call these functions.
 225
 226    Any thread that accesses RCU-protected data needs to be registered with
 227    these functions.  Threads that do not access RCU-protected data may call
 228    these functions but do not need to.
 229
 230    Note that passing a pointer to RCU-protected data to some library which
 231    accesses that pointer makes the library "access RCU-protected data".  In
 232    that case, either all of the library's threads must be registered for RCU,
 233    or the code must instead pass a (non-RCU) copy of the data to the library.
 234
 235 .. c:function:: void rcu_shutdown(void)
 236
 237    Stop the RCU sweeper thread and make sure all cleanup has finished.
 238
 239    This function is called on daemon exit by the libfrr code to ensure pending
 240    RCU operations are completed.  This is mostly to get a clean exit without
 241    memory leaks from queued RCU operations.  It should not be necessary to
 242    call this function as libfrr handles this.
 243
 244 FRR specifics and implementation details
 245 ----------------------------------------
 246
 247 The FRR RCU infrastructure has the following characteristics:
 248
 249 * it is Epoch-based with a 32-bit wrapping counter.  (This is somewhat
 250   different from other Epoch-based approaches which may be designed to only
 251   use 3 counter values, but works out to a simple implementation.)
 252
 253 * instead of tracking CPUs as the Linux kernel does, threads are tracked.  This
 254   has exactly zero semantic impact, RCU just cares about "threads of
 255   execution", which the kernel can optimize to CPUs but we can't.  But it
 256   really boils down to the same thing.
 257
 258 * there are no ``rcu_dereference`` and ``rcu_assign_pointer`` - use
 259   ``atomic_load`` and ``atomic_store`` instead.  (These didn't exist when the
 260   Linux RCU code was created.)
 261
 262 * there is no ``synchronize_rcu``; this is a design choice but may be revisited
 263   at a later point.  ``synchronize_rcu`` blocks a thread until it is guaranteed
 264   that no other threads might still be accessing data structures that they may
 265   have access to at the beginning of the function call.  This is a blocking
 266   design and probably not appropriate for FRR.  Instead, ``rcu_call`` can be
 267   used to have the RCU sweeper thread make a callback after the same constraint
 268   is fulfilled in an asynchronous way.  Most needs should be covered by
 269   ``rcu_free`` and ``rcu_close``.