.. _process-architecture:

Process Architecture
====================

FRR inherited its overall design architecture from Quagga. The chosen model for
Quagga is that of a suite of independent daemons that do IPC via Unix domain
sockets. Within each daemon, the architecture follows the event-driven model.
FRR has inherited this model as well. As FRR is deployed at larger scales and
gains ever more features, each adding to the overall processing workload, we
are approaching the saturation point for a single thread per daemon. In light
of this, there are ongoing efforts to introduce multithreading to various
components of FRR. This document describes the current design choices and
overall model for integrating the event-driven and multithreaded
architectures into a cohesive whole.

Terminology
-----------
Because this document describes the architecture for true kernel threads as
well as the event system, a digression on terminology is in order here.

Historically Quagga's event system was viewed as an implementation of userspace
threading. Because of this design choice, the names for various data structures
within the event system are variations on the term "thread". The primary
context data structure in this system is called a "threadmaster". What would
today be called an 'event' or 'task' in systems such as libevent are called
"threads", and the data structure for them is ``struct thread``. To add to the
confusion, these "threads" have various types, one of which is "event". To
avoid some of this confusion, this document refers to these "threads" as
'tasks', except where the data structures are explicitly named. When they are
explicitly named, they will be formatted ``like this`` to differentiate from
the conceptual names. When speaking of kernel threads, the term used will be
"pthread", since FRR's kernel threading implementation uses POSIX threads.

.. This should be broken into its own document under :ref:`libfrr`
.. _event-architecture:

Event Architecture
------------------
This section presents a brief overview of the event model as currently
implemented in FRR. This document should be expanded and broken off into its
own section. For now it provides the basic information necessary to understand
the interplay between the event system and kernel threads.

The core event system is implemented in :file:`lib/thread.[ch]`. The primary
structure is ``struct thread_master``, hereafter referred to as a
``threadmaster``. A ``threadmaster`` is a global state object, or context, that
holds all the tasks currently pending execution as well as statistics on tasks
that have already executed. The event system is driven by adding tasks to this
data structure and then calling a function to retrieve the next task to
execute. At initialization, a daemon will typically create one
``threadmaster``, add a small set of initial tasks, and then run a loop to
fetch each task and execute it.

These tasks have various types corresponding to their general action. The types
are given by integer macros in :file:`thread.h` and are:

``THREAD_READ``
   Task which waits for a file descriptor to become ready for reading and then
   executes.

``THREAD_WRITE``
   Task which waits for a file descriptor to become ready for writing and then
   executes.

``THREAD_TIMER``
   Task which executes after a certain amount of time has passed since it was
   scheduled.

``THREAD_EVENT``
   Generic task that executes with high priority and carries an arbitrary
   integer indicating the event type to its handler. These are commonly used to
   implement the finite state machines typically found in routing protocols.

``THREAD_READY``
   Type used internally for tasks on the ready queue.

``THREAD_UNUSED``
   Type used internally for ``struct thread`` objects that aren't being used.
   The event system pools ``struct thread`` objects to avoid heap allocations;
   this is the type they have when they're in the pool.

``THREAD_EXECUTE``
   Just before a task is run, its type is changed to this. This is used to show
   ``X`` as the type in the output of :clicmd:`show thread cpu`.

The programmer never has to work with these types explicitly. Each type of task
is created and queued via special-purpose functions (actually macros, but
irrelevant for the time being) for the specific type. For example, to add a
``THREAD_READ`` task, you would call:

::

   thread_add_read(struct thread_master *master,
                   int (*handler)(struct thread *), void *arg, int fd,
                   struct thread **ref);

The ``struct thread`` is then created and added to the appropriate internal
data structure within the ``threadmaster``.

The Event Loop
^^^^^^^^^^^^^^
To use the event system, after creating a ``threadmaster`` the program adds an
initial set of tasks. As these tasks execute, they add more tasks that execute
at some point in the future. This sequence of tasks drives the lifecycle of the
program. When no more tasks are available, the program dies. Typically at
startup the first task added is an I/O task for VTYSH as well as any network
sockets needed for peerings or IPC.

To retrieve the next task to run, the program calls ``thread_fetch()``.
``thread_fetch()`` internally computes which task to execute next based on
rudimentary priority logic. Events (type ``THREAD_EVENT``) execute with the
highest priority, followed by expired timers and finally I/O tasks (types
``THREAD_READ`` and ``THREAD_WRITE``). When scheduling a task, a function and
an arbitrary argument are provided. The task returned from ``thread_fetch()``
is then executed with ``thread_call()``.

The following diagram illustrates a simplified version of this infrastructure.

.. todo: replace these with SVG
.. figure:: ../figures/threadmaster-single.png
   :align: center

   Lifecycle of a program using a single threadmaster.

The series of "task" boxes represents the current ready task queue. The various
other queues for other types are not shown. The fetch-execute loop is
illustrated at the bottom.

Mapping the general names used in the figure to specific FRR functions:

- ``task`` is ``struct thread *``
- ``fetch`` is ``thread_fetch()``
- ``exec()`` is ``thread_call()``
- ``cancel()`` is ``thread_cancel()``
- ``schedule()`` is any of the various task-specific ``thread_add_*`` functions

Adding tasks is done with various task-specific function-like macros. These
macros wrap underlying functions in :file:`thread.c` to provide additional
information added at compile time, such as the line number the task was
scheduled from, that can be accessed at runtime for debugging, logging and
informational purposes. Each task type has its own specific scheduling function
that follows the naming convention ``thread_add_<type>``; see :file:`thread.h`
for details.

There are some gotchas to keep in mind:

- I/O tasks are keyed off the file descriptor associated with the I/O
  operation. This means that for any given file descriptor, only one of each
  type of I/O task (``THREAD_READ`` and ``THREAD_WRITE``) can be scheduled. For
  example, scheduling two write tasks one after the other will overwrite the
  first task with the second, resulting in the total loss of the first task
  and difficult-to-diagnose bugs.

- Timer tasks are only as accurate as the monotonic clock provided by the
  underlying operating system.

- Memory management of the arbitrary handler argument passed in the schedule
  call is the responsibility of the caller.

Kernel Thread Architecture
--------------------------
Efforts have begun to introduce kernel threads into FRR to improve performance
and stability. A kernel thread architecture has long been seen as orthogonal
to an event-driven architecture, yet the two have significant overlap in terms
of design choices. Since the event model is tightly integrated into FRR,
careful thought has been put into how pthreads are introduced, what role they
fill, and how they will interoperate with the event model.

Design Overview
^^^^^^^^^^^^^^^
Each kernel thread behaves as a lightweight process within FRR, sharing the
same process memory space. On the other hand, the event system is designed to
run in a single process and drive serial execution of a set of tasks. With this
consideration, a natural choice is to implement the event system within each
kernel thread. This allows us to leverage the event-driven execution model with
the currently existing task and context primitives. In this way the familiar
execution model of FRR gains the ability to execute tasks simultaneously while
preserving the existing model for concurrency.

The following figure illustrates the architecture with multiple pthreads, each
running their own ``threadmaster``-based event loop.

.. todo: replace these with SVG
.. figure:: ../figures/threadmaster-multiple.png
   :align: center

   Lifecycle of a program using multiple pthreads, each running their own
   ``threadmaster``.

Each roundrect represents a single pthread running the same event loop
described under :ref:`event-architecture`. Note the arrow from the ``exec()``
box on the right to the ``schedule()`` box in the middle pthread. This
illustrates code running in one pthread scheduling a task onto another
pthread's threadmaster. A global lock for each ``threadmaster`` is used to
synchronize these operations. The pthread names are examples.


.. This should be broken into its own document under :ref:`libfrr`
.. _kernel-thread-wrapper:

Kernel Thread Wrapper
^^^^^^^^^^^^^^^^^^^^^
The basis for the integration of pthreads and the event system is a lightweight
wrapper for both systems implemented in :file:`lib/frr_pthread.[ch]`. The
header provides a core data structure, ``struct frr_pthread``, that
encapsulates structures from both POSIX threads and :file:`thread.[ch]`. In
particular, this data structure has a pointer to a ``threadmaster`` that runs
within the pthread. It also has fields for a name as well as start and stop
functions that have signatures similar to the POSIX arguments for
``pthread_create()``.

Calling ``frr_pthread_new()`` creates and registers a new ``frr_pthread``. The
returned structure has a pre-initialized ``threadmaster``, and its ``start``
and ``stop`` functions are initialized to defaults that will run a basic event
loop with the given ``threadmaster``. Calling ``frr_pthread_run()`` starts the
thread with the ``start`` function. From there, the model is the same as the
regular event model. To schedule tasks on a particular pthread, simply use the
regular :file:`thread.c` functions as usual and provide the ``threadmaster``
pointed to from the ``frr_pthread``. As part of implementing the wrapper, the
:file:`thread.c` functions were made thread-safe. Consequently, it is safe to
schedule events on a ``threadmaster`` belonging both to the calling pthread as
well as *any other pthread*. This serves as the basis for inter-thread
communication and boils down to a slightly more complicated method of message
passing, where the messages are the regular task events as used in the
event-driven model. The only difference is in thread cancellation, which
requires calling ``thread_cancel_async()`` instead of ``thread_cancel()`` to
cancel a task currently scheduled on a ``threadmaster`` belonging to a
different pthread. This is necessary to avoid race conditions in the specific
case where one pthread wants to guarantee that a task on another pthread is
cancelled before proceeding.

In addition, the existing commands to show statistics and other information for
tasks within the event-driven model have been expanded to handle multiple
pthreads; running :clicmd:`show thread cpu` will display the usual event
breakdown, but it will do so for each pthread running in the program. For
example, :ref:`bgpd` runs a dedicated I/O pthread and shows the following
output for :clicmd:`show thread cpu`:

::

   frr# show thread cpu

   Thread statistics for bgpd:

   Showing statistics for pthread main
   ------------------------------------
                                 CPU (user+system):   Real (wall-clock):
   Active   Runtime(ms)    Invoked  Avg uSec  Max uSecs  Avg uSec  Max uSecs  Type  Thread
        0      1389.000         10    138900     248000    135549     255349   T    subgroup_coalesce_timer
        0         0.000          1         0          0        18         18   T    bgp_startup_timer_expire
        0       850.000         18     47222     222000     47795     233814   T    work_queue_run
        0         0.000         10         0          0         6         14   T    update_subgroup_merge_check_thread_cb
        0         0.000          8         0          0       117        160   W    zclient_flush_data
        2         2.000          1      2000       2000       831        831   R    bgp_accept
        0         1.000          1      1000       1000      2832       2832   E    zclient_connect
        1     42082.000     240574       174      37000       178      72810   R    vtysh_read
        1       152.000       1885        80       2000        96       6292   R    zclient_read
        0    549346.000    2997298       183       7000       153      20242   E    bgp_event
        0      2120.000        300      7066      14000      6813      22046   T    (bgp_holdtime_timer)
        0         0.000          2         0          0        57         59   T    update_group_refresh_default_originate_route_map
        0        90.000          1     90000      90000     73729      73729   T    bgp_route_map_update_timer
        0      1417.000       9147       154      48000       132      61998   T    bgp_process_packet
      300     71807.000    2995200        23       3000        24      11066   T    (bgp_connect_timer)
        0      1894.000      12713       148      45000       112      33606   T    (bgp_generate_updgrp_packets)
        0         0.000          1         0          0       105        105   W    vtysh_write
        0        52.000        599        86       2000       138       6992   T    (bgp_start_timer)
        1         1.000          8       125       1000       164        593   R    vtysh_accept
        0        15.000        600        25       2000        15        153   T    (bgp_routeadv_timer)
        0        11.000        299        36       3000        53       3128   RW   bgp_connect_check


   Showing statistics for pthread BGP I/O thread
   ----------------------------------------------
                                 CPU (user+system):   Real (wall-clock):
   Active   Runtime(ms)    Invoked  Avg uSec  Max uSecs  Avg uSec  Max uSecs  Type  Thread
        0      1611.000       9296       173      13000       188      13685   R    bgp_process_reads
        0      2995.000      11753       254      26000       182      29355   W    bgp_process_writes


   Showing statistics for pthread BGP Keepalives thread
   -----------------------------------------------------
                                 CPU (user+system):   Real (wall-clock):
   Active   Runtime(ms)    Invoked  Avg uSec  Max uSecs  Avg uSec  Max uSecs  Type  Thread
   No data to display yet.

Attentive readers will notice that there is a third pthread, the Keepalives
thread. This thread is responsible for -- surprise -- generating keepalives for
peers. However, there are no statistics showing for that thread. Although the
pthread uses the ``frr_pthread`` wrapper, it opts not to use the embedded
``threadmaster`` facilities. Instead it replaces the ``start`` and ``stop``
functions with custom functions. This was done because the ``threadmaster``
facilities introduce a small but significant amount of overhead relative to the
pthread's task. In this case, since the pthread does not need the event-driven
model and does not need to receive tasks from other pthreads, it is simpler and
more efficient to implement it outside of the provided event facilities. The
point to take away from this example is that while the facilities to make using
pthreads within FRR easy are already implemented, the wrapper is flexible and
allows usage of other models while still integrating with the rest of the FRR
core infrastructure. Starting and stopping this pthread works the same as it
does for any other ``frr_pthread``; the only difference is that event
statistics are not collected for it, because there are no events.

Notes on Design and Documentation
---------------------------------
Because of the choice to embed the existing event system into each pthread
within FRR, at this time there is no integrated support for other models of
pthread use, such as divide and conquer. Similarly, there is no explicit
support for thread pooling or similar higher-level constructs. The currently
existing infrastructure is designed around the concept of long-running worker
threads responsible for specific jobs within each daemon. This is not to say
that divide and conquer, thread pooling, etc. could not be implemented in the
future. However, designs in this direction must be very careful to take into
account the existing codebase. Introducing kernel threads into programs that
have been written under the assumption of a single thread of execution must be
done very carefully to avoid insidious errors and to ensure the program
remains understandable and maintainable.

In keeping with these goals, future work on kernel threading should be
extensively documented here, and FRR developers should be very careful with
their design choices, as poor choices tightly integrated can prove to be
catastrophic for development efforts in the future.