.. Copyright (C) 2004-2008 The Trustees of Indiana University.
   Use, modification and distribution is subject to the Boost Software
   License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
   http://www.boost.org/LICENSE_1_0.txt)

===============================================
An Overview of the Parallel Boost Graph Library
===============================================

.. image:: ../graph.png
  :width: 206
  :height: 184
  :alt: An example graph
  :align: right

The Parallel Boost Graph Library (Parallel BGL) is a C++ library for
parallel, distributed computation on graphs. The Parallel BGL contains
distributed graph data structures, distributed graph algorithms,
abstractions over the communication medium (such as MPI), and
supporting data structures. A graph (also called a *network*) consists
of a set of *vertices* and a set of relationships between vertices,
called *edges*. The edges may be *undirected*, meaning that the
relationship between vertices is mutual, e.g., "X is related to Y", or
they can be *directed*, meaning that the relationship goes only one
way, e.g., "X is the child of Y". The following figure illustrates a
typical directed graph, where *a-i* are the vertices and the arrows
represent edges.

.. image:: ../distributed-graph.png
  :width: 229
  :height: 199
  :alt: A distributed graph
  :align: right

The Parallel BGL is primarily concerned with *distributed*
graphs. Distributed graphs are conceptually graphs, but their storage
is spread across multiple processors. The following figure
demonstrates a distributed version of the graph above, where the graph
has been divided among three processors (represented by the grey
rectangles). Edges in the graph may be either local (with both
endpoints stored on the same processor) or remote (the target of the
edge is stored on a different processor).

The Parallel BGL is a generic library. At its core are *generic*
distributed graph algorithms, which can operate on any distributed
graph data structure provided that the data structure meets certain
requirements. For instance, an algorithm may need to enumerate the
set of vertices stored on the current processor, enumerate the set of
outgoing edges from a particular vertex, and determine on which
processor the target of each edge resides. The graph algorithms in the
Parallel BGL are also generic with respect to the *properties*
attached to edges and vertices in a graph; for instance, the weight of
each edge can be stored as part of the graph or allocated in a
completely separate data structure.

The genericity available in the algorithms of the Parallel BGL allows
them to be applied to existing graph data structures. However, most
users will instead be writing new code that takes advantage of the
Parallel BGL. The Parallel BGL provides distributed graph data
structures that meet the requirements of the Parallel BGL
algorithms. The primary data structure is the `distributed adjacency
list`_, which allows storage and manipulation of a (distributed)
graph. The vertices in the graph are divided among the various
processors, and the edges outgoing from a vertex are stored on
the processor that "owns" (stores) that vertex. The following figure
illustrates the distributed adjacency list representation.

.. image:: ../dist-adjlist.png
  :width: 446
  :height: 154
  :alt: A distributed adjacency list
  :align: center

.. image:: ../dist-pmap.png
  :width: 271
  :height: 175
  :alt: A distributed property map
  :align: right

The `distributed adjacency list`_ distributes the structure of a graph
over multiple processors. While graph structure is an important part
of many graph problems, there are typically other properties attached
to the vertices and edges, such as edge weights or the position of
vertices within a grid. These properties are stored in *property
maps*, which associate a single piece of data with each edge or vertex
in a graph. Distributed property maps extend this notion to
distributed computing, where properties are stored on the same
processor as the vertex or edge. The following figure illustrates the
distribution of a property map storing colors (white, gray, black) for
each vertex. In addition to the storage for each vertex, the
processors store some "ghost cells" that cache values actually stored
on other processors, represented by the dashed boxes.

Tying together all of the distributed data structures of the Parallel
BGL are its process groups and distributed graph algorithms. Process
groups coordinate the interactions between multiple processes and
distributed data structures by abstracting the communication
mechanism. The algorithms are typically written using the SPMD model
(Single Program, Multiple Data) and interact with both the distributed
data structures and the process group itself. At various points in the
algorithm's execution, all processes execute a synchronization point,
which allows all of the distributed data structures to ensure an
appropriate degree of consistency across processes. The following
diagram illustrates the communication patterns within the
execution of a distributed algorithm in the Parallel BGL. In
particular, the diagram illustrates the distributed data structures
used in a distributed breadth-first search, from the top-left and
proceeding clockwise:

- a user-defined property map that tracks the distance from the
  source vertex to all other vertices,

- an automatically-generated property map that tracks the "color"
  of vertices in the (distributed) graph, to determine which
  vertices have been seen before,

- a distributed queue, which coordinates the breadth-first search
  and distributes new vertices to search, and

- a distributed graph, on which the breadth-first search is
  operating.

.. image:: ../architecture.png
  :width: 485
  :height: 410
  :alt: Parallel Boost Graph Library architecture
  :align: center

----------------------------------------------------------------------------

Copyright (C) 2005 The Trustees of Indiana University.

Authors: Douglas Gregor and Andrew Lumsdaine

.. _Distributed adjacency list: distributed_adjacency_list.html
.. _Process groups: