.. Copyright (C) 2004-2008 The Trustees of Indiana University.
   Use, modification and distribution is subject to the Boost Software
   License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
   http://www.boost.org/LICENSE_1_0.txt)

===============================================
An Overview of the Parallel Boost Graph Library
===============================================

.. image:: ../graph.png
    :width: 206
    :height: 184
    :alt: An example graph
    :align: right

The Parallel Boost Graph Library (Parallel BGL) is a C++ library for
parallel, distributed computation on graphs. The Parallel BGL contains
distributed graph data structures, distributed graph algorithms,
abstractions over the communication medium (such as MPI), and
supporting data structures. A graph (also called a *network*) consists
of a set of *vertices* and a set of relationships between vertices,
called *edges*. The edges may be *undirected*, meaning that the
relationship between vertices is mutual, e.g., "X is related to Y", or
*directed*, meaning that the relationship goes only one way, e.g., "X
is the child of Y". The following figure illustrates a typical
directed graph, where *a-i* are the vertices and the arrows represent
edges.

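In the sequential BGL, on which the Parallel BGL builds, such a graph
can be written down directly. The following is a minimal sketch; the
vertex names and edges are illustrative rather than the figure's exact
edge set::

  #include <boost/graph/adjacency_list.hpp>

  int main()
  {
    using namespace boost;

    // A directed graph: vertices and out-edge lists stored in vectors.
    typedef adjacency_list<vecS, vecS, directedS> Graph;

    enum { a, b, c, d, e, N }; // a few named vertices, as in the figure
    Graph g(N);

    // Each edge records a one-way relationship, e.g. "a is related to b".
    add_edge(a, b, g);
    add_edge(a, c, g);
    add_edge(b, d, g);
    add_edge(c, e, g);

    return 0;
  }
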
.. image:: ../distributed-graph.png
    :width: 229
    :height: 199
    :alt: A distributed graph
    :align: right

The Parallel BGL is primarily concerned with *distributed*
graphs. Distributed graphs are conceptually graphs, but their storage
is spread across multiple processors. The following figure
demonstrates a distributed version of the graph above, where the graph
has been divided among three processors (represented by the grey
rectangles). Edges in the graph may be either local (with both
endpoints stored on the same processor) or remote (the target of the
edge is stored on a different processor).

The Parallel BGL is a generic library. At its core are *generic*
distributed graph algorithms, which can operate on any distributed
graph data structure provided that the data structure meets certain
requirements. For instance, an algorithm may need to enumerate the
set of vertices stored on the current processor, enumerate the set of
outgoing edges from a particular vertex, and determine on which
processor the target of each edge resides. The graph algorithms in the
Parallel BGL are also generic with respect to the *properties*
attached to the edges and vertices of a graph; for instance, the
weight of each edge can be stored as part of the graph or allocated in
a completely separate data structure.

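The core operations an algorithm relies on look like the following
sketch, written against the distributed adjacency list described
below. The helper function ``examine`` is hypothetical, not part of
the library::

  #include <boost/graph/use_mpi.hpp>
  #include <boost/graph/distributed/mpi_process_group.hpp>
  #include <boost/graph/distributed/adjacency_list.hpp>
  #include <boost/tuple/tuple.hpp>
  #include <iostream>

  typedef boost::adjacency_list<
      boost::vecS,
      boost::distributedS<boost::graph::distributed::mpi_process_group,
                          boost::vecS>,
      boost::directedS>
    Graph;

  // A hypothetical helper exercising the requirements named above.
  void examine(Graph& g)
  {
    using namespace boost;

    // vertices(g) enumerates only the vertices stored on this processor.
    graph_traits<Graph>::vertex_iterator vi, vi_end;
    for (boost::tie(vi, vi_end) = vertices(g); vi != vi_end; ++vi) {
      // out_edges(v, g) enumerates the outgoing edges of a local vertex.
      graph_traits<Graph>::out_edge_iterator ei, ei_end;
      for (boost::tie(ei, ei_end) = out_edges(*vi, g); ei != ei_end; ++ei)
        // The vertex_owner property map reports which processor
        // stores the target of each edge.
        std::cout << "target owned by process "
                  << get(get(vertex_owner, g), target(*ei, g)) << std::endl;
    }
  }
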
The genericity available in the algorithms of the Parallel BGL allows
them to be applied to existing graph data structures. However, most
users will instead write new code that takes advantage of the Parallel
BGL, which provides distributed graph data structures meeting the
requirements of its algorithms. The primary data structure is the
`distributed adjacency list`_, which allows storage and manipulation
of a (distributed) graph. The vertices of the graph are divided among
the various processors, and the edges outgoing from a vertex are
stored on the processor that "owns" (stores) that vertex. The
following figure illustrates the distributed adjacency list
representation.

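Concretely, a distributed adjacency list is an ``adjacency_list``
whose vertex list is distributed via the ``distributedS``
selector. The following is a minimal sketch, assuming MPI and
Boost.MPI are available; the nine vertices and the particular edges
are illustrative::

  #include <boost/graph/use_mpi.hpp>
  #include <boost/graph/distributed/mpi_process_group.hpp>
  #include <boost/graph/distributed/adjacency_list.hpp>
  #include <boost/mpi/environment.hpp>

  using namespace boost;
  using boost::graph::distributed::mpi_process_group;

  // Vertices are divided among the processes of an MPI process group;
  // each vertex's outgoing edges are stored on the owning process.
  typedef adjacency_list<vecS,
                         distributedS<mpi_process_group, vecS>,
                         directedS>
    Graph;

  int main(int argc, char* argv[])
  {
    mpi::environment env(argc, argv); // initialize MPI

    Graph g(9); // nine vertices, distributed across all processes

    // vertex(i, g) names the ith vertex globally; adding an edge whose
    // source lives on another process sends a message to that process.
    add_edge(vertex(0, g), vertex(1, g), g);
    add_edge(vertex(0, g), vertex(2, g), g);
    add_edge(vertex(3, g), vertex(4, g), g);

    // Remote edge insertions take effect at the next synchronization.
    synchronize(g.process_group());

    return 0;
  }
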
.. image:: ../dist-adjlist.png
    :width: 446
    :height: 154
    :alt: A distributed adjacency list
    :align: center

.. image:: ../dist-pmap.png
    :width: 271
    :height: 175
    :alt: A distributed property map
    :align: right

The `distributed adjacency list`_ distributes the structure of a graph
over multiple processors. While graph structure is an important part
of many graph problems, there are typically other properties attached
to the vertices and edges, such as edge weights or the positions of
vertices within a grid. These properties are stored in *property
maps*, which associate a single piece of data with each edge or vertex
of a graph. Distributed property maps extend this notion to
distributed computing, where each property is stored on the same
processor as its vertex or edge. The following figure illustrates the
distribution of a property map storing colors (white, gray, black) for
each vertex. In addition to the storage for its own vertices, each
processor stores some "ghost cells" that cache values actually stored
on other processors, represented by the dashed boxes.

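For instance, a per-vertex color can be stored as a property internal
to the distributed graph, making the property map retrieved from the
graph itself distributed. This is a minimal sketch, reusing the setup
of the earlier example; the helper ``paint`` is hypothetical::

  #include <boost/graph/use_mpi.hpp>
  #include <boost/graph/distributed/mpi_process_group.hpp>
  #include <boost/graph/distributed/adjacency_list.hpp>

  using namespace boost;
  using boost::graph::distributed::mpi_process_group;

  // Each vertex carries a color; the map's storage is distributed the
  // same way the vertices themselves are.
  typedef adjacency_list<vecS,
                         distributedS<mpi_process_group, vecS>,
                         undirectedS,
                         property<vertex_color_t, default_color_type> >
    Graph;

  // Hypothetical helper: writing to a vertex owned by another process
  // goes into a local ghost cell and is reconciled with the owner at
  // the next synchronization.
  void paint(Graph& g)
  {
    property_map<Graph, vertex_color_t>::type color = get(vertex_color, g);
    put(color, vertex(0, g), black_color);
    synchronize(g.process_group());
  }
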
Tying together all of the distributed data structures of the Parallel
BGL are its process groups and distributed graph algorithms. Process
groups coordinate the interactions between multiple processes and
distributed data structures by abstracting the communication
mechanism. The algorithms are typically written using the SPMD model
(Single Program, Multiple Data) and interact with both the distributed
data structures and the process group itself. At various points in an
algorithm's execution, all processes reach a synchronization point,
which allows the distributed data structures to ensure an appropriate
degree of consistency across processes. The following diagram
illustrates the communication patterns within the execution of a
distributed algorithm in the Parallel BGL. In particular, the diagram
illustrates the distributed data structures used in a distributed
breadth-first search (sketched in code after the list), from the
top-left and proceeding clockwise:

- a user-defined property map that tracks the distance from the
  source vertex to all other vertices,

- an automatically-generated property map that tracks the "color"
  of vertices in the (distributed) graph, to determine which
  vertices have been seen before,

- a distributed queue, which coordinates the breadth-first search
  and distributes new vertices to search, and

- a distributed graph, on which the breadth-first search is
  operating.

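A minimal sketch of such a search follows, assuming MPI; the graph,
its edges, and the distance initialization are illustrative. Every
process makes the same SPMD call, and the color map and distributed
queue are created automatically inside the algorithm::

  #include <boost/graph/use_mpi.hpp>
  #include <boost/graph/distributed/mpi_process_group.hpp>
  #include <boost/graph/distributed/adjacency_list.hpp>
  #include <boost/graph/breadth_first_search.hpp>
  #include <boost/graph/distributed/breadth_first_search.hpp>
  #include <boost/graph/visitors.hpp>
  #include <boost/mpi/environment.hpp>
  #include <cstddef>

  using namespace boost;
  using boost::graph::distributed::mpi_process_group;

  // The distance map is a user-defined property, stored with the graph.
  typedef adjacency_list<vecS,
                         distributedS<mpi_process_group, vecS>,
                         undirectedS,
                         property<vertex_distance_t, std::size_t> >
    Graph;

  int main(int argc, char* argv[])
  {
    mpi::environment env(argc, argv);

    Graph g(9); // a small illustrative graph
    add_edge(vertex(0, g), vertex(1, g), g);
    add_edge(vertex(1, g), vertex(2, g), g);
    add_edge(vertex(2, g), vertex(3, g), g);
    synchronize(g.process_group());

    property_map<Graph, vertex_distance_t>::type
      distance = get(vertex_distance, g);
    put(distance, vertex(0, g), 0); // the source is at distance zero

    // Distances are recorded on tree edges as the distributed
    // breadth-first search expands outward from the source.
    breadth_first_search
      (g, vertex(0, g),
       visitor(make_bfs_visitor(record_distances(distance,
                                                 on_tree_edge()))));

    return 0;
  }
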
.. image:: ../architecture.png
    :width: 485
    :height: 410
    :alt: Parallel Boost Graph Library architecture
    :align: center

----------------------------------------------------------------------------

Copyright (C) 2005 The Trustees of Indiana University.

Authors: Douglas Gregor and Andrew Lumsdaine

.. _Distributed adjacency list: distributed_adjacency_list.html
.. _Process groups: