.. Copyright (C) 2004-2008 The Trustees of Indiana University.
   Use, modification and distribution is subject to the Boost Software
   License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
   http://www.boost.org/LICENSE_1_0.txt)

===============================================
An Overview of the Parallel Boost Graph Library
===============================================

.. image:: ../graph.png
  :width: 206
  :height: 184
  :alt: An example graph
  :align: right

The Parallel Boost Graph Library (Parallel BGL) is a C++ library for
parallel, distributed computation on graphs. The Parallel BGL contains
distributed graph data structures, distributed graph algorithms,
abstractions over the communication medium (such as MPI), and
supporting data structures. A graph (also called a *network*) consists
of a set of *vertices* and a set of relationships between vertices,
called *edges*. The edges may be *undirected*, meaning that the
relationship between vertices is mutual, e.g., "X is related to Y", or
they can be *directed*, meaning that the relationship goes only one
way, e.g., "X is the child of Y". The following figure illustrates a
typical directed graph, where *a-i* are the vertices and the arrows
represent edges.

.. image:: ../distributed-graph.png
  :width: 229
  :height: 199
  :alt: A distributed graph
  :align: right

The Parallel BGL is primarily concerned with *distributed*
graphs. Distributed graphs are conceptually graphs, but their storage
is spread across multiple processors. The following figure
demonstrates a distributed version of the graph above, where the graph
has been divided among three processors (represented by the grey
rectangles). Edges in the graph may be either local (with both
endpoints stored on the same processor) or remote (the target of the
edge is stored on a different processor).

The Parallel BGL is a generic library. At its core are *generic*
distributed graph algorithms, which can operate on any distributed
graph data structure provided that the data structure meets certain
requirements. For instance, an algorithm may need to enumerate the
set of vertices stored on the current processor, enumerate the set of
outgoing edges from a particular vertex, and determine on which
processor the target of each edge resides. The graph algorithms in the
Parallel BGL are also generic with respect to the *properties*
attached to edges and vertices in a graph; for instance, the weight of
each edge can be stored as part of the graph or allocated in a
completely separate data structure.

The genericity available in the algorithms of the Parallel BGL allows
them to be applied to existing graph data structures. However, most
users will instead be writing new code that takes advantage of the
Parallel BGL. The Parallel BGL provides distributed graph data
structures that meet the requirements of the Parallel BGL
algorithms. The primary data structure is the `distributed adjacency
list`_, which allows storage and manipulation of a (distributed)
graph. The vertices in the graph are divided among the various
processors, and the edges outgoing from a vertex are stored on
the processor that "owns" (stores) that vertex. The following figure
illustrates the distributed adjacency list representation.

.. image:: ../dist-adjlist.png
  :width: 446
  :height: 154
  :alt: A distributed adjacency list
  :align: center

.. image:: ../dist-pmap.png
  :width: 271
  :height: 175
  :alt: A distributed property map
  :align: right

The `distributed adjacency list`_ distributes the structure of a graph
over multiple processors. While graph structure is an important part
of many graph problems, there are typically other properties attached
to the vertices and edges, such as edge weights or the position of
vertices within a grid. These properties are stored in *property
maps*, which associate a single piece of data with each edge or vertex
in a graph. Distributed property maps extend this notion to
distributed computing, where properties are stored on the same
processor as the vertex or edge. The following figure illustrates the
distribution of a property map storing colors (white, gray, black) for
each vertex. In addition to the storage for each vertex, the
processors store some "ghost cells" that cache values actually stored
on other processors, represented by the dashed boxes.

Tying together all of the distributed data structures of the Parallel
BGL are its process groups and distributed graph algorithms. Process
groups coordinate the interactions between multiple processes and
distributed data structures by abstracting the communication
mechanism. The algorithms are typically written using the SPMD model
(Single Program, Multiple Data) and interact with both the distributed
data structures and the process group itself. At various points in the
algorithm's execution, all processes execute a synchronization point,
which allows all of the distributed data structures to ensure an
appropriate degree of consistency across processes. The following
diagram illustrates the communication patterns within the
execution of a distributed algorithm in the Parallel BGL. In
particular, the diagram illustrates the distributed data structures
used in a distributed breadth-first search, from the top-left and
proceeding clockwise:

- a user-defined property map that tracks the distance from the
  source vertex to all other vertices,

- an automatically-generated property map that tracks the "color"
  of vertices in the (distributed) graph, to determine which
  vertices have been seen before,

- a distributed queue, which coordinates the breadth-first search
  and distributes new vertices to search, and

- a distributed graph, on which the breadth-first search is
  operating.

.. image:: ../architecture.png
  :width: 485
  :height: 410
  :alt: Parallel Boost Graph Library architecture
  :align: center

----------------------------------------------------------------------------

Copyright (C) 2005 The Trustees of Indiana University.

Authors: Douglas Gregor and Andrew Lumsdaine

.. _Distributed adjacency list: distributed_adjacency_list.html
.. _Process groups: