ceph/src/boost/libs/graph_parallel/doc/connected_components.rst

   1 .. Copyright (C) 2004-2008 The Trustees of Indiana University.
   2    Use, modification and distribution is subject to the Boost Software
   3    License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
   4    http://www.boost.org/LICENSE_1_0.txt)
   5
   6 ===========================
   7 |Logo| Connected Components
   8 ===========================
   9
  10 ::
  11
  12   namespace graph {
  13     // Default constructed ParentMap
  14     template<typename Graph, typename ComponentMap, typename ParentMap>
  15     typename property_traits<ComponentMap>::value_type
  16     connected_components( const Graph& g, ComponentMap c);
  17
  18     // User supplied ParentMap
  19     template<typename Graph, typename ComponentMap, typename ParentMap>
  20     typename property_traits<ComponentMap>::value_type
  21     connected_components( const Graph& g, ComponentMap c, ParentMap p);
  22   }
  23
  24 The ``connected_components()`` function computes the connected
  25 components of an undirected graph.  The distributed connected
  26 components algorithm uses the sequential version of the connected
  27 components algorithm to compute the connected components of the local
  28 subgraph, then executes the parallel phase of the algorithm.  The
  29 parallel portion of the connected components algorithm is loosely
  30 based on the work of Goddard, Kumar, and Prins. The interface is a
  31 superset of the interface to the BGL `sequential connected
  32 components`_ algorithm.
  33
  34 Prior to executing the sequential phase of the algorithm, each process
  35 identifies the roots of its local components.  An adjacency list of
  36 all vertices adjacent to members of the component is then constructed
  37 at the root vertex of each component.
  38
  39 The parallel phase of the distributed connected components algorithm
  40 consists of a series of supersteps.  In each superstep, each root
  41 attempts to hook to a member of it's adjacency list by assigning it's
  42 parent pointer to that vertex.  Hooking is restricted to vertices
  43 which are logically less than the current vertex to prevent looping.
  44 Vertices which hook successfully are removed from the list of roots
  45 and placed on another list of completed vertices.  All completed
  46 vertices now execute a pointer jumping step until every completed
  47 vertex has as its parent the root of a component.  This pointer
  48 jumping step may be further optimized by the addition of Cycle
  49 Reduction (CR) rules developed by Johnson and Metaxas, however current
  50 performance evaluations indicate that this would have a negligible
  51 impact on the overall performance of the algorithm.  These CR rules
  52 reduce the number of pointer jumping steps from *O(n)* to *O(log n)*.
  53 Following this pointer jumping step, roots which have hooked in this
  54 phase transmit their adjacency list to their new parent.  The
  55 remaining roots receive these edges and execute a pruning step on
  56 their adjacency lists to remove vertices that are now members of their
  57 component.  The parallel phase of the algorithm is complete when no
  58 root successfully hooks.  Once the parallel phase is complete a final
  59 pointer jumping step is performed on all vertices to assign the parent
  60 pointers of the leaves of the initial local subgraph components to
  61 their final parent which has now been determined.
  62
  63 The single largest performance bottleneck in the distributed connected
  64 components algorithm is the effect of poor vertex distribution on the
  65 algorithm.  For sparse graphs with a single large component, many
  66 roots may hook to the same component, resulting in severe load
  67 imbalance at the process owning this component.  Several methods of
  68 modifying the hooking strategy to avoid this behavior have been
  69 implemented but none has been successful as of yet.
  70
  71 .. contents::
  72
  73 Where Defined
  74 -------------
  75 <``boost/graph/connected_components.hpp``>
  76
  77 Parameters
  78 ----------
  79
  80 IN:  ``Graph& g``
  81   The graph typed must be a model of `Distributed Graph`_.
  82
  83 OUT:  ``ComponentMap c``
  84   The algorithm computes how many connected components are in the
  85   graph, and assigns each component an integer label.  The algorithm
  86   then records to which component each vertex in the graph belongs by
  87   recording the component number in the component property map.  The
  88   ``ComponentMap`` type must be a `Distributed Property Map`_.  The
  89   value type must be the ``vertices_size_type`` of the graph.  The key
  90   type must be the graph's vertex descriptor type. If you do not wish
  91   to compute component numbers, pass ``dummy_property_map`` as the
  92   component map and parent information will be provided in the parent
  93   map.
  94
  95 UTIL:  ``ParentMap p``
  96   A parent map may be supplied to the algorithm, if not supplied the
  97   parent map will be constructed automatically.  The ``ParentMap`` type
  98   must be a `Distributed Property Map`_.  The value type and key type
  99   must be the graph's vertex descriptor type.
 100
 101 OUT:  ``property_traits<ComponentMap>::value_type``
 102   The number of components found will be returned as the value type of
 103   the component map.
 104
 105 Complexity
 106 ----------
 107
 108 The local phase of the algorithm is *O(V + E)*.  The parallel phase of
 109 the algorithm requires at most *O(d)* supersteps where *d* is the
 110 number of initial roots.  *d* is at most *O(V)* but becomes
 111 significantly smaller as *E* increases.  The pointer jumping phase
 112 within each superstep requires at most *O(c)* steps on each of the
 113 completed roots where *c* is the length of the longest cycle.
 114 Application of CR rules can reduce this to *O(log c)*.
 115
 116 Performance
 117 -----------
 118 The following charts illustrate the performance of the Parallel BGL
 119 connected components algorithm. It performs well on very sparse and
 120 very dense graphs. However, for cases where the graph has a medium
 121 density with a giant connected component, the algorithm will perform
 122 poorly. This is a known problem with the algorithm and as far as we
 123 know all implemented algorithms have this degenerate behavior.
 124
 125 .. image:: http://www.osl.iu.edu/research/pbgl/performance/chart.php?generator=ER,SF,SW&dataset=TimeSparse&columns=9
 126   :align: left
 127 .. image:: http://www.osl.iu.edu/research/pbgl/performance/chart.php?generator=ER,SF,SW&dataset=TimeSparse&columns=9&speedup=1
 128
 129 .. image:: http://www.osl.iu.edu/research/pbgl/performance/chart.php?generator=ER,SF,SW&dataset=TimeDense&columns=9
 130   :align: left
 131 .. image:: http://www.osl.iu.edu/research/pbgl/performance/chart.php?generator=ER,SF,SW&dataset=TimeDense&columns=9&speedup=1
 132
 133
 134 -----------------------------------------------------------------------------
 135
 136 Copyright (C) 2004 The Trustees of Indiana University.
 137
 138 Authors: Nick Edmonds, Douglas Gregor, and Andrew Lumsdaine
 139
 140 .. |Logo| image:: pbgl-logo.png
 141             :align: middle
 142             :alt: Parallel BGL
 143             :target: http://www.osl.iu.edu/research/pbgl
 144
 145 .. _Sequential connected components: http://www.boost.org/libs/graph/doc/connected_components.html
 146 .. _Distributed Graph: DistributedGraph.html
 147 .. _Distributed Property Map: distributed_property_map.html