<HTML>
<!--
  Copyright (c) Jeremy Siek 2000

  Distributed under the Boost Software License, Version 1.0.
  (See accompanying file LICENSE_1_0.txt or copy at
  http://www.boost.org/LICENSE_1_0.txt)
  -->
<Head>
<Title>Sparse Matrix Ordering Example</Title>
</Head>
<BODY BGCOLOR="#ffffff" LINK="#0000ee" TEXT="#000000" VLINK="#551a8b"
 ALINK="#ff0000">
<IMG SRC="../../../boost.png"
     ALT="C++ Boost" width="277" height="86">

<BR Clear>

<H1><A NAME="sec:sparse-matrix-ordering"></A>
Sparse Matrix Ordering
</H1>
<P>
Graph theory was identified as a powerful tool for sparse matrix
computation when Seymour Parter used undirected graphs to model
symmetric Gaussian elimination more than 30 years ago [<A HREF="bibliography.html#parter61:_gauss">28</A>].
Graphs can be used to model symmetric matrices, factorizations, and
algorithms on non-symmetric matrices, such as fill paths in Gaussian
elimination, strongly connected components in matrix irreducibility,
bipartite matching, and alternating paths in linear dependence and
structural singularity. Not only do graphs make it easier to
understand and analyze sparse matrix algorithms, but they also open
sparse matrices to manipulation with existing graph algorithms
and techniques [<A
HREF="bibliography.html#george93:graphtheory">13</A>]. In this section, we
illustrate how to use the BGL in sparse matrix computations such
as ordering algorithms. Before we go further into the sparse matrix
algorithms, let us take a step back and review a few things.

<P>
<H2>Graphs and Sparse Matrices</H2>

<P>
A graph is fundamentally a way to represent a binary relation between
objects. The nonzero pattern of a sparse matrix also describes a
binary relation between unknowns. Therefore the nonzero pattern of a
sparse matrix of a linear system can be modeled with a graph
<I>G(V,E)</I>, whose <I>n</I> vertices in <I>V</I> represent the
<I>n</I> unknowns, and where there is an edge from vertex <I>i</I> to
vertex <I>j</I> when <i>A<sub>ij</sub></i> is nonzero. Thus, when a
matrix has a symmetric nonzero pattern, the corresponding graph is
undirected.

<P>

<H2>Sparse Matrix Ordering Algorithms</H2>

<P>
The process for solving a sparse symmetric positive definite linear
system, <i>Ax=b</i>, can be divided into four stages as follows:
<DL>
<DT><STRONG>Ordering:</STRONG></DT>
<DD>Find a permutation <i>P</i> of matrix <i>A</i>,
</DD>
<DT><STRONG>Symbolic factorization:</STRONG></DT>
<DD>Set up a data structure for the Cholesky
factor <i>L</i> of <i>PAP<sup>T</sup></i>,
</DD>
<DT><STRONG>Numerical factorization:</STRONG></DT>
<DD>Decompose <i>PAP<sup>T</sup></i> into <i>LL<sup>T</sup></i>,
</DD>
<DT><STRONG>Triangular system solution:</STRONG></DT>
<DD>Solve <i>LL<sup>T</sup>Px=Pb</i> for <i>x</i>.
</DD>
</DL>
Because the choice of permutation <i>P</i> will directly determine the
number of fill-in elements (elements present in the non-zero structure
of <i>L</i> that are not present in the non-zero structure of
<i>A</i>), the ordering has a significant impact on the memory and
computational requirements of the later stages. However, finding
the optimal ordering for <i>A</i> (in the sense of minimizing fill-in)
has been proven to be NP-complete [<a
href="bibliography.html#garey79:computers-and-intractability">30</a>],
requiring that heuristics be used for all but simple (or specially
structured) cases.

<P>
A widely used but rather simple ordering algorithm is a variant of the
Cuthill-McKee ordering, the reverse Cuthill-McKee ordering algorithm.
This algorithm can be used as a preordering method to improve ordering
in more sophisticated methods such as minimum degree
algorithms [<a href="bibliography.html#LIU:MMD">21</a>].

<P>

<H3><A NAME="SECTION001032100000000000000">
Reverse Cuthill-McKee Ordering Algorithm</A>
</H3>
The original Cuthill-McKee ordering algorithm is primarily designed to
reduce the profile of a matrix [<A
HREF="bibliography.html#george81:__sparse_pos_def">14</A>].
In 1971, George discovered that the reverse ordering often turned out
to be superior to the original ordering. Here we describe the
reverse Cuthill-McKee algorithm in terms of a graph:

<OL>
<LI><I>Finding a starting vertex</I>: Determine a starting vertex
<I>r</I> and assign <i>r</i> to <I>x<sub>1</sub></I>.
</LI>
<LI><I>Main part</I>: For <I>i=1,...,N</I>, find all
the unnumbered neighbors of the vertex <I>x<sub>i</sub></I> and number them
in increasing order of degree.
</LI>
<LI><I>Reversing ordering</I>: The reverse Cuthill-McKee ordering
is given by <I>y<sub>1</sub>,...,y<sub>N</sub></I> where
<I>y<sub>i</sub></I> is <I>x<sub>N-i+1</sub></I> for <I>i = 1,...,N</I>.
</LI>
</OL>
In the first step a good starting vertex needs to be determined. The
study by George and Liu [<A
HREF="bibliography.html#george81:__sparse_pos_def">14</A>] showed that
a pair of vertices which are at maximum or near-maximum "distance"
apart are good ones. They also proposed an algorithm to find such a
starting vertex in [<A
HREF="bibliography.html#george81:__sparse_pos_def">14</A>], which we
show in <A
HREF="#fig:ggcl_find_starting_vertex">Figure
1</A> implemented using the BGL interface.

<P>
<P></P>
<DIV ALIGN="CENTER"><A NAME="fig:ggcl_find_starting_vertex"></A>
<table border><tr><td>
<pre>
namespace boost {
  template &lt;class Graph, class Vertex, class Color, class Degree&gt;
  Vertex
  pseudo_peripheral_pair(Graph&amp; G, const Vertex&amp; u, int&amp; ecc,
                         Color color, Degree degree)
  {
    typename property_traits&lt;Color&gt;::value_type c = get(color, u);

    rcm_queue&lt;Vertex, Degree&gt; Q(degree);

    typename boost::graph_traits&lt;Graph&gt;::vertex_iterator ui, ui_end;
    for (boost::tie(ui, ui_end) = vertices(G); ui != ui_end; ++ui)
      put(color, *ui, white(c));
    breadth_first_search(G, u, Q, bfs_visitor&lt;&gt;(), color);

    ecc = Q.eccentricity();
    return Q.spouse();
  }

  template &lt;class Graph, class Vertex, class Color, class Degree&gt;
  Vertex find_starting_node(Graph&amp; G, Vertex r, Color c, Degree d) {
    int eccen_r, eccen_x;
    Vertex x = pseudo_peripheral_pair(G, r, eccen_r, c, d);
    Vertex y = pseudo_peripheral_pair(G, x, eccen_x, c, d);

    while (eccen_x &gt; eccen_r) {
      r = x;
      eccen_r = eccen_x;
      x = y;
      y = pseudo_peripheral_pair(G, x, eccen_x, c, d);
    }
    return x;
  }
} // namespace boost
</pre>
</td></tr></table>
<CAPTION ALIGN="BOTTOM">
<table><tr><td>
<STRONG>Figure 1:</STRONG>
The BGL implementation of <TT>find_starting_node</TT>. The key
part, <TT>pseudo_peripheral_pair</TT>, is essentially a BFS with a
custom queue type.
</td></tr></table></CAPTION>
</DIV>
<P></P>

The key part of step one is a custom queue type used with BFS, as shown in
<A
HREF="#fig:ggcl_find_starting_vertex">Figure
1</A>. The main Cuthill-McKee algorithm, shown in <A
HREF="#fig:cuthill-mckee">Figure 2</A>,
uses a fenced priority queue with a generalized BFS. If
<TT>inverse_permutation</TT> is a normal iterator, the result stored
is the inverse permutation of the original Cuthill-McKee ordering. On the
other hand, if <TT>inverse_permutation</TT> is a reverse iterator, the
result stored is the inverse permutation of the reverse Cuthill-McKee
ordering. Our implementation is quite concise because many components
from the BGL can be reused.


<P></P>
<DIV ALIGN="CENTER"><A NAME="fig:cuthill-mckee"></A><A NAME="6261"></A>
<table border><tr><td>
<pre>
  template &lt;class Graph, class Vertex, class OutputIterator,
            class Color, class Degree&gt;
  inline void
  cuthill_mckee_ordering(Graph&amp; G,
                         Vertex s,
                         OutputIterator inverse_permutation,
                         Color color, Degree degree)
  {
    typedef typename property_traits&lt;Degree&gt;::value_type DS;
    typename property_traits&lt;Color&gt;::value_type c = get(color, s);
    typedef indirect_cmp&lt;Degree, std::greater&lt;DS&gt; &gt; Compare;
    Compare comp(degree);
    fenced_priority_queue&lt;Vertex, Compare&gt; Q(comp);

    typedef cuthill_mckee_visitor&lt;OutputIterator&gt; CMVisitor;
    CMVisitor cm_visitor(inverse_permutation);

    typename boost::graph_traits&lt;Graph&gt;::vertex_iterator ui, ui_end;
    for (boost::tie(ui, ui_end) = vertices(G); ui != ui_end; ++ui)
      put(color, *ui, white(c));
    breadth_first_search(G, s, Q, cm_visitor, color);
  }
</pre>
</td></tr></table>
<CAPTION ALIGN="BOTTOM">
<TABLE>
<TR><TD> <STRONG>Figure 2:</STRONG> The BGL implementation of the
Cuthill-McKee algorithm. </TD></TR> </TABLE></CAPTION> </DIV><P></P>
<P>


<P>

<H3>Minimum Degree Ordering Algorithm</H3>

<P>
Another category of high-quality ordering algorithms in wide use is
based on a greedy approach in which the ordering is
chosen to minimize some quantity at each step of a simulated
<i>n</i>-step symmetric Gaussian elimination process. The algorithms
using such an approach are typically distinguished by their greedy
minimization criteria [<a href="bibliography.html#ng-raghavan">34</a>].

<P>
In graph terms, the basic ordering process used by most greedy
algorithms is as follows:

<OL>
<LI><EM>Start:</EM> Construct undirected graph <i>G<sup>0</sup></i>
corresponding to matrix <i>A</i>

</LI>
<LI><EM>Iterate:</EM> For <i>k = 1, 2, ... </i> until
<i>G<sup>k</sup> = { }</i> do:

<UL>
<LI>Choose a vertex <i>v<sup>k</sup></i> from <i>G<sup>k</sup></i>
according to some criterion
</LI>
<LI>Eliminate <i>v<sup>k</sup></i> from <i>G<sup>k</sup></i> to form
<i>G<sup>k+1</sup></i>

</LI>
</UL>
</LI>
</OL>
The resulting ordering is the sequence of vertices <i>{v<sup>0</sup>,
v<sup>1</sup>,...}</i> selected by the algorithm.

<P>
One of the most important examples of such an algorithm is the
<I>Minimum Degree</I> algorithm. At each step the minimum degree
algorithm chooses the vertex with minimum degree in the corresponding
graph as <i>v<sup>k</sup></i>. A number of enhancements to the basic
minimum degree algorithm have been developed, such as the use of a
quotient graph representation, mass elimination, incomplete degree
update, multiple elimination, and external degree. See [<A
href="bibliography.html#George:evolution">35</a>] for a historical
survey of the minimum degree algorithm.

<P>
The BGL implementation of the Minimum Degree algorithm closely follows
the algorithmic descriptions in [<A
HREF="bibliography.html#LIU:MMD">21</A>]. The implementation presently
includes the enhancements for mass elimination, incomplete degree
update, multiple elimination, and external degree.

<P>
In particular, we create a graph representation to improve the
performance of the algorithm. It is based on a templated "vector of
vectors." The vector container used is an adaptor class built on top of
the STL <TT>std::vector</TT> class. Particular characteristics of this
adaptor class include the following:

<UL>
<LI>Erasing elements does not shrink the associated memory.
Adding new elements after erasing will not need to allocate
additional memory.
</LI>
<LI>Additional memory is allocated efficiently on demand when new
elements are added (doubling the capacity every time it is
increased). This property comes from the STL vector.
</LI>
</UL>

<P>
Note that this representation is similar to that used in Liu's
implementation, with some important differences due to dynamic memory
allocation. With dynamic memory allocation we do not need to
overwrite portions of the graph that have been eliminated,
allowing for a more efficient graph traversal. More importantly,
information about the elimination graph is preserved, allowing for
trivial symbolic factorization. Since symbolic factorization can be
an expensive part of the entire solution process, improving its
performance can result in significant computational savings.

<P>
The overhead of dynamic memory allocation could conceivably compromise
performance in some cases. However, in practice, memory allocation
overhead does not contribute significantly to the run-time of our
implementation as shown in [] because it is not
done very often and the cost gets amortized.

<P>

<H2>Proper Self-Avoiding Walk</H2>

<P>
The finite element method (FEM) is a flexible and attractive numerical
approach for solving partial differential equations, since it is
straightforward to handle geometrically complicated domains. However,
the unstructured meshes generated by FEM do not provide an obvious
labeling (numbering) of the unknowns, even though such a labeling is
vital for the matrix-vector notation of the underlying algebraic
equations. Special numbering techniques have been developed to
optimize memory usage and locality of such algorithms. One novel
technique is the self-avoiding walk [].

<P>
If we think of a mesh as a graph, a self-avoiding walk (SAW) over an
arbitrary unstructured two-dimensional mesh is an enumeration of all
the triangles of the mesh such that two successive triangles share an
edge or a vertex. A proper SAW (PSAW) is a SAW in which jumping twice over
the same vertex is forbidden for three consecutive triangles in the walk.
It can be used to improve the parallel efficiency of several irregular
algorithms, in particular for issues related to reducing runtime memory
access (improving locality) and interprocess communications. The
reference [] proved the existence
of a PSAW for any arbitrary triangular mesh by extending an existing
PSAW over a small piece of mesh to a larger part. The proof
effectively provides a set of rules to construct a PSAW.

<P>
The family of algorithms for constructing a PSAW on a mesh starts from
an arbitrary triangle of the mesh, chooses new triangles sharing an edge
with the current sub-mesh, and extends the existing partial PSAW over
the new triangles.

<P>
Let us define the dual graph of a mesh: each triangle in the mesh
becomes a vertex, and two triangles sharing an edge in the mesh give
rise to an edge between the corresponding vertices in the dual graph.
Using the dual graph of a mesh, one way to implement this family of
PSAW-construction algorithms is to reuse BGL algorithms such as BFS and
DFS with a customized visitor that provides operations during
traversal. The customized visitor has a function <TT>tree_edge()</TT>,
which extends the partial PSAW case by case, and a function
<TT>start_vertex()</TT>, which sets up the PSAW for the starting vertex.

<br>
<HR>
<TABLE>
<TR valign=top>
<TD nowrap>Copyright &copy; 2000-2001</TD><TD>
<A HREF="http://www.boost.org/people/jeremy_siek.htm">Jeremy Siek</A>, Indiana University (<A HREF="mailto:jsiek@osl.iu.edu">jsiek@osl.iu.edu</A>)
</TD></TR></TABLE>

</BODY>
</HTML>