[/============================================================================
  Boost.odeint

  Copyright 2013 Karsten Ahnert
  Copyright 2013 Pascal Germroth
  Copyright 2013 Mario Mulansky

  Use, modification and distribution is subject to the Boost Software License,
  Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
  http://www.boost.org/LICENSE_1_0.txt)
=============================================================================/]


[section Parallel computation with OpenMP and MPI]

Parallelization is a key feature for modern numerical libraries due to the vast
availability of many cores nowadays, even on laptops.
odeint currently supports parallelization with OpenMP and MPI, as described in
the following sections.
However, it should be made clear from the beginning that the difficulty of
efficiently distributing ODE integration across many cores/machines lies in the
parallelization of the system function, which remains the user's
responsibility.
Simply using a parallel odeint backend without parallelizing the system function
will yield almost no performance gain.

[section OpenMP]

[import ../examples/openmp/phase_chain.cpp]

odeint's OpenMP support is implemented as an external backend, which needs to be
included manually. Depending on the compiler, some additional flags may be
needed, e.g. [^-fopenmp] for GCC.
[phase_chain_openmp_header]

In the simplest parallelization approach with OpenMP we use a standard `vector`
as the state type:
[phase_chain_vector_state]

We initialize the state with some random data:
[phase_chain_init]

Now we have to configure the stepper to use the OpenMP backend.
This is done by explicitly providing the `openmp_range_algebra` as a template
parameter to the stepper.
This algebra requires the state type to be a model of Random Access Range and
accesses it from multiple threads.
[phase_chain_stepper]

In addition to providing the stepper with OpenMP parallelization, we also need
a parallelized system function to exploit the available cores.
Here this is shown for a simple one-dimensional chain of phase oscillators with
nearest-neighbour coupling:
[phase_chain_rhs]

[note In the OpenMP backends the system function will always be called
sequentially from the thread used to start the integration.]

Finally, we perform the integration by using one of the integrate functions from
odeint.
As you can see, the parallelization is completely hidden in the stepper and the
system function.
OpenMP will take care of distributing the work among the threads and joining
them automatically.
[phase_chain_integrate]

After integrating, the data can be accessed immediately and processed
further.
Note that you can specify the OpenMP scheduling by calling `omp_set_schedule`
at the beginning of your program:
[phase_chain_scheduling]

See [github_link examples/openmp/phase_chain.cpp
openmp/phase_chain.cpp] for the complete example.

[heading Split state]

[import ../examples/openmp/phase_chain_omp_state.cpp]

For advanced cases odeint offers another approach to using OpenMP that allows
more precise control of the parallelization.
For example, for odd-sized data where OpenMP's thread boundaries don't match
cache lines and hurt performance, it might be advisable to copy the data from
the contiguous `vector<T>` into separate, individually aligned vectors.
For this, odeint provides the `openmp_state<T>` type, essentially an alias for
`vector<vector<T>>`.

Here, the initialization is done with a `vector<double>`, but then we use
odeint's `split` function to fill an `openmp_state`.
The splitting is done such that the sizes of the individual regions differ by
at most 1, to make the computation as uniform as possible.
[phase_chain_state_init]

Of course, the system function has to be changed to deal with the
`openmp_state`.
Note that each sub-region of the state is computed in a single task, but at the
borders read access to the neighbouring regions is required.
[phase_chain_state_rhs]

Using the `openmp_state<T>` state type automatically selects `openmp_algebra`,
which executes odeint's internal computations in parallel regions.
Hence, no manual configuration of the stepper is necessary.
At the end of the integration, we use `unsplit` to concatenate the sub-regions
back together into a single vector.
[phase_chain_state_integrate]

[note You don't actually need to use `openmp_state<T>` for advanced use cases:
`openmp_algebra` is simply an alias for `openmp_nested_algebra<range_algebra>`,
which supports any model of Random Access Range as the outer, parallel state
type and will use the given algebra on its elements.]

See [github_link examples/openmp/phase_chain_omp_state.cpp
openmp/phase_chain_omp_state.cpp] for the complete example.

[endsect]

[section MPI]

[import ../examples/mpi/phase_chain.cpp]

To expand the parallel computation across multiple machines we can use MPI.

The system function implementation is similar to the OpenMP variant with split
data. The main difference is the execution model: OpenMP uses a spawn/join
model in which everything not explicitly parallelized runs only in the main
thread, whereas in MPI each node enters the `main()` method independently,
diverging based on its rank and synchronizing through message passing and
explicit barriers.

odeint's MPI support is implemented as an external backend, too.
Depending on the MPI implementation, the code might need to be compiled with a
wrapper such as [^mpic++].
[phase_chain_mpi_header]

Instead of reading another thread's data, we asynchronously send and receive the
relevant data from neighbouring nodes, performing some computation in the
interim to hide the latency.
[phase_chain_mpi_rhs]

Analogous to `openmp_state<T>` we use `mpi_state< InnerState<T> >`, which
automatically selects `mpi_nested_algebra` and the appropriate MPI-oblivious
inner algebra (since our inner state is a `vector`, the inner algebra will be
`range_algebra` as in the OpenMP example).
[phase_chain_state]

In the main program we construct a `communicator`, which tells us the `size` of
the cluster and the current node's `rank` within it.
We generate the input data on the master node only, avoiding unnecessary work on
the other nodes.
Instead of simply copying chunks, `split` acts as an MPI collective function
here and sends/receives regions from the master to each slave.
The input argument is ignored on the slaves, but the master node receives
a region in its output and will participate in the computation.
[phase_chain_mpi_init]

Now that `x_split` contains (only) the local chunk for each node, we start the
integration.

To print the result on the master node, we send the processed data back using
`unsplit`.
[phase_chain_mpi_integrate]

[note `mpi_nested_algebra::for_each`[~N] doesn't use any MPI constructs; it
simply calls the inner algebra on the local chunk. The system function is not
guarded by any barriers either, so if you don't place any manually (for example
in parameter studies where the elements are completely independent) you might
see the nodes diverge, returning from this call at different times.]

See [github_link examples/mpi/phase_chain.cpp
mpi/phase_chain.cpp] for the complete example.

[endsect]

[section Concepts]

[section MPI State]
As used by `mpi_nested_algebra`.
[heading Notation]
[variablelist
[[`InnerState`] [The inner state type]]
[[`State`] [The MPI-state type]]
[[`state`] [Object of type `State`]]
[[`world`] [Object of type `boost::mpi::communicator`]]
]
[heading Valid Expressions]
[table
[[Name] [Expression] [Type] [Semantics]]
[[Construct a state with a communicator]
[`State(world)`] [`State`] [Constructs the State.]]
[[Construct a state with the default communicator]
[`State()`] [`State`] [Constructs the State.]]
[[Get the current node's inner state]
[`state()`] [`InnerState`] [Returns a (const) reference.]]
[[Get the communicator]
[`state.world`] [`boost::mpi::communicator`] [See __boost_mpi.]]
]
[heading Models]
* `mpi_state<InnerState>`

[endsect]

[section OpenMP Split State]
As used by `openmp_nested_algebra`, essentially a Random Access Container with
`ValueType = InnerState`.
[heading Notation]
[variablelist
[[`InnerState`] [The inner state type]]
[[`State`] [The split state type]]
[[`state`] [Object of type `State`]]
]
[heading Valid Expressions]
[table
[[Name] [Expression] [Type] [Semantics]]
[[Construct a state for `n` chunks]
[`State(n)`] [`State`] [Constructs underlying `vector`.]]
[[Get a chunk]
[`state[i]`] [`InnerState`] [Accesses underlying `vector`.]]
[[Get the number of chunks]
[`state.size()`] [`size_type`] [Returns size of underlying `vector`.]]
]
[heading Models]
* `openmp_state<ValueType>` with `InnerState = vector<ValueType>`

[endsect]

[section Splitter]
[heading Notation]
[variablelist
[[`Container1`] [The continuous-data container type]]
[[`x`] [Object of type `Container1`]]
[[`Container2`] [The chunked-data container type]]
[[`y`] [Object of type `Container2`]]
]
[heading Valid Expressions]
[table
[[Name] [Expression] [Type] [Semantics]]
[[Copy chunks of input to output elements]
[`split(x, y)`] [`void`]
[Calls `split_impl<Container1, Container2>::split(x, y)`, splits `x` into
`y.size()` chunks.]]
[[Join chunks of input elements to output]
[`unsplit(y, x)`] [`void`]
[Calls `unsplit_impl<Container2, Container1>::unsplit(y, x)`, assumes `x`
is of the correct size ['__sigma `y[i].size()`], does not resize `x`.]]
]
[heading Models]
* defined for `Container1` = __boost_range and `Container2 = openmp_state`
* and `Container2 = mpi_state`.

To implement splitters for containers incompatible with __boost_range,
specialize the `split_impl` and `unsplit_impl` types:
```
template< class Container1 , class Container2 , class Enabler = void >
struct split_impl {
    static void split( const Container1 &from , Container2 &to );
};

template< class Container2 , class Container1 , class Enabler = void >
struct unsplit_impl {
    static void unsplit( const Container2 &from , Container1 &to );
};
```
[endsect]

[endsect]

[endsect]