]> git.proxmox.com Git - rustc.git/blob - src/doc/rust-by-example/src/std_misc/threads/testcase_mapreduce.md
New upstream version 1.25.0+dfsg1
[rustc.git] / src / doc / rust-by-example / src / std_misc / threads / testcase_mapreduce.md
1 # Testcase: map-reduce
2
3 Rust makes it very easy to parallelise data processing, without many of the headaches traditionally associated with such an attempt.
4
5 The standard library provides great threading primitives out of the box.
6 These, combined with Rust's concept of Ownership and aliasing rules, automatically prevent
7 data races.
8
9 The aliasing rules (one writable reference XOR many readable references) automatically prevent
10 you from manipulating state that is visible to other threads. (Where synchronisation is needed,
11 there are synchronisation
12 primitives like `Mutex`es or `Channel`s.)
13
14 In this example, we will calculate the sum of all digits in a block of numbers.
15 We will do this by parcelling out chunks of the block into different threads. Each thread will sum
16 its tiny block of digits, and subsequently we will sum the intermediate sums produced by each
17 thread.
18
19 Note that, although we're passing references across thread boundaries, Rust understands that we're
20 only passing read-only references, and that thus no unsafety or data races can occur. Because
21 we're `move`-ing the data segments into the thread, Rust will also ensure the data is kept alive
22 until the threads exit, so no dangling pointers occur.
23
24 ```rust,editable
25 use std::thread;
26
27 // This is the `main` thread
28 fn main() {
29
30 // This is our data to process.
31 // We will calculate the sum of all digits via a threaded map-reduce algorithm.
32 // Each whitespace separated chunk will be handled in a different thread.
33 //
34 // TODO: see what happens to the output if you insert spaces!
35 let data = "86967897737416471853297327050364959
36 11861322575564723963297542624962850
37 70856234701860851907960690014725639
38 38397966707106094172783238747669219
39 52380795257888236525459303330302837
40 58495327135744041048897885734297812
41 69920216438980873548808413720956532
42 16278424637452589860345374828574668";
43
44 // Make a vector to hold the child-threads which we will spawn.
45 let mut children = vec![];
46
47 /*************************************************************************
48 * "Map" phase
49 *
50 * Divide our data into segments, and apply initial processing
51 ************************************************************************/
52
53 // split our data into segments for individual calculation
54 // each chunk will be a reference (&str) into the actual data
55 let chunked_data = data.split_whitespace();
56
57 // Iterate over the data segments.
58 // .enumerate() adds the current loop index to whatever is iterated
59 // the resulting tuple "(index, element)" is then immediately
60 // "destructured" into two variables, "i" and "data_segment" with a
61 // "destructuring assignment"
62 for (i, data_segment) in chunked_data.enumerate() {
63 println!("data segment {} is \"{}\"", i, data_segment);
64
65 // Process each data segment in a separate thread
66 //
67 // spawn() returns a handle to the new thread,
68 // which we MUST keep to access the returned value
69 //
70 // 'move || -> u32' is syntax for a closure that:
71 // * takes no arguments ('||')
72 // * takes ownership of its captured variables ('move') and
73 // * returns an unsigned 32-bit integer ('-> u32')
74 //
75 // Rust is smart enough to infer the '-> u32' from
76 // the closure itself so we could have left that out.
77 //
78 // TODO: try removing the 'move' and see what happens
79 children.push(thread::spawn(move || -> u32 {
80 // Calculate the intermediate sum of this segment:
81 let result = data_segment
82 // iterate over the characters of our segment..
83 .chars()
84 // .. convert text-characters to their number value..
85 .map(|c| c.to_digit(10).expect("should be a digit"))
86 // .. and sum the resulting iterator of numbers
87 .sum();
88
89 // println! locks stdout, so no text-interleaving occurs
90 println!("processed segment {}, result={}", i, result);
91
92 // "return" not needed, because Rust is an "expression language", the
93 // last evaluated expression in each block is automatically its value.
94 result
95
96 }));
97 }
98
99
100 /*************************************************************************
101 * "Reduce" phase
102 *
103 * Collect our intermediate results, and combine them into a final result
104 ************************************************************************/
105
106 // collect each thread's intermediate results into a new Vec
107 let mut intermediate_sums = vec![];
108 for child in children {
109 // collect each child thread's return-value
110 let intermediate_sum = child.join().unwrap();
111 intermediate_sums.push(intermediate_sum);
112 }
113
114 // combine all intermediate sums into a single final sum.
115 //
116 // we use the "turbofish" ::<> to provide sum() with a type hint.
117 //
118 // TODO: try without the turbofish, by instead explicitly
119 // specifying the type of intermediate_sums
120 let final_result = intermediate_sums.iter().sum::<u32>();
121
122 println!("Final sum result: {}", final_result);
123 }
124
125
126 ```
127
128 ### Assignments
129 It is not wise to let our number of threads depend on user inputted data.
130 What if the user decides to insert a lot of spaces? Do we _really_ want to spawn 2,000 threads?
131 Modify the program so that the data is always chunked into a limited number of chunks,
132 defined by a static constant at the beginning of the program.
133
134 ### See also:
135 * [Threads][thread]
136 * [vectors][vectors] and [iterators][iterators]
137 * [closures][closures], [move][move] semantics and [`move` closures][move_closure]
138 * [destructuring][destructuring] assignments
139 * [turbofish notation][turbofish] to help type inference
140 * [unwrap vs. expect][unwrap]
141 * [enumerate][enumerate]
142
143 [thread]: std_misc/threads.html
144 [vectors]: std/vec.html
145 [iterators]: trait/iter.html
146 [destructuring]: https://doc.rust-lang.org/book/second-edition/ch18-03-pattern-syntax.html#destructuring-to-break-apart-values
147 [closures]: fn/closures.html
148 [move]: scope/move.html
149 [move_closure]: https://doc.rust-lang.org/book/second-edition/ch13-01-closures.html#closures-can-capture-their-environment
150 [turbofish]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.collect
151 [unwrap]: error/option_unwrap.html
152 [enumerate]: https://doc.rust-lang.org/book/loops.html#enumerate