src/doc/trpl/benchmark-tests.md

% Benchmark tests

Rust supports benchmark tests, which can test the performance of your
code. Let's make our `src/lib.rs` look like this (comments elided):

```rust,ignore
#![feature(test)]

extern crate test;

pub fn add_two(a: i32) -> i32 {
    a + 2
}

#[cfg(test)]
mod tests {
    use super::*;
    use test::Bencher;

    #[test]
    fn it_works() {
        assert_eq!(4, add_two(2));
    }

    #[bench]
    fn bench_add_two(b: &mut Bencher) {
        b.iter(|| add_two(2));
    }
}
```

Note the `test` feature gate, which enables this unstable feature.

We've imported the `test` crate, which contains our benchmarking support.
We have a new function as well, with the `bench` attribute. Unlike regular
tests, which take no arguments, benchmark tests take a `&mut Bencher`. This
`Bencher` provides an `iter` method, which takes a closure. This closure
contains the code we'd like to benchmark.

We can run benchmark tests with `cargo bench`:

```bash
$ cargo bench
   Compiling adder v0.0.1 (file:///home/steve/tmp/adder)
     Running target/release/adder-91b3e234d4ed382a

running 2 tests
test tests::it_works ... ignored
test tests::bench_add_two ... bench:         1 ns/iter (+/- 0)

test result: ok. 0 passed; 0 failed; 1 ignored; 1 measured
```

Our non-benchmark test was ignored. You may have noticed that `cargo bench`
takes a bit longer than `cargo test`. This is because Rust runs our benchmark
a number of times and then reports the median time per iteration. Because
we're doing so little work in this example, we get `1 ns/iter (+/- 0)`, but
the `+/-` figure would show the spread between runs if there were any.

Advice on writing benchmarks:

* Move setup code outside the `iter` loop; only put the part you want to measure inside
* Make the code do "the same thing" on each iteration; do not accumulate or change state
* Make the outer function idempotent too; the benchmark runner is likely to run
  it many times
* Make the inner `iter` loop short and fast so benchmark runs are fast and the
  calibrator can adjust the run-length at fine resolution
* Make the code in the `iter` loop do something simple, to assist in pinpointing
  performance improvements (or regressions)

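As a rough illustration of the first two points, the sketch below keeps setup
outside the measured closure and leaves state untouched between iterations. It
is written against stable Rust, so `time_iters` is a hypothetical stand-in for
`Bencher::iter` (the real runner picks the iteration count itself), and
`std::hint::black_box` is used in place of the `test` crate. `sum` and
`bench_sum` are likewise made-up names.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for `Bencher::iter`: calls `f` exactly `iters`
/// times and returns the total elapsed time.
pub fn time_iters<T, F: FnMut() -> T>(iters: u32, mut f: F) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        // Treat each result as used so the work is not optimized away.
        black_box(f());
    }
    start.elapsed()
}

/// The only work we want to measure.
pub fn sum(values: &[u64]) -> u64 {
    values.iter().sum()
}

pub fn bench_sum() -> Duration {
    // Setup: built once, outside the measured closure.
    let values: Vec<u64> = (0..1000).collect();

    // Measured: only reads `values`, so every iteration does the same work.
    time_iters(10_000, || sum(black_box(&values)))
}
```

With the real `Bencher`, the same shape would be to build `values` first and
then call `b.iter(|| sum(&values))`.
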
## Gotcha: optimizations

There's another tricky part to writing benchmarks: benchmarks compiled with
optimizations activated can be dramatically changed by the optimizer so that
the benchmark is no longer benchmarking what one expects. For example, the
compiler might recognize that some calculation has no external effects and
remove it entirely.

```rust,ignore
#![feature(test)]

extern crate test;
use test::Bencher;

#[bench]
fn bench_xor_1000_ints(b: &mut Bencher) {
    b.iter(|| {
        (0..1000).fold(0, |old, new| old ^ new);
    });
}
```

This gives the following results:

```text
running 1 test
test bench_xor_1000_ints ... bench:         0 ns/iter (+/- 0)

test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured
```

The benchmarking runner offers two ways to avoid this. The first is for the
closure that the `iter` method receives to return a value: any returned value
forces the optimizer to consider the result used, so it cannot remove the
computation entirely. For the example above, this means adjusting the
`b.iter` call to:

```rust
# struct X;
# impl X { fn iter<T, F>(&self, _: F) where F: FnMut() -> T {} } let b = X;
b.iter(|| {
    // note lack of `;` (could also use an explicit `return`).
    (0..1000).fold(0, |old, new| old ^ new)
});
```

The second option is to call the generic `test::black_box` function, which
is an opaque "black box" to the optimizer and so forces it to consider any
argument as used.

```rust
#![feature(test)]

extern crate test;

# fn main() {
# struct X;
# impl X { fn iter<T, F>(&self, _: F) where F: FnMut() -> T {} } let b = X;
b.iter(|| {
    let n = test::black_box(1000);

    (0..n).fold(0, |a, b| a ^ b)
})
# }
```

Neither of these reads or modifies the value, and both are very cheap for
small values. Larger values can be passed indirectly to reduce overhead (e.g.
`black_box(&huge_struct)`).

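To make the indirect form concrete, here is a minimal sketch. It assumes
`std::hint::black_box`, the counterpart available on stable Rust, rather than
`test::black_box`. `HugeStruct`, `checksum`, and `measured_body` are
hypothetical names chosen for illustration.

```rust
use std::hint::black_box;

/// Hypothetical large value: 4096 * 8 bytes = 32 KiB.
pub struct HugeStruct {
    pub data: [u64; 4096],
}

/// The work we would want to measure: XOR all elements together.
pub fn checksum(huge: &HugeStruct) -> u64 {
    huge.data.iter().fold(0, |acc, x| acc ^ x)
}

/// What the closure given to `iter` might contain.
pub fn measured_body(huge: &HugeStruct) -> u64 {
    // Only the reference goes through `black_box`, so the struct itself
    // is neither moved nor copied.
    checksum(black_box(huge))
}
```

The struct would be built once outside the `iter` closure, and only
`measured_body(&huge)` would run inside it.
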
Applying either change to `bench_xor_1000_ints` gives the following
benchmarking results:

```text
running 1 test
test bench_xor_1000_ints ... bench:       131 ns/iter (+/- 3)

test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured
```

However, the optimizer can still modify a test case in an undesirable manner
even when either of the above techniques is used.
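
One way this can plausibly happen: when every input is a compile-time
constant, the optimizer may precompute the answer, so marking the result as
used does not guarantee that the loop runs at benchmark time. A common
mitigation, sketched here with `std::hint::black_box` from stable Rust
standing in for `test::black_box`, is to hide the input as well as the
output. `xor_up_to` and `xor_body` are hypothetical names.

```rust
use std::hint::black_box;

/// XORs together all integers in `0..n`.
pub fn xor_up_to(n: u32) -> u32 {
    (0..n).fold(0, |old, new| old ^ new)
}

/// What the closure given to `iter` might contain.
pub fn xor_body() -> u32 {
    // Hide the input so the fold cannot be evaluated at compile time.
    let n = black_box(1000);

    // Mark the result as used so the fold cannot be dropped as dead code.
    black_box(xor_up_to(n))
}
```

`black_box` is documented as a best-effort hint, so it is still worth checking
that the reported timings look plausible for the work being done.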