[template perf[name value] [value]]
[template para[text] '''<para>'''[text]'''</para>''']

[mathpart perf Performance]

[section:perf_over2 Performance Overview]
[performance_overview]
[endsect]

[section:interp Interpreting these Results]

In all of the following tables, the best performing
result in each row is assigned a relative value of "1" and shown
in bold, so a score of "2" means ['"twice as slow as the best
performing result".]  Actual timings in nanoseconds per function call
are also shown in parentheses.  To make the results easier to read, they
are color-coded as follows: the best result and everything within 20% of
it is green, anything that's more than twice as slow as the best result is red,
and results in between are blue.
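
As a rough sketch of the scoring scheme just described (this is not the code used by the performance
test suite), the following computes relative scores and color classes from raw timings.  The names
`relative_scores` and `color_for` are hypothetical, and the treatment of results that fall exactly on
a boundary is an assumption:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Divide every timing in a row by the best (smallest) timing,
// so the best result scores 1 and a score of 2 means "twice as slow".
// Assumes the row is non-empty and all timings are positive.
std::vector<double> relative_scores(const std::vector<double>& ns_per_call)
{
   const double best = *std::min_element(ns_per_call.begin(), ns_per_call.end());
   std::vector<double> scores;
   scores.reserve(ns_per_call.size());
   for (double t : ns_per_call)
      scores.push_back(t / best);
   return scores;
}

// Map a relative score to the color used in the tables.
// Assumption: exactly 1.2 counts as green and exactly 2 counts as blue.
std::string color_for(double score)
{
   if (score <= 1.2)
      return "green";
   if (score > 2.0)
      return "red";
   return "blue";
}
```

For timings of 10ns, 15ns and 25ns this gives scores of 1, 1.5 and 2.5, which would be shown in green, blue and red respectively.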

Results were obtained on a system
with an Intel Core i7-4710MQ with 16GB RAM and running
either Windows 8.1 or Xubuntu Linux.

[caution As usual with performance results these should be taken with a large pinch
of salt: relative performance is known to shift quite a bit depending
upon the architecture of the particular test system used.  Furthermore,
our performance results were obtained using our own test data:
these test values are designed to provide good coverage of our code and test
all the appropriate corner cases.  They do not necessarily represent
"typical" usage: whatever that may be!
]

[endsect]

[section:getting_best Getting the Best Performance from this Library: Compiler and Compiler Options]

By far the most important thing you can do when using this library
is turn on your compiler's optimisation options.  As the following
table shows, the penalty for using the library in debug mode can be
quite large.  In addition, switching to 64-bit code gives a small but noticeable
improvement in performance, as does switching to a different compiler
(Intel C++ 15 in this example).

[table_Compiler_Option_Comparison_on_Windows_x64]

[endsect] [/section:getting_best Getting the Best Performance from this Library: Compiler and Compiler Options]

[section:tradoffs Trading Accuracy for Performance]

There are a number of [link policy Policies] that can be used to trade accuracy for performance:

* Internal promotion: by default functions with `float` arguments are evaluated at `double` precision
internally to ensure full precision in the result.  Similarly `double` precision functions are
evaluated at `long double` precision internally by default.  Changing these defaults can have a significant
speed advantage at the expense of accuracy.  Note also that evaluating using `float` internally may result in
numerical instability for some of the more complex algorithms, so we suggest you use this option with care.
* Target accuracy: just because you choose to evaluate at `double` precision doesn't mean you necessarily want
to target full 16-digit accuracy.  If you wish, you can change the default (full machine precision) to whatever
is "good enough" for your particular use case.
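
To illustrate why internal promotion matters, here is a minimal standalone sketch that does not use
this library at all: it sums a truncated Taylor series for the exponential function with a `float`
argument, using either `float` or `double` as the working type.  The function names are hypothetical,
and the fixed count of 30 terms is an assumption that is only reasonable for small arguments:

```cpp
#include <cmath>

// Sum the first 30 terms of the Taylor series for exp(x), carrying out all
// intermediate arithmetic in the type Work, then round the result to float.
// Hypothetical helper for illustration only: intended for small |x|.
template <class Work>
float exp_series(float x)
{
   Work term = 1;
   Work sum = 1;
   for (int n = 1; n < 30; ++n)
   {
      term *= static_cast<Work>(x) / static_cast<Work>(n);
      sum += term;
   }
   return static_cast<float>(sum);
}

// Reference value: evaluate in double with the standard library, round once to float.
inline float reference_exp(float x)
{
   return static_cast<float>(std::exp(static_cast<double>(x)));
}
```

With `Work = double` the rounding errors of the intermediate steps are far smaller than one `float`
unit in the last place, so only the final conversion rounds the result.  With `Work = float` every
multiplication and addition rounds at `float` precision, and those errors can accumulate in the result.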

For example, suppose you want to evaluate `double` precision functions at `double` precision internally: you
can change the global default by passing `-DBOOST_MATH_PROMOTE_DOUBLE_POLICY=false` on the command line, or
at the point of call via something like this:

   double val = boost::math::erf(my_argument, boost::math::policies::make_policy(boost::math::policies::promote_double<false>()));

However, an easier option might be:

   #include <boost/math/special_functions.hpp> // Or any individual special function header

   namespace math{

   namespace precise{
   //
   // Define a Policy for accurate evaluation - this is the same as the default, unless
   // someone has changed the global defaults.
   //
   typedef boost::math::policies::policy<> accurate_policy;
   //
   // Invoke BOOST_MATH_DECLARE_SPECIAL_FUNCTIONS to declare
   // functions that use the above policy.  Note no trailing
   // ";" required on the macro call:
   //
   BOOST_MATH_DECLARE_SPECIAL_FUNCTIONS(accurate_policy)

   }

   namespace fast{
   //
   // Define a Policy for fast evaluation:
   //
   using namespace boost::math::policies;
   typedef policy<promote_double<false> > fast_policy;
   //
   // Invoke BOOST_MATH_DECLARE_SPECIAL_FUNCTIONS:
   //
   BOOST_MATH_DECLARE_SPECIAL_FUNCTIONS(fast_policy)

   }

   }

And now one can call:

   math::precise::tgamma(x);

For the "accurate" version of tgamma, and:

   math::fast::tgamma(x);

For the faster version.

Had we wished to change the target precision (to 9 decimal digits) as well as the evaluation type used, we might have done:

   namespace math{
   namespace fast{
   //
   // Define a Policy for fast evaluation:
   //
   using namespace boost::math::policies;
   typedef policy<promote_double<false>, digits10<9> > fast_policy;
   //
   // Invoke BOOST_MATH_DECLARE_SPECIAL_FUNCTIONS:
   //
   BOOST_MATH_DECLARE_SPECIAL_FUNCTIONS(fast_policy)

   }
   }

One can do a similar thing with the distribution classes:

   #include <boost/math/distributions.hpp> // or any individual distribution header

   namespace math{ namespace fast{
   //
   // Define a policy for fastest possible evaluation:
   //
   using namespace boost::math::policies;
   typedef policy<promote_float<false> > fast_float_policy;
   //
   // Invoke BOOST_MATH_DECLARE_DISTRIBUTIONS
   //
   BOOST_MATH_DECLARE_DISTRIBUTIONS(float, fast_float_policy)

   }} // namespaces

   //
   // And use:
   //
   float p_val = cdf(math::fast::normal(1.0f, 3.0f), 0.25f);

Here's how these options change the relative performance of the distributions on Linux:

[table_Distribution_performance_comparison_for_different_performance_options_with_GNU_C_version_5_1_0_on_linux]

[endsect] [/section:tradoffs Trading Accuracy for Performance]

[section:multiprecision Cost of High-Precision Non-built-in Floating-point]

Using a user-defined floating-point type such as __multiprecision has a very high run-time cost.

To give some flavour of this:

[table:linpack_time Linpack Benchmark
[[floating-point type]                 [speed Mflops]]
[[double]                              [2727]]
[[__float128]                          [35]]
[[multiprecision::float128]            [35]]
[[multiprecision::cpp_bin_float_quad]  [6]]
]

[endsect] [/section:multiprecision Cost of High-Precision Non-built-in Floating-point]


[section:tuning Performance Tuning Macros]

There are a small number of performance tuning options
that are determined by configuration macros.  These should be set
in boost/math/tools/user.hpp, or else reported to the Boost-development
mailing list so that the appropriate option for a given compiler and
OS platform can be set automatically in our configuration setup.

[table
[[Macro][Meaning]]
[[BOOST_MATH_POLY_METHOD]
   [Determines how polynomials and most rational functions
   are evaluated.  Define to one
   of the values 0, 1, 2 or 3: see below for the meaning of these values.]]
[[BOOST_MATH_RATIONAL_METHOD]
   [Determines how symmetrical rational functions are evaluated: mostly
   this only affects how the Lanczos approximation is evaluated, and how
   the `evaluate_rational` function behaves.  Define to one
   of the values 0, 1, 2 or 3: see below for the meaning of these values.
   ]]
[[BOOST_MATH_MAX_POLY_ORDER]
   [The maximum order of polynomial or rational function that will
   be evaluated by a method other than 0 (a simple "for" loop).
   ]]
[[BOOST_MATH_INT_TABLE_TYPE(RT, IT)]
   [Many of the coefficients to the polynomials and rational functions
   used by this library are integers.  Normally these are stored as tables
   of integers, but if mixed integer / floating point arithmetic is much
   slower than regular floating point arithmetic then they can be stored
   as tables of floating point values instead.  If mixed arithmetic is slow
   then add:

      #define BOOST_MATH_INT_TABLE_TYPE(RT, IT) RT

   to boost/math/tools/user.hpp, otherwise the default of:

      #define BOOST_MATH_INT_TABLE_TYPE(RT, IT) IT

   set in boost/math/config.hpp is fine, and may well result in smaller
   code.
   ]]
]
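
Putting the macros from the table together, a boost/math/tools/user.hpp fragment might look something
like the sketch below.  The particular values are illustrative assumptions only, not recommendations:
the right choice depends on the report the performance test suite produces for your compiler.

```cpp
// Sketch of a user.hpp fragment.  All values below are illustrative assumptions.

// Evaluate polynomials with unrolled second order Horner (method 2):
#define BOOST_MATH_POLY_METHOD 2

// Evaluate symmetrical rational functions with unrolled Horner (method 1):
#define BOOST_MATH_RATIONAL_METHOD 1

// Fall back to the simple loop (method 0) above this order:
#define BOOST_MATH_MAX_POLY_ORDER 20

// Store integer coefficients as floating point values (RT) rather than integers (IT),
// for platforms where mixed integer / floating point arithmetic is slow:
#define BOOST_MATH_INT_TABLE_TYPE(RT, IT) RT
```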

The values to which `BOOST_MATH_POLY_METHOD` and `BOOST_MATH_RATIONAL_METHOD`
may be set are as follows:

[table
[[Value][Effect]]
[[0][The polynomial or rational function is evaluated using Horner's
      method, and a simple for-loop.

      Note that if the order of the polynomial
      or rational function is a runtime parameter, or the order is
      greater than the value of `BOOST_MATH_MAX_POLY_ORDER`, then
      this method is always used, irrespective of the value
      of `BOOST_MATH_POLY_METHOD` or `BOOST_MATH_RATIONAL_METHOD`.]]
[[1][The polynomial or rational function is evaluated without
      the use of a loop, and using Horner's method.  This only occurs
      if the order of the polynomial is known at compile time and is less
      than or equal to `BOOST_MATH_MAX_POLY_ORDER`. ]]
[[2][The polynomial or rational function is evaluated without
      the use of a loop, and using a second order Horner's method.
      In theory this permits two operations to occur in parallel
      for polynomials, and four in parallel for rational functions.
      This only occurs
      if the order of the polynomial is known at compile time and is less
      than or equal to `BOOST_MATH_MAX_POLY_ORDER`.]]
[[3][The polynomial or rational function is evaluated without
      the use of a loop, and using a second order Horner's method.
      In theory this permits two operations to occur in parallel
      for polynomials, and four in parallel for rational functions.
      This differs from method "2" in that the code is carefully ordered
      to make the parallelisation more obvious to the compiler: rather than
      relying on the compiler's optimiser to spot the parallelisation
      opportunities.
      This only occurs
      if the order of the polynomial is known at compile time and is less
      than or equal to `BOOST_MATH_MAX_POLY_ORDER`.]]
]
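
The difference between these evaluation schemes can be sketched in standalone code.  The functions
below are hypothetical illustrations written for a cubic, not the library's own implementation:
`horner_loop` corresponds to the idea behind method 0, `horner_unrolled` to method 1, and
`horner_second_order` to the second order scheme used by methods 2 and 3, where the even and odd
coefficients form two independent chains that could in principle be evaluated in parallel.

```cpp
#include <array>
#include <cstddef>

// Coefficients are stored in ascending order: c[0] + c[1]*x + c[2]*x^2 + ...

// Horner's method in a simple loop.  Requires N >= 1.
template <std::size_t N>
double horner_loop(const std::array<double, N>& c, double x)
{
   double result = c[N - 1];
   for (std::size_t i = N - 1; i-- > 0;)
      result = result * x + c[i];
   return result;
}

// Horner's method with the loop unrolled by hand for a cubic.
inline double horner_unrolled(const std::array<double, 4>& c, double x)
{
   return ((c[3] * x + c[2]) * x + c[1]) * x + c[0];
}

// Second order Horner for a cubic: split into even and odd powers,
// each a polynomial in x*x, then combine as even + x * odd.
inline double horner_second_order(const std::array<double, 4>& c, double x)
{
   const double x2 = x * x;
   const double even = c[2] * x2 + c[0]; // independent of the next line
   const double odd = c[3] * x2 + c[1];
   return even + x * odd;
}
```

All three return the same value in exact arithmetic; in floating point the results may differ in the
last bits because the operations are performed in a different order.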

The performance test suite generates a report for your particular compiler showing which method is likely to work best;
the following tables show the results for MSVC-14.0 and GCC-5.1.0 (Linux).  There's not much to choose between
the various methods, but generally loop-unrolled methods perform better.  Interestingly, ordering the code
to try and "second guess" possible optimizations seems not to be such a good idea (method 3 below).

[table_Polynomial_Method_Comparison_with_Microsoft_Visual_C_version_14_0_on_Windows_x64]

[table_Rational_Method_Comparison_with_Microsoft_Visual_C_version_14_0_on_Windows_x64]

[table_Polynomial_Method_Comparison_with_GNU_C_version_5_1_0_on_linux]

[table_Rational_Method_Comparison_with_GNU_C_version_5_1_0_on_linux]

[endsect] [/section:tuning Performance Tuning Macros]

[section:comp_compilers Comparing Different Compilers]

By running our performance test suite multiple times, we can compare the effect of different compilers: as
might be expected, the differences are generally small compared to, say, disabling internal use of `long double`.
However, there are still gains to be made, particularly from some of the commercial offerings:

[table_Compiler_Comparison_on_Windows_x64]

[table_Compiler_Comparison_on_linux]

[endsect] [/section:comp_compilers Comparing Different Compilers]

[section:comparisons Comparisons to Other Open Source Libraries]

We've run our performance tests both for our own code, and against other
open source implementations of the same functions.  The results are
presented below to give you a rough idea of how they all compare.
In order to give a more-or-less level playing field our test data
was screened against all the libraries being tested, and any
unsupported domains removed, likewise for any test cases that gave large errors
or unexpected non-finite values.

[caution
You should exercise extreme caution when interpreting
these results: relative performance may vary by platform, and the tests use
data that gives good code coverage of /our/ code, but which may skew the
results towards the corner cases.  Finally, remember that different
libraries make different choices with regard to performance versus
numerical stability.
]

The first results compare standard library functions to Boost equivalents with MSVC-14.0:

[table_Library_Comparison_with_Microsoft_Visual_C_version_14_0_on_Windows_x64]

On Linux with GCC, we can also compare to the TR1 functions, and to GSL and RMath:

[table_Library_Comparison_with_GNU_C_version_5_1_0_on_linux]

And finally we can compare the statistical distributions to GSL, RMath and DCDFLIB:

[table_Distribution_performance_comparison_with_GNU_C_version_5_1_0_on_linux]

[endsect] [/section:comparisons Comparisons to Other Open Source Libraries]

[section:perf_test_app The Performance Test Applications]

Under ['boost-path]\/libs\/math\/reporting\/performance you will find
some reasonably comprehensive performance test applications for this library.

In order to generate the tables you will have seen in this documentation (or others
for your specific compiler) you need to invoke `bjam` in this directory, using a C++11
capable compiler.  Note that
results extend/overwrite whatever is already present in
['boost-path]\/libs\/math\/reporting\/performance\/doc\/performance_tables.qbk,
so you may want to delete this file before you begin so as to make a fresh start for
your particular system.

The programs produce results in Boost's Quickbook format which is not terribly
human readable.  If you configure your user-config.jam to be able to build Docbook
documentation, then you will also get a full summary of all the data in HTML format
in ['boost-path]\/libs\/math\/reporting\/performance\/html\/index.html.  Assuming
you're on a Unix-like platform the procedure to do this is to first install the
`xsltproc`, `Docbook DTD`, and `Docbook XSL` packages.  Then:

* Copy ['boost-path]\/tools\/build\/example\/user-config.jam to your home directory.
* Add `using xsltproc ;` to the end of the file (note the spaces surrounding each token, including the final ";", this is important!)
This assumes that `xsltproc` is in your path.
* Add `using boostbook : path-to-xsl-stylesheets : path-to-dtd ;` to the end of the file.  The `path-to-dtd` should point
to version 4.2.x of the Docbook DTD, while `path-to-xsl-stylesheets` should point to the folder containing the latest XSLT stylesheets.
Both paths should use all forward slashes even on Windows.

At this point you should be able to run the tests and generate the HTML summary.  If GSL, RMath or libstdc++ are
present in the compiler's path they will be automatically tested.  For DCDFLIB you will need to place the C
source in ['boost-path]\/libs\/math\/reporting\/performance\/third_party\/dcdflib.

If you want to compare multiple compilers, or multiple options for one compiler, then you will
need to invoke `bjam` multiple times, once for each compiler.  Note that in order to test
multiple configurations of the same compiler, each has to be given a unique name in the test
program, otherwise they all edit the same table cells.  Suppose you want to test GCC with
and without the -ffast-math option; in this case bjam would be invoked first as:

   bjam toolset=gcc -a cxxflags=-std=gnu++11

This runs the tests using the default optimization options (-O3); we can then run again
using -ffast-math:

   bjam toolset=gcc -a cxxflags='-std=gnu++11 -ffast-math' define=COMPILER_NAME='"GCC with -ffast-math"'

In the command line above, the -a flag forces a full rebuild, and the preprocessor define COMPILER_NAME needs to be set
to a string literal describing the compiler configuration, hence the two sets of quotes: one for the command line, one for the
compiler.
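
To see why the inner quotes must survive the command line, consider this minimal sketch of how a
program might consume such a define.  This is not the test suite's actual code: the fallback text
and the function `configuration_label` are hypothetical.

```cpp
#include <string>

// If the build did not supply a name, fall back to a placeholder.
// (Hypothetical default, for illustration only.)
#ifndef COMPILER_NAME
#define COMPILER_NAME "unnamed configuration"
#endif

// The macro must expand to a string literal for this to compile.
inline std::string configuration_label()
{
   return std::string(COMPILER_NAME);
}
```

Had the inner double quotes been lost, `COMPILER_NAME` would expand to the bare tokens
`GCC with -ffast-math`, which is not a valid C++ expression, and the build would fail.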

[endsect] [/section:perf_test_app The Performance Test Applications]

[endmathpart]

[/
  Copyright 2006 John Maddock and Paul A. Bristow.
  Distributed under the Boost Software License, Version 1.0.
  (See accompanying file LICENSE_1_0.txt or copy at
  http://www.boost.org/LICENSE_1_0.txt).
]

 375
 376