ceph/src/boost/libs/math/doc/distributions/binomial_example.qbk

   1 [section:binom_eg Binomial Distribution Examples]
   2
   3 See also the reference documentation for the __binomial_distrib.
   4
   5 [section:binomial_coinflip_example Binomial Coin-Flipping Example]
   6
   7 [import ../../example/binomial_coinflip_example.cpp]
   8 [binomial_coinflip_example1]
   9
  10 See [@../../example/binomial_coinflip_example.cpp binomial_coinflip_example.cpp]
  11 for full source code, the program output looks like this:
  12
  13 [binomial_coinflip_example_output]
  14
  15 [endsect] [/section:binomial_coinflip_example Binomial coinflip example]
  16
  17 [section:binomial_quiz_example Binomial Quiz Example]
  18
  19 [import ../../example/binomial_quiz_example.cpp]
  20 [binomial_quiz_example1]
  21 [binomial_quiz_example2]
  22 [discrete_quantile_real]
  23
  24 See [@../../example/binomial_quiz_example.cpp binomial_quiz_example.cpp]
  25 for full source code and output.
  26
  27 [endsect] [/section:binomial_coinflip_quiz Binomial Coin-Flipping example]
  28
  29 [section:binom_conf Calculating Confidence Limits on the Frequency of Occurrence for a Binomial Distribution]
  30
  31 Imagine you have a process that follows a binomial distribution: for each
  32 trial conducted, an event either occurs or does it does not, referred
  33 to as "successes" and "failures".  If, by experiment, you want to measure the
  34 frequency with which successes occur, the best estimate is given simply
  35 by /k/ \/ /N/, for /k/ successes out of /N/ trials.  However our confidence in that
  36 estimate will be shaped by how many trials were conducted, and how many successes
  37 were observed.  The static member functions
  38 `binomial_distribution<>::find_lower_bound_on_p` and
  39 `binomial_distribution<>::find_upper_bound_on_p` allow you to calculate
  40 the confidence intervals for your estimate of the occurrence frequency.
  41
  42 The sample program [@../../example/binomial_confidence_limits.cpp
  43 binomial_confidence_limits.cpp] illustrates their use.  It begins by defining
  44 a procedure that will print a table of confidence limits for various degrees
  45 of certainty:
  46
  47    #include <iostream>
  48    #include <iomanip>
  49    #include <boost/math/distributions/binomial.hpp>
  50
  51    void confidence_limits_on_frequency(unsigned trials, unsigned successes)
  52    {
  53       //
  54       // trials = Total number of trials.
  55       // successes = Total number of observed successes.
  56       //
  57       // Calculate confidence limits for an observed
  58       // frequency of occurrence that follows a binomial
  59       // distribution.
  60       //
  61       using namespace std;
  62       using namespace boost::math;
  63
  64       // Print out general info:
  65       cout <<
  66          "___________________________________________\n"
  67          "2-Sided Confidence Limits For Success Ratio\n"
  68          "___________________________________________\n\n";
  69       cout << setprecision(7);
  70       cout << setw(40) << left << "Number of Observations" << "=  " << trials << "\n";
  71       cout << setw(40) << left << "Number of successes" << "=  " << successes << "\n";
  72       cout << setw(40) << left << "Sample frequency of occurrence" << "=  " << double(successes) / trials << "\n";
  73
  74 The procedure now defines a table of significance levels: these are the
  75 probabilities that the true occurrence frequency lies outside the calculated
  76 interval:
  77
  78       double alpha[] = { 0.5, 0.25, 0.1, 0.05, 0.01, 0.001, 0.0001, 0.00001 };
  79
  80 Some pretty printing of the table header follows:
  81
  82    cout << "\n\n"
  83            "_______________________________________________________________________\n"
  84            "Confidence        Lower CP       Upper CP       Lower JP       Upper JP\n"
  85            " Value (%)        Limit          Limit          Limit          Limit\n"
  86            "_______________________________________________________________________\n";
  87
  88
  89 And now for the important part - the intervals themselves - for each
  90 value of /alpha/, we call `find_lower_bound_on_p` and
  91 `find_lower_upper_on_p` to obtain lower and upper bounds
  92 respectively.  Note that since we are calculating a two-sided interval,
  93 we must divide the value of alpha in two.
  94
  95 Please note that calculating two separate /single sided bounds/, each with risk
  96 level [alpha][space]is not the same thing as calculating a two sided interval.
  97 Had we calculate two single-sided intervals each with a risk
  98 that the true value is outside the interval of [alpha], then:
  99
 100 * The risk that it is less than the lower bound is [alpha].
 101
 102 and
 103
 104 * The risk that it is greater than the upper bound is also [alpha].
 105
 106 So the risk it is outside *upper or lower bound*, is *twice* alpha, and the
 107 probability that it is inside the bounds is therefore not nearly as high as
 108 one might have thought.  This is why [alpha]/2 must be used in
 109 the calculations below.
 110
 111 In contrast, had we been calculating a
 112 single-sided interval, for example: ['"Calculate a lower bound so that we are P%
 113 sure that the true occurrence frequency is greater than some value"]
 114 then we would *not* have divided by two.
 115
 116 Finally note that `binomial_distribution` provides a choice of two
 117 methods for the calculation, we print out the results from both
 118 methods in this example:
 119
 120       for(unsigned i = 0; i < sizeof(alpha)/sizeof(alpha[0]); ++i)
 121       {
 122          // Confidence value:
 123          cout << fixed << setprecision(3) << setw(10) << right << 100 * (1-alpha[i]);
 124          // Calculate Clopper Pearson bounds:
 125          double l = binomial_distribution<>::find_lower_bound_on_p(
 126                         trials, successes, alpha[i]/2);
 127          double u = binomial_distribution<>::find_upper_bound_on_p(
 128                         trials, successes, alpha[i]/2);
 129          // Print Clopper Pearson Limits:
 130          cout << fixed << setprecision(5) << setw(15) << right << l;
 131          cout << fixed << setprecision(5) << setw(15) << right << u;
 132          // Calculate Jeffreys Prior Bounds:
 133          l = binomial_distribution<>::find_lower_bound_on_p(
 134                trials, successes, alpha[i]/2,
 135                binomial_distribution<>::jeffreys_prior_interval);
 136          u = binomial_distribution<>::find_upper_bound_on_p(
 137                trials, successes, alpha[i]/2,
 138                binomial_distribution<>::jeffreys_prior_interval);
 139          // Print Jeffreys Prior Limits:
 140          cout << fixed << setprecision(5) << setw(15) << right << l;
 141          cout << fixed << setprecision(5) << setw(15) << right << u << std::endl;
 142       }
 143       cout << endl;
 144    }
 145
 146 And that's all there is to it.  Let's see some sample output for a 2 in 10
 147 success ratio, first for 20 trials:
 148
 149 [pre'''___________________________________________
 150 2-Sided Confidence Limits For Success Ratio
 151 ___________________________________________
 152
 153 Number of Observations                  =  20
 154 Number of successes                     =  4
 155 Sample frequency of occurrence          =  0.2
 156
 157
 158 _______________________________________________________________________
 159 Confidence        Lower CP       Upper CP       Lower JP       Upper JP
 160  Value (%)        Limit          Limit          Limit          Limit
 161 _______________________________________________________________________
 162     50.000        0.12840        0.29588        0.14974        0.26916
 163     75.000        0.09775        0.34633        0.11653        0.31861
 164     90.000        0.07135        0.40103        0.08734        0.37274
 165     95.000        0.05733        0.43661        0.07152        0.40823
 166     99.000        0.03576        0.50661        0.04655        0.47859
 167     99.900        0.01905        0.58632        0.02634        0.55960
 168     99.990        0.01042        0.64997        0.01530        0.62495
 169     99.999        0.00577        0.70216        0.00901        0.67897
 170 ''']
 171
 172 As you can see, even at the 95% confidence level the bounds are
 173 really quite wide (this example is chosen to be easily compared to the one
 174 in the __handbook
 175 [@http://www.itl.nist.gov/div898/handbook/prc/section2/prc241.htm
 176 here]).  Note also that the Clopper-Pearson calculation method (CP above)
 177 produces quite noticeably more pessimistic estimates than the Jeffreys Prior
 178 method (JP above).
 179
 180
 181 Compare that with the program output for
 182 2000 trials:
 183
 184 [pre'''___________________________________________
 185 2-Sided Confidence Limits For Success Ratio
 186 ___________________________________________
 187
 188 Number of Observations                  =  2000
 189 Number of successes                     =  400
 190 Sample frequency of occurrence          =  0.2000000
 191
 192
 193 _______________________________________________________________________
 194 Confidence        Lower CP       Upper CP       Lower JP       Upper JP
 195  Value (%)        Limit          Limit          Limit          Limit
 196 _______________________________________________________________________
 197     50.000        0.19382        0.20638        0.19406        0.20613
 198     75.000        0.18965        0.21072        0.18990        0.21047
 199     90.000        0.18537        0.21528        0.18561        0.21503
 200     95.000        0.18267        0.21821        0.18291        0.21796
 201     99.000        0.17745        0.22400        0.17769        0.22374
 202     99.900        0.17150        0.23079        0.17173        0.23053
 203     99.990        0.16658        0.23657        0.16681        0.23631
 204     99.999        0.16233        0.24169        0.16256        0.24143
 205 ''']
 206
 207 Now even when the confidence level is very high, the limits are really
 208 quite close to the experimentally calculated value of 0.2.  Furthermore
 209 the difference between the two calculation methods is now really quite small.
 210
 211 [endsect]
 212
 213 [section:binom_size_eg Estimating Sample Sizes for a Binomial Distribution.]
 214
 215 Imagine you have a critical component that you know will fail in 1 in
 216 N "uses" (for some suitable definition of "use").  You may want to schedule
 217 routine replacement of the component so that its chance of failure between
 218 routine replacements is less than P%.  If the failures follow a binomial
 219 distribution (each time the component is "used" it either fails or does not)
 220 then the static member function `binomial_distibution<>::find_maximum_number_of_trials`
 221 can be used to estimate the maximum number of "uses" of that component for some
 222 acceptable risk level /alpha/.
 223
 224 The example program
 225 [@../../example/binomial_sample_sizes.cpp binomial_sample_sizes.cpp]
 226 demonstrates its usage.  It centres on a routine that prints out
 227 a table of maximum sample sizes for various probability thresholds:
 228
 229    void find_max_sample_size(
 230       double p,              // success ratio.
 231       unsigned successes)    // Total number of observed successes permitted.
 232    {
 233
 234 The routine then declares a table of probability thresholds: these are the
 235 maximum acceptable probability that /successes/ or fewer events will be
 236 observed.  In our example, /successes/ will be always zero, since we want
 237 no component failures, but in other situations non-zero values may well
 238 make sense.
 239
 240    double alpha[] = { 0.5, 0.25, 0.1, 0.05, 0.01, 0.001, 0.0001, 0.00001 };
 241
 242 Much of the rest of the program is pretty-printing, the important part
 243 is in the calculation of maximum number of permitted trials for each
 244 value of alpha:
 245
 246    for(unsigned i = 0; i < sizeof(alpha)/sizeof(alpha[0]); ++i)
 247    {
 248       // Confidence value:
 249       cout << fixed << setprecision(3) << setw(10) << right << 100 * (1-alpha[i]);
 250       // calculate trials:
 251       double t = binomial::find_maximum_number_of_trials(
 252                      successes, p, alpha[i]);
 253       t = floor(t);
 254       // Print Trials:
 255       cout << fixed << setprecision(5) << setw(15) << right << t << endl;
 256    }
 257
 258 Note that since we're
 259 calculating the maximum number of trials permitted, we'll err on the safe
 260 side and take the floor of the result.  Had we been calculating the
 261 /minimum/ number of trials required to observe a certain number of /successes/
 262 using `find_minimum_number_of_trials` we would have taken the ceiling instead.
 263
 264 We'll finish off by looking at some sample output, firstly for
 265 a 1 in 1000 chance of component failure with each use:
 266
 267 [pre
 268 '''________________________
 269 Maximum Number of Trials
 270 ________________________
 271
 272 Success ratio                           =  0.001
 273 Maximum Number of "successes" permitted =  0
 274
 275
 276 ____________________________
 277 Confidence        Max Number
 278  Value (%)        Of Trials
 279 ____________________________
 280     50.000            692
 281     75.000            287
 282     90.000            105
 283     95.000             51
 284     99.000             10
 285     99.900              0
 286     99.990              0
 287     99.999              0'''
 288 ]
 289
 290 So 51 "uses" of the component would yield a 95% chance that no
 291 component failures would be observed.
 292
 293 Compare that with a 1 in 1 million chance of component failure:
 294
 295 [pre'''
 296 ________________________
 297 Maximum Number of Trials
 298 ________________________
 299
 300 Success ratio                           =  0.0000010
 301 Maximum Number of "successes" permitted =  0
 302
 303
 304 ____________________________
 305 Confidence        Max Number
 306  Value (%)        Of Trials
 307 ____________________________
 308     50.000         693146
 309     75.000         287681
 310     90.000         105360
 311     95.000          51293
 312     99.000          10050
 313     99.900           1000
 314     99.990            100
 315     99.999             10'''
 316 ]
 317
 318 In this case, even 1000 uses of the component would still yield a
 319 less than 1 in 1000 chance of observing a component failure
 320 (i.e. a 99.9% chance of no failure).
 321
 322 [endsect] [/section:binom_size_eg Estimating Sample Sizes for a Binomial Distribution.]
 323
 324 [endsect][/section:binom_eg Binomial Distribution]
 325
 326 [/
 327   Copyright 2006 John Maddock and Paul A. Bristow.
 328   Distributed under the Boost Software License, Version 1.0.
 329   (See accompanying file LICENSE_1_0.txt or copy at
 330   http://www.boost.org/LICENSE_1_0.txt).
 331 ]
 332