[/ ceph/src/boost/libs/math/doc/distributions/binomial.qbk]

[section:binomial_dist Binomial Distribution]

``#include <boost/math/distributions/binomial.hpp>``

   namespace boost{ namespace math{

   template <class RealType = double,
             class ``__Policy``   = ``__policy_class`` >
   class binomial_distribution;

   typedef binomial_distribution<> binomial;

   template <class RealType, class ``__Policy``>
   class binomial_distribution
   {
   public:
      typedef RealType  value_type;
      typedef Policy    policy_type;

      static const ``['unspecified-type]`` clopper_pearson_exact_interval;
      static const ``['unspecified-type]`` jeffreys_prior_interval;

      // construct:
      binomial_distribution(RealType n, RealType p);

      // parameter access:
      RealType success_fraction() const;
      RealType trials() const;

      // Bounds on success fraction:
      static RealType find_lower_bound_on_p(
         RealType trials,
         RealType successes,
         RealType alpha,
         ``['unspecified-type]`` method = clopper_pearson_exact_interval);
      static RealType find_upper_bound_on_p(
         RealType trials,
         RealType successes,
         RealType alpha,
         ``['unspecified-type]`` method = clopper_pearson_exact_interval);

      // estimate min/max number of trials:
      static RealType find_minimum_number_of_trials(
         RealType k,     // number of events
         RealType p,     // success fraction
         RealType alpha); // risk level

      static RealType find_maximum_number_of_trials(
         RealType k,     // number of events
         RealType p,     // success fraction
         RealType alpha); // risk level
   };

   }} // namespaces

The class type `binomial_distribution` represents a
[@http://mathworld.wolfram.com/BinomialDistribution.html binomial distribution]:
it is used when there are exactly two mutually
exclusive outcomes of a trial. These outcomes are labelled
"success" and "failure". The
__binomial_distrib is used to obtain
the probability of observing k successes in N trials, with the
probability of success on a single trial denoted by p. The
binomial distribution assumes that p is fixed for all trials.

[note The random variable for the binomial distribution is the number of successes
(the number of trials is a fixed property of the distribution),
whereas for the negative binomial,
the random variable is the number of trials, for a fixed number of successes.]

The PDF for the binomial distribution is given by:

[equation binomial_ref2]
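The following stand-alone C++ sketch evaluates this formula directly, as a numerical illustration. It is not the library's implementation (see __ibeta_derivative under Implementation) and does not use Boost.Math. The name `binomial_pmf_sketch` is hypothetical. Forming the binomial coefficient in log space with `std::lgamma` is simply one convenient way to avoid overflow for large /n/.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of Boost.Math): probability of exactly k
// successes in n trials with success fraction p, C(n,k) p^k (1-p)^(n-k).
double binomial_pmf_sketch(unsigned n, unsigned k, double p)
{
    assert(p >= 0.0 && p <= 1.0);
    if (k > n)
        return 0.0;
    // End points handled explicitly so that log(0) never appears below.
    if (p == 0.0)
        return k == 0 ? 1.0 : 0.0;
    if (p == 1.0)
        return k == n ? 1.0 : 0.0;
    // log C(n,k) via lgamma avoids overflowing the coefficient for large n.
    double log_coeff = std::lgamma(n + 1.0) - std::lgamma(k + 1.0)
                     - std::lgamma(n - k + 1.0);
    return std::exp(log_coeff + k * std::log(p) + (n - k) * std::log1p(-p));
}
```

For example, `binomial_pmf_sketch(20, 10, 0.5)` approximates the exact value 184756/1048576 (about 0.1762). `pdf(binomial(20, 0.5), 10)` should also return that value, to within rounding.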

The following two graphs illustrate how the PDF changes depending
upon the distribution's parameters. First we'll keep the success
fraction /p/ fixed at 0.5, and vary the sample size:

[graph binomial_pdf_1]

Alternatively, we can keep the sample size fixed at N=20 and
vary the success fraction /p/:

[graph binomial_pdf_2]

[discrete_quantile_warning Binomial]

[h4 Member Functions]

[h5 Construct]

   binomial_distribution(RealType n, RealType p);

Constructor: /n/ is the total number of trials, /p/ is the
probability of success of a single trial.

Requires `0 <= p <= 1`, and `n >= 0`, otherwise calls __domain_error.

[h5 Accessors]

   RealType success_fraction() const;

Returns the parameter /p/ from which this distribution was constructed.

   RealType trials() const;

Returns the parameter /n/ from which this distribution was constructed.

[h5 Lower Bound on the Success Fraction]

   static RealType find_lower_bound_on_p(
      RealType trials,
      RealType successes,
      RealType alpha,
      ``['unspecified-type]`` method = clopper_pearson_exact_interval);

Returns a lower bound on the success fraction:

[variablelist
[[trials][The total number of trials conducted.]]
[[successes][The number of successes that occurred.]]
[[alpha][The largest acceptable probability that the true value of
         the success fraction is [*less than] the value returned.]]
[[method][An optional parameter that specifies the method to be used
         to compute the interval (see below).]]
]

For example, if you observe /k/ successes from /n/ trials, the
best estimate for the success fraction is simply ['k/n], but if you
want to be 95% sure that the true value is [*greater than] some value,
['p[sub min]], then:

   p``[sub min]`` = binomial_distribution<RealType>::find_lower_bound_on_p(
                       n, k, 0.05);

[link math_toolkit.stat_tut.weg.binom_eg.binom_conf See worked example.]
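To make the role of /alpha/ concrete, the stand-alone C++ sketch below computes a one-sided Clopper-Pearson lower bound by plain bisection. It searches for the /p/ at which observing /k/ or more successes has probability exactly /alpha/.

This is only an illustration, and it makes two assumptions:

* The PMF is summed directly, which is adequate only for moderate /n/.
* It does not follow the library's approach. Boost.Math uses the inverse incomplete beta function, as noted under Implementation.

All names in the sketch are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: P(X = i) for X ~ Binomial(n, p), 0 < p < 1.
static double pmf(unsigned n, unsigned i, double p)
{
    double log_coeff = std::lgamma(n + 1.0) - std::lgamma(i + 1.0)
                     - std::lgamma(n - i + 1.0);
    return std::exp(log_coeff + i * std::log(p) + (n - i) * std::log1p(-p));
}

// Hypothetical helper: P(X >= k), which increases monotonically with p.
static double tail_at_least(unsigned n, unsigned k, double p)
{
    double sum = 0.0;
    for (unsigned i = k; i <= n; ++i)
        sum += pmf(n, i, p);
    return sum;
}

// Sketch of a one-sided Clopper-Pearson lower bound: the p at which
// observing k or more successes has probability exactly alpha.
double clopper_pearson_lower_sketch(unsigned n, unsigned k, double alpha)
{
    assert(k <= n && alpha > 0.0 && alpha < 1.0);
    if (k == 0)
        return 0.0;                 // no successes: the bound is zero
    double lo = 0.0, hi = 1.0;
    for (int iter = 0; iter < 200; ++iter)
    {
        double mid = 0.5 * (lo + hi);
        if (tail_at_least(n, k, mid) < alpha)
            lo = mid;               // tail too small: the bound is higher
        else
            hi = mid;
    }
    return 0.5 * (lo + hi);
}
```

When k = n the tail probability is just p[super n], so the sketch should reproduce alpha[super 1/n]. For example, `clopper_pearson_lower_sketch(10, 10, 0.05)` is about 0.741.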

There are currently two possible values available for the /method/
optional parameter: /clopper_pearson_exact_interval/
or /jeffreys_prior_interval/.  These constants are both members of
class template `binomial_distribution`, so usage is for example:

   p = binomial_distribution<RealType>::find_lower_bound_on_p(
       n, k, 0.05, binomial_distribution<RealType>::jeffreys_prior_interval);

The default method, if this parameter is not specified, is the Clopper-Pearson
"exact" interval.  This produces an interval that guarantees at least
`100(1-alpha)%` coverage, but which is known to be overly conservative,
sometimes producing intervals with much greater than the requested coverage.

The alternative calculation method produces a non-informative
Jeffreys Prior interval.  It produces `100(1-alpha)%` coverage only
['in the average case], though it is typically very close to the requested
coverage level.  It is one of the main methods of calculation recommended
in the review by Brown, Cai and DasGupta.
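For reference, both methods have a standard closed form in terms of beta-distribution quantiles, as given in Brown, Cai and DasGupta. Write B(q; a, b) for the q-th quantile of a beta distribution with parameters a and b; this is the value __ibeta_inv(a, b, q) computes. The one-sided bounds at risk level /alpha/ for /k/ successes in /n/ trials are then as shown below. This is a summary of the usual textbook formulation; treat the library source as authoritative for its exact edge-case handling.

```latex
\begin{aligned}
\text{Clopper-Pearson:}\quad
  p_{\min} &= B(\alpha;\, k,\, n-k+1), &
  p_{\max} &= B(1-\alpha;\, k+1,\, n-k), \\
\text{Jeffreys:}\quad
  p_{\min} &= B\!\left(\alpha;\, k+\tfrac{1}{2},\, n-k+\tfrac{1}{2}\right), &
  p_{\max} &= B\!\left(1-\alpha;\, k+\tfrac{1}{2},\, n-k+\tfrac{1}{2}\right),
\end{aligned}
\qquad \text{with } p_{\min} = 0 \text{ if } k = 0,\;
p_{\max} = 1 \text{ if } k = n.
```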

Please note that the "textbook" calculation method using
a normal approximation (the Wald interval) is deliberately
not provided: it is known to produce consistently poor results,
even when the sample size is surprisingly large.
Refer to Brown, Cai and DasGupta for a full explanation.  Many other methods
of calculation are available, and may be more appropriate for specific
situations.  Unfortunately there appears to be no consensus amongst
statisticians as to which is "best": refer to the discussion at the end of
Brown, Cai and DasGupta for examples.

The two methods provided here were chosen principally because they
can be used for both one- and two-sided intervals.
See also:

Lawrence D. Brown, T. Tony Cai and Anirban DasGupta (2001),
Interval Estimation for a Binomial Proportion,
Statistical Science, Vol. 16, No. 2, 101-133.

T. Tony Cai (2005),
One-sided confidence intervals in discrete distributions,
Journal of Statistical Planning and Inference 131, 63-88.

A. Agresti and B. A. Coull (1998),
Approximate is better than "exact" for interval estimation of binomial proportions,
The American Statistician 52, 119-126.

C. J. Clopper and E. S. Pearson (1934),
The use of confidence or fiducial limits illustrated in the case of the binomial,
Biometrika 26, 404-413.

[h5 Upper Bound on the Success Fraction]

   static RealType find_upper_bound_on_p(
      RealType trials,
      RealType successes,
      RealType alpha,
      ``['unspecified-type]`` method = clopper_pearson_exact_interval);

Returns an upper bound on the success fraction:

[variablelist
[[trials][The total number of trials conducted.]]
[[successes][The number of successes that occurred.]]
[[alpha][The largest acceptable probability that the true value of
         the success fraction is [*greater than] the value returned.]]
[[method][An optional parameter that specifies the method to be used
         to compute the interval. Refer to the documentation for
         `find_lower_bound_on_p` above for the meaning of the
         method options.]]
]

For example, if you observe /k/ successes from /n/ trials, the
best estimate for the success fraction is simply ['k/n], but if you
want to be 95% sure that the true value is [*less than] some value,
['p[sub max]], then:

   p``[sub max]`` = binomial_distribution<RealType>::find_upper_bound_on_p(
                       n, k, 0.05);

[link math_toolkit.stat_tut.weg.binom_eg.binom_conf See worked example.]
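The upper bound can be sketched in the same way: bisect on /p/ until the probability of observing /k/ or fewer successes falls to /alpha/. This is again stand-alone C++ with hypothetical names, not the Boost.Math implementation, and the PMF is summed directly.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: P(X = i) for X ~ Binomial(n, p), 0 < p < 1.
static double pmf(unsigned n, unsigned i, double p)
{
    double log_coeff = std::lgamma(n + 1.0) - std::lgamma(i + 1.0)
                     - std::lgamma(n - i + 1.0);
    return std::exp(log_coeff + i * std::log(p) + (n - i) * std::log1p(-p));
}

// Hypothetical helper: P(X <= k), which decreases monotonically with p.
static double cdf_sketch(unsigned n, unsigned k, double p)
{
    double sum = 0.0;
    for (unsigned i = 0; i <= k && i <= n; ++i)
        sum += pmf(n, i, p);
    return sum;
}

// Sketch of a one-sided Clopper-Pearson upper bound: the p at which
// observing k or fewer successes has probability exactly alpha.
double clopper_pearson_upper_sketch(unsigned n, unsigned k, double alpha)
{
    assert(k <= n && alpha > 0.0 && alpha < 1.0);
    if (k == n)
        return 1.0;                 // all successes: the bound is one
    double lo = 0.0, hi = 1.0;
    for (int iter = 0; iter < 200; ++iter)
    {
        double mid = 0.5 * (lo + hi);
        if (cdf_sketch(n, k, mid) > alpha)
            lo = mid;               // k or fewer still too likely: go higher
        else
            hi = mid;
    }
    return 0.5 * (lo + hi);
}
```

For k = 0 the condition is (1 - p)[super n] = alpha. So `clopper_pearson_upper_sketch(10, 0, 0.05)` should be close to 1 - 0.05[super 1/10], roughly 0.259.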

[note
In order to obtain a two-sided bound on the success fraction, you
call both `find_lower_bound_on_p` *and* `find_upper_bound_on_p`,
each with the same arguments.

If the desired risk level
that the true success fraction lies outside the bounds is [alpha],
then you pass [alpha]/2 to these functions.

So, for example, a two-sided 95% confidence interval would be obtained
by passing [alpha] = 0.025 to each of the functions.

[link math_toolkit.stat_tut.weg.binom_eg.binom_conf See worked example.]
]
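The two-sided recipe in the note above can be sketched in stand-alone C++: each one-sided bound is computed with /alpha/ / 2. The sketch assumes the Clopper-Pearson method and uses hypothetical helper names. It bisects on the binomial cdf rather than calling Boost.Math.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: P(X <= k) for X ~ Binomial(n, p), 0 < p < 1,
// summed term by term (adequate for moderate n only).
static double cdf_sketch(unsigned n, unsigned k, double p)
{
    double sum = 0.0;
    for (unsigned i = 0; i <= k && i <= n; ++i)
        sum += std::exp(std::lgamma(n + 1.0) - std::lgamma(i + 1.0)
                        - std::lgamma(n - i + 1.0)
                        + i * std::log(p) + (n - i) * std::log1p(-p));
    return sum;
}

// Bisection for the p in (0, 1) at which the decreasing function
// p -> P(X <= m) equals target.
static double solve_cdf(unsigned n, unsigned m, double target)
{
    double lo = 0.0, hi = 1.0;
    for (int iter = 0; iter < 200; ++iter)
    {
        double mid = 0.5 * (lo + hi);
        if (cdf_sketch(n, m, mid) > target)
            lo = mid;
        else
            hi = mid;
    }
    return 0.5 * (lo + hi);
}

struct interval { double lower; double upper; };

// Two-sided interval with overall risk alpha: each one-sided bound
// is computed at alpha / 2.
interval two_sided_sketch(unsigned n, unsigned k, double alpha)
{
    assert(k <= n && alpha > 0.0 && alpha < 1.0);
    double half = alpha / 2;
    interval r;
    // Lower bound: P(X >= k) = half, i.e. P(X <= k - 1) = 1 - half.
    r.lower = (k == 0) ? 0.0 : solve_cdf(n, k - 1, 1.0 - half);
    // Upper bound: P(X <= k) = half.
    r.upper = (k == n) ? 1.0 : solve_cdf(n, k, half);
    return r;
}
```

For 5 successes in 10 trials at /alpha/ = 0.05 this gives a lower bound of roughly 0.187. By the symmetry of that case, the upper bound is one minus the lower bound.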

[h5 Estimating the Number of Trials Required for a Certain Number of Successes]

   static RealType find_minimum_number_of_trials(
      RealType k,     // number of events
      RealType p,     // success fraction
      RealType alpha); // probability threshold

This function estimates the minimum number of trials required to ensure that
more than k events are observed, with a level of risk /alpha/ that k or
fewer events occur.

[variablelist
[[k][The number of successes observed.]]
[[p][The probability of success for each trial.]]
[[alpha][The maximum acceptable probability that k events or fewer will be observed.]]
]

For example:

   binomial_distribution<RealType>::find_minimum_number_of_trials(10, 0.5, 0.05);

Returns the smallest number of trials we must conduct to be 95% sure
of seeing more than 10 events that occur with frequency one half.
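The following stand-alone C++ sketch shows the criterion this function solves, assuming a whole number of trials is wanted. It searches for the smallest integer /n/ for which the probability of /k/ or fewer successes is at most /alpha/.

Boost.Math itself works with the inverse incomplete beta function and returns a real value. The integer found here would therefore be expected to match the ceiling of that result. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: P(X <= k) for X ~ Binomial(n, p), 0 < p < 1.
static double cdf_sketch(unsigned long n, unsigned long k, double p)
{
    double sum = 0.0;
    for (unsigned long i = 0; i <= k && i <= n; ++i)
        sum += std::exp(std::lgamma(n + 1.0) - std::lgamma(i + 1.0)
                        - std::lgamma(n - i + 1.0)
                        + i * std::log(p) + (n - i) * std::log1p(-p));
    return sum;
}

// Sketch: smallest whole number of trials n with P(X <= k) <= alpha, so that
// more than k successes are seen with probability at least 1 - alpha.
unsigned long min_trials_sketch(unsigned long k, double p, double alpha)
{
    assert(p > 0.0 && p < 1.0 && alpha > 0.0 && alpha < 1.0);
    // With n = k trials, "k or fewer" is certain, so n = k always fails.
    unsigned long lo = k, hi = k + 1;
    while (cdf_sketch(hi, k, p) > alpha)
    {
        lo = hi;
        hi *= 2;                    // grow until the criterion is met
    }
    // Invariant: lo fails the criterion, hi satisfies it.
    while (hi - lo > 1)
    {
        unsigned long mid = lo + (hi - lo) / 2;
        if (cdf_sketch(mid, k, p) <= alpha)
            hi = mid;
        else
            lo = mid;
    }
    return hi;
}
```

With k = 0, p = 0.5 and alpha = 0.05 the sketch returns 5. This is because 0.5[super 4] = 0.0625 is still above the risk level, while 0.5[super 5] = 0.03125 is not.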

[h5 Estimating the Maximum Number of Trials to Ensure no more than a Certain Number of Successes]

   static RealType find_maximum_number_of_trials(
      RealType k,     // number of events
      RealType p,     // success fraction
      RealType alpha); // probability threshold

This function estimates the maximum number of trials we can conduct
to ensure that k successes or fewer are observed, with a risk /alpha/
that more than k occur.

[variablelist
[[k][The number of successes observed.]]
[[p][The probability of success for each trial.]]
[[alpha][The maximum acceptable probability that more than k events will be observed.]]
]

For example:

   binomial_distribution<RealType>::find_maximum_number_of_trials(0, 1e-6, 0.05);

Returns the largest number of trials we can conduct and still be 95% certain
of not observing any events that occur with one in a million frequency.
This is typically used in failure analysis.

[link math_toolkit.stat_tut.weg.binom_eg.binom_size_eg See Worked Example.]
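The stand-alone C++ sketch below finds the largest whole number of trials /n/ for which the probability of more than /k/ successes stays at most /alpha/. This is the same criterion stated above, restricted to integers as an assumption. Boost.Math returns a real value instead. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: P(X <= k) for X ~ Binomial(n, p), 0 < p < 1.
static double cdf_sketch(unsigned long n, unsigned long k, double p)
{
    double sum = 0.0;
    for (unsigned long i = 0; i <= k && i <= n; ++i)
        sum += std::exp(std::lgamma(n + 1.0) - std::lgamma(i + 1.0)
                        - std::lgamma(n - i + 1.0)
                        + i * std::log(p) + (n - i) * std::log1p(-p));
    return sum;
}

// Sketch: largest whole number of trials n with P(X > k) <= alpha,
// i.e. P(X <= k) >= 1 - alpha.
unsigned long max_trials_sketch(unsigned long k, double p, double alpha)
{
    assert(p > 0.0 && p < 1.0 && alpha > 0.0 && alpha < 1.0);
    // With n = k trials, more than k successes is impossible, so n = k passes.
    unsigned long lo = k, hi = k + 1;
    while (cdf_sketch(hi, k, p) >= 1.0 - alpha)
    {
        lo = hi;
        hi *= 2;                    // grow until the criterion fails
    }
    // Invariant: lo satisfies the criterion, hi does not.
    while (hi - lo > 1)
    {
        unsigned long mid = lo + (hi - lo) / 2;
        if (cdf_sketch(mid, k, p) >= 1.0 - alpha)
            lo = mid;
        else
            hi = mid;
    }
    return lo;
}
```

With the arguments from the example above (k = 0, p = 1e-6, alpha = 0.05) the sketch returns 51293. This agrees with solving (1 - p)[super n] = 0.95 for /n/ and rounding down.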

[h4 Non-member Accessors]

All the [link math_toolkit.dist_ref.nmp usual non-member accessor functions]
that are generic to all distributions are supported: __usual_accessors.

The domain for the random variable /k/ is `0 <= k <= n`, otherwise a
__domain_error is returned.

It's worth taking a moment to define what these accessors actually mean in
the context of this distribution:

[table Meaning of the non-member accessors
[[Function][Meaning]]
[[__pdf]
   [The probability of obtaining [*exactly k successes] from n trials
   with success fraction p.  For example:

`pdf(binomial(n, p), k)`]]
[[__cdf]
   [The probability of obtaining [*k successes or fewer] from n trials
   with success fraction p.  For example:

`cdf(binomial(n, p), k)`]]
[[__ccdf]
   [The probability of obtaining [*more than k successes] from n trials
   with success fraction p.  For example:

`cdf(complement(binomial(n, p), k))`]]
[[__quantile]
   [The [*greatest] number of successes that may be observed from n trials
   with success fraction p, at probability P.  Note that the value returned
   is a real number, and not an integer.  Depending on the use case you may
   want to take either the floor or ceiling of the result.  For example:

`quantile(binomial(n, p), P)`]]
[[__quantile_c]
   [The [*smallest] number of successes that may be observed from n trials
   with success fraction p, at probability P.  Note that the value returned
   is a real number, and not an integer.  Depending on the use case you may
   want to take either the floor or ceiling of the result. For example:

`quantile(complement(binomial(n, p), P))`]]
]
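The meanings in the table can also be written as plain sums over the PDF. The stand-alone C++ sketch below is only an illustration with hypothetical names, not the Boost.Math implementation. In particular, the quantile shown uses one assumed rounding convention: the smallest /k/ whose cdf reaches /P/. The library's behaviour depends on the discrete quantile policy in force.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-alone helpers mirroring the table for
// X ~ Binomial(n, p) with 0 < p < 1 (not the Boost.Math implementations).
static double pdf_sketch(unsigned n, unsigned k, double p)
{
    if (k > n)
        return 0.0;
    return std::exp(std::lgamma(n + 1.0) - std::lgamma(k + 1.0)
                    - std::lgamma(n - k + 1.0)
                    + k * std::log(p) + (n - k) * std::log1p(-p));
}

// Probability of k successes or fewer.
static double cdf_sketch(unsigned n, unsigned k, double p)
{
    double sum = 0.0;
    for (unsigned i = 0; i <= k && i <= n; ++i)
        sum += pdf_sketch(n, i, p);
    return sum;
}

// Probability of more than k successes, summed directly rather than
// formed as 1 - cdf, which keeps small upper tails accurate.
static double ccdf_sketch(unsigned n, unsigned k, double p)
{
    double sum = 0.0;
    for (unsigned i = k + 1; i <= n; ++i)
        sum += pdf_sketch(n, i, p);
    return sum;
}

// One possible integer quantile convention (an assumption):
// the smallest k with P(X <= k) >= P.
static unsigned quantile_sketch(unsigned n, double p, double P)
{
    double cumulative = 0.0;
    for (unsigned k = 0; k < n; ++k)
    {
        cumulative += pdf_sketch(n, k, p);
        if (cumulative >= P)
            return k;
    }
    return n;
}
```

For 20 trials with p = 0.5, `cdf_sketch(20, 10, 0.5)` is 616666/1048576 (about 0.588) and `quantile_sketch(20, 0.5, 0.5)` is 10.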

[h4 Examples]

Various [link math_toolkit.stat_tut.weg.binom_eg worked examples]
are available illustrating the use of the binomial distribution.

[h4 Accuracy]

This distribution is implemented using the
incomplete beta functions __ibeta and __ibetac;
please refer to these functions for information on accuracy.

[h4 Implementation]

In the following table /p/ is the probability that one trial will
be successful (the success fraction), /n/ is the number of trials,
/k/ is the number of successes, /q = 1-p/, /P/ is the probability
returned by the cdf (or passed to the quantile), and /Q = 1-P/.

[table
[[Function][Implementation Notes]]
[[pdf][Implementation is in terms of __ibeta_derivative: if [sub n]C[sub k] is the binomial
       coefficient of /n/ and /k/, then we have:

[equation binomial_ref1]

This can be evaluated as `ibeta_derivative(k+1, n-k+1, p) / (n+1)`.

The function __ibeta_derivative is used here, since it has already
       been optimised for the lowest possible error; indeed this is really
       just a thin wrapper around part of the internals of the incomplete
       beta function.

There are also various special cases: refer to the code for details.
       ]]
[[cdf][Using the relation:

``
P = I[sub 1-p](n - k, k + 1)
  = 1 - I[sub p](k + 1, n - k)
  = __ibetac(k + 1, n - k, p)``

There are also various special cases: refer to the code for details.
]]
[[cdf complement][Using the relation: Q = __ibeta(k + 1, n - k, p)

There are also various special cases: refer to the code for details. ]]
[[quantile][Since the cdf is non-linear in variate /k/ none of the inverse
            incomplete beta functions can be used here.  Instead the quantile
            is found numerically using a derivative-free method
            (__root_finding_TOMS748).]]
[[quantile from the complement][Found numerically as above.]]
[[mean][ `p * n` ]]
[[variance][ `p * n * (1-p)` ]]
[[mode][`floor(p * (n + 1))`]]
[[skewness][`(1 - 2 * p) / sqrt(n * p * (1 - p))`]]
[[kurtosis][`3 - (6 / n) + (1 / (n * p * (1 - p)))`]]
[[kurtosis excess][`(1 - 6 * p * q) / (n * p * q)`]]
[[parameter estimation][The member functions `find_upper_bound_on_p`,
       `find_lower_bound_on_p`, `find_minimum_number_of_trials` and
       `find_maximum_number_of_trials` are
       implemented in terms of the inverse incomplete beta functions
       __ibetac_inv, __ibeta_inv, __ibetac_invb and __ibeta_invb respectively.]]
]
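The closed-form moments and the mode in the table are standard results. The stand-alone C++ sketch below cross-checks them against brute-force summation over the PDF. It does not call Boost.Math, and all names are hypothetical. The test case deliberately uses a non-integer p(n+1), so that the mode is unique.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: P(X = k) for X ~ Binomial(n, p), 0 < p < 1.
static double pmf(unsigned n, unsigned k, double p)
{
    return std::exp(std::lgamma(n + 1.0) - std::lgamma(k + 1.0)
                    - std::lgamma(n - k + 1.0)
                    + k * std::log(p) + (n - k) * std::log1p(-p));
}

struct moments { double mean, variance, skewness, excess_kurtosis; };

// Brute-force moments obtained by summing over the PMF.
moments moments_by_summation(unsigned n, double p)
{
    double mean = 0.0;
    for (unsigned k = 0; k <= n; ++k)
        mean += k * pmf(n, k, p);
    double m2 = 0.0, m3 = 0.0, m4 = 0.0;   // central moments
    for (unsigned k = 0; k <= n; ++k)
    {
        double d = k - mean, w = pmf(n, k, p);
        m2 += d * d * w;
        m3 += d * d * d * w;
        m4 += d * d * d * d * w;
    }
    return { mean, m2, m3 / std::pow(m2, 1.5), m4 / (m2 * m2) - 3.0 };
}

// The closed forms listed in the table (q = 1 - p).
moments moments_closed_form(unsigned n, double p)
{
    double q = 1.0 - p, npq = n * p * q;
    return { n * p, npq, (1.0 - 2.0 * p) / std::sqrt(npq),
             (1.0 - 6.0 * p * q) / npq };
}

// Index of the largest PMF value (the first one found, if there is a tie).
unsigned mode_by_search(unsigned n, double p)
{
    unsigned best = 0;
    for (unsigned k = 1; k <= n; ++k)
        if (pmf(n, k, p) > pmf(n, best, p))
            best = k;
    return best;
}
```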

[h4 References]

* [@http://mathworld.wolfram.com/BinomialDistribution.html Weisstein, Eric W. "Binomial Distribution." From MathWorld--A Wolfram Web Resource].
* [@http://en.wikipedia.org/wiki/Binomial_distribution Wikipedia binomial distribution].
* [@http://www.itl.nist.gov/div898/handbook/eda/section3/eda366i.htm NIST Exploratory Data Analysis].

[endsect] [/section:binomial_dist Binomial]

[/ binomial.qbk
  Copyright 2006 John Maddock and Paul A. Bristow.
  Distributed under the Boost Software License, Version 1.0.
  (See accompanying file LICENSE_1_0.txt or copy at
  http://www.boost.org/LICENSE_1_0.txt).
]