1 [section:binomial_dist Binomial Distribution]
3 ``#include <boost/math/distributions/binomial.hpp>``
5 namespace boost{ namespace math{
7 template <class RealType = double,
8 class ``__Policy`` = ``__policy_class`` >
9 class binomial_distribution;
11 typedef binomial_distribution<> binomial;
13 template <class RealType, class ``__Policy``>
14 class binomial_distribution
17 typedef RealType value_type;
18 typedef Policy policy_type;
20 static const ``['unspecified-type]`` clopper_pearson_exact_interval;
21 static const ``['unspecified-type]`` jeffreys_prior_interval;
24 binomial_distribution(RealType n, RealType p);
27 RealType success_fraction() const;
28 RealType trials() const;
30 // Bounds on success fraction:
31 static RealType find_lower_bound_on_p(
35 ``['unspecified-type]`` method = clopper_pearson_exact_interval);
36 static RealType find_upper_bound_on_p(
40 ``['unspecified-type]`` method = clopper_pearson_exact_interval);
42 // estimate min/max number of trials:
43 static RealType find_minimum_number_of_trials(
44 RealType k, // number of events
45 RealType p, // success fraction
46 RealType alpha); // risk level
48 static RealType find_maximum_number_of_trials(
49 RealType k, // number of events
50 RealType p, // success fraction
51 RealType alpha); // risk level
56 The class type `binomial_distribution` represents a
57 [@http://mathworld.wolfram.com/BinomialDistribution.html binomial distribution]:
58 it is used when there are exactly two mutually
59 exclusive outcomes of a trial. These outcomes are labelled
60 "success" and "failure". The
61 __binomial_distrib is used to obtain
62 the probability of observing k successes in N trials, with the
63 probability of success on a single trial denoted by p. The
64 binomial distribution assumes that p is fixed for all trials.
66 [note The random variable for the binomial distribution is the number of successes,
67 (the number of trials is a fixed property of the distribution)
68 whereas for the negative binomial,
69 the random variable is the number of trials, for a fixed number of successes.]
71 The PDF for the binomial distribution is given by:
73 [equation binomial_ref2]
75 The following two graphs illustrate how the PDF changes depending
76 upon the distributions parameters, first we'll keep the success
77 fraction /p/ fixed at 0.5, and vary the sample size:
79 [graph binomial_pdf_1]
81 Alternatively, we can keep the sample size fixed at N=20 and
82 vary the success fraction /p/:
84 [graph binomial_pdf_2]
86 [discrete_quantile_warning Binomial]
92 binomial_distribution(RealType n, RealType p);
94 Constructor: /n/ is the total number of trials, /p/ is the
95 probability of success of a single trial.
97 Requires `0 <= p <= 1`, and `n >= 0`, otherwise calls __domain_error.
101 RealType success_fraction() const;
103 Returns the parameter /p/ from which this distribution was constructed.
105 RealType trials() const;
107 Returns the parameter /n/ from which this distribution was constructed.
109 [h5 Lower Bound on the Success Fraction]
111 static RealType find_lower_bound_on_p(
115 ``['unspecified-type]`` method = clopper_pearson_exact_interval);
117 Returns a lower bound on the success fraction:
120 [[trials][The total number of trials conducted.]]
121 [[successes][The number of successes that occurred.]]
122 [[alpha][The largest acceptable probability that the true value of
123 the success fraction is [*less than] the value returned.]]
124 [[method][An optional parameter that specifies the method to be used
125 to compute the interval (See below).]]
128 For example, if you observe /k/ successes from /n/ trials the
129 best estimate for the success fraction is simply ['k/n], but if you
130 want to be 95% sure that the true value is [*greater than] some value,
133 p``[sub min]`` = binomial_distribution<RealType>::find_lower_bound_on_p(
136 [link math_toolkit.stat_tut.weg.binom_eg.binom_conf See worked example.]
138 There are currently two possible values available for the /method/
139 optional parameter: /clopper_pearson_exact_interval/
140 or /jeffreys_prior_interval/. These constants are both members of
141 class template `binomial_distribution`, so usage is for example:
143 p = binomial_distribution<RealType>::find_lower_bound_on_p(
144 n, k, 0.05, binomial_distribution<RealType>::jeffreys_prior_interval);
146 The default method if this parameter is not specified is the Clopper Pearson
147 "exact" interval. This produces an interval that guarantees at least
148 `100(1-alpha)%` coverage, but which is known to be overly conservative,
149 sometimes producing intervals with much greater than the requested coverage.
151 The alternative calculation method produces a non-informative
152 Jeffreys Prior interval. It produces `100(1-alpha)%` coverage only
153 ['in the average case], though is typically very close to the requested
154 coverage level. It is one of the main methods of calculation recommended
155 in the review by Brown, Cai and DasGupta.
157 Please note that the "textbook" calculation method using
158 a normal approximation (the Wald interval) is deliberately
159 not provided: it is known to produce consistently poor results,
160 even when the sample size is surprisingly large.
161 Refer to Brown, Cai and DasGupta for a full explanation. Many other methods
162 of calculation are available, and may be more appropriate for specific
163 situations. Unfortunately there appears to be no consensus amongst
164 statisticians as to which is "best": refer to the discussion at the end of
165 Brown, Cai and DasGupta for examples.
167 The two methods provided here were chosen principally because they
168 can be used for both one and two sided intervals.
171 Lawrence D. Brown, T. Tony Cai and Anirban DasGupta (2001),
172 Interval Estimation for a Binomial Proportion,
173 Statistical Science, Vol. 16, No. 2, 101-133.
176 One-sided confidence intervals in discrete distributions,
177 Journal of Statistical Planning and Inference 131, 63-88.
179 Agresti, A. and Coull, B. A. (1998). Approximate is better than
180 "exact" for interval estimation of binomial proportions. Amer.
183 Clopper, C. J. and Pearson, E. S. (1934). The use of confidence
184 or fiducial limits illustrated in the case of the binomial.
185 Biometrika 26 404-413.
187 [h5 Upper Bound on the Success Fraction]
189 static RealType find_upper_bound_on_p(
193 ``['unspecified-type]`` method = clopper_pearson_exact_interval);
195 Returns an upper bound on the success fraction:
198 [[trials][The total number of trials conducted.]]
199 [[successes][The number of successes that occurred.]]
200 [[alpha][The largest acceptable probability that the true value of
201 the success fraction is [*greater than] the value returned.]]
202 [[method][An optional parameter that specifies the method to be used
203 to compute the interval. Refer to the documentation for
204 `find_upper_bound_on_p` above for the meaning of the
208 For example, if you observe /k/ successes from /n/ trials the
209 best estimate for the success fraction is simply ['k/n], but if you
210 want to be 95% sure that the true value is [*less than] some value,
213 p``[sub max]`` = binomial_distribution<RealType>::find_upper_bound_on_p(
216 [link math_toolkit.stat_tut.weg.binom_eg.binom_conf See worked example.]
219 In order to obtain a two sided bound on the success fraction, you
220 call both `find_lower_bound_on_p` *and* `find_upper_bound_on_p`
221 each with the same arguments.
223 If the desired risk level
224 that the true success fraction lies outside the bounds is [alpha],
225 then you pass [alpha]/2 to these functions.
227 So for example a two sided 95% confidence interval would be obtained
228 by passing [alpha] = 0.025 to each of the functions.
230 [link math_toolkit.stat_tut.weg.binom_eg.binom_conf See worked example.]
234 [h5 Estimating the Number of Trials Required for a Certain Number of Successes]
236 static RealType find_minimum_number_of_trials(
237 RealType k, // number of events
238 RealType p, // success fraction
239 RealType alpha); // probability threshold
241 This function estimates the minimum number of trials required to ensure that
242 more than k events is observed with a level of risk /alpha/ that k or
246 [[k][The number of success observed.]]
247 [[p][The probability of success for each trial.]]
248 [[alpha][The maximum acceptable probability that k events or fewer will be observed.]]
253 binomial_distribution<RealType>::find_number_of_trials(10, 0.5, 0.05);
255 Returns the smallest number of trials we must conduct to be 95% sure
256 of seeing 10 events that occur with frequency one half.
258 [h5 Estimating the Maximum Number of Trials to Ensure no more than a Certain Number of Successes]
260 static RealType find_maximum_number_of_trials(
261 RealType k, // number of events
262 RealType p, // success fraction
263 RealType alpha); // probability threshold
265 This function estimates the maximum number of trials we can conduct
266 to ensure that k successes or fewer are observed, with a risk /alpha/
267 that more than k occur.
270 [[k][The number of success observed.]]
271 [[p][The probability of success for each trial.]]
272 [[alpha][The maximum acceptable probability that more than k events will be observed.]]
277 binomial_distribution<RealType>::find_maximum_number_of_trials(0, 1e-6, 0.05);
279 Returns the largest number of trials we can conduct and still be 95% certain
280 of not observing any events that occur with one in a million frequency.
281 This is typically used in failure analysis.
283 [link math_toolkit.stat_tut.weg.binom_eg.binom_size_eg See Worked Example.]
285 [h4 Non-member Accessors]
287 All the [link math_toolkit.dist_ref.nmp usual non-member accessor functions]
288 that are generic to all distributions are supported: __usual_accessors.
290 The domain for the random variable /k/ is `0 <= k <= N`, otherwise a
291 __domain_error is returned.
293 It's worth taking a moment to define what these accessors actually mean in
294 the context of this distribution:
296 [table Meaning of the non-member accessors
297 [[Function][Meaning]]
299 [The probability of obtaining [*exactly k successes] from n trials
300 with success fraction p. For example:
302 `pdf(binomial(n, p), k)`]]
304 [The probability of obtaining [*k successes or fewer] from n trials
305 with success fraction p. For example:
307 `cdf(binomial(n, p), k)`]]
309 [The probability of obtaining [*more than k successes] from n trials
310 with success fraction p. For example:
312 `cdf(complement(binomial(n, p), k))`]]
314 [The [*greatest] number of successes that may be observed from n trials
315 with success fraction p, at probability P. Note that the value returned
316 is a real-number, and not an integer. Depending on the use case you may
317 want to take either the floor or ceiling of the result. For example:
319 `quantile(binomial(n, p), P)`]]
321 [The [*smallest] number of successes that may be observed from n trials
322 with success fraction p, at probability P. Note that the value returned
323 is a real-number, and not an integer. Depending on the use case you may
324 want to take either the floor or ceiling of the result. For example:
326 `quantile(complement(binomial(n, p), P))`]]
331 Various [link math_toolkit.stat_tut.weg.binom_eg worked examples]
332 are available illustrating the use of the binomial distribution.
336 This distribution is implemented using the
337 incomplete beta functions __ibeta and __ibetac,
338 please refer to these functions for information on accuracy.
342 In the following table /p/ is the probability that one trial will
343 be successful (the success fraction), /n/ is the number of trials,
344 /k/ is the number of successes, /p/ is the probability and /q = 1-p/.
347 [[Function][Implementation Notes]]
348 [[pdf][Implementation is in terms of __ibeta_derivative: if [sub n]C[sub k ] is the binomial
349 coefficient of a and b, then we have:
351 [equation binomial_ref1]
353 Which can be evaluated as `ibeta_derivative(k+1, n-k+1, p) / (n+1)`
355 The function __ibeta_derivative is used here, since it has already
356 been optimised for the lowest possible error - indeed this is really
357 just a thin wrapper around part of the internals of the incomplete
360 There are also various special cases: refer to the code for details.
362 [[cdf][Using the relation:
365 p = I[sub 1-p](n - k, k + 1)
366 = 1 - I[sub p](k + 1, n - k)
367 = __ibetac(k + 1, n - k, p)``
369 There are also various special cases: refer to the code for details.
371 [[cdf complement][Using the relation: q = __ibeta(k + 1, n - k, p)
373 There are also various special cases: refer to the code for details. ]]
374 [[quantile][Since the cdf is non-linear in variate /k/ none of the inverse
375 incomplete beta functions can be used here. Instead the quantile
376 is found numerically using a derivative free method
377 (__root_finding_TOMS748).]]
378 [[quantile from the complement][Found numerically as above.]]
380 [[variance][ `p * n * (1-p)` ]]
381 [[mode][`floor(p * (n + 1))`]]
382 [[skewness][`(1 - 2 * p) / sqrt(n * p * (1 - p))`]]
383 [[kurtosis][`3 - (6 / n) + (1 / (n * p * (1 - p)))`]]
384 [[kurtosis excess][`(1 - 6 * p * q) / (n * p * q)`]]
385 [[parameter estimation][The member functions `find_upper_bound_on_p`
386 `find_lower_bound_on_p` and `find_number_of_trials` are
387 implemented in terms of the inverse incomplete beta functions
388 __ibetac_inv, __ibeta_inv, and __ibetac_invb respectively]]
393 * [@http://mathworld.wolfram.com/BinomialDistribution.html Weisstein, Eric W. "Binomial Distribution." From MathWorld--A Wolfram Web Resource].
394 * [@http://en.wikipedia.org/wiki/Beta_distribution Wikipedia binomial distribution].
395 * [@http://www.itl.nist.gov/div898/handbook/eda/section3/eda366i.htm NIST Explorary Data Analysis].
397 [endsect] [/section:binomial_dist Binomial]
400 Copyright 2006 John Maddock and Paul A. Bristow.
401 Distributed under the Boost Software License, Version 1.0.
402 (See accompanying file LICENSE_1_0.txt or copy at
403 http://www.boost.org/LICENSE_1_0.txt).