[section:binomial_dist Binomial Distribution]

``#include <boost/math/distributions/binomial.hpp>``

   namespace boost{ namespace math{

   template <class RealType = double,
             class ``__Policy``   = ``__policy_class`` >
   class binomial_distribution;

   typedef binomial_distribution<> binomial;

   template <class RealType, class ``__Policy``>
   class binomial_distribution
   {
   public:
      typedef RealType  value_type;
      typedef Policy    policy_type;

      static const ``['unspecified-type]`` clopper_pearson_exact_interval;
      static const ``['unspecified-type]`` jeffreys_prior_interval;

      // construct:
      binomial_distribution(RealType n, RealType p);

      // parameter access:
      RealType success_fraction() const;
      RealType trials() const;

      // Bounds on success fraction:
      static RealType find_lower_bound_on_p(
         RealType trials,
         RealType successes,
         RealType probability,
         ``['unspecified-type]`` method = clopper_pearson_exact_interval);
      static RealType find_upper_bound_on_p(
         RealType trials,
         RealType successes,
         RealType probability,
         ``['unspecified-type]`` method = clopper_pearson_exact_interval);

      // estimate min/max number of trials:
      static RealType find_minimum_number_of_trials(
         RealType k,     // number of events
         RealType p,     // success fraction
         RealType alpha); // risk level

      static RealType find_maximum_number_of_trials(
         RealType k,     // number of events
         RealType p,     // success fraction
         RealType alpha); // risk level
   };

   }} // namespaces

The class type `binomial_distribution` represents a
[@http://mathworld.wolfram.com/BinomialDistribution.html binomial distribution]:
it is used when there are exactly two mutually
exclusive outcomes of a trial. These outcomes are labelled
"success" and "failure". The
__binomial_distrib is used to obtain
the probability of observing k successes in N trials, with the
probability of success on a single trial denoted by p. The
binomial distribution assumes that p is fixed for all trials.

[note The random variable for the binomial distribution is the number of successes
(the number of trials is a fixed property of the distribution),
whereas for the negative binomial
the random variable is the number of trials, for a fixed number of successes.]

The PDF for the binomial distribution is given by:

[equation binomial_ref2]
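
As an illustration only, the formula can be evaluated directly with the standard library. The sketch below is not the library's implementation (which is described under Implementation); the name `binomial_pmf` is hypothetical, and the computation is carried out in log space with `std::lgamma` on the assumption that this is an adequate way to avoid overflow for moderately large /n/.

```cpp
#include <cassert>
#include <cmath>

// Probability of exactly k successes in n trials with success fraction p:
//   C(n, k) * p^k * (1 - p)^(n - k)
// Evaluated in log space so that the binomial coefficient cannot overflow.
double binomial_pmf(unsigned n, double p, unsigned k)
{
    assert(p >= 0.0 && p <= 1.0);
    if (k > n) return 0.0;                       // outside the support
    if (p == 0.0) return k == 0 ? 1.0 : 0.0;     // degenerate cases avoid log(0)
    if (p == 1.0) return k == n ? 1.0 : 0.0;
    const double log_choose = std::lgamma(n + 1.0) - std::lgamma(k + 1.0)
                            - std::lgamma(n - k + 1.0);
    return std::exp(log_choose + k * std::log(p) + (n - k) * std::log1p(-p));
}
```

For instance, `binomial_pmf(4, 0.5, 2)` evaluates 6/16, the chance of exactly two heads in four fair coin tosses.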

The following two graphs illustrate how the PDF changes depending
upon the distribution's parameters. First we'll keep the success
fraction /p/ fixed at 0.5, and vary the sample size:

[graph binomial_pdf_1]

Alternatively, we can keep the sample size fixed at N=20 and
vary the success fraction /p/:

[graph binomial_pdf_2]

[discrete_quantile_warning Binomial]

[h4 Member Functions]

[h5 Construct]

   binomial_distribution(RealType n, RealType p);

Constructor: /n/ is the total number of trials, /p/ is the
probability of success of a single trial.

Requires `0 <= p <= 1`, and `n >= 0`, otherwise calls __domain_error.

[h5 Accessors]

   RealType success_fraction() const;

Returns the parameter /p/ from which this distribution was constructed.

   RealType trials() const;

Returns the parameter /n/ from which this distribution was constructed.

[h5 Lower Bound on the Success Fraction]

   static RealType find_lower_bound_on_p(
      RealType trials,
      RealType successes,
      RealType alpha,
      ``['unspecified-type]`` method = clopper_pearson_exact_interval);

Returns a lower bound on the success fraction:

[variablelist
[[trials][The total number of trials conducted.]]
[[successes][The number of successes that occurred.]]
[[alpha][The largest acceptable probability that the true value of
         the success fraction is [*less than] the value returned.]]
[[method][An optional parameter that specifies the method to be used
         to compute the interval (See below).]]
]

For example, if you observe /k/ successes from /n/ trials the
best estimate for the success fraction is simply ['k/n], but if you
want to be 95% sure that the true value is [*greater than] some value,
['p[sub min]], then:

   p``[sub min]`` = binomial_distribution<RealType>::find_lower_bound_on_p(
                       n, k, 0.05);

[link math_toolkit.stat_tut.weg.binom_eg.binom_conf See worked example.]

There are currently two possible values available for the /method/
optional parameter: /clopper_pearson_exact_interval/
or /jeffreys_prior_interval/. These constants are both members of
class template `binomial_distribution`, so usage is for example:

   p = binomial_distribution<RealType>::find_lower_bound_on_p(
       n, k, 0.05, binomial_distribution<RealType>::jeffreys_prior_interval);

The default method if this parameter is not specified is the Clopper Pearson
"exact" interval. This produces an interval that guarantees at least
`100(1-alpha)%` coverage, but which is known to be overly conservative,
sometimes producing intervals with much greater than the requested coverage.
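
To make the Clopper Pearson definition concrete, the following sketch finds the lower bound as the success fraction at which observing /k/ or more successes has probability exactly /alpha/. It is an illustration under stated assumptions rather than the library's method: it uses plain bisection over a directly summed tail instead of the inverse incomplete beta functions, and the names `upper_tail` and `clopper_pearson_lower` are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// P(X >= k) for X ~ Binomial(n, p) with 0 < p < 1, summed term by term.
// Each term is evaluated in log space to avoid overflow in the coefficient.
double upper_tail(unsigned n, double p, unsigned k)
{
    double sum = 0.0;
    for (unsigned i = k; i <= n; ++i) {
        const double log_choose = std::lgamma(n + 1.0) - std::lgamma(i + 1.0)
                                - std::lgamma(n - i + 1.0);
        sum += std::exp(log_choose + i * std::log(p) + (n - i) * std::log1p(-p));
    }
    return sum;
}

// Clopper-Pearson lower bound: the success fraction p at which observing
// k or more successes has probability exactly alpha.
double clopper_pearson_lower(unsigned n, unsigned k, double alpha)
{
    assert(k <= n && alpha > 0.0 && alpha < 1.0);
    if (k == 0) return 0.0;                  // no successes observed
    double lo = 0.0, hi = 1.0;
    for (int iter = 0; iter < 100; ++iter) {
        const double mid = 0.5 * (lo + hi);
        // The upper tail grows with p, so a tail below alpha means
        // the bound lies above mid.
        if (upper_tail(n, mid, k) < alpha) lo = mid; else hi = mid;
    }
    return 0.5 * (lo + hi);
}
```

When every trial succeeds the tail reduces to `p^n`, so `clopper_pearson_lower(10, 10, 0.05)` should agree with `std::pow(0.05, 0.1)`, which gives a quick sanity check of the search.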

The alternative calculation method produces a non-informative
Jeffreys Prior interval. It produces `100(1-alpha)%` coverage only
['in the average case], though it is typically very close to the requested
coverage level. It is one of the main methods of calculation recommended
in the review by Brown, Cai and DasGupta.

Please note that the "textbook" calculation method using
a normal approximation (the Wald interval) is deliberately
not provided: it is known to produce consistently poor results,
even when the sample size is surprisingly large.
Refer to Brown, Cai and DasGupta for a full explanation. Many other methods
of calculation are available, and may be more appropriate for specific
situations. Unfortunately there appears to be no consensus amongst
statisticians as to which is "best": refer to the discussion at the end of
Brown, Cai and DasGupta for examples.

The two methods provided here were chosen principally because they
can be used for both one and two sided intervals.
See also:

Lawrence D. Brown, T. Tony Cai and Anirban DasGupta (2001),
Interval Estimation for a Binomial Proportion,
Statistical Science, Vol. 16, No. 2, 101-133.

T. Tony Cai (2005),
One-sided confidence intervals in discrete distributions,
Journal of Statistical Planning and Inference, Vol. 131, 63-88.

A. Agresti and B. A. Coull (1998),
Approximate is better than "exact" for interval estimation of binomial proportions,
The American Statistician, Vol. 52, 119-126.

C. J. Clopper and E. S. Pearson (1934),
The use of confidence or fiducial limits illustrated in the case of the binomial,
Biometrika, Vol. 26, 404-413.

[h5 Upper Bound on the Success Fraction]

   static RealType find_upper_bound_on_p(
      RealType trials,
      RealType successes,
      RealType alpha,
      ``['unspecified-type]`` method = clopper_pearson_exact_interval);

Returns an upper bound on the success fraction:

[variablelist
[[trials][The total number of trials conducted.]]
[[successes][The number of successes that occurred.]]
[[alpha][The largest acceptable probability that the true value of
         the success fraction is [*greater than] the value returned.]]
[[method][An optional parameter that specifies the method to be used
         to compute the interval. Refer to the documentation for
         `find_lower_bound_on_p` above for the meaning of the
         method options.]]
]

For example, if you observe /k/ successes from /n/ trials the
best estimate for the success fraction is simply ['k/n], but if you
want to be 95% sure that the true value is [*less than] some value,
['p[sub max]], then:

   p``[sub max]`` = binomial_distribution<RealType>::find_upper_bound_on_p(
                       n, k, 0.05);

[link math_toolkit.stat_tut.weg.binom_eg.binom_conf See worked example.]

[note
In order to obtain a two sided bound on the success fraction, you
call both `find_lower_bound_on_p` *and* `find_upper_bound_on_p`
each with the same arguments.

If the desired risk level
that the true success fraction lies outside the bounds is [alpha],
then you pass [alpha]/2 to these functions.

So for example a two sided 95% confidence interval (a total risk of
[alpha] = 0.05) would be obtained by passing 0.025 to each of the functions.

[link math_toolkit.stat_tut.weg.binom_eg.binom_conf See worked example.]
]
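
The halving of the risk level can be sketched as follows. This is a standard-library illustration of a two sided Clopper Pearson interval, not the library's code: the bounds are located by bisection over a directly summed CDF, and all of the names (`binomial_cdf`, `lower_bound_on_p`, `upper_bound_on_p`, `two_sided_interval`) are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// P(X <= k) for X ~ Binomial(n, p) with 0 < p < 1, by direct summation.
double binomial_cdf(unsigned n, double p, unsigned k)
{
    double sum = 0.0;
    for (unsigned i = 0; i <= k && i <= n; ++i) {
        const double log_choose = std::lgamma(n + 1.0) - std::lgamma(i + 1.0)
                                - std::lgamma(n - i + 1.0);
        sum += std::exp(log_choose + i * std::log(p) + (n - i) * std::log1p(-p));
    }
    return sum;
}

// One sided lower bound: the p at which P(X >= k) equals the risk a.
double lower_bound_on_p(unsigned n, unsigned k, double a)
{
    if (k == 0) return 0.0;
    double lo = 0.0, hi = 1.0;
    for (int iter = 0; iter < 100; ++iter) {
        const double mid = 0.5 * (lo + hi);
        // P(X >= k) grows with p.
        if (1.0 - binomial_cdf(n, mid, k - 1) < a) lo = mid; else hi = mid;
    }
    return 0.5 * (lo + hi);
}

// One sided upper bound: the p at which P(X <= k) equals the risk a.
double upper_bound_on_p(unsigned n, unsigned k, double a)
{
    if (k == n) return 1.0;
    double lo = 0.0, hi = 1.0;
    for (int iter = 0; iter < 100; ++iter) {
        const double mid = 0.5 * (lo + hi);
        // P(X <= k) shrinks as p grows.
        if (binomial_cdf(n, mid, k) > a) lo = mid; else hi = mid;
    }
    return 0.5 * (lo + hi);
}

// Two sided interval with total risk alpha: each side receives alpha / 2.
std::pair<double, double> two_sided_interval(unsigned n, unsigned k, double alpha)
{
    assert(k <= n && alpha > 0.0 && alpha < 1.0);
    return { lower_bound_on_p(n, k, alpha / 2), upper_bound_on_p(n, k, alpha / 2) };
}
```

Calling `two_sided_interval(20, 4, 0.05)` yields a pair that brackets the point estimate 4/20, with each bound computed at a risk level of 0.025.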

[h5 Estimating the Number of Trials Required for a Certain Number of Successes]

   static RealType find_minimum_number_of_trials(
      RealType k,     // number of events
      RealType p,     // success fraction
      RealType alpha); // probability threshold

This function estimates the minimum number of trials required to ensure that
more than k events are observed, with a level of risk /alpha/ that k or
fewer events occur.

[variablelist
[[k][The number of successes observed.]]
[[p][The probability of success for each trial.]]
[[alpha][The maximum acceptable probability that k events or fewer will be observed.]]
]

For example:

   binomial_distribution<RealType>::find_minimum_number_of_trials(10, 0.5, 0.05);

Returns the smallest number of trials we must conduct to be 95% sure
of seeing more than 10 events that occur with frequency one half.
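
The definition can be checked with a brute-force search over whole numbers of trials. This sketch is only an illustration: the function declared above returns a `RealType`, whereas the hypothetical `minimum_trials` below steps through integers and sums the CDF directly, which is slow when /p/ is small.

```cpp
#include <cassert>
#include <cmath>

// P(X <= k) for X ~ Binomial(n, p) with 0 < p < 1, by direct summation.
double binomial_cdf(unsigned n, double p, unsigned k)
{
    double sum = 0.0;
    for (unsigned i = 0; i <= k && i <= n; ++i) {
        const double log_choose = std::lgamma(n + 1.0) - std::lgamma(i + 1.0)
                                - std::lgamma(n - i + 1.0);
        sum += std::exp(log_choose + i * std::log(p) + (n - i) * std::log1p(-p));
    }
    return sum;
}

// Smallest whole number of trials n for which the risk of seeing
// k or fewer events, P(X <= k), is no greater than alpha.
unsigned minimum_trials(unsigned k, double p, double alpha)
{
    assert(p > 0.0 && p < 1.0 && alpha > 0.0 && alpha < 1.0);
    unsigned n = k + 1;      // with k or fewer trials, P(X <= k) is 1
    while (binomial_cdf(n, p, k) > alpha) ++n;
    return n;
}
```

As a small check, `minimum_trials(0, 0.5, 0.05)` is 5: five fair coin tosses all fail with probability 1/32, which is below 0.05, while four tosses all fail with probability 1/16, which is not.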

[h5 Estimating the Maximum Number of Trials to Ensure no more than a Certain Number of Successes]

   static RealType find_maximum_number_of_trials(
      RealType k,     // number of events
      RealType p,     // success fraction
      RealType alpha); // probability threshold

This function estimates the maximum number of trials we can conduct
to ensure that k successes or fewer are observed, with a risk /alpha/
that more than k occur.

[variablelist
[[k][The number of successes observed.]]
[[p][The probability of success for each trial.]]
[[alpha][The maximum acceptable probability that more than k events will be observed.]]
]

For example:

   binomial_distribution<RealType>::find_maximum_number_of_trials(0, 1e-6, 0.05);

Returns the largest number of trials we can conduct and still be 95% certain
of not observing any events that occur with one in a million frequency.
This is typically used in failure analysis.

[link math_toolkit.stat_tut.weg.binom_eg.binom_size_eg See Worked Example.]
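
The same brute-force idea applies here. The sketch below is an illustration under the same assumptions as before (integer search, directly summed CDF, hypothetical names) and is not how the library computes its real-valued result.

```cpp
#include <cassert>
#include <cmath>

// P(X <= k) for X ~ Binomial(n, p) with 0 < p < 1, by direct summation.
double binomial_cdf(unsigned n, double p, unsigned k)
{
    double sum = 0.0;
    for (unsigned i = 0; i <= k && i <= n; ++i) {
        const double log_choose = std::lgamma(n + 1.0) - std::lgamma(i + 1.0)
                                - std::lgamma(n - i + 1.0);
        sum += std::exp(log_choose + i * std::log(p) + (n - i) * std::log1p(-p));
    }
    return sum;
}

// Largest whole number of trials n for which the risk of seeing
// more than k events, P(X > k), is no greater than alpha.
unsigned maximum_trials(unsigned k, double p, double alpha)
{
    assert(p > 0.0 && p < 1.0 && alpha > 0.0 && alpha < 1.0);
    unsigned n = k;          // with k trials, more than k events is impossible
    while (1.0 - binomial_cdf(n + 1, p, k) <= alpha) ++n;
    return n;
}
```

For `k = 0` the condition reduces to `1 - (1 - p)^n <= alpha`, so the result of `maximum_trials(0, 1e-6, 0.05)` can be compared against `std::log(0.95) / std::log1p(-1e-6)` rounded down.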

[h4 Non-member Accessors]

All the [link math_toolkit.dist_ref.nmp usual non-member accessor functions]
that are generic to all distributions are supported: __usual_accessors.

The domain for the random variable /k/ is `0 <= k <= N`, otherwise a
__domain_error is returned.

It's worth taking a moment to define what these accessors actually mean in
the context of this distribution:

[table Meaning of the non-member accessors
[[Function][Meaning]]
[[__pdf]
   [The probability of obtaining [*exactly k successes] from n trials
   with success fraction p. For example:

`pdf(binomial(n, p), k)`]]
[[__cdf]
   [The probability of obtaining [*k successes or fewer] from n trials
   with success fraction p. For example:

`cdf(binomial(n, p), k)`]]
[[__ccdf]
   [The probability of obtaining [*more than k successes] from n trials
   with success fraction p. For example:

`cdf(complement(binomial(n, p), k))`]]
[[__quantile]
   [The [*greatest] number of successes that may be observed from n trials
   with success fraction p, at probability P. Note that the value returned
   is a real-number, and not an integer. Depending on the use case you may
   want to take either the floor or ceiling of the result. For example:

`quantile(binomial(n, p), P)`]]
[[__quantile_c]
   [The [*smallest] number of successes that may be observed from n trials
   with success fraction p, at probability P. Note that the value returned
   is a real-number, and not an integer. Depending on the use case you may
   want to take either the floor or ceiling of the result. For example:

`quantile(complement(binomial(n, p), P))`]]
]
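
The meanings in the table can be illustrated with a few standard-library functions. This is a sketch only: the names ending in `_sketch` are hypothetical, the sums are taken term by term, and the quantile uses one particular integer convention (the smallest k whose CDF reaches P), which is an assumption made here and need not match the value or rounding returned by the library.

```cpp
#include <cassert>
#include <cmath>

// Probability of exactly k successes from n trials (0 < p < 1).
double pdf_sketch(unsigned n, double p, unsigned k)
{
    if (k > n) return 0.0;
    const double log_choose = std::lgamma(n + 1.0) - std::lgamma(k + 1.0)
                            - std::lgamma(n - k + 1.0);
    return std::exp(log_choose + k * std::log(p) + (n - k) * std::log1p(-p));
}

// Probability of k successes or fewer.
double cdf_sketch(unsigned n, double p, unsigned k)
{
    double sum = 0.0;
    for (unsigned i = 0; i <= k && i <= n; ++i) sum += pdf_sketch(n, p, i);
    return sum;
}

// Probability of more than k successes, summed directly rather than
// computed as 1 - cdf, so that small tails keep their precision.
double ccdf_sketch(unsigned n, double p, unsigned k)
{
    double sum = 0.0;
    for (unsigned i = k + 1; i <= n; ++i) sum += pdf_sketch(n, p, i);
    return sum;
}

// Smallest integer k with cdf(k) >= P (an assumed convention).
unsigned quantile_sketch(unsigned n, double p, double P)
{
    assert(P >= 0.0 && P <= 1.0);
    double acc = 0.0;
    for (unsigned k = 0; k < n; ++k) {
        acc += pdf_sketch(n, p, k);
        if (acc >= P) return k;
    }
    return n;
}
```

With four fair coin tosses, `cdf_sketch(4, 0.5, 1)` and `ccdf_sketch(4, 0.5, 1)` evaluate 5/16 and 11/16, which sum to one as the table implies.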

[h4 Examples]

Various [link math_toolkit.stat_tut.weg.binom_eg worked examples]
are available illustrating the use of the binomial distribution.

[h4 Accuracy]

This distribution is implemented using the
incomplete beta functions __ibeta and __ibetac,
please refer to these functions for information on accuracy.

[h4 Implementation]

In the following table /p/ is the probability that one trial will
be successful (the success fraction), /n/ is the number of trials,
/k/ is the number of successes, and /q = 1-p/.

[table
[[Function][Implementation Notes]]
[[pdf][Implementation is in terms of __ibeta_derivative: if [sub n]C[sub k] is the binomial
   coefficient of n and k, then we have:

[equation binomial_ref1]

Which can be evaluated as `ibeta_derivative(k+1, n-k+1, p) / (n+1)`

The function __ibeta_derivative is used here, since it has already
 been optimised for the lowest possible error - indeed this is really
 just a thin wrapper around part of the internals of the incomplete
 beta function.

There are also various special cases: refer to the code for details.
 ]]
[[cdf][Using the relation:

``
cdf = I[sub 1-p](n - k, k + 1)
    = 1 - I[sub p](k + 1, n - k)
    = __ibetac(k + 1, n - k, p)``

There are also various special cases: refer to the code for details.
]]
[[cdf complement][Using the relation: 1 - cdf = __ibeta(k + 1, n - k, p)

There are also various special cases: refer to the code for details. ]]
[[quantile][Since the cdf is non-linear in variate /k/ none of the inverse
   incomplete beta functions can be used here. Instead the quantile
   is found numerically using a derivative free method
   (__root_finding_TOMS748).]]
[[quantile from the complement][Found numerically as above.]]
[[mean][ `p * n` ]]
[[variance][ `p * n * (1-p)` ]]
[[mode][`floor(p * (n + 1))`]]
[[skewness][`(1 - 2 * p) / sqrt(n * p * (1 - p))`]]
[[kurtosis][`3 - (6 / n) + (1 / (n * p * (1 - p)))`]]
[[kurtosis excess][`(1 - 6 * p * q) / (n * p * q)`]]
[[parameter estimation][The member functions `find_upper_bound_on_p`,
   `find_lower_bound_on_p` and `find_minimum_number_of_trials` are
   implemented in terms of the inverse incomplete beta functions
   __ibetac_inv, __ibeta_inv, and __ibetac_invb respectively]]
]
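
Two rows of the table lend themselves to a quick numerical check. The sketch below assumes that the derivative of the regularised incomplete beta function is `x^(a-1) * (1-x)^(b-1) / B(a, b)` and evaluates it with `std::lgamma`; it stands in for the library function purely for illustration, and every name in it (`ibeta_derivative_sketch`, `pdf_via_ibeta_derivative`, `binomial_moments`, `moments`) is hypothetical. Because `1 / B(k+1, n-k+1)` equals `(n+1)` times the binomial coefficient, dividing by `(n+1)` recovers the PDF.

```cpp
#include <cassert>
#include <cmath>

// Derivative of the regularised incomplete beta function I_x(a, b)
// with respect to x: x^(a-1) * (1-x)^(b-1) / B(a, b).
double ibeta_derivative_sketch(double a, double b, double x)
{
    assert(a > 0.0 && b > 0.0 && x > 0.0 && x < 1.0);
    const double log_beta = std::lgamma(a) + std::lgamma(b) - std::lgamma(a + b);
    return std::exp((a - 1) * std::log(x) + (b - 1) * std::log1p(-x) - log_beta);
}

// The pdf row: ibeta_derivative(k+1, n-k+1, p) / (n+1).
double pdf_via_ibeta_derivative(double n, double p, double k)
{
    return ibeta_derivative_sketch(k + 1, n - k + 1, p) / (n + 1);
}

// The closed-form rows of the table, collected in one place.
struct binomial_moments
{
    double mean, variance, mode, skewness, kurtosis, kurtosis_excess;
};

binomial_moments moments(double n, double p)
{
    const double q = 1 - p;
    return { p * n,
             p * n * q,
             std::floor(p * (n + 1)),
             (1 - 2 * p) / std::sqrt(n * p * q),
             3 - (6 / n) + (1 / (n * p * q)),
             (1 - 6 * p * q) / (n * p * q) };
}
```

For `n = 20` and `p = 0.5` the kurtosis and kurtosis excess returned by `moments` differ by 3, as the two formulas in the table require.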

[h4 References]

* [@http://mathworld.wolfram.com/BinomialDistribution.html Weisstein, Eric W. "Binomial Distribution." From MathWorld--A Wolfram Web Resource].
* [@http://en.wikipedia.org/wiki/Binomial_distribution Wikipedia binomial distribution].
* [@http://www.itl.nist.gov/div898/handbook/eda/section3/eda366i.htm NIST Exploratory Data Analysis].

[endsect] [/section:binomial_dist Binomial]

[/ binomial.qbk
  Copyright 2006 John Maddock and Paul A. Bristow.
  Distributed under the Boost Software License, Version 1.0.
  (See accompanying file LICENSE_1_0.txt or copy at
  http://www.boost.org/LICENSE_1_0.txt).
]
399[/ binomial.qbk
400 Copyright 2006 John Maddock and Paul A. Bristow.
401 Distributed under the Boost Software License, Version 1.0.
402 (See accompanying file LICENSE_1_0.txt or copy at
403 http://www.boost.org/LICENSE_1_0.txt).
404]