]> git.proxmox.com Git - ceph.git/blob - ceph/src/boost/libs/math/doc/distributions/binomial.qbk
bump version to 12.2.2-pve1
[ceph.git] / ceph / src / boost / libs / math / doc / distributions / binomial.qbk
1 [section:binomial_dist Binomial Distribution]
2
3 ``#include <boost/math/distributions/binomial.hpp>``
4
5 namespace boost{ namespace math{
6
7 template <class RealType = double,
8 class ``__Policy`` = ``__policy_class`` >
9 class binomial_distribution;
10
11 typedef binomial_distribution<> binomial;
12
13 template <class RealType, class ``__Policy``>
14 class binomial_distribution
15 {
16 public:
17 typedef RealType value_type;
18 typedef Policy policy_type;
19
20 static const ``['unspecified-type]`` clopper_pearson_exact_interval;
21 static const ``['unspecified-type]`` jeffreys_prior_interval;
22
23 // construct:
24 binomial_distribution(RealType n, RealType p);
25
26 // parameter access::
27 RealType success_fraction() const;
28 RealType trials() const;
29
30 // Bounds on success fraction:
31 static RealType find_lower_bound_on_p(
32 RealType trials,
33 RealType successes,
34 RealType probability,
35 ``['unspecified-type]`` method = clopper_pearson_exact_interval);
36 static RealType find_upper_bound_on_p(
37 RealType trials,
38 RealType successes,
39 RealType probability,
40 ``['unspecified-type]`` method = clopper_pearson_exact_interval);
41
42 // estimate min/max number of trials:
43 static RealType find_minimum_number_of_trials(
44 RealType k, // number of events
45 RealType p, // success fraction
46 RealType alpha); // risk level
47
48 static RealType find_maximum_number_of_trials(
49 RealType k, // number of events
50 RealType p, // success fraction
51 RealType alpha); // risk level
52 };
53
54 }} // namespaces
55
56 The class type `binomial_distribution` represents a
57 [@http://mathworld.wolfram.com/BinomialDistribution.html binomial distribution]:
58 it is used when there are exactly two mutually
59 exclusive outcomes of a trial. These outcomes are labelled
60 "success" and "failure". The
61 __binomial_distrib is used to obtain
62 the probability of observing k successes in N trials, with the
63 probability of success on a single trial denoted by p. The
64 binomial distribution assumes that p is fixed for all trials.
65
66 [note The random variable for the binomial distribution is the number of successes,
67 (the number of trials is a fixed property of the distribution)
68 whereas for the negative binomial,
69 the random variable is the number of trials, for a fixed number of successes.]
70
71 The PDF for the binomial distribution is given by:
72
73 [equation binomial_ref2]
74
75 The following two graphs illustrate how the PDF changes depending
76 upon the distributions parameters, first we'll keep the success
77 fraction /p/ fixed at 0.5, and vary the sample size:
78
79 [graph binomial_pdf_1]
80
81 Alternatively, we can keep the sample size fixed at N=20 and
82 vary the success fraction /p/:
83
84 [graph binomial_pdf_2]
85
86 [discrete_quantile_warning Binomial]
87
88 [h4 Member Functions]
89
90 [h5 Construct]
91
92 binomial_distribution(RealType n, RealType p);
93
94 Constructor: /n/ is the total number of trials, /p/ is the
95 probability of success of a single trial.
96
97 Requires `0 <= p <= 1`, and `n >= 0`, otherwise calls __domain_error.
98
99 [h5 Accessors]
100
101 RealType success_fraction() const;
102
103 Returns the parameter /p/ from which this distribution was constructed.
104
105 RealType trials() const;
106
107 Returns the parameter /n/ from which this distribution was constructed.
108
109 [h5 Lower Bound on the Success Fraction]
110
111 static RealType find_lower_bound_on_p(
112 RealType trials,
113 RealType successes,
114 RealType alpha,
115 ``['unspecified-type]`` method = clopper_pearson_exact_interval);
116
117 Returns a lower bound on the success fraction:
118
119 [variablelist
120 [[trials][The total number of trials conducted.]]
121 [[successes][The number of successes that occurred.]]
122 [[alpha][The largest acceptable probability that the true value of
123 the success fraction is [*less than] the value returned.]]
124 [[method][An optional parameter that specifies the method to be used
125 to compute the interval (See below).]]
126 ]
127
128 For example, if you observe /k/ successes from /n/ trials the
129 best estimate for the success fraction is simply ['k/n], but if you
130 want to be 95% sure that the true value is [*greater than] some value,
131 ['p[sub min]], then:
132
133 p``[sub min]`` = binomial_distribution<RealType>::find_lower_bound_on_p(
134 n, k, 0.05);
135
136 [link math_toolkit.stat_tut.weg.binom_eg.binom_conf See worked example.]
137
138 There are currently two possible values available for the /method/
139 optional parameter: /clopper_pearson_exact_interval/
140 or /jeffreys_prior_interval/. These constants are both members of
141 class template `binomial_distribution`, so usage is for example:
142
143 p = binomial_distribution<RealType>::find_lower_bound_on_p(
144 n, k, 0.05, binomial_distribution<RealType>::jeffreys_prior_interval);
145
146 The default method if this parameter is not specified is the Clopper Pearson
147 "exact" interval. This produces an interval that guarantees at least
148 `100(1-alpha)%` coverage, but which is known to be overly conservative,
149 sometimes producing intervals with much greater than the requested coverage.
150
151 The alternative calculation method produces a non-informative
152 Jeffreys Prior interval. It produces `100(1-alpha)%` coverage only
153 ['in the average case], though is typically very close to the requested
154 coverage level. It is one of the main methods of calculation recommended
155 in the review by Brown, Cai and DasGupta.
156
157 Please note that the "textbook" calculation method using
158 a normal approximation (the Wald interval) is deliberately
159 not provided: it is known to produce consistently poor results,
160 even when the sample size is surprisingly large.
161 Refer to Brown, Cai and DasGupta for a full explanation. Many other methods
162 of calculation are available, and may be more appropriate for specific
163 situations. Unfortunately there appears to be no consensus amongst
164 statisticians as to which is "best": refer to the discussion at the end of
165 Brown, Cai and DasGupta for examples.
166
167 The two methods provided here were chosen principally because they
168 can be used for both one and two sided intervals.
169 See also:
170
171 Lawrence D. Brown, T. Tony Cai and Anirban DasGupta (2001),
172 Interval Estimation for a Binomial Proportion,
173 Statistical Science, Vol. 16, No. 2, 101-133.
174
175 T. Tony Cai (2005),
176 One-sided confidence intervals in discrete distributions,
177 Journal of Statistical Planning and Inference 131, 63-88.
178
179 Agresti, A. and Coull, B. A. (1998). Approximate is better than
180 "exact" for interval estimation of binomial proportions. Amer.
181 Statist. 52 119-126.
182
183 Clopper, C. J. and Pearson, E. S. (1934). The use of confidence
184 or fiducial limits illustrated in the case of the binomial.
185 Biometrika 26 404-413.
186
187 [h5 Upper Bound on the Success Fraction]
188
189 static RealType find_upper_bound_on_p(
190 RealType trials,
191 RealType successes,
192 RealType alpha,
193 ``['unspecified-type]`` method = clopper_pearson_exact_interval);
194
195 Returns an upper bound on the success fraction:
196
197 [variablelist
198 [[trials][The total number of trials conducted.]]
199 [[successes][The number of successes that occurred.]]
200 [[alpha][The largest acceptable probability that the true value of
201 the success fraction is [*greater than] the value returned.]]
202 [[method][An optional parameter that specifies the method to be used
203 to compute the interval. Refer to the documentation for
204 `find_upper_bound_on_p` above for the meaning of the
205 method options.]]
206 ]
207
208 For example, if you observe /k/ successes from /n/ trials the
209 best estimate for the success fraction is simply ['k/n], but if you
210 want to be 95% sure that the true value is [*less than] some value,
211 ['p[sub max]], then:
212
213 p``[sub max]`` = binomial_distribution<RealType>::find_upper_bound_on_p(
214 n, k, 0.05);
215
216 [link math_toolkit.stat_tut.weg.binom_eg.binom_conf See worked example.]
217
218 [note
219 In order to obtain a two sided bound on the success fraction, you
220 call both `find_lower_bound_on_p` *and* `find_upper_bound_on_p`
221 each with the same arguments.
222
223 If the desired risk level
224 that the true success fraction lies outside the bounds is [alpha],
225 then you pass [alpha]/2 to these functions.
226
227 So for example a two sided 95% confidence interval would be obtained
228 by passing [alpha] = 0.025 to each of the functions.
229
230 [link math_toolkit.stat_tut.weg.binom_eg.binom_conf See worked example.]
231 ]
232
233
234 [h5 Estimating the Number of Trials Required for a Certain Number of Successes]
235
236 static RealType find_minimum_number_of_trials(
237 RealType k, // number of events
238 RealType p, // success fraction
239 RealType alpha); // probability threshold
240
241 This function estimates the minimum number of trials required to ensure that
242 more than k events is observed with a level of risk /alpha/ that k or
243 fewer events occur.
244
245 [variablelist
246 [[k][The number of success observed.]]
247 [[p][The probability of success for each trial.]]
248 [[alpha][The maximum acceptable probability that k events or fewer will be observed.]]
249 ]
250
251 For example:
252
253 binomial_distribution<RealType>::find_number_of_trials(10, 0.5, 0.05);
254
255 Returns the smallest number of trials we must conduct to be 95% sure
256 of seeing 10 events that occur with frequency one half.
257
258 [h5 Estimating the Maximum Number of Trials to Ensure no more than a Certain Number of Successes]
259
260 static RealType find_maximum_number_of_trials(
261 RealType k, // number of events
262 RealType p, // success fraction
263 RealType alpha); // probability threshold
264
265 This function estimates the maximum number of trials we can conduct
266 to ensure that k successes or fewer are observed, with a risk /alpha/
267 that more than k occur.
268
269 [variablelist
270 [[k][The number of success observed.]]
271 [[p][The probability of success for each trial.]]
272 [[alpha][The maximum acceptable probability that more than k events will be observed.]]
273 ]
274
275 For example:
276
277 binomial_distribution<RealType>::find_maximum_number_of_trials(0, 1e-6, 0.05);
278
279 Returns the largest number of trials we can conduct and still be 95% certain
280 of not observing any events that occur with one in a million frequency.
281 This is typically used in failure analysis.
282
283 [link math_toolkit.stat_tut.weg.binom_eg.binom_size_eg See Worked Example.]
284
285 [h4 Non-member Accessors]
286
287 All the [link math_toolkit.dist_ref.nmp usual non-member accessor functions]
288 that are generic to all distributions are supported: __usual_accessors.
289
290 The domain for the random variable /k/ is `0 <= k <= N`, otherwise a
291 __domain_error is returned.
292
293 It's worth taking a moment to define what these accessors actually mean in
294 the context of this distribution:
295
296 [table Meaning of the non-member accessors
297 [[Function][Meaning]]
298 [[__pdf]
299 [The probability of obtaining [*exactly k successes] from n trials
300 with success fraction p. For example:
301
302 `pdf(binomial(n, p), k)`]]
303 [[__cdf]
304 [The probability of obtaining [*k successes or fewer] from n trials
305 with success fraction p. For example:
306
307 `cdf(binomial(n, p), k)`]]
308 [[__ccdf]
309 [The probability of obtaining [*more than k successes] from n trials
310 with success fraction p. For example:
311
312 `cdf(complement(binomial(n, p), k))`]]
313 [[__quantile]
314 [The [*greatest] number of successes that may be observed from n trials
315 with success fraction p, at probability P. Note that the value returned
316 is a real-number, and not an integer. Depending on the use case you may
317 want to take either the floor or ceiling of the result. For example:
318
319 `quantile(binomial(n, p), P)`]]
320 [[__quantile_c]
321 [The [*smallest] number of successes that may be observed from n trials
322 with success fraction p, at probability P. Note that the value returned
323 is a real-number, and not an integer. Depending on the use case you may
324 want to take either the floor or ceiling of the result. For example:
325
326 `quantile(complement(binomial(n, p), P))`]]
327 ]
328
329 [h4 Examples]
330
331 Various [link math_toolkit.stat_tut.weg.binom_eg worked examples]
332 are available illustrating the use of the binomial distribution.
333
334 [h4 Accuracy]
335
336 This distribution is implemented using the
337 incomplete beta functions __ibeta and __ibetac,
338 please refer to these functions for information on accuracy.
339
340 [h4 Implementation]
341
342 In the following table /p/ is the probability that one trial will
343 be successful (the success fraction), /n/ is the number of trials,
344 /k/ is the number of successes, /p/ is the probability and /q = 1-p/.
345
346 [table
347 [[Function][Implementation Notes]]
348 [[pdf][Implementation is in terms of __ibeta_derivative: if [sub n]C[sub k ] is the binomial
349 coefficient of a and b, then we have:
350
351 [equation binomial_ref1]
352
353 Which can be evaluated as `ibeta_derivative(k+1, n-k+1, p) / (n+1)`
354
355 The function __ibeta_derivative is used here, since it has already
356 been optimised for the lowest possible error - indeed this is really
357 just a thin wrapper around part of the internals of the incomplete
358 beta function.
359
360 There are also various special cases: refer to the code for details.
361 ]]
362 [[cdf][Using the relation:
363
364 ``
365 p = I[sub 1-p](n - k, k + 1)
366 = 1 - I[sub p](k + 1, n - k)
367 = __ibetac(k + 1, n - k, p)``
368
369 There are also various special cases: refer to the code for details.
370 ]]
371 [[cdf complement][Using the relation: q = __ibeta(k + 1, n - k, p)
372
373 There are also various special cases: refer to the code for details. ]]
374 [[quantile][Since the cdf is non-linear in variate /k/ none of the inverse
375 incomplete beta functions can be used here. Instead the quantile
376 is found numerically using a derivative free method
377 (__root_finding_TOMS748).]]
378 [[quantile from the complement][Found numerically as above.]]
379 [[mean][ `p * n` ]]
380 [[variance][ `p * n * (1-p)` ]]
381 [[mode][`floor(p * (n + 1))`]]
382 [[skewness][`(1 - 2 * p) / sqrt(n * p * (1 - p))`]]
383 [[kurtosis][`3 - (6 / n) + (1 / (n * p * (1 - p)))`]]
384 [[kurtosis excess][`(1 - 6 * p * q) / (n * p * q)`]]
385 [[parameter estimation][The member functions `find_upper_bound_on_p`
386 `find_lower_bound_on_p` and `find_number_of_trials` are
387 implemented in terms of the inverse incomplete beta functions
388 __ibetac_inv, __ibeta_inv, and __ibetac_invb respectively]]
389 ]
390
391 [h4 References]
392
393 * [@http://mathworld.wolfram.com/BinomialDistribution.html Weisstein, Eric W. "Binomial Distribution." From MathWorld--A Wolfram Web Resource].
394 * [@http://en.wikipedia.org/wiki/Beta_distribution Wikipedia binomial distribution].
395 * [@http://www.itl.nist.gov/div898/handbook/eda/section3/eda366i.htm NIST Explorary Data Analysis].
396
397 [endsect] [/section:binomial_dist Binomial]
398
399 [/ binomial.qbk
400 Copyright 2006 John Maddock and Paul A. Bristow.
401 Distributed under the Boost Software License, Version 1.0.
402 (See accompanying file LICENSE_1_0.txt or copy at
403 http://www.boost.org/LICENSE_1_0.txt).
404 ]