]>
Commit | Line | Data |
---|---|---|
7c673cae FG |
1 | [section:binomial_dist Binomial Distribution] |
2 | ||
3 | ``#include <boost/math/distributions/binomial.hpp>`` | |
4 | ||
5 | namespace boost{ namespace math{ | |
6 | ||
7 | template <class RealType = double, | |
8 | class ``__Policy`` = ``__policy_class`` > | |
9 | class binomial_distribution; | |
10 | ||
11 | typedef binomial_distribution<> binomial; | |
12 | ||
13 | template <class RealType, class ``__Policy``> | |
14 | class binomial_distribution | |
15 | { | |
16 | public: | |
17 | typedef RealType value_type; | |
18 | typedef Policy policy_type; | |
19 | ||
20 | static const ``['unspecified-type]`` clopper_pearson_exact_interval; | |
21 | static const ``['unspecified-type]`` jeffreys_prior_interval; | |
22 | ||
23 | // construct: | |
24 | binomial_distribution(RealType n, RealType p); | |
25 | ||
26 | // parameter access:: | |
27 | RealType success_fraction() const; | |
28 | RealType trials() const; | |
29 | ||
30 | // Bounds on success fraction: | |
31 | static RealType find_lower_bound_on_p( | |
32 | RealType trials, | |
33 | RealType successes, | |
34 | RealType probability, | |
35 | ``['unspecified-type]`` method = clopper_pearson_exact_interval); | |
36 | static RealType find_upper_bound_on_p( | |
37 | RealType trials, | |
38 | RealType successes, | |
39 | RealType probability, | |
40 | ``['unspecified-type]`` method = clopper_pearson_exact_interval); | |
41 | ||
42 | // estimate min/max number of trials: | |
43 | static RealType find_minimum_number_of_trials( | |
44 | RealType k, // number of events | |
45 | RealType p, // success fraction | |
46 | RealType alpha); // risk level | |
47 | ||
48 | static RealType find_maximum_number_of_trials( | |
49 | RealType k, // number of events | |
50 | RealType p, // success fraction | |
51 | RealType alpha); // risk level | |
52 | }; | |
53 | ||
54 | }} // namespaces | |
55 | ||
56 | The class type `binomial_distribution` represents a | |
57 | [@http://mathworld.wolfram.com/BinomialDistribution.html binomial distribution]: | |
58 | it is used when there are exactly two mutually | |
59 | exclusive outcomes of a trial. These outcomes are labelled | |
60 | "success" and "failure". The | |
61 | __binomial_distrib is used to obtain | |
62 | the probability of observing k successes in N trials, with the | |
63 | probability of success on a single trial denoted by p. The | |
64 | binomial distribution assumes that p is fixed for all trials. | |
65 | ||
66 | [note The random variable for the binomial distribution is the number of successes, | |
67 | (the number of trials is a fixed property of the distribution) | |
68 | whereas for the negative binomial, | |
69 | the random variable is the number of trials, for a fixed number of successes.] | |
70 | ||
71 | The PDF for the binomial distribution is given by: | |
72 | ||
73 | [equation binomial_ref2] | |
74 | ||
75 | The following two graphs illustrate how the PDF changes depending | |
76 | upon the distributions parameters, first we'll keep the success | |
77 | fraction /p/ fixed at 0.5, and vary the sample size: | |
78 | ||
79 | [graph binomial_pdf_1] | |
80 | ||
81 | Alternatively, we can keep the sample size fixed at N=20 and | |
82 | vary the success fraction /p/: | |
83 | ||
84 | [graph binomial_pdf_2] | |
85 | ||
86 | [discrete_quantile_warning Binomial] | |
87 | ||
88 | [h4 Member Functions] | |
89 | ||
90 | [h5 Construct] | |
91 | ||
92 | binomial_distribution(RealType n, RealType p); | |
93 | ||
94 | Constructor: /n/ is the total number of trials, /p/ is the | |
95 | probability of success of a single trial. | |
96 | ||
97 | Requires `0 <= p <= 1`, and `n >= 0`, otherwise calls __domain_error. | |
98 | ||
99 | [h5 Accessors] | |
100 | ||
101 | RealType success_fraction() const; | |
102 | ||
103 | Returns the parameter /p/ from which this distribution was constructed. | |
104 | ||
105 | RealType trials() const; | |
106 | ||
107 | Returns the parameter /n/ from which this distribution was constructed. | |
108 | ||
109 | [h5 Lower Bound on the Success Fraction] | |
110 | ||
111 | static RealType find_lower_bound_on_p( | |
112 | RealType trials, | |
113 | RealType successes, | |
114 | RealType alpha, | |
115 | ``['unspecified-type]`` method = clopper_pearson_exact_interval); | |
116 | ||
117 | Returns a lower bound on the success fraction: | |
118 | ||
119 | [variablelist | |
120 | [[trials][The total number of trials conducted.]] | |
121 | [[successes][The number of successes that occurred.]] | |
122 | [[alpha][The largest acceptable probability that the true value of | |
123 | the success fraction is [*less than] the value returned.]] | |
124 | [[method][An optional parameter that specifies the method to be used | |
125 | to compute the interval (See below).]] | |
126 | ] | |
127 | ||
128 | For example, if you observe /k/ successes from /n/ trials the | |
129 | best estimate for the success fraction is simply ['k/n], but if you | |
130 | want to be 95% sure that the true value is [*greater than] some value, | |
131 | ['p[sub min]], then: | |
132 | ||
133 | p``[sub min]`` = binomial_distribution<RealType>::find_lower_bound_on_p( | |
134 | n, k, 0.05); | |
135 | ||
136 | [link math_toolkit.stat_tut.weg.binom_eg.binom_conf See worked example.] | |
137 | ||
138 | There are currently two possible values available for the /method/ | |
139 | optional parameter: /clopper_pearson_exact_interval/ | |
140 | or /jeffreys_prior_interval/. These constants are both members of | |
141 | class template `binomial_distribution`, so usage is for example: | |
142 | ||
143 | p = binomial_distribution<RealType>::find_lower_bound_on_p( | |
144 | n, k, 0.05, binomial_distribution<RealType>::jeffreys_prior_interval); | |
145 | ||
146 | The default method if this parameter is not specified is the Clopper Pearson | |
147 | "exact" interval. This produces an interval that guarantees at least | |
148 | `100(1-alpha)%` coverage, but which is known to be overly conservative, | |
149 | sometimes producing intervals with much greater than the requested coverage. | |
150 | ||
151 | The alternative calculation method produces a non-informative | |
152 | Jeffreys Prior interval. It produces `100(1-alpha)%` coverage only | |
153 | ['in the average case], though is typically very close to the requested | |
154 | coverage level. It is one of the main methods of calculation recommended | |
155 | in the review by Brown, Cai and DasGupta. | |
156 | ||
157 | Please note that the "textbook" calculation method using | |
158 | a normal approximation (the Wald interval) is deliberately | |
159 | not provided: it is known to produce consistently poor results, | |
160 | even when the sample size is surprisingly large. | |
161 | Refer to Brown, Cai and DasGupta for a full explanation. Many other methods | |
162 | of calculation are available, and may be more appropriate for specific | |
163 | situations. Unfortunately there appears to be no consensus amongst | |
164 | statisticians as to which is "best": refer to the discussion at the end of | |
165 | Brown, Cai and DasGupta for examples. | |
166 | ||
167 | The two methods provided here were chosen principally because they | |
168 | can be used for both one and two sided intervals. | |
169 | See also: | |
170 | ||
171 | Lawrence D. Brown, T. Tony Cai and Anirban DasGupta (2001), | |
172 | Interval Estimation for a Binomial Proportion, | |
173 | Statistical Science, Vol. 16, No. 2, 101-133. | |
174 | ||
175 | T. Tony Cai (2005), | |
176 | One-sided confidence intervals in discrete distributions, | |
177 | Journal of Statistical Planning and Inference 131, 63-88. | |
178 | ||
179 | Agresti, A. and Coull, B. A. (1998). Approximate is better than | |
180 | "exact" for interval estimation of binomial proportions. Amer. | |
181 | Statist. 52 119-126. | |
182 | ||
183 | Clopper, C. J. and Pearson, E. S. (1934). The use of confidence | |
184 | or fiducial limits illustrated in the case of the binomial. | |
185 | Biometrika 26 404-413. | |
186 | ||
187 | [h5 Upper Bound on the Success Fraction] | |
188 | ||
189 | static RealType find_upper_bound_on_p( | |
190 | RealType trials, | |
191 | RealType successes, | |
192 | RealType alpha, | |
193 | ``['unspecified-type]`` method = clopper_pearson_exact_interval); | |
194 | ||
195 | Returns an upper bound on the success fraction: | |
196 | ||
197 | [variablelist | |
198 | [[trials][The total number of trials conducted.]] | |
199 | [[successes][The number of successes that occurred.]] | |
200 | [[alpha][The largest acceptable probability that the true value of | |
201 | the success fraction is [*greater than] the value returned.]] | |
202 | [[method][An optional parameter that specifies the method to be used | |
203 | to compute the interval. Refer to the documentation for | |
204 | `find_upper_bound_on_p` above for the meaning of the | |
205 | method options.]] | |
206 | ] | |
207 | ||
208 | For example, if you observe /k/ successes from /n/ trials the | |
209 | best estimate for the success fraction is simply ['k/n], but if you | |
210 | want to be 95% sure that the true value is [*less than] some value, | |
211 | ['p[sub max]], then: | |
212 | ||
213 | p``[sub max]`` = binomial_distribution<RealType>::find_upper_bound_on_p( | |
214 | n, k, 0.05); | |
215 | ||
216 | [link math_toolkit.stat_tut.weg.binom_eg.binom_conf See worked example.] | |
217 | ||
218 | [note | |
219 | In order to obtain a two sided bound on the success fraction, you | |
220 | call both `find_lower_bound_on_p` *and* `find_upper_bound_on_p` | |
221 | each with the same arguments. | |
222 | ||
223 | If the desired risk level | |
224 | that the true success fraction lies outside the bounds is [alpha], | |
225 | then you pass [alpha]/2 to these functions. | |
226 | ||
227 | So for example a two sided 95% confidence interval would be obtained | |
228 | by passing [alpha] = 0.025 to each of the functions. | |
229 | ||
230 | [link math_toolkit.stat_tut.weg.binom_eg.binom_conf See worked example.] | |
231 | ] | |
232 | ||
233 | ||
234 | [h5 Estimating the Number of Trials Required for a Certain Number of Successes] | |
235 | ||
236 | static RealType find_minimum_number_of_trials( | |
237 | RealType k, // number of events | |
238 | RealType p, // success fraction | |
239 | RealType alpha); // probability threshold | |
240 | ||
241 | This function estimates the minimum number of trials required to ensure that | |
242 | more than k events is observed with a level of risk /alpha/ that k or | |
243 | fewer events occur. | |
244 | ||
245 | [variablelist | |
246 | [[k][The number of success observed.]] | |
247 | [[p][The probability of success for each trial.]] | |
248 | [[alpha][The maximum acceptable probability that k events or fewer will be observed.]] | |
249 | ] | |
250 | ||
251 | For example: | |
252 | ||
253 | binomial_distribution<RealType>::find_number_of_trials(10, 0.5, 0.05); | |
254 | ||
255 | Returns the smallest number of trials we must conduct to be 95% sure | |
256 | of seeing 10 events that occur with frequency one half. | |
257 | ||
258 | [h5 Estimating the Maximum Number of Trials to Ensure no more than a Certain Number of Successes] | |
259 | ||
260 | static RealType find_maximum_number_of_trials( | |
261 | RealType k, // number of events | |
262 | RealType p, // success fraction | |
263 | RealType alpha); // probability threshold | |
264 | ||
265 | This function estimates the maximum number of trials we can conduct | |
266 | to ensure that k successes or fewer are observed, with a risk /alpha/ | |
267 | that more than k occur. | |
268 | ||
269 | [variablelist | |
270 | [[k][The number of success observed.]] | |
271 | [[p][The probability of success for each trial.]] | |
272 | [[alpha][The maximum acceptable probability that more than k events will be observed.]] | |
273 | ] | |
274 | ||
275 | For example: | |
276 | ||
277 | binomial_distribution<RealType>::find_maximum_number_of_trials(0, 1e-6, 0.05); | |
278 | ||
279 | Returns the largest number of trials we can conduct and still be 95% certain | |
280 | of not observing any events that occur with one in a million frequency. | |
281 | This is typically used in failure analysis. | |
282 | ||
283 | [link math_toolkit.stat_tut.weg.binom_eg.binom_size_eg See Worked Example.] | |
284 | ||
285 | [h4 Non-member Accessors] | |
286 | ||
287 | All the [link math_toolkit.dist_ref.nmp usual non-member accessor functions] | |
288 | that are generic to all distributions are supported: __usual_accessors. | |
289 | ||
290 | The domain for the random variable /k/ is `0 <= k <= N`, otherwise a | |
291 | __domain_error is returned. | |
292 | ||
293 | It's worth taking a moment to define what these accessors actually mean in | |
294 | the context of this distribution: | |
295 | ||
296 | [table Meaning of the non-member accessors | |
297 | [[Function][Meaning]] | |
298 | [[__pdf] | |
299 | [The probability of obtaining [*exactly k successes] from n trials | |
300 | with success fraction p. For example: | |
301 | ||
302 | `pdf(binomial(n, p), k)`]] | |
303 | [[__cdf] | |
304 | [The probability of obtaining [*k successes or fewer] from n trials | |
305 | with success fraction p. For example: | |
306 | ||
307 | `cdf(binomial(n, p), k)`]] | |
308 | [[__ccdf] | |
309 | [The probability of obtaining [*more than k successes] from n trials | |
310 | with success fraction p. For example: | |
311 | ||
312 | `cdf(complement(binomial(n, p), k))`]] | |
313 | [[__quantile] | |
314 | [The [*greatest] number of successes that may be observed from n trials | |
315 | with success fraction p, at probability P. Note that the value returned | |
316 | is a real-number, and not an integer. Depending on the use case you may | |
317 | want to take either the floor or ceiling of the result. For example: | |
318 | ||
319 | `quantile(binomial(n, p), P)`]] | |
320 | [[__quantile_c] | |
321 | [The [*smallest] number of successes that may be observed from n trials | |
322 | with success fraction p, at probability P. Note that the value returned | |
323 | is a real-number, and not an integer. Depending on the use case you may | |
324 | want to take either the floor or ceiling of the result. For example: | |
325 | ||
326 | `quantile(complement(binomial(n, p), P))`]] | |
327 | ] | |
328 | ||
329 | [h4 Examples] | |
330 | ||
331 | Various [link math_toolkit.stat_tut.weg.binom_eg worked examples] | |
332 | are available illustrating the use of the binomial distribution. | |
333 | ||
334 | [h4 Accuracy] | |
335 | ||
336 | This distribution is implemented using the | |
337 | incomplete beta functions __ibeta and __ibetac, | |
338 | please refer to these functions for information on accuracy. | |
339 | ||
340 | [h4 Implementation] | |
341 | ||
342 | In the following table /p/ is the probability that one trial will | |
343 | be successful (the success fraction), /n/ is the number of trials, | |
344 | /k/ is the number of successes, /p/ is the probability and /q = 1-p/. | |
345 | ||
346 | [table | |
347 | [[Function][Implementation Notes]] | |
348 | [[pdf][Implementation is in terms of __ibeta_derivative: if [sub n]C[sub k ] is the binomial | |
349 | coefficient of a and b, then we have: | |
350 | ||
351 | [equation binomial_ref1] | |
352 | ||
353 | Which can be evaluated as `ibeta_derivative(k+1, n-k+1, p) / (n+1)` | |
354 | ||
355 | The function __ibeta_derivative is used here, since it has already | |
356 | been optimised for the lowest possible error - indeed this is really | |
357 | just a thin wrapper around part of the internals of the incomplete | |
358 | beta function. | |
359 | ||
360 | There are also various special cases: refer to the code for details. | |
361 | ]] | |
362 | [[cdf][Using the relation: | |
363 | ||
364 | `` | |
365 | p = I[sub 1-p](n - k, k + 1) | |
366 | = 1 - I[sub p](k + 1, n - k) | |
367 | = __ibetac(k + 1, n - k, p)`` | |
368 | ||
369 | There are also various special cases: refer to the code for details. | |
370 | ]] | |
371 | [[cdf complement][Using the relation: q = __ibeta(k + 1, n - k, p) | |
372 | ||
373 | There are also various special cases: refer to the code for details. ]] | |
374 | [[quantile][Since the cdf is non-linear in variate /k/ none of the inverse | |
375 | incomplete beta functions can be used here. Instead the quantile | |
376 | is found numerically using a derivative free method | |
377 | (__root_finding_TOMS748).]] | |
378 | [[quantile from the complement][Found numerically as above.]] | |
379 | [[mean][ `p * n` ]] | |
380 | [[variance][ `p * n * (1-p)` ]] | |
381 | [[mode][`floor(p * (n + 1))`]] | |
382 | [[skewness][`(1 - 2 * p) / sqrt(n * p * (1 - p))`]] | |
383 | [[kurtosis][`3 - (6 / n) + (1 / (n * p * (1 - p)))`]] | |
384 | [[kurtosis excess][`(1 - 6 * p * q) / (n * p * q)`]] | |
385 | [[parameter estimation][The member functions `find_upper_bound_on_p` | |
386 | `find_lower_bound_on_p` and `find_number_of_trials` are | |
387 | implemented in terms of the inverse incomplete beta functions | |
388 | __ibetac_inv, __ibeta_inv, and __ibetac_invb respectively]] | |
389 | ] | |
390 | ||
391 | [h4 References] | |
392 | ||
393 | * [@http://mathworld.wolfram.com/BinomialDistribution.html Weisstein, Eric W. "Binomial Distribution." From MathWorld--A Wolfram Web Resource]. | |
394 | * [@http://en.wikipedia.org/wiki/Beta_distribution Wikipedia binomial distribution]. | |
395 | * [@http://www.itl.nist.gov/div898/handbook/eda/section3/eda366i.htm NIST Explorary Data Analysis]. | |
396 | ||
397 | [endsect] [/section:binomial_dist Binomial] | |
398 | ||
399 | [/ binomial.qbk | |
400 | Copyright 2006 John Maddock and Paul A. Bristow. | |
401 | Distributed under the Boost Software License, Version 1.0. | |
402 | (See accompanying file LICENSE_1_0.txt or copy at | |
403 | http://www.boost.org/LICENSE_1_0.txt). | |
404 | ] |