]> git.proxmox.com Git - ceph.git/blob - ceph/src/boost/libs/math/doc/distributions/binomial_example.qbk
bump version to 12.2.2-pve1
[ceph.git] / ceph / src / boost / libs / math / doc / distributions / binomial_example.qbk
1 [section:binom_eg Binomial Distribution Examples]
2
3 See also the reference documentation for the __binomial_distrib.
4
5 [section:binomial_coinflip_example Binomial Coin-Flipping Example]
6
7 [import ../../example/binomial_coinflip_example.cpp]
8 [binomial_coinflip_example1]
9
10 See [@../../example/binomial_coinflip_example.cpp binomial_coinflip_example.cpp]
11 for full source code, the program output looks like this:
12
13 [binomial_coinflip_example_output]
14
15 [endsect] [/section:binomial_coinflip_example Binomial coinflip example]
16
17 [section:binomial_quiz_example Binomial Quiz Example]
18
19 [import ../../example/binomial_quiz_example.cpp]
20 [binomial_quiz_example1]
21 [binomial_quiz_example2]
22 [discrete_quantile_real]
23
24 See [@../../example/binomial_quiz_example.cpp binomial_quiz_example.cpp]
25 for full source code and output.
26
27 [endsect] [/section:binomial_coinflip_quiz Binomial Coin-Flipping example]
28
29 [section:binom_conf Calculating Confidence Limits on the Frequency of Occurrence for a Binomial Distribution]
30
31 Imagine you have a process that follows a binomial distribution: for each
32 trial conducted, an event either occurs or does it does not, referred
33 to as "successes" and "failures". If, by experiment, you want to measure the
34 frequency with which successes occur, the best estimate is given simply
35 by /k/ \/ /N/, for /k/ successes out of /N/ trials. However our confidence in that
36 estimate will be shaped by how many trials were conducted, and how many successes
37 were observed. The static member functions
38 `binomial_distribution<>::find_lower_bound_on_p` and
39 `binomial_distribution<>::find_upper_bound_on_p` allow you to calculate
40 the confidence intervals for your estimate of the occurrence frequency.
41
42 The sample program [@../../example/binomial_confidence_limits.cpp
43 binomial_confidence_limits.cpp] illustrates their use. It begins by defining
44 a procedure that will print a table of confidence limits for various degrees
45 of certainty:
46
47 #include <iostream>
48 #include <iomanip>
49 #include <boost/math/distributions/binomial.hpp>
50
51 void confidence_limits_on_frequency(unsigned trials, unsigned successes)
52 {
53 //
54 // trials = Total number of trials.
55 // successes = Total number of observed successes.
56 //
57 // Calculate confidence limits for an observed
58 // frequency of occurrence that follows a binomial
59 // distribution.
60 //
61 using namespace std;
62 using namespace boost::math;
63
64 // Print out general info:
65 cout <<
66 "___________________________________________\n"
67 "2-Sided Confidence Limits For Success Ratio\n"
68 "___________________________________________\n\n";
69 cout << setprecision(7);
70 cout << setw(40) << left << "Number of Observations" << "= " << trials << "\n";
71 cout << setw(40) << left << "Number of successes" << "= " << successes << "\n";
72 cout << setw(40) << left << "Sample frequency of occurrence" << "= " << double(successes) / trials << "\n";
73
74 The procedure now defines a table of significance levels: these are the
75 probabilities that the true occurrence frequency lies outside the calculated
76 interval:
77
78 double alpha[] = { 0.5, 0.25, 0.1, 0.05, 0.01, 0.001, 0.0001, 0.00001 };
79
80 Some pretty printing of the table header follows:
81
82 cout << "\n\n"
83 "_______________________________________________________________________\n"
84 "Confidence Lower CP Upper CP Lower JP Upper JP\n"
85 " Value (%) Limit Limit Limit Limit\n"
86 "_______________________________________________________________________\n";
87
88
89 And now for the important part - the intervals themselves - for each
90 value of /alpha/, we call `find_lower_bound_on_p` and
91 `find_lower_upper_on_p` to obtain lower and upper bounds
92 respectively. Note that since we are calculating a two-sided interval,
93 we must divide the value of alpha in two.
94
95 Please note that calculating two separate /single sided bounds/, each with risk
96 level [alpha][space]is not the same thing as calculating a two sided interval.
97 Had we calculate two single-sided intervals each with a risk
98 that the true value is outside the interval of [alpha], then:
99
100 * The risk that it is less than the lower bound is [alpha].
101
102 and
103
104 * The risk that it is greater than the upper bound is also [alpha].
105
106 So the risk it is outside *upper or lower bound*, is *twice* alpha, and the
107 probability that it is inside the bounds is therefore not nearly as high as
108 one might have thought. This is why [alpha]/2 must be used in
109 the calculations below.
110
111 In contrast, had we been calculating a
112 single-sided interval, for example: ['"Calculate a lower bound so that we are P%
113 sure that the true occurrence frequency is greater than some value"]
114 then we would *not* have divided by two.
115
116 Finally note that `binomial_distribution` provides a choice of two
117 methods for the calculation, we print out the results from both
118 methods in this example:
119
120 for(unsigned i = 0; i < sizeof(alpha)/sizeof(alpha[0]); ++i)
121 {
122 // Confidence value:
123 cout << fixed << setprecision(3) << setw(10) << right << 100 * (1-alpha[i]);
124 // Calculate Clopper Pearson bounds:
125 double l = binomial_distribution<>::find_lower_bound_on_p(
126 trials, successes, alpha[i]/2);
127 double u = binomial_distribution<>::find_upper_bound_on_p(
128 trials, successes, alpha[i]/2);
129 // Print Clopper Pearson Limits:
130 cout << fixed << setprecision(5) << setw(15) << right << l;
131 cout << fixed << setprecision(5) << setw(15) << right << u;
132 // Calculate Jeffreys Prior Bounds:
133 l = binomial_distribution<>::find_lower_bound_on_p(
134 trials, successes, alpha[i]/2,
135 binomial_distribution<>::jeffreys_prior_interval);
136 u = binomial_distribution<>::find_upper_bound_on_p(
137 trials, successes, alpha[i]/2,
138 binomial_distribution<>::jeffreys_prior_interval);
139 // Print Jeffreys Prior Limits:
140 cout << fixed << setprecision(5) << setw(15) << right << l;
141 cout << fixed << setprecision(5) << setw(15) << right << u << std::endl;
142 }
143 cout << endl;
144 }
145
146 And that's all there is to it. Let's see some sample output for a 2 in 10
147 success ratio, first for 20 trials:
148
149 [pre'''___________________________________________
150 2-Sided Confidence Limits For Success Ratio
151 ___________________________________________
152
153 Number of Observations = 20
154 Number of successes = 4
155 Sample frequency of occurrence = 0.2
156
157
158 _______________________________________________________________________
159 Confidence Lower CP Upper CP Lower JP Upper JP
160 Value (%) Limit Limit Limit Limit
161 _______________________________________________________________________
162 50.000 0.12840 0.29588 0.14974 0.26916
163 75.000 0.09775 0.34633 0.11653 0.31861
164 90.000 0.07135 0.40103 0.08734 0.37274
165 95.000 0.05733 0.43661 0.07152 0.40823
166 99.000 0.03576 0.50661 0.04655 0.47859
167 99.900 0.01905 0.58632 0.02634 0.55960
168 99.990 0.01042 0.64997 0.01530 0.62495
169 99.999 0.00577 0.70216 0.00901 0.67897
170 ''']
171
172 As you can see, even at the 95% confidence level the bounds are
173 really quite wide (this example is chosen to be easily compared to the one
174 in the __handbook
175 [@http://www.itl.nist.gov/div898/handbook/prc/section2/prc241.htm
176 here]). Note also that the Clopper-Pearson calculation method (CP above)
177 produces quite noticeably more pessimistic estimates than the Jeffreys Prior
178 method (JP above).
179
180
181 Compare that with the program output for
182 2000 trials:
183
184 [pre'''___________________________________________
185 2-Sided Confidence Limits For Success Ratio
186 ___________________________________________
187
188 Number of Observations = 2000
189 Number of successes = 400
190 Sample frequency of occurrence = 0.2000000
191
192
193 _______________________________________________________________________
194 Confidence Lower CP Upper CP Lower JP Upper JP
195 Value (%) Limit Limit Limit Limit
196 _______________________________________________________________________
197 50.000 0.19382 0.20638 0.19406 0.20613
198 75.000 0.18965 0.21072 0.18990 0.21047
199 90.000 0.18537 0.21528 0.18561 0.21503
200 95.000 0.18267 0.21821 0.18291 0.21796
201 99.000 0.17745 0.22400 0.17769 0.22374
202 99.900 0.17150 0.23079 0.17173 0.23053
203 99.990 0.16658 0.23657 0.16681 0.23631
204 99.999 0.16233 0.24169 0.16256 0.24143
205 ''']
206
207 Now even when the confidence level is very high, the limits are really
208 quite close to the experimentally calculated value of 0.2. Furthermore
209 the difference between the two calculation methods is now really quite small.
210
211 [endsect]
212
213 [section:binom_size_eg Estimating Sample Sizes for a Binomial Distribution.]
214
215 Imagine you have a critical component that you know will fail in 1 in
216 N "uses" (for some suitable definition of "use"). You may want to schedule
217 routine replacement of the component so that its chance of failure between
218 routine replacements is less than P%. If the failures follow a binomial
219 distribution (each time the component is "used" it either fails or does not)
220 then the static member function `binomial_distibution<>::find_maximum_number_of_trials`
221 can be used to estimate the maximum number of "uses" of that component for some
222 acceptable risk level /alpha/.
223
224 The example program
225 [@../../example/binomial_sample_sizes.cpp binomial_sample_sizes.cpp]
226 demonstrates its usage. It centres on a routine that prints out
227 a table of maximum sample sizes for various probability thresholds:
228
229 void find_max_sample_size(
230 double p, // success ratio.
231 unsigned successes) // Total number of observed successes permitted.
232 {
233
234 The routine then declares a table of probability thresholds: these are the
235 maximum acceptable probability that /successes/ or fewer events will be
236 observed. In our example, /successes/ will be always zero, since we want
237 no component failures, but in other situations non-zero values may well
238 make sense.
239
240 double alpha[] = { 0.5, 0.25, 0.1, 0.05, 0.01, 0.001, 0.0001, 0.00001 };
241
242 Much of the rest of the program is pretty-printing, the important part
243 is in the calculation of maximum number of permitted trials for each
244 value of alpha:
245
246 for(unsigned i = 0; i < sizeof(alpha)/sizeof(alpha[0]); ++i)
247 {
248 // Confidence value:
249 cout << fixed << setprecision(3) << setw(10) << right << 100 * (1-alpha[i]);
250 // calculate trials:
251 double t = binomial::find_maximum_number_of_trials(
252 successes, p, alpha[i]);
253 t = floor(t);
254 // Print Trials:
255 cout << fixed << setprecision(5) << setw(15) << right << t << endl;
256 }
257
258 Note that since we're
259 calculating the maximum number of trials permitted, we'll err on the safe
260 side and take the floor of the result. Had we been calculating the
261 /minimum/ number of trials required to observe a certain number of /successes/
262 using `find_minimum_number_of_trials` we would have taken the ceiling instead.
263
264 We'll finish off by looking at some sample output, firstly for
265 a 1 in 1000 chance of component failure with each use:
266
267 [pre
268 '''________________________
269 Maximum Number of Trials
270 ________________________
271
272 Success ratio = 0.001
273 Maximum Number of "successes" permitted = 0
274
275
276 ____________________________
277 Confidence Max Number
278 Value (%) Of Trials
279 ____________________________
280 50.000 692
281 75.000 287
282 90.000 105
283 95.000 51
284 99.000 10
285 99.900 0
286 99.990 0
287 99.999 0'''
288 ]
289
290 So 51 "uses" of the component would yield a 95% chance that no
291 component failures would be observed.
292
293 Compare that with a 1 in 1 million chance of component failure:
294
295 [pre'''
296 ________________________
297 Maximum Number of Trials
298 ________________________
299
300 Success ratio = 0.0000010
301 Maximum Number of "successes" permitted = 0
302
303
304 ____________________________
305 Confidence Max Number
306 Value (%) Of Trials
307 ____________________________
308 50.000 693146
309 75.000 287681
310 90.000 105360
311 95.000 51293
312 99.000 10050
313 99.900 1000
314 99.990 100
315 99.999 10'''
316 ]
317
318 In this case, even 1000 uses of the component would still yield a
319 less than 1 in 1000 chance of observing a component failure
320 (i.e. a 99.9% chance of no failure).
321
322 [endsect] [/section:binom_size_eg Estimating Sample Sizes for a Binomial Distribution.]
323
324 [endsect][/section:binom_eg Binomial Distribution]
325
326 [/
327 Copyright 2006 John Maddock and Paul A. Bristow.
328 Distributed under the Boost Software License, Version 1.0.
329 (See accompanying file LICENSE_1_0.txt or copy at
330 http://www.boost.org/LICENSE_1_0.txt).
331 ]
332