1 [section:negative_binomial_dist Negative Binomial Distribution]
3 ``#include <boost/math/distributions/negative_binomial.hpp>``
5 namespace boost{ namespace math{
7 template <class RealType = double,
8 class ``__Policy`` = ``__policy_class`` >
9 class negative_binomial_distribution;
11 typedef negative_binomial_distribution<> negative_binomial;
13 template <class RealType, class ``__Policy``>
14 class negative_binomial_distribution
17 typedef RealType value_type;
18 typedef Policy policy_type;
19 // Constructor from successes and success_fraction:
20 negative_binomial_distribution(RealType r, RealType p);
22 // Parameter accessors:
23 RealType success_fraction() const;
24 RealType successes() const;
26 // Bounds on success fraction:
27 static RealType find_lower_bound_on_p(
30 RealType probability); // alpha
31 static RealType find_upper_bound_on_p(
34 RealType probability); // alpha
36 // Estimate min/max number of trials:
37 static RealType find_minimum_number_of_trials(
38 RealType k, // Number of failures.
39 RealType p, // Success fraction.
40 RealType probability); // Probability threshold alpha.
41 static RealType find_maximum_number_of_trials(
42 RealType k, // Number of failures.
43 RealType p, // Success fraction.
44 RealType probability); // Probability threshold alpha.
49 The class type `negative_binomial_distribution` represents a
50 [@http://en.wikipedia.org/wiki/Negative_binomial_distribution negative_binomial distribution]:
51 it is used when there are exactly two mutually exclusive outcomes of a
52 [@http://en.wikipedia.org/wiki/Bernoulli_trial Bernoulli trial]:
53 these outcomes are labelled "success" and "failure".
55 For k + r Bernoulli trials each with success fraction p, the
56 negative_binomial distribution gives the probability of observing
57 k failures and r successes with success on the last trial.
58 The negative_binomial distribution
59 assumes that success_fraction p is fixed for all (k + r) trials.
61 [note The random variable for the negative binomial distribution is the number of trials,
62 (the number of successes is a fixed property of the distribution)
63 whereas for the binomial,
64 the random variable is the number of successes, for a fixed number of trials.]
68 [equation neg_binomial_ref]
70 The following graph illustrate how the PDF varies as the success fraction
73 [graph negative_binomial_pdf_1]
75 Alternatively, this graph shows how the shape of the PDF varies as
76 the number of successes changes:
78 [graph negative_binomial_pdf_2]
80 [h4 Related Distributions]
82 The name negative binomial distribution is reserved by some to the
83 case where the successes parameter r is an integer.
84 This integer version is also called the
85 [@http://mathworld.wolfram.com/PascalDistribution.html Pascal distribution].
87 This implementation uses real numbers for the computation throughout
88 (because it uses the *real-valued* incomplete beta function family of functions).
89 This real-valued version is also called the Polya Distribution.
91 The Poisson distribution is a generalization of the Pascal distribution,
92 where the success parameter r is an integer: to obtain the Pascal
93 distribution you must ensure that an integer value is provided for r,
94 and take integer values (floor or ceiling) from functions that return
95 a number of successes.
97 For large values of r (successes), the negative binomial distribution
98 converges to the Poisson distribution.
100 The geometric distribution is a special case
101 where the successes parameter r = 1,
102 so only a first and only success is required.
103 geometric(p) = negative_binomial(1, p).
105 The Poisson distribution is a special case for large successes
107 poisson([lambda]) = lim [sub r [rarr] [infin]] [space] negative_binomial(r, r / ([lambda] + r)))
109 [discrete_quantile_warning Negative Binomial]
111 [h4 Member Functions]
115 negative_binomial_distribution(RealType r, RealType p);
117 Constructor: /r/ is the total number of successes, /p/ is the
118 probability of success of a single trial.
120 Requires: `r > 0` and `0 <= p <= 1`.
124 RealType success_fraction() const; // successes / trials (0 <= p <= 1)
126 Returns the parameter /p/ from which this distribution was constructed.
128 RealType successes() const; // required successes (r > 0)
130 Returns the parameter /r/ from which this distribution was constructed.
132 The best method of calculation for the following functions is disputed:
133 see __binomial_distrib for more discussion.
135 [h5 Lower Bound on Parameter p]
137 static RealType find_lower_bound_on_p(
140 RealType probability) // (0 <= alpha <= 1), 0.05 equivalent to 95% confidence.
142 Returns a *lower bound* on the success fraction:
145 [[failures][The total number of failures before the ['r]th success.]]
146 [[successes][The number of successes required.]]
147 [[alpha][The largest acceptable probability that the true value of
148 the success fraction is [*less than] the value returned.]]
151 For example, if you observe /k/ failures and /r/ successes from /n/ = k + r trials
152 the best estimate for the success fraction is simply ['r/n], but if you
153 want to be 95% sure that the true value is [*greater than] some value,
156 p``[sub min]`` = negative_binomial_distribution<RealType>::find_lower_bound_on_p(
157 failures, successes, 0.05);
159 [link math_toolkit.stat_tut.weg.neg_binom_eg.neg_binom_conf See negative binomial confidence interval example.]
161 This function uses the Clopper-Pearson method of computing the lower bound on the
162 success fraction, whilst many texts refer to this method as giving an "exact"
163 result in practice it produces an interval that guarantees ['at least] the
164 coverage required, and may produce pessimistic estimates for some combinations
165 of /failures/ and /successes/. See:
167 [@http://www.ucs.louisiana.edu/~kxk4695/Discrete_new.pdf
168 Yong Cai and K. Krishnamoorthy, A Simple Improved Inferential Method for Some Discrete Distributions.
169 Computational statistics and data analysis, 2005, vol. 48, no3, 605-621].
171 [h5 Upper Bound on Parameter p]
173 static RealType find_upper_bound_on_p(
176 RealType alpha); // (0 <= alpha <= 1), 0.05 equivalent to 95% confidence.
178 Returns an *upper bound* on the success fraction:
181 [[trials][The total number of trials conducted.]]
182 [[successes][The number of successes that occurred.]]
183 [[alpha][The largest acceptable probability that the true value of
184 the success fraction is [*greater than] the value returned.]]
187 For example, if you observe /k/ successes from /n/ trials the
188 best estimate for the success fraction is simply ['k/n], but if you
189 want to be 95% sure that the true value is [*less than] some value,
192 p``[sub max]`` = negative_binomial_distribution<RealType>::find_upper_bound_on_p(
195 [link math_toolkit.stat_tut.weg.neg_binom_eg.neg_binom_conf See negative binomial confidence interval example.]
197 This function uses the Clopper-Pearson method of computing the lower bound on the
198 success fraction, whilst many texts refer to this method as giving an "exact"
199 result in practice it produces an interval that guarantees ['at least] the
200 coverage required, and may produce pessimistic estimates for some combinations
201 of /failures/ and /successes/. See:
203 [@http://www.ucs.louisiana.edu/~kxk4695/Discrete_new.pdf
204 Yong Cai and K. Krishnamoorthy, A Simple Improved Inferential Method for Some Discrete Distributions.
205 Computational statistics and data analysis, 2005, vol. 48, no3, 605-621].
207 [h5 Estimating Number of Trials to Ensure at Least a Certain Number of Failures]
209 static RealType find_minimum_number_of_trials(
210 RealType k, // number of failures.
211 RealType p, // success fraction.
212 RealType alpha); // probability threshold (0.05 equivalent to 95%).
214 This functions estimates the number of trials required to achieve a certain
215 probability that [*more than k failures will be observed].
218 [[k][The target number of failures to be observed.]]
219 [[p][The probability of ['success] for each trial.]]
220 [[alpha][The maximum acceptable risk that only k failures or fewer will be observed.]]
225 negative_binomial_distribution<RealType>::find_minimum_number_of_trials(10, 0.5, 0.05);
227 Returns the smallest number of trials we must conduct to be 95% sure
228 of seeing 10 failures that occur with frequency one half.
230 [link math_toolkit.stat_tut.weg.neg_binom_eg.neg_binom_size_eg Worked Example.]
232 This function uses numeric inversion of the negative binomial distribution
233 to obtain the result: another interpretation of the result, is that it finds
234 the number of trials (success+failures) that will lead to an /alpha/ probability
235 of observing k failures or fewer.
237 [h5 Estimating Number of Trials to Ensure a Maximum Number of Failures or Less]
239 static RealType find_maximum_number_of_trials(
240 RealType k, // number of failures.
241 RealType p, // success fraction.
242 RealType alpha); // probability threshold (0.05 equivalent to 95%).
244 This functions estimates the maximum number of trials we can conduct and achieve
245 a certain probability that [*k failures or fewer will be observed].
248 [[k][The maximum number of failures to be observed.]]
249 [[p][The probability of ['success] for each trial.]]
250 [[alpha][The maximum acceptable ['risk] that more than k failures will be observed.]]
255 negative_binomial_distribution<RealType>::find_maximum_number_of_trials(0, 1.0-1.0/1000000, 0.05);
257 Returns the largest number of trials we can conduct and still be 95% sure
258 of seeing no failures that occur with frequency one in one million.
260 This function uses numeric inversion of the negative binomial distribution
261 to obtain the result: another interpretation of the result, is that it finds
262 the number of trials (success+failures) that will lead to an /alpha/ probability
263 of observing more than k failures.
265 [h4 Non-member Accessors]
267 All the [link math_toolkit.dist_ref.nmp usual non-member accessor functions]
268 that are generic to all distributions are supported: __usual_accessors.
270 However it's worth taking a moment to define what these actually mean in
271 the context of this distribution:
273 [table Meaning of the non-member accessors.
274 [[Function][Meaning]]
276 [The probability of obtaining [*exactly k failures] from k+r trials
277 with success fraction p. For example:
279 ``pdf(negative_binomial(r, p), k)``]]
281 [The probability of obtaining [*k failures or fewer] from k+r trials
282 with success fraction p and success on the last trial. For example:
284 ``cdf(negative_binomial(r, p), k)``]]
286 [The probability of obtaining [*more than k failures] from k+r trials
287 with success fraction p and success on the last trial. For example:
289 ``cdf(complement(negative_binomial(r, p), k))``]]
291 [The [*greatest] number of failures k expected to be observed from k+r trials
292 with success fraction p, at probability P. Note that the value returned
293 is a real-number, and not an integer. Depending on the use case you may
294 want to take either the floor or ceiling of the real result. For example:
296 ``quantile(negative_binomial(r, p), P)``]]
298 [The [*smallest] number of failures k expected to be observed from k+r trials
299 with success fraction p, at probability P. Note that the value returned
300 is a real-number, and not an integer. Depending on the use case you may
301 want to take either the floor or ceiling of the real result. For example:
302 ``quantile(complement(negative_binomial(r, p), P))``]]
307 This distribution is implemented using the
308 incomplete beta functions __ibeta and __ibetac:
309 please refer to these functions for information on accuracy.
313 In the following table, /p/ is the probability that any one trial will
314 be successful (the success fraction), /r/ is the number of successes,
315 /k/ is the number of failures, /p/ is the probability and /q = 1-p/.
318 [[Function][Implementation Notes]]
319 [[pdf][pdf = exp(lgamma(r + k) - lgamma(r) - lgamma(k+1)) * pow(p, r) * pow((1-p), k)
321 Implementation is in terms of __ibeta_derivative:
323 (p/(r + k)) * ibeta_derivative(r, static_cast<RealType>(k+1), p)
324 The function __ibeta_derivative is used here, since it has already
325 been optimised for the lowest possible error - indeed this is really
326 just a thin wrapper around part of the internals of the incomplete
329 [[cdf][Using the relation:
331 cdf = I[sub p](r, k+1) = ibeta(r, k+1, p)
333 = ibeta(r, static_cast<RealType>(k+1), p)]]
334 [[cdf complement][Using the relation:
336 1 - cdf = I[sub p](k+1, r)
338 = ibetac(r, static_cast<RealType>(k+1), p)
340 [[quantile][ibeta_invb(r, p, P) - 1]]
341 [[quantile from the complement][ibetac_invb(r, p, Q) -1)]]
342 [[mean][ `r(1-p)/p` ]]
343 [[variance][ `r (1-p) / p * p` ]]
344 [[mode][`floor((r-1) * (1 - p)/p)`]]
345 [[skewness][`(2 - p) / sqrt(r * (1 - p))`]]
346 [[kurtosis][`6 / r + (p * p) / r * (1 - p )`]]
347 [[kurtosis excess][`6 / r + (p * p) / r * (1 - p ) -3`]]
348 [[parameter estimation member functions][]]
349 [[`find_lower_bound_on_p`][ibeta_inv(successes, failures + 1, alpha)]]
350 [[`find_upper_bound_on_p`][ibetac_inv(successes, failures, alpha) plus see comments in code.]]
351 [[`find_minimum_number_of_trials`][ibeta_inva(k + 1, p, alpha)]]
352 [[`find_maximum_number_of_trials`][ibetac_inva(k + 1, p, alpha)]]
355 Implementation notes:
357 * The real concept type (that deliberately lacks the Lanczos approximation),
358 was found to take several minutes to evaluate some extreme test values,
359 so the test has been disabled for this type.
361 * Much greater speed, and perhaps greater accuracy,
362 might be achieved for extreme values by using a normal approximation.
363 This is NOT been tested or implemented.
365 [endsect][/section:negative_binomial_dist Negative Binomial]
367 [/ negative_binomial.qbk
368 Copyright 2006 John Maddock and Paul A. Bristow.
369 Distributed under the Boost Software License, Version 1.0.
370 (See accompanying file LICENSE_1_0.txt or copy at
371 http://www.boost.org/LICENSE_1_0.txt).