]> git.proxmox.com Git - ceph.git/blob - ceph/src/boost/libs/math/doc/html/math_toolkit/dist_ref/dists/hypergeometric_dist.html
add subtree-ish sources for 12.0.3
[ceph.git] / ceph / src / boost / libs / math / doc / html / math_toolkit / dist_ref / dists / hypergeometric_dist.html
1 <html>
2 <head>
3 <meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
4 <title>Hypergeometric Distribution</title>
5 <link rel="stylesheet" href="../../../math.css" type="text/css">
6 <meta name="generator" content="DocBook XSL Stylesheets V1.77.1">
7 <link rel="home" href="../../../index.html" title="Math Toolkit 2.5.1">
8 <link rel="up" href="../dists.html" title="Distributions">
9 <link rel="prev" href="hyperexponential_dist.html" title="Hyperexponential Distribution">
10 <link rel="next" href="inverse_chi_squared_dist.html" title="Inverse Chi Squared Distribution">
11 </head>
12 <body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
13 <table cellpadding="2" width="100%"><tr>
14 <td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../../../boost.png"></td>
15 <td align="center"><a href="../../../../../../../index.html">Home</a></td>
16 <td align="center"><a href="../../../../../../../libs/libraries.htm">Libraries</a></td>
17 <td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
18 <td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
19 <td align="center"><a href="../../../../../../../more/index.htm">More</a></td>
20 </tr></table>
21 <hr>
22 <div class="spirit-nav">
23 <a accesskey="p" href="hyperexponential_dist.html"><img src="../../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../dists.html"><img src="../../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../../index.html"><img src="../../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="inverse_chi_squared_dist.html"><img src="../../../../../../../doc/src/images/next.png" alt="Next"></a>
24 </div>
25 <div class="section">
26 <div class="titlepage"><div><div><h4 class="title">
27 <a name="math_toolkit.dist_ref.dists.hypergeometric_dist"></a><a class="link" href="hypergeometric_dist.html" title="Hypergeometric Distribution">Hypergeometric
28 Distribution</a>
29 </h4></div></div></div>
30 <pre class="programlisting"><span class="preprocessor">#include</span> <span class="special">&lt;</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">math</span><span class="special">/</span><span class="identifier">distributions</span><span class="special">/</span><span class="identifier">hypergeometric</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">&gt;</span></pre>
31 <pre class="programlisting"><span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">{</span> <span class="keyword">namespace</span> <span class="identifier">math</span><span class="special">{</span>
32
33 <span class="keyword">template</span> <span class="special">&lt;</span><span class="keyword">class</span> <span class="identifier">RealType</span> <span class="special">=</span> <span class="keyword">double</span><span class="special">,</span>
34 <span class="keyword">class</span> <a class="link" href="../../../policy.html" title="Chapter&#160;15.&#160;Policies: Controlling Precision, Error Handling etc">Policy</a> <span class="special">=</span> <a class="link" href="../../pol_ref/pol_ref_ref.html" title="Policy Class Reference">policies::policy&lt;&gt;</a> <span class="special">&gt;</span>
35 <span class="keyword">class</span> <span class="identifier">hypergeometric_distribution</span><span class="special">;</span>
36
37 <span class="keyword">template</span> <span class="special">&lt;</span><span class="keyword">class</span> <span class="identifier">RealType</span><span class="special">,</span> <span class="keyword">class</span> <span class="identifier">Policy</span><span class="special">&gt;</span>
38 <span class="keyword">class</span> <span class="identifier">hypergeometric_distribution</span>
39 <span class="special">{</span>
40 <span class="keyword">public</span><span class="special">:</span>
41 <span class="keyword">typedef</span> <span class="identifier">RealType</span> <span class="identifier">value_type</span><span class="special">;</span>
42 <span class="keyword">typedef</span> <span class="identifier">Policy</span> <span class="identifier">policy_type</span><span class="special">;</span>
43 <span class="comment">// Construct:</span>
44 <span class="identifier">hypergeometric_distribution</span><span class="special">(</span><span class="keyword">unsigned</span> <span class="identifier">r</span><span class="special">,</span> <span class="keyword">unsigned</span> <span class="identifier">n</span><span class="special">,</span> <span class="keyword">unsigned</span> <span class="identifier">N</span><span class="special">);</span>
45 <span class="comment">// Accessors:</span>
46 <span class="keyword">unsigned</span> <span class="identifier">total</span><span class="special">()</span><span class="keyword">const</span><span class="special">;</span>
47 <span class="keyword">unsigned</span> <span class="identifier">defective</span><span class="special">()</span><span class="keyword">const</span><span class="special">;</span>
48 <span class="keyword">unsigned</span> <span class="identifier">sample_count</span><span class="special">()</span><span class="keyword">const</span><span class="special">;</span>
49 <span class="special">};</span>
50
51 <span class="keyword">typedef</span> <span class="identifier">hypergeometric_distribution</span><span class="special">&lt;&gt;</span> <span class="identifier">hypergeometric</span><span class="special">;</span>
52
53 <span class="special">}}</span> <span class="comment">// namespaces</span>
54 </pre>
55 <p>
56 The hypergeometric distribution describes the number of "events"
57 <span class="emphasis"><em>k</em></span> from a sample <span class="emphasis"><em>n</em></span> drawn from
58 a total population <span class="emphasis"><em>N</em></span> <span class="emphasis"><em>without replacement</em></span>.
59 </p>
60 <p>
61 Imagine we have a sample of <span class="emphasis"><em>N</em></span> objects of which <span class="emphasis"><em>r</em></span>
62 are "defective" and N-r are "not defective" (the terms
63 "success/failure" or "red/blue" are also used). If
64 we sample <span class="emphasis"><em>n</em></span> items <span class="emphasis"><em>without replacement</em></span>
65 then what is the probability that exactly <span class="emphasis"><em>k</em></span> items
66 in the sample are defective? The answer is given by the pdf of the hypergeometric
67 distribution <code class="computeroutput"><span class="identifier">f</span><span class="special">(</span><span class="identifier">k</span><span class="special">;</span> <span class="identifier">r</span><span class="special">,</span> <span class="identifier">n</span><span class="special">,</span>
68 <span class="identifier">N</span><span class="special">)</span></code>,
69 whilst the probability of <span class="emphasis"><em>k</em></span> defectives or fewer is
70 given by F(k; r, n, N), where F(k) is the CDF of the hypergeometric distribution.
71 </p>
72 <div class="note"><table border="0" summary="Note">
73 <tr>
74 <td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../../../../../../doc/src/images/note.png"></td>
75 <th align="left">Note</th>
76 </tr>
77 <tr><td align="left" valign="top"><p>
78 Unlike almost all of the other distributions in this library, the hypergeometric
79 distribution is strictly discrete: it can not be extended to real valued
80 arguments of its parameters or random variable.
81 </p></td></tr>
82 </table></div>
83 <p>
84 The following graph shows how the distribution changes as the proportion
85 of "defective" items changes, while keeping the population and
86 sample sizes constant:
87 </p>
88 <p>
89 <span class="inlinemediaobject"><img src="../../../../graphs/hypergeometric_pdf_1.svg" align="middle"></span>
90 </p>
91 <p>
92 Note that since the distribution is symmetrical in parameters <span class="emphasis"><em>n</em></span>
93 and <span class="emphasis"><em>r</em></span>, if we change the sample size and keep the population
94 and proportion "defective" the same then we obtain basically
95 the same graphs:
96 </p>
97 <p>
98 <span class="inlinemediaobject"><img src="../../../../graphs/hypergeometric_pdf_2.svg" align="middle"></span>
99 </p>
100 <h5>
101 <a name="math_toolkit.dist_ref.dists.hypergeometric_dist.h0"></a>
102 <span class="phrase"><a name="math_toolkit.dist_ref.dists.hypergeometric_dist.member_functions"></a></span><a class="link" href="hypergeometric_dist.html#math_toolkit.dist_ref.dists.hypergeometric_dist.member_functions">Member
103 Functions</a>
104 </h5>
105 <pre class="programlisting"><span class="identifier">hypergeometric_distribution</span><span class="special">(</span><span class="keyword">unsigned</span> <span class="identifier">r</span><span class="special">,</span> <span class="keyword">unsigned</span> <span class="identifier">n</span><span class="special">,</span> <span class="keyword">unsigned</span> <span class="identifier">N</span><span class="special">);</span>
106 </pre>
107 <p>
108 Constructs a hypergeometric distribution with a population of <span class="emphasis"><em>N</em></span>
109 objects, of which <span class="emphasis"><em>r</em></span> are defective, and from which
110 <span class="emphasis"><em>n</em></span> are sampled.
111 </p>
112 <pre class="programlisting"><span class="keyword">unsigned</span> <span class="identifier">total</span><span class="special">()</span><span class="keyword">const</span><span class="special">;</span>
113 </pre>
114 <p>
115 Returns the total number of objects <span class="emphasis"><em>N</em></span>.
116 </p>
117 <pre class="programlisting"><span class="keyword">unsigned</span> <span class="identifier">defective</span><span class="special">()</span><span class="keyword">const</span><span class="special">;</span>
118 </pre>
119 <p>
120 Returns the number of objects <span class="emphasis"><em>r</em></span> in population <span class="emphasis"><em>N</em></span>
121 which are defective.
122 </p>
123 <pre class="programlisting"><span class="keyword">unsigned</span> <span class="identifier">sample_count</span><span class="special">()</span><span class="keyword">const</span><span class="special">;</span>
124 </pre>
125 <p>
126 Returns the number of objects <span class="emphasis"><em>n</em></span> which are sampled
127 from the population <span class="emphasis"><em>N</em></span>.
128 </p>
129 <h5>
130 <a name="math_toolkit.dist_ref.dists.hypergeometric_dist.h1"></a>
131 <span class="phrase"><a name="math_toolkit.dist_ref.dists.hypergeometric_dist.non_member_accessors"></a></span><a class="link" href="hypergeometric_dist.html#math_toolkit.dist_ref.dists.hypergeometric_dist.non_member_accessors">Non-member
132 Accessors</a>
133 </h5>
134 <p>
135 All the <a class="link" href="../nmp.html" title="Non-Member Properties">usual non-member accessor
136 functions</a> that are generic to all distributions are supported:
137 <a class="link" href="../nmp.html#math_toolkit.dist_ref.nmp.cdf">Cumulative Distribution Function</a>,
138 <a class="link" href="../nmp.html#math_toolkit.dist_ref.nmp.pdf">Probability Density Function</a>,
139 <a class="link" href="../nmp.html#math_toolkit.dist_ref.nmp.quantile">Quantile</a>, <a class="link" href="../nmp.html#math_toolkit.dist_ref.nmp.hazard">Hazard Function</a>, <a class="link" href="../nmp.html#math_toolkit.dist_ref.nmp.chf">Cumulative Hazard Function</a>,
140 <a class="link" href="../nmp.html#math_toolkit.dist_ref.nmp.mean">mean</a>, <a class="link" href="../nmp.html#math_toolkit.dist_ref.nmp.median">median</a>,
141 <a class="link" href="../nmp.html#math_toolkit.dist_ref.nmp.mode">mode</a>, <a class="link" href="../nmp.html#math_toolkit.dist_ref.nmp.variance">variance</a>,
142 <a class="link" href="../nmp.html#math_toolkit.dist_ref.nmp.sd">standard deviation</a>,
143 <a class="link" href="../nmp.html#math_toolkit.dist_ref.nmp.skewness">skewness</a>, <a class="link" href="../nmp.html#math_toolkit.dist_ref.nmp.kurtosis">kurtosis</a>, <a class="link" href="../nmp.html#math_toolkit.dist_ref.nmp.kurtosis_excess">kurtosis_excess</a>,
144 <a class="link" href="../nmp.html#math_toolkit.dist_ref.nmp.range">range</a> and <a class="link" href="../nmp.html#math_toolkit.dist_ref.nmp.support">support</a>.
145 </p>
146 <p>
147 The domain of the random variable is the unsigned integers in the range
148 [max(0, n + r - N), min(n, r)]. A <a class="link" href="../../error_handling.html#math_toolkit.error_handling.domain_error">domain_error</a>
149 is raised if the random variable is outside this range, or is not an integral
150 value.
151 </p>
152 <div class="caution"><table border="0" summary="Caution">
153 <tr>
154 <td rowspan="2" align="center" valign="top" width="25"><img alt="[Caution]" src="../../../../../../../doc/src/images/caution.png"></td>
155 <th align="left">Caution</th>
156 </tr>
157 <tr><td align="left" valign="top">
158 <p>
159 The quantile function will by default return an integer result that has
160 been <span class="emphasis"><em>rounded outwards</em></span>. That is to say lower quantiles
161 (where the probability is less than 0.5) are rounded downward, and upper
162 quantiles (where the probability is greater than 0.5) are rounded upwards.
163 This behaviour ensures that if an X% quantile is requested, then <span class="emphasis"><em>at
164 least</em></span> the requested coverage will be present in the central
165 region, and <span class="emphasis"><em>no more than</em></span> the requested coverage
166 will be present in the tails.
167 </p>
168 <p>
169 This behaviour can be changed so that the quantile functions are rounded
170 differently using <a class="link" href="../../pol_overview.html" title="Policy Overview">Policies</a>.
171 It is strongly recommended that you read the tutorial <a class="link" href="../../pol_tutorial/understand_dis_quant.html" title="Understanding Quantiles of Discrete Distributions">Understanding
172 Quantiles of Discrete Distributions</a> before using the quantile
173 function on the Hypergeometric distribution. The <a class="link" href="../../pol_ref/discrete_quant_ref.html" title="Discrete Quantile Policies">reference
174 docs</a> describe how to change the rounding policy for these distributions.
175 </p>
176 <p>
177 However, note that the implementation method of the quantile function
178 always returns an integral value, therefore attempting to use a <a class="link" href="../../../policy.html" title="Chapter&#160;15.&#160;Policies: Controlling Precision, Error Handling etc">Policy</a> that requires (or produces) a real valued
179 result will result in a compile time error.
180 </p>
181 </td></tr>
182 </table></div>
183 <h5>
184 <a name="math_toolkit.dist_ref.dists.hypergeometric_dist.h2"></a>
185 <span class="phrase"><a name="math_toolkit.dist_ref.dists.hypergeometric_dist.accuracy"></a></span><a class="link" href="hypergeometric_dist.html#math_toolkit.dist_ref.dists.hypergeometric_dist.accuracy">Accuracy</a>
186 </h5>
187 <p>
188 For small N such that <code class="computeroutput"><span class="identifier">N</span> <span class="special">&lt;</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">math</span><span class="special">::</span><span class="identifier">max_factorial</span><span class="special">&lt;</span><span class="identifier">RealType</span><span class="special">&gt;::</span><span class="identifier">value</span></code>
189 then table based lookup of the results gives an accuracy to a few epsilon.
190 <code class="computeroutput"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">math</span><span class="special">::</span><span class="identifier">max_factorial</span><span class="special">&lt;</span><span class="identifier">RealType</span><span class="special">&gt;::</span><span class="identifier">value</span></code> is 170 at double or long double
191 precision.
192 </p>
193 <p>
194 For larger N such that <code class="computeroutput"><span class="identifier">N</span> <span class="special">&lt;</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">math</span><span class="special">::</span><span class="identifier">prime</span><span class="special">(</span><span class="identifier">boost</span><span class="special">::</span><span class="identifier">math</span><span class="special">::</span><span class="identifier">max_prime</span><span class="special">)</span></code> then only basic arithmetic is required
195 for the calculation and the accuracy is typically &lt; 20 epsilon. This
196 takes care of N up to 104729.
197 </p>
198 <p>
199 For <code class="computeroutput"><span class="identifier">N</span> <span class="special">&gt;</span>
200 <span class="identifier">boost</span><span class="special">::</span><span class="identifier">math</span><span class="special">::</span><span class="identifier">prime</span><span class="special">(</span><span class="identifier">boost</span><span class="special">::</span><span class="identifier">math</span><span class="special">::</span><span class="identifier">max_prime</span><span class="special">)</span></code>
201 then accuracy quickly degrades, with 5 or 6 decimal digits being lost for
202 N = 110000.
203 </p>
204 <p>
205 In general for very large N, the user should expect to lose log<sub>10</sub>N decimal
206 digits of precision during the calculation, with the results becoming meaningless
207 for N &gt;= 10<sup>15</sup>.
208 </p>
209 <h5>
210 <a name="math_toolkit.dist_ref.dists.hypergeometric_dist.h3"></a>
211 <span class="phrase"><a name="math_toolkit.dist_ref.dists.hypergeometric_dist.testing"></a></span><a class="link" href="hypergeometric_dist.html#math_toolkit.dist_ref.dists.hypergeometric_dist.testing">Testing</a>
212 </h5>
213 <p>
214 There are three sets of tests: our implementation is tested against a table
215 of values produced by Mathematica's implementation of this distribution.
216 We also sanity check our implementation against some spot values computed
217 using the online calculator here <a href="http://stattrek.com/Tables/Hypergeometric.aspx" target="_top">http://stattrek.com/Tables/Hypergeometric.aspx</a>.
218 Finally we test accuracy against some high precision test data using this
219 implementation and NTL::RR.
220 </p>
221 <h5>
222 <a name="math_toolkit.dist_ref.dists.hypergeometric_dist.h4"></a>
223 <span class="phrase"><a name="math_toolkit.dist_ref.dists.hypergeometric_dist.implementation"></a></span><a class="link" href="hypergeometric_dist.html#math_toolkit.dist_ref.dists.hypergeometric_dist.implementation">Implementation</a>
224 </h5>
225 <p>
226 The PDF can be calculated directly using the formula:
227 </p>
228 <p>
229 <span class="inlinemediaobject"><img src="../../../../equations/hypergeometric1.svg"></span>
230 </p>
231 <p>
232 However, this can only be used directly when the largest of the factorials
233 is guaranteed not to overflow the floating point representation used. This
234 formula is used directly when <code class="computeroutput"><span class="identifier">N</span>
235 <span class="special">&lt;</span> <span class="identifier">max_factorial</span><span class="special">&lt;</span><span class="identifier">RealType</span><span class="special">&gt;::</span><span class="identifier">value</span></code>
236 in which case table lookup of the factorials gives a rapid and accurate
237 implementation method.
238 </p>
239 <p>
240 For larger <span class="emphasis"><em>N</em></span> the method described in "An Accurate
241 Computation of the Hypergeometric Distribution Function", Trong Wu,
242 ACM Transactions on Mathematical Software, Vol. 19, No. 1, March 1993,
243 Pages 33-43 is used. The method relies on the fact that there is an easy
244 method for factorising a factorial into the product of prime numbers:
245 </p>
246 <p>
247 <span class="inlinemediaobject"><img src="../../../../equations/hypergeometric2.svg"></span>
248 </p>
249 <p>
250 Where p<sub>i</sub> is the i'th prime number, and e<sub>i</sub> is a small positive integer or
251 zero, which can be calculated via:
252 </p>
253 <p>
254 <span class="inlinemediaobject"><img src="../../../../equations/hypergeometric3.svg"></span>
255 </p>
256 <p>
257 Further we can combine the factorials in the expression for the PDF to
258 yield the PDF directly as the product of prime numbers:
259 </p>
260 <p>
261 <span class="inlinemediaobject"><img src="../../../../equations/hypergeometric4.svg"></span>
262 </p>
263 <p>
264 With this time the exponents e<sub>i</sub> being either positive, negative or zero.
265 Indeed such a degree of cancellation occurs in the calculation of the e<sub>i</sub> that
266 many are zero, and typically most have a magnitude or no more than 1 or
267 2.
268 </p>
269 <p>
270 Calculation of the product of the primes requires some care to prevent
271 numerical overflow, we use a novel recursive method which splits the calculation
272 into a series of sub-products, with a new sub-product started each time
273 the next multiplication would cause either overflow or underflow. The sub-products
274 are stored in a linked list on the program stack, and combined in an order
275 that will guarantee no overflow or unnecessary-underflow once the last
276 sub-product has been calculated.
277 </p>
278 <p>
279 This method can be used as long as N is smaller than the largest prime
280 number we have stored in our table of primes (currently 104729). The method
281 is relatively slow (calculating the exponents requires the most time),
282 but requires only a small number of arithmetic operations to calculate
283 the result (indeed there is no shorter method involving only basic arithmetic
284 once the exponents have been found), the method is therefore much more
285 accurate than the alternatives.
286 </p>
287 <p>
288 For much larger N, we can calculate the PDF from the factorials using either
289 lgamma, or by directly combining lanczos approximations to avoid calculating
290 via logarithms. We use the latter method, as it is usually 1 or 2 decimal
291 digits more accurate than computing via logarithms with lgamma. However,
292 in this area where N &gt; 104729, the user should expect to lose around
293 log<sub>10</sub>N decimal digits during the calculation in the worst case.
294 </p>
295 <p>
296 The CDF and its complement is calculated by directly summing the PDF's.
297 We start by deciding whether the CDF, or its complement, is likely to be
298 the smaller of the two and then calculate the PDF at <span class="emphasis"><em>k</em></span>
299 (or <span class="emphasis"><em>k+1</em></span> if we're calculating the complement) and calculate
300 successive PDF values via the recurrence relations:
301 </p>
302 <p>
303 <span class="inlinemediaobject"><img src="../../../../equations/hypergeometric5.svg"></span>
304 </p>
305 <p>
306 Until we either reach the end of the distributions domain, or the next
307 PDF value to be summed would be too small to affect the result.
308 </p>
309 <p>
310 The quantile is calculated in a similar manner to the CDF: we first guess
311 which end of the distribution we're nearer to, and then sum PDFs starting
312 from the end of the distribution this time, until we have some value <span class="emphasis"><em>k</em></span>
313 that gives the required CDF.
314 </p>
315 <p>
316 The median is simply the quantile at 0.5, and the remaining properties
317 are calculated via:
318 </p>
319 <p>
320 <span class="inlinemediaobject"><img src="../../../../equations/hypergeometric6.svg"></span>
321 </p>
322 </div>
323 <table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr>
324 <td align="left"></td>
325 <td align="right"><div class="copyright-footer">Copyright &#169; 2006-2010, 2012-2014 Nikhar Agrawal,
326 Anton Bikineev, Paul A. Bristow, Marco Guazzone, Christopher Kormanyos, Hubert
327 Holin, Bruno Lalande, John Maddock, Jeremy Murphy, Johan R&#229;de, Gautam Sewani,
328 Benjamin Sobotta, Thijs van den Berg, Daryle Walker and Xiaogang Zhang<p>
329 Distributed under the Boost Software License, Version 1.0. (See accompanying
330 file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)
331 </p>
332 </div></td>
333 </tr></table>
334 <hr>
335 <div class="spirit-nav">
336 <a accesskey="p" href="hyperexponential_dist.html"><img src="../../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../dists.html"><img src="../../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../../index.html"><img src="../../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="inverse_chi_squared_dist.html"><img src="../../../../../../../doc/src/images/next.png" alt="Next"></a>
337 </div>
338 </body>
339 </html>