]>
Commit | Line | Data |
---|---|---|
7c673cae FG |
1 | |
2 | [section:st_eg Student's t Distribution Examples] | |
3 | ||
4 | [section:tut_mean_intervals Calculating confidence intervals on the mean with the Students-t distribution] | |
5 | ||
6 | Let's say you have a sample mean, you may wish to know what confidence intervals | |
7 | you can place on that mean. Colloquially: "I want an interval that I can be | |
8 | P% sure contains the true mean". (On a technical point, note that | |
9 | the interval either contains the true mean or it does not: the | |
10 | meaning of the confidence level is subtly | |
11 | different from this colloquialism. More background information can be found on the | |
12 | [@http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm NIST site]). | |
13 | ||
14 | The formula for the interval can be expressed as: | |
15 | ||
16 | [equation dist_tutorial4] | |
17 | ||
18 | Where, ['Y[sub s]] is the sample mean, /s/ is the sample standard deviation, | |
19 | /N/ is the sample size, /[alpha]/ is the desired significance level and | |
20 | ['t[sub ([alpha]/2,N-1)]] is the upper critical value of the Students-t | |
21 | distribution with /N-1/ degrees of freedom. | |
22 | ||
23 | [note | |
24 | The quantity [alpha][space] is the maximum acceptable risk of falsely rejecting | |
25 | the null-hypothesis. The smaller the value of [alpha] the greater the | |
26 | strength of the test. | |
27 | ||
28 | The confidence level of the test is defined as 1 - [alpha], and often expressed | |
29 | as a percentage. So for example a significance level of 0.05, is equivalent | |
30 | to a 95% confidence level. Refer to | |
31 | [@http://www.itl.nist.gov/div898/handbook/prc/section1/prc14.htm | |
32 | "What are confidence intervals?"] in __handbook for more information. | |
33 | ] [/ Note] | |
34 | ||
35 | [note | |
36 | The usual assumptions of | |
37 | [@http://en.wikipedia.org/wiki/Independent_and_identically-distributed_random_variables independent and identically distributed (i.i.d.)] | |
38 | variables and [@http://en.wikipedia.org/wiki/Normal_distribution normal distribution] | |
39 | of course apply here, as they do in other examples. | |
40 | ] | |
41 | ||
42 | From the formula, it should be clear that: | |
43 | ||
44 | * The width of the confidence interval decreases as the sample size increases. | |
45 | * The width increases as the standard deviation increases. | |
46 | * The width increases as the ['confidence level increases] (0.5 towards 0.99999 - stronger). | |
47 | * The width increases as the ['significance level decreases] (0.5 towards 0.00000...01 - stronger). | |
48 | ||
49 | The following example code is taken from the example program | |
50 | [@../../example/students_t_single_sample.cpp students_t_single_sample.cpp]. | |
51 | ||
52 | We'll begin by defining a procedure to calculate intervals for | |
53 | various confidence levels; the procedure will print these out | |
54 | as a table: | |
55 | ||
56 | // Needed includes: | |
57 | #include <boost/math/distributions/students_t.hpp> | |
58 | #include <iostream> | |
59 | #include <iomanip> | |
60 | // Bring everything into global namespace for ease of use: | |
61 | using namespace boost::math; | |
62 | using namespace std; | |
63 | ||
64 | void confidence_limits_on_mean( | |
65 | double Sm, // Sm = Sample Mean. | |
66 | double Sd, // Sd = Sample Standard Deviation. | |
67 | unsigned Sn) // Sn = Sample Size. | |
68 | { | |
69 | using namespace std; | |
70 | using namespace boost::math; | |
71 | ||
72 | // Print out general info: | |
73 | cout << | |
74 | "__________________________________\n" | |
75 | "2-Sided Confidence Limits For Mean\n" | |
76 | "__________________________________\n\n"; | |
77 | cout << setprecision(7); | |
78 | cout << setw(40) << left << "Number of Observations" << "= " << Sn << "\n"; | |
79 | cout << setw(40) << left << "Mean" << "= " << Sm << "\n"; | |
80 | cout << setw(40) << left << "Standard Deviation" << "= " << Sd << "\n"; | |
81 | ||
82 | We'll define a table of significance/risk levels for which we'll compute intervals: | |
83 | ||
84 | double alpha[] = { 0.5, 0.25, 0.1, 0.05, 0.01, 0.001, 0.0001, 0.00001 }; | |
85 | ||
86 | Note that these are the complements of the confidence/probability levels: 0.5, 0.75, 0.9 .. 0.99999). | |
87 | ||
88 | Next we'll declare the distribution object we'll need, note that | |
89 | the /degrees of freedom/ parameter is the sample size less one: | |
90 | ||
91 | students_t dist(Sn - 1); | |
92 | ||
93 | Most of what follows in the program is pretty printing, so let's focus | |
94 | on the calculation of the interval. First we need the t-statistic, | |
95 | computed using the /quantile/ function and our significance level. Note | |
96 | that since the significance levels are the complement of the probability, | |
97 | we have to wrap the arguments in a call to /complement(...)/: | |
98 | ||
99 | double T = quantile(complement(dist, alpha[i] / 2)); | |
100 | ||
101 | Note that alpha was divided by two, since we'll be calculating | |
102 | both the upper and lower bounds: had we been interested in a single | |
103 | sided interval then we would have omitted this step. | |
104 | ||
105 | Now to complete the picture, we'll get the (one-sided) width of the | |
106 | interval from the t-statistic | |
107 | by multiplying by the standard deviation, and dividing by the square | |
108 | root of the sample size: | |
109 | ||
110 | double w = T * Sd / sqrt(double(Sn)); | |
111 | ||
112 | The two-sided interval is then the sample mean plus and minus this width. | |
113 | ||
114 | And apart from some more pretty-printing that completes the procedure. | |
115 | ||
116 | Let's take a look at some sample output, first using the | |
117 | [@http://www.itl.nist.gov/div898/handbook/eda/section4/eda428.htm | |
118 | Heat flow data] from the NIST site. The data set was collected | |
119 | by Bob Zarr of NIST in January, 1990 from a heat flow meter | |
120 | calibration and stability analysis. | |
121 | The corresponding dataplot | |
122 | output for this test can be found in | |
123 | [@http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm | |
124 | section 3.5.2] of the __handbook. | |
125 | ||
126 | [pre''' | |
127 | __________________________________ | |
128 | 2-Sided Confidence Limits For Mean | |
129 | __________________________________ | |
130 | ||
131 | Number of Observations = 195 | |
132 | Mean = 9.26146 | |
133 | Standard Deviation = 0.02278881 | |
134 | ||
135 | ||
136 | ___________________________________________________________________ | |
137 | Confidence T Interval Lower Upper | |
138 | Value (%) Value Width Limit Limit | |
139 | ___________________________________________________________________ | |
140 | 50.000 0.676 1.103e-003 9.26036 9.26256 | |
141 | 75.000 1.154 1.883e-003 9.25958 9.26334 | |
142 | 90.000 1.653 2.697e-003 9.25876 9.26416 | |
143 | 95.000 1.972 3.219e-003 9.25824 9.26468 | |
144 | 99.000 2.601 4.245e-003 9.25721 9.26571 | |
145 | 99.900 3.341 5.453e-003 9.25601 9.26691 | |
146 | 99.990 3.973 6.484e-003 9.25498 9.26794 | |
147 | 99.999 4.537 7.404e-003 9.25406 9.26886 | |
148 | '''] | |
149 | ||
150 | As you can see the large sample size (195) and small standard deviation (0.023) | |
151 | have combined to give very small intervals, indeed we can be | |
152 | very confident that the true mean is 9.2. | |
153 | ||
154 | For comparison the next example data output is taken from | |
155 | ['P.K.Hou, O. W. Lau & M.C. Wong, Analyst (1983) vol. 108, p 64. | |
156 | and from Statistics for Analytical Chemistry, 3rd ed. (1994), pp 54-55 | |
157 | J. C. Miller and J. N. Miller, Ellis Horwood ISBN 0 13 0309907.] | |
158 | The values result from the determination of mercury by cold-vapour | |
159 | atomic absorption. | |
160 | ||
161 | [pre''' | |
162 | __________________________________ | |
163 | 2-Sided Confidence Limits For Mean | |
164 | __________________________________ | |
165 | ||
166 | Number of Observations = 3 | |
167 | Mean = 37.8000000 | |
168 | Standard Deviation = 0.9643650 | |
169 | ||
170 | ||
171 | ___________________________________________________________________ | |
172 | Confidence T Interval Lower Upper | |
173 | Value (%) Value Width Limit Limit | |
174 | ___________________________________________________________________ | |
175 | 50.000 0.816 0.455 37.34539 38.25461 | |
176 | 75.000 1.604 0.893 36.90717 38.69283 | |
177 | 90.000 2.920 1.626 36.17422 39.42578 | |
178 | 95.000 4.303 2.396 35.40438 40.19562 | |
179 | 99.000 9.925 5.526 32.27408 43.32592 | |
180 | 99.900 31.599 17.594 20.20639 55.39361 | |
181 | 99.990 99.992 55.673 -17.87346 93.47346 | |
182 | 99.999 316.225 176.067 -138.26683 213.86683 | |
183 | '''] | |
184 | ||
185 | This time the fact that there are only three measurements leads to | |
186 | much wider intervals, indeed such large intervals that it's hard | |
187 | to be very confident in the location of the mean. | |
188 | ||
189 | [endsect] | |
190 | ||
191 | [section:tut_mean_test Testing a sample mean for difference from a "true" mean] | |
192 | ||
193 | When calibrating or comparing a scientific instrument or measurement method of some kind, | |
194 | we want to be answer the question "Does an observed sample mean differ from the | |
195 | "true" mean in any significant way?". If it does, then we have evidence of | |
196 | a systematic difference. This question can be answered with a Students-t test: | |
197 | more information can be found | |
198 | [@http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm | |
199 | on the NIST site]. | |
200 | ||
201 | Of course, the assignment of "true" to one mean may be quite arbitrary, | |
202 | often this is simply a "traditional" method of measurement. | |
203 | ||
204 | The following example code is taken from the example program | |
205 | [@../../example/students_t_single_sample.cpp students_t_single_sample.cpp]. | |
206 | ||
207 | We'll begin by defining a procedure to determine which of the | |
208 | possible hypothesis are rejected or not-rejected | |
209 | at a given significance level: | |
210 | ||
211 | [note | |
212 | Non-statisticians might say 'not-rejected' means 'accepted', | |
213 | (often of the null-hypothesis) implying, wrongly, that there really *IS* no difference, | |
214 | but statisticans eschew this to avoid implying that there is positive evidence of 'no difference'. | |
215 | 'Not-rejected' here means there is *no evidence* of difference, but there still might well be a difference. | |
216 | For example, see [@http://en.wikipedia.org/wiki/Argument_from_ignorance argument from ignorance] and | |
217 | [@http://www.bmj.com/cgi/content/full/311/7003/485 Absence of evidence does not constitute evidence of absence.] | |
218 | ] [/ note] | |
219 | ||
220 | ||
221 | // Needed includes: | |
222 | #include <boost/math/distributions/students_t.hpp> | |
223 | #include <iostream> | |
224 | #include <iomanip> | |
225 | // Bring everything into global namespace for ease of use: | |
226 | using namespace boost::math; | |
227 | using namespace std; | |
228 | ||
229 | void single_sample_t_test(double M, double Sm, double Sd, unsigned Sn, double alpha) | |
230 | { | |
231 | // | |
232 | // M = true mean. | |
233 | // Sm = Sample Mean. | |
234 | // Sd = Sample Standard Deviation. | |
235 | // Sn = Sample Size. | |
236 | // alpha = Significance Level. | |
237 | ||
238 | Most of the procedure is pretty-printing, so let's just focus on the | |
239 | calculation, we begin by calculating the t-statistic: | |
240 | ||
241 | // Difference in means: | |
242 | double diff = Sm - M; | |
243 | // Degrees of freedom: | |
244 | unsigned v = Sn - 1; | |
245 | // t-statistic: | |
246 | double t_stat = diff * sqrt(double(Sn)) / Sd; | |
247 | ||
248 | Finally calculate the probability from the t-statistic. If we're interested | |
249 | in simply whether there is a difference (either less or greater) or not, | |
250 | we don't care about the sign of the t-statistic, | |
251 | and we take the complement of the probability for comparison | |
252 | to the significance level: | |
253 | ||
254 | students_t dist(v); | |
255 | double q = cdf(complement(dist, fabs(t_stat))); | |
256 | ||
257 | The procedure then prints out the results of the various tests | |
258 | that can be done, these | |
259 | can be summarised in the following table: | |
260 | ||
261 | [table | |
262 | [[Hypothesis][Test]] | |
263 | [[The Null-hypothesis: there is | |
264 | *no difference* in means] | |
265 | [Reject if complement of CDF for |t| < significance level / 2: | |
266 | ||
267 | `cdf(complement(dist, fabs(t))) < alpha / 2`]] | |
268 | ||
269 | [[The Alternative-hypothesis: there | |
270 | *is difference* in means] | |
271 | [Reject if complement of CDF for |t| > significance level / 2: | |
272 | ||
273 | `cdf(complement(dist, fabs(t))) > alpha / 2`]] | |
274 | ||
275 | [[The Alternative-hypothesis: the sample mean *is less* than | |
276 | the true mean.] | |
277 | [Reject if CDF of t > 1 - significance level: | |
278 | ||
279 | `cdf(complement(dist, t)) < alpha`]] | |
280 | ||
281 | [[The Alternative-hypothesis: the sample mean *is greater* than | |
282 | the true mean.] | |
283 | [Reject if complement of CDF of t < significance level: | |
284 | ||
285 | `cdf(dist, t) < alpha`]] | |
286 | ] | |
287 | ||
288 | [note | |
289 | Notice that the comparisons are against `alpha / 2` for a two-sided test | |
290 | and against `alpha` for a one-sided test] | |
291 | ||
292 | Now that we have all the parts in place, let's take a look at some | |
293 | sample output, first using the | |
294 | [@http://www.itl.nist.gov/div898/handbook/eda/section4/eda428.htm | |
295 | Heat flow data] from the NIST site. The data set was collected | |
296 | by Bob Zarr of NIST in January, 1990 from a heat flow meter | |
297 | calibration and stability analysis. The corresponding dataplot | |
298 | output for this test can be found in | |
299 | [@http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm | |
300 | section 3.5.2] of the __handbook. | |
301 | ||
302 | [pre | |
303 | __________________________________ | |
304 | Student t test for a single sample | |
305 | __________________________________ | |
306 | ||
307 | Number of Observations = 195 | |
308 | Sample Mean = 9.26146 | |
309 | Sample Standard Deviation = 0.02279 | |
310 | Expected True Mean = 5.00000 | |
311 | ||
312 | Sample Mean - Expected Test Mean = 4.26146 | |
313 | Degrees of Freedom = 194 | |
314 | T Statistic = 2611.28380 | |
315 | Probability that difference is due to chance = 0.000e+000 | |
316 | ||
317 | Results for Alternative Hypothesis and alpha = 0.0500 | |
318 | ||
319 | Alternative Hypothesis Conclusion | |
320 | Mean != 5.000 NOT REJECTED | |
321 | Mean < 5.000 REJECTED | |
322 | Mean > 5.000 NOT REJECTED | |
323 | ] | |
324 | ||
325 | You will note the line that says the probability that the difference is | |
326 | due to chance is zero. From a philosophical point of view, of course, | |
327 | the probability can never reach zero. However, in this case the calculated | |
328 | probability is smaller than the smallest representable double precision number, | |
329 | hence the appearance of a zero here. Whatever its "true" value is, we know it | |
330 | must be extraordinarily small, so the alternative hypothesis - that there is | |
331 | a difference in means - is not rejected. | |
332 | ||
333 | For comparison the next example data output is taken from | |
334 | ['P.K.Hou, O. W. Lau & M.C. Wong, Analyst (1983) vol. 108, p 64. | |
335 | and from Statistics for Analytical Chemistry, 3rd ed. (1994), pp 54-55 | |
336 | J. C. Miller and J. N. Miller, Ellis Horwood ISBN 0 13 0309907.] | |
337 | The values result from the determination of mercury by cold-vapour | |
338 | atomic absorption. | |
339 | ||
340 | [pre | |
341 | __________________________________ | |
342 | Student t test for a single sample | |
343 | __________________________________ | |
344 | ||
345 | Number of Observations = 3 | |
346 | Sample Mean = 37.80000 | |
347 | Sample Standard Deviation = 0.96437 | |
348 | Expected True Mean = 38.90000 | |
349 | ||
350 | Sample Mean - Expected Test Mean = -1.10000 | |
351 | Degrees of Freedom = 2 | |
352 | T Statistic = -1.97566 | |
353 | Probability that difference is due to chance = 1.869e-001 | |
354 | ||
355 | Results for Alternative Hypothesis and alpha = 0.0500 | |
356 | ||
357 | Alternative Hypothesis Conclusion | |
358 | Mean != 38.900 REJECTED | |
359 | Mean < 38.900 NOT REJECTED | |
360 | Mean > 38.900 NOT REJECTED | |
361 | ] | |
362 | ||
363 | As you can see the small number of measurements (3) has led to a large uncertainty | |
364 | in the location of the true mean. So even though there appears to be a difference | |
365 | between the sample mean and the expected true mean, we conclude that there | |
366 | is no significant difference, and are unable to reject the null hypothesis. | |
367 | However, if we were to lower the bar for acceptance down to alpha = 0.1 | |
368 | (a 90% confidence level) we see a different output: | |
369 | ||
370 | [pre | |
371 | __________________________________ | |
372 | Student t test for a single sample | |
373 | __________________________________ | |
374 | ||
375 | Number of Observations = 3 | |
376 | Sample Mean = 37.80000 | |
377 | Sample Standard Deviation = 0.96437 | |
378 | Expected True Mean = 38.90000 | |
379 | ||
380 | Sample Mean - Expected Test Mean = -1.10000 | |
381 | Degrees of Freedom = 2 | |
382 | T Statistic = -1.97566 | |
383 | Probability that difference is due to chance = 1.869e-001 | |
384 | ||
385 | Results for Alternative Hypothesis and alpha = 0.1000 | |
386 | ||
387 | Alternative Hypothesis Conclusion | |
388 | Mean != 38.900 REJECTED | |
389 | Mean < 38.900 NOT REJECTED | |
390 | Mean > 38.900 REJECTED | |
391 | ] | |
392 | ||
393 | In this case, we really have a borderline result, | |
394 | and more data (and/or more accurate data), | |
395 | is needed for a more convincing conclusion. | |
396 | ||
397 | [endsect] | |
398 | ||
399 | [section:tut_mean_size Estimating how large a sample size would have to become | |
400 | in order to give a significant Students-t test result with a single sample test] | |
401 | ||
402 | Imagine you have conducted a Students-t test on a single sample in order | |
403 | to check for systematic errors in your measurements. Imagine that the | |
404 | result is borderline. At this point one might go off and collect more data, | |
405 | but it might be prudent to first ask the question "How much more?". | |
406 | The parameter estimators of the students_t_distribution class | |
407 | can provide this information. | |
408 | ||
409 | This section is based on the example code in | |
410 | [@../../example/students_t_single_sample.cpp students_t_single_sample.cpp] | |
411 | and we begin by defining a procedure that will print out a table of | |
412 | estimated sample sizes for various confidence levels: | |
413 | ||
414 | // Needed includes: | |
415 | #include <boost/math/distributions/students_t.hpp> | |
416 | #include <iostream> | |
417 | #include <iomanip> | |
418 | // Bring everything into global namespace for ease of use: | |
419 | using namespace boost::math; | |
420 | using namespace std; | |
421 | ||
422 | void single_sample_find_df( | |
423 | double M, // M = true mean. | |
424 | double Sm, // Sm = Sample Mean. | |
425 | double Sd) // Sd = Sample Standard Deviation. | |
426 | { | |
427 | ||
428 | Next we define a table of significance levels: | |
429 | ||
430 | double alpha[] = { 0.5, 0.25, 0.1, 0.05, 0.01, 0.001, 0.0001, 0.00001 }; | |
431 | ||
432 | Printing out the table of sample sizes required for various confidence levels | |
433 | begins with the table header: | |
434 | ||
435 | cout << "\n\n" | |
436 | "_______________________________________________________________\n" | |
437 | "Confidence Estimated Estimated\n" | |
438 | " Value (%) Sample Size Sample Size\n" | |
439 | " (one sided test) (two sided test)\n" | |
440 | "_______________________________________________________________\n"; | |
441 | ||
442 | ||
443 | And now the important part: the sample sizes required. Class | |
444 | `students_t_distribution` has a static member function | |
445 | `find_degrees_of_freedom` that will calculate how large | |
446 | a sample size needs to be in order to give a definitive result. | |
447 | ||
448 | The first argument is the difference between the means that you | |
449 | wish to be able to detect, here it's the absolute value of the | |
450 | difference between the sample mean, and the true mean. | |
451 | ||
452 | Then come two probability values: alpha and beta. Alpha is the | |
453 | maximum acceptable risk of rejecting the null-hypothesis when it is | |
454 | in fact true. Beta is the maximum acceptable risk of failing to reject | |
455 | the null-hypothesis when in fact it is false. | |
456 | Also note that for a two-sided test, alpha must be divided by 2. | |
457 | ||
458 | The final parameter of the function is the standard deviation of the sample. | |
459 | ||
460 | In this example, we assume that alpha and beta are the same, and call | |
461 | `find_degrees_of_freedom` twice: once with alpha for a one-sided test, | |
462 | and once with alpha/2 for a two-sided test. | |
463 | ||
464 | for(unsigned i = 0; i < sizeof(alpha)/sizeof(alpha[0]); ++i) | |
465 | { | |
466 | // Confidence value: | |
467 | cout << fixed << setprecision(3) << setw(10) << right << 100 * (1-alpha[i]); | |
468 | // calculate df for single sided test: | |
469 | double df = students_t::find_degrees_of_freedom( | |
470 | fabs(M - Sm), alpha[i], alpha[i], Sd); | |
471 | // convert to sample size: | |
472 | double size = ceil(df) + 1; | |
473 | // Print size: | |
474 | cout << fixed << setprecision(0) << setw(16) << right << size; | |
475 | // calculate df for two sided test: | |
476 | df = students_t::find_degrees_of_freedom( | |
477 | fabs(M - Sm), alpha[i]/2, alpha[i], Sd); | |
478 | // convert to sample size: | |
479 | size = ceil(df) + 1; | |
480 | // Print size: | |
481 | cout << fixed << setprecision(0) << setw(16) << right << size << endl; | |
482 | } | |
483 | cout << endl; | |
484 | } | |
485 | ||
486 | Let's now look at some sample output using data taken from | |
487 | ['P.K.Hou, O. W. Lau & M.C. Wong, Analyst (1983) vol. 108, p 64. | |
488 | and from Statistics for Analytical Chemistry, 3rd ed. (1994), pp 54-55 | |
489 | J. C. Miller and J. N. Miller, Ellis Horwood ISBN 0 13 0309907.] | |
490 | The values result from the determination of mercury by cold-vapour | |
491 | atomic absorption. | |
492 | ||
493 | Only three measurements were made, and the Students-t test above | |
494 | gave a borderline result, so this example | |
495 | will show us how many samples would need to be collected: | |
496 | ||
497 | [pre''' | |
498 | _____________________________________________________________ | |
499 | Estimated sample sizes required for various confidence levels | |
500 | _____________________________________________________________ | |
501 | ||
502 | True Mean = 38.90000 | |
503 | Sample Mean = 37.80000 | |
504 | Sample Standard Deviation = 0.96437 | |
505 | ||
506 | ||
507 | _______________________________________________________________ | |
508 | Confidence Estimated Estimated | |
509 | Value (%) Sample Size Sample Size | |
510 | (one sided test) (two sided test) | |
511 | _______________________________________________________________ | |
512 | 75.000 3 4 | |
513 | 90.000 7 9 | |
514 | 95.000 11 13 | |
515 | 99.000 20 22 | |
516 | 99.900 35 37 | |
517 | 99.990 50 53 | |
518 | 99.999 66 68 | |
519 | '''] | |
520 | ||
521 | So in this case, many more measurements would have had to be made, | |
522 | for example at the 95% level, 14 measurements in total for a two-sided test. | |
523 | ||
524 | [endsect] | |
525 | [section:two_sample_students_t Comparing the means of two samples with the Students-t test] | |
526 | ||
527 | Imagine that we have two samples, and we wish to determine whether | |
528 | their means are different or not. This situation often arises when | |
529 | determining whether a new process or treatment is better than an old one. | |
530 | ||
531 | In this example, we'll be using the | |
532 | [@http://www.itl.nist.gov/div898/handbook/eda/section3/eda3531.htm | |
533 | Car Mileage sample data] from the | |
534 | [@http://www.itl.nist.gov NIST website]. The data compares | |
535 | miles per gallon of US cars with miles per gallon of Japanese cars. | |
536 | ||
537 | The sample code is in | |
538 | [@../../example/students_t_two_samples.cpp students_t_two_samples.cpp]. | |
539 | ||
540 | There are two ways in which this test can be conducted: we can assume | |
541 | that the true standard deviations of the two samples are equal or not. | |
542 | If the standard deviations are assumed to be equal, then the calculation | |
543 | of the t-statistic is greatly simplified, so we'll examine that case first. | |
544 | In real life we should verify whether this assumption is valid with a | |
545 | Chi-Squared test for equal variances. | |
546 | ||
547 | We begin by defining a procedure that will conduct our test assuming equal | |
548 | variances: | |
549 | ||
550 | // Needed headers: | |
551 | #include <boost/math/distributions/students_t.hpp> | |
552 | #include <iostream> | |
553 | #include <iomanip> | |
554 | // Simplify usage: | |
555 | using namespace boost::math; | |
556 | using namespace std; | |
557 | ||
558 | void two_samples_t_test_equal_sd( | |
559 | double Sm1, // Sm1 = Sample 1 Mean. | |
560 | double Sd1, // Sd1 = Sample 1 Standard Deviation. | |
561 | unsigned Sn1, // Sn1 = Sample 1 Size. | |
562 | double Sm2, // Sm2 = Sample 2 Mean. | |
563 | double Sd2, // Sd2 = Sample 2 Standard Deviation. | |
564 | unsigned Sn2, // Sn2 = Sample 2 Size. | |
565 | double alpha) // alpha = Significance Level. | |
566 | { | |
567 | ||
568 | ||
569 | Our procedure will begin by calculating the t-statistic, assuming | |
570 | equal variances the needed formulae are: | |
571 | ||
572 | [equation dist_tutorial1] | |
573 | ||
574 | where Sp is the "pooled" standard deviation of the two samples, | |
575 | and /v/ is the number of degrees of freedom of the two combined | |
576 | samples. We can now write the code to calculate the t-statistic: | |
577 | ||
578 | // Degrees of freedom: | |
579 | double v = Sn1 + Sn2 - 2; | |
580 | cout << setw(55) << left << "Degrees of Freedom" << "= " << v << "\n"; | |
581 | // Pooled variance: | |
582 | double sp = sqrt(((Sn1-1) * Sd1 * Sd1 + (Sn2-1) * Sd2 * Sd2) / v); | |
583 | cout << setw(55) << left << "Pooled Standard Deviation" << "= " << sp << "\n"; | |
584 | // t-statistic: | |
585 | double t_stat = (Sm1 - Sm2) / (sp * sqrt(1.0 / Sn1 + 1.0 / Sn2)); | |
586 | cout << setw(55) << left << "T Statistic" << "= " << t_stat << "\n"; | |
587 | ||
588 | The next step is to define our distribution object, and calculate the | |
589 | complement of the probability: | |
590 | ||
591 | students_t dist(v); | |
592 | double q = cdf(complement(dist, fabs(t_stat))); | |
593 | cout << setw(55) << left << "Probability that difference is due to chance" << "= " | |
594 | << setprecision(3) << scientific << 2 * q << "\n\n"; | |
595 | ||
596 | Here we've used the absolute value of the t-statistic, because we initially | |
597 | want to know simply whether there is a difference or not (a two-sided test). | |
598 | However, we can also test whether the mean of the second sample is greater | |
599 | or is less (one-sided test) than that of the first: | |
600 | all the possible tests are summed up in the following table: | |
601 | ||
602 | [table | |
603 | [[Hypothesis][Test]] | |
604 | [[The Null-hypothesis: there is | |
605 | *no difference* in means] | |
606 | [Reject if complement of CDF for |t| < significance level / 2: | |
607 | ||
608 | `cdf(complement(dist, fabs(t))) < alpha / 2`]] | |
609 | ||
610 | [[The Alternative-hypothesis: there is a | |
611 | *difference* in means] | |
612 | [Reject if complement of CDF for |t| > significance level / 2: | |
613 | ||
614 | `cdf(complement(dist, fabs(t))) < alpha / 2`]] | |
615 | ||
616 | [[The Alternative-hypothesis: Sample 1 Mean is *less* than | |
617 | Sample 2 Mean.] | |
618 | [Reject if CDF of t > significance level: | |
619 | ||
620 | `cdf(dist, t) > alpha`]] | |
621 | ||
622 | [[The Alternative-hypothesis: Sample 1 Mean is *greater* than | |
623 | Sample 2 Mean.] | |
624 | ||
625 | [Reject if complement of CDF of t > significance level: | |
626 | ||
627 | `cdf(complement(dist, t)) > alpha`]] | |
628 | ] | |
629 | ||
630 | [note | |
631 | For a two-sided test we must compare against alpha / 2 and not alpha.] | |
632 | ||
633 | Most of the rest of the sample program is pretty-printing, so we'll | |
634 | skip over that, and take a look at the sample output for alpha=0.05 | |
635 | (a 95% probability level). For comparison the dataplot output | |
636 | for the same data is in | |
637 | [@http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm | |
638 | section 1.3.5.3] of the __handbook. | |
639 | ||
640 | [pre''' | |
641 | ________________________________________________ | |
642 | Student t test for two samples (equal variances) | |
643 | ________________________________________________ | |
644 | ||
645 | Number of Observations (Sample 1) = 249 | |
646 | Sample 1 Mean = 20.145 | |
647 | Sample 1 Standard Deviation = 6.4147 | |
648 | Number of Observations (Sample 2) = 79 | |
649 | Sample 2 Mean = 30.481 | |
650 | Sample 2 Standard Deviation = 6.1077 | |
651 | Degrees of Freedom = 326 | |
652 | Pooled Standard Deviation = 6.3426 | |
653 | T Statistic = -12.621 | |
654 | Probability that difference is due to chance = 5.273e-030 | |
655 | ||
656 | Results for Alternative Hypothesis and alpha = 0.0500''' | |
657 | ||
658 | Alternative Hypothesis Conclusion | |
659 | Sample 1 Mean != Sample 2 Mean NOT REJECTED | |
660 | Sample 1 Mean < Sample 2 Mean NOT REJECTED | |
661 | Sample 1 Mean > Sample 2 Mean REJECTED | |
662 | ] | |
663 | ||
664 | So with a probability that the difference is due to chance of just | |
665 | 5.273e-030, we can safely conclude that there is indeed a difference. | |
666 | ||
667 | The tests on the alternative hypothesis show that we must | |
668 | also reject the hypothesis that Sample 1 Mean is | |
669 | greater than that for Sample 2: in this case Sample 1 represents the | |
670 | miles per gallon for Japanese cars, and Sample 2 the miles per gallon for | |
671 | US cars, so we conclude that Japanese cars are on average more | |
672 | fuel efficient. | |
673 | ||
674 | Now that we have the simple case out of the way, let's look for a moment | |
675 | at the more complex one: that the standard deviations of the two samples | |
676 | are not equal. In this case the formula for the t-statistic becomes: | |
677 | ||
678 | [equation dist_tutorial2] | |
679 | ||
680 | And for the combined degrees of freedom we use the | |
681 | [@http://en.wikipedia.org/wiki/Welch-Satterthwaite_equation Welch-Satterthwaite] | |
682 | approximation: | |
683 | ||
684 | [equation dist_tutorial3] | |
685 | ||
686 | Note that this is one of the rare situations where the degrees-of-freedom | |
687 | parameter to the Student's t distribution is a real number, and not an | |
688 | integer value. | |
689 | ||
690 | [note | |
691 | Some statistical packages truncate the effective degrees of freedom to | |
692 | an integer value: this may be necessary if you are relying on lookup tables, | |
693 | but since our code fully supports non-integer degrees of freedom there is no | |
694 | need to truncate in this case. Also note that when the degrees of freedom | |
695 | is small then the Welch-Satterthwaite approximation may be a significant | |
696 | source of error.] | |
697 | ||
698 | Putting these formulae into code we get: | |
699 | ||
700 | // Degrees of freedom: | |
701 | double v = Sd1 * Sd1 / Sn1 + Sd2 * Sd2 / Sn2; | |
702 | v *= v; | |
703 | double t1 = Sd1 * Sd1 / Sn1; | |
704 | t1 *= t1; | |
705 | t1 /= (Sn1 - 1); | |
706 | double t2 = Sd2 * Sd2 / Sn2; | |
707 | t2 *= t2; | |
708 | t2 /= (Sn2 - 1); | |
709 | v /= (t1 + t2); | |
710 | cout << setw(55) << left << "Degrees of Freedom" << "= " << v << "\n"; | |
711 | // t-statistic: | |
712 | double t_stat = (Sm1 - Sm2) / sqrt(Sd1 * Sd1 / Sn1 + Sd2 * Sd2 / Sn2); | |
713 | cout << setw(55) << left << "T Statistic" << "= " << t_stat << "\n"; | |
714 | ||
715 | Thereafter the code and the tests are performed the same as before. Using | |
716 | are car mileage data again, here's what the output looks like: | |
717 | ||
718 | [pre''' | |
719 | __________________________________________________ | |
720 | Student t test for two samples (unequal variances) | |
721 | __________________________________________________ | |
722 | ||
723 | Number of Observations (Sample 1) = 249 | |
724 | Sample 1 Mean = 20.145 | |
725 | Sample 1 Standard Deviation = 6.4147 | |
726 | Number of Observations (Sample 2) = 79 | |
727 | Sample 2 Mean = 30.481 | |
728 | Sample 2 Standard Deviation = 6.1077 | |
729 | Degrees of Freedom = 136.87 | |
730 | T Statistic = -12.946 | |
731 | Probability that difference is due to chance = 1.571e-025 | |
732 | ||
733 | Results for Alternative Hypothesis and alpha = 0.0500''' | |
734 | ||
735 | Alternative Hypothesis Conclusion | |
736 | Sample 1 Mean != Sample 2 Mean NOT REJECTED | |
737 | Sample 1 Mean < Sample 2 Mean NOT REJECTED | |
738 | Sample 1 Mean > Sample 2 Mean REJECTED | |
739 | ] | |
740 | ||
741 | This time allowing the variances in the two samples to differ has yielded | |
742 | a higher likelihood that the observed difference is down to chance alone | |
743 | (1.571e-025 compared to 5.273e-030 when equal variances were assumed). | |
744 | However, the conclusion remains the same: US cars are less fuel efficient | |
745 | than Japanese models. | |
746 | ||
747 | [endsect] | |
748 | [section:paired_st Comparing two paired samples with the Student's t distribution] | |
749 | ||
750 | Imagine that we have a before and after reading for each item in the sample: | |
751 | for example we might have measured blood pressure before and after administration | |
752 | of a new drug. We can't pool the results and compare the means before and after | |
753 | the change, because each patient will have a different baseline reading. | |
754 | Instead we calculate the difference between before and after measurements | |
755 | in each patient, and calculate the mean and standard deviation of the differences. | |
756 | To test whether a significant change has taken place, we can then test | |
757 | the null-hypothesis that the true mean is zero using the same procedure | |
758 | we used in the single sample cases previously discussed. | |
759 | ||
760 | That means we can: | |
761 | ||
762 | * [link math_toolkit.stat_tut.weg.st_eg.tut_mean_intervals Calculate confidence intervals of the mean]. | |
763 | If the endpoints of the interval differ in sign then we are unable to reject | |
764 | the null-hypothesis that there is no change. | |
765 | * [link math_toolkit.stat_tut.weg.st_eg.tut_mean_test Test whether the true mean is zero]. If the | |
766 | result is consistent with a true mean of zero, then we are unable to reject the | |
767 | null-hypothesis that there is no change. | |
768 | * [link math_toolkit.stat_tut.weg.st_eg.tut_mean_size Calculate how many pairs of readings we would need | |
769 | in order to obtain a significant result]. | |
770 | ||
771 | [endsect] | |
772 | ||
773 | [endsect][/section:st_eg Student's t] | |
774 | ||
775 | [/ | |
776 | Copyright 2006, 2012 John Maddock and Paul A. Bristow. | |
777 | Distributed under the Boost Software License, Version 1.0. | |
778 | (See accompanying file LICENSE_1_0.txt or copy at | |
779 | http://www.boost.org/LICENSE_1_0.txt). | |
780 | ] | |
781 |