ceph/src/boost/libs/math/doc/distributions/students_t_examples.qbk

   1
   2 [section:st_eg Student's t Distribution Examples]
   3
   4 [section:tut_mean_intervals Calculating confidence intervals on the mean with the Students-t distribution]
   5
   6 Let's say you have a sample mean, you may wish to know what confidence intervals
   7 you can place on that mean.  Colloquially: "I want an interval that I can be
   8 P% sure contains the true mean".  (On a technical point, note that
   9 the interval either contains the true mean or it does not: the
  10 meaning of the confidence level is subtly
  11 different from this colloquialism.  More background information can be found on the
  12 [@http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm NIST site]).
  13
  14 The formula for the interval can be expressed as:
  15
  16 [equation dist_tutorial4]
  17
  18 Where, ['Y[sub s]] is the sample mean, /s/ is the sample standard deviation,
  19 /N/ is the sample size, /[alpha]/ is the desired significance level and
  20 ['t[sub ([alpha]/2,N-1)]] is the upper critical value of the Students-t
  21 distribution with /N-1/ degrees of freedom.
  22
  23 [note
  24 The quantity [alpha][space] is the maximum acceptable risk of falsely rejecting
  25 the null-hypothesis.  The smaller the value of [alpha] the greater the
  26 strength of the test.
  27
  28 The confidence level of the test is defined as 1 - [alpha], and often expressed
  29 as a percentage.  So for example a significance level of 0.05, is equivalent
  30 to a 95% confidence level.  Refer to
  31 [@http://www.itl.nist.gov/div898/handbook/prc/section1/prc14.htm
  32 "What are confidence intervals?"] in __handbook for more information.
  33 ] [/ Note]
  34
  35 [note
  36 The usual assumptions of
  37 [@http://en.wikipedia.org/wiki/Independent_and_identically-distributed_random_variables independent and identically distributed (i.i.d.)]
  38 variables and [@http://en.wikipedia.org/wiki/Normal_distribution normal distribution]
  39 of course apply here, as they do in other examples.
  40 ]
  41
  42 From the formula, it should be clear that:
  43
  44 * The width of the confidence interval decreases as the sample size increases.
  45 * The width increases as the standard deviation increases.
  46 * The width increases as the ['confidence level increases] (0.5 towards 0.99999 - stronger).
  47 * The width increases as the ['significance level decreases] (0.5 towards 0.00000...01 - stronger).
  48
  49 The following example code is taken from the example program
  50 [@../../example/students_t_single_sample.cpp students_t_single_sample.cpp].
  51
  52 We'll begin by defining a procedure to calculate intervals for
  53 various confidence levels; the procedure will print these out
  54 as a table:
  55
  56    // Needed includes:
  57    #include <boost/math/distributions/students_t.hpp>
  58    #include <iostream>
  59    #include <iomanip>
  60    // Bring everything into global namespace for ease of use:
  61    using namespace boost::math;
  62    using namespace std;
  63
  64    void confidence_limits_on_mean(
  65       double Sm,           // Sm = Sample Mean.
  66       double Sd,           // Sd = Sample Standard Deviation.
  67       unsigned Sn)         // Sn = Sample Size.
  68    {
  69       using namespace std;
  70       using namespace boost::math;
  71
  72       // Print out general info:
  73       cout <<
  74          "__________________________________\n"
  75          "2-Sided Confidence Limits For Mean\n"
  76          "__________________________________\n\n";
  77       cout << setprecision(7);
  78       cout << setw(40) << left << "Number of Observations" << "=  " << Sn << "\n";
  79       cout << setw(40) << left << "Mean" << "=  " << Sm << "\n";
  80       cout << setw(40) << left << "Standard Deviation" << "=  " << Sd << "\n";
  81
  82 We'll define a table of significance/risk levels for which we'll compute intervals:
  83
  84       double alpha[] = { 0.5, 0.25, 0.1, 0.05, 0.01, 0.001, 0.0001, 0.00001 };
  85
  86 Note that these are the complements of the confidence/probability levels: 0.5, 0.75, 0.9 .. 0.99999).
  87
  88 Next we'll declare the distribution object we'll need, note that
  89 the /degrees of freedom/ parameter is the sample size less one:
  90
  91       students_t dist(Sn - 1);
  92
  93 Most of what follows in the program is pretty printing, so let's focus
  94 on the calculation of the interval. First we need the t-statistic,
  95 computed using the /quantile/ function and our significance level.  Note
  96 that since the significance levels are the complement of the probability,
  97 we have to wrap the arguments in a call to /complement(...)/:
  98
  99    double T = quantile(complement(dist, alpha[i] / 2));
 100
 101 Note that alpha was divided by two, since we'll be calculating
 102 both the upper and lower bounds: had we been interested in a single
 103 sided interval then we would have omitted this step.
 104
 105 Now to complete the picture, we'll get the (one-sided) width of the
 106 interval from the t-statistic
 107 by multiplying by the standard deviation, and dividing by the square
 108 root of the sample size:
 109
 110    double w = T * Sd / sqrt(double(Sn));
 111
 112 The two-sided interval is then the sample mean plus and minus this width.
 113
 114 And apart from some more pretty-printing that completes the procedure.
 115
 116 Let's take a look at some sample output, first using the
 117 [@http://www.itl.nist.gov/div898/handbook/eda/section4/eda428.htm
 118 Heat flow data] from the NIST site.  The data set was collected
 119 by Bob Zarr of NIST in January, 1990 from a heat flow meter
 120 calibration and stability analysis.
 121 The corresponding dataplot
 122 output for this test can be found in
 123 [@http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm
 124 section 3.5.2] of the __handbook.
 125
 126 [pre'''
 127    __________________________________
 128    2-Sided Confidence Limits For Mean
 129    __________________________________
 130
 131    Number of Observations                  =  195
 132    Mean                                    =  9.26146
 133    Standard Deviation                      =  0.02278881
 134
 135
 136    ___________________________________________________________________
 137    Confidence       T           Interval          Lower          Upper
 138     Value (%)     Value          Width            Limit          Limit
 139    ___________________________________________________________________
 140        50.000     0.676       1.103e-003        9.26036        9.26256
 141        75.000     1.154       1.883e-003        9.25958        9.26334
 142        90.000     1.653       2.697e-003        9.25876        9.26416
 143        95.000     1.972       3.219e-003        9.25824        9.26468
 144        99.000     2.601       4.245e-003        9.25721        9.26571
 145        99.900     3.341       5.453e-003        9.25601        9.26691
 146        99.990     3.973       6.484e-003        9.25498        9.26794
 147        99.999     4.537       7.404e-003        9.25406        9.26886
 148 ''']
 149
 150 As you can see the large sample size (195) and small standard deviation (0.023)
 151 have combined to give very small intervals, indeed we can be
 152 very confident that the true mean is 9.2.
 153
 154 For comparison the next example data output is taken from
 155 ['P.K.Hou, O. W. Lau & M.C. Wong, Analyst (1983) vol. 108, p 64.
 156 and from Statistics for Analytical Chemistry, 3rd ed. (1994), pp 54-55
 157 J. C. Miller and J. N. Miller, Ellis Horwood ISBN 0 13 0309907.]
 158 The values result from the determination of mercury by cold-vapour
 159 atomic absorption.
 160
 161 [pre'''
 162    __________________________________
 163    2-Sided Confidence Limits For Mean
 164    __________________________________
 165
 166    Number of Observations                  =  3
 167    Mean                                    =  37.8000000
 168    Standard Deviation                      =  0.9643650
 169
 170
 171    ___________________________________________________________________
 172    Confidence       T           Interval          Lower          Upper
 173     Value (%)     Value          Width            Limit          Limit
 174    ___________________________________________________________________
 175        50.000     0.816            0.455       37.34539       38.25461
 176        75.000     1.604            0.893       36.90717       38.69283
 177        90.000     2.920            1.626       36.17422       39.42578
 178        95.000     4.303            2.396       35.40438       40.19562
 179        99.000     9.925            5.526       32.27408       43.32592
 180        99.900    31.599           17.594       20.20639       55.39361
 181        99.990    99.992           55.673      -17.87346       93.47346
 182        99.999   316.225          176.067     -138.26683      213.86683
 183 ''']
 184
 185 This time the fact that there are only three measurements leads to
 186 much wider intervals, indeed such large intervals that it's hard
 187 to be very confident in the location of the mean.
 188
 189 [endsect]
 190
 191 [section:tut_mean_test Testing a sample mean for difference from a "true" mean]
 192
 193 When calibrating or comparing a scientific instrument or measurement method of some kind,
 194 we want to be answer the question "Does an observed sample mean differ from the
 195 "true" mean in any significant way?".  If it does, then we have evidence of
 196 a systematic difference.  This question can be answered with a Students-t test:
 197 more information can be found
 198 [@http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm
 199 on the NIST site].
 200
 201 Of course, the assignment of "true" to one mean may be quite arbitrary,
 202 often this is simply a "traditional" method of measurement.
 203
 204 The following example code is taken from the example program
 205 [@../../example/students_t_single_sample.cpp students_t_single_sample.cpp].
 206
 207 We'll begin by defining a procedure to determine which of the
 208 possible hypothesis are rejected or not-rejected
 209 at a given significance level:
 210
 211 [note
 212 Non-statisticians might say 'not-rejected' means 'accepted',
 213 (often of the null-hypothesis) implying, wrongly, that there really *IS* no difference,
 214 but statisticans eschew this to avoid implying that there is positive evidence of 'no difference'.
 215 'Not-rejected' here means there is *no evidence* of difference, but there still might well be a difference.
 216 For example, see [@http://en.wikipedia.org/wiki/Argument_from_ignorance argument from ignorance] and
 217 [@http://www.bmj.com/cgi/content/full/311/7003/485 Absence of evidence does not constitute evidence of absence.]
 218 ] [/ note]
 219
 220
 221    // Needed includes:
 222    #include <boost/math/distributions/students_t.hpp>
 223    #include <iostream>
 224    #include <iomanip>
 225    // Bring everything into global namespace for ease of use:
 226    using namespace boost::math;
 227    using namespace std;
 228
 229    void single_sample_t_test(double M, double Sm, double Sd, unsigned Sn, double alpha)
 230    {
 231       //
 232       // M = true mean.
 233       // Sm = Sample Mean.
 234       // Sd = Sample Standard Deviation.
 235       // Sn = Sample Size.
 236       // alpha = Significance Level.
 237
 238 Most of the procedure is pretty-printing, so let's just focus on the
 239 calculation, we begin by calculating the t-statistic:
 240
 241    // Difference in means:
 242    double diff = Sm - M;
 243    // Degrees of freedom:
 244    unsigned v = Sn - 1;
 245    // t-statistic:
 246    double t_stat = diff * sqrt(double(Sn)) / Sd;
 247
 248 Finally calculate the probability from the t-statistic. If we're interested
 249 in simply whether there is a difference (either less or greater) or not,
 250 we don't care about the sign of the t-statistic,
 251 and we take the complement of the probability for comparison
 252 to the significance level:
 253
 254    students_t dist(v);
 255    double q = cdf(complement(dist, fabs(t_stat)));
 256
 257 The procedure then prints out the results of the various tests
 258 that can be done, these
 259 can be summarised in the following table:
 260
 261 [table
 262 [[Hypothesis][Test]]
 263 [[The Null-hypothesis: there is
 264 *no difference* in means]
 265 [Reject if complement of CDF for |t| < significance level / 2:
 266
 267 `cdf(complement(dist, fabs(t))) < alpha / 2`]]
 268
 269 [[The Alternative-hypothesis: there
 270 *is difference* in means]
 271 [Reject if complement of CDF for |t| > significance level / 2:
 272
 273 `cdf(complement(dist, fabs(t))) > alpha / 2`]]
 274
 275 [[The Alternative-hypothesis: the sample mean *is less* than
 276 the true mean.]
 277 [Reject if CDF of t > 1 - significance level:
 278
 279 `cdf(complement(dist, t)) < alpha`]]
 280
 281 [[The Alternative-hypothesis: the sample mean *is greater* than
 282 the true mean.]
 283 [Reject if complement of CDF of t < significance level:
 284
 285 `cdf(dist, t) < alpha`]]
 286 ]
 287
 288 [note
 289 Notice that the comparisons are against `alpha / 2` for a two-sided test
 290 and against `alpha` for a one-sided test]
 291
 292 Now that we have all the parts in place, let's take a look at some
 293 sample output, first using the
 294 [@http://www.itl.nist.gov/div898/handbook/eda/section4/eda428.htm
 295 Heat flow data] from the NIST site.  The data set was collected
 296 by Bob Zarr of NIST in January, 1990 from a heat flow meter
 297 calibration and stability analysis.  The corresponding dataplot
 298 output for this test can be found in
 299 [@http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm
 300 section 3.5.2] of the __handbook.
 301
 302 [pre
 303 __________________________________
 304 Student t test for a single sample
 305 __________________________________
 306
 307 Number of Observations                                 =  195
 308 Sample Mean                                            =  9.26146
 309 Sample Standard Deviation                              =  0.02279
 310 Expected True Mean                                     =  5.00000
 311
 312 Sample Mean - Expected Test Mean                       =  4.26146
 313 Degrees of Freedom                                     =  194
 314 T Statistic                                            =  2611.28380
 315 Probability that difference is due to chance           =  0.000e+000
 316
 317 Results for Alternative Hypothesis and alpha           =  0.0500
 318
 319 Alternative Hypothesis     Conclusion
 320 Mean != 5.000            NOT REJECTED
 321 Mean  < 5.000            REJECTED
 322 Mean  > 5.000            NOT REJECTED
 323 ]
 324
 325 You will note the line that says the probability that the difference is
 326 due to chance is zero.  From a philosophical point of view, of course,
 327 the probability can never reach zero.  However, in this case the calculated
 328 probability is smaller than the smallest representable double precision number,
 329 hence the appearance of a zero here.  Whatever its "true" value is, we know it
 330 must be extraordinarily small, so the alternative hypothesis - that there is
 331 a difference in means - is not rejected.
 332
 333 For comparison the next example data output is taken from
 334 ['P.K.Hou, O. W. Lau & M.C. Wong, Analyst (1983) vol. 108, p 64.
 335 and from Statistics for Analytical Chemistry, 3rd ed. (1994), pp 54-55
 336 J. C. Miller and J. N. Miller, Ellis Horwood ISBN 0 13 0309907.]
 337 The values result from the determination of mercury by cold-vapour
 338 atomic absorption.
 339
 340 [pre
 341 __________________________________
 342 Student t test for a single sample
 343 __________________________________
 344
 345 Number of Observations                                 =  3
 346 Sample Mean                                            =  37.80000
 347 Sample Standard Deviation                              =  0.96437
 348 Expected True Mean                                     =  38.90000
 349
 350 Sample Mean - Expected Test Mean                       =  -1.10000
 351 Degrees of Freedom                                     =  2
 352 T Statistic                                            =  -1.97566
 353 Probability that difference is due to chance           =  1.869e-001
 354
 355 Results for Alternative Hypothesis and alpha           =  0.0500
 356
 357 Alternative Hypothesis     Conclusion
 358 Mean != 38.900            REJECTED
 359 Mean  < 38.900            NOT REJECTED
 360 Mean  > 38.900            NOT REJECTED
 361 ]
 362
 363 As you can see the small number of measurements (3) has led to a large uncertainty
 364 in the location of the true mean.  So even though there appears to be a difference
 365 between the sample mean and the expected true mean, we conclude that there
 366 is no significant difference, and are unable to reject the null hypothesis.
 367 However, if we were to lower the bar for acceptance down to alpha = 0.1
 368 (a 90% confidence level) we see a different output:
 369
 370 [pre
 371 __________________________________
 372 Student t test for a single sample
 373 __________________________________
 374
 375 Number of Observations                                 =  3
 376 Sample Mean                                            =  37.80000
 377 Sample Standard Deviation                              =  0.96437
 378 Expected True Mean                                     =  38.90000
 379
 380 Sample Mean - Expected Test Mean                       =  -1.10000
 381 Degrees of Freedom                                     =  2
 382 T Statistic                                            =  -1.97566
 383 Probability that difference is due to chance           =  1.869e-001
 384
 385 Results for Alternative Hypothesis and alpha           =  0.1000
 386
 387 Alternative Hypothesis     Conclusion
 388 Mean != 38.900            REJECTED
 389 Mean  < 38.900            NOT REJECTED
 390 Mean  > 38.900            REJECTED
 391 ]
 392
 393 In this case, we really have a borderline result,
 394 and more data (and/or more accurate data),
 395 is needed for a more convincing conclusion.
 396
 397 [endsect]
 398
 399 [section:tut_mean_size Estimating how large a sample size would have to become
 400 in order to give a significant Students-t test result with a single sample test]
 401
 402 Imagine you have conducted a Students-t test on a single sample in order
 403 to check for systematic errors in your measurements.  Imagine that the
 404 result is borderline.  At this point one might go off and collect more data,
 405 but it might be prudent to first ask the question "How much more?".
 406 The parameter estimators of the students_t_distribution class
 407 can provide this information.
 408
 409 This section is based on the example code in
 410 [@../../example/students_t_single_sample.cpp students_t_single_sample.cpp]
 411 and we begin by defining a procedure that will print out a table of
 412 estimated sample sizes for various confidence levels:
 413
 414    // Needed includes:
 415    #include <boost/math/distributions/students_t.hpp>
 416    #include <iostream>
 417    #include <iomanip>
 418    // Bring everything into global namespace for ease of use:
 419    using namespace boost::math;
 420    using namespace std;
 421
 422    void single_sample_find_df(
 423       double M,          // M = true mean.
 424       double Sm,         // Sm = Sample Mean.
 425       double Sd)         // Sd = Sample Standard Deviation.
 426    {
 427
 428 Next we define a table of significance levels:
 429
 430       double alpha[] = { 0.5, 0.25, 0.1, 0.05, 0.01, 0.001, 0.0001, 0.00001 };
 431
 432 Printing out the table of sample sizes required for various confidence levels
 433 begins with the table header:
 434
 435       cout << "\n\n"
 436               "_______________________________________________________________\n"
 437               "Confidence       Estimated          Estimated\n"
 438               " Value (%)      Sample Size        Sample Size\n"
 439               "              (one sided test)    (two sided test)\n"
 440               "_______________________________________________________________\n";
 441
 442
 443 And now the important part: the sample sizes required.  Class
 444 `students_t_distribution` has a static member function
 445 `find_degrees_of_freedom` that will calculate how large
 446 a sample size needs to be in order to give a definitive result.
 447
 448 The first argument is the difference between the means that you
 449 wish to be able to detect, here it's the absolute value of the
 450 difference between the sample mean, and the true mean.
 451
 452 Then come two probability values: alpha and beta.  Alpha is the
 453 maximum acceptable risk of rejecting the null-hypothesis when it is
 454 in fact true.  Beta is the maximum acceptable risk of failing to reject
 455 the null-hypothesis when in fact it is false.
 456 Also note that for a two-sided test, alpha must be divided by 2.
 457
 458 The final parameter of the function is the standard deviation of the sample.
 459
 460 In this example, we assume that alpha and beta are the same, and call
 461 `find_degrees_of_freedom` twice: once with alpha for a one-sided test,
 462 and once with alpha/2 for a two-sided test.
 463
 464       for(unsigned i = 0; i < sizeof(alpha)/sizeof(alpha[0]); ++i)
 465       {
 466          // Confidence value:
 467          cout << fixed << setprecision(3) << setw(10) << right << 100 * (1-alpha[i]);
 468          // calculate df for single sided test:
 469          double df = students_t::find_degrees_of_freedom(
 470             fabs(M - Sm), alpha[i], alpha[i], Sd);
 471          // convert to sample size:
 472          double size = ceil(df) + 1;
 473          // Print size:
 474          cout << fixed << setprecision(0) << setw(16) << right << size;
 475          // calculate df for two sided test:
 476          df = students_t::find_degrees_of_freedom(
 477             fabs(M - Sm), alpha[i]/2, alpha[i], Sd);
 478          // convert to sample size:
 479          size = ceil(df) + 1;
 480          // Print size:
 481          cout << fixed << setprecision(0) << setw(16) << right << size << endl;
 482       }
 483       cout << endl;
 484    }
 485
 486 Let's now look at some sample output using data taken from
 487 ['P.K.Hou, O. W. Lau & M.C. Wong, Analyst (1983) vol. 108, p 64.
 488 and from Statistics for Analytical Chemistry, 3rd ed. (1994), pp 54-55
 489 J. C. Miller and J. N. Miller, Ellis Horwood ISBN 0 13 0309907.]
 490 The values result from the determination of mercury by cold-vapour
 491 atomic absorption.
 492
 493 Only three measurements were made, and the Students-t test above
 494 gave a borderline result, so this example
 495 will show us how many samples would need to be collected:
 496
 497 [pre'''
 498 _____________________________________________________________
 499 Estimated sample sizes required for various confidence levels
 500 _____________________________________________________________
 501
 502 True Mean                               =  38.90000
 503 Sample Mean                             =  37.80000
 504 Sample Standard Deviation               =  0.96437
 505
 506
 507 _______________________________________________________________
 508 Confidence       Estimated          Estimated
 509  Value (%)      Sample Size        Sample Size
 510               (one sided test)    (two sided test)
 511 _______________________________________________________________
 512     75.000               3               4
 513     90.000               7               9
 514     95.000              11              13
 515     99.000              20              22
 516     99.900              35              37
 517     99.990              50              53
 518     99.999              66              68
 519 ''']
 520
 521 So in this case, many more measurements would have had to be made,
 522 for example at the 95% level, 14 measurements in total for a two-sided test.
 523
 524 [endsect]
 525 [section:two_sample_students_t Comparing the means of two samples with the Students-t test]
 526
 527 Imagine that we have two samples, and we wish to determine whether
 528 their means are different or not.  This situation often arises when
 529 determining whether a new process or treatment is better than an old one.
 530
 531 In this example, we'll be using the
 532 [@http://www.itl.nist.gov/div898/handbook/eda/section3/eda3531.htm
 533 Car Mileage sample data] from the
 534 [@http://www.itl.nist.gov NIST website].  The data compares
 535 miles per gallon of US cars with miles per gallon of Japanese cars.
 536
 537 The sample code is in
 538 [@../../example/students_t_two_samples.cpp students_t_two_samples.cpp].
 539
 540 There are two ways in which this test can be conducted: we can assume
 541 that the true standard deviations of the two samples are equal or not.
 542 If the standard deviations are assumed to be equal, then the calculation
 543 of the t-statistic is greatly simplified, so we'll examine that case first.
 544 In real life we should verify whether this assumption is valid with a
 545 Chi-Squared test for equal variances.
 546
 547 We begin by defining a procedure that will conduct our test assuming equal
 548 variances:
 549
 550    // Needed headers:
 551    #include <boost/math/distributions/students_t.hpp>
 552    #include <iostream>
 553    #include <iomanip>
 554    // Simplify usage:
 555    using namespace boost::math;
 556    using namespace std;
 557
 558    void two_samples_t_test_equal_sd(
 559            double Sm1,       // Sm1 = Sample 1 Mean.
 560            double Sd1,       // Sd1 = Sample 1 Standard Deviation.
 561            unsigned Sn1,     // Sn1 = Sample 1 Size.
 562            double Sm2,       // Sm2 = Sample 2 Mean.
 563            double Sd2,       // Sd2 = Sample 2 Standard Deviation.
 564            unsigned Sn2,     // Sn2 = Sample 2 Size.
 565            double alpha)     // alpha = Significance Level.
 566    {
 567
 568
 569 Our procedure will begin by calculating the t-statistic, assuming
 570 equal variances the needed formulae are:
 571
 572 [equation dist_tutorial1]
 573
 574 where Sp is the "pooled" standard deviation of the two samples,
 575 and /v/ is the number of degrees of freedom of the two combined
 576 samples.  We can now write the code to calculate the t-statistic:
 577
 578    // Degrees of freedom:
 579    double v = Sn1 + Sn2 - 2;
 580    cout << setw(55) << left << "Degrees of Freedom" << "=  " << v << "\n";
 581    // Pooled variance:
 582    double sp = sqrt(((Sn1-1) * Sd1 * Sd1 + (Sn2-1) * Sd2 * Sd2) / v);
 583    cout << setw(55) << left << "Pooled Standard Deviation" << "=  " << sp << "\n";
 584    // t-statistic:
 585    double t_stat = (Sm1 - Sm2) / (sp * sqrt(1.0 / Sn1 + 1.0 / Sn2));
 586    cout << setw(55) << left << "T Statistic" << "=  " << t_stat << "\n";
 587
 588 The next step is to define our distribution object, and calculate the
 589 complement of the probability:
 590
 591    students_t dist(v);
 592    double q = cdf(complement(dist, fabs(t_stat)));
 593    cout << setw(55) << left << "Probability that difference is due to chance" << "=  "
 594       << setprecision(3) << scientific << 2 * q << "\n\n";
 595
 596 Here we've used the absolute value of the t-statistic, because we initially
 597 want to know simply whether there is a difference or not (a two-sided test).
 598 However, we can also test whether the mean of the second sample is greater
 599 or is less (one-sided test) than that of the first:
 600 all the possible tests are summed up in the following table:
 601
 602 [table
 603 [[Hypothesis][Test]]
 604 [[The Null-hypothesis: there is
 605 *no difference* in means]
 606 [Reject if complement of CDF for |t| < significance level / 2:
 607
 608 `cdf(complement(dist, fabs(t))) < alpha / 2`]]
 609
 610 [[The Alternative-hypothesis: there is a
 611 *difference* in means]
 612 [Reject if complement of CDF for |t| > significance level / 2:
 613
 614 `cdf(complement(dist, fabs(t))) < alpha / 2`]]
 615
 616 [[The Alternative-hypothesis: Sample 1 Mean is *less* than
 617 Sample 2 Mean.]
 618 [Reject if CDF of t > significance level:
 619
 620 `cdf(dist, t) > alpha`]]
 621
 622 [[The Alternative-hypothesis: Sample 1 Mean is *greater* than
 623 Sample 2 Mean.]
 624
 625 [Reject if complement of CDF of t > significance level:
 626
 627 `cdf(complement(dist, t)) > alpha`]]
 628 ]
 629
 630 [note
 631 For a two-sided test we must compare against alpha / 2 and not alpha.]
 632
 633 Most of the rest of the sample program is pretty-printing, so we'll
 634 skip over that, and take a look at the sample output for alpha=0.05
 635 (a 95% probability level).  For comparison the dataplot output
 636 for the same data is in
 637 [@http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm
 638 section 1.3.5.3] of the __handbook.
 639
 640 [pre'''
 641    ________________________________________________
 642    Student t test for two samples (equal variances)
 643    ________________________________________________
 644
 645    Number of Observations (Sample 1)                      =  249
 646    Sample 1 Mean                                          =  20.145
 647    Sample 1 Standard Deviation                            =  6.4147
 648    Number of Observations (Sample 2)                      =  79
 649    Sample 2 Mean                                          =  30.481
 650    Sample 2 Standard Deviation                            =  6.1077
 651    Degrees of Freedom                                     =  326
 652    Pooled Standard Deviation                              =  6.3426
 653    T Statistic                                            =  -12.621
 654    Probability that difference is due to chance           =  5.273e-030
 655
 656    Results for Alternative Hypothesis and alpha           =  0.0500'''
 657
 658    Alternative Hypothesis              Conclusion
 659    Sample 1 Mean != Sample 2 Mean       NOT REJECTED
 660    Sample 1 Mean <  Sample 2 Mean       NOT REJECTED
 661    Sample 1 Mean >  Sample 2 Mean       REJECTED
 662 ]
 663
 664 So with a probability that the difference is due to chance of just
 665 5.273e-030, we can safely conclude that there is indeed a difference.
 666
 667 The tests on the alternative hypothesis show that we must
 668 also reject the hypothesis that Sample 1 Mean is
 669 greater than that for Sample 2: in this case Sample 1 represents the
 670 miles per gallon for Japanese cars, and Sample 2 the miles per gallon for
 671 US cars, so we conclude that Japanese cars are on average more
 672 fuel efficient.
 673
 674 Now that we have the simple case out of the way, let's look for a moment
 675 at the more complex one: that the standard deviations of the two samples
 676 are not equal.  In this case the formula for the t-statistic becomes:
 677
 678 [equation dist_tutorial2]
 679
 680 And for the combined degrees of freedom we use the
 681 [@http://en.wikipedia.org/wiki/Welch-Satterthwaite_equation Welch-Satterthwaite]
 682 approximation:
 683
 684 [equation dist_tutorial3]
 685
 686 Note that this is one of the rare situations where the degrees-of-freedom
 687 parameter to the Student's t distribution is a real number, and not an
 688 integer value.
 689
 690 [note
 691 Some statistical packages truncate the effective degrees of freedom to
 692 an integer value: this may be necessary if you are relying on lookup tables,
 693 but since our code fully supports non-integer degrees of freedom there is no
 694 need to truncate in this case.  Also note that when the degrees of freedom
 695 is small then the Welch-Satterthwaite approximation may be a significant
 696 source of error.]
 697
 698 Putting these formulae into code we get:
 699
 700    // Degrees of freedom:
 701    double v = Sd1 * Sd1 / Sn1 + Sd2 * Sd2 / Sn2;
 702    v *= v;
 703    double t1 = Sd1 * Sd1 / Sn1;
 704    t1 *= t1;
 705    t1 /=  (Sn1 - 1);
 706    double t2 = Sd2 * Sd2 / Sn2;
 707    t2 *= t2;
 708    t2 /= (Sn2 - 1);
 709    v /= (t1 + t2);
 710    cout << setw(55) << left << "Degrees of Freedom" << "=  " << v << "\n";
 711    // t-statistic:
 712    double t_stat = (Sm1 - Sm2) / sqrt(Sd1 * Sd1 / Sn1 + Sd2 * Sd2 / Sn2);
 713    cout << setw(55) << left << "T Statistic" << "=  " << t_stat << "\n";
 714
 715 Thereafter the code and the tests are performed the same as before.  Using
 716 are car mileage data again, here's what the output looks like:
 717
 718 [pre'''
 719    __________________________________________________
 720    Student t test for two samples (unequal variances)
 721    __________________________________________________
 722
 723    Number of Observations (Sample 1)                      =  249
 724    Sample 1 Mean                                          =  20.145
 725    Sample 1 Standard Deviation                            =  6.4147
 726    Number of Observations (Sample 2)                      =  79
 727    Sample 2 Mean                                          =  30.481
 728    Sample 2 Standard Deviation                            =  6.1077
 729    Degrees of Freedom                                     =  136.87
 730    T Statistic                                            =  -12.946
 731    Probability that difference is due to chance           =  1.571e-025
 732
 733    Results for Alternative Hypothesis and alpha           =  0.0500'''
 734
 735    Alternative Hypothesis              Conclusion
 736    Sample 1 Mean != Sample 2 Mean       NOT REJECTED
 737    Sample 1 Mean <  Sample 2 Mean       NOT REJECTED
 738    Sample 1 Mean >  Sample 2 Mean       REJECTED
 739 ]
 740
 741 This time allowing the variances in the two samples to differ has yielded
 742 a higher likelihood that the observed difference is down to chance alone
 743 (1.571e-025 compared to 5.273e-030 when equal variances were assumed).
 744 However, the conclusion remains the same: US cars are less fuel efficient
 745 than Japanese models.
 746
 747 [endsect]
 748 [section:paired_st Comparing two paired samples with the Student's t distribution]
 749
 750 Imagine that we have a before and after reading for each item in the sample:
 751 for example we might have measured blood pressure before and after administration
 752 of a new drug.  We can't pool the results and compare the means before and after
 753 the change, because each patient will have a different baseline reading.
 754 Instead we calculate the difference between before and after measurements
 755 in each patient, and calculate the mean and standard deviation of the differences.
 756 To test whether a significant change has taken place, we can then test
 757 the null-hypothesis that the true mean is zero using the same procedure
 758 we used in the single sample cases previously discussed.
 759
 760 That means we can:
 761
 762 * [link math_toolkit.stat_tut.weg.st_eg.tut_mean_intervals Calculate confidence intervals of the mean].
 763 If the endpoints of the interval differ in sign then we are unable to reject
 764 the null-hypothesis that there is no change.
 765 * [link math_toolkit.stat_tut.weg.st_eg.tut_mean_test Test whether the true mean is zero]. If the
 766 result is consistent with a true mean of zero, then we are unable to reject the
 767 null-hypothesis that there is no change.
 768 * [link math_toolkit.stat_tut.weg.st_eg.tut_mean_size Calculate how many pairs of readings we would need
 769 in order to obtain a significant result].
 770
 771 [endsect]
 772
 773 [endsect][/section:st_eg Student's t]
 774
 775 [/
 776   Copyright 2006, 2012 John Maddock and Paul A. Bristow.
 777   Distributed under the Boost Software License, Version 1.0.
 778   (See accompanying file LICENSE_1_0.txt or copy at
 779   http://www.boost.org/LICENSE_1_0.txt).
 780 ]
 781