Student's T-Test
(Named after William Sealy Gosset, who published under the pen name "Student" because his employer, the Guinness brewery, did not want staff publishing under their own names.)
Student's t-test helps us compare two sets of data to see if the difference between their means is just random chance.
There are two main types:
- Independent Samples T-Test: to compare two distinct groups where members of one are not related to the other.
- Paired Samples T-Test: when the same participants are in both groups being compared, such as before-and-after observations.
The data should be approximately normally distributed, and the test is not reliable if the sample sizes are too small or the variances are very different.
Note: you might also like to investigate the Chi-Square Test.
Independent Samples T-Test
We typically use this formula (but there are other formulas):
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
Where:
- $\bar{x}_1$ and $\bar{x}_2$ are sample means
- $s_1^2$ and $s_2^2$ are sample variances
- $n_1$ and $n_2$ are sample sizes
Note: we use $s^2$ for variance, because the variance is the square of the standard deviation $s$.
To do the t-test:
- Calculate the mean, variance and sample size for each set of sample data
- Use the formula above to calculate "t"
- Then we look up a special table to find out how significant that "t" value is
In effect the t-value is a measure of how much the means of the two groups differ from each other compared to the variability of the data.
A larger t-value (ignoring its sign) suggests a more significant difference between the groups (i.e. less likely due to random chance).
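The steps listed above can be sketched in Python. This is a minimal illustration rather than a reference implementation: the function names and the tiny data sets are made up for the example, and it assumes the unpooled form of the statistic, t = (mean1 − mean2) / √(s1²/n1 + s2²/n2), which is only one of the formulas in use.

```python
from math import sqrt


def sample_mean(data):
    """Arithmetic mean of a list of numbers."""
    return sum(data) / len(data)


def sample_variance(data):
    """Sample variance: squared deviations from the mean, divided by n - 1."""
    m = sample_mean(data)
    return sum((x - m) ** 2 for x in data) / (len(data) - 1)


def independent_t(group1, group2):
    """t-statistic for two independent samples (unpooled form, an assumption)."""
    n1, n2 = len(group1), len(group2)
    standard_error = sqrt(sample_variance(group1) / n1 + sample_variance(group2) / n2)
    return (sample_mean(group1) - sample_mean(group2)) / standard_error


# Hypothetical data, chosen only to keep the arithmetic easy to follow
t = independent_t([2, 4, 6], [1, 2, 3])
print(round(t, 3))
```

The resulting t-value still has to be compared with a critical value from the t-table, which this sketch does not look up.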
Let's try an example:
Example: Battery Life
This is an experiment you can do yourself (at the cost of 20 batteries).
Two batches of identical batteries are tested:
- One group of 10 were tested in a cool environment
- Another group of 10 were tested in a hot environment
The batteries power identical devices until they run out of charge, with these results (in hours):
- Cool Group: 85, 90, 76, 88, 93, 87, 91, 89, 95, 85
- Hot Group: 78, 82, 79, 88, 91, 85, 90, 87, 84, 86
Let's calculate the mean for each:
- Cool Group Mean = (85+90+76+88+93+87+91+89+95+85)/10 = 87.9 hours
- Hot Group Mean = (78+82+79+88+91+85+90+87+84+86)/10 = 85 hours
The Cool Group lasted, on average, about 3 hours longer.
But is this a significant result, or just random in nature?
To calculate the t-statistic we first need the variance of each group.
Variance is a measure of how much the data points spread out from the mean. For a sample, we calculate it by adding up the squared differences between each data point and the mean, then dividing by n − 1 (one less than the sample size), as shown here:
For the Cool Group:
- n = 10
- Mean = 87.9
- Variance = [(85-87.9)² + (90-87.9)² + (76-87.9)² + (88-87.9)² + (93-87.9)² + (87-87.9)² + (91-87.9)² + (89-87.9)² + (95-87.9)² + (85-87.9)²] / 9 = 27.88
For the Hot Group:
- n = 10
- Mean = 85
- Variance = [(78-85)² + (82-85)² + (79-85)² + (88-85)² + (91-85)² + (85-85)² + (90-85)² + (87-85)² + (84-85)² + (86-85)²] / 9 = 18.89
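With both means and variances in hand, the t-statistic can be worked out. The calculation below assumes the unpooled form of the formula; with these numbers it gives the t-value of 1.341 used in the comparison that follows.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
  = \frac{87.9 - 85}{\sqrt{\dfrac{27.88}{10} + \dfrac{18.89}{10}}}
  = \frac{2.9}{\sqrt{4.677}}
  \approx \frac{2.9}{2.163}
  \approx 1.341
```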
Now let's compare our t-value to the "critical t-value" from the Student's t-test Table. We choose a significance level, typically 0.05 (a 95% confidence level), and use degrees of freedom df = n1 + n2 − 2 = 10 + 10 − 2 = 18. We use the two-tails value, as the change in battery life could theoretically go either way.
The table shows us that for 0.05 and df=18 we get a "critical t-value" of 2.101.
Our actual t-value is 1.341. Being less than 2.101, our result is not significant at the 0.05 level (it does not pass the 95% confidence level).
So there is not enough evidence to say there is a significant difference between the cool and hot groups.
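The whole battery example can be checked with a short script. This is a sketch under two assumptions: it uses the unpooled form of the t-statistic, and the critical value 2.101 is simply copied from the table lookup described above rather than computed. The variable names are chosen for illustration.

```python
from math import sqrt
from statistics import mean, variance

cool = [85, 90, 76, 88, 93, 87, 91, 89, 95, 85]
hot = [78, 82, 79, 88, 91, 85, 90, 87, 84, 86]

# statistics.variance uses the n - 1 (sample) denominator
mean_cool, mean_hot = mean(cool), mean(hot)
var_cool, var_hot = variance(cool), variance(hot)

# Unpooled t-statistic (assumed form of the formula)
t_value = (mean_cool - mean_hot) / sqrt(var_cool / len(cool) + var_hot / len(hot))

# Critical value copied from the t-table: two-tailed, 0.05 level, df = 18
CRITICAL_T = 2.101

print(f"means: {mean_cool:.1f}, {mean_hot:.1f}")
print(f"variances: {var_cool:.2f}, {var_hot:.2f}")
print(f"t = {t_value:.3f}")
print("significant" if abs(t_value) > CRITICAL_T else "not significant")
```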
Hypothesis
The t-test is often done more formally using the idea of a hypothesis and null hypothesis.
Example: Fruits and Vegetables
You want to investigate whether eating more fruits and vegetables reduces the risk of heart disease. Your hypothesis might be:
"Consuming at least 5 servings of fruits and vegetables daily leads to a lower incidence of heart disease among adults."
In this case, your hypothesis is testable because you can collect data on people's fruit and vegetable consumption and compare it to their heart health outcomes.
But we usually start from the opposing, skeptical point of view, called the null hypothesis:
The null hypothesis is the baseline assumption for statistical analysis: that there is no real effect or difference.
The goal of statistical analysis is often to provide evidence that lets us reject the null hypothesis.
Rejecting the null hypothesis gives credibility to (but does not prove) our original hypothesis. In other words if we can show the skeptic is likely wrong, we can suggest our original idea might be right.
Example continued
The Null Hypothesis might be:
"Consuming at least 5 servings of fruits and vegetables daily does not lead to a significantly lower incidence of heart disease among adults."
If our data does not support the Null Hypothesis then we have evidence for our original hypothesis.
Hypothesis Steps
- Formulate the null hypothesis (H0): There is no significant difference between the two groups.
- Formulate the alternative hypothesis (Ha): There is a significant difference between the two groups.
- Choose a significance level (commonly 0.05), which is the probability of rejecting the null hypothesis when it is actually true.
- Collect the data and calculate the t-statistic based on the two sample means, standard deviations, and sample sizes.
- Compare the calculated t-statistic to the critical value from the t-distribution table to determine whether to reject or fail to reject the null hypothesis.
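The final comparison step can be written as a small function. This is a hedged sketch for a two-tailed test only: the function name and the two t-statistics in the usage lines are hypothetical, and the critical value still has to come from a t-table for the chosen significance level and degrees of freedom.

```python
def two_tailed_decision(t_statistic, critical_value):
    """Compare |t| with the table's critical value and report the outcome."""
    if abs(t_statistic) > critical_value:
        return "reject H0"
    return "fail to reject H0"


# Hypothetical t-statistics, both compared against 2.101 (df = 18, 0.05 level)
print(two_tailed_decision(2.5, 2.101))   # reject H0
print(two_tailed_decision(-1.2, 2.101))  # fail to reject H0
```

Taking the absolute value is what makes this two-tailed: a large negative t counts as much as a large positive one.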
Let's do our original example that way!
Example: Battery Life
Null Hypothesis (H0): The difference between a cool and hot environment has no effect on a battery's running time
Alternative Hypothesis (Ha): The difference between a cool and hot environment will affect a battery's running time
Using the values calculated earlier, t = 1.341 with df = 18.
From the Student's t-test Table for 0.05 (which is a 95% confidence level) and df=18 we get a "critical t-value" of 2.101.
Since our calculated t-value of 1.341 is less than the critical t-value of 2.101, we do not have enough evidence to reject the null hypothesis.
We cannot reject the null hypothesis
This means that based on our sample, we cannot confidently say a statistically significant difference exists.
In everyday terms, it's as if we're saying, "Based on what we've seen, we can't conclude that the change is anything more than just random chance."
Paired Samples T-Test
A paired samples t-test, also known as a dependent samples t-test, compares two means (averages) when the data is paired in some way, such as in a before-and-after trial.
So the same subjects are involved in both sets of data.
Step-by-Step Calculation
To do the Paired Samples t-test:
- Calculate the difference (d) for each pair of scores
- Calculate the mean difference ($\bar{d}$)
- Calculate the variance of the differences (var)
- Calculate the standard deviation of the differences (s)
- Calculate the t-statistic.
- Determine the degrees of freedom (df)
- Compare the calculated t-statistic to the critical value from the t-distribution table to determine whether to reject or fail to reject the null hypothesis.
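The calculation steps above map almost line for line onto code. The sketch below is illustrative only: the function name `paired_t` and the sample scores are hypothetical, and the table lookup in the last step is left to the reader.

```python
from math import sqrt


def paired_t(before, after):
    """Return (t, df) for paired samples, one line per step listed above."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    n = len(before)
    diffs = [a - b for b, a in zip(before, after)]  # difference d for each pair
    mean_diff = sum(diffs) / n  # mean difference
    var = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)  # variance of the differences
    s = sqrt(var)  # standard deviation of the differences
    t = mean_diff / (s / sqrt(n))  # t-statistic (undefined if every difference is identical)
    return t, n - 1  # degrees of freedom are n - 1


# Hypothetical scores, chosen only to keep the arithmetic simple
t, df = paired_t([10, 12, 14], [11, 14, 17])
print(round(t, 3), df)
```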
Example: Test Scores
We have test scores from 7 students before and after they did a special math program. Let's compare the two sets of scores!
Null Hypothesis (H0): The special math program has no effect on test scores
Alternative Hypothesis (Ha): The special math program does affect test scores
- Results Before: 72, 75, 80, 72, 68, 92, 84
- Results After: 84, 74, 86, 79, 78, 91, 88
Put that in a table and do some calcs:
Before | After | Diff (d) | $(d - \bar{d})^2$
---|---|---|---
72 | 84 | 12 | 45.08
75 | 74 | -1 | 39.51
80 | 86 | 6 | 0.51
72 | 79 | 7 | 2.94
68 | 78 | 10 | 22.22
92 | 91 | -1 | 39.51
84 | 88 | 4 | 1.65
Sum: | | 37 | 151.42
Mean of Differences ($\bar{d}$) = 37/7 = 5.29
Variance (var) = 151.42/(7−1) = 25.24
Standard Deviation (s) = √(25.24) = 5.02
We can calculate the t-value using either one of these formulas (they are mathematically equivalent):
$$t = \frac{\bar{d}}{s/\sqrt{n}} = \frac{5.29}{5.02/\sqrt{7}} = 2.78$$
$$t = \frac{\bar{d}}{\sqrt{\text{var}/n}} = \frac{5.29}{\sqrt{25.24/7}} = 2.78$$
Now let's compare our t value to the critical t-value in the table.
- The degrees of freedom (df) for a paired t-test is n − 1: df = 7−1 = 6
- For a two-tailed test with df=6 at the 0.05 level, we find the critical t-value is 2.447
- Our value of 2.78 is above that, so we reject the Null Hypothesis
If the Null Hypothesis were true, a t-value this extreme would turn up less than 5% of the time, so we have good reason to believe that the alternative hypothesis "The special math program does affect test scores" is true.
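The test score example can be checked with a short script. It is a sketch that leans on Python's `statistics` module, and the critical value 2.447 is copied from the table lookup above rather than computed; the variable names are chosen for illustration.

```python
from math import sqrt
from statistics import mean, stdev

before = [72, 75, 80, 72, 68, 92, 84]
after = [84, 74, 86, 79, 78, 91, 88]

# Difference for each student: after minus before
diffs = [a - b for b, a in zip(before, after)]
n = len(diffs)

# stdev uses the n - 1 (sample) denominator
t_value = mean(diffs) / (stdev(diffs) / sqrt(n))

CRITICAL_T = 2.447  # from the t-table: two-tailed, 0.05 level, df = 6

print(f"differences: {diffs}")
print(f"t = {t_value:.2f}, df = {n - 1}")
print("reject H0" if abs(t_value) > CRITICAL_T else "fail to reject H0")
```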
One-Tail and Two-Tail T-Tests
- One-Tail T-Test: Tests for a significant effect in one specific direction (either greater than or less than).
- Two-Tail T-Test: Tests for a significant effect in either direction (greater or less than), without committing to one specific direction before the test.
Choose a one-tail test if you have a specific hypothesis about the direction of the effect. Use a two-tail test if you're interested in detecting any significant difference, regardless of direction.
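The choice of tails changes which critical value applies, and so can change the verdict. The sketch below uses standard t-table values for df = 18 at the 0.05 level (2.101 two-tailed, 1.734 one-tailed); the t-statistic of 1.9 and the function names are hypothetical.

```python
# Critical values from a standard t-table for df = 18 at the 0.05 level
CRITICAL_TWO_TAIL = 2.101  # 0.025 in each tail
CRITICAL_ONE_TAIL = 1.734  # 0.05 in a single tail


def significant_two_tail(t):
    """Any large difference counts, whatever its sign."""
    return abs(t) > CRITICAL_TWO_TAIL


def significant_one_tail_greater(t):
    """Only a large positive t counts (hypothesis: group 1 mean is greater)."""
    return t > CRITICAL_ONE_TAIL


t = 1.9  # hypothetical t-statistic
print(significant_two_tail(t))           # False
print(significant_one_tail_greater(t))   # True
print(significant_one_tail_greater(-t))  # False: the effect is in the wrong direction
```

The one-tail test is easier to pass in the predicted direction, which is why the direction has to be fixed before looking at the data.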
Single Set of Data
For one set of data we can calculate a t value like this:
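A common form of the one-sample t-statistic compares the sample mean with a hypothesized population mean. The symbol $\mu_0$ for that hypothesized value is an assumption made here; $\bar{x}$, $s$ and $n$ are the sample mean, standard deviation and size.

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad df = n - 1
```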