Student's T-Test


Named after William Sealy Gosset, who published under the name "Student" to respect his employer's privacy.

Student's t-test helps us compare two sets of data to see if the difference between their means is just random chance.

There are two main types:

  • Independent samples t-test: compares two distinct groups where members of one group aren't related to members of the other.
  • Paired samples t-test: used when the same participants are in both groups being compared, such as before-and-after observations.

The data should be normally distributed, and the test isn't reliable if the sample sizes are too small or the variances are very different.

Note: you might also like to investigate the Chi-Square Test.

Independent Samples T-Test

We typically use this formula (but there are other formulas):

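$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$$

Where:

  • $\bar{x}_1$ and $\bar{x}_2$ are the means of the two groups
  • $s_1^2$ and $s_2^2$ are the variances of the two groups
  • $n_1$ and $n_2$ are the number of values in each group

Note: we use $s^2$ for variance, because the variance is the square of the standard deviation $s$.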

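To do the t-test:

  1. Calculate the mean of each group
  2. Calculate the variance of each group
  3. Put those values into the formula to get the t-value
  4. Compare the t-value with the critical t-value from a t-table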

In effect the t-value is a measure of how much the means of the two groups differ from each other compared to the variability of the data.

A higher t-value (ignoring its sign) suggests a more significant difference between the groups (i.e. one that is less likely to be due to random chance).

Let's try an example:

Two batteries, one in a blue 'cool' box and one in a red 'hot' box

Example: Battery Life

This is an experiment you can do yourself (at the cost of 20 batteries).

Two batches of identical batteries are tested:

  • One group of 10 were tested in a cool environment
  • Another group of 10 were tested in a hot environment

The batteries power identical devices until they run out of charge, with these results (in hours):

  • Cool Group: 85, 90, 76, 88, 93, 87, 91, 89, 95, 85
  • Hot Group: 78, 82, 79, 88, 91, 85, 90, 87, 84, 86

Let's calculate the mean for each:

  • Cool Group Mean = $\frac{85+90+76+88+93+87+91+89+95+85}{10}$ = 87.9 hours
  • Hot Group Mean = $\frac{78+82+79+88+91+85+90+87+84+86}{10}$ = 85 hours

The Cool Group lasted, on average, about 3 hours longer.
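The same means can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the variable names `cool` and `hot` are just labels chosen here for the two groups.

```python
from statistics import mean

# Battery running times in hours, copied from the results above
cool = [85, 90, 76, 88, 93, 87, 91, 89, 95, 85]
hot = [78, 82, 79, 88, 91, 85, 90, 87, 84, 86]

cool_mean = mean(cool)  # sum of the values divided by how many there are
hot_mean = mean(hot)

print("Cool mean:", cool_mean)
print("Hot mean:", hot_mean)
print("Difference:", round(cool_mean - hot_mean, 1))
```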


But is this a significant result, or just random in nature?

Now it's time to calculate the t-statistic.

The first step is to calculate the variance of each group.

Variance is a measure of how much the data points spread out from the mean. For a sample, we calculate it by adding up the squared differences between each data point and the mean, then dividing by n − 1, as shown here:

For the Cool Group:

  • n = 10
  • Mean = 87.9
  • Variance = $\frac{(85-87.9)^2+(90-87.9)^2+(76-87.9)^2+(88-87.9)^2+(93-87.9)^2+(87-87.9)^2+(91-87.9)^2+(89-87.9)^2+(95-87.9)^2+(85-87.9)^2}{9}$ = 27.88

For the Hot Group:

  • n = 10
  • Mean = 85
  • Variance = $\frac{(78-85)^2+(82-85)^2+(79-85)^2+(88-85)^2+(91-85)^2+(85-85)^2+(90-85)^2+(87-85)^2+(84-85)^2+(86-85)^2}{9}$ = 18.89
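
The variance calculation is easy to get wrong by hand, so here is a short Python sketch of it. The helper name `sample_variance` is made up for this illustration; it divides by n − 1, which is also what the standard library's `statistics.variance` does.

```python
from statistics import mean, variance

cool = [85, 90, 76, 88, 93, 87, 91, 89, 95, 85]
hot = [78, 82, 79, 88, 91, 85, 90, 87, 84, 86]

def sample_variance(values):
    """Sum of squared differences from the mean, divided by n - 1."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)

cool_var = sample_variance(cool)
hot_var = sample_variance(hot)

print("Cool variance:", round(cool_var, 2))
print("Hot variance:", round(hot_var, 2))

# Cross-check against the built-in sample variance
print("Built-in agrees:", abs(cool_var - variance(cool)) < 1e-9)
```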

Start with:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$$

Put in our values:

$$t = \frac{87.9 - 85}{\sqrt{27.88/10 + 18.89/10}}$$

Calculate:

$$t = 1.341$$
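
As a check on the arithmetic, the formula can be written as a small Python function. This is a sketch of the formula used on this page only (the function name `independent_t` is hypothetical), not a full statistics library.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(a, b):
    """t = (mean_a - mean_b) / sqrt(var_a / n_a + var_b / n_b)."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error

cool = [85, 90, 76, 88, 93, 87, 91, 89, 95, 85]
hot = [78, 82, 79, 88, 91, 85, 90, 87, 84, 86]

t = independent_t(cool, hot)
print("t =", round(t, 3))
```

If SciPy happens to be installed, `scipy.stats.ttest_ind(cool, hot, equal_var=False)` should report the same statistic, since that option uses the same unpooled formula.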

Now let's compare our t-value to the "critical t-value" from the Student's t-test Table. To look it up we need:

  • a significance level: typically 0.05 (a 95% confidence level)
  • the degrees of freedom: df = $n_1 + n_2 - 2$ = 18
  • the two-tails value, as the change in battery life could theoretically go either way

The table shows us that for 0.05 and df=18 we get a "critical t-value" of 2.101

Our actual t-value is 1.341. That is less than 2.101, so our result is not significant at the 0.05 level.

So there's not enough evidence to say there's a significant difference between the cool and hot groups.
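
Another way to express the same comparison is a p-value: the probability of getting a t-value at least this far from zero if there were really no difference. The sketch below estimates it by integrating the t-distribution's density with Simpson's rule. The function names and the choice of numerical method are assumptions made for this illustration, and it uses only the standard library.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value, using Simpson's rule on the interval [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # area between -|t| and +|t|
    return 1 - central_area

p = two_tailed_p(1.341, 18)
print("p-value:", round(p, 3))
print("Significant at 0.05:", p < 0.05)
```

A p-value above 0.05 leads to the same conclusion as a t-value below the critical value: not enough evidence of a difference.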

Hypothesis

The t-test is often done more formally using the idea of a hypothesis and null hypothesis.

A hypothesis is a testable, informed guess about a relationship between variables.

Example: Fruits and Vegetables

You want to investigate whether eating more fruits and vegetables reduces the risk of heart disease. Your hypothesis might be:

"Consuming at least 5 servings of fruits and vegetables daily leads to a lower incidence of heart disease among adults."

In this case, your hypothesis is testable because you can collect data on people's fruit and vegetable consumption and compare it to their heart health outcomes.

But we usually take the opposing point of view:

A "Null Hypothesis" is a statement that there's no significant relationship between two variables. It is the "default position" that any observed effect is just due to chance.

The null hypothesis is the baseline assumption for statistical analysis.

The goal of statistical analysis is often to provide evidence that lets us reject the null hypothesis.

Rejecting the null hypothesis gives credibility to (but doesn't prove) our original hypothesis. In other words if we can show the skeptic is likely wrong, we can suggest our original idea might be right.

Example continued

The Null Hypothesis might be:

"Consuming at least 5 servings of fruits and vegetables daily does not lead to a significantly lower incidence of heart disease among adults."

If our data does not support the Null Hypothesis then we have evidence for our original hypothesis.

Hypothesis Steps

  1. Formulate the null hypothesis (H0): There's no significant difference between the two groups
  2. Formulate the alternative hypothesis (Ha): There is a significant difference between the two groups
  3. Choose a significance level (commonly 0.05), which is the probability of rejecting the null hypothesis when it is actually true
  4. Collect the data and calculate the t-statistic based on the two sample means, standard deviations, and sample sizes
  5. Compare the calculated t-statistic to the critical value from the t-distribution table to decide whether to reject the null hypothesis or fail to reject it
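
The final step is a simple comparison, sketched below in Python. The function name `decide` is hypothetical, and the example t-values are made up purely to show both outcomes; 2.101 is the two-tailed critical value for 0.05 and df = 18.

```python
def decide(t_value, critical_value):
    """Two-tailed decision: reject H0 only if |t| exceeds the critical value."""
    if abs(t_value) > critical_value:
        return "reject H0"
    return "fail to reject H0"

CRITICAL = 2.101  # two-tailed, significance level 0.05, df = 18

# Made-up t-values, only to show how the decision works
print(decide(2.5, CRITICAL))
print(decide(-2.5, CRITICAL))  # the sign doesn't matter in a two-tailed test
print(decide(1.0, CRITICAL))
```

Note that the function never says "accept H0": a small t-value only means the data did not give enough evidence against the null hypothesis.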

Let's do our original example that way!


Example: Battery Life

Null Hypothesis (H0): The difference between a cool and hot environment has no effect on a battery's running time

Alternative Hypothesis (Ha): The difference between a cool and hot environment will affect a battery's running time

Using the values calculated earlier:

$$t = \frac{87.9 - 85}{\sqrt{27.88/10 + 18.89/10}} = 1.341$$

From the Student's t-test Table for 0.05 (which is a 95% confidence level) and df=18 we get a "critical t-value" of 2.101

Since our calculated t-value of 1.341 is less than the critical t-value of 2.101, we don't have enough evidence to reject the null hypothesis.

We can't reject the null hypothesis

This means that based on our sample, we can't confidently say a statistically significant difference exists.

In everyday terms, it's as if we're saying, "Based on what we've seen, we can't conclude that the change is anything more than just random chance."

Paired Samples T-Test

A paired samples t-test, also known as a dependent samples t-test, is a statistical test used to compare two means (averages) when the data is paired in some way, such as in a before-and-after trial.

So the same subjects are involved in both sets of data.

Step-by-Step Calculation

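To do the Paired Samples t-test:

  1. Calculate the difference (d) for each pair
  2. Calculate the mean of the differences
  3. Calculate the standard deviation (s) of the differences
  4. Calculate the t-value
  5. Compare the t-value with the critical t-value for n − 1 degrees of freedom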

Example: Test Scores

We have test scores from 7 students before and after they did a special math program. Let's compare the two sets of scores!

Null Hypothesis (H0): The special math program has no effect on test scores

Alternative Hypothesis (Ha): The special math program does affect test scores

  • Results Before: 72, 75, 80, 72, 68, 92, 84
  • Results After: 84, 74, 86, 79, 78, 91, 88

Put that in a table and do some calcs:

Before   After   Diff (d)   $(d - \bar{d})^2$
  72       84       12         45.08
  75       74       -1         39.51
  80       86        6          0.51
  72       79        7          2.94
  68       78       10         22.22
  92       91       -1         39.51
  84       88        4          1.65
Sum:                37        151.42

Mean of Differences ($\bar{d}$) = 37/7 = 5.29

Variance (var) = 151.42/(7−1) = 25.24

Standard Deviation (s) = √(25.24) = 5.02

We can calculate the t-value using either one of these formulas (they are mathematically equivalent):

$$t = \frac{\bar{d}}{s/\sqrt{n}} = \frac{5.29}{5.02/\sqrt{7}} = 2.78$$

$$t = \frac{\bar{d}}{\sqrt{\text{var}/n}} = \frac{5.29}{\sqrt{25.24/7}} = 2.78$$
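
The whole paired calculation fits in a few lines of Python. This is a minimal sketch with the standard library; the variable names are chosen here for readability and simply mirror the symbols in the table.

```python
from math import sqrt
from statistics import mean, stdev

before = [72, 75, 80, 72, 68, 92, 84]
after = [84, 74, 86, 79, 78, 91, 88]

# One difference per student: after minus before
diffs = [a - b for a, b in zip(after, before)]

d_bar = mean(diffs)   # mean of the differences
s = stdev(diffs)      # sample standard deviation (divides by n - 1)
n = len(diffs)

t = d_bar / (s / sqrt(n))

print("Differences:", diffs)
print("Mean:", round(d_bar, 2), "Std dev:", round(s, 2))
print("t =", round(t, 2))
```

If SciPy is available, `scipy.stats.ttest_rel(after, before)` is the usual library route and should agree with this value.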

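Now let's compare our t-value to the critical t-value in the table. With n = 7 pairs we have df = n − 1 = 6, and the two-tails critical t-value at the 0.05 level is 2.447.

Our t-value of 2.78 is greater than 2.447, so we reject the null hypothesis. If the program had no effect, a result this extreme would happen less than 5% of the time, so we have good reason to believe the alternative hypothesis "The special math program does affect test scores".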

One-Tail and Two-Tail T-Tests

Choose a one-tail test if you have a specific hypothesis about the direction of the effect. Use a two-tail test if you're interested in detecting any significant difference, regardless of direction.
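
The difference shows up in how the t-value is compared with the critical value. Here is a small Python sketch, where the function name `is_significant` and the `tail` labels are assumptions for this illustration. The critical values are the standard table entries for 0.05 and df = 18, and the t-value of 1.9 is made up to show a case where the two tests disagree.

```python
def is_significant(t_value, critical_value, tail="two"):
    """Compare t with a critical value for the chosen kind of test.

    tail="two":     any difference counts, so compare |t|
    tail="greater": only a positive effect counts
    tail="less":    only a negative effect counts
    """
    if tail == "two":
        return abs(t_value) > critical_value
    if tail == "greater":
        return t_value > critical_value
    if tail == "less":
        return t_value < -critical_value
    raise ValueError("tail must be 'two', 'greater' or 'less'")

# Critical values for df = 18 at the 0.05 level
TWO_TAIL_CRITICAL = 2.101
ONE_TAIL_CRITICAL = 1.734

t = 1.9  # made-up t-value for illustration
print("Two-tail:", is_significant(t, TWO_TAIL_CRITICAL, "two"))
print("One-tail:", is_significant(t, ONE_TAIL_CRITICAL, "greater"))
```

The one-tail critical value is smaller, so a one-tail test detects an effect in the expected direction more easily, but it cannot detect an effect in the other direction at all. The direction should be chosen before looking at the data.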

Single Set of Data

For one set of data we can compare our sample mean ($\bar{x}$) to a known or hypothesized population mean ($\mu$):

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
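
Here is a minimal Python sketch of that formula. The sample values and the hypothesized mean of 5.0 are invented purely for illustration, and the function name `one_sample_t` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(values, mu):
    """t = (sample mean - mu) / (s / sqrt(n))."""
    n = len(values)
    return (mean(values) - mu) / (stdev(values) / sqrt(n))

# Hypothetical measurements, compared against a hypothesized mean of 5.0
sample = [5.1, 4.9, 5.3, 5.2, 4.8]
t = one_sample_t(sample, 5.0)

print("t =", round(t, 3))
print("df =", len(sample) - 1)
```

The resulting t-value is compared with the critical value for n − 1 degrees of freedom, just as in the paired test.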