One-sample t-test

The most basic statistical test uses the observations from a single sample to test a population mean against a specified value. The R command to conduct this test is t.test().

Let's take a look at the documentation.

?t.test

The t.test() command can perform one-sample as well as two-sample t-tests. In this chapter, we will focus on one-sample tests. The most important arguments of the t.test() function are:

x: the vector of data values you want to test,
y: an optional second vector of data values, needed only for a two-sample test,
alternative: whether you want a two-sided, left-tailed or right-tailed t-test, specified with "two.sided", "less" or "greater" respectively; the default is "two.sided",
mu: the hypothesized value of the mean (the value under the null hypothesis) that you want to test your observations against,
paired: whether to perform a paired t-test, either TRUE or FALSE; the default is FALSE,
conf.level: the confidence level of the interval; the default is 0.95.
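
To see how these arguments combine, here is a minimal sketch on simulated data. The vector x, the seed and the hypothesized mean of 20 are made up for illustration and have nothing to do with the air quality data used later.

```r
# Simulated data, for illustration only (not real measurements)
set.seed(1)
x <- rnorm(50, mean = 25, sd = 10)

# Two-sided test (the default): H0: mu = 20 versus Ha: mu != 20
two_sided <- t.test(x, mu = 20)

# Right-tailed test: Ha: mu > 20
right <- t.test(x, mu = 20, alternative = "greater")

# Left-tailed test with a 99% confidence level: Ha: mu < 20
left <- t.test(x, mu = 20, alternative = "less", conf.level = 0.99)

print(right)
```

All three calls produce the same t statistic; only the p-value and the confidence interval depend on alternative, and only the interval depends on conf.level.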

One-sample t-test

Let's look at how the t.test() command works by applying it to the air quality data of Amsterdam.

Test whether the mean NO2 concentration at the Amsterdam-Vondelpark station exceeds the legal standard of $40$ μg/m³. Use a one-sample right-tailed t-test on the mean. You can specify the hypotheses as follows:

$H_0$ : $\mu \leq 40$
$H_a$ : $\mu > 40$

Use the separate data frames you created in the previous exercises.

Use the t.test() function in R with the following arguments:

x: use the data frame with the NO2 concentrations at the Amsterdam-Vondelpark station, and don't forget to specify the column in which the concentrations are stored: x = NO2_vp$value.
y: this is a one-sample t-test, so there is no second vector and the argument is omitted.
alternative: you want to test whether the legal standard is exceeded, so this is a right-tailed t-test: alternative = "greater".
mu: the hypothesized value of the mean is 40: mu = 40.
paired: you don't have paired samples; you can keep the default value (=FALSE).
conf.level: you can keep the default confidence level of 0.95.

t.test(x = NO2_vp$value, alternative = "greater", mu = 40)

This command results in the following output:

	One Sample t-test

data:  NO2_vp$value
t = -62.374, df = 1817, p-value = 1
alternative hypothesis: true mean is greater than 40
95 percent confidence interval:
 23.84871      Inf
sample estimates:
mean of x 
 24.26389

This layout is common to many of the standard statistical tests in R. Each part of the output is dissected below.

	One Sample t-test

This should be self-explanatory. It is simply a description of the test that we have asked for. Notice that, because the y argument was omitted, t.test() has automatically performed a one-sample test.

data:  NO2_vp$value

This tells us simply which data are being tested.

t = -62.374, df = 1817, p-value = 1

This is where it gets interesting. We get the $t$ statistic, the associated degrees of freedom, and the $p$-value. The $t$ statistic is strongly negative because the sample mean lies far below $40$, so for a right-tailed test the $p$-value is practically $1$ (R prints it rounded). We can immediately see that $p > 0.05$ and thus that (using $\alpha = 0.05$) the null hypothesis cannot be rejected. In other words, the data give no evidence that the true mean is greater than $40$.
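
The link between the $t$ statistic and the $p$-value can be checked by hand. The statistic is $t = (\bar{x} - \mu_0) / (s / \sqrt{n})$ with $n - 1$ degrees of freedom, and for a right-tailed test the $p$-value is the probability of observing a value at least as large as $t$ under the null hypothesis. The sketch below uses only the rounded numbers printed in the output above; the variable names are chosen here for illustration.

```r
# Values copied from the printed t.test() output (rounded)
t_stat <- -62.374
df <- 1817

# Right-tailed p-value: P(T >= t_stat) for a t distribution with df degrees of freedom
p_right <- pt(t_stat, df = df, lower.tail = FALSE)

# Decision at significance level alpha = 0.05
alpha <- 0.05
reject_h0 <- p_right < alpha

print(p_right)
print(reject_h0)
```

Because the statistic sits far in the left tail, almost all of the probability mass lies to its right, which is why the right-tailed $p$-value is essentially 1.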

alternative hypothesis: true mean is greater than 40

This sentence contains two important pieces of information: 1) the hypothesized value we wanted to test the mean against ($40$) and 2) that this is a right-sided test ("greater than").

95 percent confidence interval:
 23.84871      Inf

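This is the 95% confidence interval for the true mean; that is, the set of (hypothetical) mean values from which the data do not deviate significantly. Because we are running a right-sided test, the interval is one-sided: it has only a lower bound, and the upper bound is reported as "Inf" (infinity). The hypothesized value of $40$ lies inside this interval, which is consistent with not rejecting the null hypothesis.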
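
The lower bound can be reproduced from the other numbers in the output. For a right-sided test it equals the sample mean minus the 95% quantile of the $t$ distribution times the standard error. The sketch below recovers the standard error from the printed $t$ statistic, so the result is only accurate up to the rounding of that output; the variable names are chosen here for illustration.

```r
# Values copied from the printed t.test() output (rounded)
xbar <- 24.26389   # sample mean
mu0 <- 40          # hypothesized mean
t_stat <- -62.374  # t statistic
df <- 1817         # degrees of freedom

# Standard error recovered from t = (xbar - mu0) / se
se <- (xbar - mu0) / t_stat

# One-sided 95% lower confidence bound
lower <- xbar - qt(0.95, df = df) * se

print(lower)
```

The result agrees with the printed lower bound of 23.84871 up to rounding.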

sample estimates:
mean of x 
 24.26389

Lastly, the estimated mean of the sample is given.
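
Each part of the printed output can also be retrieved from the object that t.test() returns, which is convenient when the numbers are needed in further calculations. The sketch below uses simulated values as a stand-in for the real measurements; no2_sim, the seed and the distribution parameters are assumptions made for illustration.

```r
# Simulated stand-in for NO2_vp$value (hypothetical data, for illustration only)
set.seed(42)
no2_sim <- rnorm(200, mean = 24, sd = 11)

# Store the result instead of only printing it
res <- t.test(x = no2_sim, alternative = "greater", mu = 40)

res$statistic    # t statistic
res$parameter    # degrees of freedom
res$p.value      # p-value
res$conf.int     # confidence interval, with the confidence level as attribute
res$estimate     # sample mean
res$null.value   # hypothesized mean (mu)
res$alternative  # direction of the test
```

Storing the result in an object such as res makes it possible to write the decision as code, for example res$p.value < 0.05.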

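The conclusion we can draw from this one-sample t-test is that we cannot reject the null hypothesis: the data provide no evidence that the mean NO2 concentration at the Amsterdam-Vondelpark station exceeds the legal standard. In fact, the sample mean of about $24.26$ μg/m³ lies well below $40$ μg/m³.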

New example