7. Hypothesis Testing: Practical 7
One-sample t-test
The most basic statistical test uses the observations from one sample to test a population mean against a specific value. The R command to conduct this test is t.test().
Let's take a look at the documentation.
?t.test
The t.test() command can perform one-sample as well as two-sample t-tests. In this chapter, we will focus on one-sample tests. The most important arguments of the t.test() function are:
- x: the vector of data values you want to test,
- y: optional second vector of data values if you perform a two-sample test,
- alternative: the direction of the test: "two.sided" (two-sided, the default), "less" (left-tailed) or "greater" (right-tailed),
- mu: the hypothesized value of the mean that you want to test your observations against (the default is 0),
- paired: either TRUE or FALSE. The default is FALSE,
- conf.level: confidence level of the interval, the default value is 0.95.
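As a minimal sketch of how these arguments interact, the snippet below runs the same one-sample test with each of the three settings of alternative. The vector x holds simulated, hypothetical measurements (not the Amsterdam data), and the variable names are chosen for illustration only.

```r
# Hypothetical data: 50 simulated measurements, NOT the Amsterdam air quality data
set.seed(1)
x <- rnorm(50, mean = 42, sd = 5)

# Same data and same hypothesized mean, three different alternatives
two_sided <- t.test(x, mu = 40)                           # default: "two.sided"
right     <- t.test(x, alternative = "greater", mu = 40)  # right-tailed
left      <- t.test(x, alternative = "less", mu = 40)     # left-tailed

# Collect the three p-values for comparison
p_values <- c(two_sided = two_sided$p.value,
              greater   = right$p.value,
              less      = left$p.value)
print(p_values)
```

The two one-sided p-values add up to 1, and the two-sided p-value is twice the smaller of the two, so choosing the direction of the test before looking at the data matters.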
One-sample t-test
Let's look at how the t.test() command works by applying it to the air quality data of Amsterdam.
Test whether the mean NO2 concentration at the Amsterdam-Vondelpark station exceeds the legal standard of #40# μg/m3. Use a one-sample right-tailed t-test on the mean. You can specify the hypotheses as follows:
- #H_0#: #\mu \leq 40#
- #H_a#: #\mu > 40#
Use the separate data frames you created in the previous exercises.
Use the t.test() function in R with the following arguments:
- x: You need the data frame for the NO2 concentration at the Amsterdam-Vondelpark station. Don't forget to specify the column in which the concentrations are stored: x = NO2_vp$value.
- y: this is a one-sample t-test, so there is no second vector.
- alternative: You want to test whether the legal standard is exceeded, so this is a right-tailed t-test: alternative = "greater".
- mu: the value to test against is 40: mu = 40.
- paired: you don't have paired samples; you can keep the default value (FALSE).
- conf.level: you can keep the default confidence level of 0.95.
t.test(x = NO2_vp$value, alternative = "greater", mu = 40)
This command results in the following output:
One Sample t-test
data: NO2_vp$value
t = -62.374, df = 1817, p-value = 1
alternative hypothesis: true mean is greater than 40
95 percent confidence interval:
23.84871 Inf
sample estimates:
mean of x
24.26389
The layout is common to many of the standard statistical tests and a "dissection" is given in the following:
One Sample t-test
This should be self-explanatory. It is simply a description of the test that we have asked for. Notice that by omitting the y-argument, t.test() has automatically found out that a one-sample test is desired.
data: NO2_vp$value
This tells us simply which data are being tested.
t = -62.374, df = 1817, p-value = 1
This is where it gets interesting. We get the #t# statistic, the associated degrees of freedom, and the #p#-value (here rounded to 1). We can immediately see that #p > 0.05# and thus that (using #\alpha = 0.05#) the null hypothesis cannot be rejected. In other words, we cannot conclude that the population mean is significantly greater than #40#. The strongly negative #t# statistic reflects that the sample mean lies far below #40#.
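To make the three numbers on this line less of a black box, here is a small sketch that recomputes them by hand and compares them with what t.test() reports. It uses the standard one-sample formula #t = (\bar{x} - \mu_0) / (s / \sqrt{n})# with #n - 1# degrees of freedom. The data are simulated and hypothetical, and the names mu0, t_stat and p_value are illustrative.

```r
# Hypothetical data: 30 simulated measurements, NOT the Amsterdam data
set.seed(42)
x   <- rnorm(30, mean = 25, sd = 10)
mu0 <- 40                                   # hypothesized mean under H_0

n      <- length(x)
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))   # t statistic
df     <- n - 1                                 # degrees of freedom

# Right-tailed test: p-value is the area to the RIGHT of t_stat
p_value <- pt(t_stat, df = df, lower.tail = FALSE)

# Compare with the built-in test
res <- t.test(x, alternative = "greater", mu = mu0)
print(all.equal(unname(res$statistic), t_stat))
print(all.equal(res$p.value, p_value))
```

Because the sample mean lies below the hypothesized value, the #t# statistic is negative and nearly all of the distribution lies to its right, which is why a right-tailed test then gives a #p#-value close to 1.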
alternative hypothesis: true mean is greater than 40
This sentence contains two important pieces of information: 1) the hypothesized value we wanted to test the mean against (#40#) and 2) that this is a right-sided test ("greater than").
95 percent confidence interval:
23.84871 Inf
This is the 95% confidence interval for the true mean; that is, the set of (hypothetical) mean values from which the data do not deviate significantly. Because we're running a right-sided test, the interval is one-sided: it only has a lower bound, and the upper boundary says "Inf" (infinite).
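The lower bound can be reconstructed from the numbers in the output above. The sketch below assumes the usual one-sided formula, lower bound = mean minus the 0.95 quantile of the #t# distribution times the standard error, and backs the standard error out of the reported (rounded) #t# statistic. Variable names are illustrative.

```r
# Values copied from the t.test() output shown above (rounded as printed)
x_bar  <- 24.26389   # sample mean
t_stat <- -62.374    # reported t statistic
df     <- 1817       # reported degrees of freedom
mu0    <- 40         # hypothesized mean

# Since t = (x_bar - mu0) / se, the standard error follows from the output
se <- (x_bar - mu0) / t_stat

# One-sided 95% interval: all probability mass of alpha = 0.05 sits in one tail
lower <- x_bar - qt(0.95, df = df) * se
print(lower)
```

Up to rounding, the result should agree with the lower bound reported by t.test(). Note that the quantile used is qt(0.95, ...) rather than qt(0.975, ...), because the interval is one-sided.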
sample estimates:
mean of x
24.26389
Lastly, the estimated mean of the sample is given.
The conclusion we can draw from this one-sample t-test is that we cannot reject the null hypothesis: the data provide no evidence that the mean NO2 concentration at this station exceeds the legal standard. In fact, the sample mean (#24.26#) lies well below #40#.
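The same decision can also be made in code instead of by reading the printed output. t.test() returns a list-like object of class "htest", whose components (such as p.value, estimate and conf.int) can be accessed with $. The following is a sketch on simulated, hypothetical data; alpha and reject_h0 are illustrative names.

```r
# Hypothetical data: 40 simulated measurements, NOT the Amsterdam data
set.seed(7)
x     <- rnorm(40, mean = 25, sd = 8)
alpha <- 0.05                       # significance level

res <- t.test(x, alternative = "greater", mu = 40)

# Decision rule: reject H_0 only if the p-value is below alpha
reject_h0 <- res$p.value < alpha

print(res$estimate)    # sample mean
print(res$conf.int)    # one-sided confidence interval (upper bound is Inf)
print(reject_h0)
```

Storing the result in an object like res is convenient when the same test has to be repeated for several stations or pollutants.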