
T-Test Concept

A t-test, also referred to as “Student’s t-test”, is a statistical hypothesis test that researchers use to answer simple questions about one or two means of normally distributed variables such as weight, height or the score in an exam. The two most commonly applied kinds are the one-sample t-test and the two-sample t-test (for independent or paired samples).

The one-sample t-test compares a sample mean with a known population mean, for example when a researcher wants to determine whether the mean of a single variable differs from an established standard such as a known population mean. The two-sample t-test, on the other hand, compares the means of two samples, which may be independent or dependent (paired).

In this tutorial, I will explain the concept of the one-sample t-test; the two-sample t-test will be covered in the next tutorial.

Consider the following example:

A researcher wants to determine whether students of the Indian Institutes of Technology (IITs) score better on the GMAT, taken for admission to Stanford University, than the mean score of all students who take the GMAT.

To test this, she collects a small sample (n = 30) of scores from IIT students who took the GMAT. She also has access to the population mean score, which we will assume is 650 (out of a maximum of 800).

The following table lists the GMAT scores of the 30 IIT students:

Student   GMAT Score   Student   GMAT Score
1         724          16        715
2         747          17        737
3         684          18        648
4         697          19        686
5         646          20        666
6         645          21        705
7         641          22        704
8         679          23        693
9         675          24        748
10        685          25        648
11        725          26        704
12        644          27        741
13        667          28        667
14        710          29        712
15        733          30        653


W. S. Gosset, writing under the pen name “Student”, first introduced the t-statistic, which is why the test is also called Student’s t-test. The one-sample t-statistic is calculated using the following formula:

$$ t = \frac{M_s - \mu}{\varepsilon_s} $$

where Ms = mean of the sample,

μ = population mean,

εs = standard error of the mean.

The standard error of the mean is given by:

$$ \varepsilon_s = \frac{\sigma_x}{\sqrt{N}} $$

where σx is the sample standard deviation and N is the sample size.
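The two formulas can be combined into a small R helper. This is a minimal sketch that works from summary statistics rather than raw data; the function name `one_sample_t` and its argument names are hypothetical, chosen only for illustration, and the inputs are the sample mean and standard deviation of the IIT scores reported later in this tutorial.

```r
# Hypothetical helper: one-sample t statistic from summary statistics
one_sample_t <- function(sample_mean, sample_sd, n, mu) {
  std_error <- sample_sd / sqrt(n)           # standard error: sigma_x / sqrt(N)
  t_value <- (sample_mean - mu) / std_error  # t = (Ms - mu) / standard error
  c(std_error = std_error, t = t_value, df = n - 1)
}

# IIT sample (mean 690.9667, SD 33.85617, n = 30) against mu = 650
# std_error is about 6.18, t is about 6.63, df = 29
one_sample_t(sample_mean = 690.9667, sample_sd = 33.85617, n = 30, mu = 650)
```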

Because N appears under a square root in the denominator, the standard error of the mean decreases as the sample size increases; for example, quadrupling N halves εs.
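To see the effect numerically, the sketch below holds the standard deviation fixed at the sample value 33.85617 and computes εs for N = 30, 120 and 480. Keeping the SD constant is an idealizing assumption; in practice the SD of a larger sample would vary somewhat.

```r
# Assumption: the sample SD stays at 33.85617 as the sample size grows
sample_sd <- 33.85617
sizes <- c(30, 120, 480)

# Standard error of the mean for each sample size
std_errors <- sample_sd / sqrt(sizes)
print(round(std_errors, 3))
# [1] 6.181 3.091 1.545
```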

Now let us use R to apply the t-test.

######################## R Code ##############

# I have stored the GMAT scores from the table above in a CSV file named "gmat.csv" (column GMATScore)

# Read the CSV file into the R environment

gmat <- read.csv("gmat.csv", header = TRUE)

# Box plot to visualize the data

boxplot(gmat, ylab = "GMAT scores", xlab = "IIT Students")

# Print the data
gmat


# Mean and SD of the sample
mean(gmat$GMATScore)

# [1] 690.9667

sd(gmat$GMATScore)

# [1] 33.85617

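# Single sample t-test using R's function t.test()
# Note: without the mu argument, t.test() compares the mean with 0 (its default)

t.test(gmat)

# Returns the following result:

# One Sample t-test
# data: gmat

# t = 111.7841, df = 29, p-value < 2.2e-16
# alternative hypothesis: true mean is not equal to 0

# 95 percent confidence interval:
# 678.3246 703.6088

# sample estimates:
# mean of x = 690.9667

# To compare the sample mean with the population mean of 650, pass it as mu:
t.test(gmat$GMATScore, mu = 650)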

————————————————————————————————————————————————————–
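If you do not have the CSV file, the analysis can be reproduced by typing the table into a vector. The sketch below (the vector name `scores` is chosen for illustration) runs the test against the population mean of 650 through the `mu` argument of `t.test()`, which is what the research question calls for. From the formula, the t statistic should come out near 6.63 with 29 degrees of freedom, while the 95 percent confidence interval matches the one in the output above, since the interval does not depend on mu.

```r
# GMAT scores of the 30 IIT students, copied from the table above
scores <- c(724, 747, 684, 697, 646, 645, 641, 679, 675, 685,
            725, 644, 667, 710, 733, 715, 737, 648, 686, 666,
            705, 704, 693, 748, 648, 704, 741, 667, 712, 653)

# Optional: recreate the CSV file assumed in this tutorial (column GMATScore)
# write.csv(data.frame(GMATScore = scores), "gmat.csv", row.names = FALSE)

# One-sample t-test against the population mean of 650
result <- t.test(scores, mu = 650)
print(result)

# Individual components of the returned "htest" object
result$statistic  # t statistic
result$parameter  # degrees of freedom
result$conf.int   # 95 percent confidence interval
```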

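Because t.test(gmat) was called without mu, the t = 111.78 in this output tests whether the mean score differs from 0, not from the population mean of 650; note also that 111.78 is the computed t statistic, not a critical value. For the researcher's question, substituting the sample values into the formula with μ = 650 gives t = (690.9667 - 650) / (33.85617 / √30) ≈ 40.97 / 6.18 ≈ 6.63 with 29 degrees of freedom. The two-tailed critical value of t for 29 df at α = 0.05 is about 2.045, and 6.63 far exceeds it, so the difference between the IIT students' mean score and the population mean is statistically significant.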
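The critical value and p-value can be obtained in R with `qt()` and `pt()`, the quantile and distribution functions of the t distribution. This sketch starts from the rounded summary statistics printed earlier rather than the raw data, so its t statistic may differ from the `t.test()` result in the last decimal places.

```r
# Summary statistics of the IIT sample (from the R output above)
sample_mean <- 690.9667
sample_sd <- 33.85617
n <- 30
mu <- 650

t_stat <- (sample_mean - mu) / (sample_sd / sqrt(n))
dof <- n - 1

# Two-tailed critical value at alpha = 0.05
t_crit <- qt(0.975, df = dof)

# Two-tailed p-value for the observed t statistic
p_value <- 2 * pt(-abs(t_stat), df = dof)

cat("t =", round(t_stat, 3), "; critical t =", round(t_crit, 3), "\n")
cat("reject H0 at alpha = 0.05:", abs(t_stat) > t_crit, "\n")
```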

Interpretation:

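The researcher’s hypothesis was confirmed. She found a significant difference between the GMAT scores of IIT students and the overall population mean of 650, and since the sample mean (about 691) lies above 650, the IIT students scored higher on average.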
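Strictly speaking, the research question asks whether IIT students score better, which is a one-sided hypothesis (H1: μ > 650), while `t.test()` defaults to a two-sided alternative. A sketch of the one-sided version from the same summary statistics follows; with the raw scores in an assumed numeric vector `scores`, the equivalent call would be `t.test(scores, mu = 650, alternative = "greater")`.

```r
# Same summary statistics as above
sample_mean <- 690.9667
sample_sd <- 33.85617
n <- 30
mu <- 650

t_stat <- (sample_mean - mu) / (sample_sd / sqrt(n))

# One-sided p-value for H1: true mean > 650 (upper tail only)
p_one_sided <- pt(t_stat, df = n - 1, lower.tail = FALSE)

# One-sided critical value at alpha = 0.05
t_crit_one_sided <- qt(0.95, df = n - 1)

cat("one-sided p-value:", signif(p_one_sided, 3), "\n")
cat("one-sided critical t:", round(t_crit_one_sided, 3), "\n")
```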
