For this assignment I am going to use the following dataset:

ENROLL UNEMPRATE
5501 8.1
5945 7
6629 7.3
7556 7.5
8716 7
9369 6.4
9920 6.5
10167 6.4
11084 6.3
12504 7.7
13746 8.2
13656 7.5
13850 7.4
14145 8.2
14888 10.1
14991 9.2
14836 7.7
14478 5.7
14539 6.5
14395 7.5
14599 7.3
14969 9.2
15107 10.1
14831 7.5
15081 8.8
15127 9.1
15856 8.8
15938 7.8
16081 7

The above data is available to me in CSV format, and I will read it from that file.

Before we begin, you may want to know what this data is about. Each row is one year. ENROLL (column 1) is the total number of students enrolled across all courses at a university in that year, and UNEMPRATE (column 2) is the unemployment rate in the country, in percent, for the same year.

The code below reads this data into a data frame and attaches it. I will run the sample code and then discuss the results.

Reading Data

# Read data from csv into a variable called 'data'
data <- read.csv("DataSetW.csv")
# Attach data variable
attach(data)

# Display the data summary
summary(data)
##       YEAR        ENROLL        UNEMPRATE     
##  Min.   : 1   Min.   : 5501   Min.   : 5.700  
##  1st Qu.: 8   1st Qu.:10167   1st Qu.: 7.000  
##  Median :15   Median :14395   Median : 7.500  
##  Mean   :15   Mean   :12707   Mean   : 7.717  
##  3rd Qu.:22   3rd Qu.:14969   3rd Qu.: 8.200  
##  Max.   :29   Max.   :16081   Max.   :10.100
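
For readers who do not have DataSetW.csv, here is a minimal sketch that rebuilds the same data frame by hand from the values listed at the top of the post. The YEAR column is an assumption: the listing does not show it, so it is taken to be the index 1 to 29 that the summary output implies.

```r
# Rebuild the dataset inline instead of reading DataSetW.csv.
# YEAR is assumed to be a plain index 1..29 (it is not shown in the listing).
data <- data.frame(
  YEAR = 1:29,
  ENROLL = c(5501, 5945, 6629, 7556, 8716, 9369, 9920, 10167, 11084, 12504,
             13746, 13656, 13850, 14145, 14888, 14991, 14836, 14478, 14539,
             14395, 14599, 14969, 15107, 14831, 15081, 15127, 15856, 15938,
             16081),
  UNEMPRATE = c(8.1, 7, 7.3, 7.5, 7, 6.4, 6.5, 6.4, 6.3, 7.7,
                8.2, 7.5, 7.4, 8.2, 10.1, 9.2, 7.7, 5.7, 6.5,
                7.5, 7.3, 9.2, 10.1, 7.5, 8.8, 9.1, 8.8, 7.8,
                7)
)

# Should agree with the summary printed above
summary(data)
```

Building the data frame this way also makes the rest of the code runnable without attach(), since lm() is already given data = data.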

Building Simple Linear Regression Model

# Predict the response variable Enrollments (ENROLL) using the predictor variable 
# unemployment rate (UNEMPRATE)

SLM_ENROLL <- lm(ENROLL ~ UNEMPRATE, data = data)

# Displaying The Linear Model
SLM_ENROLL
## 
## Call:
## lm(formula = ENROLL ~ UNEMPRATE, data = data)
## 
# Parameter Estimate
## Coefficients:
## (Intercept)    UNEMPRATE  
##        3957         1134

 

Summarizing the Model

# Display information about the linear model
summary(SLM_ENROLL)
## 
## Call:
## lm(formula = ENROLL ~ UNEMPRATE, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7640.0 -1046.5   602.8  1934.3  4187.2 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   3957.0     4000.1   0.989   0.3313  
## UNEMPRATE     1133.8      513.1   2.210   0.0358 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3049 on 27 degrees of freedom
## Multiple R-squared:  0.1531, Adjusted R-squared:  0.1218 
## F-statistic: 4.883 on 1 and 27 DF,  p-value: 0.03579

Discussion

    The results of the linear regression model indicate that the
    unemployment rate (Beta = 1134, p = 0.0358)
    is significantly and positively associated with the number of enrollments at the university.
    
    However, the low R-squared (0.1531) means the unemployment rate explains only about 15% of the
    variance in enrollment, so the observations are widely scattered around the regression line.
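
The figures quoted in this discussion can also be pulled out of the fitted model directly rather than read off the printed summary. The following is a minimal sketch; the names fit, coef_table, slope, p_value and r2 are illustrative and not part of the original code, and the data is typed in again so the block runs on its own.

```r
# Same data as listed at the top of the post
ENROLL <- c(5501, 5945, 6629, 7556, 8716, 9369, 9920, 10167, 11084, 12504,
            13746, 13656, 13850, 14145, 14888, 14991, 14836, 14478, 14539,
            14395, 14599, 14969, 15107, 14831, 15081, 15127, 15856, 15938,
            16081)
UNEMPRATE <- c(8.1, 7, 7.3, 7.5, 7, 6.4, 6.5, 6.4, 6.3, 7.7,
               8.2, 7.5, 7.4, 8.2, 10.1, 9.2, 7.7, 5.7, 6.5,
               7.5, 7.3, 9.2, 10.1, 7.5, 8.8, 9.1, 8.8, 7.8,
               7)

# Illustrative name; the post calls the same model SLM_ENROLL
fit <- lm(ENROLL ~ UNEMPRATE)
s <- summary(fit)

# Coefficient matrix with columns Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(s)
slope   <- coef_table["UNEMPRATE", "Estimate"]
p_value <- coef_table["UNEMPRATE", "Pr(>|t|)"]
r2      <- s$r.squared

# These should match the summary output shown earlier
cat(sprintf("slope = %.1f, p = %.4f, R-squared = %.4f\n", slope, p_value, r2))
```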
    

Sample Prediction

# What is the expected enrollment (ENROLL) for a given year's 
# unemployment rate (UNEMPRATE)? 
# Year = 2015
# UNEMPRATE = 10%

# Then, ENROLL is:
3957 + 1134 * 10
## [1] 15297

Prediction Result Summary

For an unemployment rate of 10% in 2015, the model predicts
an enrollment of 15,297 students at the university.
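
As an alternative to the hand calculation, R's predict() can produce the same estimate from the fitted model together with a prediction interval. This is a sketch only: fit, new_year and pred are illustrative names, and the 95% level is an assumption rather than something chosen in the original analysis.

```r
# Same data as listed at the top of the post
ENROLL <- c(5501, 5945, 6629, 7556, 8716, 9369, 9920, 10167, 11084, 12504,
            13746, 13656, 13850, 14145, 14888, 14991, 14836, 14478, 14539,
            14395, 14599, 14969, 15107, 14831, 15081, 15127, 15856, 15938,
            16081)
UNEMPRATE <- c(8.1, 7, 7.3, 7.5, 7, 6.4, 6.5, 6.4, 6.3, 7.7,
               8.2, 7.5, 7.4, 8.2, 10.1, 9.2, 7.7, 5.7, 6.5,
               7.5, 7.3, 9.2, 10.1, 7.5, 8.8, 9.1, 8.8, 7.8,
               7)

fit <- lm(ENROLL ~ UNEMPRATE)

# New observation: unemployment rate of 10%
new_year <- data.frame(UNEMPRATE = 10)

# Returns a one-row matrix with columns fit, lwr, upr
# (95% level is an assumption made for this sketch)
pred <- predict(fit, newdata = new_year, interval = "prediction", level = 0.95)
print(pred)
```

The point estimate can differ from 15,297 by a few students because predict() uses the unrounded coefficients. With a residual standard error of 3049, the interval around it should be expected to be wide.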

 

 

 
