# Simple Linear Regression Tutorial



## Revision as of 07:02, 24 July 2011

## SOCR_EduMaterials_AnalysesActivities - Simple Linear Regression Tutorial

**Simple Linear Regression Tutorial Using LA Neighborhoods Data**

**Data:** We will be using the LA Neighborhoods Data for this tutorial.

**Goal:** Our goal is to use SOCR to predict median household income from a single explanatory variable. In this example, we will use the age variable.

*Step 1:* First, we will import the data into the SOCR Simple Regression Analysis Activity. Head to http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_LA_Neighborhoods_Data#Data_Source and find the table with the data. Select all of the data, and press Ctrl+C (Command+C on Macs) to copy it.

*Step 2:* Next, head to http://socr.ucla.edu/htmls/SOCR_Analyses.html, and find the Simple Regression Analysis Activity in the drop-down menu.

*Step 3:* Now click the “PASTE” button under the drop-down menu. You should now see the data in the window.

*Step 4:* Click on the “MAPPING” tab, and add Income to the dependent variable list and Age to the independent variable list.

*Step 5:* Click “CALCULATE”. You will now be taken to the “RESULTS” tab. Here you can see the regression equation, *R*², the individual residuals, and the mean and standard deviation of both variables.
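For readers who want to see where the numbers on the “RESULTS” tab come from, here is a minimal sketch of an ordinary least-squares fit in plain Python. It is not SOCR's own code; the function name `fit_simple_regression` and the toy values are assumptions made for illustration, and the toy values are not the LA Neighborhoods data.

```python
from statistics import mean


def fit_simple_regression(x, y):
    """Return (intercept, slope, r_squared) for the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of squares of x and sum of cross-products of x and y about their means
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual = observed value minus fitted value
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return intercept, slope, r_squared


# Made-up toy values for illustration only (NOT the LA Neighborhoods data)
toy_age = [25, 30, 35, 40, 45]
toy_income = [30000, 48000, 71000, 88000, 112000]
intercept, slope, r_squared = fit_simple_regression(toy_age, toy_income)
print(f"Income = {intercept:.3f} + {slope:.3f} age, R^2 = {r_squared:.3f}")
```

Pasting the real Age and Income columns into the two lists should give values comparable to those reported by SOCR, up to rounding.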

*Step 6:* Click “GRAPH”. Here is the scatterplot of Income vs. Age. We see an upward trend: as median age increases, so does median household income. There are also residual plots and the Normal QQ Plot.

*Step 7:* We want to check the assumptions of linear regression and make sure that they are met.

Assumption 1: There is a linear relationship between the independent variable (age) and the dependent variable (income)

- How to check: Make a scatter plot of income and age
- How to fix: Apply a transformation (for example, log(y) vs. x). If no transformation helps, the relationship is not linear and a linear model is not appropriate.

**Assumption Met**

Assumption 2: The variance is constant

- How to check: Look at plot of residuals vs. predicted values. Make sure there is not a pattern, such as the residuals getting larger as the predicted values increase.
- How to fix: Take the logarithm of one or both variables, or fix the underlying independence or linearity problems.

**Slight increase of residuals at the high end of age**
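As a rough numeric companion to the residual plot, the sketch below compares the typical residual size in the lower and upper halves of the predicted values. This is only an informal heuristic assumed here for illustration, not a formal test and not something SOCR reports; the function name `residual_spread_ratio` and the toy values are hypothetical.

```python
from statistics import mean


def residual_spread_ratio(fitted, residuals):
    """Ratio of mean |residual| in the upper half of fitted values to the lower half.

    A ratio near 1 is consistent with constant variance; a ratio well above 1
    suggests the residuals grow as the predicted values increase.
    """
    if len(fitted) != len(residuals) or len(fitted) < 2:
        raise ValueError("need at least two (fitted, residual) pairs of equal length")
    # Order the residuals by their fitted (predicted) value
    pairs = sorted(zip(fitted, residuals), key=lambda p: p[0])
    half = len(pairs) // 2
    low = [abs(r) for _, r in pairs[:half]]
    high = [abs(r) for _, r in pairs[-half:]]
    # Note: raises ZeroDivisionError if every residual in the lower half is exactly 0
    return mean(high) / mean(low)


# Made-up toy values for illustration only (NOT the LA Neighborhoods data)
toy_fitted = [10, 20, 30, 40]
toy_residuals = [1, -1, 2, -2]
print(residual_spread_ratio(toy_fitted, toy_residuals))
```

The plot remains the primary check; a single ratio can hide patterns such as a bulge in the middle of the range.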

Assumption 3: Errors are normally distributed

- How to check: Normal QQ Plot (the points should lie close to a straight line)
- How to fix: Remove outliers, if applicable. A non-linear transformation may be needed.

**Assumption Met**
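To make the Normal QQ Plot less of a black box, the following sketch computes the points such a plot displays: the sorted residuals paired with the matching quantiles of a standard normal distribution. The plotting positions (i + 0.5)/n are one common convention and an assumption here; SOCR may use a different one. The function name `normal_qq_points` and the toy residuals are hypothetical.

```python
from statistics import NormalDist


def normal_qq_points(residuals):
    """Return (theoretical_quantile, observed_residual) pairs for a normal QQ plot."""
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i + 0.5) / n keep the probabilities strictly inside (0, 1)
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


# Made-up toy residuals for illustration only (NOT from the LA Neighborhoods data)
for q, r in normal_qq_points([2.0, -1.0, 0.0]):
    print(f"{q:8.3f} {r:8.3f}")
```

If the errors are roughly normal, plotting these pairs gives points that fall near a straight line.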

**Conclusions**

There is no major violation of the linear regression assumptions, so we proceed with our analysis.

We can see from the results tab that the regression equation is: Income = -74549.596 + 4096.055 × age

Income is the predicted value, -74549.596 is the intercept, 4096.055 is the slope, and age is the independent variable.

**The linear model states that for every 1-year increase in median age, the predicted median household income increases by about $4,096.06.**
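The fitted equation can be applied directly to obtain a prediction. The sketch below uses the intercept and slope reported above; the function name `predict_income` and the input age of 30 are assumptions chosen only for illustration.

```python
# Coefficients taken from the regression equation reported on the RESULTS tab
INTERCEPT = -74549.596
SLOPE = 4096.055


def predict_income(median_age):
    """Predicted median household income for a neighborhood with the given median age."""
    return INTERCEPT + SLOPE * median_age


# Hypothetical input: a neighborhood whose median age is 30
print(f"{predict_income(30):.2f}")
# The difference between two ages one year apart equals the slope
print(f"{predict_income(31) - predict_income(30):.3f}")
```

For a median age of 30 the model predicts about $48,332.05. Predictions are most trustworthy for ages within the range observed in the data.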