# SOCR EduMaterials AnalysisActivities MLR


## Current revision as of 17:00, 24 September 2012


## SOCR Analysis Example on Multiple Linear Regression

This SOCR Activity demonstrates the use of the SOCR Analyses package for statistical computing. In particular, it shows how to use Multiple Linear Regression and how to read the output results.

## Multiple Linear Regression Background

Multiple Linear Regression is a class of statistical analysis models and procedures that takes one dependent variable and one or more independent variables, all quantitative, and models the relationship between them. SOCR has another activity for Simple Linear Regression, which allows only one independent variable in the input. SOCR Multiple Linear Regression, by contrast, allows one or more independent variables. In the classical linear model, the errors are commonly assumed to follow a normal distribution with mean zero and constant variance.

The goal of the Multiple Linear Regression computing procedure is to estimate all of the coefficients based on the data. Least Squares Fitting is used.
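In matrix notation, the model and its least squares estimate can be sketched as follows. The symbols y (response vector), ε (error vector) and the hat on β are notation assumed here for illustration; X and β are the design matrix and parameter vector. The closed form holds only when X has full column rank.

```latex
y = X\beta + \varepsilon, \qquad
\hat{\beta} \;=\; \arg\min_{\beta}\, \lVert y - X\beta \rVert^{2}
\;=\; (X^{\top}X)^{-1}X^{\top}y
```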

In this activity, the students can learn about:

- Reading results of Multiple Linear Regression;
- Making interpretation of the coefficients;
- Observing and interpreting various data and resulting plots:
  - Scatter plots of the dependent vs. independent variables
  - Diagnostic plots such as the Residual on Fit plot, Normal QQ plot, etc.

## SOCR Multiple Linear Regression Data Input

Go to SOCR Analyses and select **Multiple Linear Regression** from the drop-down list of SOCR analyses, in the left panel. There are three ways to enter data in the SOCR Multiple Linear Regression applet:

- Click on the **Example** button on the top of the right panel.
- Generate random data by clicking on the **Random Example** button.
- Paste your own data from a spreadsheet into the SOCR Multiple Linear Regression data table.

## SOCR Multiple Linear Regression Example

We will demonstrate Multiple Linear Regression with a SOCR built-in example. This example is based on a dataset from the statistical program "R." For more information about the R program, please see the CRAN Home Page. The dataset used here is "hills" from R's "MASS" library. The dataset describes the record times in 1984 for 35 Scottish hill races. There are three variables: **dist**, the distance in miles; **climb**, the total height gained during the route, in feet; and **time**, the record time in minutes. In our example, we will use **time** as the dependent variable, and **climb** and **dist** as the independent variables.

- As you start the SOCR Analyses Applet, select "**Multiple Linear Regression**" from the combo box in the left panel.

- Make sure that "Multiple Linear Regression" is showing in the drop-down list of analyses in the left panel; otherwise you won't be able to find the correct dataset or reproduce the results.

- In the SOCR MLR analysis, there are several SOCR built-in examples. In this activity, we'll be using **Example 4**. Click on the "**Example 4**" button and next, click on the "**Data**" button in the right panel. You should see the data displayed in three columns: **dist**, **climb** and **time**.

- Use columns **dist** and **climb** as the regressors (independent variables) and column **time** as the response (dependent variable). To tell the computer which variables are the regressors and which is the response, we have to do a "Mapping." Click on the "**Mapping**" button to get to the Mapping Panel, and then map the variables. For this Multiple Linear Regression activity, there are two places the variables can be mapped to. The top part is labeled **DEPENDENT**; this is where you **map** the dependent variable. Just click on **ADD** under **DEPENDENT** and that will do it. If you change your mind, you can click on **REMOVE**. Do the same for the **INDEPENDENT** variables. Once **time** is mapped as the dependent variable and **dist** and **climb** as the independent variables, you're done with the **Mapping** step. (Note that the columns C4 through C16 do not have data and are not used, so just ignore them.)

- After the "Mapping" step has assigned the variables, calculate the regression results by clicking on the "**Calculate**" button. Then select the "**Result**" panel to see the output. For each of the coefficients, **Estimate** stands for the estimated parameter value, followed by its **Standard Error**, **T-Value** and **P-Value**.

The text in the Result Panel summarizes the results of this multiple linear regression analysis. The fitted regression equation is displayed. At this point, you can think about how the **dependent variable** changes, on average, in response to a change in each **independent variable** while the others are held fixed.
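The quantities in the Result panel can be reproduced by hand. The following is a minimal sketch, not the SOCR implementation: it fits the model by the normal equations in pure Python and prints an Estimate, Standard Error and T-Value for each coefficient. The data values are made up for illustration (they are not the "hills" data), **climb** is expressed in thousands of feet only to keep the toy numbers small, and the helper names (`transpose`, `matmul`, `invert`, `fit_ols`) are hypothetical. The P-Value is omitted because it requires the CDF of the t distribution with n − p degrees of freedom, which the Python standard library does not provide.

```python
import math

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[piv][c]) < 1e-12:
            raise ValueError("X'X is singular: regressors are linearly dependent")
        aug[c], aug[piv] = aug[piv], aug[c]
        pivot_value = aug[c][c]
        aug[c] = [v / pivot_value for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def fit_ols(regressors, y):
    """Least squares fit with an intercept.

    regressors: list of rows, one tuple of regressor values per observation.
    Returns (estimates, standard errors, t-values, residuals).
    """
    X = [[1.0] + list(row) for row in regressors]   # prepend intercept column
    n, p = len(X), len(X[0])
    Xt = transpose(X)
    XtX_inv = invert(matmul(Xt, X))
    Xty = matmul(Xt, [[v] for v in y])
    beta = [row[0] for row in matmul(XtX_inv, Xty)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    sigma2 = sum(e * e for e in resid) / (n - p)     # residual variance estimate
    se = [math.sqrt(sigma2 * XtX_inv[j][j]) for j in range(p)]
    tvals = [b / s for b, s in zip(beta, se)]
    return beta, se, tvals, resid

# Made-up illustrative data (NOT the MASS "hills" dataset).
dist = [2, 4, 6, 8, 10, 12, 3, 5]                    # miles
climb = [0.5, 1.5, 1.0, 3.0, 2.0, 4.0, 0.8, 2.5]     # thousands of feet
time = [23.0, 43.0, 52.0, 81.0, 86.0, 116.0, 31.5, 59.5]  # minutes

beta, se, tvals, resid = fit_ols(list(zip(dist, climb)), time)
print(f"{'Term':<10}{'Estimate':>10}{'Std.Err':>10}{'T-Value':>10}")
for name, b, s, t in zip(["Intercept", "dist", "climb"], beta, se, tvals):
    print(f"{name:<10}{b:>10.4f}{s:>10.4f}{t:>10.3f}")
```

Each slope estimate is read as the average change in **time** per unit change in that regressor, with the other regressor held fixed.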

- If you'd like to see the graphical component of this analysis, click on the "**Graph**" panel. You'll then see the graph panel, which displays scatter plots as well as diagnostic plots such as "residual on fit" and "Normal QQ" plots. The plot titles indicate the plot types.
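To make the two diagnostic plots concrete, here is a minimal sketch of the point coordinates behind a "residual on fit" plot and a "Normal QQ" plot. The fitted values and residuals are hypothetical numbers chosen for illustration, the function name `normal_qq_points` is an assumption, and the plotting positions (i − 0.5)/n are one common convention that may differ from what the SOCR applet uses.

```python
from statistics import NormalDist, stdev

def normal_qq_points(residuals):
    """Return (theoretical quantile, standardized sample quantile) pairs."""
    n = len(residuals)
    scale = stdev(residuals)                      # simple scaling, not studentized
    sample = sorted(e / scale for e in residuals)
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sample))

# Hypothetical fitted values and residuals from some regression fit.
fitted = [22.0, 44.0, 50.5, 83.0, 85.0, 117.0, 31.0, 59.5]
residuals = [1.0, -1.0, 1.5, -2.0, 1.0, -1.0, 0.5, 0.0]

# Residual on fit: x = fitted value, y = residual.
# A patternless band around zero is what one hopes to see.
residual_on_fit = list(zip(fitted, residuals))

# Normal QQ: points close to the line y = x suggest roughly normal residuals.
qq_points = normal_qq_points(residuals)
for theo, samp in qq_points:
    print(f"{theo:8.3f} {samp:8.3f}")
```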

**Note**: If you happen to click on the "**Clear**" button in the middle of the procedure, **all the data will be cleared out**. Simply start over from step 1 and click on an **EXAMPLE** button for the data you want.

## Assumptions

The SOCR MLR analysis implements the General Linear Model (GLM) and does *not* require normality. The only two assumptions are:

- The design matrix X must have full column rank; otherwise the parameter vector β will not be identified, and at most we will be able to narrow down its value to some linear subspace of *R*^{p}. For this property to hold, we must have n > p, where n is the sample size and p is the number of columns of X (which equals its column rank when the rank is full).
- The regressors *X*_{i} are assumed to be error-free, that is, they are not contaminated with measurement errors. Although not realistic in many settings, dropping this assumption leads to significantly more difficult errors-in-variables models.
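The full-column-rank assumption can be checked numerically before fitting. The sketch below is an illustration only, with a hypothetical helper `column_rank` that counts pivots during Gaussian elimination; the two small design matrices are made up, and the second one has a third column that is exactly twice the second, so β would not be identified.

```python
def column_rank(matrix, tol=1e-10):
    """Rank of a matrix via Gaussian elimination with partial pivoting."""
    rows = [list(map(float, r)) for r in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    rank = 0
    for c in range(n_cols):
        if rank == n_rows:
            break
        # Pick the largest entry in column c among the rows not yet used.
        piv = max(range(rank, n_rows), key=lambda r: abs(rows[r][c]))
        if abs(rows[piv][c]) < tol:
            continue                      # no pivot: column depends on earlier ones
        rows[rank], rows[piv] = rows[piv], rows[rank]
        for r in range(rank + 1, n_rows):
            f = rows[r][c] / rows[rank][c]
            rows[r] = [v - f * w for v, w in zip(rows[r], rows[rank])]
        rank += 1
    return rank

def is_identified(X):
    """True when n > p and X has full column rank p."""
    n, p = len(X), len(X[0])
    return n > p and column_rank(X) == p

# Made-up design matrices: intercept column plus two regressors.
X_ok = [[1, 2, 0.5], [1, 4, 1.5], [1, 6, 1.0], [1, 8, 3.0]]
X_collinear = [[1, 2, 4], [1, 4, 8], [1, 6, 12], [1, 8, 16]]   # col 3 = 2 * col 2

print(column_rank(X_ok), is_identified(X_ok))
print(column_rank(X_collinear), is_identified(X_collinear))
```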

## See also

Command-line based multiple linear regression execution.
