Linear Regression and Correlation

Abstract

This lesson is designed to introduce students to correlation between two variables and the line of best fit.

These activities can be done individually or in groups of as many as four students. Allow 1.5-2 hours of class time for the entire lesson if all portions are done in class.

Objectives

Upon completion of this lesson, students will:

  • have plotted bivariate data onto a scatter plot
  • have seen the line of best fit for several different scatter plots
  • be able to estimate the lines of best fit for data sets
  • be able to estimate the correlation coefficient for data sets

Standards Addressed

Grade 6

  • Statistics and Probability

    • The student demonstrates an ability to analyze data (comparing, explaining, interpreting, evaluating; drawing or justifying conclusions).

Grade 7

  • Statistics and Probability

    • The student demonstrates an ability to analyze data (comparing, explaining, interpreting, evaluating, making predictions; drawing or justifying conclusions).

Grade 8

  • Statistics and Probability

    • The student demonstrates an ability to analyze data (comparing, explaining, interpreting, evaluating, making predictions, describing trends; drawing, formulating, or justifying conclusions).

Grade 9

  • Statistics and Probability

    • The student demonstrates an ability to classify and organize data.
    • The student demonstrates an ability to analyze data (comparing, explaining, interpreting, evaluating, making predictions, describing trends; drawing, formulating, or justifying conclusions).

Grade 10

  • Statistics and Probability

    • The student demonstrates an ability to classify and organize data.
    • The student demonstrates an ability to analyze data (comparing, explaining, interpreting, evaluating, making predictions, describing trends; drawing, formulating, or justifying conclusions).

Statistics and Probability

  • Interpreting Categorical and Quantitative Data

    • Summarize, represent, and interpret data on two categorical and quantitative variables
    • Interpret linear models

Grades 9-12

  • Algebra

    • Understand patterns, relations, and functions
  • Data Analysis and Probability

    • Formulate questions that can be addressed with data and collect, organize, and display relevant data to answer them
    • Select and use appropriate statistical methods to analyze data

Algebra I

  • Data Analysis and Probability

    • Competency Goal 3: The learner will collect, organize, and interpret data with matrices and linear models to solve problems.

Student Prerequisites

  • Arithmetic: Students must be able to:
    • plot points on the Cartesian coordinate system
  • Statistics: Students must:
    • have a very basic understanding of correlation
  • Technological: Students must be able to:
    • perform basic mouse manipulations such as point, click and drag
    • use a browser for experimenting with the activities

Teacher Preparation

Students will need:

  • Access to a browser
  • Scatter Plot Exploration Questions
  • Graph paper and pencil

Key Terms

correlation

A statistical measure of the relationship between two random variables. The correlation is positive when the two variables tend to increase or decrease together, and negative (or inverse) when one tends to increase as the other decreases.

correlation coefficient

A numerical value between -1 and +1 that measures the strength and direction of the linear relationship between two variables. A value of +1 indicates a perfect positive linear relationship, -1 indicates a perfect inverse (negative) linear relationship, and 0 indicates no linear relationship, although the variables may still be related in a nonlinear way.
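The correlation coefficient is most often computed as Pearson's r. The following is a minimal sketch in plain Python (only the standard `math` module; the function name `correlation_coefficient` is our own, not part of the Regression activity). It compares each value with its mean. The third data set follows y = x² exactly, yet r is 0, because r measures only linear association.

```python
import math


def correlation_coefficient(xs, ys):
    """Return Pearson's r for two equal-length lists of paired values."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable never changes")
    return sxy / math.sqrt(sxx * syy)


print(correlation_coefficient([1, 2, 3, 4], [2, 4, 6, 8]))          # 1.0
print(correlation_coefficient([1, 2, 3, 4], [8, 6, 4, 2]))          # -1.0
print(correlation_coefficient([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # 0.0
```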

line of best fit

A straight line that best summarizes the trend of all the points in a scatter plot. Its slope and position are determined by the paired data: for the least-squares line, the slope equals the correlation coefficient r multiplied by the ratio of the standard deviation of y to the standard deviation of x, and the line passes through the point (mean of x, mean of y). This line can be used to predict the value of the dependent variable when only the value of the independent variable is known.

linear regression

An attempt to model the relationship between two variables by fitting a linear equation to observed data. One variable is treated as the independent (explanatory) variable and the other as the dependent (response) variable.
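Linear regression most commonly fits the line by least squares. It chooses the slope and intercept that make the sum of squared residuals as small as possible. The following is a minimal Python sketch under that assumption. The hours-studied and test-score values are made up for illustration, and `best_fit_line` is a hypothetical helper name, not part of the Regression activity.

```python
def best_fit_line(xs, ys):
    """Least-squares line y = m*x + b (x independent, y dependent)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    m = sxy / sxx              # slope: change in y per unit of x
    b = mean_y - m * mean_x    # intercept: the line passes through (mean_x, mean_y)
    return m, b


# Hypothetical data: hours studied (x) and test score (y)
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]
m, b = best_fit_line(hours, scores)
print(f"y = {m:.2f}x + {b:.2f}")   # y = 6.00x + 46.00
# Use the line to predict a score for 6 hours of study
print(m * 6 + b)                   # 82.0
```

The last line shows how the fitted line is used for prediction. Only the independent value (6 hours) is known, and the line supplies an estimated score.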

residual

The observed value minus the predicted value. It is the difference between the result obtained by observation and the result computed from a formula, such as the equation of the line of best fit.
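The following short Python sketch applies this definition. Each residual is the observed y minus the y predicted by a line. The data are hypothetical, and y = 6x + 46 is the least-squares line for these points, which can be checked with the usual slope and intercept formulas. For a least-squares line the residuals add to zero. The sum of their squares is the quantity that the fit makes as small as possible.

```python
def residuals(xs, ys, m, b):
    """Observed value minus predicted value (m*x + b) for each point."""
    return [y - (m * x + b) for x, y in zip(xs, ys)]


hours = [1, 2, 3, 4, 5]          # hypothetical x values
scores = [52, 60, 61, 70, 77]    # hypothetical observed y values
res = residuals(hours, scores, 6, 46)
print(res)                        # [0, 2, -3, 0, 1]
print(sum(res))                   # 0
print(sum(r * r for r in res))    # 14
```

A positive residual means the point lies above the line. A negative residual means it lies below.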

scatter plot

A graphical representation of the distribution of two random variables as a set of points whose coordinates represent their observed paired values.

slope of a linear function

The slope of the line y = mx + b is the rate at which y is changing per unit of change in x. The units of measurement of the slope are units of y per unit of x (cf. Linear Functions Discussion).
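Here is a worked example with hypothetical values. Suppose x is hours of study, y is a test score, and a line passes through (1, 52) and (5, 76). Its slope is the change in y divided by the change in x, so its units are points per hour. The intercept then follows from either point:

```latex
m = \frac{y_2 - y_1}{x_2 - x_1} = \frac{76 - 52}{5 - 1} = \frac{24\ \text{points}}{4\ \text{hours}} = 6\ \text{points per hour}

b = y_1 - m x_1 = 52 - 6(1) = 46 \quad\Longrightarrow\quad y = 6x + 46
```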

Lesson Outline

  1. Focus and Review

    Review with the class the concept of correlation. Have the students begin to think about the words and ideas of this lesson:

    • What are two variables that have no correlation with one another? Can anyone give me an example of two variables that have some sort of correlation with one another? Is this a positive or a negative correlation?

  2. Objectives

    Let the students know what it is that they will be doing and learning today. Say something like this:

    • Today, class, we are going to learn more about correlation between two variables and be introduced to the line of best fit.
    • We are going to use the computers to learn more about correlation, but please do not turn your computers on until I ask you to. I want to show you a little about this activity first.

  3. Teacher Input

    • Lead a discussion on correlation of variables and the purpose of the line of best fit.
    • Lead a discussion on the correlation coefficient, r, and how it varies depending on the relationship of the data on the scatter plot.

  4. Guided Practice

    As a class, complete the Scatter Plot Exploration Questions. Have the students draw a scatter plot of the class data on a sheet of graph paper. Ask the class where they predict the line of best fit will lie and what they think the correlation coefficient is. Together, graph this data using the Regression activity, look at the actual results, and compare these findings with your predictions.

  5. Independent Practice

    Have the students use the Regression activity to estimate the line of best fit for their own data sets and then see where the line of best fit actually lies. Encourage them to experiment with data sets that include outliers. Also, have the students experiment with creating scatter plots that will have a specific correlation coefficient.

  6. Closure

    You may wish to bring the class back together for a discussion on the findings. Once the students have been allowed to share what they have found, summarize the results of the lesson.
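In the Independent Practice step, students experiment with outliers. It can help to see first how strongly a single point can change r. The following minimal Python sketch uses hypothetical data and repeats the standard Pearson formula so that it runs on its own. It starts with five points on a perfect line and then adds one outlier.

```python
import math


def correlation_coefficient(xs, ys):
    """Pearson's r for paired data (assumes neither variable is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
ys = [1, 2, 3, 4, 5]              # points exactly on the line y = x
print(round(correlation_coefficient(xs, ys), 2))          # 1.0

# Add one outlier far below the trend
xs_out = xs + [6]
ys_out = ys + [0]
print(round(correlation_coefficient(xs_out, ys_out), 2))  # 0.14
```

One point out of six drops r from 1 to about 0.14. This is the kind of effect students can look for when they drag outliers around in the activity.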

Alternate Outline

This lesson can be rearranged in several ways.

  • Omit the discussion of the correlation coefficient.
  • Omit the scatter plot worksheet.
  • As a class, before splitting them into groups, have the students plot specific points on the Regression activity and have each of them draw the line of best fit that they imagine. Then, have them select the true line of best fit and see who had the closest estimation.
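One way to decide whose estimate was closest is to compare the sum of squared residuals of each guessed line. This is the criterion that the least-squares line minimizes, although the Regression activity itself may present the comparison differently. The following sketch uses hypothetical points and hypothetical student guesses. For these points, y = 6x + 46 is the least-squares line, so no guess can score below its value of 14.

```python
def sum_squared_residuals(xs, ys, m, b):
    """Sum of (observed - predicted)^2 for the line y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4, 5]
ys = [52, 60, 61, 70, 77]

# Hypothetical student guesses, stored as (slope, intercept)
guesses = {"Student A": (5, 48), "Student B": (7, 43)}
for name, (m, b) in guesses.items():
    print(name, sum_squared_residuals(xs, ys, m, b))
# Student A 29
# Student B 24

# The smallest sum of squared residuals is the closest guess
closest = min(guesses, key=lambda name: sum_squared_residuals(xs, ys, *guesses[name]))
print("Closest estimate:", closest)                                 # Student B
print("Least-squares line:", sum_squared_residuals(xs, ys, 6, 46))  # 14
```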