
:focal(627x536:628x537)/https://tf-cmsv2-smithsonianmag-media.s3.amazonaws.com/filer/b7/16/b716f299-f842-4133-822d-efd55e2adc3f/1359415822_w640_h640_spot_it_card_game.jpg)
The essence of a linear regression problem is calculating the values of the coefficients using the raw data or, equivalently, the design matrix. This fact, in part, explains the column of 1.0 values in the design matrix. Notice that the leading intercept value (12.0157 in the example) can be considered a coefficient associated with a predictor variable that always has a value of 1. In other words, to make a prediction using linear regression, the predictor values are multiplied by their corresponding coefficient values and summed. The very last part of the output in Figure 1 uses the values of the coefficients to predict the income for a hypothetical person who has an education level of 14 12 years of work experience and whose sex is 0 (male). The second, third and fourth coefficient values (1.0180, 0.5489, -2.9566) are associated with education level, work experience and sex, respectively.

It’s a constant not associated with any predictor variable.

The first value, 12.0157, is usually called the intercept. The coefficients are sometimes called b-values or beta-values. The demo uses a technique that requires a design matrix.Īfter creating the design matrix, the demo program finds the values for four coefficients, (12.0157, 1.0180, 0.5489, -2.9566). There are several different algorithms that can be used for linear regression some can use the raw data matrix while others use a design matrix. A design matrix is just the data matrix with a leading column of all 1.0 values added. In a non-demo scenario, you’d probably read data from a text file using a method named something like MatrixLoad.Īfter generating the synthetic data, the demo program uses the data to create what’s called a design matrix. Income, in thousands of dollars, is in the last column. Sex is an indicator variable where male is the reference value, coded as 0, and female is coded as 1. Work experience is a value between 10 and 30. Education level is a value between 12 and 16. The demo begins by generating 10 synthetic data items. The C# demo program predicts annual income based on education, work and sex. When there are two or more predictor variables, the technique is generally called multiple, or multivariate, linear regression.Ī good way to see where this article is headed is to take a look at the demo program in Figure 1. When there’s just a single predictor variable, the technique is sometimes called simple linear regression. The predictor variables are usually called the independent variables. The variable to predict is usually called the dependent variable. For example, you might want to predict the annual income of a person based on his education level, years of work experience and sex (male = 0, female = 1). The goal of a linear regression problem is to predict the value of a numeric variable based on the values of one or more numeric predictor variables. Volume 30 Number 7 Test Run - Linear Regression Using C#
