SQL Server Analysis Services Documentation

Category: Data Mining Algorithms
Last Updated: October 26, 2023

Linear Regression Algorithm

The Linear Regression algorithm in SQL Server Analysis Services (SSAS) is a data mining algorithm for building predictive models of a continuous numeric outcome. It estimates the relationship between a dependent variable and one or more independent variables, assuming that relationship is linear. This makes it particularly useful for forecasting and for understanding how changes in predictor variables affect an outcome.

Overview

Linear regression models a relationship between a dependent variable and one or more explanatory variables by fitting a linear equation to the observed data. The equation takes the form:

Y = b0 + b1*X1 + b2*X2 + ... + bn*Xn + e

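Where:

  • Y is the dependent variable (the value being predicted)
  • X1, X2, ..., Xn are the independent (explanatory) variables
  • b0 is the intercept, the value of Y when every X is zero
  • b1, b2, ..., bn are the coefficients, one per independent variable
  • e is the error term, the part of Y the linear equation does not explain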

The algorithm aims to minimize the sum of the squared differences between the observed and predicted values of the dependent variable.
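The least-squares criterion described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the SSAS implementation: the function names `fit_least_squares` and `sum_squared_error`, the normal-equations approach, and the sample data are all assumptions made for this example.

```python
def fit_least_squares(rows, targets):
    """Return [b0, b1, ..., bn] minimizing the sum of squared residuals."""
    # Prepend a constant 1.0 to every row so b0 acts as the intercept.
    design = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(design[0])

    # Build the normal equations (X^T X) b = X^T y as an augmented matrix.
    aug = []
    for i in range(k):
        row = [sum(r[i] * r[j] for r in design) for j in range(k)]
        row.append(sum(r[i] * y for r, y in zip(design, targets)))
        aug.append(row)

    # Solve with Gauss-Jordan elimination and partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]

    return [aug[i][k] for i in range(k)]


def sum_squared_error(coefficients, rows, targets):
    """Sum of squared differences between observed and predicted values."""
    total = 0.0
    for row, y in zip(rows, targets):
        predicted = coefficients[0] + sum(
            b * x for b, x in zip(coefficients[1:], row)
        )
        total += (y - predicted) ** 2
    return total


# Hypothetical observations of (X1, X2) and Y, roughly Y = 2 + 3*X1 - X2 plus noise.
rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 5.0)]
targets = [3.1, 6.9, 7.2, 10.8, 12.1]

coefficients = fit_least_squares(rows, targets)
print("b0, b1, b2 =", coefficients)
print("sum of squared errors =", sum_squared_error(coefficients, rows, targets))
```

Any other choice of coefficients gives a sum of squared errors at least as large as the fitted one, which is what "minimize" means here.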

When to Use Linear Regression

Key Concepts and Terminology

Parameters

The Linear Regression algorithm in SSAS has several configurable parameters:

Parameter | Description | Default Value | Allowed Values
----------|-------------|---------------|---------------
ORDER | Specifies the order of the polynomial to use for regression. For standard linear regression, this should be 1. | 1 | Integer >= 1
MAX_INPUT_ATTRIBUTES | The maximum number of input attributes that can be used in the model. | 100 | Integer >= 0
MAX_OUTPUT_ATTRIBUTES | The maximum number of output attributes that can be used in the model. | 100 | Integer >= 0
COMPUTE_PROBABILITY | When set to TRUE, the algorithm computes the probability for each prediction. | FALSE | TRUE, FALSE
ENABLE_HIERARCHY_VOTING | Enables or disables hierarchy voting. Not typically relevant for linear regression. | FALSE | TRUE, FALSE

Example Usage

Imagine you want to predict the price of a house based on its size (square footage) and the number of bedrooms. The Linear Regression algorithm can help model this relationship.

Scenario: Predicting House Prices

Dependent Variable: Price (continuous numerical)

Independent Variables:

  • SquareFootage (continuous numerical)
  • NumberOfBedrooms (discrete numerical)

After training the model with historical house sales data, the algorithm might produce a model with an equation like:

Price = 50000 + 150 * SquareFootage + 10000 * NumberOfBedrooms

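This equation suggests that, with the number of bedrooms held constant, each additional square foot adds $150 to the predicted price, and, with square footage held constant, each additional bedroom adds $10,000. The intercept of $50,000 is the baseline value the model predicts when both inputs are zero.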
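To make the arithmetic concrete, the example equation can be evaluated directly. The sketch below hard-codes the illustrative coefficients from the equation above; the function name `predict_price` is hypothetical, and the inputs (1,500 square feet, 3 bedrooms) are just sample values.

```python
def predict_price(square_footage, number_of_bedrooms):
    """Evaluate the example equation with its illustrative coefficients."""
    intercept = 50000        # baseline price when both inputs are zero
    per_square_foot = 150    # price change per additional square foot
    per_bedroom = 10000      # price change per additional bedroom
    return (
        intercept
        + per_square_foot * square_footage
        + per_bedroom * number_of_bedrooms
    )


# 50000 + 150 * 1500 + 10000 * 3 = 305000
print(predict_price(1500, 3))
```

Changing one input while keeping the other fixed moves the result by exactly that input's coefficient, which is the interpretation given above.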

DMX Query Example (Conceptual)

A Data Mining Extensions (DMX) query to predict a house price:

// Singleton prediction: score one house (1,500 sq ft, 3 bedrooms) against the model.
// Predict returns the estimated price; PredictStdev returns its standard deviation.
SELECT
    Predict([House Price Model].[Price]) AS PredictedPrice,
    PredictStdev([House Price Model].[Price]) AS PredictionStdev
FROM
    [House Price Model]
PREDICTION JOIN
    (SELECT 1500 AS [SquareFootage], 3 AS [NumberOfBedrooms]) AS InputData
ON
    [House Price Model].[SquareFootage] = InputData.[SquareFootage]
    AND [House Price Model].[NumberOfBedrooms] = InputData.[NumberOfBedrooms]

Note: This is a simplified DMX example. The model and column names are illustrative; the actual query depends on how your mining structure and mining model are defined.

Pros and Cons

Pros:

Cons:

Related Algorithms