Logistic Regression
Logistic regression is a statistical method for binary classification that estimates the probability that a given input belongs to a particular class. In Microsoft SQL Server Analysis Services (SSAS) Data Mining, logistic regression is one of the built-in mining algorithms.
When to Use
- You need to predict a yes/no outcome (e.g., churn, purchase).
- You need interpretable coefficients.
- The data set contains both numeric and categorical attributes.
Key Concepts
| Term | Description |
|---|---|
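| Logit | The natural log of the odds, ln(p / (1 - p)), which the model expresses as a linear function of the predictors. |
| Odds Ratio | The exponential of a coefficient: the multiplicative change in the odds for a one-unit increase in that predictor. |
| Maximum Likelihood | Estimation technique that chooses the coefficients under which the observed outcomes are most probable. |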
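
The three terms in the table can be tied together in a small, self-contained Python sketch. It is not how SSAS fits its models internally; it only illustrates the concepts. The data set (support calls versus churn), the helper names `sigmoid`, `logit`, and `fit_logistic`, and the learning-rate settings are all assumptions made for the example.

```python
import math

def sigmoid(z: float) -> float:
    """Map a log-odds value (logit) to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p: float) -> float:
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))

def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Estimate intercept b0 and slope b1 by gradient ascent on the
    mean log-likelihood (a simple form of maximum likelihood estimation)."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals y - p drive the gradient of the log-likelihood.
        residuals = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        g0 = sum(residuals) / n
        g1 = sum(r * x for r, x in zip(residuals, xs)) / n
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1

# Made-up data: x = support calls last month, y = 1 if the customer churned.
xs = [0, 1, 1, 2, 3, 3, 4, 5]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic(xs, ys)
odds_ratio = math.exp(b1)  # multiplicative change in the odds per extra call

print(f"intercept={b0:.3f} slope={b1:.3f} odds_ratio={odds_ratio:.3f}")
print(f"P(churn | 4 calls) = {sigmoid(b0 + b1 * 4):.3f}")
```

An odds ratio above 1 means each additional unit of the predictor raises the odds of the positive class; below 1 means it lowers them.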
Creating a Logistic Regression Model in SSAS
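Use the CREATE MINING MODEL statement with the Microsoft_Logistic_Regression algorithm. The column list declares the case key, the input attributes, and the predictable column; algorithm parameters go in parentheses after the algorithm name.
-- Age and ContractType are illustrative inputs; declare the attributes in your own data.
-- HOLDOUT_PERCENTAGE (default 30) is the share of training cases held out to compute holdout error.
CREATE MINING MODEL [CustomerChurn]
(
    CustomerID LONG KEY,
    Age LONG CONTINUOUS,
    ContractType TEXT DISCRETE,
    Churn LONG DISCRETE PREDICT
)
USING Microsoft_Logistic_Regression (HOLDOUT_PERCENTAGE = 30)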
Evaluating the Model
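After training, score cases with a DMX prediction query: a SELECT against the model, joined to the input rows with PREDICTION JOIN, using the Predict and PredictProbability functions. Comparing these predictions with known outcomes on held-out cases shows how well the model performs.
-- [CustomerDataSource] is a placeholder data source name; state 1 is assumed to mean "churned".
SELECT
    t.CustomerID,
    PredictProbability([CustomerChurn].[Churn], 1) AS ChurnProb,
    Predict([CustomerChurn].[Churn]) AS PredictedChurn
FROM [CustomerChurn]
NATURAL PREDICTION JOIN
    OPENQUERY([CustomerDataSource], 'SELECT * FROM dbo.CustomerData') AS t
WHERE PredictProbability([CustomerChurn].[Churn], 1) > 0.5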
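
Once predicted probabilities have been exported alongside the known outcomes, performance can be summarized outside SSAS. The following Python sketch assumes a 0.5 threshold, as in the query above, and uses made-up probabilities and labels; the function names are hypothetical.

```python
def confusion_counts(probs, actual, threshold=0.5):
    """Count true/false positives and negatives at the given threshold."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, actual):
        predicted = 1 if p > threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def classification_metrics(probs, actual, threshold=0.5):
    """Return accuracy, precision, and recall as a dict."""
    tp, fp, tn, fn = confusion_counts(probs, actual, threshold)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Made-up example: churn probabilities and the outcomes observed later.
churn_probs = [0.91, 0.62, 0.48, 0.15, 0.73, 0.30]
churned = [1, 0, 1, 0, 1, 0]

metrics = classification_metrics(churn_probs, churned)
print(metrics)
```

Precision and recall are usually more informative than accuracy for churn, because churners tend to be a small share of customers.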
Best Practices
- Scale numeric features to improve convergence.
- Encode categorical variables using one‑hot encoding or use SSAS's built‑in handling.
- Regularly evaluate model drift and retrain as needed.
- Use cross‑validation to select the optimal algorithm parameter values.
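
For data prepared outside SSAS, the scaling, encoding, and cross-validation practices above might look like the following Python sketch. The helper names and sample values are illustrative assumptions, and the fold assignment is a simple round-robin rather than a shuffled or stratified split.

```python
import statistics

def standardize(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]  # constant column carries no signal
    return [(v - mean) / std for v in values]

def one_hot(values):
    """Encode categories as 0/1 vectors; columns follow sorted category names."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

def k_fold_indices(n, k):
    """Yield (train, test) index lists that partition range(n) into k folds."""
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, test

# Made-up sample columns.
ages = [23, 35, 47, 59]
contracts = ["monthly", "annual", "monthly", "two-year"]

scaled_ages = standardize(ages)
categories, encoded = one_hot(contracts)
folds = list(k_fold_indices(len(ages), 2))

print(scaled_ages)
print(categories, encoded)
print(folds)
```

Each candidate parameter setting would be trained on the `train` indices and scored on the `test` indices of every fold, and the setting with the best average score kept.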