This document provides a comprehensive guide to understanding and utilizing prediction probability calculations within SQL Server Analysis Services (SSAS). Prediction probability allows you to assess the likelihood of a particular outcome occurring, based on historical data and predictive models.
Prediction probability is a core concept in data mining and predictive analytics. In SSAS, it's often associated with classification models, where you're predicting whether an instance belongs to a specific class or category. The probability, a value between 0 and 1, expresses how confident the model is in that prediction; it is an estimate, not a guarantee of the outcome.
SSAS provides several ways to retrieve and work with prediction probabilities, depending on the mining model type and the client application.
When querying classification models (such as Decision Trees, Naive Bayes, or Logistic Regression), you can use the PredictProbability function in DMX.
SELECT
  [TargetColumn],
  PredictProbability([TargetColumn], 'PositiveClassValue') AS ProbabilityOfPositive
FROM
  [YourMiningModel]
PREDICTION JOIN
  [YourDataSourceView] ON [YourMiningModel].[ForeignKey] = [YourDataSourceView].[ForeignKey]
WHERE
  [SomeCondition]
In this DMX query:
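[TargetColumn] is the column representing the predicted outcome.
'PositiveClassValue' is the specific value of the target column for which you want to calculate the probability (e.g., 'TRUE', '1', 'Yes').
[YourMiningModel] is the name of your deployed mining model.
[YourDataSourceView] and the PREDICTION JOIN are used to provide new data for prediction. [YourDataSourceView] is a placeholder for the source data query, which in practice is usually an OPENQUERY expression against a data source.
Programmatically, you can execute DMX queries from client code and read the prediction probabilities from the result set. ADOMD.NET is the client library intended for running queries, while AMO (Analysis Management Objects) is aimed at managing and processing Analysis Services objects.
Running the query programmatically lets you integrate prediction probability calculations into your applications and automation scripts.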
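As a rough illustration of the programmatic route, the sketch below only assembles the DMX text in client code; executing it still requires a client library and a live server, which are not shown. The helper names, the [ProbabilityOfState] alias, the [YourDataSource] data source, and the NewCustomers table are all hypothetical, and the escaping rules (doubling `]` in names and `'` in literals) are an assumption that should be checked against your server before use.

```python
def quote_identifier(name: str) -> str:
    """Bracket a DMX object name, doubling any closing bracket inside it."""
    return "[" + name.replace("]", "]]") + "]"


def quote_literal(value: str) -> str:
    """Single-quote a DMX string literal, doubling any embedded quote."""
    return "'" + value.replace("'", "''") + "'"


def build_probability_query(model: str, target: str, state: str,
                            key: str, source_query: str) -> str:
    """Return DMX text asking for the probability of `state` in `target`.

    `source_query` is passed through unchanged (for example an OPENQUERY
    expression), so it must come from trusted code, not from user input.
    """
    model_id = quote_identifier(model)
    key_id = quote_identifier(key)
    return (
        f"SELECT t.{key_id}, "
        f"PredictProbability({quote_identifier(target)}, {quote_literal(state)}) "
        f"AS [ProbabilityOfState]\n"
        f"FROM {model_id}\n"
        f"PREDICTION JOIN {source_query} AS t\n"
        f"ON {model_id}.{key_id} = t.{key_id}"
    )


# Hypothetical names throughout; replace them with your own objects.
query = build_probability_query(
    model="YourMiningModel",
    target="TargetColumn",
    state="Yes",
    key="ForeignKey",
    source_query="OPENQUERY([YourDataSource], 'SELECT ForeignKey FROM NewCustomers')",
)
print(query)
```

Keeping the quoting in two small helpers means the model name, column name, and state value can come from configuration without hand-editing the query text.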
While MDX is primarily used for OLAP queries, it can surface results from mining models deployed as part of a cube. PredictProbability is a DMX function rather than an MDX one, so the query below assumes the probability has already been exposed to the cube as a measure. This is less common than retrieving prediction probabilities directly with DMX, but it can be useful for scenario analysis within a cube context.
-- Illustrative only: assumes the model's probability is already exposed
-- as the cube measure [Measures].[YourPredictionMeasure].
-- More commonly, prediction results are consumed via DMX.
WITH MEMBER [Measures].[Probability Example] AS
  [Measures].[YourPredictionMeasure] * [Measures].[Value]
SELECT
  [SomeDimension].[Level].Members ON COLUMNS,
  { [Measures].[Probability Example] } ON ROWS
FROM
  [YourCube]
WHERE
  ([YourTimeDimension].[Year].&[2023])
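Interpreting probability scores correctly is crucial for making informed decisions: a score is the model's estimate of likelihood, and it becomes a decision only when it is compared against a threshold.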
Setting an appropriate threshold is a business decision that balances the costs of false positives and false negatives.
Example: In a credit scoring model, a high threshold for approving loans might minimize risk (fewer defaults) but also reject potentially good customers. A lower threshold might increase loan volume but also increase the risk of defaults.
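To make the trade-off concrete, the following sketch scores a few candidate thresholds by total misclassification cost. It assumes the probability is the model's estimate that a loan will be repaid, that historical outcomes are known, and that the two cost figures are placeholders chosen for illustration; none of the numbers come from a real model.

```python
from typing import Sequence


def total_cost(probabilities: Sequence[float], repaid: Sequence[bool],
               threshold: float, cost_fp: float, cost_fn: float) -> float:
    """Cost of approving every loan whose repayment probability >= threshold.

    cost_fp: cost of approving a loan that later defaults (false positive).
    cost_fn: cost of rejecting a customer who would have repaid (false negative).
    """
    cost = 0.0
    for probability, actual in zip(probabilities, repaid):
        approved = probability >= threshold
        if approved and not actual:
            cost += cost_fp
        elif actual and not approved:
            cost += cost_fn
    return cost


def best_threshold(probabilities: Sequence[float], repaid: Sequence[bool],
                   candidates: Sequence[float],
                   cost_fp: float, cost_fn: float) -> float:
    """Return the candidate threshold with the lowest total cost."""
    return min(
        candidates,
        key=lambda t: total_cost(probabilities, repaid, t, cost_fp, cost_fn),
    )


# Made-up validation data: predicted repayment probability and actual outcome.
probabilities = [0.95, 0.80, 0.65, 0.55, 0.30]
repaid = [True, True, False, True, False]

# Assumed costs: a default is five times as costly as a lost good customer.
chosen = best_threshold(probabilities, repaid, [0.5, 0.7, 0.9],
                        cost_fp=5.0, cost_fn=1.0)
print(chosen)
```

With these made-up figures the middle threshold wins: the lowest one approves a loan that defaults, and the highest one rejects two customers who would have repaid.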
You can implement threshold logic in your client application or by further processing the prediction results from SSAS.
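A minimal sketch of client-side threshold logic follows. It assumes the prediction results have already been fetched from SSAS into memory as case identifiers paired with probabilities; the Prediction type, the apply_threshold helper, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Prediction:
    """One row of a prediction result: a case key and its probability."""
    case_id: str
    probability: float


def apply_threshold(predictions: Iterable[Prediction],
                    threshold: float) -> Dict[str, bool]:
    """Map each case to True when its probability meets the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return {p.case_id: p.probability >= threshold for p in predictions}


# Sample rows standing in for a result set returned by a prediction query.
rows = [Prediction("A", 0.82), Prediction("B", 0.41), Prediction("C", 0.60)]
decisions = apply_threshold(rows, threshold=0.6)
print(decisions)
```

Keeping the threshold outside the query means the business rule can change without redeploying or reprocessing the mining model.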