This document guides you through the process of using data mining models created in SQL Server Analysis Services (SSAS) to generate predictions for new data.
Prediction in SSAS applies a trained data mining model to new, unseen data to forecast outcomes or classify cases. Typical business applications include customer churn prediction, sales forecasting, and fraud detection.
Predictions are usually generated with Data Mining Extensions (DMX) queries. You can write these by hand, build them with the graphical prediction tools in SQL Server Management Studio (SSMS), or send them from client applications.
DMX is a SQL-like query language for creating, training, and querying SSAS data mining models. Predictions are retrieved with a SELECT statement that joins the model to input data using PREDICTION JOIN, or NATURAL PREDICTION JOIN when the input column names match the model's column names.
Singleton queries generate a prediction for a single case (one row of data), with the input values supplied directly in the query. They are often used for real-time predictions.
Example DMX for a clustering model:
SELECT
Cluster() AS [ClusterName],
ClusterProbability() AS [Cluster Probability]
FROM
[MyClusteringModel]
NATURAL PREDICTION JOIN
(SELECT 'Value1' AS [Attribute1],
123 AS [Attribute2]) AS t
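The same singleton pattern applies to classification models. The sketch below assumes a hypothetical model [CustomerChurnModel] with a predictable column [Churn] and input columns [Age] and [Region]. It returns the predicted value together with its probability, which is the usual shape of a real-time scoring request.

```dmx
-- Hypothetical model and column names; replace them with your own.
SELECT
  Predict([Churn]) AS [PredictedChurn],
  PredictProbability([Churn]) AS [Probability]
FROM
  [CustomerChurnModel]
NATURAL PREDICTION JOIN
  (SELECT 42 AS [Age],
          'West' AS [Region]) AS t
```

Because the input column aliases match the model's column names, NATURAL PREDICTION JOIN maps them without an explicit ON clause.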
Batch queries generate predictions for many cases at once by joining the model to external input data, typically a relational table or query read through OPENQUERY from a data source defined in the SSAS database.
Example DMX for a classification model:
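-- [TargetAttribute] and [MyDataSource] are placeholders for the model's predictable column and an SSAS data source.
SELECT
t.[CustomerID],
Predict([MyClassificationModel].[TargetAttribute]) AS PredictedClass
FROM
[MyClassificationModel]
PREDICTION JOIN
OPENQUERY([MyDataSource], 'SELECT CustomerID, AttributeA, AttributeB FROM CustomerData') AS t
ON [MyClassificationModel].[AttributeA] = t.[AttributeA]
AND [MyClassificationModel].[AttributeB] = t.[AttributeB]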
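Batch prediction queries also accept TOP, WHERE, and ORDER BY clauses, so you can score a whole table and keep only the most relevant rows. In this sketch, [MyDataSource] (an SSAS data source) and [TargetAttribute] (the model's predictable column, assumed to have a state 'Yes') are hypothetical placeholders. The two-argument form of PredictProbability returns the probability of that specific state.

```dmx
-- Placeholder names; 'Yes' is an assumed state of [TargetAttribute].
SELECT TOP 100
  t.[CustomerID],
  PredictProbability([MyClassificationModel].[TargetAttribute], 'Yes') AS [ProbabilityYes]
FROM
  [MyClassificationModel]
PREDICTION JOIN
  OPENQUERY([MyDataSource],
    'SELECT CustomerID, AttributeA, AttributeB FROM CustomerData') AS t
ON  [MyClassificationModel].[AttributeA] = t.[AttributeA]
AND [MyClassificationModel].[AttributeB] = t.[AttributeB]
-- Keep only cases more likely than not to be 'Yes', highest probability first.
WHERE PredictProbability([MyClassificationModel].[TargetAttribute], 'Yes') > 0.5
ORDER BY PredictProbability([MyClassificationModel].[TargetAttribute], 'Yes') DESC
```

The 0.5 cutoff and the TOP 100 limit are illustrative. Choose values that suit how the scored list will be used.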
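SSMS also offers a graphical alternative. Right-click a mining structure or model in Object Explorer and choose Build Prediction Query to open the Prediction Query Builder, which generates the DMX for you. SQL Server Data Tools (SSDT) exposes the same builder on the Mining Model Prediction tab of the Data Mining Designer.
Whether written by hand or generated, prediction queries rely on DMX prediction functions such as:
Predict(): Returns the predicted value for a specified predictable column. This is the most general prediction function.
PredictProbability(): Returns the probability of the predicted value, or of a specified state.
PredictNodeId(): Returns the ID of the model node to which the case is assigned.
Cluster() and ClusterProbability(): Return the most likely cluster for a case and the probability that the case belongs to it (clustering models only).
PredictCaseLikelihood(): Returns the likelihood that a case fits the existing clustering model (clustering models only).
Predictions can be returned directly as query results, saved to a table, or used in reports and applications. In batch predictions, you choose the output columns by listing the corresponding input columns and prediction functions in the SELECT clause.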
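For clustering models, PredictCaseLikelihood() can act as a simple outlier signal, because a case with a low likelihood fits the model poorly. The sketch below uses the placeholder model and columns [MyClusteringModel], [Attribute1], and [Attribute2]. The input values are illustrative only, and what counts as a "low" likelihood depends on your data.

```dmx
-- Placeholder model and column names; input values are illustrative.
SELECT
  Cluster() AS [ClusterName],
  ClusterProbability() AS [ClusterProbability],
  PredictCaseLikelihood() AS [CaseLikelihood]
FROM
  [MyClusteringModel]
NATURAL PREDICTION JOIN
  (SELECT 'Value1' AS [Attribute1],
          999 AS [Attribute2]) AS t
```

Run as a batch query, the same SELECT list lets you rank many cases by CaseLikelihood and review the least typical ones first.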