Mining Model Operations
This document details various operations you can perform on mining models within SQL Server Analysis Services (SSAS). These operations are crucial for managing, querying, and refining your data mining solutions.
Creating and Deleting Mining Models
Mining models are created within a mining structure. Each model is based on a specific algorithm and targets a particular set of columns from the mining structure.
- Creating a Model: Use SQL Server Data Tools (SSDT) or AMO (Analysis Management Objects) to define the model, select the algorithm, and map columns.
- Deleting a Model: Models can be deleted from SSDT or via AMO when they are no longer needed or if they need to be recreated.
Training Mining Models
Training is the process of building the actual mining model from the data. This involves applying the chosen algorithm to the data defined in the mining structure.
- Process: Full: This operation trains the model from scratch, ignoring any existing processed data.
- Process: Training Data: This operation re-trains the model using only the data that has been added or changed since the last training.
- Process: Data Only: This operation updates the underlying data structures without retraining the model. This is typically used after data updates.
- Process: Clear: This operation removes all processed data from the model, effectively resetting it.
Example: Training using AMO (C#)
using Microsoft.AnalysisServices.Tabular; // For Tabular, adjust for Multidimensional
// Assuming 'server' is an initialized Microsoft.AnalysisServices.Server object
// and 'cube' is a Microsoft.AnalysisServices.Cube or similar object
// For Multidimensional Models:
// var miningModel = cube.MiningModels.FindByName("MyDecisionTreeModel");
// if (miningModel != null)
// {
// miningModel.Process(ProcessType.ProcessFull);
// miningModel.Update();
// }
// For Tabular Models (using C# client libraries, conceptually similar):
// Assuming 'database' is an initialized Microsoft.AnalysisServices.Tabular.Database object
// and 'model' is a Microsoft.AnalysisServices.Tabular.Model object within it
// var tabularModel = database.Models.FindByName("MyTabularModel");
// if (tabularModel != null)
// {
// tabularModel.Process(ProcessType.ProcessFull);
// tabularModel.Update();
// }
// NOTE: The exact syntax depends on whether you are using Multidimensional or Tabular models
// and the specific AMO/TOM library version. This is a conceptual illustration.
Discovering and Querying Mining Models
Once a model is trained, you can query it to extract insights and predictions. Mining queries use the Data Mining Extensions (DMX) language.
- SELECT FROM <model>: Basic query to retrieve model content, structure, or predictions.
- PREDICTION JOIN: Used to generate predictions for new data.
- PATTERN SCHEMA: Retrieves the schema of patterns discovered by the algorithm.
- CONTENT: Retrieves the internal structure or rules of the model.
Example: DMX Query for Predictions
SELECT
Predict([MyClusterModel].[Cluster]) AS Cluster,
[Customer].[CustomerID] AS CustomerID
FROM
[MyClusterModel]
PREDICTION JOIN
OPENROWSET(
'Provider',
'Microsoft.ACE.OLEDB.12.0',
'SELECT * FROM [Excel 12.0;DATABASE=C:\Data\NewCustomers.xlsx]'
) AS T
ON T.[CustomerID] = [MyClusterModel].[CustomerID]
WHERE T.[CustomerID] IS NOT NULL
Validating and Scoring Models
Model validation helps assess the accuracy and reliability of your mining models.
- Lift Charts: Show how much better the model is at predicting outcomes compared to random selection.
- Profit Charts: Evaluate the potential profitability of using the model for predictions.
- ROC Charts: Used for binary classification models to visualize true positive vs. false positive rates.
- Scoring Models: Apply trained models to new data to generate predictions.
Exporting and Importing Models
Models can be exported to share them or migrate them between servers. Importing brings these models into a new Analysis Services instance.
- Export: Typically done via Management Studio (SSMS) or AMO. Models are usually exported as XMLA scripts.
- Import: The reverse of export, using SSMS or AMO to deploy an exported model.
Algorithm-Specific Operations
Different algorithms have unique operations and properties:
- Decision Trees: Explore branch splits, rules.
- Clustering: Analyze cluster characteristics, assign new data to clusters.
- Association Rules: Examine itemsets and rules.
- Sequence Clustering: Understand sequence patterns.
- Linear Regression/Logistic Regression: Examine coefficients, significance.