Creating Mining Models

This section guides you through the process of creating mining models in SQL Server Analysis Services (SSAS). Mining models are the core components of data mining solutions, enabling you to uncover patterns, predict future outcomes, and gain insights from your data.

Understanding Mining Model Creation

Creating a mining model involves several key steps:

Selecting an Algorithm: Choose the appropriate algorithm based on the type of problem you are trying to solve (e.g., classification, regression, clustering, association rules).
Defining Input and Predictable Columns: Specify which columns from your data source identify each case (the key), which are used as input attributes, and which are to be predicted.
Configuring Algorithm Parameters: Adjust the settings for the selected algorithm to optimize its performance and behavior.
Training the Model: Use your data to train the model, allowing the algorithm to learn the underlying patterns.
Evaluating the Model: Assess the accuracy and effectiveness of the trained model using various metrics and viewers.

Steps to Create a Mining Model

You can create mining models interactively in SQL Server Data Tools (SSDT), or programmatically by using Data Mining Extensions (DMX), XML for Analysis (XMLA), or Analysis Management Objects (AMO).

Using SQL Server Data Tools (SSDT)

Open your Analysis Services project in SSDT.
In Solution Explorer, right-click the Mining Structures folder and select New Mining Structure....
The Data Mining Wizard opens.
Select Data Source View: Choose the data source view that contains the data for your model.
Select Mining Method: Choose between Creating mining model directly or Creating mining structure and then mining model. It's generally recommended to create a mining structure first.
Click Next.
Select Algorithm: Choose your desired algorithm from the dropdown list.
Click Next.
Select Source Columns: Assign each column from your data source view a role for the selected algorithm: Key, Input, or Predict. Columns you leave unassigned are not used by the model.
Click Next.
Specify Table Types: Identify the case table and, if your model needs them, any nested tables. Nested tables are typically used for association and sequence models.
Click Next.
Algorithm Properties: Configure any specific properties for the chosen algorithm.
Click Finish.

Tip: Understanding the characteristics of each algorithm is crucial for selecting the most suitable one for your analytical task. Refer to the Data Mining Algorithms section for detailed descriptions.

Example: Creating a Decision Tree Model

Let's consider creating a decision tree model to predict customer churn.


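-- Minimal DMX sketch. The model name and the data types are illustrative assumptions.
-- In SSDT, you would instead select Microsoft Decision Trees in the wizard and
-- assign the same roles: CustomerID (Key), Age and Income (Input), Churn (Predict).
CREATE MINING MODEL [CustomerChurn]
(
    [CustomerID] LONG KEY,
    [Age] LONG CONTINUOUS,
    [Income] DOUBLE CONTINUOUS,
    [Churn] TEXT DISCRETE PREDICT
)
USING Microsoft_Decision_Trees

In this model, Churn is the predictable attribute, Age and Income are the input attributes the algorithm learns from, and CustomerID only identifies each case. A key column such as CustomerID should not be used as an input, because a unique identifier carries no pattern that generalizes to new customers.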

Configuring Algorithm Properties

Each mining algorithm has a set of parameters that you can tune to influence its behavior. For example, with Microsoft Decision Trees, COMPLEXITY_PENALTY limits how far the tree grows, MINIMUM_SUPPORT sets the minimum number of cases a leaf must contain before a split is generated, and SCORE_METHOD and SPLIT_METHOD control how splits are scored and shaped. Experiment with these parameters and compare the resulting models to find a configuration that suits your data.
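
Algorithm parameters can also be supplied when a model is defined in DMX. The following Python sketch only assembles the text of a CREATE MINING MODEL statement with a parameter list; it does not connect to a server. The function name build_create_model_dmx, the model name, the data types, and the parameter values are assumptions made for illustration. COMPLEXITY_PENALTY and MINIMUM_SUPPORT are Microsoft Decision Trees parameters.

```python
def build_create_model_dmx(model_name, columns, algorithm, parameters=None):
    """Render a DMX CREATE MINING MODEL statement as text.

    columns: list of (name, data_type, content_type, is_predictable) tuples.
    parameters: optional dict mapping algorithm parameter names to values.
    """
    column_lines = []
    for name, data_type, content_type, is_predictable in columns:
        definition = f"    [{name}] {data_type} {content_type}"
        if is_predictable:
            definition += " PREDICT"
        column_lines.append(definition)

    statement = (
        f"CREATE MINING MODEL [{model_name}]\n(\n"
        + ",\n".join(column_lines)
        + f"\n)\nUSING {algorithm}"
    )
    if parameters:
        rendered = ", ".join(f"{key} = {value}" for key, value in parameters.items())
        statement += f" ({rendered})"
    return statement


# Hypothetical churn model; values are placeholders, not tuning advice.
churn_dmx = build_create_model_dmx(
    "CustomerChurn",
    [
        ("CustomerID", "LONG", "KEY", False),
        ("Age", "LONG", "CONTINUOUS", False),
        ("Income", "DOUBLE", "CONTINUOUS", False),
        ("Churn", "TEXT", "DISCRETE", True),
    ],
    "Microsoft_Decision_Trees",
    {"COMPLEXITY_PENALTY": 0.5, "MINIMUM_SUPPORT": 10},
)
print(churn_dmx)
```

Generating the statement from a list of column roles makes it easy to produce several variants of the same model with different parameter values and then compare them.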

Important: Overfitting can occur if a model is too complex. Ensure you have sufficient data and set appropriate parameters to prevent the model from memorizing the training data instead of generalizing patterns.
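
One common safeguard against overfitting is to hold back part of the data and evaluate the model only on cases it did not see during training. Mining structures in SSAS can reserve such a holdout set (for example, through the HoldoutMaxPercent and HoldoutSeed properties). The Python sketch below illustrates the general idea only and makes no claim about how SSAS partitions data internally; the function name split_holdout and the sample values are assumptions.

```python
import random


def split_holdout(cases, holdout_percent, seed):
    """Partition cases into (training, testing) lists.

    The same seed always yields the same partition, so results are repeatable.
    """
    if not 0 <= holdout_percent < 100:
        raise ValueError("holdout_percent must be in the range [0, 100)")
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    holdout_count = len(shuffled) * holdout_percent // 100
    return shuffled[holdout_count:], shuffled[:holdout_count]


# 100 hypothetical case IDs, with 30 percent reserved for testing.
cases = list(range(1, 101))
training, testing = split_holdout(cases, holdout_percent=30, seed=42)
print(len(training), len(testing))  # prints "70 30"
```

A model that scores much better on the training cases than on the testing cases is likely memorizing the training data rather than generalizing.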

Training and Testing the Model

Once a model is created, it needs to be processed (trained) using your data. After training, you can evaluate its performance using various metrics and visualizations provided by SSAS.

Accuracy: The proportion of test cases for which the model predicts the correct outcome.
Coverage: The proportion of data points explained by the model.
Lift: How much better the model identifies the target outcome than random selection would, typically read from a lift chart.

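In SSDT, the Mining Model Viewer tab lets you explore the patterns a trained model has found, and the Mining Accuracy Chart tab provides lift charts and classification matrices for comparing predictions against actual values.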
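
The accuracy and lift measures listed above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the calculation SSAS performs: lift is taken here as the target rate among the highest-scored fraction of cases divided by the target rate overall, and the function names and sample data are hypothetical.

```python
def accuracy(actual, predicted):
    """Fraction of cases whose predicted value equals the actual value."""
    matches = sum(1 for a, p in zip(actual, predicted) if a == p)
    return matches / len(actual)


def lift(actual, scores, target, fraction):
    """Target rate in the top-scored fraction of cases over the overall target rate."""
    overall_hits = sum(1 for value in actual if value == target)
    if overall_hits == 0:
        raise ValueError("target value does not occur in the actual outcomes")
    # Rank cases from the highest to the lowest predicted probability of the target.
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    top_count = max(1, round(len(ranked) * fraction))
    top_hits = sum(1 for _, value in ranked[:top_count] if value == target)
    return (top_hits / top_count) / (overall_hits / len(actual))


# Ten hypothetical test cases: actual churn, predicted churn, and predicted probability.
actual = ["yes", "no", "yes", "no", "no", "yes", "no", "no", "no", "no"]
predicted = ["yes", "no", "no", "no", "no", "yes", "yes", "no", "no", "no"]
scores = [0.9, 0.2, 0.4, 0.1, 0.3, 0.8, 0.6, 0.2, 0.1, 0.3]

print(accuracy(actual, predicted))          # prints 0.8
print(round(lift(actual, scores, "yes", 0.3), 2))  # prints 2.22
```

A lift above 1 for a given fraction means that contacting the top-scored customers would reach more churners than contacting the same number of customers chosen at random.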