Creating Mining Models

This section guides you through the process of creating mining models in SQL Server Analysis Services (SSAS). Mining models are the core components of data mining solutions, enabling you to uncover patterns, predict future outcomes, and gain insights from your data.

Understanding Mining Model Creation

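Creating a mining model involves choosing the source data, selecting an algorithm, assigning a role to each column, and then processing (training) the model.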

Steps to Create a Mining Model

You can create mining models using SQL Server Data Tools (SSDT) or programmatically, by sending DMX statements or XMLA commands to the server, or through Analysis Management Objects (AMO).
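To illustrate what the XMLA route involves, the Python sketch below wraps a DMX statement in an XMLA Execute request using only the standard library. It assumes the usual Execute/Command/Statement layout with a Catalog property; the function name build_xmla_execute, the catalog name ChurnAnalysis, and the model name [CustomerChurn] are hypothetical, and the sketch only builds the XML without sending it to a server.

```python
import xml.etree.ElementTree as ET

XMLA_NS = "urn:schemas-microsoft-com:xml-analysis"


def build_xmla_execute(statement: str, catalog: str) -> str:
    """Return an XMLA Execute request body that runs `statement` against `catalog`.

    Hypothetical helper: it builds the XML only and does not contact a server.
    """
    def q(tag: str) -> str:
        # Qualify a tag name with the XMLA namespace.
        return f"{{{XMLA_NS}}}{tag}"

    execute = ET.Element(q("Execute"))
    command = ET.SubElement(execute, q("Command"))
    ET.SubElement(command, q("Statement")).text = statement
    properties = ET.SubElement(execute, q("Properties"))
    property_list = ET.SubElement(properties, q("PropertyList"))
    ET.SubElement(property_list, q("Catalog")).text = catalog
    # ElementTree escapes characters such as < and & in the statement text.
    return ET.tostring(execute, encoding="unicode", default_namespace=XMLA_NS)


# Usage: wrap a DMX content query for a hypothetical model.
print(build_xmla_execute("SELECT * FROM [CustomerChurn].CONTENT", "ChurnAnalysis"))
```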

Using SQL Server Data Tools (SSDT)

  1. Open your Analysis Services project in SSDT.
  2. In Solution Explorer, right-click the Mining Structures folder and select New Mining Structure. The Data Mining Wizard opens; click Next to move from page to page.
  3. Select the Definition Method: Choose whether the source data comes from an existing relational database or data warehouse, or from an existing cube.
  4. Create the Data Mining Structure: Select Create mining structure with a mining model and choose an algorithm from the dropdown list, or select Create mining structure with no models and add models later. Every mining model belongs to a mining structure, so the wizard always creates one.
  5. Select Data Source View: Choose the data source view that contains the data for your model.
  6. Specify Table Types: Select the case table and, if your model needs them, any nested tables.
  7. Specify the Training Data: Mark each column as Key, Input, or Predictable.
  8. Specify Columns' Content and Data Type: Review the content type (for example, Continuous or Discrete) and the data type of each column.
  9. Create Testing Set: Specify the percentage of cases to hold out for testing the model.
  10. Completing the Wizard: Name the mining structure and mining model, and then click Finish.
  11. Algorithm Properties: To configure algorithm-specific properties, open the Mining Models tab of Data Mining Designer, right-click the model, and select Set Algorithm Parameters.
Tip: Understanding the characteristics of each algorithm is crucial for selecting the most suitable one for your analytical task. Refer to the Data Mining Algorithms section for detailed descriptions.

Example: Creating a Decision Tree Model

Let's consider creating a decision tree model to predict customer churn.


-- DMX equivalent of the SSDT choices: select Microsoft Decision Trees, mark
-- CustomerID as the Key, Age and Income as Input, and Churn as Predictable.
-- CustomerID identifies each case, so it is the key rather than an input.
-- The model name [CustomerChurn] is illustrative.
CREATE MINING MODEL [CustomerChurn]
(
    CustomerID LONG KEY,
    Age        LONG CONTINUOUS,
    Income     DOUBLE CONTINUOUS,
    Churn      TEXT DISCRETE PREDICT
)
USING Microsoft_Decision_Trees

Creating the model comes down to assigning each column a role: the key that identifies each case, the input attributes the algorithm learns from, and the predictable attribute it learns to predict.
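The role assignment can be made concrete by generating a DMX CREATE MINING MODEL statement from a column list. The Python sketch below is illustrative: the ColumnSpec class and create_model_dmx function are hypothetical, the data and content types chosen for each column are assumptions for this example, and it handles only a flat case table with a single KEY column (no nested tables).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnSpec:
    name: str
    data_type: str         # DMX data type, e.g. LONG, DOUBLE, TEXT
    content: str           # DMX content type, e.g. KEY, CONTINUOUS, DISCRETE
    predict: bool = False  # True marks the column as predictable


def create_model_dmx(model: str, columns: list[ColumnSpec], algorithm: str) -> str:
    """Build a CREATE MINING MODEL statement for a flat case table (hypothetical helper)."""
    if sum(c.content == "KEY" for c in columns) != 1:
        raise ValueError("this sketch expects exactly one KEY column")
    # This sketch targets prediction, so it also requires a predictable column.
    if not any(c.predict for c in columns):
        raise ValueError("no predictable column selected")
    column_lines = [
        f"    [{c.name}] {c.data_type} {c.content}" + (" PREDICT" if c.predict else "")
        for c in columns
    ]
    return (
        f"CREATE MINING MODEL [{model}]\n(\n"
        + ",\n".join(column_lines)
        + f"\n)\nUSING {algorithm}"
    )


churn_columns = [
    ColumnSpec("CustomerID", "LONG", "KEY"),
    ColumnSpec("Age", "LONG", "CONTINUOUS"),
    ColumnSpec("Income", "DOUBLE", "CONTINUOUS"),
    ColumnSpec("Churn", "TEXT", "DISCRETE", predict=True),
]
print(create_model_dmx("CustomerChurn", churn_columns, "Microsoft_Decision_Trees"))
```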

Configuring Algorithm Properties

Each mining algorithm exposes parameters that influence its behavior. For Microsoft Decision Trees, for example, COMPLEXITY_PENALTY controls how much the tree is allowed to grow, MINIMUM_SUPPORT sets the minimum number of leaf cases required to generate a split, and SCORE_METHOD and SPLIT_METHOD control how candidate splits are scored and formed. Because the best values depend on your data, compare several configurations rather than relying on the defaults.
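One way to experiment is to add several models with different parameter values to the same mining structure, so that all of them are trained and tested on the same cases. The Python sketch below generates DMX ALTER MINING STRUCTURE ... ADD MINING MODEL statements for a small grid of COMPLEXITY_PENALTY and MINIMUM_SUPPORT values. It assumes a structure named [Customer Churn] with these columns already exists; the structure name, model names, helper names, and parameter values are all hypothetical, so choose ranges that suit your data.

```python
from itertools import product


def add_model_dmx(structure, model, columns, predict_column, params):
    """Build an ALTER MINING STRUCTURE ... ADD MINING MODEL statement (hypothetical helper)."""
    column_lines = ",\n".join(
        f"    [{c}] PREDICT" if c == predict_column else f"    [{c}]" for c in columns
    )
    param_text = ", ".join(f"{name} = {value}" for name, value in params.items())
    return (
        f"ALTER MINING STRUCTURE [{structure}]\n"
        f"ADD MINING MODEL [{model}]\n(\n{column_lines}\n)\n"
        f"USING Microsoft_Decision_Trees ({param_text})"
    )


def parameter_grid(complexity_penalties, minimum_supports):
    """Yield every combination of the two Decision Trees parameters."""
    for penalty, support in product(complexity_penalties, minimum_supports):
        yield {"COMPLEXITY_PENALTY": penalty, "MINIMUM_SUPPORT": support}


columns = ["CustomerID", "Age", "Income", "Churn"]
statements = [
    add_model_dmx("Customer Churn", f"Churn Tree {i}", columns, "Churn", params)
    for i, params in enumerate(parameter_grid([0.3, 0.7], [10, 50]), start=1)
]
for statement in statements:
    print(statement, end="\n\n")
```

After the structure is processed, the resulting models can be compared side by side with the same accuracy tools.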

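Important: A model that is too complex for the available data can overfit, memorizing the training cases instead of learning patterns that generalize. Use sufficient training data, set parameters that limit model complexity, and check whether accuracy on the testing set falls well below accuracy on the training data.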
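A quick way to see overfitting is to compare accuracy on the training cases with accuracy on held-out cases. The Python sketch below uses hypothetical data and a deliberately bad "model" that memorizes each CustomerID, which is the kind of thing that can happen when a case identifier is treated as an input. It reserves 30 percent of the cases for testing (30 percent is also the usual default on the wizard's Create Testing Set page). This is a toy illustration of the idea, not how SSAS algorithms work internally; the helper names are hypothetical.

```python
import random


def holdout_split(cases, test_fraction=0.3, seed=42):
    """Shuffle reproducibly and return (training, testing) lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predict, cases, target="Churn"):
    """Fraction of cases whose predicted value matches the actual target value."""
    return sum(predict(c) == c[target] for c in cases) / len(cases)


# Hypothetical data: every third customer churns.
cases = [{"CustomerID": i, "Churn": "Yes" if i % 3 == 0 else "No"} for i in range(1, 101)]
train, test = holdout_split(cases)

# Deliberately overfit "model": it memorizes CustomerID -> Churn from the
# training cases and falls back to "No" for customers it has never seen.
memory = {c["CustomerID"]: c["Churn"] for c in train}


def memorizer(case):
    return memory.get(case["CustomerID"], "No")


print(f"training accuracy: {accuracy(memorizer, train):.2f}")  # 1.00 by construction
print(f"testing accuracy:  {accuracy(memorizer, test):.2f}")
```

Perfect training accuracy combined with noticeably lower testing accuracy is the symptom to watch for.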

Training and Testing the Model

A newly created model contains no patterns until it is processed (trained) with your data. After processing, you can evaluate the model on the Mining Accuracy Chart tab of Data Mining Designer, which provides a lift chart, a classification matrix, and a cross-validation report.
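One of the evaluation views SSAS provides is a classification matrix, which counts how often each predicted value coincides with each actual value. The Python sketch below computes such counts for made-up churn labels and derives overall accuracy from them; the helper names are hypothetical, and in SSAS the matrix is computed for you.

```python
from collections import Counter


def classification_matrix(actual, predicted):
    """Count each (predicted, actual) pair of labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return Counter(zip(predicted, actual))


def matrix_accuracy(matrix):
    """Share of cases on the diagonal, where the prediction equals the actual value."""
    total = sum(matrix.values())
    correct = sum(count for (pred, act), count in matrix.items() if pred == act)
    return correct / total


# Made-up churn labels for six test cases.
actual = ["Yes", "No", "No", "Yes", "No", "No"]
predicted = ["Yes", "No", "Yes", "No", "No", "No"]

matrix = classification_matrix(actual, predicted)
for (pred, act), count in sorted(matrix.items()):
    print(f"predicted={pred:<3} actual={act:<3} count={count}")
print(f"accuracy: {matrix_accuracy(matrix):.2f}")  # 4 of 6 correct -> 0.67
```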

To explore the patterns the model has learned, such as the branches of a decision tree, open the Mining Model Viewer tab in Data Mining Designer.