Tutorial: Data Mining with Analysis Services

This tutorial guides you through the process of building a data mining model using SQL Server Analysis Services (SSAS). We will cover creating a data source, defining a data source view, building mining structures and models, and processing and exploring the results.

Introduction

Data mining is the process of discovering patterns and relationships in large datasets. SQL Server Analysis Services provides powerful tools and algorithms for data mining, allowing you to build predictive models such as classification, clustering, and association rules. This tutorial uses the AdventureWorksDW sample database.

Prerequisites

Step 1: Create a New Analysis Services Project

Start by creating a new Analysis Services project in Visual Studio.

  1. Open Visual Studio.
  2. Go to File > New > Project.
  3. Under Business Intelligence, select Analysis Services Project.
  4. Name your project (e.g., BikeSalesMining) and click OK.

Step 2: Create a Data Source

Define a connection to your data source, which is typically the AdventureWorksDW database.

  1. In Solution Explorer, right-click Data Sources and select Add Data Source.
  2. In the Data Source Wizard, click New.
  3. Configure the connection manager:
    • Provider: Microsoft OLE DB Provider for SQL Server
    • Server name: Your SQL Server instance name
    • Authentication: Windows Authentication or SQL Server Authentication
    • Database name: AdventureWorksDW2017 (or your version)
  4. Click Test Connection to verify. Click OK twice to close the dialogs.
  5. Click Next, then Finish.

Step 3: Create a Data Source View

A Data Source View (DSV) provides a unified logical view of your data, abstracting the physical schema.

  1. In Solution Explorer, right-click Data Source Views and select Add Data Source View.
  2. In the Data Source View Wizard, click Next.
  3. Select the data source you created in Step 2 and click Next.
  4. From the list of tables, select relevant tables like DimProduct, DimCustomer, FactInternetSales, and DimGeography. Drag them into the diagram pane.
  5. SSAS will automatically create relationships based on foreign keys. Verify these relationships.
  6. Click Next, name the view (e.g., SalesDSV), and click Finish.
Tip: You can rename columns and create calculated columns within the Data Source View designer for better usability.

Step 4: Create a Mining Structure

The mining structure defines the data that will be used to train and test your mining model. You'll choose an algorithm and specify input and predictable columns.

  1. In Solution Explorer, right-click Mining Structures and select Add Mining Structure.
  2. In the Mining Structure Wizard, click Next.
  3. Select Create from existing table and click Next.
  4. Select your Data Source View (SalesDSV) and click Next.
  5. Choose the Decision Trees algorithm for this tutorial. Click Next.
  6. In the Input Columns section, select relevant columns that might influence sales, such as EnglishProductName, EnglishCountry, Gender, Age, etc.
  7. In the Predictable Columns section, select SalesAmount from the FactInternetSales table.
  8. Click Next.
  9. Configure the training and testing data split. A 70/30 or 80/20 split is common. Click Next.
  10. Name your mining structure (e.g., SalesDecisionTree) and click Finish.

Step 5: Train and Process the Model

Before you can explore the model, it needs to be processed (trained) using the selected data and algorithm.

  1. Right-click the Server project in Solution Explorer and select Process.
  2. In the Process dialog, ensure all items (Data Source, Data Source View, Mining Structure, Mining Model) are selected. Click Run.
  3. Monitor the process. Once all steps are green (successful), click Close.
Important: Processing can take time depending on the size of your dataset and the complexity of the algorithm.

Step 6: Explore the Model

After processing, you can visualize and interact with your trained model.

  1. In Solution Explorer, double-click the mining structure you created (SalesDecisionTree) to open the mining model designer.
  2. Click the Viewer tab at the bottom.
  3. Select Decision Tree Viewer from the dropdown.
  4. Navigate the decision tree to understand the factors influencing sales amount. You can drill down into nodes to see specific data subsets.
  5. Explore other viewers available for the algorithm, such as the Lift Chart, to evaluate model performance.
-- Example of how you might query the model using DMX (Data Mining Extensions)
SELECT
  [SalesAmount]
FROM
  [SalesDecisionTree].CASES
WHERE
  [EnglishCountry] = 'United States'
                

Next Steps