Tutorial: Data Mining with Analysis Services
This tutorial guides you through the process of building a data mining model using SQL Server Analysis Services (SSAS). We will cover creating a data source, defining a data source view, building mining structures and models, and processing and exploring the results.
Introduction
Data mining is the process of discovering patterns and relationships in large datasets. SQL Server Analysis Services provides powerful tools and algorithms for data mining, allowing you to build predictive models such as classification, clustering, and association rules. This tutorial uses the AdventureWorksDW sample database.
Prerequisites
- SQL Server Data Tools (SSDT) for Visual Studio
- SQL Server Analysis Services instance installed
- AdventureWorksDW sample database installed and accessible
Step 1: Create a New Analysis Services Project
Start by creating a new Analysis Services project in Visual Studio.
- Open Visual Studio.
- Go to File > New > Project.
- Under Business Intelligence, select Analysis Services Project.
- Name your project (e.g.,
BikeSalesMining) and click OK.
Step 2: Create a Data Source
Define a connection to your data source, which is typically the AdventureWorksDW database.
- In Solution Explorer, right-click Data Sources and select Add Data Source.
- In the Data Source Wizard, click New.
- Configure the connection manager:
- Provider: Microsoft OLE DB Provider for SQL Server
- Server name: Your SQL Server instance name
- Authentication: Windows Authentication or SQL Server Authentication
- Database name: AdventureWorksDW2017 (or your version)
- Click Test Connection to verify. Click OK twice to close the dialogs.
- Click Next, then Finish.
Step 3: Create a Data Source View
A Data Source View (DSV) provides a unified logical view of your data, abstracting the physical schema.
- In Solution Explorer, right-click Data Source Views and select Add Data Source View.
- In the Data Source View Wizard, click Next.
- Select the data source you created in Step 2 and click Next.
- From the list of tables, select relevant tables like
DimProduct,DimCustomer,FactInternetSales, andDimGeography. Drag them into the diagram pane. - SSAS will automatically create relationships based on foreign keys. Verify these relationships.
- Click Next, name the view (e.g.,
SalesDSV), and click Finish.
Step 4: Create a Mining Structure
The mining structure defines the data that will be used to train and test your mining model. You'll choose an algorithm and specify input and predictable columns.
- In Solution Explorer, right-click Mining Structures and select Add Mining Structure.
- In the Mining Structure Wizard, click Next.
- Select Create from existing table and click Next.
- Select your Data Source View (
SalesDSV) and click Next. - Choose the Decision Trees algorithm for this tutorial. Click Next.
- In the Input Columns section, select relevant columns that might influence sales, such as
EnglishProductName,EnglishCountry,Gender,Age, etc. - In the Predictable Columns section, select
SalesAmountfrom theFactInternetSalestable. - Click Next.
- Configure the training and testing data split. A 70/30 or 80/20 split is common. Click Next.
- Name your mining structure (e.g.,
SalesDecisionTree) and click Finish.
Step 5: Train and Process the Model
Before you can explore the model, it needs to be processed (trained) using the selected data and algorithm.
- Right-click the Server project in Solution Explorer and select Process.
- In the Process dialog, ensure all items (Data Source, Data Source View, Mining Structure, Mining Model) are selected. Click Run.
- Monitor the process. Once all steps are green (successful), click Close.
Step 6: Explore the Model
After processing, you can visualize and interact with your trained model.
- In Solution Explorer, double-click the mining structure you created (
SalesDecisionTree) to open the mining model designer. - Click the Viewer tab at the bottom.
- Select Decision Tree Viewer from the dropdown.
- Navigate the decision tree to understand the factors influencing sales amount. You can drill down into nodes to see specific data subsets.
- Explore other viewers available for the algorithm, such as the Lift Chart, to evaluate model performance.
-- Example of how you might query the model using DMX (Data Mining Extensions)
SELECT
[SalesAmount]
FROM
[SalesDecisionTree].CASES
WHERE
[EnglishCountry] = 'United States'
Next Steps
- Try different mining algorithms (e.g., Clustering, Association Rules).
- Create more complex mining structures with multiple models.
- Learn how to integrate SSAS data mining with Reporting Services or Power BI.
- Explore DMX queries for more advanced analysis.