Mining Structures
This section documents mining structures in SQL Server Analysis Services (SSAS). A mining structure is the foundation for every data mining task in SSAS: it defines the data available for mining, including the source columns and how each one is typed and interpreted.
What is a Mining Structure?
A mining structure in SSAS is a metadata object that defines the data to be analyzed by a data mining algorithm. It acts as a container for:
- Data Sources: The tables and columns from which mining data is extracted.
- Columns: The attributes from the data sources that are available for mining, each with a data type and a content type.
- Partitions: The division of the data into a training set and a testing set, used to train mining models and then validate them.
- Mining Models: One or more mining models that can be built upon the structure.
Key Components of a Mining Structure
1. Data Sources
A mining structure reads its data through a data source view, which sits on top of a data source such as a relational database or a data warehouse. A structure can also be built on an OLAP cube.
2. Columns
Each column in the mining structure represents an attribute from the source data. The following properties describe each column:
- Name: A unique identifier for the column.
- Data Type: The type of data (e.g., integer, string, date; in SSAS these are `Long`, `Text`, and `Date`).
- Content Type: How the algorithm should interpret the data (e.g., `Discrete`, `Continuous`, `Key`).
- Usage: The role of the column in the mining process. Usage is assigned per mining model rather than on the structure itself, so the same column can play different roles in different models:
  - Input: Used as a predictor attribute.
  - Predictable: The attribute that the mining model will attempt to predict.
  - Ignore: The attribute is not used by that model.
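As a rough illustration of these column properties, the same definitions can be written in DMX (Data Mining Extensions), the SQL-like language SSAS uses for data mining. The sketch below assumes a hypothetical structure named `[Customer Churn]` and hypothetical customer columns; the data types and content types shown are assumptions about the data, not requirements. Usage flags such as `PREDICT` do not appear here because they are supplied when a model is added to the structure.

```sql
-- DMX sketch: each line is <column name> <data type> <content type>.
-- The structure name and all column names are hypothetical.
CREATE MINING STRUCTURE [Customer Churn]
(
    [CustomerID]     LONG   KEY,         -- uniquely identifies each case
    [Age]            LONG   CONTINUOUS,
    [Gender]         TEXT   DISCRETE,
    [MonthlyCharges] DOUBLE CONTINUOUS,
    [ContractType]   TEXT   DISCRETE,
    [Tenure]         LONG   CONTINUOUS,
    [SupportCalls]   LONG   CONTINUOUS,
    [ChurnStatus]    TEXT   DISCRETE     -- e.g. 'Yes' / 'No'
)
```

Declaring a column as `CONTINUOUS` or `DISCRETE` matters because some algorithms accept only certain content types, so the choice constrains which models can later use the column.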
3. Partitions
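Partitions divide your data into a training set and a testing set. SSAS creates the testing set by holding out a percentage or a maximum number of cases when the structure is processed, controlled by the `HoldoutMaxPercent`, `HoldoutMaxCases`, and `HoldoutSeed` properties. Testing models against cases they were not trained on is what makes accuracy measurements meaningful.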
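One way to picture the training/testing split is the DMX `WITH HOLDOUT` clause, which reserves part of the data for testing when the structure is defined. The sketch below is illustrative only: the structure name `[Churn Holdout Demo]`, its columns, the 30 percent figure, and the seed value are all assumptions. The two statements are meant to be run separately, and the second one only returns rows after the structure has been processed and only if the structure retains its cases.

```sql
-- Statement 1: define a small structure and reserve 30 percent of the
-- cases for testing. REPEATABLE fixes the seed so that reprocessing
-- produces the same split. All names and values are hypothetical.
CREATE MINING STRUCTURE [Churn Holdout Demo]
(
    [CustomerID]  LONG KEY,
    [Tenure]      LONG CONTINUOUS,
    [ChurnStatus] TEXT DISCRETE
)
WITH HOLDOUT (30 PERCENT) REPEATABLE (42)

-- Statement 2 (run separately, after processing): list the cases
-- that were held out for testing.
SELECT *
FROM MINING STRUCTURE [Churn Holdout Demo].CASES
WHERE IsTestCase()
```

Replacing `IsTestCase()` with `IsTrainingCase()` lists the training cases instead, which can help confirm that the two sets do not overlap.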
4. Mining Models
A mining structure can contain multiple mining models, each built using a different algorithm or configuration. This allows you to compare the results of various mining techniques on the same data.
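A minimal DMX sketch of this idea follows. It assumes a structure named `[Customer Churn]` already exists with the columns listed; the structure name, model names, and column names are hypothetical. Each statement is run on its own. Columns listed without a flag act as inputs, and `PREDICT` marks the predictable column.

```sql
-- Model 1: decision trees on the shared structure (names are hypothetical).
ALTER MINING STRUCTURE [Customer Churn]
ADD MINING MODEL [Churn Decision Tree]
(
    [CustomerID],
    [Age],
    [MonthlyCharges],
    [ContractType],
    [Tenure],
    [ChurnStatus] PREDICT
)
USING Microsoft_Decision_Trees

-- Model 2 (run separately): logistic regression over the same columns,
-- so that both models can be compared on identical data.
ALTER MINING STRUCTURE [Customer Churn]
ADD MINING MODEL [Churn Logistic Regression]
(
    [CustomerID],
    [Age],
    [MonthlyCharges],
    [ContractType],
    [Tenure],
    [ChurnStatus] PREDICT
)
USING Microsoft_Logistic_Regression
```

Because both models draw on one structure, they share the same training and testing cases, which keeps the comparison fair.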
Creating a Mining Structure
You can create mining structures using SQL Server Data Tools (SSDT) for Visual Studio. The process typically involves:
- Creating a new Analysis Services project.
- Adding a Data Source View.
- Creating a new Mining Structure.
- Configuring the columns and their properties.
- Selecting the mining algorithms you want to use.
Example
Consider a customer churn prediction scenario. Your mining structure might include:
- Predictable Column: `ChurnStatus` (e.g., 'Yes', 'No')
- Input Columns: `Age`, `Gender`, `MonthlyCharges`, `ContractType`, `Tenure`, `SupportCalls`
You would then build mining models (e.g., Decision Trees, Logistic Regression) on this structure to predict which customers are likely to churn.
```sql
-- Example SQL query to select data for a mining structure (conceptual)
SELECT
    CustomerID,
    Age,
    Gender,
    MonthlyCharges,
    ContractType,
    Tenure,
    SupportCalls,
    ChurnStatus
FROM
    CustomerData
WHERE
    CustomerID IS NOT NULL
    AND ChurnStatus IS NOT NULL;
```
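To connect the query above to the mining structure, a source query like it can be passed to the DMX `INSERT INTO MINING STRUCTURE` statement, which processes the structure and any models it contains. The sketch below is hedged throughout: the data source `[Churn Source]`, the structure `[Customer Churn]`, the model `[Churn Decision Tree]`, and the `NewCustomers` table are assumed names that do not come from this document. The statements are run one at a time.

```sql
-- Statement 1: load the cases and train every model on the structure.
-- [Churn Source] is a hypothetical SSAS data source.
INSERT INTO MINING STRUCTURE [Customer Churn]
(
    [CustomerID], [Age], [Gender], [MonthlyCharges],
    [ContractType], [Tenure], [SupportCalls], [ChurnStatus]
)
OPENQUERY([Churn Source],
    'SELECT CustomerID, Age, Gender, MonthlyCharges,
            ContractType, Tenure, SupportCalls, ChurnStatus
     FROM CustomerData
     WHERE CustomerID IS NOT NULL AND ChurnStatus IS NOT NULL')

-- Statement 2 (run separately): score customers from a hypothetical
-- NewCustomers table with a hypothetical trained model.
SELECT
    t.[CustomerID],
    Predict([ChurnStatus])                   AS PredictedChurn,
    PredictProbability([ChurnStatus], 'Yes') AS ChurnProbability
FROM [Churn Decision Tree]
PREDICTION JOIN
    OPENQUERY([Churn Source],
        'SELECT CustomerID, Age, MonthlyCharges, ContractType, Tenure
         FROM NewCustomers') AS t
ON  [Churn Decision Tree].[Age]            = t.[Age]
AND [Churn Decision Tree].[MonthlyCharges] = t.[MonthlyCharges]
AND [Churn Decision Tree].[ContractType]   = t.[ContractType]
AND [Churn Decision Tree].[Tenure]         = t.[Tenure]
```

The `ON` clause maps model columns to source columns by name, so only the columns the model actually uses as inputs need to appear in the prediction query.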