Mining Structures
A mining structure is the fundamental object in SQL Server Analysis Services (SSAS) data mining. It defines the data from which mining models are built and provides the framework for creating, managing, and using those models. Think of it as a container that holds the binding to the source data, the column definitions that say how that data is interpreted, and one or more data mining models built on it.
Key Components of a Mining Structure
- Data Source View: A representation of the data that the mining structure will use. This can be a subset of a data warehouse or a relational database.
- Columns: The attributes from the data source view that are used for mining. Each column has a data type and a content type (e.g., Key, Discrete, Continuous, Discretized); its usage (e.g., Input, Predict) is assigned when the column is added to a mining model.
- Drillthrough Settings: Let users explore the underlying cases behind a specific prediction or pattern found by a data mining model.
- Data Mining Models: One or more models (e.g., decision trees, clustering, neural networks) can be built and associated with a single mining structure.
Creating a Mining Structure
Mining structures are typically created with the Data Mining Wizard in SQL Server Data Tools (SSDT), or by running DMX or XMLA scripts from SQL Server Management Studio (SSMS). In the wizard, the process involves:
- Selecting a Data Source View.
- Choosing the columns from the data source view to include.
- Defining the content type and usage for each column.
- Optionally, defining drillthrough columns.
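The steps above have a rough scripted counterpart in the DMX `CREATE MINING STRUCTURE` statement. The following Python sketch only assembles such a statement as text from a list of column definitions. The `Column` class and `render_create_structure` helper are hypothetical names introduced for illustration, the column names are sample values, and the sketch assumes a flat case table with a single key column. A structure defined this way is not tied to a data source view when it is created, so the first step has no direct equivalent here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """One mining structure column (hypothetical helper, not an SSAS API)."""
    name: str       # column name as it should appear in the structure
    data_type: str  # DMX data type keyword, e.g. LONG, DOUBLE, TEXT
    content: str    # DMX content type keyword, e.g. KEY, CONTINUOUS, DISCRETE


def render_create_structure(structure_name, columns):
    """Build the text of a CREATE MINING STRUCTURE statement."""
    # Assumption of this sketch: a flat case table with exactly one key column.
    if sum(1 for c in columns if c.content == "KEY") != 1:
        raise ValueError("expected exactly one KEY column")
    body = ",\n".join(f"    [{c.name}] {c.data_type} {c.content}" for c in columns)
    return f"CREATE MINING STRUCTURE [{structure_name}]\n(\n{body}\n)"


statement = render_create_structure(
    "Customer Segmentation",
    [
        Column("Customer ID", "LONG", "KEY"),
        Column("Age", "LONG", "CONTINUOUS"),
        Column("Annual Income", "DOUBLE", "CONTINUOUS"),
        Column("Education", "TEXT", "DISCRETE"),
    ],
)
print(statement)
```

The printed text is a DMX statement that could be pasted into a DMX query window; the Python code itself never connects to a server.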
Mining Structure Properties
When you create a mining structure, you define several properties that dictate how data is processed and how models are built:
- Name: A unique identifier for the mining structure.
- Description: Provides a brief explanation of the mining structure's purpose.
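- Mining Column Properties: For each column, you specify:
  - Content: Defines how the data in the column should be treated (e.g., `Continuous`, `Discrete`, `Discretized`, `Ordered`, `Cyclical`, `Key`).
  - Usage: Specifies the role of the column in a mining model (e.g., `Input`, `Predict`, `PredictOnly`, `Ignore`). Usage is set separately for each model built on the structure, so the same column can be an input in one model and predictable in another.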
- Holdout Settings: `HoldoutMaxPercent`, `HoldoutMaxCases`, and `HoldoutSeed` control how much of the source data is set aside as a test set when the structure is processed. The remaining cases are used to train the models.
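- Allow Drillthrough: Enables or disables the drillthrough capability. It is set on each mining model, and drilling through to the structure's cases also requires that the structure keep its training cases after processing (`CacheMode` set to `KeepTrainingCases`).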
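To make the content setting more concrete: treating a numeric column as discretized rather than `Continuous` means its values are grouped into buckets before modeling. The sketch below shows one simple way to do that, equal-count bucketing, in plain Python. It illustrates the idea only and is not the algorithm SSAS uses; the function names and the sample ages are made up for this example.

```python
from bisect import bisect_right


def equal_count_boundaries(values, buckets):
    """Return buckets - 1 cut points that split the values into groups of roughly equal size."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // buckets] for i in range(1, buckets)]


def discretize(value, boundaries):
    """Map a continuous value to a bucket index (0-based) using the cut points."""
    return bisect_right(boundaries, value)


ages = [22, 25, 31, 35, 41, 44, 52, 58, 63]
cuts = equal_count_boundaries(ages, 3)
bucketed = [discretize(age, cuts) for age in ages]
print(cuts)      # cut points between buckets
print(bucketed)  # bucket index assigned to each age
```

A model that sees the bucket index instead of the raw age treats the column as a small set of states, which is what algorithms that cannot handle continuous inputs require.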
Example of a Mining Structure Definition (Conceptual XML)
<MiningStructure xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <!-- Simplified for illustration. In a deployed ASSL definition the data source
       view is bound through a Source element, and Usage is recorded on each
       mining model's columns rather than on the structure columns. -->
  <ID>CustomerSegmentationStructure</ID>
  <Name>Customer Segmentation</Name>
  <DataSourceViewID>SalesDW_DSV</DataSourceViewID>
  <Columns>
    <Column xsi:type="ScalarMiningStructureColumn">
      <ID>CustomerID</ID>
      <Name>Customer ID</Name>
      <DataType>Integer</DataType>
      <Content>Key</Content>
      <Usage>Key</Usage>
    </Column>
    <Column xsi:type="ScalarMiningStructureColumn">
      <ID>Age</ID>
      <Name>Age</Name>
      <DataType>Integer</DataType>
      <Content>Continuous</Content>
      <Usage>Input</Usage>
    </Column>
    <Column xsi:type="ScalarMiningStructureColumn">
      <ID>AnnualIncome</ID>
      <Name>Annual Income</Name>
      <DataType>Double</DataType>
      <Content>Continuous</Content>
      <Usage>Input</Usage>
    </Column>
    <Column xsi:type="ScalarMiningStructureColumn">
      <ID>Education</ID>
      <Name>Education</Name>
      <DataType>String</DataType>
      <Content>Discrete</Content>
      <Usage>Input</Usage>
    </Column>
    <Column xsi:type="ScalarMiningStructureColumn">
      <ID>Purchases</ID>
      <Name>Number of Purchases</Name>
      <DataType>Integer</DataType>
      <Content>Discretized</Content>
      <Usage>Predict</Usage>
      <DiscretizationMethod>Automatic</DiscretizationMethod>
    </Column>
  </Columns>
  <MiningModels>
    <MiningModel>
      <ID>ClusteringModel</ID>
      <Name>Customer Clusters</Name>
      <!-- Algorithm names use the service's identifiers, e.g. Microsoft_Clustering -->
      <Algorithm>Microsoft_Clustering</Algorithm>
    </MiningModel>
  </MiningModels>
</MiningStructure>
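A definition like the one above can be inspected programmatically. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; it embeds a shortened copy of the conceptual definition so that it is self-contained, and the `summarize` function name is an assumption of this example. Because the document declares a default namespace, every element lookup has to be namespace-qualified.

```python
import xml.etree.ElementTree as ET

# Prefix "as" is local to this script; it maps to the document's default namespace.
NS = {"as": "http://schemas.microsoft.com/analysisservices/2003/engine"}

SAMPLE = """<MiningStructure xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <ID>CustomerSegmentationStructure</ID>
  <Columns>
    <Column xsi:type="ScalarMiningStructureColumn">
      <ID>CustomerID</ID>
      <Content>Key</Content>
    </Column>
    <Column xsi:type="ScalarMiningStructureColumn">
      <ID>Age</ID>
      <Content>Continuous</Content>
    </Column>
  </Columns>
  <MiningModels>
    <MiningModel>
      <ID>ClusteringModel</ID>
    </MiningModel>
  </MiningModels>
</MiningStructure>"""


def summarize(xml_text):
    """Return ({column ID: content type}, [model IDs]) for a structure definition."""
    root = ET.fromstring(xml_text)
    columns = {
        col.findtext("as:ID", namespaces=NS): col.findtext("as:Content", namespaces=NS)
        for col in root.findall("as:Columns/as:Column", NS)
    }
    models = [
        model.findtext("as:ID", namespaces=NS)
        for model in root.findall("as:MiningModels/as:MiningModel", NS)
    ]
    return columns, models


columns, models = summarize(SAMPLE)
print(columns)
print(models)
```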
Benefits of Using Mining Structures
- Organization: Centralizes data preparation and model definitions.
- Reusability: Multiple models can be built upon the same mining structure.
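- Performance: The source data is read once when the structure is processed, and the models in the structure train from those cached cases instead of each querying the source again.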
- Scalability: Handles large datasets efficiently.
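The reusability point can be illustrated with DMX, where additional models are attached to an existing structure through `ALTER MINING STRUCTURE ... ADD MINING MODEL`. The Python sketch below only builds the statement text for two models on one structure; `render_add_model` is a hypothetical helper, and the structure, model, and column names are sample values chosen for this example.

```python
def render_add_model(structure_name, model_name, algorithm, columns):
    """Build the text of an ALTER MINING STRUCTURE ... ADD MINING MODEL statement.

    columns is a list of (column_name, usage) pairs; usage is "" for key and
    input columns, or a DMX usage keyword such as "PREDICT".
    """
    lines = [f"    [{name}] {usage}".rstrip() for name, usage in columns]
    body = ",\n".join(lines)
    return (
        f"ALTER MINING STRUCTURE [{structure_name}]\n"
        f"ADD MINING MODEL [{model_name}]\n"
        f"(\n{body}\n)\n"
        f"USING {algorithm}"
    )


shared_inputs = [("Customer ID", ""), ("Age", ""), ("Annual Income", "")]

statements = [
    # A clustering model that uses only the shared key and input columns.
    render_add_model(
        "Customer Segmentation", "Customer Clusters",
        "Microsoft_Clustering", shared_inputs,
    ),
    # A second model on the same structure that predicts a discrete column.
    render_add_model(
        "Customer Segmentation", "Education Tree",
        "Microsoft_Decision_Trees", shared_inputs + [("Education", "PREDICT")],
    ),
]
print("\n\n".join(statements))
```

Both statements target the same structure, so the column definitions and the training data are declared once and shared by the two models.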
Mining structures are the foundation of data mining in SQL Server Analysis Services: every model you build draws its columns and its training data from one, so a well-defined structure is the first step toward useful models.
For more advanced scenarios, consider exploring related topics such as Mining Models and Data Mining Extensions (DMX).