Mining Structure and Mining Model (Analysis Services)

This document gives an overview of mining structures and mining models in Microsoft SQL Server Analysis Services (SSAS). Both concepts underpin the design, development, and deployment of data mining solutions.

What is a Mining Structure?

A mining structure is a container that defines the source data, the selected columns, and their data types for a data mining project. It acts as a blueprint for the data used to train and query mining models. A structure is built from a case table and, optionally, nested tables related to it through keys. It specifies the data type and content type (for example, key, discrete, or continuous) of each column. Column usage (input or predictable) is not part of the structure; it is set separately on each mining model built from it.

Key Components of a Mining Structure:

Data Sources: The relational data source view or OLAP cube that supplies the data.
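Columns: The attributes from the source data that are relevant for mining. Each column has a data type (e.g., LONG, DOUBLE, TEXT, BOOLEAN) and a content type (e.g., KEY, DISCRETE, CONTINUOUS). Usage roles such as input or predictable are assigned per mining model, not in the structure.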
Relationships: How different tables or views within the mining structure are related, typically defined by keys.
Holdout Partitions: An optional split of the source data into a training set and a testing set, so that models built on the structure can be validated against cases they were not trained on.
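
The components above map onto a DMX structure definition roughly as follows. This is a minimal sketch, not a definitive design: the structure name, the column names, the nested [Purchases] table, and the holdout settings are all hypothetical. Note that the data source itself does not appear here; in DMX it is supplied later, when the structure is processed.

```dmx
-- Hypothetical structure: one case table (customers) with a nested table
-- of purchases. All names and holdout values are illustrative assumptions.
CREATE MINING STRUCTURE [CustomerPurchases]
(
    [CustomerID] LONG KEY,          -- case key
    [Age]        LONG CONTINUOUS,   -- data type + content type
    [Gender]     TEXT DISCRETE,
    [Purchases]  TABLE              -- nested table, related to the case by key
    (
        [Product]  TEXT KEY,        -- nested key
        [Quantity] LONG CONTINUOUS
    )
)
-- Reserve 30 percent of cases (at most 10000) for testing; the seed makes
-- the split repeatable.
WITH HOLDOUT (30 PERCENT OR 10000 CASES) REPEATABLE (42)
```

No usage role appears anywhere in this statement, because usage is declared on the models that are later added to the structure.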

What is a Mining Model?

A mining model is built upon a mining structure and represents the output of applying a specific data mining algorithm to the data defined by the structure. Each mining model discovers patterns, relationships, or predictions within the data. For example, a decision tree model might discover rules for customer purchasing behavior, while a clustering model might group customers into distinct segments.
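
The patterns a model has learned can be inspected with a DMX content query. The sketch below assumes a processed decision tree model with the hypothetical name [MyDecisionTree]; the filter on NODE_TYPE = 4 is meant to return the leaf (distribution) nodes of a decision tree, and other algorithms use different node types.

```dmx
-- Hypothetical model name; assumes the model has already been processed.
-- Lists the leaf nodes of a decision tree with their support and rule text.
SELECT NODE_CAPTION, NODE_SUPPORT, NODE_PROBABILITY, NODE_DESCRIPTION
FROM [MyDecisionTree].CONTENT
WHERE NODE_TYPE = 4
```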

Relationship Between Mining Structure and Mining Model:

A single mining structure can be used to build multiple mining models, each employing a different algorithm (e.g., Decision Trees, Clustering, Neural Network, Association Rules, Sequence Clustering). A mining model takes its column definitions from the mining structure, using all of the columns or a subset, and stores its own column usage, algorithm parameters, and learned patterns.
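
As an illustration of several models sharing one structure, a clustering model can be added alongside any models the structure already has. The sketch assumes a structure named [MyCustomerStructure] with [CustomerID], [Age], [Income], and [Gender] columns already exists; the model name and the CLUSTER_COUNT value are hypothetical.

```dmx
-- Assumes [MyCustomerStructure] already exists with these columns.
-- The model name and parameter value are illustrative.
ALTER MINING STRUCTURE [MyCustomerStructure]
ADD MINING MODEL [MyCustomerClusters]
(
    [CustomerID],
    [Age],
    [Income],
    [Gender]
)
USING Microsoft_Clustering (CLUSTER_COUNT = 5)
```

Because both models read from the same structure, the source data is loaded once and each algorithm trains on the same cases.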

Creating Mining Structures and Models

You typically create mining structures and models using SQL Server Data Tools (SSDT) for Analysis Services, by writing DMX (Data Mining Extensions) statements, or programmatically through AMO (Analysis Management Objects).

Steps Involved:

Define the Mining Structure: Select data sources, specify tables/views, define column mappings, and set column types and usages.
Create Mining Models: Select a mining algorithm and associate it with an existing mining structure. Configure algorithm-specific parameters.
Process the Structure and Models: Populate the mining structure with the source data and train each mining model on it.
Explore and Query: Use visualization tools and DMX queries to understand the discovered patterns and make predictions.

Example DMX Snippet (Conceptual):


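-- Run each statement separately. [MyDataSource] and [CustomerTable]
-- are placeholder names.

-- Create the mining structure: columns, data types, and content types
CREATE MINING STRUCTURE [MyCustomerStructure]
(
    [CustomerID] LONG KEY,
    [Age] LONG CONTINUOUS,
    [Gender] TEXT DISCRETE,
    [Income] DOUBLE CONTINUOUS
)
WITH HOLDOUT (30 PERCENT)

-- Add a decision tree model to the structure; column usage is set here
ALTER MINING STRUCTURE [MyCustomerStructure]
ADD MINING MODEL [MyDecisionTree]
(
    [CustomerID],
    [Age],
    [Income],
    [Gender] PREDICT
)
USING Microsoft_Decision_Trees (COMPLEXITY_PENALTY = 0.5)

-- Process the structure and its models by loading the source data
INSERT INTO MINING STRUCTURE [MyCustomerStructure]
(
    [CustomerID], [Age], [Gender], [Income]
)
OPENQUERY([MyDataSource],
    'SELECT CustomerID, Age, Gender, Income FROM CustomerTable')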
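
Once a model has been processed, the "Explore and Query" step can be sketched as a prediction query. This example assumes the [MyDecisionTree] model predicts [Gender] from [Age] and [Income]; the [NewCustomers] table and the [MyDataSource] data source are hypothetical.

```dmx
-- Batch prediction: score rows from a hypothetical NewCustomers table.
SELECT
    t.[CustomerID],
    Predict([Gender]) AS [PredictedGender],
    PredictProbability([Gender]) AS [Probability]
FROM [MyDecisionTree]
PREDICTION JOIN
OPENQUERY([MyDataSource],
    'SELECT CustomerID, Age, Income FROM NewCustomers') AS t
ON  [MyDecisionTree].[Age] = t.[Age]
AND [MyDecisionTree].[Income] = t.[Income]
```

For a single ad hoc case, the OPENQUERY source can be replaced with an inline row, for example NATURAL PREDICTION JOIN (SELECT 35 AS [Age], 52000 AS [Income]) AS t, which maps columns by name and needs no ON clause.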

Best Practices

Start with a clear business objective.
Understand your data thoroughly before modeling.
Select relevant columns and appropriate data types/usages.
Experiment with different algorithms and parameters to find the best fit.
Validate your models using appropriate metrics.
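
As a starting point for validation, the cases held out for testing can be retrieved from the structure and compared against model predictions. This sketch assumes [MyCustomerStructure] was created with a holdout partition and has been processed with its cases cached; without a holdout, the query returns no rows.

```dmx
-- Returns only the cases reserved for testing (not used in training).
SELECT *
FROM MINING STRUCTURE [MyCustomerStructure].CASES
WHERE IsTestCase()
```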

This section provides the foundational knowledge for working with mining structures and models in SQL Server Analysis Services. Further sections will delve into specific algorithms, data preparation techniques, and querying methods.