Data Mining Extensions (DMX)

This document introduces Data Mining Extensions (DMX), the query language in SQL Server Analysis Services (SSAS) for creating, training, querying, and managing data mining models.

Introduction to DMX

DMX is a declarative query language designed specifically for working with data mining models in Analysis Services. It resembles SQL but is tailored for data mining tasks. DMX allows you to:

Define, create, and manipulate mining structures and models.
Extract insights and patterns from your data.
Generate predictions based on trained models.
Score new data using existing models.

Understanding DMX is crucial for any developer or analyst working with SQL Server Analysis Services data mining features.

DMX Syntax Overview

DMX statements follow a SQL-like structure, with clauses for selecting columns or prediction functions, naming the source model, applying filters, and ordering the output. The core statements are SELECT, INSERT INTO, CREATE MINING MODEL, ALTER MINING STRUCTURE, and DROP MINING MODEL.

A common pattern for querying is:

SELECT
    <select expression list>
FROM
    <model>.CONTENT
[WHERE
    <condition>]
[ORDER BY
    <expression>]

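For predictions, a prediction function is combined with a PREDICTION JOIN that supplies the input cases:

SELECT
    Predict(<predictable column>)
FROM
    <model>
[NATURAL] PREDICTION JOIN
    <source data query> AS <alias>
[WHERE
    <condition>]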

Common DMX Operations

Creating Mining Models

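DMX can be used to create mining structures and models programmatically. A CREATE MINING MODEL statement defines the model columns, the mining algorithm, and the algorithm parameters; the source data is supplied later, when the model is trained with INSERT INTO.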

Example structure (simplified):

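CREATE MINING MODEL [MyNewModel]
(
    [CustomerID] LONG KEY,
    [Age] DOUBLE DISCRETIZED(AUTOMATIC, 10),   -- bucket ages into up to 10 ranges
    [Gender] TEXT DISCRETE PREDICT             -- illustrative predictable column; decision trees need at least one
)
USING
    Microsoft_Decision_Trees
(
    COMPLEXITY_PENALTY = 0.5                   -- higher values inhibit tree growth
);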
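
A newly created model is empty until it is trained. As a minimal sketch, the statement below trains the model defined above with INSERT INTO. The data source name [MyDataSource] and the Customers table are hypothetical placeholders; substitute a data source that exists in your Analysis Services database.

```dmx
-- Train [MyNewModel] from a relational source.
-- [MyDataSource] and the Customers table are assumed names, not part of the original example.
INSERT INTO [MyNewModel]
(
    [CustomerID], [Age], [Gender]
)
OPENQUERY([MyDataSource],
    'SELECT CustomerID, Age, Gender FROM Customers');
```

The column list maps the source query columns to model columns by position, so the two lists should appear in the same order.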

Predicting Data

The Predict function is used to generate predictions for new data instances. You can predict a single value or multiple possible values.

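SELECT
    Predict([Product]) AS PredictedProduct
FROM
    [MyModel]
NATURAL PREDICTION JOIN
    -- [MyDataSource] and Customers are placeholder names
    OPENQUERY([MyDataSource],
        'SELECT * FROM Customers WHERE CustomerID = ''ABC-123''') AS t;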
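
To illustrate the difference between a single predicted value and multiple possible values, here is a sketch of a singleton query, where the input case is typed directly into the query instead of being read from a table. It assumes, hypothetically, that [MyModel] has [Age] and [Gender] input columns and a predictable [Product] column.

```dmx
-- Singleton prediction: the input case is supplied inline.
-- [Age] and [Gender] are assumed input columns of [MyModel].
SELECT FLATTENED
    Predict([Product]),           -- the single most likely value
    PredictHistogram([Product])   -- every candidate value with its support and probability
FROM
    [MyModel]
NATURAL PREDICTION JOIN
    (SELECT 35 AS [Age], 'F' AS [Gender]) AS t;
```

PredictHistogram returns a nested table; the FLATTENED keyword expands it into ordinary rows so that clients without nested-rowset support can read the result.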

Browsing Model Content

DMX allows you to explore the internal structure of a trained mining model, such as decision tree nodes, clusters, or association rules.

SELECT * FROM [MyModel].CONTENT
WHERE NODE_UNIQUE_NAME = '<node ID>';
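
Model content is exposed through the model's CONTENT rowset, which has one row per node. As a sketch, assuming [MyModel] is a decision tree model, the query below lists the leaf nodes together with the value distribution stored in each one.

```dmx
-- List leaf nodes of a decision tree model with their value distributions.
-- Assumes [MyModel] uses Microsoft_Decision_Trees, where NODE_TYPE = 4 marks a leaf (distribution) node.
SELECT FLATTENED
    NODE_UNIQUE_NAME,
    NODE_CAPTION,
    NODE_SUPPORT,
    (SELECT ATTRIBUTE_VALUE, [SUPPORT], [PROBABILITY]
     FROM NODE_DISTRIBUTION) AS d
FROM
    [MyModel].CONTENT
WHERE
    NODE_TYPE = 4;
```

The meaning of NODE_TYPE and of the NODE_DISTRIBUTION rows depends on the algorithm, so the filter value would differ for clustering or association models.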

Scoring Data

Scoring involves applying a trained model to a dataset to get predictions or probabilities. This can be done using Predict or other prediction functions.

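SELECT
    t.[CustomerID],
    Predict([IsHighValueCustomer]) AS PredictedValue,
    PredictProbability([IsHighValueCustomer], 1) AS ProbabilityOfHighValue
FROM
    [MyCustomerModel]
NATURAL PREDICTION JOIN
    -- [MyDataSource] and Customers are placeholder names
    OPENQUERY([MyDataSource],
        'SELECT CustomerID, TotalSpend FROM Customers WHERE TotalSpend > 1000') AS t;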
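
When the columns in the data to be scored do not share names with the model columns, the mapping can be spelled out with an ON clause instead of relying on NATURAL PREDICTION JOIN. The sketch below assumes a hypothetical NewCustomers table whose Spend column corresponds to the model's [TotalSpend] column; [MyDataSource] is likewise a placeholder.

```dmx
-- Batch scoring with an explicit column mapping.
-- [MyDataSource], NewCustomers, CustID, and Spend are assumed names for illustration.
SELECT
    t.[CustID],
    Predict([IsHighValueCustomer]) AS PredictedValue,
    PredictProbability([IsHighValueCustomer], 1) AS ProbabilityOfHighValue
FROM
    [MyCustomerModel]
PREDICTION JOIN
    OPENQUERY([MyDataSource],
        'SELECT CustID, Spend FROM NewCustomers') AS t
ON
    [MyCustomerModel].[TotalSpend] = t.[Spend];
```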

Key DMX Statements

SELECT: Retrieves model content, cases, or predictions from mining models and structures.
INSERT INTO: Trains (processes) a mining model or structure with source data.
CREATE MINING MODEL: Creates a new mining model, together with its underlying mining structure.
DROP MINING MODEL: Deletes a mining model.
CREATE MINING STRUCTURE: Creates a new mining structure.
ALTER MINING STRUCTURE: Adds a mining model to an existing mining structure.
DROP MINING STRUCTURE: Deletes a mining structure.
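
The statements above can be combined into a simple lifecycle. The sketch below creates a structure, adds a clustering model to it, and then removes both. All object names are hypothetical, and each statement would normally be executed on its own, since DMX query windows generally accept one statement at a time.

```dmx
-- 1. Define a structure that several models can share (names are illustrative).
CREATE MINING STRUCTURE [CustomerStructure]
(
    [CustomerID] LONG KEY,
    [Age] LONG CONTINUOUS,
    [Gender] TEXT DISCRETE
);

-- 2. Add a clustering model that uses the structure's columns.
ALTER MINING STRUCTURE [CustomerStructure]
ADD MINING MODEL [CustomerClusters]
(
    [CustomerID],
    [Age],
    [Gender]
)
USING Microsoft_Clustering (CLUSTER_COUNT = 5);

-- 3. Clean up: drop the model first, then the structure.
DROP MINING MODEL [CustomerClusters];
DROP MINING STRUCTURE [CustomerStructure];
```

Between steps 2 and 3, the structure would be trained with INSERT INTO MINING STRUCTURE before the model could answer queries.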

DMX Examples

Here are a few more practical examples:

Example 1: Predict next purchase for a customer

SELECT
    Predict([ProductBought]) AS NextProduct
FROM
    [MyPurchaseModel]
NATURAL PREDICTION JOIN
    -- [MyDataSource] is a placeholder data source name
    OPENQUERY([MyDataSource],
        'SELECT * FROM OrderHistory WHERE CustomerID = ''XYZ-789''') AS t;

Example 2: Get probability of a customer belonging to a cluster

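SELECT
    t.[CustomerID],
    Cluster() AS ClusterName,
    ClusterProbability() AS Probability
FROM
    [MyClusteringModel]
NATURAL PREDICTION JOIN
    -- [MyDataSource] and Customers are placeholder names
    OPENQUERY([MyDataSource],
        'SELECT * FROM Customers WHERE CustomerID = ''PQR-456''') AS t;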

Example 3: Browse rules in an association model

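SELECT
    MODEL_NAME,
    NODE_UNIQUE_NAME AS RuleID,
    NODE_DESCRIPTION AS RuleText,
    NODE_SUPPORT AS Support,          -- number of supporting cases, not a fraction
    NODE_PROBABILITY AS Confidence
FROM
    [MyAssociationModel].CONTENT
WHERE
    NODE_TYPE = 8                     -- 8 = association rule node
    AND NODE_SUPPORT > 50;            -- illustrative case-count threshold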

Best Practices

Understand your data: Ensure your data is clean and well-structured before building models.
Choose the right algorithm: Select the algorithm that best suits your business problem.
Iterate and refine: Data mining is an iterative process. Experiment with parameters and model types.
Optimize queries: Write efficient DMX queries to minimize processing time.
Document your models: Keep clear records of model configurations and DMX scripts.