Data Mining Extensions (DMX)
This document gives an overview of Data Mining Extensions (DMX), a query language for SQL Server Analysis Services (SSAS) that you use to create, train, query, and manage data mining models.
Introduction to DMX
DMX is a declarative query language designed specifically for working with data mining models in Analysis Services. It resembles SQL but is tailored to data mining tasks. DMX allows you to:
- Define, create, and manipulate mining structures and models.
- Extract insights and patterns from your data.
- Generate predictions based on trained models.
- Score new data using existing models.
Understanding DMX is crucial for any developer or analyst working with SQL Server Analysis Services data mining features.
DMX Syntax Overview
DMX statements typically follow a structure that includes clauses for selecting data, specifying the source model, applying filters, and defining output. The core statements are SELECT, INSERT INTO, CREATE MINING STRUCTURE, ALTER MINING STRUCTURE, CREATE MINING MODEL, DROP MINING STRUCTURE, and DROP MINING MODEL.
A common pattern for querying is:
SELECT <select list>
FROM <model>
[WHERE <condition>]
[ORDER BY <expression>]
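For predictions, the query calls a prediction function and supplies the input cases through a PREDICTION JOIN:
SELECT Predict(<column>)
FROM <model>
[NATURAL] PREDICTION JOIN <source data> [ON <column mappings>]
[WHERE <condition>]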
Common DMX Operations
Creating Mining Models
DMX can be used to create mining structures and models programmatically. This involves defining the model columns, the mining algorithm, and the algorithm parameters. The training data is supplied afterwards with an INSERT INTO statement.
Example structure (simplified):
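CREATE MINING MODEL MyNewModel
(
[CustomerID] LONG KEY,
[Age] DOUBLE DISCRETIZED(AUTOMATIC, 10),
-- Decision trees need at least one predictable column
[Gender] TEXT DISCRETE PREDICT
)
USING
Microsoft_Decision_Trees
(
-- Higher values produce smaller trees (valid range is 0 to 1)
COMPLEXITY_PENALTY = 0.5
);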
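A model created this way is empty until it is trained. The following is a minimal sketch of the training step, assuming a data source object named [MyDataSource] and a relational table dbo.Customers; both names are hypothetical and would need to match objects in your own Analysis Services database.

```dmx
-- Train MyNewModel by loading cases from a relational source.
-- [MyDataSource] and dbo.Customers are placeholder names (assumptions).
INSERT INTO MINING MODEL MyNewModel
(
    [CustomerID],
    [Age],
    [Gender]
)
OPENQUERY([MyDataSource],
    'SELECT CustomerID, Age, Gender FROM dbo.Customers');
```

The column list maps the columns returned by the source query, in order, to the model columns.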
Predicting Data
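The Predict function is used to generate predictions for new data instances. You can predict a single value or multiple possible values. The new case is supplied through a PREDICTION JOIN; for a single case, an inline SELECT (a singleton query) is enough.
SELECT
Predict([Product]) AS PredictedProduct
FROM
MyModel
NATURAL PREDICTION JOIN
-- The input columns are illustrative; they must match input columns of MyModel
(SELECT 35 AS [Age], 'F' AS [Gender]) AS t;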
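To return multiple possible values rather than only the most likely one, a query can use PredictHistogram, which returns every state of the predictable column along with its statistics. The sketch below assumes that MyModel has [Age] and [Gender] input columns; those names and values are illustrative only.

```dmx
-- Return every possible [Product] state with its support and probability.
-- FLATTENED turns the nested histogram table into ordinary rows.
-- The [Age] and [Gender] inputs are assumed columns of MyModel.
SELECT FLATTENED
    PredictHistogram([Product]) AS ProductStates
FROM MyModel
NATURAL PREDICTION JOIN
(SELECT 35 AS [Age], 'F' AS [Gender]) AS t;
```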
Browsing Model Content
DMX allows you to explore the internal structure of a trained mining model, such as decision tree nodes, clusters, or association rules. You do this by querying the model content with the .CONTENT clause, which returns one row per node.
SELECT * FROM MyModel.CONTENT;
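Model content is usually easier to read when the query is restricted to one kind of node. As a rough sketch, assuming MyModel is a decision tree model, the query below lists only the leaf nodes; in decision tree content, NODE_TYPE = 4 identifies a distribution (leaf) node.

```dmx
-- List the leaf nodes of a decision tree model with their case counts.
-- Assumes MyModel uses Microsoft_Decision_Trees.
SELECT
    NODE_UNIQUE_NAME,
    NODE_CAPTION,
    NODE_SUPPORT,
    NODE_PROBABILITY
FROM MyModel.CONTENT
WHERE NODE_TYPE = 4;
```

Other algorithms use different NODE_TYPE values, so the filter has to be adapted to the model type.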
Scoring Data
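Scoring involves applying a trained model to a dataset to get predictions or probabilities. The rows to score are supplied through a PREDICTION JOIN, and Predict or other prediction functions return the results. In the example below, the data source [MyDataSource] and the table dbo.Customers are placeholder names.
SELECT
t.[CustomerID],
Predict([IsHighValueCustomer]) AS PredictedValue,
PredictProbability([IsHighValueCustomer], 1) AS ProbabilityOfHighValue
FROM
MyCustomerModel
PREDICTION JOIN
OPENQUERY([MyDataSource],
'SELECT CustomerID, TotalSpend FROM dbo.Customers WHERE TotalSpend > 1000') AS t
ON
MyCustomerModel.[TotalSpend] = t.[TotalSpend];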
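A scoring query can also keep only the rows whose predicted probability clears a threshold. The following sketch assumes a data source named [MyDataSource] and a table dbo.Customers, both hypothetical, and uses 0.8 as an arbitrary cutoff.

```dmx
-- Keep only customers scored as high value with probability above 0.8.
-- [MyDataSource], dbo.Customers, and the 0.8 cutoff are assumptions.
SELECT
    t.[CustomerID],
    PredictProbability([IsHighValueCustomer], 1) AS ProbabilityOfHighValue
FROM MyCustomerModel
PREDICTION JOIN
    OPENQUERY([MyDataSource],
        'SELECT CustomerID, TotalSpend FROM dbo.Customers') AS t
ON MyCustomerModel.[TotalSpend] = t.[TotalSpend]
WHERE PredictProbability([IsHighValueCustomer], 1) > 0.8;
```

Filters on the input rows themselves belong inside the source query, while the outer WHERE clause filters on the prediction results.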
Key DMX Statements
- SELECT: Retrieves data from mining models or structures.
- INSERT INTO: Trains a mining model or structure by loading source data into it.
- CREATE MINING MODEL: Creates a new mining model, together with a mining structure to hold it.
- DROP MINING MODEL: Deletes a mining model.
- CREATE MINING STRUCTURE: Creates a new mining structure.
- ALTER MINING STRUCTURE: Adds a mining model to an existing mining structure. DMX has no ALTER MINING MODEL statement; to change a model, drop and re-create it, or add a new model to the same structure.
- DROP MINING STRUCTURE: Deletes a mining structure and the models it contains.
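The structure-level statements in this list fit together roughly as sketched below. All object and column names are hypothetical, and each statement would normally be run on its own rather than as one batch.

```dmx
-- 1. Define the structure: the columns shared by all of its models.
CREATE MINING STRUCTURE [CustomerStructure]
(
    [CustomerID] LONG KEY,
    [Age] LONG CONTINUOUS,
    [Gender] TEXT DISCRETE,
    [IsHighValueCustomer] LONG DISCRETE
);

-- 2. Add a decision tree model that predicts [IsHighValueCustomer].
ALTER MINING STRUCTURE [CustomerStructure]
ADD MINING MODEL [CustomerTree]
(
    [CustomerID],
    [Age],
    [Gender],
    [IsHighValueCustomer] PREDICT
)
USING Microsoft_Decision_Trees;

-- 3. Remove the model, then the structure, when they are no longer needed.
DROP MINING MODEL [CustomerTree];
DROP MINING STRUCTURE [CustomerStructure];
```

Keeping several models on one structure lets them share the same training cases, which makes it easier to compare algorithms.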
DMX Examples
Here are a few more practical examples:
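-- Predict the next product from a customer's order history.
-- Assumes [OrderHistory] is a nested input table keyed by a [Product] column;
-- the product names are illustrative.
SELECT
Predict([ProductBought]) AS NextProduct
FROM
MyPurchaseModel
NATURAL PREDICTION JOIN
(SELECT (SELECT 'Widget A' AS [Product]
UNION SELECT 'Widget B' AS [Product]) AS [OrderHistory]) AS t;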
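-- Assign a new case to a cluster and return the probability of that assignment.
-- The input columns are illustrative; they must match input columns of the model.
SELECT
Cluster() AS AssignedCluster,
ClusterProbability() AS Probability
FROM
MyClusteringModel
NATURAL PREDICTION JOIN
(SELECT 35 AS [Age], 'F' AS [Gender]) AS t;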
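-- List the rules of an association model.
-- For rule nodes, NODE_SUPPORT is a case count and NODE_PROBABILITY is the confidence.
SELECT
NODE_DESCRIPTION,
NODE_SUPPORT,
NODE_PROBABILITY
FROM
MyAssociationModel.CONTENT
WHERE
NODE_TYPE = 8
AND NODE_SUPPORT > 50;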
Best Practices
- Understand your data: Ensure your data is clean and well-structured before building models.
- Choose the right algorithm: Select the algorithm that best suits your business problem.
- Iterate and refine: Data mining is an iterative process. Experiment with parameters and model types.
- Optimize queries: Write efficient DMX queries to minimize processing time, for example by selecting only the columns you need instead of using SELECT *.
- Document your models: Keep clear records of model configurations and DMX scripts.