Sequence Clustering Algorithm
The Sequence Clustering algorithm groups sequences of events into clusters according to the similarity of their statistical patterns. It is commonly used to analyze temporal data such as purchase histories, web clickstreams, or any other ordered set of categorical events.
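The description above does not spell out what a "statistical pattern" is. One common way to summarize ordered categorical events is a table of first-order transition probabilities (how often one event is followed by another). The Python sketch below assumes that representation purely for illustration; `transition_probabilities` and the sample data are hypothetical and are not part of the model syntax.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate P(next event | current event) from a list of event sequences.

    Assumption: a first-order Markov view of the data, used here only to
    illustrate what a "statistical pattern" of a sequence might look like.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        # Pair each event with the event that immediately follows it.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1

    probs = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        probs[current] = {nxt: n / total for nxt, n in followers.items()}
    return probs


# Illustrative data: three short event sequences.
sequences = [["A", "B", "C"], ["A", "B", "B"], ["A", "C"]]
probs = transition_probabilities(sequences)
```

Sequences whose transition tables look alike would be candidates for the same cluster under this assumption.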
Syntax
```sql
CREATE MINING MODEL [model_name]
(
    [column_name] [data_type] [attribute_properties]
    ...
)
USING Microsoft_Sequence_Clustering
(
    [algorithm_option] = [value],
    ...
);
```
Algorithm Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| MAX_CLUSTERS | int | 10 | Maximum number of clusters to generate. |
| MIN_SUPPORT | float | 0.05 | Minimum relative frequency a sequence must have to be considered. |
| MAX_ITERATIONS | int | 100 | Maximum number of EM iterations. |
| SEQUENCE_COLUMN | string | NULL | Name of the column that contains the sequence data. |
| SEQUENCE_PATTERN | string | NULL | Pattern defining the delimiter and format of the sequence (e.g., “,{ }”). |
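To make the `MIN_SUPPORT` threshold concrete, the Python sketch below takes one plausible reading of "minimum relative frequency": the share of input rows that carry exactly the same sequence string. That reading, the function name `filter_by_min_support`, and the sample data are assumptions for illustration and may differ from how the algorithm computes support internally.

```python
from collections import Counter


def filter_by_min_support(sequences, min_support=0.05):
    """Return {sequence: relative frequency} for sequences meeting min_support.

    Assumption: support = (rows with this exact sequence) / (all rows).
    """
    total = len(sequences)
    if total == 0:
        return {}
    counts = Counter(sequences)
    return {
        seq: n / total
        for seq, n in counts.items()
        if n / total >= min_support
    }


# Illustrative data: ten rows, three distinct sequences.
sequences = ["A,B,C"] * 6 + ["A,C"] * 3 + ["B,A"]
kept = filter_by_min_support(sequences, min_support=0.2)
```

With a threshold of 0.2, the sequence that appears in only one of ten rows is dropped, while the other two are kept.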
Remarks
- The algorithm expects the sequence column to contain a delimited list of items (e.g., `'A,B,C'`).
- All columns used in the model must have a defined data type that is supported for mining (numeric, categorical, or datetime).
- Cluster assignment can be queried using the `$CLUSTER` pseudo-column.
- For large datasets, consider increasing `MAX_ITERATIONS` or adjusting `MIN_SUPPORT` to improve convergence.
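The delimited-list format mentioned in the remarks can be checked before training. The Python sketch below shows one way to split a value such as `'A,B,C'` into its ordered items; the helper name `parse_sequence` and its handling of whitespace, empty items, and missing values are assumptions, not documented behavior of the algorithm.

```python
def parse_sequence(raw, delimiter=","):
    """Split a delimited sequence string into an ordered list of items.

    Assumptions: surrounding whitespace is ignored, empty items are dropped,
    and a missing value (None) is treated as an empty sequence.
    """
    if raw is None:
        return []
    return [item.strip() for item in raw.split(delimiter) if item.strip()]


items = parse_sequence("A,B,C")
```

Running such a check over the sequence column makes it easy to spot rows that would contribute an empty or malformed sequence.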
Example
The following script creates a sequence clustering model named CustomerPurchaseSeqModel, populates it with purchase histories from the dbo.Purchases table, and then queries the cluster assigned to each customer.
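```sql
-- Define the model: one row per customer, with the purchase sequence as a delimited string.
CREATE MINING MODEL CustomerPurchaseSeqModel
(
    CustomerID     INT KEY,
    PurchaseDate   DATETIME,
    PurchaseAmount DECIMAL(10,2),
    PurchaseSeq    NVARCHAR(4000) NULL
)
USING Microsoft_Sequence_Clustering
(
    SEQUENCE_COLUMN  = N'PurchaseSeq',
    SEQUENCE_PATTERN = N',',
    MAX_CLUSTERS     = 5,
    MIN_SUPPORT      = 0.02,
    MAX_ITERATIONS   = 150
);
GO

-- Populate the model with one aggregated row per customer.
-- Every non-grouped column must be aggregated because of GROUP BY CustomerID.
INSERT INTO CustomerPurchaseSeqModel (CustomerID, PurchaseDate, PurchaseAmount, PurchaseSeq)
SELECT CustomerID,
       MAX(PurchaseDate)   AS PurchaseDate,    -- most recent purchase
       SUM(PurchaseAmount) AS PurchaseAmount,  -- total spend
       -- CAST is required when ProductID is not already a string type.
       STRING_AGG(CAST(ProductID AS NVARCHAR(20)), ',')
           WITHIN GROUP (ORDER BY PurchaseDate) AS PurchaseSeq
FROM dbo.Purchases
GROUP BY CustomerID;
GO

-- Retrieve the cluster assigned to each customer.
SELECT CustomerID, $CLUSTER AS ClusterID
FROM CustomerPurchaseSeqModel;
```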
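The ordered aggregation in the script (collecting each customer's product IDs in purchase-date order and joining them with a comma) can be prototyped outside the database to check what the sequence strings will look like. The Python sketch below mirrors that step under stated assumptions: `build_sequences`, the `(customer_id, purchase_date, product_id)` row shape, and the sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import date


def build_sequences(rows, delimiter=","):
    """Build one delimited, date-ordered product sequence per customer.

    rows: iterable of (customer_id, purchase_date, product_id) tuples
          (an assumed shape for illustration).
    """
    by_customer = defaultdict(list)
    for customer_id, purchase_date, product_id in rows:
        by_customer[customer_id].append((purchase_date, product_id))

    sequences = {}
    for customer_id, purchases in by_customer.items():
        # Sort by purchase date only; ties keep their input order.
        ordered = sorted(purchases, key=lambda p: p[0])
        sequences[customer_id] = delimiter.join(str(pid) for _, pid in ordered)
    return sequences


# Illustrative rows, deliberately out of date order for customer 1.
rows = [
    (1, date(2024, 1, 5), 20),
    (1, date(2024, 1, 2), 10),
    (2, date(2024, 1, 3), 30),
]
sequences = build_sequences(rows)
```

Each customer ends up with a single string whose items follow purchase order, which is the shape the sequence column is expected to hold.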