Sequence Clustering Algorithm

The Sequence Clustering algorithm groups sequences of events into clusters according to how similar their statistical patterns are. It is commonly used to analyze ordered categorical data such as purchase histories, web clickstreams, or any other ordered set of categorical events.
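
As an illustration of what a statistical pattern in a sequence can look like, the sketch below estimates first-order transition probabilities from a few event sequences. This is an assumption made for illustration only: this page does not specify which statistics the algorithm uses, and the function name transition_probabilities and the sample clickstreams are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate P(next event | current event) from a list of event sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count each adjacent pair (current, next) in the sequence.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    probs = {}
    for current, nexts in counts.items():
        total = sum(nexts.values())
        probs[current] = {nxt: n / total for nxt, n in nexts.items()}
    return probs


# Hypothetical clickstreams: each inner list is one visitor's ordered page views.
clickstreams = [
    ["home", "search", "product", "cart"],
    ["home", "search", "product", "home"],
    ["home", "product", "cart"],
]
probs = transition_probabilities(clickstreams)
print(probs["product"])
```

Sequences whose transition tables look alike would be natural candidates for the same cluster.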

Syntax

CREATE MINING MODEL [model_name]
(
    [column_name] [data_type] [attribute_properties],
    ...
)
USING Microsoft_Sequence_Clustering
(
    [algorithm_option] = [value],
    ...
);

Algorithm Parameters

Parameter         Type    Default  Description
MAX_CLUSTERS      int     10       Maximum number of clusters to generate.
MIN_SUPPORT       float   0.05     Minimum relative frequency a sequence must have to be considered.
MAX_ITERATIONS    int     100      Maximum number of EM iterations.
SEQUENCE_COLUMN   string  NULL     Name of the column that contains the sequence data.
SEQUENCE_PATTERN  string  NULL     Pattern defining the delimiter and format of the sequence (e.g., “,{ }”).
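
To make SEQUENCE_PATTERN and MIN_SUPPORT more concrete, the sketch below splits delimited sequence strings and keeps only the sequences whose relative frequency reaches a threshold. It rests on assumptions: the delimiter is a plain comma, "relative frequency" is read as the share of cases that have an identical sequence, and the helper names parse_sequence and filter_by_support are hypothetical rather than part of the algorithm.

```python
from collections import Counter


def parse_sequence(raw, delimiter=","):
    """Split a delimited sequence string into a tuple of event labels."""
    return tuple(item.strip() for item in raw.split(delimiter) if item.strip())


def filter_by_support(raw_sequences, min_support=0.05, delimiter=","):
    """Return {sequence: relative frequency} for sequences meeting min_support."""
    parsed = [parse_sequence(raw, delimiter) for raw in raw_sequences]
    counts = Counter(parsed)
    total = len(parsed)
    return {seq: n / total for seq, n in counts.items() if n / total >= min_support}


# Hypothetical input: one delimited product sequence per customer.
raw = ["A,B,C", "A,B,C", "A, B, C", "B,C", "C,A"]
supported = filter_by_support(raw, min_support=0.25)
print(supported)
```

With these sample rows, only the sequence A,B,C clears the 0.25 threshold, since each of the other two sequences accounts for just one case in five.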

Remarks

Example

The following script creates a sequence clustering model named CustomerPurchaseSeqModel, trains it on purchase histories stored in the dbo.Purchases table, and then returns the cluster assigned to each customer.

CREATE MINING MODEL CustomerPurchaseSeqModel
(
    CustomerID      INT          KEY,
    PurchaseDate    DATETIME,
    PurchaseAmount  DECIMAL(10,2),
    PurchaseSeq     NVARCHAR(4000)   NULL
)
USING Microsoft_Sequence_Clustering
(
    SEQUENCE_COLUMN = N'PurchaseSeq',
    SEQUENCE_PATTERN = N',',
    MAX_CLUSTERS = 5,
    MIN_SUPPORT = 0.02,
    MAX_ITERATIONS = 150
);
GO

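INSERT INTO CustomerPurchaseSeqModel (CustomerID, PurchaseDate, PurchaseAmount, PurchaseSeq)
-- Non-grouped columns must be aggregated; MAX and SUM are assumed choices here.
SELECT CustomerID,
       MAX(PurchaseDate)   AS PurchaseDate,    -- most recent purchase per customer
       SUM(PurchaseAmount) AS PurchaseAmount,  -- total spend per customer
       STRING_AGG(ProductID, ',') WITHIN GROUP (ORDER BY PurchaseDate) AS PurchaseSeq
FROM dbo.Purchases
GROUP BY CustomerID;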
GO

SELECT CustomerID, $CLUSTER AS ClusterID
FROM CustomerPurchaseSeqModel;
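
The INSERT step in this example relies on each customer's products being concatenated in purchase order. The following sketch reproduces that aggregation outside the database so the expected shape of PurchaseSeq is easy to inspect. The sample rows and the function name build_purchase_sequences are hypothetical.

```python
from collections import defaultdict
from datetime import date


def build_purchase_sequences(rows, delimiter=","):
    """Map each customer ID to its product IDs joined in purchase-date order."""
    by_customer = defaultdict(list)
    for customer_id, purchase_date, product_id in rows:
        by_customer[customer_id].append((purchase_date, product_id))
    return {
        customer_id: delimiter.join(
            str(product_id)
            for _, product_id in sorted(purchases, key=lambda p: p[0])
        )
        for customer_id, purchases in by_customer.items()
    }


# Hypothetical rows: (CustomerID, PurchaseDate, ProductID), deliberately unsorted.
rows = [
    (1, date(2024, 3, 5), 30),
    (1, date(2024, 1, 10), 10),
    (2, date(2024, 2, 1), 20),
    (1, date(2024, 2, 14), 20),
]
sequences = build_purchase_sequences(rows)
print(sequences)
```

Sorting by date before joining matters: the same products in a different order form a different sequence, and would be treated as a different pattern.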

See Also