Overview
ARIMA (AutoRegressive Integrated Moving Average) is a time‑series forecasting method that predicts future values from past observations. In SQL Server Analysis Services (SSAS) data mining projects it is available through the Microsoft Time Series algorithm and is commonly used to forecast sales, demand, and other sequential data.
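To make the "AutoRegressive" and "Integrated" parts of the name concrete, here is a minimal pure‑Python sketch of an ARIMA(1,1,0) forecast: difference the series once, fit a one‑lag autoregression by least squares, then integrate the predicted differences back. This is an illustration only, not how SSAS estimates its models; the function names are hypothetical and the moving‑average part is deliberately left out.

```python
def difference(series, d=1):
    """Apply d rounds of first differencing (the 'I' in ARIMA)."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def fit_ar1(series):
    """Least-squares estimate of (c, phi) in x[t] = c + phi * x[t-1]."""
    if len(series) < 2:
        raise ValueError("need at least two points to fit AR(1)")
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = sxy / sxx if sxx else 0.0  # constant input: no lag effect to estimate
    c = mean_y - phi * mean_x
    return c, phi


def forecast_arima_110(series, steps):
    """Forecast `steps` values with a hand-rolled ARIMA(1,1,0) sketch."""
    diffs = difference(series, 1)
    c, phi = fit_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = c + phi * last_diff  # predict the next difference
        level += last_diff               # integrate back to the original scale
        forecasts.append(level)
    return forecasts


# A series that grows by 2 each period continues to grow by 2.
print(forecast_arima_110([10, 12, 14, 16, 18, 20], 3))  # -> [22.0, 24.0, 26.0]
```

Differencing first is what lets a simple autoregression handle a trending series: the model only has to describe how period‑to‑period changes behave.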
Key Features
- Supports non‑seasonal and seasonal data series.
- Automatic selection of the ARIMA order (p, d, q) during training.
- Integration with the DMX query language for model training and prediction.
- Handles missing values with built‑in imputation.
Creating an ARIMA Model
Below is a step‑by‑step example of creating an ARIMA model using DMX in SSAS.
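
    -- Create a mining structure with a time key and a continuous value column
    CREATE MINING STRUCTURE [SalesForecast]
    (
        [SalesDate] DATE KEY TIME,
        [SalesAmount] DOUBLE CONTINUOUS
    )

    -- Add a model that uses the ARIMA method of the Microsoft Time Series algorithm
    ALTER MINING STRUCTURE [SalesForecast]
    ADD MINING MODEL [SalesModel]
    (
        [SalesDate],
        [SalesAmount] PREDICT
    )
    USING Microsoft_Time_Series (FORECAST_METHOD = 'ARIMA')

    -- Populate the structure and train the model
    -- ([SalesDataSource] is a placeholder for your SSAS data source name)
    INSERT INTO MINING STRUCTURE [SalesForecast]
    (
        [SalesDate], [SalesAmount]
    )
    OPENQUERY([SalesDataSource],
        'SELECT OrderDate, TotalAmount
         FROM dbo.Sales
         WHERE OrderDate >= ''2020-01-01''
         ORDER BY OrderDate')
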
Prediction Example
Query the model to predict sales for the next 12 months.

    -- Forecast the next 12 periods (months, for a monthly series),
    -- returning each predicted value with its standard deviation
    SELECT FLATTENED
        (SELECT $TIME,
                [SalesAmount] AS [PredictedSales],
                PredictStdev([SalesAmount]) AS [StdDev]
         FROM PredictTimeSeries([SalesAmount], 12) AS t) AS t
    FROM [SalesModel]

Algorithm Parameters
| Parameter | Data Type | Description | Default |
|---|---|---|---|
| FORECAST_METHOD | STRING | Forecasting method to use: ARTXP, ARIMA, or MIXED. | MIXED |
| AUTO_DETECT_PERIODICITY | DOUBLE | Value between 0 and 1 that controls how aggressively the engine searches for periodicity. | 0.6 |
| PERIODICITY_HINT | STRING | Hint for the number of periods in a season (e.g., {12} for monthly data). | {1} |
| MISSING_VALUE_SUBSTITUTION | STRING or DOUBLE | How gaps in the series are filled: Previous, Mean, or a numeric constant. | None |
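The season‑length setting in the table above is easier to choose once you have looked at the data. As a rough sketch, assuming a plain list of evenly spaced values, the lag with the highest autocorrelation is a reasonable first guess for the number of periods in a season, and seasonal differencing shows what is left once that pattern is removed. The helper names are hypothetical, and this is not the detection method SSAS uses internally.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        return 0.0  # a constant series has no periodic structure
    cov = sum((series[i] - mean) * (series[i - lag] - mean)
              for i in range(lag, n))
    return cov / var


def guess_season_length(series, max_lag):
    """Return the lag in 2..max_lag with the highest autocorrelation."""
    return max(range(2, max_lag + 1),
               key=lambda lag: autocorrelation(series, lag))


def seasonal_difference(series, period):
    """Subtract the value one season earlier from each observation."""
    return [series[i] - series[i - period] for i in range(period, len(series))]


quarterly = [1, 2, 3, 4] * 6                   # a pattern repeating every 4 periods
print(guess_season_length(quarterly, 8))       # -> 4
print(seasonal_difference(quarterly, 4)[:3])   # -> [0, 0, 0]
```

A guess obtained this way is only a starting point; compare it against what you know about the business calendar (12 for monthly data with a yearly cycle, 7 for daily data with a weekly cycle) before passing it to the model.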
Best Practices
- Ensure your time series data is uniformly spaced (e.g., daily, monthly).
- Remove outliers and handle missing values before training.
- Start with the default parameter values for initial experiments; fine‑tune parameters for production.
- Validate model accuracy with a hold‑out set or cross‑validation.
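Two of the practices above, checking for uniform spacing and validating on a hold‑out set, can be sketched in a few lines of Python before any data reaches SSAS. This is a minimal illustration under stated assumptions: the dates are `datetime.date` values with a fixed step (daily or weekly; calendar months vary in length and would need a different check), and `naive_forecaster` is a hypothetical stand‑in for whatever model you are evaluating.

```python
from collections import Counter
from datetime import date


def find_irregular_steps(dates):
    """Return (previous, current) pairs whose gap differs from the most common gap."""
    pairs = list(zip(dates, dates[1:]))
    if not pairs:
        return []
    expected = Counter(b - a for a, b in pairs).most_common(1)[0][0]
    return [(a, b) for a, b in pairs if b - a != expected]


def holdout_mape(series, horizon, forecaster):
    """Train on all but the last `horizon` points and return the MAPE (%) on them.

    Assumes the held-out actual values are non-zero.
    """
    if not 0 < horizon < len(series):
        raise ValueError("horizon must be between 1 and len(series) - 1")
    train, test = series[:-horizon], series[-horizon:]
    predictions = forecaster(train, horizon)
    errors = [abs((actual - pred) / actual)
              for actual, pred in zip(test, predictions)]
    return 100 * sum(errors) / horizon


def naive_forecaster(train, horizon):
    """Baseline: repeat the last observed value."""
    return [train[-1]] * horizon


days = [date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 4), date(2024, 1, 5)]
print(find_irregular_steps(days))  # one gap: 2024-01-02 to 2024-01-04
print(holdout_mape([100, 100, 100, 110], 1, naive_forecaster))
```

Comparing a model's hold‑out error against a naive baseline like this one is a quick way to tell whether the extra complexity is paying for itself.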