Naive Bayes (Data Mining)
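The Naive Bayes algorithm creates a classification model that estimates the probability of each possible outcome, assuming that the predictors are conditionally independent of one another given the outcome. Training only requires counting how often each predictor value occurs with each outcome, so it scales well to large datasets. It handles categorical predictors directly. Continuous predictors are usually discretized into ranges first or, in some implementations, modeled with a per-outcome distribution such as a Gaussian.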
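Under the independence assumption, a case's score for a class is the class prior multiplied by the per-predictor conditional probabilities, P(c | x) ∝ P(c) · ∏ P(x_i | c). The scores are then normalized so they sum to 1.

The Python sketch below is a minimal categorical Naive Bayes classifier built on that idea. The names `fit` and `predict_proba`, the Laplace smoothing parameter `alpha`, and the toy data are illustrative assumptions. They are not part of any product API.

```python
import math
from collections import Counter, defaultdict


def fit(rows, labels):
    """Count class frequencies and per-class predictor-value frequencies in one pass."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, label) -> Counter of values
    feature_values = defaultdict(set)    # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values


def predict_proba(model, row, alpha=1.0):
    """Return P(label | row), assuming predictors are conditionally independent."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    log_scores = {}
    for label, n_label in class_counts.items():
        score = math.log(n_label / total)  # log prior P(label)
        for i, value in enumerate(row):
            k = len(feature_values[i])
            count = value_counts[(i, label)][value]
            # Laplace-smoothed log P(value | label)
            score += math.log((count + alpha) / (n_label + alpha * k))
        log_scores[label] = score
    # Normalize in log space so long products do not underflow.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    z = sum(weights.values())
    return {label: w / z for label, w in weights.items()}


rows = [("F", "low"), ("F", "low"), ("M", "high"), ("M", "low")]
labels = ["yes", "yes", "no", "no"]
model = fit(rows, labels)
print(predict_proba(model, ("F", "low")))
```

With this toy data the case ("F", "low") scores 9/32 for "yes" and 2/32 for "no". After normalization that gives 9/11 (about 0.82) and 2/11 (about 0.18). Working in log space is a design choice that keeps products over many predictors from underflowing to zero.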
Key Features
- Fast model building and scoring.
- Works well with high‑dimensional data.
- Provides probabilistic predictions.
- Handles missing values; a common approach is to leave a missing predictor out of that case's probability calculation.
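The probabilistic output and missing-value handling listed above can be illustrated with a Gaussian Naive Bayes variant. This variant models each continuous predictor with a per-class normal distribution instead of discretizing it.

In this Python sketch, a predictor whose value is `None` is dropped from the product of likelihoods. That is one common strategy, not necessarily what a given product does. The function names and data are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian(rows, labels):
    """Estimate each class prior plus a per-class mean and variance for every predictor.

    Missing values (None) are skipped when the statistics are computed.
    """
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, class_rows in by_class.items():
        stats = []
        for i in range(len(class_rows[0])):
            values = [r[i] for r in class_rows if r[i] is not None]
            if not values:
                stats.append(None)  # no observed values for this predictor in this class
                continue
            mean = sum(values) / len(values)
            # Small floor keeps the variance positive for constant predictors.
            var = sum((v - mean) ** 2 for v in values) / len(values) + 1e-9
            stats.append((mean, var))
        model[label] = (len(class_rows) / len(rows), stats)
    return model


def predict_proba_gaussian(model, row):
    """Return P(label | row); predictors whose value is None are left out of the product."""
    log_scores = {}
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for value, feature_stats in zip(row, stats):
            if value is None or feature_stats is None:
                continue  # missing value: skip this likelihood term
            mean, var = feature_stats
            # Log of the normal density N(value; mean, var)
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        log_scores[label] = score
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    z = sum(weights.values())
    return {label: w / z for label, w in weights.items()}


rows = [(20.0, 30.0), (22.0, None), (60.0, 90.0), (62.0, 88.0)]
labels = ["low", "low", "high", "high"]
model = fit_gaussian(rows, labels)
print(predict_proba_gaussian(model, (21.0, None)))  # second predictor is missing
```

If every predictor in a case is missing, no likelihood terms remain. The prediction then falls back to the class priors.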
When to Use
- Text classification, spam detection.
- Medical diagnosis, where the independence assumption is reasonable.
- Any scenario requiring quick, interpretable probabilities.
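For text classification and spam detection, documents are typically represented as word counts and scored with a multinomial Naive Bayes model.

The Python sketch below trains on tokenized messages and returns the label with the highest log posterior. The function names, whitespace tokenization, and toy messages are illustrative assumptions.

```python
import math
from collections import Counter


def train_multinomial(docs, labels, alpha=1.0):
    """Train multinomial Naive Bayes on tokenized documents (lists of words)."""
    vocab = {word for doc in docs for word in doc}
    log_priors, log_likelihoods = {}, {}
    for label in set(labels):
        class_docs = [doc for doc, lab in zip(docs, labels) if lab == label]
        log_priors[label] = math.log(len(class_docs) / len(docs))
        counts = Counter(word for doc in class_docs for word in doc)
        total = sum(counts.values())
        # Laplace-smoothed log P(word | label) for every vocabulary word
        log_likelihoods[label] = {
            word: math.log((counts[word] + alpha) / (total + alpha * len(vocab)))
            for word in vocab
        }
    return log_priors, log_likelihoods


def classify(model, doc):
    """Return the label with the highest log posterior; out-of-vocabulary words are ignored."""
    log_priors, log_likelihoods = model
    scores = {
        label: log_priors[label]
        + sum(log_likelihoods[label][word] for word in doc if word in log_likelihoods[label])
        for label in log_priors
    }
    return max(scores, key=scores.get)


messages = ["win money now", "free money", "meeting tomorrow", "project meeting notes"]
labels = ["spam", "spam", "ham", "ham"]
model = train_multinomial([m.lower().split() for m in messages], labels)
print(classify(model, "Free money now".lower().split()))  # prints: spam
```

Training is a single counting pass over the corpus, which is why this approach stays fast on large, high-dimensional vocabularies.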
Syntax
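CREATE MINING MODEL [model_name]
(
    [key_column] <data_type> KEY,
    [input_column] <data_type> DISCRETE | DISCRETIZED,
    …,
    [target_column] <data_type> DISCRETE PREDICT
)
USING Microsoft_Naive_Bayes [(<parameter> = <value>, …)]
Each model needs a KEY column that identifies the case. Mark the column to predict with PREDICT, so that it is used as both input and output, or with PREDICT_ONLY, so that it is output only. The Microsoft Naive Bayes algorithm accepts only discrete or discretized attributes, so declare continuous source columns as DISCRETIZED. Optional algorithm parameters go in parentheses after the algorithm name.
Parameters
| Option | Default | Description |
|---|---|---|
| MAXIMUM_INPUT_ATTRIBUTES | 255 | Maximum number of input attributes before feature selection is applied. 0 turns feature selection off for inputs. |
| MAXIMUM_OUTPUT_ATTRIBUTES | 255 | Maximum number of predictable attributes before feature selection is applied. 0 turns feature selection off for outputs. |
| MAXIMUM_STATES | 100 | Maximum number of states per attribute. Beyond this, the most frequent states are kept and the rest are treated as missing. |
| MINIMUM_DEPENDENCY_PROBABILITY | 0.5 | Minimum dependency probability (0 to 1) between input and predictable attributes. Limits the size of the model content; higher values keep fewer attributes. |
Example
This example creates a Naive Bayes model that predicts customer churn from demographic data, trains it, and then browses the learned model content. It assumes a source table with Gender, Age, IncomeLevel, Region, and Churn columns. It also assumes an Analysis Services data source named AdventureWorksDW2019 that points at that database; adjust the names to your environment. Run the statements one at a time.
-- Define the model: a case key, discrete or discretized inputs, and the predictable Churn column.
CREATE MINING MODEL [CustomerChurnModel]
(
    [CustomerKey] LONG KEY,
    [Gender] TEXT DISCRETE,
    [Age] LONG DISCRETIZED,
    [IncomeLevel] TEXT DISCRETE,
    [Region] TEXT DISCRETE,
    [Churn] TEXT DISCRETE PREDICT
)
USING Microsoft_Naive_Bayes

-- Train the model from the relational source.
INSERT INTO [CustomerChurnModel]
    ([CustomerKey], [Gender], [Age], [IncomeLevel], [Region], [Churn])
OPENQUERY([AdventureWorksDW2019],
    'SELECT CustomerKey, Gender, Age, IncomeLevel, Region, Churn FROM dbo.DimCustomer')

-- Browse the learned model content.
SELECT * FROM [CustomerChurnModel].CONTENT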