Naive Bayes (Data Mining)

The Naive Bayes algorithm creates a classification model that predicts the probability of each possible outcome, assuming that all predictors are conditionally independent of one another given the outcome. It is particularly useful for large datasets and can handle both categorical and continuous data.
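The independence assumption means each class c is scored as P(c) multiplied by P(x_j | c) for every predictor x_j. The scores are then normalized so they sum to 1. The following Python sketch illustrates this for categorical predictors only. The functions `train` and `predict_proba`, the toy weather rows, and the use of Laplace smoothing (`alpha`) are illustrative assumptions and are not part of the mining model syntax described on this page.

```python
from collections import Counter, defaultdict

def train(rows, target):
    """Count class frequencies and, per class, how often each predictor value occurs."""
    class_counts = Counter(row[target] for row in rows)
    value_counts = defaultdict(Counter)  # (column, class) -> Counter of values
    domains = defaultdict(set)           # column -> distinct values seen in training
    for row in rows:
        for col, val in row.items():
            if col != target:
                value_counts[(col, row[target])][val] += 1
                domains[col].add(val)
    return class_counts, value_counts, domains

def predict_proba(model, case, alpha=1.0):
    """Return P(class | case), treating predictors as conditionally independent given the class."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n_cls in class_counts.items():
        score = n_cls / total  # prior P(class)
        for col, val in case.items():
            # Laplace-smoothed likelihood P(value | class)
            k = len(domains[col])
            score *= (value_counts[(col, cls)][val] + alpha) / (n_cls + alpha * k)
        scores[cls] = score
    norm = sum(scores.values())
    return {cls: s / norm for cls, s in scores.items()}

# Hypothetical toy data: predict "Play" from two categorical predictors.
rows = [
    {"Outlook": "Sunny", "Windy": "No",  "Play": "Yes"},
    {"Outlook": "Sunny", "Windy": "Yes", "Play": "No"},
    {"Outlook": "Rainy", "Windy": "Yes", "Play": "No"},
    {"Outlook": "Rainy", "Windy": "No",  "Play": "Yes"},
    {"Outlook": "Sunny", "Windy": "No",  "Play": "Yes"},
]
model = train(rows, "Play")
proba = predict_proba(model, {"Outlook": "Sunny", "Windy": "No"})
print({cls: round(p, 3) for cls, p in proba.items()})  # → {'Yes': 0.852, 'No': 0.148}
```

Without smoothing, the "No" class would get probability 0 here because no "No" row has Windy = "No". Smoothing keeps a single unseen value from ruling out a class entirely.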

Key Features

Fast model building and scoring.
Works well with high‑dimensional data.
Provides probabilistic predictions.
Supports missing values.
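On missing values: under the independence assumption, an unobserved predictor can be marginalized out exactly. Summing P(x_j | c) over all values of x_j gives 1, so that predictor's factor drops out of the product. This is one standard technique and not necessarily how a given engine implements missing-value support. Some engines treat "missing" as a state of its own instead. The sketch below uses hypothetical, hand-specified probabilities (the names `joint`, `posterior`, `priors`, and `likelihoods` are illustrative). It checks that skipping the missing column gives the same result as explicit marginalization.

```python
import math

def joint(priors, likelihoods, case):
    """Unnormalized P(class) times the product of P(value | class) over observed predictors."""
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for col, val in case.items():
            if val is None:
                # Missing predictor: summing P(value | class) over all values gives 1,
                # so marginalizing it out is the same as omitting its factor.
                continue
            score *= likelihoods[cls][col][val]
        scores[cls] = score
    return scores

def posterior(priors, likelihoods, case):
    scores = joint(priors, likelihoods, case)
    norm = sum(scores.values())
    return {cls: s / norm for cls, s in scores.items()}

# Hypothetical, hand-specified model parameters (not learned from real data).
priors = {"Yes": 0.3, "No": 0.7}
likelihoods = {
    "Yes": {"Gender": {"F": 0.5, "M": 0.5}, "Region": {"North": 0.8, "South": 0.2}},
    "No":  {"Gender": {"F": 0.25, "M": 0.75}, "Region": {"North": 0.4, "South": 0.6}},
}

case = {"Gender": "F", "Region": None}  # Region is missing
skipped = posterior(priors, likelihoods, case)

# Explicit marginalization over every Region value gives the same posterior.
summed = {cls: 0.0 for cls in priors}
for region in ("North", "South"):
    for cls, s in joint(priors, likelihoods, {"Gender": "F", "Region": region}).items():
        summed[cls] += s
total = sum(summed.values())
marginal = {cls: s / total for cls, s in summed.items()}

assert all(math.isclose(skipped[c], marginal[c]) for c in priors)
print({c: round(p, 3) for c, p in skipped.items()})  # → {'Yes': 0.462, 'No': 0.538}
```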

When to Use

Text classification and spam detection.
Medical diagnosis, where the independence assumption is reasonable.
Any scenario that requires quick, interpretable probabilities.
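For text classification and spam detection, Naive Bayes is commonly applied to word counts (the multinomial variant). Add-one (Laplace) smoothing keeps unseen word/class pairs from zeroing out a class, and log probabilities avoid underflow on long documents. The following is a minimal Python sketch under those assumptions. The four training messages and the functions `train_multinomial` and `classify` are hypothetical.

```python
import math
from collections import Counter

def train_multinomial(docs):
    """docs: list of (text, label). Returns per-class document counts, word counts, and vocabulary."""
    class_docs = Counter()
    word_counts = {}
    vocab = set()
    for text, label in docs:
        words = text.lower().split()
        class_docs[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
        vocab.update(words)
    return class_docs, word_counts, vocab

def classify(model, text, alpha=1.0):
    """Return P(label | text) using Laplace-smoothed word likelihoods summed in log space."""
    class_docs, word_counts, vocab = model
    n_docs = sum(class_docs.values())
    log_scores = {}
    for label, n in class_docs.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n / n_docs)  # log prior
        for w in text.lower().split():
            if w not in vocab:
                continue  # ignore words never seen in training
            p = (word_counts[label][w] + alpha) / (total_words + alpha * len(vocab))
            score += math.log(p)
        log_scores[label] = score
    # Convert log scores to probabilities (subtract the max for numerical stability).
    m = max(log_scores.values())
    weights = {label: math.exp(s - m) for label, s in log_scores.items()}
    z = sum(weights.values())
    return {label: wt / z for label, wt in weights.items()}

docs = [
    ("win cash now", "spam"),
    ("cheap cash offer", "spam"),
    ("meeting at noon", "ham"),
    ("lunch meeting tomorrow", "ham"),
]
model = train_multinomial(docs)
probs = classify(model, "cash offer now")
print(max(probs, key=probs.get))  # → spam
```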

Syntax

CREATE MINING MODEL [model_name]
FROM [source_table]
WITH ( 
    NATIVE_QUERY = N'',
    CONTENT = N'',
    CLUSTERING_METHOD = N'',
    DATAMINING_ALGORITHM = N'NAIVE_BAYES',
    DATA_SOURCE = N'',
    INPUT = (COLUMN1 [TYPE], COLUMN2 [TYPE], …),
    TARGET = N'target_column',
    ALGORITHM_OPTIONS = N'…'
);

Parameters

Option	Default	Description
`DATAMINING_ALGORITHM`	—	Algorithm to use; set to `N'NAIVE_BAYES'` for this model.
`INPUT`	—	List of predictor columns, each optionally followed by its type.
`TARGET`	—	Name of the column to predict.
`ALGORITHM_OPTIONS`	—	Additional algorithm-specific settings.
`DATA_SOURCE`	—	Name of the data source view.

Example

This example creates a Naive Bayes model to predict customer churn based on demographic data.

USE AdventureWorksDW2019;
GO

CREATE MINING MODEL dbo.CustomerChurnModel
FROM dbo.DimCustomer
WITH
(
    DATAMINING_ALGORITHM = N'NAIVE_BAYES',
    TARGET = N'Churn',
    INPUT = (Gender, Age, IncomeLevel, Region)
);
GO

SELECT *
FROM OPENQUERY(MINING_SERVICES,
    'NATIVE SELECT * FROM [dbo].[CustomerChurnModel]');

Model created successfully.
Preview of model predictions:
+----------+----------+-------------+
| Customer | Churn    | Probability |
+----------+----------+-------------+
| 101      | Yes      | 0.78        |
| 102      | No       | 0.63        |
| 103      | No       | 0.55        |
+----------+----------+-------------+
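The same idea can be sketched outside the database to show what a churn model of this shape computes. This Python sketch is an illustration only: the training rows are invented, and it does not reproduce the preview values above. Modeling `Age` with a per-class normal distribution is an assumption, since some engines discretize continuous columns into buckets instead. The categorical columns use Laplace-smoothed frequencies, as in standard categorical Naive Bayes. The names `train` and `churn_probability` are hypothetical.

```python
import math
from collections import Counter, defaultdict
from statistics import mean, pvariance

CATEGORICAL = ("Gender", "IncomeLevel", "Region")
CONTINUOUS = ("Age",)

def train(rows, target="Churn", alpha=1.0):
    classes = Counter(r[target] for r in rows)
    cat_counts = defaultdict(Counter)  # (column, class) -> value counts
    domains = defaultdict(set)         # column -> distinct values seen in training
    gauss = {}                         # (column, class) -> (mean, variance)
    for cls in classes:
        members = [r for r in rows if r[target] == cls]
        for col in CATEGORICAL:
            for r in members:
                cat_counts[(col, cls)][r[col]] += 1
                domains[col].add(r[col])
        for col in CONTINUOUS:
            xs = [r[col] for r in members]
            # Floor the variance so a constant column cannot cause division by zero.
            gauss[(col, cls)] = (mean(xs), max(pvariance(xs), 1e-6))
    return classes, cat_counts, domains, gauss, alpha

def churn_probability(model, case):
    classes, cat_counts, domains, gauss, alpha = model
    n = sum(classes.values())
    log_scores = {}
    for cls, n_cls in classes.items():
        s = math.log(n_cls / n)  # log prior
        for col in CATEGORICAL:
            count = cat_counts[(col, cls)][case[col]]
            s += math.log((count + alpha) / (n_cls + alpha * len(domains[col])))
        for col in CONTINUOUS:
            mu, var = gauss[(col, cls)]
            # Log of the normal density N(x; mu, var)
            s += -0.5 * math.log(2 * math.pi * var) - (case[col] - mu) ** 2 / (2 * var)
        log_scores[cls] = s
    m = max(log_scores.values())
    weights = {c: math.exp(v - m) for c, v in log_scores.items()}
    z = sum(weights.values())
    return {c: w / z for c, w in weights.items()}

# Hypothetical training rows (not AdventureWorks data).
rows = [
    {"Gender": "F", "Age": 23, "IncomeLevel": "Low",  "Region": "North", "Churn": "Yes"},
    {"Gender": "M", "Age": 27, "IncomeLevel": "Low",  "Region": "South", "Churn": "Yes"},
    {"Gender": "F", "Age": 31, "IncomeLevel": "Mid",  "Region": "North", "Churn": "Yes"},
    {"Gender": "M", "Age": 45, "IncomeLevel": "High", "Region": "South", "Churn": "No"},
    {"Gender": "F", "Age": 52, "IncomeLevel": "Mid",  "Region": "North", "Churn": "No"},
    {"Gender": "M", "Age": 38, "IncomeLevel": "High", "Region": "South", "Churn": "No"},
]
model = train(rows)
case = {"Gender": "F", "Age": 25, "IncomeLevel": "Low", "Region": "North"}
probs = churn_probability(model, case)
print(max(probs, key=probs.get))  # most likely class: Yes
```

Each predictor contributes one additive term to each class's log score. Adding more input columns therefore only adds terms, which is one reason scoring stays fast as the number of inputs grows.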