What is Data Mining?
Data mining in SQL Server Analysis Services (SSAS) is the process of discovering patterns, correlations, and trends in large datasets to predict future outcomes. It combines statistical techniques with machine learning to produce predictive models that can be queried, visualized, and integrated into applications.
Core Concepts
- Data Mining Model: A trained model that predicts outcomes based on input data.
- Mining Structure: The schema that mining models are built from, including the source columns, their data types and content types, and any nested tables. Several models can share one structure.
- Mining Algorithm: The method used to train a model (e.g., Decision Trees, Neural Networks).
- Training Data: Historical data used to build the model.
- Prediction: Applying a trained model to new data to generate outcomes.
- Drill‑Through: Retrieving the underlying cases (rows) behind a pattern or node in a trained model. It must be enabled on the model.
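To make the Prediction and Drill‑Through concepts concrete, here is a minimal DMX sketch. It assumes a hypothetical model named [CustomerChurn_DT] with Gender, Age, and Tenure as inputs and Churn as the predictable column. The model name and the input values are illustrative only, and the second statement works only if drill-through was enabled when the model was created.

```dmx
-- Singleton prediction: score one new case supplied inline.
-- [CustomerChurn_DT] is a hypothetical model name.
SELECT
    Predict([Churn]) AS PredictedChurn,
    PredictProbability([Churn]) AS PredictedProbability
FROM [CustomerChurn_DT]
NATURAL PREDICTION JOIN
(SELECT 'Female' AS [Gender], 34 AS [Age], 12 AS [Tenure]) AS t

-- Drill-through (run as a separate statement): list the training cases behind the model.
SELECT * FROM [CustomerChurn_DT].CASES
```

NATURAL PREDICTION JOIN maps input columns to model columns by name, which keeps single-case queries short.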
Typical Workflow
- Identify the business problem and the data sources.
- Create a Mining Structure that models the problem.
- Add a Mining Model to the structure: select a suitable Algorithm and configure its parameters.
- Process the structure to train the Mining Model.
- Validate the model using Test Data and accuracy metrics.
- Deploy the model for Prediction and integrate with reports or applications.
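The validation and prediction steps can be sketched in DMX as a batch prediction query that returns the actual label next to the predicted one. This is an illustration under assumptions, not a prescribed method: the model name [CustomerChurn_DT], the data source [CustomerData], and the held-out table dbo.CustomersTest are all hypothetical.

```dmx
-- Batch prediction: score every row of a held-out table and return the
-- actual label beside the predicted one so accuracy can be checked.
-- [CustomerChurn_DT], [CustomerData], and dbo.CustomersTest are assumed names.
SELECT
    t.[CustomerID],
    t.[Churn] AS ActualChurn,
    Predict([CustomerChurn_DT].[Churn]) AS PredictedChurn,
    PredictProbability([CustomerChurn_DT].[Churn]) AS PredictedProbability
FROM [CustomerChurn_DT]
PREDICTION JOIN
OPENQUERY([CustomerData],
    'SELECT CustomerID, Gender, Age, Tenure, Churn FROM dbo.CustomersTest') AS t
ON  [CustomerChurn_DT].[Gender] = t.[Gender]
AND [CustomerChurn_DT].[Age]    = t.[Age]
AND [CustomerChurn_DT].[Tenure] = t.[Tenure]
```

Churn is deliberately left out of the ON clause so the model predicts it rather than receiving it as an input. The same query shape, pointed at a table of new customers, serves the deployment step.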
Key Terms Glossary
Algorithm : Method for building a model (e.g., Decision Trees)
Attribute : Column used as input or output in a model
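Confidence Level : Estimated probability that a prediction is correct; for a rule A → B, the probability of B given A
Support : Number (or percentage) of cases that satisfy a rule or contain an itemset
Lift : Ratio of a rule's confidence to the rate expected by chance; values above 1 indicate a positive association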
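The three rule metrics above are related by simple ratios. The formulas below use the common association-rule definitions, which is an assumption about the intended meaning; specific SSAS viewers may report support as a raw case count or use a differently scaled score. Here N is the total number of cases and n(·) counts the cases that match.

```latex
\[
\begin{aligned}
\mathrm{support}(A \Rightarrow B)    &= \frac{n(A, B)}{N} \\
\mathrm{confidence}(A \Rightarrow B) &= \frac{n(A, B)}{n(A)} \\
\mathrm{lift}(A \Rightarrow B)       &= \frac{\mathrm{confidence}(A \Rightarrow B)}{n(B)/N}
\end{aligned}
\]
% Worked example with made-up counts: N = 1000, n(A) = 200, n(B) = 300, n(A, B) = 120
% support = 120/1000 = 0.12
% confidence = 120/200 = 0.60
% lift = 0.60 / (300/1000) = 2.0
```

In the made-up example, a lift of 2.0 means B occurs twice as often among cases containing A as it does overall.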
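Sample DMX to Create a Mining Structure
-- DMX (not T-SQL): run each statement separately against the Analysis Services database.
CREATE MINING STRUCTURE [CustomerChurn]
(
[CustomerID] LONG KEY,
[Gender] TEXT DISCRETE,
[Age] LONG CONTINUOUS,
[Tenure] LONG CONTINUOUS,
[Churn] TEXT DISCRETE -- Target column; flagged PREDICT when a model is added
)

-- Load the training data. [CustomerData] is an assumed data source name; substitute your own.
INSERT INTO MINING STRUCTURE [CustomerChurn]
([CustomerID], [Gender], [Age], [Tenure], [Churn])
OPENQUERY([CustomerData],
'SELECT CustomerID, Gender, Age, Tenure, Churn FROM dbo.Customers')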
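A mining structure holds no patterns until a model is attached to it. As a minimal sketch, the statement below adds a Decision Trees model to the CustomerChurn structure. The model name [CustomerChurn_DT] and the MINIMUM_SUPPORT value are illustrative assumptions, not recommendations.

```dmx
-- Attach a Decision Trees model to the existing structure.
-- [CustomerChurn_DT] is a hypothetical model name.
ALTER MINING STRUCTURE [CustomerChurn]
ADD MINING MODEL [CustomerChurn_DT]
(
    [CustomerID],
    [Gender],
    [Age],
    [Tenure],
    [Churn] PREDICT   -- marks Churn as the column to predict
)
USING Microsoft_Decision_Trees (MINIMUM_SUPPORT = 10)
WITH DRILLTHROUGH
```

A model generally needs to be added before the structure is processed so that training populates it. WITH DRILLTHROUGH is optional and only needed if the underlying cases will be queried later.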