SQL Machine Learning Services
Introduction to SQL Machine Learning Services
SQL Machine Learning Services allows you to run your machine learning models directly within your SQL Server environment, leveraging the power of your data where it resides. This integration eliminates the need for complex data movement and synchronization, enabling faster insights and more efficient operationalization of AI and ML workloads.
By bringing compute to data, you can reduce latency, improve security, and utilize the familiar SQL syntax to score data, build predictive models, and automate business processes.
Getting Started
Follow these steps to set up and start using SQL Machine Learning Services:
- Installation: Ensure SQL Server Machine Learning Services is installed with your SQL Server instance. This can be done during initial installation or by modifying an existing instance.
- Configuration: Enable external scripts and configure the necessary language extensions (e.g., Python, R).
- Permissions: Grant appropriate permissions to users or service accounts that will execute ML scripts.
Refer to the Installation Guide for detailed instructions.
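The configuration step can be performed with sp_configure; enabling external scripts is required before any Python or R script will run:

```sql
-- Enable external script execution (covers both Python and R)
EXEC sp_configure 'external scripts enabled', 1;
RECONFIGURE WITH OVERRIDE;

-- Verify the setting took effect: run_value should show 1
-- (a restart of the SQL Server Launchpad service may be required
-- on older versions)
EXEC sp_configure 'external scripts enabled';
```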
Key Features
Supported Languages
SQL Machine Learning Services natively supports the following popular programming languages for developing and executing machine learning algorithms:
- Python: Leverage powerful libraries like scikit-learn, TensorFlow, PyTorch, and Pandas.
- R: Utilize the extensive R ecosystem for statistical computing and graphics.
You can install additional packages to extend the capabilities of these languages within SQL Server.
Algorithms
SQL Machine Learning Services provides access to a wide range of built-in algorithms and allows you to use custom algorithms developed in Python or R. Some common use cases include:
- Classification
- Regression
- Clustering
- Time Series Analysis
- Anomaly Detection
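As a minimal sketch of the classification case, the following standalone Python snippet trains and scores a scikit-learn model. The column and variable names are illustrative; inside SQL Server, the same body would receive its input as InputDataSet and return the result frame as OutputDataSet via sp_execute_external_script.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Toy training data: two numeric features and a binary label
train = pd.DataFrame({
    "feature1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "feature2": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
    "label":    [0, 0, 0, 1, 1, 1],
})

model = LogisticRegression()
model.fit(train[["feature1", "feature2"]], train["label"])

# New rows to score; inside sp_execute_external_script this frame would
# arrive as InputDataSet and the result would be assigned to OutputDataSet
new_rows = pd.DataFrame({"feature1": [1.5, 5.5], "feature2": [1.5, 5.5]})
scores = pd.DataFrame({"PredictedLabel": model.predict(new_rows)})
```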
Model Deployment
Deploy your trained models as stored procedures, functions, or directly within your T-SQL queries. This enables seamless integration into your existing applications and workflows.
The process typically involves saving the model artifact and then creating a T-SQL object that loads and uses the model for scoring or prediction.
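One common pattern for the artifact step (a sketch, not the only option) is to serialize the trained model with pickle, store the bytes in a VARBINARY(MAX) column from the training procedure, and deserialize them inside the scoring script. The parameter name @model below is illustrative:

```python
import pickle
from sklearn.linear_model import LogisticRegression

# Train a small model (stand-in for your real training step)
model = LogisticRegression().fit([[0], [1], [2], [3]], [0, 0, 1, 1])

# Serialize to bytes -- this is what you would INSERT into a
# VARBINARY(MAX) column from the training stored procedure
model_bytes = pickle.dumps(model)

# Later, in the scoring script, reload the model from the bytes
# passed in (e.g. via a hypothetical @model VARBINARY parameter)
restored = pickle.loads(model_bytes)
prediction = restored.predict([[2.5]])
```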
Tutorials
Explore our comprehensive tutorials to learn how to build, train, and deploy various machine learning models using SQL Server.
Code Examples
Here are some common code snippets to get you started:
Python Example: Scoring Data
EXEC sp_execute_external_script
    @language = N'Python',
    @script = N'
import pandas as pd
from sklearn.linear_model import LogisticRegression

# InputDataSet and OutputDataSet are the default variable names for the
# data passed in via @input_data_1 and the frame returned to SQL Server.
# In a real scenario you would load a persisted model artifact;
# for demonstration we train a small model inline.
X_train = [[1, 2], [2, 3], [3, 4]]
y_train = [0, 1, 0]
model = LogisticRegression()
model.fit(X_train, y_train)

# Score the input data from SQL Server
predictions = model.predict(InputDataSet)
OutputDataSet = pd.DataFrame({"PredictedValue": predictions})
',
    @input_data_1 = N'SELECT CAST(1 AS FLOAT) AS feature1, CAST(2 AS FLOAT) AS feature2'
    -- Use @input_data_1_name / @output_data_1_name to rename the default
    -- InputDataSet / OutputDataSet variables if desired
WITH RESULT SETS ((PredictedValue INT));
R Example: Training a Model
EXEC sp_execute_external_script
    @language = N'R',
    @script = N'
library(caret)

# InputDataSet and OutputDataSet are the default variable names for data
# passed in via @input_data_1 and the frame returned to SQL Server.
# Create a small training set for illustration.
X_train <- data.frame(
    feature1 = c(1, 2, 3, 4, 5),
    feature2 = c(2, 3, 4, 5, 6)
)
y_train <- factor(c("A", "B", "A", "B", "A"))

model <- train(X_train, y_train, method = "rpart")

# In a real scenario you would persist the model artifact, e.g. with
# saveRDS() (or serialize() to return it as varbinary), and load it
# again in a separate scoring script.
print("Model training complete.")

OutputDataSet <- data.frame(ModelStatus = "Trained")
'
WITH RESULT SETS ((ModelStatus NVARCHAR(20)));
API Reference
Explore the T-SQL stored procedures and functions available for managing and executing external scripts.
Procedure/Function | Description
---|---
sp_execute_external_script | Executes external scripts written in Python or R.
PREDICT | T-SQL function for real-time scoring against a serialized model, without invoking an external runtime.
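A sketch of PREDICT-based scoring is shown below. The table and column names (dbo.models, dbo.customers, model_object, churn_probability) are assumptions for illustration; the model must be stored in a format PREDICT supports:

```sql
-- Load a previously serialized model from a hypothetical models table
DECLARE @model VARBINARY(MAX) =
    (SELECT model_object FROM dbo.models WHERE model_name = 'churn_model');

-- Score each row of dbo.customers in-engine
SELECT d.customer_id, p.churn_probability
FROM PREDICT(MODEL = @model, DATA = dbo.customers AS d)
WITH (churn_probability FLOAT) AS p;
```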
Troubleshooting Common Issues
Note on Environment Setup
Ensure your Python/R environments are correctly configured and accessible by SQL Server. Path issues and missing dependencies are common challenges.
Tip for Large Datasets
For very large datasets, consider using techniques like data chunking or sampling to manage memory usage and improve performance during training and scoring.
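The chunking idea can be sketched in plain Python (a standalone illustration; inside SQL Server you would more typically chunk on the T-SQL side, e.g. by keyed ranges, and invoke the scoring procedure per batch):

```python
import numpy as np
import pandas as pd

# Simulate a large input table
data = pd.DataFrame({"feature": np.arange(10_000, dtype=float)})

def score_chunk(chunk: pd.DataFrame) -> pd.DataFrame:
    # Stand-in for model.predict(chunk)
    return pd.DataFrame({"PredictedValue": chunk["feature"] * 2})

# Score the data in fixed-size batches to bound memory usage
chunk_size = 2_500
results = []
for start in range(0, len(data), chunk_size):
    chunk = data.iloc[start:start + chunk_size]
    results.append(score_chunk(chunk))

scored = pd.concat(results, ignore_index=True)
```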
Error: External script execution failed
This is a general error. Check the SQL Server error logs and the external script logs for more specific details. Ensure the language runtime is installed and configured correctly.
Error: ModuleNotFoundError / Package Not Found
This indicates a missing Python or R package. Install the required package into the environment used by SQL Server; you may need to run pip install (Python) or install.packages() (R) on the server, using the runtime that SQL Server is configured to use.
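For example, installs might look like the following. The paths are assumptions that vary by SQL Server version and instance name; adjust them for your environment:

```shell
# Python: install into the runtime that SQL Server uses
# (illustrative default-instance path for SQL Server 2019)
"C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\PYTHON_SERVICES\python.exe" -m pip install scikit-learn

# R: install using the instance's R runtime
"C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\R_SERVICES\bin\Rscript.exe" -e "install.packages('caret')"
```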