Azure Machine Learning Designer Reference
This document is a reference for the modules available in the Azure Machine Learning designer. It describes each module's purpose, parameters, and typical usage so you can build effective machine learning pipelines.
Data Input Modules
These modules are used to bring data into your Machine Learning pipeline.
1. Datasets
Allows you to select pre-registered datasets within your Azure Machine Learning workspace.
Use Case: Load your training or testing data from your workspace's datastore.
Parameters:
Name | Description | Type | Required |
---|---|---|---|
Dataset | The specific dataset to load. | Dataset Reference | Yes |
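If you prefer working in code rather than on the designer canvas, a registered dataset can also be loaded with the Azure Machine Learning Python SDK. The sketch below assumes the v1 SDK (azureml-core), a workspace config.json in the working directory, and a tabular dataset whose name ("my-training-data") is a placeholder.

```python
# Minimal sketch: load a registered tabular dataset with the v1 SDK (azureml-core).
# The dataset name is a placeholder; Workspace.from_config() reads config.json.
from azureml.core import Workspace, Dataset

ws = Workspace.from_config()
ds = Dataset.get_by_name(ws, name="my-training-data")
df = ds.to_pandas_dataframe()   # materialize the tabular dataset as a pandas DataFrame
print(df.head())
```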
2. Enter Data Manually
Allows you to directly input small amounts of data in a tabular format.
Use Case: Create small, specific datasets for testing or simple tasks.
Parameters:
Name | Description | Type | Required |
---|---|---|---|
Data | Comma-separated values (CSV) representing the data. | String | Yes |
Column Names | A comma-separated list of column headers. | String | No |
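To make the expected format concrete, the sketch below parses the same kind of input outside the designer with pandas; the CSV text and column names are invented examples of the 'Data' and 'Column Names' parameters.

```python
# Minimal sketch of the module's inputs: CSV text plus optional column headers.
# All values are invented for illustration.
import io
import pandas as pd

data = "1,0.5,yes\n2,0.7,no\n3,0.2,yes"   # corresponds to the 'Data' parameter
column_names = "id,score,label"            # corresponds to the optional 'Column Names' parameter

df = pd.read_csv(io.StringIO(data), header=None, names=column_names.split(","))
print(df)
```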
Data Transformation Modules
Transform and clean your data to prepare it for machine learning.
1. Select Columns in Dataset
Enables you to choose specific columns from your dataset based on various criteria.
Use Case: Remove irrelevant features or select columns for specific analysis.
Parameters:
Name | Description | Type | Required |
---|---|---|---|
Selection mode | Mode for selecting columns (e.g., 'With names', 'Range'). | Enum | Yes |
Column names | List of columns to select. | Column List | Yes (if mode is 'With names') |
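The two selection modes map naturally onto ordinary DataFrame operations. The pandas sketch below is illustrative only; the column names are made up.

```python
# Illustrative pandas equivalent of the two selection modes described above.
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "age": [34, 29], "income": [52000, 61000], "label": [0, 1]})

by_names = df[["age", "income"]]   # 'With names' mode: keep an explicit list of columns
by_range = df.iloc[:, 1:3]         # 'Range' mode: keep columns by position (here, columns 1-2)
print(by_names.columns.tolist(), by_range.columns.tolist())
```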
2. Clean Missing Data
Handles missing values in your dataset by imputation or removal.
Use Case: Address incomplete data points that could affect model performance.
Parameters:
Name | Description | Type | Required |
---|---|---|---|
Missing data handling mode | Method to use (e.g., 'Remove row', 'Replace with mean'). | Enum | Yes |
Replacement value | The value to substitute for missing entries when a substitution mode that requires one is selected. | Number | No |
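The handling modes correspond to familiar DataFrame operations. The sketch below shows row removal and mean imputation on a tiny synthetic frame; it is illustrative, not the module's internal implementation.

```python
# Illustrative sketch of two handling modes: 'Remove row' and 'Replace with mean'.
# The data is synthetic.
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [34, np.nan, 29], "income": [52000, 61000, np.nan]})

removed = df.dropna()                              # 'Remove row': drop rows containing any missing value
imputed = df.fillna(df.mean(numeric_only=True))    # 'Replace with mean': impute with each column's mean
print(removed, imputed, sep="\n\n")
```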
Model Training Modules
Train various machine learning models on your prepared data.
1. Train SVM Model
Trains a Support Vector Machine classifier.
Use Case: Classification tasks, especially with high-dimensional data.
Parameters:
Name | Description | Type | Required |
---|---|---|---|
Dataset | Training data. | Dataset | Yes |
Untrained model | An untrained SVM model object. | Model | Yes |
Target column | The column to predict. | Column Name | Yes |
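The designer trains its own SVM implementation, but the workflow (untrained model + dataset + target column) mirrors the scikit-learn pattern sketched below on synthetic data.

```python
# Illustrative scikit-learn equivalent of the train-SVM workflow; data is synthetic.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=20, random_state=0)  # stand-in for the training dataset
untrained_model = SVC(kernel="rbf", C=1.0)   # corresponds to the 'Untrained model' input
trained_model = untrained_model.fit(X, y)    # the 'Dataset' and 'Target column' drive the fit
print(trained_model.score(X, y))
```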
2. Linear Regression
Trains a linear regression model for regression tasks.
Use Case: Predicting continuous values.
Parameters:
Name | Description | Type | Required |
---|---|---|---|
Dataset | Training data. | Dataset | Yes |
Target column | The continuous column to predict. | Column Name | Yes |
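As with the SVM module, the sketch below is only a scikit-learn illustration of fitting a linear model to a continuous target on synthetic data.

```python
# Illustrative ordinary-least-squares fit on synthetic regression data.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)   # learned weights and intercept
```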
Model Evaluation Modules
Assess the performance of your trained models.
1. Score Model
Applies a trained model to a dataset to generate predictions.
Use Case: Get predictions on test data or new data.
Parameters:
Name | Description | Type | Required |
---|---|---|---|
Trained model | The trained model to apply to the data. | Model | Yes |
Dataset | Data to score. | Dataset | Yes |
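Conceptually, scoring is applying a fitted model to rows it has not seen. The sketch below uses scikit-learn and synthetic data to show the predicted labels and probabilities that later feed Evaluate Model.

```python
# Illustrative scoring step: apply a trained classifier to held-out rows.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, random_state=0)
X_train, X_test, y_train, _ = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)  # the 'Trained model' input
scored_labels = model.predict(X_test)                            # predictions for the 'Dataset' input
scored_probabilities = model.predict_proba(X_test)[:, 1]         # probabilities used by Evaluate Model
print(scored_labels[:5], scored_probabilities[:5])
```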
2. Evaluate Model
Calculates various metrics to evaluate the performance of a classification or regression model.
Use Case: Understand accuracy, precision, recall, AUC, RMSE, etc.
Parameters:
Name | Description | Type | Required |
---|---|---|---|
Scored dataset | Dataset with predicted values. | Dataset | Yes |
Actual column | The column containing the true labels. | Column Name | Yes |
Scored probabilities column | The column containing predicted probabilities (for classification). | Column Name | No |
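The classification metrics named above can be reproduced directly from true labels, predicted labels, and scored probabilities; the values in the sketch below are invented.

```python
# Illustrative computation of common classification metrics from scored outputs.
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score

y_true = [0, 1, 1, 0, 1, 0, 1, 1]                     # 'Actual column'
y_pred = [0, 1, 0, 0, 1, 0, 1, 1]                     # predicted labels from Score Model
y_prob = [0.2, 0.9, 0.4, 0.1, 0.8, 0.3, 0.7, 0.95]    # 'Scored probabilities column'

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("AUC      :", roc_auc_score(y_true, y_prob))
```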
Python Language Modules
Run custom Python code to extend your pipeline beyond the built-in modules.
1. Execute Python Script
Allows you to run custom Python code within your pipeline.
Use Case: Implement complex logic not covered by built-in modules, custom data preprocessing, or post-processing.
Parameters:
Name | Description | Type | Required |
---|---|---|---|
Python script | The Python code to run; it must define the module's entry-point function. | String | Yes |
Dataset1, Dataset2 | Optional input datasets passed to the script as pandas DataFrames. | Dataset | No |
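The script supplied to this module defines an azureml_main entry point; the designer passes the connected inputs as pandas DataFrames and takes the returned DataFrame(s) as the module's outputs. The sketch below is minimal; the 'score' column it references is a placeholder.

```python
# Minimal sketch of an Execute Python Script body. The designer calls
# azureml_main with the connected inputs as pandas DataFrames.
import pandas as pd

def azureml_main(dataframe1: pd.DataFrame = None, dataframe2: pd.DataFrame = None):
    # Example post-processing: derive a new column from an existing one.
    # The 'score' column is a placeholder for a real column in your data.
    result = dataframe1.copy()
    result["score_bucket"] = (result["score"] > 0.5).astype(int)
    return result,   # a one-element tuple maps to the module's first output port
```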
Utility Modules
Commonly used modules for pipeline management and data handling.
1. Split Data
Divides a dataset into two or more subsets.
Use Case: Creating training and testing sets.
Parameters:
Name | Description | Type | Required |
---|---|---|---|
Fraction of the first subset | The proportion of data for the first output. | Number (0.0 to 1.0) | Yes |
Stratified split | Whether to perform stratified sampling. | Boolean | No |
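For comparison, the sketch below performs a 70/30 split with stratification in scikit-learn on synthetic data, mirroring the two parameters above.

```python
# Illustrative 70/30 split with stratified sampling on synthetic data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    train_size=0.7,   # fraction of rows in the first output
    stratify=y,       # keep class proportions in both subsets
    random_state=0,
)
print(len(X_train), len(X_test))
```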