Azure AI Machine Learning Designer Reference

Comprehensive documentation for components and modules in Azure Machine Learning designer.

This document is a reference for modules available in Azure Machine Learning designer. For each module it gives the purpose, a typical use case, and the parameters, so that you can build effective machine learning pipelines.

Data Input Modules

These modules are used to bring data into your Machine Learning pipeline.

1. Datasets

Allows you to select pre-registered datasets within your Azure Machine Learning workspace.

Use Case: Load your training or testing data from your workspace's datastore.

Parameters:

Name | Description | Type | Required
Dataset | The specific dataset to load. | Dataset Reference | Yes

2. Enter Data Manually

Allows you to directly input small amounts of data in a tabular format.

Use Case: Create small, specific datasets for testing or simple tasks.

Parameters:

Name | Description | Type | Required
Data | Comma-separated values (CSV) representing the data. | String | Yes
Column Names | A comma-separated list of column headers. | String | No
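
The two parameters above can be illustrated with a minimal sketch that parses CSV text into rows. The function name parse_manual_data is hypothetical, and the fallback of treating the first row as the header when no column names are given is an assumption, not documented designer behavior.

```python
import csv
import io


def parse_manual_data(data, column_names=None):
    """Parse CSV text into a list of row dictionaries (illustrative only)."""
    rows = [
        [cell.strip() for cell in row]
        for row in csv.reader(io.StringIO(data.strip()))
        if row  # skip blank lines
    ]
    if column_names:
        header = [name.strip() for name in column_names.split(",")]
    else:
        # Assumption: with no explicit names, the first row is the header.
        header, rows = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in rows]


records = parse_manual_data("1,red\n2,blue", column_names="id,color")
```

All values stay as strings here; type conversion would be a separate step.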

Data Transformation Modules

Transform and clean your data to prepare it for machine learning.

1. Select Columns in Dataset

Enables you to choose specific columns from your dataset based on various criteria.

Use Case: Remove irrelevant features or select columns for specific analysis.

Parameters:

Name | Description | Type | Required
Selection mode | Mode for selecting columns (e.g., 'With names', 'Range'). | Enum | Yes
Column names | List of columns to select. | Column List | Yes (if mode is 'With names')
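
As a rough illustration of the two selection modes named above, the sketch below filters a list of row dictionaries. The function select_columns and its start and end arguments are hypothetical, and the zero-based inclusive range is an assumption made for the example.

```python
def select_columns(rows, mode, names=None, start=None, end=None):
    """Keep a subset of columns from a list of row dictionaries."""
    if not rows:
        return []
    all_columns = list(rows[0].keys())
    if mode == "With names":
        missing = [name for name in names if name not in all_columns]
        if missing:
            raise KeyError(f"Unknown columns: {missing}")
        keep = list(names)
    elif mode == "Range":
        # Assumption: zero-based positions, both ends inclusive.
        keep = all_columns[start:end + 1]
    else:
        raise ValueError(f"Unsupported selection mode: {mode}")
    return [{column: row[column] for column in keep} for row in rows]


rows = [{"id": 1, "age": 30, "city": "Oslo"}]
by_name = select_columns(rows, "With names", names=["id", "city"])
by_range = select_columns(rows, "Range", start=0, end=1)
```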

2. Clean Missing Data

Handles missing values in your dataset by imputation or removal.

Use Case: Address incomplete data points that could affect model performance.

Parameters:

Name | Description | Type | Required
Missing data handling mode | Method to use (e.g., 'Remove row', 'Replace with mean'). | Enum | Yes
Imputation options (mean, median, etc.) | Settings for the selected imputation strategy. | Number | No
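
The handling modes can be sketched for a single numeric column as follows. This is an illustration only: clean_missing is a hypothetical helper, missing values are represented as None, and the mode name 'Replace with median' is assumed by analogy with 'Replace with mean'.

```python
from statistics import mean, median


def clean_missing(rows, column, mode):
    """Return new rows with missing values in `column` removed or imputed."""
    if mode == "Remove row":
        return [dict(row) for row in rows if row[column] is not None]

    present = [row[column] for row in rows if row[column] is not None]
    if mode == "Replace with mean":
        fill = mean(present)
    elif mode == "Replace with median":  # assumed mode name
        fill = median(present)
    else:
        raise ValueError(f"Unsupported mode: {mode}")
    # Copy each row, substituting the fill value where the column is missing.
    return [
        {**row, column: fill if row[column] is None else row[column]}
        for row in rows
    ]


data = [{"x": 1.0}, {"x": None}, {"x": 3.0}]
imputed = clean_missing(data, "x", "Replace with mean")
dropped = clean_missing(data, "x", "Remove row")
```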

Model Training Modules

Train various machine learning models on your prepared data.

1. Train SVM Model

Trains a Support Vector Machine classifier.

Use Case: Classification tasks, especially with high-dimensional data.

Parameters:

Name | Description | Type | Required
Left dataset | Training data. | Dataset | Yes
Untrained model | An untrained SVM model object. | Model | Yes
Target column | The column to predict. | Column Name | Yes

2. Linear Regression

Trains a linear regression model for regression tasks.

Use Case: Predicting continuous values.

Parameters:

Name | Description | Type | Required
Left dataset | Training data. | Dataset | Yes
Formula | R-style formula for the model. | String | Yes
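
To make the Formula parameter concrete, here is a minimal sketch that fits a one-feature model from a formula of the form "target ~ feature" using ordinary least squares. The helper fit_simple_linear is hypothetical and handles only this simplest formula shape; it does not describe how the module itself is implemented.

```python
def fit_simple_linear(rows, formula):
    """Fit y = intercept + slope * x for a formula like 'y ~ x'."""
    target, feature = (part.strip() for part in formula.split("~"))
    xs = [float(row[feature]) for row in rows]
    ys = [float(row[target]) for row in rows]

    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    # Closed-form least squares: slope = cov(x, y) / var(x).
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Points that lie exactly on price = 1 + 2 * size.
training = [{"size": x, "price": 2 * x + 1} for x in range(4)]
intercept, slope = fit_simple_linear(training, "price ~ size")
```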

Model Evaluation Modules

Assess the performance of your trained models.

1. Score Model

Applies a trained model to a dataset to generate predictions.

Use Case: Get predictions on test data or new data.

Parameters:

Name | Description | Type | Required
Trained model | The trained model. | Model | Yes
Dataset | Data to score. | Dataset | Yes

2. Evaluate Model

Calculates various metrics to evaluate the performance of a classification or regression model.

Use Case: Report classification metrics such as accuracy, precision, recall, and AUC, or regression metrics such as RMSE.

Parameters:

Name | Description | Type | Required
Scored dataset | Dataset with predicted values. | Dataset | Yes
Actual column | The column containing the true labels. | Column Name | Yes
Scored probabilities column | The column containing predicted probabilities (for classification). | Column Name | No
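
The metrics mentioned above can be computed by hand as in the sketch below, which may help when checking the module's output. The function names and the 0.5 decision threshold are assumptions for illustration; labels are assumed to be 0 and 1.

```python
import math


def classification_metrics(actual, scored_probabilities, threshold=0.5):
    """Accuracy, precision, and recall for binary labels (0 or 1)."""
    predicted = [1 if p >= threshold else 0 for p in scored_probabilities]
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def rmse(actual, predicted):
    """Root mean squared error for regression outputs."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


metrics = classification_metrics([1, 0, 1, 1, 0], [0.9, 0.2, 0.4, 0.8, 0.7])
error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```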

Scoring & Deployment Modules

Prepare models for deployment and generate scoring scripts.

1. Execute Python Script

Allows you to run custom Python code within your pipeline.

Use Case: Implement complex logic not covered by built-in modules, custom data preprocessing, or post-processing.

Parameters:

Name | Description | Type | Required
Script file | The Python script file (.py) to execute. | File Path | Yes
Module input 1, 2, ... | Input datasets/models to the script. | Dataset/Model | No
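
A script for this module might look like the sketch below. It assumes the designer convention of an entry point named azureml_main that receives the connected inputs as pandas DataFrames (or None when a port is not connected) and returns a sequence of DataFrames; check the current product documentation before relying on the exact signature. Dropping rows with missing values is only a placeholder for your own logic.

```python
def azureml_main(dataframe1=None, dataframe2=None):
    """Entry point assumed to be called by the Execute Python Script module."""
    if dataframe1 is None:
        # No first input connected: nothing to process in this sketch.
        return (None,)

    # Placeholder logic: drop rows that contain any missing value.
    # DataFrame.dropna() returns a new DataFrame and leaves the input intact.
    result = dataframe1.dropna()

    # The return value is a sequence, even when there is a single output.
    return (result,)
```

The second argument, dataframe2, is unused here; it would carry the second input dataset if one were connected.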

Utility Modules

Commonly used modules for pipeline management and data handling.

1. Split Data

Divides a dataset into two subsets.

Use Case: Creating training and testing sets.

Parameters:

Name | Description | Type | Required
Fraction of the first subset | The proportion of data for the first output. | Number (0.0 to 1.0) | Yes
Stratified split | Whether to perform stratified sampling. | Boolean | No
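
The effect of the two parameters can be sketched in plain Python as follows. The helper split_data, its stratify_by and seed arguments, and the rounding of the cut point are all assumptions for illustration, not the module's documented algorithm.

```python
import random
from collections import defaultdict


def split_data(rows, fraction, stratify_by=None, seed=0):
    """Split rows into two lists; `fraction` of them go to the first."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0.0 and 1.0")
    rng = random.Random(seed)  # fixed seed keeps the split reproducible

    # Without stratification every row falls into one group (key None).
    # With stratification rows are grouped by the chosen column's value.
    groups = defaultdict(list)
    for row in rows:
        key = row[stratify_by] if stratify_by is not None else None
        groups[key].append(row)

    first, second = [], []
    for members in groups.values():
        rng.shuffle(members)
        cut = round(len(members) * fraction)
        first.extend(members[:cut])
        second.extend(members[cut:])
    return first, second


rows = [{"label": "a" if i < 5 else "b", "i": i} for i in range(10)]
train, test = split_data(rows, 0.8, stratify_by="label")
```

With stratification, each label keeps roughly the same proportion in both outputs, which matters when classes are imbalanced.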