Azure Machine Learning Basics

Welcome to the foundational concepts of Azure Machine Learning (Azure ML). This article provides an overview of what Azure ML is, its core components, and how it can help you build, train, and deploy machine learning models at scale.

What is Azure Machine Learning?

Azure Machine Learning is a cloud-based service for developing, training, and deploying machine learning models. It gives data scientists and developers a comprehensive environment for managing their ML workflows, from data preparation through production deployment, and helps organizations apply AI to complex business problems.

Key Concepts and Components

Azure ML is built around several key concepts and components that work together to form a robust ML platform:

1. Azure Machine Learning Workspace

The Azure ML Workspace is the central hub for all your Azure ML activities. It provides a secure and collaborative environment for managing your datasets, experiments, models, and endpoints. When you create an Azure ML workspace, it provisions associated Azure resources: an Azure Storage account, an Azure Key Vault, an Application Insights instance, and an Azure Container Registry (created on first use by default).
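
If you work from Python, the usual first step is to attach a client to an existing workspace. The following is a minimal sketch using the Azure ML Python SDK v2 (the azure-ai-ml and azure-identity packages); the subscription, resource group, and workspace names are placeholders you would replace with your own.

    # Minimal sketch: connect to an existing Azure ML workspace with the SDK v2.
    from azure.ai.ml import MLClient
    from azure.identity import DefaultAzureCredential

    ml_client = MLClient(
        credential=DefaultAzureCredential(),   # uses your Azure login or managed identity
        subscription_id="<subscription-id>",
        resource_group_name="<resource-group>",
        workspace_name="<workspace-name>",
    )
    print(ml_client.workspace_name)            # sanity check: which workspace am I on?

The later snippets in this article reuse this ml_client object.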

2. Datasets and Data Stores

Data is the foundation of any machine learning project. Azure ML connects to your data sources through Data Stores, which act as pointers to data residing in Azure storage services (such as Azure Blob Storage or Azure Data Lake Storage) or on-premises. From these data stores you can create Datasets: versioned references to specific data that make it easy to track exactly what each experiment was trained on and to reproduce results.

Note: Datasets can be tabular (for structured data) or file-based (for unstructured data like images or text documents).
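
In the current Python SDK (v2), datasets surface as versioned data assets. As a sketch, the snippet below registers version 1 of a file-based asset pointing at a CSV in the default workspaceblobstore datastore; the asset name and path are illustrative and assume the file already exists there.

    # Sketch: register a versioned data asset pointing at a file in a datastore.
    from azure.ai.ml.entities import Data
    from azure.ai.ml.constants import AssetTypes

    training_data = Data(
        name="customer-churn-training",
        version="1",
        type=AssetTypes.URI_FILE,   # URI_FOLDER or MLTABLE for folders / tabular data
        path="azureml://datastores/workspaceblobstore/paths/churn/train.csv",
    )
    ml_client.data.create_or_update(training_data)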

3. Experiments

An Experiment in Azure ML is a container for your training runs. Each time you train a model, you submit a run to an experiment. Experiments help you organize your training processes, track metrics (like accuracy, loss), log parameters, and store the resulting model artifacts. This makes it easy to compare different training attempts and identify the best-performing models.
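
As a sketch of what submitting a run looks like with the Python SDK v2 (where a run is called a job), the snippet below sends a training script to a compute cluster under an experiment name; the ./src folder, train.py script, curated environment name, and "cpu-cluster" compute target are assumptions for illustration.

    # Sketch: submit a training script as a job grouped under an experiment.
    from azure.ai.ml import command, Input

    job = command(
        code="./src",   # local folder containing train.py
        command="python train.py --data ${{inputs.training_data}}",
        inputs={
            "training_data": Input(type="uri_file",
                                   path="azureml:customer-churn-training:1"),
        },
        environment="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",
        compute="cpu-cluster",
        experiment_name="churn-experiments",   # runs are grouped under this name
    )
    returned_job = ml_client.jobs.create_or_update(job)
    print(returned_job.studio_url)             # follow metrics and logs in the studio

Inside train.py, metrics and parameters logged with MLflow (for example, mlflow.log_metric("accuracy", acc)) are captured by Azure ML and shown alongside the run.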

4. Compute Resources

Training machine learning models can be computationally intensive. Azure ML provides a flexible compute management system. You can create and manage various types of compute targets:

  • Compute Instances: Cloud-based workstations for development and testing.
  • Compute Clusters: Scalable clusters of VMs for training large models or batch inference.
  • Inference Clusters: Azure Kubernetes Service (AKS) clusters for hosting models for real-time inference.
  • Attached Compute: Use existing Azure compute resources like Azure Databricks or Azure HDInsight.
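
As an example of provisioning one of these targets, the sketch below creates a small CPU compute cluster that scales down to zero nodes when idle; the cluster name and VM size are illustrative choices.

    # Sketch: create a compute cluster that autoscales between 0 and 4 nodes.
    from azure.ai.ml.entities import AmlCompute

    cpu_cluster = AmlCompute(
        name="cpu-cluster",
        size="Standard_DS3_v2",
        min_instances=0,                  # scale to zero so idle time costs nothing
        max_instances=4,
        idle_time_before_scale_down=120,  # seconds of idleness before scaling down
    )
    ml_client.compute.begin_create_or_update(cpu_cluster).result()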

5. Models

Once you have trained a model, you register it with Azure ML as a Model. Registered models are versioned, making it easy to manage different iterations of your trained model. These registered models can then be deployed as web services for inference.
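
A sketch of model registration with the Python SDK v2, assuming the training script saved an MLflow-format model to a local folder named model; the model name and description are illustrative.

    # Sketch: register a trained model so it is versioned in the workspace.
    from azure.ai.ml.entities import Model
    from azure.ai.ml.constants import AssetTypes

    model = Model(
        name="churn-classifier",
        path="./model",                # local folder or an azureml:// job-output URI
        type=AssetTypes.MLFLOW_MODEL,  # use CUSTOM_MODEL for arbitrary artifacts
        description="Churn classifier trained in the experiment above",
    )
    registered = ml_client.models.create_or_update(model)
    print(registered.name, registered.version)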

6. Endpoints (Deployments)

An Endpoint is the interface through which your trained model is made available for predictions; one or more deployments behind the endpoint host the registered model and serve the requests. Azure ML supports two types of endpoints:

  • Real-time Endpoints: For low-latency, synchronous requests (e.g., scoring a single customer transaction).
  • Batch Endpoints: For scoring large volumes of data asynchronously.

Tip: Choosing the right endpoint type depends on your application's latency and throughput requirements.
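
As a sketch of a real-time deployment with the Python SDK v2, the snippet below creates a managed online endpoint and routes all traffic to a single deployment of the model registered above; the endpoint name must be unique within its Azure region, and the names and instance type here are placeholders.

    # Sketch: create a managed online (real-time) endpoint and one deployment.
    from azure.ai.ml.entities import ManagedOnlineEndpoint, ManagedOnlineDeployment

    endpoint = ManagedOnlineEndpoint(name="churn-endpoint", auth_mode="key")
    ml_client.online_endpoints.begin_create_or_update(endpoint).result()

    deployment = ManagedOnlineDeployment(
        name="blue",
        endpoint_name="churn-endpoint",
        model="azureml:churn-classifier:1",  # MLflow models need no scoring script
        instance_type="Standard_DS3_v2",
        instance_count=1,
    )
    ml_client.online_deployments.begin_create_or_update(deployment).result()

    endpoint.traffic = {"blue": 100}         # send all requests to the "blue" deployment
    ml_client.online_endpoints.begin_create_or_update(endpoint).result()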

The Azure ML Workflow

A typical Azure ML workflow involves the following steps:

  1. Set up your Workspace: Create an Azure ML Workspace in the Azure portal.
  2. Connect to Data: Register your data sources using Data Stores and create Datasets.
  3. Choose Compute: Select or create appropriate compute targets for training and deployment.
  4. Train Model: Write your training script (e.g., using Python with scikit-learn, TensorFlow, or PyTorch) and submit it as a run under an Experiment on your chosen compute target, logging metrics and parameters along the way.
  5. Register Model: Once satisfied with a training run, register the resulting model in your workspace.
  6. Deploy Model: Deploy the registered model to a real-time or batch endpoint.
  7. Consume Endpoint: Integrate the deployed endpoint into your applications to get predictions.
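
As a sketch of this last step, the snippet below calls the real-time endpoint created earlier through the Python SDK; sample-request.json is a placeholder file containing input rows in the format the model expects.

    # Sketch: score data against the deployed real-time endpoint.
    response = ml_client.online_endpoints.invoke(
        endpoint_name="churn-endpoint",
        deployment_name="blue",
        request_file="sample-request.json",
    )
    print(response)

Applications that do not use the SDK can instead send an HTTP POST to the endpoint's scoring URI, authenticating with the endpoint key or token.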

Azure ML offers multiple ways to interact with the platform, including the Azure ML studio (a web-based UI), the Azure ML Python SDK, and the ml extension for the Azure CLI.

This article provides a high-level introduction. The following articles will delve deeper into setting up your workspace, managing data, training sophisticated models, and deploying them effectively.