An Azure Machine Learning workspace is the top-level resource for Azure Machine Learning, providing a centralized place to easily work with all the artifacts you create as you use Azure Machine Learning. The artifacts include:
- Notebooks
- Managed compute instances and clusters that data scientists use for development and training
- Azure Machine Learning Pipelines
- Models
- Datasets
- Environments
What is an Azure Machine Learning Workspace?
A workspace provides a collaborative environment where data scientists and machine learning engineers can manage their Azure Machine Learning projects. It acts as a central hub for all your machine learning activities, from data preparation and experimentation to model training, deployment, and monitoring.
Key Components of a Workspace:
- Compute: Managed compute instances and clusters for training and deployment.
  - Compute Instances: Managed cloud-based workstations for development and experimentation.
  - Compute Clusters: Scalable clusters of VMs for distributed training jobs.
  - Inference Clusters: Kubernetes clusters for deploying models as real-time web services.
- Datastores: Securely store and manage connections to your data storage services (e.g., Azure Blob Storage, Azure Data Lake Storage).
- Datasets: Lightweight references to data in your datastores. They simplify data management and enable versioning and tracking.
- Environments: Encapsulate everything needed to run your training and scoring scripts, including Python packages, environment variables, and software settings.
- Experiments: Used to group runs and track their metrics and outputs.
- Runs: Represent a single execution of a script or pipeline, capturing all logged metrics, parameters, and outputs.
- Pipelines: Define reusable ML workflows to automate complex ML tasks.
- Models: Registered models in the workspace, along with their metadata and versions.
- Endpoints: Managed endpoints for real-time and batch inference.
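To make the relationship between experiments and runs concrete, here is a minimal toy model in plain Python. It is not the Azure Machine Learning SDK: the `Experiment` and `Run` classes, their methods, and the `churn-model` name are hypothetical stand-ins that only illustrate how an experiment groups runs and how each run carries its own logged metrics.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """One execution of a script, with the metrics it logged (toy stand-in)."""
    run_id: str
    metrics: dict = field(default_factory=dict)

    def log(self, name, value):
        # Record a single named metric for this run.
        self.metrics[name] = value


@dataclass
class Experiment:
    """A named group of runs (toy stand-in, not the SDK class)."""
    name: str
    runs: list = field(default_factory=list)

    def start_run(self):
        # Each new run gets an id derived from the experiment name.
        run = Run(run_id=f"{self.name}_{len(self.runs) + 1}")
        self.runs.append(run)
        return run

    def best_run(self, metric):
        # Highest value of the given metric among runs that logged it.
        candidates = [r for r in self.runs if metric in r.metrics]
        return max(candidates, key=lambda r: r.metrics[metric])


experiment = Experiment("churn-model")
for accuracy in (0.81, 0.86, 0.84):
    run = experiment.start_run()
    run.log("accuracy", accuracy)

best = experiment.best_run("accuracy")
print(best.run_id, best.metrics["accuracy"])
```

Grouping runs under one experiment is what makes comparisons like `best_run` possible: every run in the group logs the same kind of metric, so they can be ranked against each other.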
Creating a Workspace
You can create an Azure Machine Learning workspace using various methods:
- Azure Portal: A user-friendly graphical interface.
- Azure CLI: Command-line interface for scripting and automation.
- SDKs: Python SDK for programmatic creation and management.
Example using Azure CLI:
az ml workspace create --name myworkspace --resource-group myresourcegroup --location eastus
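For automation, the same command can be assembled in a script. The sketch below builds the argument list in Python using only the flags shown above. The helper name `build_workspace_create_command` is an assumption made for illustration, and the code only prints the command rather than running it.

```python
import shlex


def build_workspace_create_command(name, resource_group, location):
    """Return the `az ml workspace create` call as an argument list."""
    return [
        "az", "ml", "workspace", "create",
        "--name", name,
        "--resource-group", resource_group,
        "--location", location,
    ]


command = build_workspace_create_command("myworkspace", "myresourcegroup", "eastus")
print(shlex.join(command))

# To execute it, pass the list to subprocess.run(command, check=True).
# That assumes the Azure CLI with the ml extension is installed and signed in.
```

Passing the arguments as a list, rather than one formatted string, avoids shell-quoting problems if a name ever contains spaces or special characters.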
Example using Python SDK:
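# Requires the azureml-core package (Azure Machine Learning Python SDK v1).
from azureml.core import Workspace

# Create the workspace, or return the existing one if it already exists.
ws = Workspace.create(name='myworkspace',
                      resource_group='myresourcegroup',
                      location='eastus',
                      subscription_id='your-subscription-id',
                      create_resource_group=True,  # create the resource group if it is missing
                      exist_ok=True)               # do not fail if the workspace already exists
print(f"Workspace created: {ws.name}")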
Note: Creating a workspace also creates and links several dependent Azure resources, including an Azure Storage account, an Azure Application Insights instance, and an Azure Key Vault.
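Once a workspace exists, later scripts need its subscription, resource group, and name to reconnect. With the v1 Python SDK this is commonly done through a `config.json` file that `Workspace.from_config()` reads. The sketch below writes such a file using only the standard library. The helper name `write_workspace_config` is hypothetical, the values are the placeholders from the examples above, and a temporary directory stands in for a project folder.

```python
import json
import tempfile
from pathlib import Path


def write_workspace_config(directory, subscription_id, resource_group, workspace_name):
    """Write .azureml/config.json with the details needed to find a workspace."""
    config_dir = Path(directory) / ".azureml"
    config_dir.mkdir(parents=True, exist_ok=True)
    config_path = config_dir / "config.json"
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    config_path.write_text(json.dumps(config, indent=2))
    return config_path


# A temporary directory stands in for a real project folder.
with tempfile.TemporaryDirectory() as project_dir:
    path = write_workspace_config(
        project_dir, "your-subscription-id", "myresourcegroup", "myworkspace"
    )
    loaded = json.loads(path.read_text())
    print(loaded["workspace_name"])
```

Keeping these values in a file rather than in code lets the same script run against different workspaces by swapping the file. When the SDK is installed, `ws.write_config()` produces an equivalent file directly.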
Benefits of Using a Workspace
- Centralized Management: All ML assets are organized in one place.
- Collaboration: Facilitates teamwork among data scientists and engineers.
- Reproducibility: Tracks experiments, runs, models, and datasets for reproducible results.
- Scalability: Leverages Azure's scalable compute resources.
- Security: Integrates with Azure security features for data and code protection.