Azure Community Forums

Deploying AI Model on AKS

JD

Hi everyone,

I'm trying to deploy a custom AI model to Azure Kubernetes Service (AKS) and I'm running into a few challenges. My model is built using TensorFlow and I've containerized it using Docker. The goal is to serve predictions via a REST API.

I've successfully built the Docker image and pushed it to Azure Container Registry (ACR). My current setup involves a Deployment and a Service in AKS. However, I'm unsure about the best practices for managing model artifacts, especially for large models, and how to efficiently expose the API endpoint securely.

Any advice or examples on:

  • Storing and loading model weights within the container
  • Setting up the ingress for secure access (SSL/TLS)
  • Potential optimizations for inference speed
  • Monitoring the deployed model

Would be greatly appreciated!

Thanks!

AS

Hi John,

Deploying AI models on AKS is a common scenario. For model artifacts, consider these options:

1. Include in Docker Image: Simple for small models. Build them directly into the image.

2. Azure Blob Storage: For larger models, store them in Blob Storage and download them at container startup. You can use `azcopy`, or mount the blob container as a volume using the Azure Blob CSI driver (see the sketch after this list).

3. Azure Files: Similar idea, but mounted as an SMB (or NFS) file share. Good for shared read access across multiple pods.
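
If you go the volume-mount route from option 2, here's a rough sketch of a static PersistentVolume/PersistentVolumeClaim pair using the Azure Blob CSI driver. It assumes the blob CSI driver is enabled on your cluster and that a Kubernetes secret (called `azure-storage-secret` here) holds your storage account name and key; all names and sizes are placeholders:

    # Static PV backed by the "models" blob container (names are illustrative).
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: pv-models
    spec:
      capacity:
        storage: 10Gi
      accessModes:
        - ReadOnlyMany
      persistentVolumeReclaimPolicy: Retain
      csi:
        driver: blob.csi.azure.com
        volumeHandle: models-volume-1        # any cluster-unique ID
        volumeAttributes:
          containerName: models              # blob container to mount
        nodeStageSecretRef:
          name: azure-storage-secret         # storage account name + key
          namespace: default
    ---
    # Claim it, then mount the claim in your Deployment.
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: pvc-models
    spec:
      accessModes:
        - ReadOnlyMany
      storageClassName: ""                   # bind to the static PV above
      volumeName: pv-models
      resources:
        requests:
          storage: 10Gi

Your Deployment then mounts `pvc-models` read-only at `/app/models`, so pods see the weights without baking them into the image.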

For your REST API, the Application Gateway Ingress Controller (AGIC) is a good choice for TLS/SSL termination and advanced routing into AKS; you can also place Azure API Management in front of the cluster for API-level policies.
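
For example, with AGIC enabled on the cluster, a TLS-terminating Ingress looks roughly like this. The hostname, secret, and service names are placeholders, and the certificate must already exist as a Kubernetes TLS secret:

    # Hypothetical Ingress for the model API, terminating TLS at App Gateway.
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: model-api
      annotations:
        appgw.ingress.kubernetes.io/ssl-redirect: "true"   # force HTTPS
    spec:
      ingressClassName: azure-application-gateway
      tls:
        - hosts:
            - api.example.com
          secretName: model-api-tls          # pre-created TLS secret
      rules:
        - host: api.example.com
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: model-api-service  # your ClusterIP service
                    port:
                      number: 80

Essentially the same manifest works with the NGINX ingress controller if you swap the class name and annotations.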

Here's a snippet of how you might download a model from Blob Storage on container start:

    #!/bin/bash
    # Example startup script used as your Dockerfile's ENTRYPOINT or CMD
    set -euo pipefail

    AZURE_STORAGE_ACCOUNT="yourstorageaccount"
    AZURE_CONTAINER_NAME="models"
    MODEL_BLOB_NAME="my_tensorflow_model.h5"
    LOCAL_MODEL_PATH="/app/models/my_tensorflow_model.h5"

    mkdir -p /app/models
    # azcopy needs credentials first, e.g. a SAS token appended to the URL
    # or `azcopy login --identity` when the pod has a managed identity.
    azcopy copy "https://${AZURE_STORAGE_ACCOUNT}.blob.core.windows.net/${AZURE_CONTAINER_NAME}/${MODEL_BLOB_NAME}" "${LOCAL_MODEL_PATH}"

    # Now load the model from ${LOCAL_MODEL_PATH} in your Python app
    python your_app.py

For monitoring, Azure Monitor's Container insights integration with AKS provides cluster and container metrics and logs out of the box. You can also instrument your application to emit custom metrics such as inference latency and throughput.
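
For instance, here's a minimal sketch of custom latency instrumentation using the `prometheus_client` package, assuming your pods are scraped by Azure Monitor managed Prometheus or a similar setup (metric and function names are illustrative):

    # Minimal sketch: expose inference latency as a Prometheus histogram.
    # Assumes `pip install prometheus_client` and that something (e.g. Azure
    # Monitor managed Prometheus) scrapes port 8000 on the pod.
    from prometheus_client import Histogram, start_http_server

    INFERENCE_LATENCY = Histogram(
        "model_inference_latency_seconds",
        "Time spent on a single inference call",
    )

    @INFERENCE_LATENCY.time()          # records the duration of every call
    def predict(model, inputs):
        return model.predict(inputs)

    if __name__ == "__main__":
        start_http_server(8000)        # serves /metrics for the scraper
        # ... start your REST API server here ...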

PK

Following up on Alice's points regarding ingress:

If you're using Azure API Management, it can handle authentication, rate limiting, and API versioning, which are crucial for production AI services. You can expose your AKS service behind API Management.

For inference speed, consider:

  • Hardware Acceleration: If your model benefits from it, explore using GPUs on AKS nodes.
  • Model Optimization: Techniques like quantization, pruning, or using optimized runtimes (e.g., ONNX Runtime) can significantly improve performance (see the sketch after this list).
  • Batching: If your API can handle multiple requests at once, batching inference calls can improve throughput.
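
On the optimization point, here's a rough sketch of exporting a Keras model to ONNX once and serving it with ONNX Runtime. It assumes the `tf2onnx` and `onnxruntime` packages; the paths and input shape are placeholders:

    # Sketch: one-time export of a Keras model to ONNX, then inference with
    # ONNX Runtime. Assumes `pip install tf2onnx onnxruntime`.
    import numpy as np
    import onnxruntime as ort
    import tensorflow as tf
    import tf2onnx

    # Load the trained Keras model (path matches the download script above).
    model = tf.keras.models.load_model("/app/models/my_tensorflow_model.h5")

    # Convert to ONNX and save it next to the original weights. Some models
    # may also need an explicit input_signature here.
    model_proto, _ = tf2onnx.convert.from_keras(model, opset=13)
    with open("/app/models/model.onnx", "wb") as f:
        f.write(model_proto.SerializeToString())

    # At serving time, run the optimized graph with ONNX Runtime.
    session = ort.InferenceSession(
        "/app/models/model.onnx", providers=["CPUExecutionProvider"]
    )
    input_name = session.get_inputs()[0].name
    batch = np.random.rand(8, 224, 224, 3).astype(np.float32)  # dummy batch
    outputs = session.run(None, {input_name: batch})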

Also, ensure your AKS cluster nodes are sized appropriately for your model's resource requirements (CPU, memory, GPU).
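
As a concrete starting point, set resource requests and limits on the model container so the scheduler places it on a suitably sized node. The numbers below are placeholders, and the `nvidia.com/gpu` line only applies on a GPU node pool with the NVIDIA device plugin installed:

    # Hypothetical resources block for the model container in your Deployment.
    resources:
      requests:
        cpu: "2"
        memory: 4Gi
      limits:
        cpu: "4"
        memory: 8Gi
        nvidia.com/gpu: 1    # GPU node pools only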
