Hi everyone,
I'm trying to deploy a custom AI model to Azure Kubernetes Service (AKS) and I'm running into a few challenges. My model is built with TensorFlow and containerized with Docker, and the goal is to serve predictions via a REST API.
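For context, the serving layer inside the container is a thin REST wrapper around the model, roughly like this (the model path, input format, and port are simplified placeholders):

```python
# app.py - minimal Flask wrapper around the TensorFlow model (simplified
# sketch; the real app has input validation). "model/" is a placeholder path.
import numpy as np
import tensorflow as tf
from flask import Flask, jsonify, request

app = Flask(__name__)
model = tf.keras.models.load_model("model/")  # SavedModel baked into the image

@app.route("/healthz", methods=["GET"])
def healthz():
    # Lightweight check used by the Kubernetes readiness/liveness probes
    return "ok", 200

@app.route("/predict", methods=["POST"])
def predict():
    # Expects JSON like {"instances": [[...], [...]]}
    payload = request.get_json(force=True)
    inputs = np.asarray(payload["instances"], dtype=np.float32)
    preds = model.predict(inputs)
    return jsonify({"predictions": preds.tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```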
I've successfully built the Docker image and pushed it to Azure Container Registry (ACR). My current setup is a Deployment and a Service in AKS. However, I'm unsure about best practices for managing model artifacts, especially large ones, and about how to expose the API endpoint securely.
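For reference, here's approximately what my Deployment and Service look like (names, image tag, replica count, and resource numbers are illustrative placeholders; the probes hit the /healthz route from the app above):

```yaml
# deployment.yaml - current setup; image name and resources are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: model-api
  template:
    metadata:
      labels:
        app: model-api
    spec:
      containers:
        - name: model-api
          image: myregistry.azurecr.io/model-api:v1  # pulled from ACR
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: "500m"
              memory: "2Gi"
            limits:
              cpu: "2"
              memory: "4Gi"
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 10  # loading the model takes a while
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            periodSeconds: 30
---
apiVersion: v1
kind: Service
metadata:
  name: model-api
spec:
  selector:
    app: model-api
  ports:
    - port: 80
      targetPort: 8080
```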
Any advice or examples on:
- Storing and loading model weights within the container, especially for large models (my current startup approach is sketched below)
- Setting up the ingress for secure access over SSL/TLS (my draft manifest is below)
- Potential optimizations for inference speed
- Monitoring the deployed model
Would be greatly appreciated!
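On the first point: right now the weights are baked into the image, which makes it quite large. I've been experimenting with pulling them from Azure Blob Storage at startup instead, roughly like this (the container and blob names are placeholders, and the connection string would come from a Kubernetes Secret):

```python
# download_model.py - startup sketch: fetch the SavedModel archive from Blob
# Storage before the server loads it. Names below are placeholders.
import os
import tarfile

from azure.storage.blob import BlobClient

def fetch_model(dest_dir: str = "model/") -> None:
    blob = BlobClient.from_connection_string(
        conn_str=os.environ["AZURE_STORAGE_CONNECTION_STRING"],
        container_name="models",          # placeholder container name
        blob_name="my-model/v1.tar.gz",   # placeholder blob path
    )
    os.makedirs(dest_dir, exist_ok=True)
    archive_path = os.path.join(dest_dir, "model.tar.gz")
    with open(archive_path, "wb") as f:
        f.write(blob.download_blob().readall())
    with tarfile.open(archive_path) as tar:
        tar.extractall(dest_dir)  # unpack the SavedModel into dest_dir

if __name__ == "__main__":
    fetch_model()
```

I'm not sure whether this beats baking the weights in, or whether something like a PersistentVolume or an init container is the more standard pattern here.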
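On the second point, this is the draft Ingress I've put together so far, assuming the NGINX ingress controller and cert-manager are installed in the cluster (the hostname, issuer, and TLS secret name are placeholders):

```yaml
# ingress.yaml - draft TLS ingress; assumes ingress-nginx + cert-manager.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: model-api
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod  # placeholder issuer
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - model.example.com          # placeholder hostname
      secretName: model-api-tls      # cert-manager stores the cert here
  rules:
    - host: model.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: model-api
                port:
                  number: 80
```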
Thanks!