Azure Cognitive Services Vision

Image Analysis Quickstart

This quickstart walks you through the essential steps to integrate Azure Computer Vision's image analysis capabilities into your application. You'll learn how to perform basic image analysis, including captioning, tagging, object detection, and face detection.

Prerequisites

Before you begin, you will need:

- An Azure subscription (a free account works)
- A Computer Vision resource created in the Azure portal, along with its endpoint URL and subscription key
- Python 3.x and pip installed on your machine

Step 1: Install the Azure Computer Vision SDK

The easiest way to interact with the Computer Vision API is by using the official SDKs. Here's how to install the Python SDK:

$ pip install azure-cognitiveservices-vision-computervision
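
Alternatively, you can call the REST API directly without installing the SDK. The snippet below is a minimal sketch using the third-party requests library (pip install requests) against the v3.2 Analyze Image operation; it assumes the same VISION_ENDPOINT and VISION_KEY environment variables used later in this guide.

import os
import requests

# Endpoint and key from environment variables (see Step 3 below)
endpoint = os.environ["VISION_ENDPOINT"].rstrip("/")
key = os.environ["VISION_KEY"]

# v3.2 Analyze Image operation; request captions and tags
analyze_url = f"{endpoint}/vision/v3.2/analyze"
params = {"visualFeatures": "Description,Tags"}
headers = {
    "Ocp-Apim-Subscription-Key": key,
    "Content-Type": "application/json",
}
body = {"url": "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-vision-error-codes/master/SampleImages/red_kitchen.jpg"}

response = requests.post(analyze_url, params=params, headers=headers, json=body)
response.raise_for_status()
print(response.json())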

Step 2: Analyze an Image

Let's write some code to analyze an image. This example uses Python and a publicly accessible image URL.

Note: Replace YOUR_COMPUTER_VISION_ENDPOINT and YOUR_COMPUTER_VISION_KEY with your actual endpoint and subscription key. Keep your keys secure and do not expose them in client-side code.


from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import VisualFeatureTypes
from msrest.authentication import CognitiveServicesCredentials
import os

# Replace with your Computer Vision subscription key and endpoint
endpoint = os.environ["VISION_ENDPOINT"] # Or hardcode: "YOUR_COMPUTER_VISION_ENDPOINT"
key = os.environ["VISION_KEY"]       # Or hardcode: "YOUR_COMPUTER_VISION_KEY"

# Authenticate with the Computer Vision service
credentials = CognitiveServicesCredentials(key)
computervision_client = ComputerVisionClient(endpoint, credentials)

# Image URL to analyze
image_url = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-vision-error-codes/master/SampleImages/red_kitchen.jpg"

print(f"Analyzing image: {image_url}")

# Call the analyze image API
# Specify features you want to retrieve (e.g., tags, description, objects)
image_analysis = computervision_client.analyze_image(
    image_url,
    visual_features=[
        VisualFeatureTypes.description,
        VisualFeatureTypes.tags,
        VisualFeatureTypes.objects,
        VisualFeatureTypes.faces,
    ]
)

# Print the results
print("Image analysis results:")

# Description
if image_analysis.description and image_analysis.description.captions:
    print("\n--- Description ---")
    for caption in image_analysis.description.captions:
        print(f"Caption: {caption.text} (Confidence: {caption.confidence:.2f})")

# Tags
if image_analysis.tags:
    print("\n--- Tags ---")
    for tag in image_analysis.tags:
        print(f"Tag: {tag.name} (Confidence: {tag.confidence:.2f})")

# Objects
if image_analysis.objects:
    print("\n--- Objects ---")
    for obj in image_analysis.objects:
        bbox = obj.rectangle
        print(f"Object: {obj.object_property} (Confidence: {obj.confidence:.2f}) "
              f"at bounding box: x={bbox.x}, y={bbox.y}, w={bbox.w}, h={bbox.h}")

# Faces
if image_analysis.faces:
    print("\n--- Faces ---")
    for face in image_analysis.faces:
        rect = face.face_rectangle
        print(f"Face detected at bounding box: left={rect.left}, top={rect.top}, "
              f"width={rect.width}, height={rect.height}")

print("\nAnalysis complete.")

Step 3: Run the Code

Save the code above as a Python file (e.g., analyze_image.py). Make sure you've set your environment variables VISION_ENDPOINT and VISION_KEY, or replace the placeholders directly in the code (for testing purposes only).
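
On macOS or Linux, you can set the environment variables in your shell before running the script (on Windows, use set or setx instead). The resource name below is a placeholder:

$ export VISION_ENDPOINT="https://<your-resource-name>.cognitiveservices.azure.com/"
$ export VISION_KEY="<your-key>"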

$ python analyze_image.py

Expected Output

The output will vary depending on the image, but it will generally look like this:


Analyzing image: https://raw.githubusercontent.com/Azure-Samples/cognitive-services-vision-error-codes/master/SampleImages/red_kitchen.jpg

Image analysis results:

--- Description ---
Caption: A kitchen with a red accent and a dining table. (Confidence: 0.78)

--- Tags ---
Tag: kitchen (Confidence: 0.95)
Tag: food (Confidence: 0.88)
Tag: interior (Confidence: 0.85)
Tag: dining (Confidence: 0.82)
Tag: room (Confidence: 0.80)
Tag: home (Confidence: 0.75)

--- Objects ---
Object: table (Confidence: 0.92) at bounding box: x=476, y=393, w=394, h=311
Object: chair (Confidence: 0.85) at bounding box: x=466, y=433, w=195, h=272
Object: chair (Confidence: 0.82) at bounding box: x=697, y=430, w=169, h=272

--- Faces ---
Face detected at bounding box: left=615, top=470, width=66, height=65

Analysis complete.
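
Since every result carries a confidence score, a common next step is to filter out low-confidence predictions. A minimal sketch, reusing the image_analysis result from Step 2 with an illustrative threshold of 0.8:

# Keep only tags the service is reasonably confident about
CONFIDENCE_THRESHOLD = 0.8  # illustrative value; tune for your application

high_confidence_tags = [
    tag.name for tag in image_analysis.tags
    if tag.confidence >= CONFIDENCE_THRESHOLD
]
print(f"High-confidence tags: {high_confidence_tags}")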

Next Steps