Azure Cosmos DB Python SDK: Error Handling

Introduction to Error Handling

When working with Azure Cosmos DB, robust error handling keeps your Python application stable when requests fail. The Cosmos DB Python SDK (the `azure-cosmos` package) raises specific exceptions that you can catch to gracefully handle network failures, throttling, conflicts, missing resources, and similar conditions.

Common Exception Types

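The Cosmos DB Python SDK defines its exceptions in the `azure.cosmos.exceptions` module. They build on the base classes from `azure-core` (`azure.core.exceptions.AzureError` and `HttpResponseError`), so understanding this small hierarchy is the first step to effective error management.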

`azure.cosmos.exceptions.CosmosHttpResponseError`

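This is the most common exception, raised whenever the Cosmos DB service returns an HTTP error. Its `status_code`, `sub_status`, `message`, and `headers` attributes expose the details the service returned. For common cases the SDK raises more specific subclasses, such as `CosmosResourceNotFoundError` (404), `CosmosResourceExistsError` (409), and `CosmosAccessConditionFailedError` (412), which you can catch directly or handle through the base class.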


This exception often indicates issues like:

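400 Bad Request: Invalid request payload or syntax.
401 Unauthorized: Authentication failure, such as an invalid or expired key or token.
403 Forbidden: The caller is authenticated but the request is not permitted, for example because the identity lacks the required permissions.
404 Not Found: The resource does not exist.
409 Conflict: An item with the same id already exists in the same logical partition, or a write violated a unique key constraint.
412 Precondition Failed: A conditional write failed because the item's ETag changed since you read it (optimistic concurrency).
429 Too Many Requests: Throttling due to exceeding the provisioned RU/s.
500 Internal Server Error: An issue on the Cosmos DB service side.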
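As a minimal sketch, these codes can be triaged by how an application usually responds to them. The function name and the category labels are hypothetical, and the grouping is a design choice for illustration rather than anything the SDK defines.

```python
def classify_status(status_code: int) -> str:
    """Map a Cosmos DB HTTP status code to a coarse, hypothetical handling category."""
    if status_code == 429:
        return "throttled"  # wait (see the x-ms-retry-after-ms header), then retry
    if status_code in (409, 412):
        return "conflict"  # resolve in application logic: re-read, merge, or skip
    if status_code >= 500:
        return "server"  # service-side problem; a small, bounded retry may help
    if 400 <= status_code < 500:
        return "client"  # fix the request, credentials, or resource id before retrying
    return "other"


for code in (400, 404, 409, 412, 429, 500):
    print(code, classify_status(code))
```

Keeping 409 and 412 in their own bucket reflects that retrying them unchanged rarely helps; they need a decision from your code.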

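`azure.cosmos.exceptions.CosmosClientTimeoutError`

Raised on the client side when an operation does not finish within the timeout you configured with the `timeout` keyword argument. Other client-side problems, such as invalid arguments or SDK misconfiguration, generally surface as standard Python exceptions (for example `TypeError` or `ValueError`) rather than as a dedicated Cosmos DB exception type.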

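Transport errors and exhausted retries

The SDK does not raise a separate exception type when its retries run out. When throttling retries are exhausted, you receive the final 429 as a `CosmosHttpResponseError`. Connection-level failures that persist after the SDK's own retries typically surface as `azure-core` transport errors such as `azure.core.exceptions.ServiceRequestError` (the request could not be sent) or `ServiceResponseError` (the response could not be received).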

Handling Throttling (429 Errors)

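Throttling occurs when your application exceeds the provisioned Request Units per second (RU/s). The SDK automatically retries throttled requests, waiting for the interval the service suggests (by default up to 9 retries within 30 seconds), and raises a 429 to your code only after those retries are exhausted. You can add custom handling on top of that.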

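Best Practice: The SDK's default retry policy is generally sufficient. If you need more control, configure it when creating the `CosmosClient`, for example with the `retry_total` (maximum retry attempts) and `retry_backoff_max` (maximum retry wait time in seconds) keyword arguments.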

Example: Catching and Inspecting a 429 Error


```python
import os

from azure.cosmos import CosmosClient
from azure.cosmos.exceptions import CosmosHttpResponseError

# Endpoint, key, database, and container names are placeholders
client = CosmosClient(os.environ["COSMOS_ENDPOINT"], credential=os.environ["COSMOS_KEY"])
container = client.get_database_client("mydatabase").get_container_client("mycontainer")

try:
    # Assumes the container's partition key path is /id
    container.upsert_item({"id": "item1", "name": "Sample Item"})
except CosmosHttpResponseError as e:
    if e.status_code == 429:
        # Raised only after the SDK's own throttling retries are exhausted;
        # the service's suggested wait is in the x-ms-retry-after-ms header
        print(f"Throttling detected; retry after {e.headers.get('x-ms-retry-after-ms')} ms")
        # Implement custom retry logic or backoff strategy here
    else:
        print(f"An unexpected HTTP error occurred: {e.status_code}")
        print(f"Error message: {e.message}")
except Exception as e:
    print(f"An unexpected error occurred: {e}")
```
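The comment in the example above leaves the custom backoff open. One way to fill it in is sketched below, assuming you want a small application-level retry on top of the SDK's own throttling retries. `ThrottledError` is a hypothetical stand-in so the sketch runs without an account; in real code you would catch `CosmosHttpResponseError` and read the same `status_code` and `headers` attributes. The function name, attempt limit, and fallback delays are assumptions.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for CosmosHttpResponseError with just the fields used below."""

    def __init__(self, status_code: int, headers: Optional[dict] = None) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code
        self.headers = headers or {}


def retry_on_throttle(operation: Callable[[], T], max_attempts: int = 5,
                      default_delay_ms: float = 100.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call operation(), waiting and retrying while it fails with status 429."""
    attempt = 1
    while True:
        try:
            return operation()
        except ThrottledError as e:
            if e.status_code != 429 or attempt >= max_attempts:
                raise  # not throttling, or out of attempts: let the caller handle it
            # Prefer the service's suggested wait; otherwise back off exponentially
            hint = e.headers.get("x-ms-retry-after-ms")
            delay_ms = float(hint) if hint is not None else default_delay_ms * 2 ** (attempt - 1)
            sleep(delay_ms / 1000.0)
            attempt += 1


# Usage: an operation that is throttled twice, then succeeds
calls = {"n": 0}


def flaky_upsert() -> dict:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ThrottledError(429, {"x-ms-retry-after-ms": "50"})
    return {"id": "item1"}


waits = []
result = retry_on_throttle(flaky_upsert, sleep=waits.append)
print(result, waits)  # {'id': 'item1'} [0.05, 0.05]
```

Injecting `sleep` keeps the helper testable; because the SDK has already waited and retried before the error reaches this code, a small `max_attempts` is usually enough.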

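Handling Conflicts (409 and 412 Errors)

Conflict errors (409) occur when you try to create an item that already exists, that is, an item with the same id in the same logical partition, or when a write violates a unique key constraint. Optimistic concurrency failures are reported differently: if you make a write conditional on the item's ETag and another process has modified the item since you read it, the service returns 412 Precondition Failed.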

Example: Handling a Conflict During Create

```python
import os

from azure.cosmos import CosmosClient
from azure.cosmos.exceptions import CosmosHttpResponseError

# Endpoint, key, database, and container names are placeholders
client = CosmosClient(os.environ["COSMOS_ENDPOINT"], credential=os.environ["COSMOS_KEY"])
container = client.get_database_client("mydatabase").get_container_client("mycontainer")

# Assumes the container's partition key path is /id
item_to_create = {"id": "unique-item-id", "value": "initial"}

try:
    # create_item fails with 409 if the item already exists;
    # upsert_item would silently replace it instead
    container.create_item(item_to_create)
    print("Item created successfully.")
except CosmosHttpResponseError as e:
    if e.status_code == 409:
        print(f"Conflict: item ID '{item_to_create['id']}' already exists.")
        # Decide per business logic: skip, read the existing item, or overwrite with upsert_item.
    else:
        print(f"An unexpected HTTP error occurred: {e.status_code}")
        print(f"Error message: {e.message}")
except Exception as e:
    print(f"An unexpected error occurred: {e}")
```
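To make the optimistic-concurrency case concrete, here is a read-modify-write loop under stated assumptions. `InMemoryContainer`, `PreconditionFailedError`, and `update_with_retry` are hypothetical stand-ins that mimic ETag checks in memory so the sketch runs without an account. With the real SDK, the conditional write would roughly correspond to calling `replace_item` with `etag=` and `match_condition=MatchConditions.IfNotModified` (from `azure.core`), which fails with a 412 when the item has changed; verify the exact parameters against your SDK version.

```python
import uuid
from typing import Callable, Dict


class PreconditionFailedError(Exception):
    """Hypothetical stand-in for the SDK's 412 error (CosmosAccessConditionFailedError)."""


class InMemoryContainer:
    """Hypothetical stand-in for a container: every successful write assigns a new _etag."""

    def __init__(self) -> None:
        self._items: Dict[str, dict] = {}

    def create(self, body: dict) -> dict:
        stored = dict(body, _etag=uuid.uuid4().hex)
        self._items[body["id"]] = stored
        return dict(stored)

    def read(self, item_id: str) -> dict:
        return dict(self._items[item_id])

    def replace_if_match(self, body: dict, etag: str) -> dict:
        # Conditional write: succeeds only if nobody has written since `etag` was read
        if self._items[body["id"]]["_etag"] != etag:
            raise PreconditionFailedError(body["id"])
        stored = dict(body, _etag=uuid.uuid4().hex)
        self._items[body["id"]] = stored
        return dict(stored)


def update_with_retry(store: InMemoryContainer, item_id: str,
                      mutate: Callable[[dict], None], max_attempts: int = 3) -> dict:
    """Read, mutate, and conditionally write back; start over if the ETag check fails."""
    attempt = 1
    while True:
        item = store.read(item_id)
        etag = item["_etag"]
        mutate(item)
        try:
            return store.replace_if_match(item, etag)
        except PreconditionFailedError:
            if attempt >= max_attempts:
                raise  # still losing the race: let the caller decide
            attempt += 1


# Usage: another writer sneaks in between our first read and our write
store = InMemoryContainer()
store.create({"id": "unique-item-id", "value": "initial"})
raced = {"done": False}


def set_value(item: dict) -> None:
    if not raced["done"]:
        raced["done"] = True
        other = store.read("unique-item-id")
        store.replace_if_match(dict(other, value="other"), other["_etag"])
    item["value"] = "updated"


print(update_with_retry(store, "unique-item-id", set_value)["value"])  # updated
```

The key design choice is that `mutate` is re-applied to a freshly read item on every attempt, so a lost race never overwrites the other writer's change with stale data.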

Handling Other HTTP Errors

It's good practice to have a general catch-all for CosmosHttpResponseError to handle unexpected status codes.


```python
from azure.cosmos.exceptions import CosmosHttpResponseError

try:
    # Your Cosmos DB operation, e.g. container.read_item(item="item1", partition_key="item1")
    pass
except CosmosHttpResponseError as e:
    print("Received a Cosmos DB HTTP error:")
    print(f"  Status Code: {e.status_code}")
    print(f"  Sub-Status Code: {e.sub_status}")  # None when the service sent no sub-status
    print(f"  Message: {e.message}")
    print(f"  Headers: {e.headers}")
except Exception as e:
    print(f"An unexpected error occurred: {e}")
```
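Printing every header is noisy outside of debugging. A sketch of pulling the most useful fields into one structured log record follows. The helper name, logger name, and `FakeHttpError` stand-in are hypothetical; the helper reads `x-ms-activity-id` and `x-ms-request-charge`, two response headers Cosmos DB returns, and uses `getattr` so it also tolerates non-Cosmos exceptions.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger("cosmos")  # hypothetical logger name


def error_fields(e: Exception) -> Dict[str, Any]:
    """Collect diagnostic fields from a CosmosHttpResponseError-like exception."""
    headers = getattr(e, "headers", None) or {}
    return {
        "status_code": getattr(e, "status_code", None),
        "sub_status": getattr(e, "sub_status", None),
        "activity_id": headers.get("x-ms-activity-id"),  # useful when contacting support
        "request_charge": headers.get("x-ms-request-charge"),  # RUs consumed, if reported
        "message": str(e),
    }


class FakeHttpError(Exception):
    """Hypothetical stand-in so the sketch runs without an account."""

    def __init__(self) -> None:
        super().__init__("Resource Not Found")
        self.status_code = 404
        self.sub_status = None
        self.headers = {"x-ms-activity-id": "abc-123", "x-ms-request-charge": "1.0"}


try:
    raise FakeHttpError()
except Exception as err:  # real code: except CosmosHttpResponseError as err
    logger.error("Cosmos DB request failed: %s", error_fields(err))
```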

Retries and Idempotency

An operation is idempotent if performing it several times has the same effect as performing it once. This is crucial for reliable retries: after a timeout you often cannot tell whether the service applied a write, so the safest recovery is to send it again.

Read operations are naturally idempotent.
Upsert and replace operations that send the full document are idempotent: repeating them leaves the item in the same state. Create is not: if an earlier attempt actually succeeded, a retried create fails with 409 Conflict, which you can handle (for example, by treating it as success when no other writer uses that id). Upsert avoids this case entirely.
Delete operations leave the same end state when repeated, but a retried delete whose earlier attempt succeeded returns 404 Not Found, which you can usually treat as success.

When implementing custom retry logic, ensure your operations are idempotent to avoid unintended side effects.
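A retried create can fail with 409, and a retried delete with 404, even though an earlier attempt succeeded. The sketch below folds that into a retry helper, assuming no other writer uses the same id, so such a status on a retry (never on the first attempt) is read as proof that the write already landed. `TransientError`, `StatusError`, and `retry_idempotently` are hypothetical names standing in for SDK errors and your own code.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for a timeout where you cannot tell if the write was applied."""


class StatusError(Exception):
    """Hypothetical stand-in for CosmosHttpResponseError, carrying only status_code."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def retry_idempotently(operation: Callable[[], T], done_status: int,
                       max_attempts: int = 3) -> Optional[T]:
    """Retry after transient failures; on a retry, treat `done_status` as success.

    Use done_status=409 for creates and 404 for deletes. Returns None when the
    write was applied by an earlier attempt whose response was lost.
    """
    attempt = 1
    while True:
        try:
            return operation()
        except TransientError:
            if attempt >= max_attempts:
                raise
        except StatusError as e:
            if attempt > 1 and e.status_code == done_status:
                return None  # an earlier attempt most likely applied the write
            raise
        attempt += 1


# Usage: the first create lands, but its response is lost
store = set()
calls = {"n": 0}


def create_item1() -> str:
    calls["n"] += 1
    if "item1" in store:
        raise StatusError(409)  # already exists
    store.add("item1")
    if calls["n"] == 1:
        raise TransientError()  # write applied, response lost
    return "created"


print(retry_idempotently(create_item1, done_status=409))  # None
```

If other writers may reuse the same id, read the item back after a 409 instead of assuming it is yours.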

Best Practices Summary

Use try...except blocks: Always wrap your Cosmos DB operations in try-except blocks.
Catch specific exceptions: Catch CosmosHttpResponseError and other SDK-specific exceptions.
Inspect error details: Log or examine status_code, sub_status, message, and headers for detailed diagnostics.
Handle common errors gracefully: Implement specific logic for 429 (throttling), 409 (conflict), and 412 (precondition failed) errors.
Leverage SDK retry policies: Understand and configure the built-in retry mechanisms.
Design for idempotency: Ensure your write operations can be safely retried.
Log extensively: Record errors and retry attempts for debugging and monitoring.