Data Modeling Samples for Azure Cosmos DB

This page provides practical examples and sample scenarios to illustrate effective data modeling techniques in Azure Cosmos DB. Understanding how to structure your data is crucial for optimizing performance, cost, and developer productivity.

Sample Scenario 1: E-commerce Product Catalog

Option A: Single Container (Denormalized)

For a simple product catalog where product details are usually read together, a single container with denormalized data is efficient: each item contains all of a product's information, including its reviews, so one point read returns everything needed to display the product.

Product Document Example:

{
    "id": "prod-12345",
    "partitionKey": "electronics",
    "type": "product",
    "name": "Ultra HD Smart TV",
    "brand": "SpectraView",
    "description": "A 55-inch 4K UHD Smart TV with HDR support and built-in streaming apps.",
    "price": {
        "amount": 899.99,
        "currency": "USD"
    },
    "specifications": {
        "displaySize": "55 inch",
        "resolution": "3840x2160",
        "smartFeatures": ["Netflix", "YouTube", "Hulu"]
    },
    "categories": ["electronics", "televisions", "smart-home"],
    "reviews": [
        {"userId": "user-abc", "rating": 5, "comment": "Amazing picture quality!"},
        {"userId": "user-def", "rating": 4, "comment": "Good value for the price."}
    ],
    "createdAt": "2023-10-27T10:00:00Z",
    "updatedAt": "2023-10-27T11:30:00Z"
}

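Partition Key Strategy: Choose a partition key with many distinct values that spreads reads and writes evenly. In this example, partitionKey holds the product's category, which works when most queries filter by category and categories are of similar size. Avoid a property such as type: every product has the same value ("product"), so all items would land in a single logical partition, which is capped at 20 GB and always served by one physical partition.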
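To compare candidate keys, you can count how many items each key value would place in one logical partition. The sketch below uses a hypothetical in-memory sample in which the category is stored in a `category` property (the sample document above stores it in `partitionKey`); it does not call Cosmos DB.

```python
from collections import Counter

# Hypothetical sample catalog: 10 products across three categories.
# Every product has the same "type" value, as in the sample document.
products = [
    {"id": f"prod-{i}", "type": "product", "category": cat}
    for i, cat in enumerate(["electronics"] * 4 + ["books"] * 3 + ["garden"] * 3)
]

def partition_distribution(items, key):
    """Count how many items would land in each logical partition for `key`."""
    return Counter(item[key] for item in items)

by_type = partition_distribution(products, "type")
by_category = partition_distribution(products, "category")

print(by_type)      # Counter({'product': 10})
print(by_category)  # Counter({'electronics': 4, 'books': 3, 'garden': 3})
```

Keying on type collapses the whole catalog into one partition, while category spreads it across three. Real catalogs need far more distinct values than this toy sample to scale.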

Option B: Multiple Containers (Normalized)

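If product details, specifications, and reviews are updated and read independently, or if part of the data can grow without bound (for example, reviews), normalizing into multiple containers is often the better choice. A Cosmos DB item is limited to 2 MB, so an embedded reviews array that keeps growing will eventually exceed it, and every added review updates the full product item, whose write cost grows with its size.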
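A rough way to see how close an embedded array is to the item size limit is to measure the item's serialized JSON. This is a minimal sketch that assumes the 2 MB limit equals 2 * 1024 * 1024 bytes of UTF-8 JSON; it ignores system properties that Cosmos DB adds, so treat the result as an estimate. The helper names are hypothetical.

```python
import json

# Assumption: the documented 2 MB item limit, expressed in bytes.
MAX_ITEM_BYTES = 2 * 1024 * 1024

def item_size_bytes(doc: dict) -> int:
    """Approximate item size: UTF-8 length of the compact JSON encoding."""
    return len(json.dumps(doc, separators=(",", ":"), ensure_ascii=False).encode("utf-8"))

def reviews_headroom(product: dict, sample_review: dict) -> int:
    """Estimate how many more reviews like `sample_review` fit under the limit."""
    remaining = MAX_ITEM_BYTES - item_size_bytes(product)
    per_review = item_size_bytes(sample_review) + 1  # +1 for the separating comma
    return max(remaining // per_review, 0)

product = {"id": "prod-12345", "partitionKey": "electronics", "reviews": []}
review = {"userId": "user-abc", "rating": 5, "comment": "Amazing picture quality!"}
print(reviews_headroom(product, review))
```

Short reviews leave plenty of room, but long comments or extra review fields shrink the headroom quickly, which is one signal to move reviews into their own container.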

Products Container (Product Header):

{
    "id": "prod-12345",
    "partitionKey": "electronics",
    "type": "product-header",
    "name": "Ultra HD Smart TV",
    "brand": "SpectraView",
    "price": {
        "amount": 899.99,
        "currency": "USD"
    },
    "categories": ["electronics", "televisions", "smart-home"],
    "createdAt": "2023-10-27T10:00:00Z",
    "updatedAt": "2023-10-27T10:05:00Z"
}

Specifications Container:

{
    "id": "spec-12345",
    "productId": "prod-12345",
    "partitionKey": "prod-12345",
    "type": "product-spec",
    "specifications": {
        "displaySize": "55 inch",
        "resolution": "3840x2160",
        "refreshRate": "120Hz"
    },
    "lastUpdated": "2023-10-27T10:10:00Z"
}

Reviews Container:

{
    "id": "rev-abcde",
    "productId": "prod-12345",
    "userId": "user-abc",
    "partitionKey": "prod-12345",
    "type": "product-review",
    "rating": 5,
    "comment": "Amazing picture quality!",
    "reviewDate": "2023-10-27T11:30:00Z"
}
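The normalized items above can be derived from the denormalized Option A item. The sketch below is one hypothetical way to do that split during a migration; the review id scheme (`rev-<suffix>-<n>`) is invented for illustration, and the description is kept on the header so no data is lost.

```python
from typing import Dict, List, Tuple

# Fields kept on the product header; specs and reviews move to their own containers.
HEADER_FIELDS = ("name", "brand", "description", "price", "categories", "createdAt", "updatedAt")

def normalize_product(product: Dict) -> Tuple[Dict, Dict, List[Dict]]:
    """Split an Option A product item into header, spec, and review items."""
    pid = product["id"]                 # e.g. "prod-12345"
    suffix = pid.split("-", 1)[-1]      # e.g. "12345"
    header = {"id": pid, "partitionKey": product["partitionKey"], "type": "product-header"}
    header.update({f: product[f] for f in HEADER_FIELDS if f in product})
    spec = {
        "id": f"spec-{suffix}",
        "productId": pid,
        "partitionKey": pid,            # specs are partitioned by product
        "type": "product-spec",
        "specifications": product.get("specifications", {}),
    }
    reviews = [
        {
            "id": f"rev-{suffix}-{n}",  # hypothetical id scheme
            "productId": pid,
            "partitionKey": pid,        # reviews are partitioned by product
            "type": "product-review",
            **review,
        }
        for n, review in enumerate(product.get("reviews", []))
    ]
    return header, spec, reviews

product = {
    "id": "prod-12345", "partitionKey": "electronics", "type": "product",
    "name": "Ultra HD Smart TV", "brand": "SpectraView",
    "price": {"amount": 899.99, "currency": "USD"},
    "specifications": {"displaySize": "55 inch", "resolution": "3840x2160"},
    "reviews": [
        {"userId": "user-abc", "rating": 5, "comment": "Amazing picture quality!"},
        {"userId": "user-def", "rating": 4, "comment": "Good value for the price."},
    ],
}
header, spec, reviews = normalize_product(product)
```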

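Modeling Relationships: Link items with direct references (e.g., productId). Cosmos DB does not join across containers (its JOIN keyword works only within a single item), so assemble the complete product in application code: read the header, then query the Specifications and Reviews containers by productId. Because both of those containers use productId as their partition key, each lookup is a single-partition query.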
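Because Cosmos DB queries do not join across containers, the aggregation happens in application code. This sketch simulates the three containers as hypothetical in-memory lists with trimmed copies of the sample items; in a real application each helper would be an SDK point read or single-partition query. Note that the header point read needs the category, because the Products container is partitioned by it.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory stand-ins for the three containers.
products_container: List[Dict] = [
    {"id": "prod-12345", "partitionKey": "electronics", "type": "product-header",
     "name": "Ultra HD Smart TV", "brand": "SpectraView"},
]
specs_container: List[Dict] = [
    {"id": "spec-12345", "productId": "prod-12345", "partitionKey": "prod-12345",
     "type": "product-spec",
     "specifications": {"displaySize": "55 inch", "resolution": "3840x2160", "refreshRate": "120Hz"}},
]
reviews_container: List[Dict] = [
    {"id": "rev-abcde", "productId": "prod-12345", "userId": "user-abc",
     "partitionKey": "prod-12345", "type": "product-review", "rating": 5,
     "comment": "Amazing picture quality!", "reviewDate": "2023-10-27T11:30:00Z"},
]

def point_read(container: List[Dict], partition_key: str, item_id: str) -> Optional[Dict]:
    """Simulates a point read: both the id and the partition key must be known."""
    return next(
        (i for i in container if i["partitionKey"] == partition_key and i["id"] == item_id),
        None,
    )

def query_partition(container: List[Dict], partition_key: str, item_type: str) -> List[Dict]:
    """Simulates a single-partition query filtered on the type property."""
    return [i for i in container if i["partitionKey"] == partition_key and i["type"] == item_type]

def assemble_product_view(category: str, product_id: str) -> Optional[Dict]:
    header = point_read(products_container, category, product_id)
    if header is None:
        return None
    # Specs and reviews are partitioned by productId: one logical partition each.
    specs = query_partition(specs_container, product_id, "product-spec")
    reviews = query_partition(reviews_container, product_id, "product-review")
    return {
        **header,
        "specifications": specs[0]["specifications"] if specs else {},
        # ISO 8601 UTC strings in the same format sort chronologically as text.
        "reviews": sorted(reviews, key=lambda r: r["reviewDate"], reverse=True),
    }

view = assemble_product_view("electronics", "prod-12345")
```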

Sample Scenario 2: User Profiles and Activity Feed

The right model for user profiles and activity feeds depends on the dominant access pattern: reading one user's profile together with their own activity, or reading recent activity across many users.

Option A: Single Container (User-Centric)

A common approach is to store user profiles and recent activity within the same container, partitioned by user ID.

User Profile Document:

{
    "id": "user-98765",
    "partitionKey": "user-98765",
    "type": "user",
    "username": "jane.doe",
    "displayName": "Jane Doe",
    "email": "jane.doe@example.com",
    "profilePictureUrl": "https://example.com/avatars/jane.png",
    "followersCount": 1500,
    "followingCount": 300,
    "lastLogin": "2023-10-27T12:00:00Z"
}

Activity Feed Document:

{
    "id": "activity-xyz123",
    "userId": "user-98765",
    "partitionKey": "user-98765",
    "type": "activity",
    "timestamp": "2023-10-27T11:55:00Z",
    "activityType": "new_post",
    "details": {
        "postId": "post-abc",
        "postTitle": "My thoughts on cloud computing"
    },
    "likesCount": 50
}

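Querying: Because the profile and its activities share the partition key user-98765, both can be retrieved without a cross-partition query: read the profile with a point read (id plus partition key), then run a single-partition query for items with type = "activity" ordered by timestamp DESC. The profile item has no userId or timestamp property, so one query that filters on userId and sorts by timestamp would not return it.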
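Because both item types share the partition key user-98765, everything needed stays within one logical partition. The sketch below simulates this in memory with a hypothetical helper; the second activity (activity-xyz100) and the other user are invented for illustration. Sorting ISO 8601 UTC strings as text gives chronological order because they all use the same format.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory container holding both Option A item types.
items: List[Dict] = [
    {"id": "user-98765", "partitionKey": "user-98765", "type": "user", "username": "jane.doe"},
    {"id": "activity-xyz123", "userId": "user-98765", "partitionKey": "user-98765",
     "type": "activity", "timestamp": "2023-10-27T11:55:00Z", "activityType": "new_post"},
    {"id": "activity-xyz100", "userId": "user-98765", "partitionKey": "user-98765",
     "type": "activity", "timestamp": "2023-10-26T09:00:00Z", "activityType": "new_post"},
    {"id": "user-11111", "partitionKey": "user-11111", "type": "user", "username": "other"},
]

def profile_and_recent_activity(
    container: List[Dict], user_id: str, limit: int = 10
) -> Tuple[Optional[Dict], List[Dict]]:
    # Scope everything to one logical partition, as setting the partition key would.
    partition = [i for i in container if i["partitionKey"] == user_id]
    # Point read: id and partition key are both known (they are equal here).
    profile = next((i for i in partition if i["id"] == user_id and i["type"] == "user"), None)
    # Roughly what this query returns when run with the partition key set:
    #   SELECT TOP 10 * FROM c WHERE c.type = "activity" ORDER BY c.timestamp DESC
    activities = sorted(
        (i for i in partition if i["type"] == "activity"),
        key=lambda i: i["timestamp"],
        reverse=True,
    )
    return profile, activities[:limit]

profile, recent = profile_and_recent_activity(items, "user-98765")
print(profile["username"], [a["id"] for a in recent])
# jane.doe ['activity-xyz123', 'activity-xyz100']
```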

Option B: Two Containers (Activity Feed Optimized)

If the activity feed has very high write volume, or you need to query a global feed or recent activity across many users, consider a separate container for activities. This also keeps a very active user's logical partition from growing toward the 20 GB limit.

Users Container:

{
    "id": "user-98765",
    "partitionKey": "user-98765",
    "type": "user",
    "username": "jane.doe",
    "displayName": "Jane Doe",
    "profilePictureUrl": "https://example.com/avatars/jane.png"
}

Activity Log Container (Global or Time-Based Partitioning):

{
    "id": "activity-xyz123",
    "userId": "user-98765",
    "username": "jane.doe",
    "partitionKey": "2023-10-27",
    "type": "activity",
    "timestamp": "2023-10-27T11:55:00Z",
    "activityType": "new_post",
    "details": {
        "postId": "post-abc",
        "postTitle": "My thoughts on cloud computing"
    }
}

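Partitioning Strategy: In the Activity Log container, partitioning by date (e.g., YYYY-MM-DD) groups items for time-based queries, but it also sends every write for the current day to a single logical partition, which becomes a hot spot at high volume. A synthetic key that appends a bucket number to the date spreads those writes, at the cost of querying every bucket to read a full day. To remove old data, a time-to-live (TTL) setting is usually simpler than partition-based cleanup. Reading one user's history across many days becomes a cross-partition query. Because username is copied into each activity, a feed can be rendered without reading the Users container; you still need a lookup for fields that are not copied (such as profilePictureUrl), and every copy must be updated if a username changes.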
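A pure date key sends every write for the current day to one logical partition. One common mitigation is a synthetic partition key that appends a bucket number to the date. This is a minimal sketch under stated assumptions: the bucket count of 10 and the helper names are hypothetical, and the bucket is derived from userId so a user's activity for a given day stays in one bucket. It uses hashlib rather than Python's built-in hash(), which is randomized per process for strings and would not give stable keys.

```python
import hashlib
from typing import List

BUCKETS = 10  # assumption: tune to the expected daily write volume

def activity_partition_key(user_id: str, timestamp: str, buckets: int = BUCKETS) -> str:
    """Build a key such as '2023-10-27-3' from the activity's date and user."""
    day = timestamp[:10]  # "2023-10-27T11:55:00Z" -> "2023-10-27"
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % buckets
    return f"{day}-{bucket}"

def partition_keys_for_day(day: str, buckets: int = BUCKETS) -> List[str]:
    """All keys a reader must query to see one full day of activity."""
    return [f"{day}-{b}" for b in range(buckets)]

key = activity_partition_key("user-98765", "2023-10-27T11:55:00Z")
```

More buckets spread writes further but make a full-day read fan out to more partition keys; the right count depends on your write rate.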

Key Considerations for Data Modeling: