Data Modeling Samples for Azure Cosmos DB
This page provides practical examples and sample scenarios to illustrate effective data modeling techniques in Azure Cosmos DB. Understanding how to structure your data is crucial for optimizing performance, cost, and developer productivity.
Sample Scenario 1: E-commerce Product Catalog
Option A: Single Container (Denormalized)
For a simple product catalog where product details are frequently accessed together, a single container with denormalized data can be very efficient. Each item document contains all relevant information.
Product Document Example:
{
  "id": "prod-12345",
  "partitionKey": "electronics",
  "type": "product",
  "name": "Ultra HD Smart TV",
  "brand": "SpectraView",
  "description": "A 55-inch 4K UHD Smart TV with HDR support and built-in streaming apps.",
  "price": {
    "amount": 899.99,
    "currency": "USD"
  },
  "specifications": {
    "displaySize": "55 inch",
    "resolution": "3840x2160",
    "smartFeatures": ["Netflix", "YouTube", "Hulu"]
  },
  "categories": ["electronics", "televisions", "smart-home"],
  "reviews": [
    {"userId": "user-abc", "rating": 5, "comment": "Amazing picture quality!"},
    {"userId": "user-def", "rating": 4, "comment": "Good value for the price."}
  ],
  "createdAt": "2023-10-27T10:00:00Z",
  "updatedAt": "2023-10-27T11:30:00Z"
}
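Partition Key Strategy: Use a logical partition key such as type or category to distribute read and write operations evenly across partitions. Prefer the property with more distinct values: in the example above, partitionKey holds the category ("electronics"), whereas a type value shared by every item would place the whole catalog in a single logical partition.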
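One way to compare candidate partition keys before committing to one is to count how sample items would spread across each key's values. The following is a minimal sketch in plain Python, not part of any Cosmos DB SDK; the `catalog` data, the single-valued `category` property, and the `partition_distribution` helper are assumptions made for illustration.

```python
from collections import Counter


def partition_distribution(items, key):
    """Return the share of items that falls under each value of a candidate key."""
    counts = Counter(item[key] for item in items)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


# Hypothetical sample of catalog items (not taken from a real container).
catalog = [
    {"id": "prod-1", "type": "product", "category": "electronics"},
    {"id": "prod-2", "type": "product", "category": "electronics"},
    {"id": "prod-3", "type": "product", "category": "furniture"},
    {"id": "prod-4", "type": "product", "category": "toys"},
]

# Every item shares one type value, so all items land in one logical partition.
by_type = partition_distribution(catalog, "type")

# Category spreads the same items over three logical partitions.
by_category = partition_distribution(catalog, "category")
```

A key whose largest share approaches 1.0 concentrates storage and throughput in one logical partition; a flatter distribution is usually the better starting point, subject to how the application actually queries the data.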
Option B: Multiple Containers (Normalized)
If product details, specifications, and reviews are managed and accessed separately, or if certain data grows very large (e.g., extensive reviews), normalizing into multiple containers can be beneficial.
Products Container (Product Header):
{
  "id": "prod-12345",
  "partitionKey": "electronics",
  "type": "product-header",
  "name": "Ultra HD Smart TV",
  "brand": "SpectraView",
  "price": {
    "amount": 899.99,
    "currency": "USD"
  },
  "categories": ["electronics", "televisions", "smart-home"],
  "createdAt": "2023-10-27T10:00:00Z",
  "updatedAt": "2023-10-27T10:05:00Z"
}
Specifications Container:
{
  "id": "spec-12345",
  "productId": "prod-12345",
  "partitionKey": "prod-12345",
  "type": "product-spec",
  "specifications": {
    "displaySize": "55 inch",
    "resolution": "3840x2160",
    "refreshRate": "120Hz"
  },
  "lastUpdated": "2023-10-27T10:10:00Z"
}
Reviews Container:
{
  "id": "rev-abcde",
  "productId": "prod-12345",
  "userId": "user-abc",
  "partitionKey": "prod-12345",
  "type": "product-review",
  "rating": 5,
  "comment": "Amazing picture quality!",
  "reviewDate": "2023-10-27T11:30:00Z"
}
Modeling Relationships: Use direct references (e.g., productId) and aggregate with Cosmos DB queries or application logic.
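Aggregating in application logic can be sketched as follows. The three lists stand in for results already fetched from the Products, Specifications, and Reviews containers; `build_product_view` is a hypothetical helper, and a real application would replace the list scans with per-container reads or queries keyed on productId.

```python
def build_product_view(product_id, headers, specs, reviews):
    """Combine a product header with its specifications and reviews by productId."""
    header = next((h for h in headers if h["id"] == product_id), None)
    if header is None:
        return None  # no such product
    spec = next((s for s in specs if s["productId"] == product_id), None)
    product_reviews = [r for r in reviews if r["productId"] == product_id]
    return {
        **header,
        "specifications": spec["specifications"] if spec else {},
        "reviews": product_reviews,
    }


# In-memory stand-ins for items read from the three containers.
headers = [{"id": "prod-12345", "name": "Ultra HD Smart TV", "brand": "SpectraView"}]
specs = [{"id": "spec-12345", "productId": "prod-12345",
          "specifications": {"displaySize": "55 inch", "refreshRate": "120Hz"}}]
reviews = [
    {"id": "rev-abcde", "productId": "prod-12345", "rating": 5},
    {"id": "rev-other", "productId": "prod-99999", "rating": 3},
]

view = build_product_view("prod-12345", headers, specs, reviews)
missing = build_product_view("prod-00000", headers, specs, reviews)
```

The combined view costs one read per container, which is the price paid for keeping specifications and reviews independently updatable.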
Sample Scenario 2: User Profiles and Activity Feed
Managing user profiles and their associated activity feeds requires careful consideration of access patterns.
Option A: Single Container (User Centric)
A common approach is to store user profiles and recent activity within the same container, partitioned by user ID.
User Profile Document:
{
  "id": "user-98765",
  "partitionKey": "user-98765",
  "type": "user",
  "username": "jane.doe",
  "displayName": "Jane Doe",
  "email": "jane.doe@example.com",
  "profilePictureUrl": "https://example.com/avatars/jane.png",
  "followersCount": 1500,
  "followingCount": 300,
  "lastLogin": "2023-10-27T12:00:00Z"
}
Activity Feed Document:
{
  "id": "activity-xyz123",
  "userId": "user-98765",
  "partitionKey": "user-98765",
  "type": "activity",
  "timestamp": "2023-10-27T11:55:00Z",
  "activityType": "new_post",
  "details": {
    "postId": "post-abc",
    "postTitle": "My thoughts on cloud computing"
  },
  "likesCount": 50
}
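Querying: Because both item types share the user's ID as the partition key, a user's profile and their most recent activities can be retrieved efficiently from a single logical partition. Filter by partitionKey rather than userId (the profile item has no userId property) and order the activity items by timestamp DESC.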
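A minimal sketch of such a query, assuming the property names from the sample documents above. `recent_activity_query` is a hypothetical helper that only builds the query text and a list of name/value parameters; it does not call any SDK. `select_recent_activities` mirrors the same filter and ordering over an in-memory list so the intended semantics can be checked locally. The profile itself could be fetched separately by id, since its id and partition key are both the user ID.

```python
def recent_activity_query(user_id, limit=10):
    """Build query text and parameters for a user's most recent activities."""
    query = (
        "SELECT TOP @limit * FROM c "
        "WHERE c.partitionKey = @userId AND c.type = 'activity' "
        "ORDER BY c.timestamp DESC"
    )
    parameters = [
        {"name": "@limit", "value": limit},
        {"name": "@userId", "value": user_id},
    ]
    return query, parameters


def select_recent_activities(items, user_id, limit=10):
    """Apply the same filter and ordering to an in-memory list of items."""
    activities = [
        item for item in items
        if item["partitionKey"] == user_id and item["type"] == "activity"
    ]
    # ISO 8601 UTC timestamps in one format sort correctly as plain strings.
    activities.sort(key=lambda item: item["timestamp"], reverse=True)
    return activities[:limit]


# Hypothetical items sharing one container.
items = [
    {"id": "user-98765", "partitionKey": "user-98765", "type": "user"},
    {"id": "activity-1", "partitionKey": "user-98765", "type": "activity",
     "timestamp": "2023-10-27T11:55:00Z"},
    {"id": "activity-2", "partitionKey": "user-98765", "type": "activity",
     "timestamp": "2023-10-27T12:30:00Z"},
    {"id": "activity-3", "partitionKey": "user-11111", "type": "activity",
     "timestamp": "2023-10-27T13:00:00Z"},
]

query, parameters = recent_activity_query("user-98765", limit=2)
recent = select_recent_activities(items, "user-98765", limit=2)
```

Filtering on the partition key keeps the query inside one logical partition, which is what makes this layout inexpensive to read.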
Option B: Two Containers (Activity Feed Optimized)
If the activity feed is extremely high volume and you need to optimize for querying a global feed or very recent activity across many users, consider a separate container for activities.
Users Container:
{
  "id": "user-98765",
  "partitionKey": "user-98765",
  "type": "user",
  "username": "jane.doe",
  "displayName": "Jane Doe",
  "profilePictureUrl": "https://example.com/avatars/jane.png"
}
Activity Log Container (Global or Time-Based Partitioning):
{
  "id": "activity-xyz123",
  "userId": "user-98765",
  "username": "jane.doe",
  "partitionKey": "2023-10-27",
  "type": "activity",
  "timestamp": "2023-10-27T11:55:00Z",
  "activityType": "new_post",
  "details": {
    "postId": "post-abc",
    "postTitle": "My thoughts on cloud computing"
  }
}
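Partitioning Strategy: In the Activity Log container, partitioning by date (e.g., YYYY-MM-DD) can optimize for time-based queries or pruning old data. Each activity item carries a denormalized username, but if a feed needs other profile fields you would still need to look up the user in the Users container from application code, because Cosmos DB queries cannot join across containers.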
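The two steps described here, deriving the date partition key and looking up user information for feed items, can be sketched as below. Both helpers are hypothetical, and `users_by_id` stands in for profiles already read from the Users container. The sketch assumes timestamps are always UTC in the `YYYY-MM-DDTHH:MM:SSZ` form used in the samples.

```python
from datetime import datetime, timezone


def date_partition_key(timestamp):
    """Derive a YYYY-MM-DD partition key from a UTC ISO 8601 timestamp."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc).strftime("%Y-%m-%d")


def enrich_with_profiles(activities, users_by_id):
    """Attach displayName from a user lookup to each activity item."""
    enriched = []
    for activity in activities:
        user = users_by_id.get(activity["userId"], {})
        enriched.append({**activity, "displayName": user.get("displayName")})
    return enriched


activity = {
    "id": "activity-xyz123",
    "userId": "user-98765",
    "username": "jane.doe",
    "timestamp": "2023-10-27T11:55:00Z",
}
# Set the partition key when the item is written.
activity["partitionKey"] = date_partition_key(activity["timestamp"])

users_by_id = {"user-98765": {"displayName": "Jane Doe"}}
feed = enrich_with_profiles([activity], users_by_id)
```

Parsing the timestamp instead of slicing the string rejects malformed values before they become partition keys.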
Key Considerations for Data Modeling:
- Access Patterns: Design your models around how your application reads and writes data.
- Partitioning: Choose logical partitions that distribute your workload evenly and efficiently.
- Denormalization vs. Normalization: Balance the trade-offs between read performance (denormalization) and data consistency/storage efficiency (normalization).
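- Joins: Cosmos DB supports `JOIN` in SQL queries, but only as a self-join within a single item (for example, across an embedded array); it cannot join across items or containers. Combining separate items takes additional queries or lookups, so aim to denormalize data that is read together.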
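- Container Size: Understand the limits and performance implications of container and item sizes. A single item is limited to 2 MB and a logical partition to 20 GB, so embedded arrays that grow without bound (such as reviews) eventually need to move into their own items.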
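As a rough illustration of the item-size consideration, the sketch below estimates the serialized size of a document and flags it once it passes a chosen fraction of the 2 MB item limit. The helper names and the 50% threshold are assumptions for illustration, and the byte count is only an approximation of how the service measures an item.

```python
import json

MAX_ITEM_BYTES = 2 * 1024 * 1024  # 2 MB item size limit


def approximate_item_size(item):
    """Approximate the UTF-8 size in bytes of an item's compact JSON form."""
    text = json.dumps(item, separators=(",", ":"), ensure_ascii=False)
    return len(text.encode("utf-8"))


def should_split_reviews(product, threshold=0.5):
    """Flag a product whose size exceeds the given fraction of the item limit."""
    return approximate_item_size(product) > threshold * MAX_ITEM_BYTES


small_product = {
    "id": "prod-12345",
    "reviews": [{"userId": "user-abc", "rating": 5, "comment": "Amazing picture quality!"}],
}

# Hypothetical product with 10,000 embedded reviews of about 100 characters each.
large_product = {
    "id": "prod-67890",
    "reviews": [
        {"userId": f"user-{i}", "rating": 5, "comment": "x" * 100}
        for i in range(10_000)
    ],
}

small_flag = should_split_reviews(small_product)
large_flag = should_split_reviews(large_product)
```

A check like this could run before writes to signal when an embedded model (Option A in the product catalog scenario) should migrate toward separate review items (Option B).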