Gremlin API
Azure Cosmos DB offers a Gremlin API, a graph database service that supports the Apache TinkerPop Gremlin query language. You can use it to build graph applications whose storage and throughput scale through Cosmos DB partitioning and provisioned Request Units (RUs).
Getting Started
To start using the Gremlin API, create an Azure Cosmos DB account that uses the Gremlin API, then create a database and a graph inside it. You can then connect to the graph with a Gremlin-compatible driver.
Key Concepts
- Vertices: Represent entities in your graph.
- Edges: Represent relationships between vertices.
- Properties: Key-value pairs associated with vertices and edges.
- Traversal: The process of navigating the graph using Gremlin.
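To make the four concepts concrete, here is a toy in-memory property graph in Python. It is only an illustration: the class names (Vertex, Edge, Graph) and the out() helper are hypothetical and do not reflect how Cosmos DB or TinkerPop actually store or traverse data.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Toy in-memory property graph: NOT how Cosmos DB or TinkerPop store data.


@dataclass
class Vertex:
    id: str
    label: str
    properties: Dict[str, object] = field(default_factory=dict)  # key-value pairs


@dataclass
class Edge:
    label: str
    out_v: str  # id of the vertex the edge starts from
    in_v: str   # id of the vertex the edge points to
    properties: Dict[str, object] = field(default_factory=dict)


@dataclass
class Graph:
    vertices: Dict[str, Vertex] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def add_vertex(self, vid: str, label: str, **props: object) -> Vertex:
        self.vertices[vid] = Vertex(vid, label, dict(props))
        return self.vertices[vid]

    def add_edge(self, label: str, out_id: str, in_id: str, **props: object) -> Edge:
        edge = Edge(label, out_id, in_id, dict(props))
        self.edges.append(edge)
        return edge

    def out(self, vid: str, label: str) -> List[Vertex]:
        """One traversal step: follow outgoing edges with the given label."""
        return [self.vertices[e.in_v] for e in self.edges
                if e.out_v == vid and e.label == label]


graph = Graph()
graph.add_vertex("1", "person", name="Alice")
graph.add_vertex("2", "person", name="Bob")
graph.add_edge("knows", "1", "2")
print([v.properties["name"] for v in graph.out("1", "knows")])  # ['Bob']
```

A real traversal chains many such steps (filter, follow edges, project properties); this sketch shows a single out() step only.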
Basic Gremlin Queries
Here are some fundamental Gremlin queries:
// Add a vertex
g.addV('person').property('name', 'Alice')
// Add another vertex
g.addV('person').property('name', 'Bob')
// Add an edge between two vertices
g.V().has('name', 'Alice').addE('knows').to(g.V().has('name', 'Bob'))
// Traverse the graph to find who Alice knows
g.V().has('name', 'Alice').out('knows').values('name')
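When talking to Cosmos DB, queries like these are sent to the server as plain strings. The following is a minimal sketch of submitting them in order from Python, assuming client is a gremlin_python.driver.client.Client connected to your graph (or any object with a compatible submit() method). SAMPLE_QUERIES, SupportsSubmit, and run_queries are hypothetical names used for illustration.

```python
from typing import Any, List, Protocol


class SupportsSubmit(Protocol):
    """Anything with a gremlin_python-style submit() method."""
    def submit(self, message: str) -> Any: ...


# The sample queries above, as the strings a Cosmos DB client would send.
SAMPLE_QUERIES = [
    "g.addV('person').property('name', 'Alice')",
    "g.addV('person').property('name', 'Bob')",
    "g.V().has('name', 'Alice').addE('knows').to(g.V().has('name', 'Bob'))",
    "g.V().has('name', 'Alice').out('knows').values('name')",
]


def run_queries(client: SupportsSubmit, queries: List[str]) -> List[list]:
    """Submit each query in order, waiting for its complete result list."""
    results = []
    for query in queries:
        # submit() returns a ResultSet; all() returns a future of every result.
        results.append(client.submit(query).all().result())
    return results
```

Order matters here: the edge query only finds Alice and Bob if the two addV() queries ran first. Running the list twice adds duplicate vertices, and on a partitioned graph the addV() queries may also need to set the partition key property.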
Connecting with Gremlin Drivers
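Azure Cosmos DB Gremlin API works with several Apache TinkerPop Gremlin drivers. A connection typically needs three values: the Gremlin endpoint (wss://<account>.gremlin.cosmos.azure.com:443/), a username of the form /dbs/<database>/colls/<graph>, and the account's primary key as the password. Queries are submitted as Gremlin strings.
Note: Ensure you use the Gremlin endpoint, not the NoSQL (formerly SQL) API endpoint (https://<account>.documents.azure.com:443/), when connecting.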
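As a guard against mixing up the two endpoints, a small pre-connection check can help. This is a minimal sketch: is_gremlin_endpoint and gremlin_username are hypothetical helpers (not part of any SDK), and the host suffix is an assumption taken from the endpoint format used in this page.

```python
from urllib.parse import urlparse

# Assumption: Gremlin hosts end with this suffix, matching the endpoint format in
# the Python example; extend the tuple if your account's endpoint differs.
GREMLIN_HOST_SUFFIXES = (".gremlin.cosmos.azure.com",)


def is_gremlin_endpoint(url: str) -> bool:
    """Return True for a wss:// URL whose host looks like a Cosmos DB Gremlin host."""
    parsed = urlparse(url)
    host = parsed.hostname or ""  # hostname is lowercased by urlparse
    return parsed.scheme == "wss" and host.endswith(GREMLIN_HOST_SUFFIXES)


def gremlin_username(database: str, graph: str) -> str:
    """Build the /dbs/<database>/colls/<graph> path used as the Gremlin username."""
    return f"/dbs/{database}/colls/{graph}"


print(is_gremlin_endpoint("wss://myaccount.gremlin.cosmos.azure.com:443/"))  # True
print(is_gremlin_endpoint("https://myaccount.documents.azure.com:443/"))      # False
print(gremlin_username("mydb", "mygraph"))  # /dbs/mydb/colls/mygraph
```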
Example Connection (Python)
from gremlin_python.driver import client, serializer
endpoint = "wss://your-cosmosdb-account.gremlin.cosmos.azure.com:443/"
primary_key = "YOUR_PRIMARY_KEY" # Replace with your actual key
# Cosmos DB accepts Gremlin as query strings (not bytecode traversals) serialized as GraphSON 2.0
gremlin_client = client.Client(
    endpoint, 'g',
    username='/dbs/your_database/colls/your_collection',
    password=primary_key,
    message_serializer=serializer.GraphSONSerializersV2d0())
# Example traversal, submitted as a string; all().result() waits for the full result list
result = gremlin_client.submit("g.V().hasLabel('person').valueMap('name')").all().result()
print(result)
gremlin_client.close()
Best Practices
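- Indexing: By default, Cosmos DB indexes every vertex and edge property. Review the indexing policy and exclude properties you never filter on to reduce the RU cost of writes.
- Partitioning: Choose the graph's partition key when you create the graph, because it cannot be edited afterwards. Each vertex must include the partition key property, and traversals that filter on the partition key value can be served from a single partition instead of fanning out across all of them.
- Request Units (RUs): Every request consumes RUs. Monitor the charge reported with each response (the x-ms-total-request-charge status attribute) against your provisioned throughput to keep performance predictable and control costs.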
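One way to monitor RUs from Python is to read the charge returned with each response. The sketch below assumes client is a gremlin_python.driver.client.Client and that the server reports the total under the x-ms-total-request-charge status attribute; submit_with_charge is a hypothetical helper name.

```python
from typing import Any, List, Optional, Tuple


def submit_with_charge(client: Any, query: str) -> Tuple[List[Any], Optional[float]]:
    """Run a Gremlin query string; return (results, total RU charge or None).

    client is assumed to be a gremlin_python.driver.client.Client.
    """
    result_set = client.submit(query)
    # Consume every result first: the final response message carries the
    # complete status attributes for the request.
    rows = result_set.all().result()
    charge = result_set.status_attributes.get("x-ms-total-request-charge")
    return rows, (float(charge) if charge is not None else None)
```

Logging the returned charge per query makes it easier to spot traversals that scan far more data than they return, which are usually candidates for a partition key filter or an index policy change.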