Azure OpenAI Service Pricing
Understand the costs associated with using the Azure OpenAI Service. Pricing is based on token usage, model type, and specific features like fine-tuning.
Model Pricing Tiers
The Azure OpenAI Service offers various models, each with its own pricing structure. Costs are generally calculated per 1,000 tokens, with separate rates for input (prompt) tokens and output (completion) tokens.
| Model Family | Model Name | Input Tokens (per 1,000) | Output Tokens (per 1,000) | Fine-tuning (per hour) |
|---|---|---|---|---|
| GPT-3.5 | gpt-35-turbo | $0.0015 | $0.0020 | $0.0004 |
| GPT-3.5 | gpt-35-turbo-16k | $0.0030 | $0.0040 | $0.0004 |
| GPT-4 | gpt-4 (8k context) | $0.03 | $0.06 | $0.0020 |
| GPT-4 | gpt-4-32k | $0.06 | $0.12 | $0.0020 |
| Embeddings | text-embedding-ada-002 | $0.0001 | N/A | N/A |
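To apply the per-1,000-token rates to any model, the table can be encoded as a lookup. The following is a minimal sketch, assuming the prices listed above (they will drift over time). `PRICES_PER_1K` and `request_cost` are hypothetical names chosen for illustration. The model keys mirror the table's model names, and embeddings have no output price.

```python
from typing import Optional

# Hypothetical lookup: USD prices per 1,000 tokens, copied from the table above.
# An output price of None means the model produces no completion tokens (embeddings).
PRICES_PER_1K: dict[str, tuple[float, Optional[float]]] = {
    "gpt-35-turbo": (0.0015, 0.0020),
    "gpt-35-turbo-16k": (0.0030, 0.0040),
    "gpt-4": (0.03, 0.06),  # 8k context variant
    "gpt-4-32k": (0.06, 0.12),
    "text-embedding-ada-002": (0.0001, None),
}


def request_cost(model: str, input_tokens: int, output_tokens: int = 0) -> float:
    """Estimate the USD cost of one request from per-1,000-token prices."""
    input_price, output_price = PRICES_PER_1K[model]
    cost = input_tokens / 1000 * input_price
    if output_price is None:
        # Embedding models are billed on input tokens only.
        if output_tokens:
            raise ValueError(f"{model} does not produce output tokens")
    else:
        cost += output_tokens / 1000 * output_price
    return cost


if __name__ == "__main__":
    # Same 500-token prompt / 1,000-token completion on two model families.
    for name in ("gpt-35-turbo", "gpt-4"):
        print(f"{name}: ${request_cost(name, 500, 1000):.5f}")
```

Running the same request through both families shows how quickly the model choice dominates the bill. The identical token counts cost roughly 27 times more on gpt-4 than on gpt-35-turbo at these rates.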
Key Pricing Considerations
- Token Usage: The fundamental unit of pricing. A token is roughly 4 characters for common English text.
- Input vs. Output: Input (prompt) tokens are typically cheaper than output (completion) tokens.
- Model Variations: Different models (e.g., GPT-3.5, GPT-4) have distinct price points based on their capabilities and size. Larger context windows (e.g., 16k, 32k) generally incur higher costs.
- Fine-Tuning: Training custom models incurs costs based on the compute time required.
- OpenAI's own API vs. Azure OpenAI Service: The two services offer similar models, but their pricing and billing are separate. Make sure you are referencing Azure pricing when estimating Azure OpenAI Service costs.
- Regions: Prices may vary slightly by Azure region.
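The "roughly 4 characters per token" rule of thumb is useful for budgeting before any request is sent. The sketch below assumes that heuristic holds for the text in question. `estimate_tokens` is a hypothetical helper, not part of any Azure SDK. Exact counts depend on the model's tokenizer. OpenAI's `tiktoken` library can compute them if precision matters.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate using the ~4 characters per token rule of thumb.

    Only an approximation for common English text; the real count depends on
    the model's tokenizer and can differ noticeably for code or other languages.
    """
    if not text:
        return 0
    # Round up so partial tokens are not under-counted.
    return math.ceil(len(text) / chars_per_token)


prompt = "Summarize the key pricing considerations for Azure OpenAI Service."
print(len(prompt), "characters ->", estimate_tokens(prompt), "tokens (approx.)")
```

Rounding up keeps the estimate on the conservative side, which is usually what you want when projecting costs.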
Example Calculation
Let's say you make a request to the gpt-35-turbo model:
- Your prompt uses 500 tokens.
- The model's completion uses 1,000 tokens.
Cost:
- Input Cost: (500 tokens / 1000) * $0.0015 = $0.00075
- Output Cost: (1000 tokens / 1000) * $0.0020 = $0.0020
- Total Cost for this request: $0.00275
```python
# Per-request cost for gpt-35-turbo, using the prices from the table above.
# In practice, token counts come from the `usage` field of the API response.
prompt_tokens = 500
completion_tokens = 1000

# Prices in USD per 1,000 tokens.
model_input_price_per_1k = 0.0015
model_output_price_per_1k = 0.0020

input_cost = (prompt_tokens / 1000) * model_input_price_per_1k
output_cost = (completion_tokens / 1000) * model_output_price_per_1k
total_request_cost = input_cost + output_cost

print(f"Input Cost: ${input_cost:.6f}")
print(f"Output Cost: ${output_cost:.6f}")
print(f"Total Cost: ${total_request_cost:.6f}")
```
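A single request costs fractions of a cent, so it is often more useful to project the example to a monthly volume. The sketch below assumes a hypothetical workload of 10,000 identical requests per day for 30 days. Those volumes are illustrative, not measured data. It uses Python's `decimal` module so currency arithmetic is exact rather than subject to binary floating-point rounding.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical workload: these volumes are illustrative assumptions, not data.
PROMPT_TOKENS = 500
COMPLETION_TOKENS = 1000
REQUESTS_PER_DAY = 10_000
DAYS = 30

# gpt-35-turbo prices per 1,000 tokens, from the pricing table.
INPUT_PRICE_PER_1K = Decimal("0.0015")
OUTPUT_PRICE_PER_1K = Decimal("0.0020")


def cost_per_request(prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Exact per-request cost in USD using decimal arithmetic."""
    return (
        Decimal(prompt_tokens) / 1000 * INPUT_PRICE_PER_1K
        + Decimal(completion_tokens) / 1000 * OUTPUT_PRICE_PER_1K
    )


per_request = cost_per_request(PROMPT_TOKENS, COMPLETION_TOKENS)
monthly = per_request * REQUESTS_PER_DAY * DAYS

print(f"Per request: ${per_request}")
# Round only the final figure to whole cents for display.
print(f"Monthly estimate: ${monthly.quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)}")
```

Rounding is applied only to the final total, so small per-request amounts are not distorted before they are multiplied out. Actual charges are determined by Azure billing, not by this estimate.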
Additional Resources
Pricing information is subject to change. Always refer to the official Microsoft Azure pricing page for the most up-to-date details. Usage metrics and billing are managed through the Azure portal.