Azure OpenAI Service Pricing

Understand the costs associated with using the Azure OpenAI Service. Pricing is based on token usage, model type, and specific features like fine-tuning.

Model Pricing Tiers

The Azure OpenAI Service offers various models, each with its own pricing structure. Costs are generally calculated per 1,000 tokens for input (prompt) and output (completion).

Model Family	Model Name	Input (USD per 1,000 tokens)	Output (USD per 1,000 tokens)	Fine-tuning (USD per hour)
GPT-3.5	`gpt-35-turbo`	$0.0015	$0.0020	$0.0004
GPT-3.5	`gpt-35-turbo-16k`	$0.0030	$0.0040	$0.0004
GPT-4	`gpt-4` (8k context)	$0.03	$0.06	$0.0020
GPT-4	`gpt-4-32k`	$0.06	$0.12	$0.0020
Embeddings	`text-embedding-ada-002`	$0.0001	N/A	N/A
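
The per-1,000-token rates above can be turned into a small lookup for estimating request costs. The following is a minimal sketch, not an official calculator: the `PRICES_PER_1K` dictionary and the `request_cost` helper are hypothetical names introduced here for illustration, and the rates are simply copied from the table, so they may be out of date.

```python
# Rates copied from the table above, in USD per 1,000 tokens.
# Assumption: these values are illustrative and may not match current pricing.
PRICES_PER_1K = {
    "gpt-35-turbo": {"input": 0.0015, "output": 0.0020},
    "gpt-35-turbo-16k": {"input": 0.0030, "output": 0.0040},
    "gpt-4": {"input": 0.03, "output": 0.06},
    "gpt-4-32k": {"input": 0.06, "output": 0.12},
    # The table lists no output price for embeddings (N/A).
    "text-embedding-ada-002": {"input": 0.0001, "output": None},
}


def request_cost(model: str, input_tokens: int, output_tokens: int = 0) -> float:
    """Estimate the USD cost of one request (hypothetical helper)."""
    rates = PRICES_PER_1K[model]
    cost = (input_tokens / 1000) * rates["input"]
    # Only add an output charge when the model has an output rate.
    if rates["output"] is not None:
        cost += (output_tokens / 1000) * rates["output"]
    return cost


if __name__ == "__main__":
    estimate = request_cost("gpt-4", input_tokens=2000, output_tokens=500)
    print(f"gpt-4, 2,000 input / 500 output tokens: ${estimate:.4f}")
```

Keeping the rates in one dictionary means a price change only needs to be made in a single place.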

Key Pricing Considerations

Token Usage: The fundamental unit of pricing. A token is roughly 4 characters for common English text.
Input vs. Output: Input (prompt) tokens are typically cheaper than output (completion) tokens.
Model Variations: Different models (e.g., GPT-3.5, GPT-4) have distinct price points based on their capabilities and size. Larger context windows (e.g., 16k, 32k) generally incur higher costs.
Fine-Tuning: Training custom models incurs costs based on the compute time required.
Managed OpenAI Service vs. Azure OpenAI Service: The two offerings are similar but priced separately, so confirm that any rate you use comes from the Azure pricing for Azure OpenAI Service.
Regions: Prices may vary slightly by Azure region.
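
Two of the considerations above lend themselves to quick estimates: the rough 4-characters-per-token rule and fine-tuning billed by compute time. The sketch below is an approximation only. `estimate_tokens` and `fine_tuning_cost` are hypothetical helpers, not part of any Azure SDK, and a real tokenizer will give different counts, especially for code or non-English text.

```python
import math

# Rough heuristic from the list above: about 4 characters per token
# for common English text. This is an approximation, not a tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Approximate the token count of `text` from its character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fine_tuning_cost(hours: float, rate_per_hour: float) -> float:
    """Estimate the USD cost of a fine-tuning job billed by compute time."""
    return hours * rate_per_hour


if __name__ == "__main__":
    prompt = "Summarize the quarterly report in three bullet points."
    print(f"Estimated prompt tokens: {estimate_tokens(prompt)}")
    # Assumed example: 3 hours at the gpt-4 hourly rate shown in the table.
    print(f"Fine-tuning estimate: ${fine_tuning_cost(3, 0.0020):.4f}")
```

Rounding up with `math.ceil` keeps the estimate slightly conservative, which is usually the safer direction for budgeting.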

Example Calculation

Suppose you send a request to the `gpt-35-turbo` model:

Your prompt uses 500 tokens.
The model's completion uses 1,000 tokens.

Cost:

Input Cost: (500 tokens / 1000) * $0.0015 = $0.00075
Output Cost: (1000 tokens / 1000) * $0.0020 = $0.0020
Total Cost for this request: $0.00075 + $0.0020 = $0.00275

    # Plain Python arithmetic for the calculation above; it does not call the API.
    # Actual API calls would involve SDKs or HTTP requests.

    # Token counts for the example request
    prompt_tokens = 500
    completion_tokens = 1000

    # gpt-35-turbo rates in USD per 1,000 tokens (from the pricing table)
    model_input_price_per_1k = 0.0015
    model_output_price_per_1k = 0.0020

    # Cost = (tokens / 1000) * price per 1,000 tokens
    input_cost = (prompt_tokens / 1000) * model_input_price_per_1k
    output_cost = (completion_tokens / 1000) * model_output_price_per_1k
    total_request_cost = input_cost + output_cost

    print(f"Input Cost: ${input_cost:.6f}")
    print(f"Output Cost: ${output_cost:.6f}")
    print(f"Total Cost: ${total_request_cost:.6f}")

Additional Resources

Pricing information is subject to change. Always refer to the official Microsoft Azure pricing page for the most up-to-date details. Usage metrics and billing are managed through the Azure portal.