Model Selection
Choosing the right model can significantly impact cost and quality.Task-Based Recommendations
| Task | Recommended Models | Reasoning |
|---|---|---|
| Simple Q&A | gpt-5-mini, gemini-2.5-flash | Fast, cheap, good enough |
| Complex reasoning | gpt-5.4, claude-opus-4-6, deepseek-r1 | Better logic and planning |
| Coding | claude-sonnet-4-6, gpt-4o, deepseek-v3.2 | Optimized for code |
| Creative writing | claude-sonnet-4-6, gpt-4o | Better prose quality |
| Vision/Images | gpt-4o, claude-sonnet-4-6, gemini-2.5-flash | Native vision support |
| Long context | gemini-2.5-pro, claude-sonnet-4-6 | 1M+ token windows |
| Cost-sensitive | gpt-5-mini, gemini-2.5-flash, deepseek-v3.2 | Best value |
Cost Tiers
Cost Optimization
1. Use Smaller Models First
2. Set max_tokens
Always set a reasonablemax_tokens limit:
3. Optimize Prompts
4. Batch Similar Requests
Performance Optimization
5. Use Streaming for UX
Streaming improves perceived performance:6. Choose Fast Models for Interactive Use
| Use Case | Recommended | Latency |
|---|---|---|
| Chat UI | gpt-5-mini, gemini-2.5-flash | ~200ms first token |
| Tab completion | claude-haiku-4-5 | ~150ms first token |
| Background processing | gpt-4o, claude-sonnet-4-6 | ~500ms first token |
7. Set Timeouts
Reliability
8. Implement Retries
9. Handle Errors Gracefully
10. Use Fallback Models
Security
11. Protect API Keys
12. Validate User Input
13. Set API Key Limits
Create separate API keys with spending limits for:- Development/testing
- Production
- Different applications
Monitoring
14. Track Usage
Check your dashboard regularly for:- Token usage by model
- Cost breakdown
- Cache hit rates
- Error rates
15. Log Important Metrics
16. Set Up Alerts
Configure low balance alerts in your dashboard to avoid service interruption.Checklist
Cost optimization
Cost optimization
- Using appropriate model for each task
- Setting max_tokens limits
- Prompts are concise
- Caching enabled where appropriate
- Batching similar requests
Performance
Performance
- Streaming for interactive UX
- Fast models for real-time use
- Timeouts configured
Reliability
Reliability
- Retry logic implemented
- Error handling in place
- Fallback models configured
Security
Security
- API keys in environment variables
- Input validation
- Separate keys for dev/prod
- Spending limits set