Request Body
A list of messages comprising the conversation.Each message object contains:
role(string):system,user, orassistantcontent(string | array): The message content
content is an array, AI Sonar supports structured multimodal blocks for compatible models:- text:
{ "type": "text", "text": "..." } - image:
{ "type": "image_url", "image_url": { "url": "https://..." } } - video:
{ "type": "video_url", "video_url": { "url": "https://..." } } - audio:
{ "type": "audio_url", "audio_url": { "url": "https://..." } }
https URLs. AI Sonar will translate these media blocks into the provider-specific request shape required by the routed physical model.Sampling temperature between 0 and 2. Higher values make output more random.
Maximum number of tokens to generate.
If true, partial message deltas will be sent as SSE events.
Options for streaming. Set
include_usage: true to receive token usage in stream chunks.Nucleus sampling parameter. We recommend altering this or temperature, not both.
Number between -2.0 and 2.0. Positive values penalize repeated tokens.
Number between -2.0 and 2.0. Positive values penalize tokens already in the text.
Up to 4 sequences where the API will stop generating tokens.
A list of tools the model may call (function calling).
Controls how the model uses tools. Options:
auto, none, required, or a specific tool object.Whether to enable parallel function calling. Set to false to call functions sequentially.
Maximum tokens for the completion. Alternative to
max_tokens, useful for newer reasoning-enabled model families.Reasoning effort for reasoning-enabled models. Options:
low, medium, high.Random seed for deterministic sampling.
Number of completions to generate (1-128).
Whether to return log probabilities.
Number of top log probabilities to return (0-20). Requires
logprobs: true.Top-K sampling parameter (for Anthropic/Gemini models).
Response format specification. Use
{"type": "json_object"} for JSON mode. Treat {"type": "json_schema", "json_schema": {...}} as a best-effort path that depends on the selected model and routed behavior.Modify the likelihood of specified tokens appearing. Map token IDs (as strings) to bias values from -100 to 100.
A unique identifier representing your end-user for abuse monitoring.
Response
Unique identifier for the completion.
Always
chat.completion.Unix timestamp of when the completion was created.
The model used for completion.
List of completion choices.Each choice contains:
index(integer): Index of the choicemessage(object): The generated messagefinish_reason(string): Why the model stopped (stop,length,tool_calls)
Token usage statistics.
prompt_tokens(integer): Tokens in the promptcompletion_tokens(integer): Tokens in the completiontotal_tokens(integer): Total tokens used