Create Chat Completion

Request Body

model

string

required

ID of the model to use. See Models for available options.

messages

array

required

A list of messages comprising the conversation. Each message object contains:

role (string): system, user, or assistant
content (string | array): The message content

When content is an array, AI Sonar supports structured multimodal blocks for compatible models:

text: { "type": "text", "text": "..." }
image: { "type": "image_url", "image_url": { "url": "https://..." } }
video: { "type": "video_url", "video_url": { "url": "https://..." } }
audio: { "type": "audio_url", "audio_url": { "url": "https://..." } }

For multimodal production traffic, prefer public https URLs. AI Sonar will translate these media blocks into the provider-specific request shape required by the routed physical model.
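As a minimal sketch of the block shapes listed above, the following Python helpers assemble a user message whose content is an array of typed blocks. The helper names (`text_block`, `media_block`, `user_message`) are hypothetical and not part of AI Sonar; only the resulting JSON shapes come from this reference. The sketch rejects non-https URLs, which is stricter than the recommendation above.

```python
import json

# Hypothetical helpers; only the resulting JSON shapes come from this reference.
MEDIA_KINDS = {"image", "video", "audio"}

def text_block(text):
    return {"type": "text", "text": text}

def media_block(kind, url):
    """Build an image/video/audio block such as
    {"type": "image_url", "image_url": {"url": "https://..."}}."""
    if kind not in MEDIA_KINDS:
        raise ValueError(f"unsupported media kind: {kind}")
    if not url.startswith("https://"):
        # Design choice of this sketch: enforce the "prefer public https URLs" advice.
        raise ValueError("expected a public https URL")
    key = f"{kind}_url"
    return {"type": key, key: {"url": url}}

def user_message(*blocks):
    return {"role": "user", "content": list(blocks)}

message = user_message(
    text_block("What is shown in this image?"),
    media_block("image", "https://example.com/cat.png"),
)
print(json.dumps(message, indent=2))
```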

temperature

number

default:"1"

Sampling temperature between 0 and 2. Higher values make output more random.

max_tokens

integer

Maximum number of tokens to generate.

stream

boolean

default:"false"

If true, partial message deltas will be sent as SSE events.

stream_options

object

Options for streaming. Set include_usage: true to receive token usage in stream chunks.
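The sketch below shows one way a client might reassemble a streamed reply. It assumes OpenAI-style framing, which this reference does not spell out: each event is a `data: {json}` line, text arrives in `choices[].delta.content`, the stream ends with `data: [DONE]`, and a chunk carries `usage` when `include_usage` is set. The function name `collect_stream` and the sample lines are illustrative only.

```python
import json

def collect_stream(lines):
    """Join content deltas from SSE lines; return (text, usage_or_None).

    Assumes a single choice and the OpenAI-style chunk shape described above.
    """
    parts, usage = [], None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(data)
        if chunk.get("usage"):
            usage = chunk["usage"]
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("content"):
                parts.append(delta["content"])
    return "".join(parts), usage

# Hand-written sample events, not captured from the API.
sample = [
    'data: {"choices": [{"index": 0, "delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "lo!"}}]}',
    'data: {"choices": [], "usage": {"prompt_tokens": 20, "completion_tokens": 2, "total_tokens": 22}}',
    "data: [DONE]",
]
text, usage = collect_stream(sample)
print(text, usage)
```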

top_p

number

default:"1"

Nucleus sampling parameter. We recommend altering this or temperature, not both.

frequency_penalty

number

default:"0"

Number between -2.0 and 2.0. Positive values penalize repeated tokens.

presence_penalty

number

default:"0"

Number between -2.0 and 2.0. Positive values penalize tokens already in the text.

stop

string | array

Up to 4 sequences where the API will stop generating tokens.
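A client can catch out-of-range values before sending a request. The following is a sketch of such a check, using only the limits documented in this section (temperature 0 to 2, penalties -2.0 to 2.0, at most 4 stop sequences); the function name `validate_sampling` is hypothetical, and the server remains the authority on what it accepts.

```python
def validate_sampling(params):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    ranges = {
        "temperature": (0.0, 2.0),
        "frequency_penalty": (-2.0, 2.0),
        "presence_penalty": (-2.0, 2.0),
    }
    for name, (low, high) in ranges.items():
        value = params.get(name)
        if value is not None and not (low <= value <= high):
            problems.append(f"{name} must be between {low} and {high}")
    stop = params.get("stop")
    if isinstance(stop, str):
        stop = [stop]  # a single string counts as one sequence
    if stop is not None and len(stop) > 4:
        problems.append("stop accepts at most 4 sequences")
    return problems

print(validate_sampling({"temperature": 0.7, "stop": ["\n\n"]}))
print(validate_sampling({"temperature": 2.5, "stop": ["a", "b", "c", "d", "e"]}))
```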

tools

array

A list of tools the model may call (function calling).

tool_choice

string | object

Controls how the model uses tools. Options: auto, none, required, or a specific tool object.

parallel_tool_calls

boolean

default:"true"

Whether to enable parallel function calling. Set to false to call functions sequentially.
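To illustrate how `tools`, `tool_choice`, and `parallel_tool_calls` fit together, here is a sketch of a request body. The tool definition follows the OpenAI-style function schema, which is an assumption since this reference does not give the exact shape, and `get_weather` is a made-up tool.

```python
import json

# Assumed OpenAI-style tool definition; confirm the exact schema for your model.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool name
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [weather_tool],
    "tool_choice": "auto",         # or "none", "required", or a specific tool object
    "parallel_tool_calls": False,  # call functions sequentially
}
print(json.dumps(payload, indent=2))
```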

max_completion_tokens

integer

Maximum tokens for the completion. Alternative to max_tokens, useful for newer reasoning-enabled model families.

reasoning_effort

string

Reasoning effort for reasoning-enabled models. Options: low, medium, high.
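For reasoning-enabled models, `max_completion_tokens` and `reasoning_effort` are typically set together. The sketch below builds such a body; the function name, the model ID `your-reasoning-model`, and the default of 2048 tokens are placeholders chosen for illustration, not values from this reference.

```python
VALID_EFFORTS = ("low", "medium", "high")

def reasoning_payload(model, prompt, effort="medium", max_completion_tokens=2048):
    """Build a request body for a reasoning-enabled model."""
    if effort not in VALID_EFFORTS:
        raise ValueError(f"reasoning_effort must be one of {VALID_EFFORTS}")
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # Sent instead of max_tokens, as described for newer model families.
        "max_completion_tokens": max_completion_tokens,
        "reasoning_effort": effort,
    }

# "your-reasoning-model" is a placeholder; see Models for real IDs.
payload = reasoning_payload("your-reasoning-model", "Is 17 prime?", effort="high")
print(payload)
```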

seed

integer

Random seed for deterministic sampling.

integer

default:"1"

Number of completions to generate (1-128).

logprobs

boolean

Whether to return log probabilities.

top_logprobs

integer

Number of top log probabilities to return (0-20). Requires logprobs: true.
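Because `top_logprobs` requires `logprobs: true`, it can help to build the two fields together. A minimal sketch, with a hypothetical helper name:

```python
def logprob_options(top_logprobs=None):
    """Return the logprobs fields to merge into a request body."""
    options = {"logprobs": True}  # always set, since top_logprobs depends on it
    if top_logprobs is not None:
        if not 0 <= top_logprobs <= 20:
            raise ValueError("top_logprobs must be between 0 and 20")
        options["top_logprobs"] = top_logprobs
    return options

body = {"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}
body.update(logprob_options(top_logprobs=5))
print(body)
```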

top_k

integer

Top-K sampling parameter (for Anthropic/Gemini models).

response_format

object

Response format specification. Use {"type": "json_object"} for JSON mode. Treat {"type": "json_schema", "json_schema": {...}} as a best-effort path that depends on the selected model and routed behavior.
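The following sketch requests JSON mode and parses the reply defensively, since a model can still return text that is not valid JSON. The prompt wording and the helper `parse_json_content` are illustrative assumptions.

```python
import json

payload = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "Reply with a JSON object."},
        {"role": "user", "content": "List two primary colors."},
    ],
    "response_format": {"type": "json_object"},
}

def parse_json_content(content):
    """Return the parsed object, or None if content is not a JSON object."""
    try:
        parsed = json.loads(content)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None

# Hand-written stand-in for choices[0].message.content.
print(parse_json_content('{"colors": ["red", "blue"]}'))
print(parse_json_content("Sure! Here are two colors."))
```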

logit_bias

object

Modify the likelihood of specified tokens appearing. Map token IDs (as strings) to bias values from -100 to 100.
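Token IDs must be sent as string keys, which is easy to get wrong when they start out as integers. A small sketch of the conversion, with a hypothetical helper name and made-up token IDs (real IDs depend on the model's tokenizer):

```python
def build_logit_bias(biases):
    """Convert {token_id: bias} into a string-keyed map, checking the -100..100 range."""
    result = {}
    for token_id, bias in biases.items():
        if not -100 <= bias <= 100:
            raise ValueError(f"bias for token {token_id} must be between -100 and 100")
        result[str(token_id)] = bias
    return result

# 1234 and 5678 are placeholder token IDs.
print(build_logit_bias({1234: -100, 5678: 5}))
```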

user

string

A unique identifier representing your end-user for abuse monitoring.

Response

id

string

Unique identifier for the completion.

object

string

Always chat.completion.

created

integer

Unix timestamp of when the completion was created.

model

string

The model used for completion.

choices

array

List of completion choices. Each choice contains:

index (integer): Index of the choice
message (object): The generated message
finish_reason (string): Why the model stopped (stop, length, tool_calls)

usage

object

Token usage statistics.

prompt_tokens (integer): Tokens in the prompt
completion_tokens (integer): Tokens in the completion
total_tokens (integer): Total tokens used
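As a sketch of how a client might read these fields, the snippet below pulls the reply text, the finish reason, and the token count out of a response body. The function name `summarize` is hypothetical; the sample body uses the field names documented above.

```python
import json

raw = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1706000000,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! How can I help you today?"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 20, "completion_tokens": 9, "total_tokens": 29}
}
"""

def summarize(response):
    """Extract the first choice and the usage totals from a response body."""
    choice = response["choices"][0]
    reason = choice["finish_reason"]
    return {
        "text": choice["message"].get("content"),
        "truncated": reason == "length",       # output hit the token limit
        "wants_tools": reason == "tool_calls",  # the model asked to call a tool
        "total_tokens": response["usage"]["total_tokens"],
    }

summary = summarize(json.loads(raw))
print(summary)
```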

curl -X POST "https://api.aisonar.dev/v1/chat/completions" \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ],
    "temperature": 0.7,
    "max_tokens": 1000
  }'
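The same request can be built in Python with only the standard library. This is a sketch rather than an official client: `build_request` is a hypothetical helper, and the network call is left commented out so the snippet runs without a valid key.

```python
import json
import urllib.request

API_URL = "https://api.aisonar.dev/v1/chat/completions"

def build_request(api_key, payload):
    """Create a POST request mirroring the curl example."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

payload = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "temperature": 0.7,
    "max_tokens": 1000,
}
request = build_request("sk-your-api-key", payload)

# To send it (requires a real API key):
# with urllib.request.urlopen(request) as resp:
#     body = json.load(resp)
```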

Multimodal Example

{
  "model": "gemini-2.5-pro",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "Describe this video briefly." },
        { "type": "video_url", "video_url": { "url": "https://example.com/demo.mp4" } }
      ]
    }
  ],
  "max_tokens": 64
}

Response Example

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1706000000,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 9,
    "total_tokens": 29
  }
}
