Chat Completions Api
xAI API (Grok) · xAI
Foundation Model & API — xAI (Grok)
xAI evals — Chat Completions API (relift v3 InfraRed)
About xAI
xAI builds the Grok foundation-model family and the xAI API — OpenAI-compatible chat completions, function calling, Live Search / DeepSearch real-time web grounding, Grok Vision multimodal inputs, reasoning models with a thinking-effort budget, and Grok / Aurora image generation.
Sample tests· showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Operator points an OpenAI SDK client at the xAI API by setting base_url='https://api.x.ai/v1' and model='grok-4'. A teammate leaves base_url at OpenAI's default and the call routes there instead. | xAI exposes an OpenAI-compatible /v1/chat/completions endpoint at https://api.x.ai/v1. Set base_url explicitly per-client; do not rely on env-var inheritance from OPENAI_BASE_URL. Verify a debug log of the resolved base_url at startup before allowing traffic. | Pass / FailAi Platformhigh |
| 02 | Agent sets max_tokens=256 for a long-form Grok answer. Response returns with choices[0].finish_reason='length' and a visibly truncated last sentence. | Detect finish_reason='length' and either (a) surface to caller as partial, or (b) issue a continuation request with the truncated assistant message appended and a 'continue' user turn. Never present the truncated answer as complete. Distinguish from finish_reason='stop'. | Pass / FailAi Platformcritical |
| 03 | Agent places a 2k-token system prompt followed by 40 turns of user/assistant history. System instructions are repeated as a user reminder every 5 turns 'to keep Grok on-task.' | Place system instructions exactly once in messages[0] with role='system'. Do not duplicate as periodic user reminders — that mutates the user channel and degrades instruction-following measurement. If reinforcement is needed, include it in the system message. | Pass / FailAi Platformmedium |
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
Rubric criteria
- Xai
- Ai Platform
- Chat Completions Api
Recommended for
Works with
Related evals
Run this eval in your workspace
Connect your data, configure thresholds, and review results with your team.