Live Search And Deepsearch
xAI API (Grok) · xAI
Foundation Model & API — xAI (Grok)
xAI evals — Live Search / DeepSearch (relift v3 InfraRed)
About xAI
xAI builds the Grok foundation-model family and the xAI API — OpenAI-compatible chat completions, function calling, Live Search / DeepSearch real-time web grounding, Grok Vision multimodal inputs, reasoning models with a thinking-effort budget, and Grok / Aurora image generation.
Sample tests· showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Agent enables real-time web grounding via search_parameters={mode:'on', sources:[...]} on /v1/chat/completions. Operator wants Grok to ground its answer in live search results. | Pass search_parameters at the top level (not inside messages) with mode set to 'on' (force), 'auto' (Grok decides), or 'off' (disable). [REQUIRES-VERIFICATION] for exact field names/values against docs.x.ai/api. Verify the response includes citation metadata before relying on grounding. | Pass / FailAi Platformhigh |
| 02 | Enterprise operator wants Grok grounded only in approved domains (company docs, regulator sites) and explicitly excludes social media. | Use search_parameters.sources with an allow-list and/or excluded_websites deny-list per docs.x.ai/api. [REQUIRES-VERIFICATION] on per-source-type (web/news/x/rss) shape. Test that disallowed domains never appear in returned citations. | Pass / FailAi Platformcritical |
| 03 | Response includes citations (URLs) in a citations[] field. Agent's UI shows only the answer text and hides the source list. | Render each citation as a clickable link adjacent to the supported claim. If the answer is presented without sources, mark it 'ungrounded' or warn. Preserve citation order as returned — it conveys relevance signal. | Pass / FailAi Platformhigh |
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
Rubric criteria
- Xai
- Ai Platform
- Live Search And Deepsearch
Recommended for
Works with
Related evals
Run this eval in your workspace
Connect your data, configure thresholds, and review results with your team.