Pinecone Assistant
Pinecone evals — Pinecone Assistant (relift v3 InfraRed)
About Pinecone
Pinecone is a managed vector database for AI applications — serverless and pod-based indexes, namespaces for multi-tenant isolation, hybrid sparse-dense search, integrated inference (embed + rerank), and Pinecone Assistant for retrieval-augmented generation with citations.
Sample tests (showing 3 of 9)
| # | Input | Expected behavior | Check | Category | Severity |
|---|---|---|---|---|---|
| 01 | Operator uploads a 40 MB PDF to assistant 'support-bot' via /assistant/{name}/files. | Upload chunks per the file size limit in the docs [REQUIRES-VERIFICATION for current cap]. Files persist within the assistant until deleted; track file_id mappings in the operator's own store for lifecycle management. Files are private to the assistant; they are not shared with other assistants or indexes. | Pass / Fail | AI Platform | medium |
| 02 | User asks the Assistant 'what does our refund policy say about chargebacks?' The response cites refund_policy.pdf, pages 12-14. | Render citations as inline links keyed to (file name, page range). Preserve cited_text verbatim. Show 'no citation' answers as unverified; do not surface ungrounded claims as policy. Log the citation/no-citation ratio for telemetry. | Pass / Fail | AI Platform | critical |
| 03 | Operator points an existing OpenAI client at the Assistant /chat/completions endpoint. | Assistant exposes an OpenAI-compatible /chat/completions surface for easy migration. Pass the Pinecone Api-Key as the bearer; messages[] follows the OpenAI schema; the response includes citations as a Pinecone-specific extension [REQUIRES-VERIFICATION for current shape]. Test the citation extraction path befor… | Pass / Fail | AI Platform | high |
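The expected behavior in test 02 can be sketched in a few lines of Python. This is a minimal illustration, not the Assistant's actual response schema: the `Citation` dataclass, its field names, and the helper functions are all hypothetical, and the rendering uses numbered reference lines where a real UI would use clickable links.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Citation:
    # Hypothetical shape for illustration; the real response format may differ.
    file_name: str
    first_page: int
    last_page: int
    cited_text: str


def page_label(c: Citation) -> str:
    """Format a page range such as 'pp. 12-14' or a single page 'p. 7'."""
    if c.first_page == c.last_page:
        return f"p. {c.first_page}"
    return f"pp. {c.first_page}-{c.last_page}"


def render_answer(answer: str, citations: List[Citation]) -> str:
    """Attach references keyed to (file name, page range); flag uncited answers."""
    if not citations:
        # No grounding: mark as unverified rather than presenting it as policy.
        return f"[UNVERIFIED: no citation] {answer}"
    lines = [answer]
    for i, c in enumerate(citations, start=1):
        # cited_text is passed through verbatim, with no trimming or rewording.
        lines.append(f'[{i}] {c.file_name}, {page_label(c)}: "{c.cited_text}"')
    return "\n".join(lines)


def citation_ratio(citation_counts: List[int]) -> Optional[float]:
    """Share of answers with at least one citation; None when there is no data."""
    if not citation_counts:
        return None
    cited = sum(1 for n in citation_counts if n > 0)
    return cited / len(citation_counts)
```

A caller would pass one citation count per answer to `citation_ratio` and emit the result to whatever telemetry system is in use.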
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
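One way to read the pass rule is sketched below. It assumes each rubric criterion receives a single numeric score (a 1-5 scale is assumed here), that the mean is taken across criteria, and that the floor of 3 applies to every individual criterion; the function and constant names are illustrative only.

```python
from statistics import mean
from typing import Dict

MEAN_THRESHOLD = 4.0  # from the grading rule: mean >= 4.0
CRITERION_FLOOR = 3   # from the grading rule: no criterion below 3


def passes(scores: Dict[str, float]) -> bool:
    """Return True when the mean meets the threshold and no score is under the floor."""
    if not scores:
        # Assumption: a test with no graded criteria cannot pass.
        return False
    values = list(scores.values())
    return mean(values) >= MEAN_THRESHOLD and min(values) >= CRITERION_FLOOR
```

Under this reading, scores of 5, 5, and 2 fail despite a mean of 4.0, because one criterion falls below the floor.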
Rubric criteria
- Pinecone
- AI Platform
- Pinecone Assistant