Hume EVI Tool Use
Hume AI evals — Tool Use / Function Calling in EVI (relift v3 InfraRed)
About Hume AI
Hume AI builds empathic voice technology: the Empathic Voice Interface (EVI), an expression-measurement API covering face, voice, and language, and the Octave text-to-speech model. These products are aimed at emotionally aware, responsibly governed voice experiences.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Mid-conversation EVI emits a tool_call event (with a tool_call_id) to look up an order status; the client executes the function. | Return a tool_response message carrying the same tool_call_id and the result payload so EVI can ground its spoken reply. Match strictly by tool_call_id; do not return an unkeyed result the server cannot correlate. | Pass / Fail · AI Platform · critical |
| 02 | The order-lookup function returns a structured object; the client must send it back to EVI for the assistant to verbalize. | Send the tool_response content in the documented shape (e.g. a JSON-serializable result/content string keyed by tool_call_id) so the LLM can consume it. Keep the payload concise and free of secrets the assistant should not speak. | Pass / Fail · AI Platform · high |
| 03 | A tool call takes ~3 seconds; without handling, the assistant goes silent and the user thinks the call dropped. | Use the documented affordance (e.g. event/tool messages such as 'let me check that') to keep the user informed during tool latency, then deliver the grounded answer when the tool_response arrives. Do not let dead air imply a dropped call. | Pass / Fail · AI Platform · medium |
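The correlation and payload expectations in tests 01 and 02 can be sketched as follows. This is a minimal illustration, not Hume's SDK: the messages are plain dictionaries, the `name` and `parameters` fields on the incoming event are assumptions to verify against the EVI documentation, and `lookup_order` and `SENSITIVE_KEYS` are hypothetical names introduced only for this example.

```python
import json

# Hypothetical: result keys the assistant should never speak aloud.
SENSITIVE_KEYS = {"card_number", "internal_notes"}


def lookup_order(order_id: str) -> dict:
    """Hypothetical stand-in for a real order-status backend."""
    return {"order_id": order_id, "status": "shipped", "card_number": "4111-XXXX"}


def build_tool_response(tool_call: dict) -> dict:
    """Build a tool_response keyed by the tool_call_id of the incoming event."""
    # Assumption: arguments arrive as a JSON-encoded string under "parameters".
    params = json.loads(tool_call["parameters"])
    result = lookup_order(params["order_id"])
    # Drop fields the assistant should not verbalize and keep the payload small.
    safe = {k: v for k, v in result.items() if k not in SENSITIVE_KEYS}
    return {
        "type": "tool_response",
        # Echo the id unchanged so the server can correlate call and result.
        "tool_call_id": tool_call["tool_call_id"],
        # Serialize the structured result to a string for the LLM to consume.
        "content": json.dumps(safe),
    }


# Example event shaped like the one described in test 01 (values are made up).
event = {
    "type": "tool_call",
    "tool_call_id": "call_123",
    "name": "lookup_order",
    "parameters": json.dumps({"order_id": "A-42"}),
}
response = build_tool_response(event)
```

Filtering before serialization keeps secrets out of the payload regardless of what the backend returns, which is easier to audit than trimming the string afterwards.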
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. A test passes only when the mean score across rubric criteria is at least 4.0 and no individual criterion scores below 3. Emotion-expression scores are probabilistic signals of perceived expression, not ground-truth affect.
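One reading of the pass rule above is a threshold on the mean plus a floor on every criterion. The sketch below assumes that reading. The function name, the criterion labels, and the example scores are hypothetical, and the score scale is not stated in the source.

```python
from statistics import mean


def passes(scores: dict[str, float], mean_threshold: float = 4.0, floor: float = 3.0) -> bool:
    """Return True when the mean meets the threshold and no score is below the floor."""
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


# Hypothetical criterion labels and scores, for illustration only.
ok = passes({"correlation": 5, "payload_shape": 4, "latency_handling": 3})
low_floor = passes({"correlation": 5, "payload_shape": 5, "latency_handling": 2})
low_mean = passes({"correlation": 4, "payload_shape": 4, "latency_handling": 3})
```

In this example `ok` is True (mean 4.0, minimum 3), `low_floor` is False because one criterion scores 2 even though the mean is 4.0, and `low_mean` is False because the mean falls below 4.0.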
Rubric criteria
- Hume
- AI Platform
- EVI Tool Use