Codebase Context and GitHub Integration

Lovable · AI App Builder

Evaluates Lovable's Codebase Context & GitHub Integration across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's AI App Builder eval coverage.

About Lovable

Lovable is an AI full-stack engineer that lets anyone build and deploy real web applications by chatting in natural language. Users describe what they want; Lovable writes the code, manages the database, and deploys — no technical background required.

Employees: ~30
Industry: AI App Builder
Headquarters: Stockholm, Sweden
Website: lovable.dev

Sample tests · showing 3 of 9

#	Input	Expected behavior	Check
01	User clicks 'Connect GitHub' in Lovable. The OAuth flow asks for permissions.	Request only the documented least-privilege scopes: read/write to the selected repo(s) the user picks, not the entire user account. Use GitHub's selected-repository installation, not full-user install. Surface the requested scopes in chat before the user clicks accept.	Pass / Fail · AI Platform · critical
02	User edits a file in GitHub directly (commit on main) while a Lovable chat is open. The next Lovable turn must include the user's commit.	Pull from GitHub at the start of every turn (or detect via webhook) so the agent edits against current state. The diff produced should rebase onto the user's commit, not the agent's stale snapshot. Surface 'pulled N commits from main' in chat when applicable.	Pass / Fail · AI Platform · critical
03	Project is in 'PR mode' under a Teams plan: Lovable's changes must land via PR, not direct commits to main.	Create a feature branch named for the change (e.g., lovable/add-stripe-checkout), commit there, open a PR back to main with a description summarizing the chat turn that triggered it. Do not push directly to main in PR mode.	Pass / Fail · AI Platform · critical
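
The expected behavior in case 03 can be sketched as a small routing rule. This is only an illustration under stated assumptions: the function names `branch_name` and `push_target`, the `lovable/` prefix default, and the slug rules are hypothetical and are not taken from Lovable's implementation; only the example branch name `lovable/add-stripe-checkout` comes from the test case.

```python
import re


def branch_name(summary: str, prefix: str = "lovable") -> str:
    """Derive a feature-branch name from a short change summary (hypothetical helper)."""
    # Lowercase, then collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", summary.lower()).strip("-")
    # Fall back to a generic slug if the summary had no usable characters.
    return f"{prefix}/{slug or 'change'}"


def push_target(pr_mode: bool, summary: str, default_branch: str = "main") -> str:
    """Pick the branch to commit to: a feature branch in PR mode, else the default branch."""
    return branch_name(summary) if pr_mode else default_branch


print(push_target(pr_mode=True, summary="Add Stripe checkout"))
print(push_target(pr_mode=False, summary="Add Stripe checkout"))
```

A judge checking case 03 would, under this sketch, fail any run where `pr_mode` is true and the commit target equals the default branch.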

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
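
One reading of this pass rule, sketched in Python: the mean of the per-criterion scores must be at least 4.0, and every individual criterion must score at least 3. The function name `case_passes`, the dictionary input shape, and the criterion names in the usage lines are assumptions made for illustration; the page does not specify the score scale or how Corsac computes the result.

```python
from statistics import mean


def case_passes(scores: dict[str, float],
                mean_threshold: float = 4.0,
                floor: float = 3.0) -> bool:
    """Return True if the mean score meets the threshold and no criterion is below the floor."""
    if not scores:
        # No graded criteria: treat as a failure rather than a vacuous pass (assumption).
        return False
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


# Hypothetical criterion names, for illustration only.
print(case_passes({"scopes": 5, "disclosure": 4, "install_type": 3}))
print(case_passes({"scopes": 5, "disclosure": 5, "install_type": 2}))
```

The second call shows why both conditions matter: its mean is 4.0, but one criterion falls below the floor of 3, so the case fails.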

Rubric criteria

Lovable
AI Platform
Codebase Context and GitHub Integration

Recommended for

Lovable, Lovable customers

Works with

Lovable

Related evals

Code Assistant

Browserbase

Evaluates Browserbase's Captcha Handling across scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Browser infrastructure eval coverage.

Code Assistant

Browserbase

Evaluates Browserbase's Concurrency & Rate Limits across scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Browser infrastructure eval coverage.

Code Assistant

Browserbase

Evaluates Browserbase's Live Debugging & Session Inspector across scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Browser infrastructure eval coverage.

Frequently asked questions

What does the Codebase Context and GitHub Integration eval for Lovable test?

How is the Codebase Context and GitHub Integration eval scored?

The judge rubric: Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

How many test cases does this eval pack include?

The Codebase Context and GitHub Integration pack for Lovable contains 9 test cases. Three sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Codebase Context and GitHub Integration pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.