Testing

LLM Prompt Testing, Automated

Rigorously testing LLM prompts requires two things: test cases to feed the prompt and evaluations to judge whether its answers are good. Both are a pain to set up by hand. With Libretto, you get test cases and evals set up automatically.

Automatic Test Sets

Your Best Tests Come From Your Users

Stop spending hours inventing test scenarios. We automatically sample real user interactions to build a comprehensive test suite that covers actual use cases, edge cases, and failure modes.

[Screenshot: menu with an Add Test Case option]
[Screenshot: LLM-as-judge panel with a criterion checking whether the prompt proposes a meeting]
Generated Evals

Custom Evals, Created Automatically For Every Prompt

We read and understand your prompts, then build specific tests for exactly what they're trying to achieve. If you're generating JSON, we'll verify the format. If you're writing emails, we'll check for clear calls to action. Every prompt gets its own custom test suite, created automatically.
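
As an illustration only (not Libretto's actual API), here is roughly what two such auto-generated checks could look like in Python: an objective JSON-validity test and a heuristic call-to-action test. The function names and the phrase list are hypothetical.

    import json
    import re

    def eval_valid_json(output: str) -> bool:
        # Objective check: does the model output parse as JSON?
        try:
            json.loads(output)
            return True
        except json.JSONDecodeError:
            return False

    def eval_has_call_to_action(output: str) -> bool:
        # Heuristic check: does a generated email contain a clear ask?
        # The phrase list is illustrative; a subjective version would use an LLM judge instead.
        cta_patterns = [r"\blet me know\b", r"\bschedule a (call|meeting)\b", r"\bbook a time\b"]
        return any(re.search(p, output, re.IGNORECASE) for p in cta_patterns)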

Tuned Evals

Align AI Testing with Human Judgment

Are your AI-based evaluations not working as you hoped? Show us how you'd score just 10 responses, and we'll tune our entire testing system to match your standards. Get AI testing that thinks exactly like your team does.
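
One common way to do this kind of alignment, sketched below in plain Python (a simplified illustration, not Libretto's implementation), is to fold your human-graded examples into the judge prompt as few-shot anchors so the judge scores the way your team does:

    def build_judge_prompt(criterion: str, graded_examples: list[dict], candidate: str) -> str:
        # Assemble an LLM-as-judge prompt that embeds human-graded responses
        # as few-shot anchors, nudging the judge toward your team's standards.
        lines = [f"Grade the response against this criterion: {criterion}", ""]
        for ex in graded_examples:  # e.g. the ten responses your team scored
            lines += [f"Response: {ex['response']}",
                      f"Human grade: {'PASS' if ex['grade'] else 'FAIL'}", ""]
        lines += [f"Response: {candidate}", "Grade (PASS or FAIL):"]
        return "\n".join(lines)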

[Screenshot: LLM judges being aligned with human graders]

Elevate Your AI Testing Today

Experience automatic testing and evaluation setup of your LLM prompts with Libretto's platform.


FAQs

Learn how Libretto simplifies testing and improves your LLM prompt evaluations.

What are test cases?

Test cases are specific inputs for your prompt that are used to evaluate the prompt's performance. Test cases help identify strengths and weaknesses in your AI's responses. By automating the creation of test cases from production traffic, Libretto ensures you have a comprehensive set of cases to work with.
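
Concretely, you can think of a test case as a small record of one real interaction that can be replayed against new versions of your prompt. A hypothetical shape, in illustrative Python (the field names are not Libretto's schema):

    from dataclasses import dataclass, field

    @dataclass
    class TestCase:
        # One sampled production interaction, replayable against new prompt versions.
        prompt_variables: dict    # the inputs filled into the prompt template
        recorded_output: str      # what the model returned in production
        tags: list = field(default_factory=list)   # e.g. ["edge_case", "long_input"]

    example = TestCase(
        prompt_variables={"customer_question": "Can I change my flight after check-in?"},
        recorded_output="Yes. Changes are allowed up to 30 minutes before departure...",
        tags=["policy_question"],
    )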

How does Libretto work?

Libretto samples your traffic to automatically generate test cases, and it analyzes your prompts to provide both subjective and objective evaluations of LLM outputs. This streamlined approach saves you time and enhances your testing accuracy.

What are evals?

Evaluations, or evals, are tests used to assess the quality of your LLM's responses. They can be both qualitative and quantitative, providing a well-rounded view of performance. Libretto generates these evals to help you make informed adjustments, but you can add, remove, or modify evals as you see fit.
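
To make that concrete, the sketch below (illustrative Python, not Libretto's API) runs a set of eval functions, such as the JSON check sketched earlier, over a list of test cases and reports a pass rate per eval:

    def run_evals(test_cases, generate, evals):
        # Call the prompt for every test case, apply every eval to the output,
        # and report the fraction of cases each eval passes.
        passed = {e.__name__: 0 for e in evals}
        for case in test_cases:
            output = generate(case.prompt_variables)   # your prompt under test
            for e in evals:
                passed[e.__name__] += int(e(output))
        return {name: count / len(test_cases) for name, count in passed.items()}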

Why automate testing?

Automating testing reduces the manual workload and speeds up the evaluation process. It allows you to focus on making improvements to the prompt rather than getting bogged down in repetitive tasks. With Libretto, you can quickly see the impact of your changes.

Can I customize my evals?

Absolutely. While evals are created automatically, you have full control:

• Add custom evaluation rules
• Adjust quality thresholds
• Define what "good" means for your use case
• Train our evaluators on your standards
• Exclude or modify any auto-generated evals
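
For instance, a custom rule with its own quality threshold can be as small as this illustrative sketch (the threshold value and names are hypothetical):

    MAX_RESPONSE_WORDS = 150   # hypothetical threshold defining "concise" for this use case

    def eval_concise(output: str) -> bool:
        # Custom rule: responses must stay within the team's word budget.
        return len(output.split()) <= MAX_RESPONSE_WORDS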