Monitoring

LLM Monitoring That Finds Failures, Fast

Easily integrate Libretto to track all your LLM calls, and we will find the responses that are toxic, unhelpful, or low quality. With Libretto, you gain valuable insights into your LLM prompts' performance, making it possible to fix problems fast.

Flagging Errors

Catch Every AI Mistake Before Users Do

Your AI is talking to users right now. Know exactly what it's saying and when it goes wrong.

Block Harmful Content

Detect toxic AI responses instantly across 13 categories of harmful content

Catch AI Refusals

Know when your AI stops working and leaves users hanging

Spot Bad Actors

Detect toxic users and abuse attempts instantly

Prevent AI Manipulation

Catch prompt injection and jailbreak attacks in real-time

Custom Evals

Test AI Quality on Your Terms

Every AI use case is unique. Our custom evaluations catch the problems that matter to your business:

Verify data formats and structures automatically

Run custom code checks for your specific requirements

Use AI to judge AI responses against your quality standards
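To make these three styles concrete, here is a minimal, self-contained sketch of custom evals written as plain Python functions over a model response. The helper names, rubric wording, and judge model are illustrative assumptions, not Libretto's eval API:

```python
# Illustrative sketch only: these helper names are hypothetical, not Libretto's API.
import json


def check_json_format(response_text: str, required_keys: set[str]) -> bool:
    """Format check: the response must be valid JSON containing the required keys."""
    try:
        data = json.loads(response_text)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and required_keys.issubset(data.keys())


def check_no_placeholder_text(response_text: str) -> bool:
    """Custom code check: fail if the model leaked template placeholders."""
    return "{{" not in response_text and "TODO" not in response_text


def llm_judge(response_text: str, rubric: str, client) -> bool:
    """LLM-as-judge: ask a second model whether the response meets the rubric.

    `client` is assumed to be an OpenAI-compatible client; the model name and
    prompt wording here are placeholders, not a prescribed configuration.
    """
    verdict = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Answer PASS or FAIL. Rubric: {rubric}"},
            {"role": "user", "content": response_text},
        ],
    )
    return verdict.choices[0].message.content.strip().upper().startswith("PASS")
```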

Daily Alerts

Problems Find You, Not Your Users

Don't wait for customer complaints. Get daily alerts about AI issues, with:

Daily summaries of critical failures and potential problems

Clear examples of what went wrong

Actionable steps to fix issues fast

Building Test Sets

Turn Today's Failures into Tomorrow's Tests

Found a problem? One click adds it to your permanent test suite:

Build a bulletproof test suite from real-world issues

Automatically check new prompts against past issues

Never repeat the same mistake twice

Start Monitoring Your LLMs Today

Experience seamless monitoring with our free trial.

FAQs

Discover how Libretto's Monitoring feature enhances your LLM performance and user experience.

What exactly does Monitoring track?

Every AI interaction in your app gets tracked, organized, and analyzed. In addition to recording your prompt inputs and the AI's outputs, we catch:

- Toxic or harmful AI responses
- Times when AI refuses to help users
- Manipulation attempts by bad actors
- Quality issues specific to your prompts

How does this make my AI better?

Instead of discovering problems through user complaints, you'll:

- Catch harmful content before it reaches users
- Spot when your AI starts failing
- Know exactly which prompts need fixing
- Build better test cases from real failures
- Get actionable data to improve performance

How much work is it to set up?

Replace your current AI client library with ours - it takes about 5 minutes:

1. Install our SDK with one command.
2. Swap out your existing AI client with our SDK.
3. Add prompt tags to track variables in your prompt templates.

That's it. Start seeing results immediately.
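As a rough illustration of that drop-in pattern (not the SDK's documented interface; the Libretto import path and tag parameter below are assumptions, shown only in comments):

```python
# Sketch of the drop-in pattern. The commented-out Libretto lines use assumed
# names (import path, tag parameter); check the SDK docs for the real interface.
from openai import OpenAI
# from libretto_openai import OpenAI   # step 2 (assumed import path): swap the client

client = OpenAI()  # same constructor call either way

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful support assistant."},
        {"role": "user", "content": "How do I reset my password?"},
    ],
    # Step 3 (assumed parameter name): tag the call with its prompt template and
    # the variables filled into it, so calls can be grouped by template.
    # libretto={"prompt_template_name": "support_reset_password",
    #           "template_params": {"topic": "password reset"}},
)
print(response.choices[0].message.content)
```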

What specific problems can you catch?

We detect four main categories of issues:

1. Harmful content (across 13 categories)
2. AI refusals and non-answers
3. User attacks such as prompt injections
4. Custom quality issues you define, including AI-generated quality judges

How do I know when there's a problem?

You can check your Libretto dashboard at any time, and we also send you daily email summaries with all of the issues we've detected.

Can I try monitoring with my current AI setup?

Yes! We currently support:

- OpenAI (GPT-4o, GPT-4o mini, o1, etc.)
- Anthropic (Claude)
- Any provider that uses the OpenAI client library (some implementations of Meta Llama, for example)
- More providers coming soon
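For the OpenAI-compatible case, the usual pattern is to keep the standard OpenAI Python client and point it at the provider's endpoint. The endpoint URL and model name below are placeholders for whichever Llama-hosting provider you use:

```python
# The official OpenAI Python client accepts a custom base_url, so it can talk to
# any OpenAI-compatible endpoint. URL and model name are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-llama-host.com/v1",  # placeholder endpoint
    api_key="YOUR_PROVIDER_API_KEY",
)

response = client.chat.completions.create(
    model="llama-3.1-70b-instruct",  # placeholder model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```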