Announcing Synthetic Data Generation in Mosaic AI Agent Evaluation
Summary
TL;DR: In this demo, Eric Peter introduces new synthetic data capabilities for agent evaluation, designed to help developers improve the quality of their AI agents. By generating high-quality evaluation data in minutes, the synthetic data API reduces the need for time-consuming labeling by subject matter experts, letting developers assess agent quality, identify issues, and make improvements quickly. By leveraging MLflow and proprietary AI agent evaluation tools, the system identifies root causes of quality issues and suggests fixes, shortening the path to production-ready agents with higher ROI.
Takeaways
- Synthetic data capabilities enable developers to generate high-quality evaluation data quickly, saving time and resources.
- The tool addresses the challenge of labeling evaluation data, which is typically time-consuming and costly because it depends on subject matter experts.
- The synthetic data API generates a full set of evaluation questions in minutes, so agent quality can improve before subject matter experts ever get involved.
- The generated data includes both the questions and the criteria for evaluating agent responses, improving the accuracy and efficiency of the evaluation process.
- Using this synthetic data, customers have achieved up to 60% improvements in agent quality before engaging subject matter experts.
- The demo showcases the ease of use of the API: the setup requires only a document corpus, question guidelines, and an agent description.
- The tool uses MLflow for observability, tracking agent performance in both development and production environments.
- The evaluation process produces an overall quality score, helping developers pinpoint areas that need improvement, such as issues with data retrieval or response accuracy.
- A key feature is the ability to adjust parameters such as the number of documents returned by a vector search retriever, which can significantly improve agent quality (e.g., increasing from 1 to 5 results raised quality by 17%).
- The tool helps developers reduce costs and accelerate time to market by automating the generation of high-quality evaluation data.
- The synthetic data capabilities are currently in public preview, with a demo notebook that lets developers try the tool themselves.
Q & A
What is the primary goal of the new synthetic data capabilities introduced by Databricks?
-The primary goal is to help developers improve the quality of their AI agents by automating the generation of high-quality evaluation data, greatly reducing the need for manual labeling by subject matter experts.
Why is it difficult for developers to evaluate and improve AI agents without high-quality evaluation data?
-Without high-quality evaluation data, it's challenging to accurately assess the performance of AI agents, making it difficult to reach production-quality targets. Gathering such data manually is also time-consuming and resource-intensive.
How does the synthetic data API help developers overcome these challenges?
-The synthetic data API generates high-quality evaluation data in minutes, allowing developers to quickly assess and improve their agents, significantly reducing the need for manual data labeling.
What were the results of testing the new synthetic data capability?
-Customers who tested the capability saw up to a 60% improvement in agent quality using the generated synthetic data, before even engaging subject matter experts.
Can you explain the process of using the synthetic data API in the demo?
-In the demo, the speaker installs Databricks' agent evaluation tools, loads a document corpus, and uses the synthetic data API to generate evaluation questions. The API produces high-quality test cases that assess the agent's performance.
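As a rough sketch of that setup in a Databricks notebook (the `databricks-agents` package name matches the public preview, but the source table and column schema below are placeholders, not from the demo):

```python
# Run in a Databricks notebook; the public-preview package name is assumed.
%pip install -U databricks-agents mlflow
dbutils.library.restartPython()

# In a fresh cell after the restart: load the document corpus, one row per
# document, with the text in `content` and an identifier in `doc_uri`
# (placeholder table name and assumed schema).
docs = spark.table("main.docs.product_documentation").select("content", "doc_uri")
```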
How does the synthetic data API generate questions and evaluation data?
-The developer passes a document corpus, the desired number of evaluation questions, and an optional agent description and question guidelines. The API then generates a set of fully formed test cases that assess the quality of the agent's responses.
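A minimal sketch of that call, assuming the `generate_evals_df` API from the public-preview `databricks-agents` package (argument values are illustrative; check the demo notebook for the authoritative signature):

```python
from databricks.agents.evals import generate_evals_df

evals = generate_evals_df(
    docs,  # the document corpus loaded earlier
    num_evals=25,  # desired number of evaluation questions
    agent_description=(
        "A RAG agent that answers questions about product documentation."
    ),
    question_guidelines=(
        "Questions should be phrased the way an end user would ask them."
    ),
)
# Each row is a fully formed test case: a question plus the criteria
# (e.g., expected facts) used to judge the agent's answer.
display(evals)
```

The two optional arguments steer the generated questions toward the agent's actual use case, which is what makes the test cases useful before any human labeling happens.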
What is the role of MLflow in this process?
-MLflow is used to track the agent's code, configuration, and performance during the evaluation process. It helps in providing observability, logging, and visualizing the results of agent evaluations to identify areas for improvement.
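A hedged sketch of that wiring; the experiment path is a placeholder, and logging via `mlflow.langchain.log_model` with a code file assumes a LangChain-based agent (other MLflow model flavors follow the same pattern):

```python
import mlflow

mlflow.set_experiment("/Shared/agent-evaluation-demo")  # placeholder path

with mlflow.start_run(run_name="baseline-agent"):
    # Log the agent's code and configuration so every evaluation result
    # can be traced back to the exact agent version that produced it.
    logged_agent = mlflow.langchain.log_model(
        lc_model="agent.py",   # path to the agent's code (assumed layout)
        artifact_path="agent",
    )
```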
How does the agent evaluation framework identify quality issues in the agent's responses?
-The framework uses proprietary LLM judges to assess each record in the evaluation dataset. It flags quality issues and provides a written rationale for why the agent's response is correct or incorrect, citing factors such as missing facts or poor retrieval.
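A sketch of running that evaluation, assuming the documented `mlflow.evaluate` integration with `model_type="databricks-agent"`; the `eval_results` table name follows the public-preview docs, so treat it as an assumption:

```python
import mlflow

with mlflow.start_run(run_name="baseline-eval"):
    results = mlflow.evaluate(
        model=logged_agent.model_uri,   # the agent logged earlier
        data=evals,                     # the synthetic evaluation set
        model_type="databricks-agent",  # invokes the built-in LLM judges
    )

# Per-record judge verdicts and written rationales (e.g., missing facts,
# poor retrieval), useful for root-causing quality issues.
display(results.tables["eval_results"])
```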
What impact did adjusting the number of documents retrieved have on agent quality?
-Increasing the number of retrieved documents from 1 to 5 led to a 17% improvement in the agent's quality, reducing retrieval-related issues and producing better answers.
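As an illustration of that kind of change, here is what raising the retriever's result count might look like with a LangChain retriever over Databricks Vector Search; the index name is a placeholder, and the wrapper class is an assumption about the demo's stack:

```python
from databricks_langchain import DatabricksVectorSearch  # assumed package

vector_store = DatabricksVectorSearch(
    index_name="main.docs.product_docs_index",  # placeholder index
)
# Raising k from 1 to 5 gives the model more context to answer from;
# the demo credits this change with a 17% quality improvement.
retriever = vector_store.as_retriever(search_kwargs={"k": 5})
```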
What are the overall benefits of using synthetic data capabilities for agent evaluation?
-Synthetic data capabilities reduce the time spent labeling data, accelerate time to market, lower costs, and help deliver higher ROI by improving agent quality faster.