Unleashing the Power of BAML in LLM Applications

Vaibhav Gupta on BAML, Structured Data Extraction, and LLM Integration: Revolutionizing AI Development.

Subscribe: Apple • Spotify • Overcast • Pocket Casts • AntennaPod • Podcast Addict • Amazon • RSS.

Vaibhav Gupta is the CEO and co-founder of Boundary. In this episode, we explore BAML, an open source domain-specific language designed to streamline interactions with large language models (LLMs). We discuss how BAML simplifies structured data extraction, enhances prompt engineering, and reduces token usage across multiple programming languages. The conversation highlights robust features like error handling, hallucination detection, and integration with Retrieval-Augmented Generation (RAG) systems, showcasing real-world applications in healthcare and finance. Finally, we delve into BAML’s future roadmap, including support for agent-based architectures and advanced mathematical validation.

Subscribe to the Gradient Flow Newsletter


If you enjoyed this episode, please support our work by encouraging your friends and colleagues to subscribe to our newsletter.


Transcript.

Below is a heavily edited excerpt, in Question & Answer format.

What is BAML and what problem is it trying to address?

BAML is a domain-specific language for writing and testing LLM functions. It helps you get structured data from LLMs with an optimal developer experience. Looking ahead 10 years, we’ll likely have codebases with thousands of prompts doing heavy algorithmic lifting. We need something more rigorous than passing around JSON objects. Think about how React made HTML a first-class citizen in JavaScript – BAML does the same for LLM interactions. It provides syntax checking and compile-time errors, making code more robust and readable.
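
As a rough sketch of that idea, here is what a small BAML function can look like. The model name, client configuration, and prompt wording are illustrative, and exact syntax can vary by BAML version:

```baml
// A typed result the BAML compiler knows about.
enum Sentiment {
  Positive
  Negative
  Neutral
}

// An LLM client; the model and API key setup here are illustrative.
client<llm> GPT4o {
  provider openai
  options {
    model "gpt-4o"
    api_key env.OPENAI_API_KEY
  }
}

// An LLM call declared as a typed function: string in, Sentiment out.
// The prompt is a template; {{ ctx.output_format }} injects the schema.
function ClassifySentiment(text: string) -> Sentiment {
  client GPT4o
  prompt #"
    Classify the sentiment of the following text.

    {{ ctx.output_format }}

    {{ _.role("user") }}
    {{ text }}
  "#
}
```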

How does BAML compare to existing approaches for getting structured data from LLMs?

Today, developers typically use three approaches: plain-text prompts that ask for JSON; XML or YAML formats; or OpenAI’s structured outputs. The problem with simply requesting JSON is reliability – models often break format. With OpenAI’s constrained generation, the model is restricted to outputting only what matches your schema. This creates a different problem – if the input isn’t what you expect (like getting a resume when you’re configured for receipts), the model still outputs data matching your receipt schema, effectively hallucinating.

BAML offers a more elegant approach. Instead of restricting what the model can output, we use what we call “schema-aligned parsing.” This allows the model freedom in reasoning, while our algorithms handle the error correction and schema alignment in under a millisecond without requiring another LLM call.

What makes BAML more efficient than alternatives?

BAML is significantly more token-efficient. For example, describing a case number in JSON schema takes about 13 tokens, while the equivalent in BAML takes just 4 tokens. This efficiency translates directly to cost savings and reduced latency.
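
As a rough, hedged illustration of that difference (the field name is made up, and exact token counts depend on the tokenizer):

```baml
// A JSON-schema approach sends the model something like this for one field:
//   {"type": "object", "properties": {"case_number": {"type": "string"}}}
//
// BAML's {{ ctx.output_format }} renders the same class as something closer to:
//   { case_number: string }
//
// The BAML definition itself:
class CourtFiling {
  case_number string
}
```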

Beyond token efficiency, BAML handles model output cleanup automatically. If an LLM outputs verbose markdown with backticks and commentary, BAML strips it out instantly. We also eliminate unnecessary syntax elements like quotation marks that aren’t needed for understanding but consume tokens.

Most importantly, BAML can detect when the LLM output doesn’t match your expected data model and raise exceptions you can catch in your code, preventing hallucinated outputs from propagating through your system.

Can you explain how BAML would be used in a typical application?

Let’s take a common example – extracting information from a receipt. With BAML, you define a data model specifying what you want to extract: establishment name, date, total amount, currency, and a list of items with their prices and quantities.

You write this as a BAML function that takes either an image or string and outputs your structured data model. BAML provides a clear, readable syntax for this, similar to TypeScript or Python, but specifically designed for LLM interactions.
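
A sketch of what that receipt function might look like; field names, the inline client, and the prompt wording are illustrative, and exact syntax can vary by BAML version:

```baml
class LineItem {
  name string
  price float
  quantity int
}

class Receipt {
  establishment_name string
  date string @description("Date of purchase, e.g. 2024-05-01")
  total float
  currency string
  items LineItem[]
}

// Accepts either a photo of the receipt or raw text (e.g. OCR output).
function ExtractReceipt(receipt: image | string) -> Receipt {
  client "openai/gpt-4o"  // inline client shorthand; a named client<llm> block also works
  prompt #"
    Extract the receipt details from the input below.

    {{ ctx.output_format }}

    {{ _.role("user") }}
    {{ receipt }}
  "#
}
```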

When you run this function with an image of a receipt, BAML handles sending it to the LLM with your prompt and then parses the response into your structured format. If the input isn’t actually a receipt but something completely different, BAML will raise an exception rather than returning malformed data.

Does BAML work with all LLMs, and how does performance vary across models?

Yes, BAML works with essentially any LLM, from powerful models like GPT-4 down to smaller 3B parameter models. We’ve found that LLMs are generally good at reasoning about problems but less reliable at formatting outputs precisely according to specifications.

BAML addresses this by handling all the schema error correction without using additional LLM calls, making it incredibly fast. This approach means you can often use smaller, cheaper models while maintaining high-quality outputs. We’ve had customers drop from GPT-4 to GPT-4o mini with the same or better results while reducing costs by up to 98%.

What kinds of applications are people building with BAML?

We’re seeing diverse use cases. One veterinary software company uses BAML to convert unstructured clinical notes into standardized SOAP formats. Since every vet has their own notation style, they first analyze historical notes to generate a personalized schema for each vet, then process new transcripts against that schema.

Financial data extraction is another major use case – extracting data from 100-page bank statements with minimal errors. We also have customers using BAML for entity extraction, sentiment analysis, and metadata extraction from SEC filings.

Some teams are using BAML for agent routing layers, allowing them to use smaller 8B or 70B models instead of much larger ones while improving reliability and reducing latency and cost.

How does BAML fit into the RAG (Retrieval-Augmented Generation) ecosystem?

Many of our early customers migrated their RAG pipelines to BAML. One common pattern we see is using BAML for pre-filtering categories. For example, if you have 500 potential categories in your database, you can use a RAG approach to narrow down to the 20 most relevant ones for a query, then use BAML to select the best 3 from those 20.
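
A hedged sketch of the second stage of that pattern; the retrieval step that narrows 500 categories down to roughly 20 happens outside BAML, and the function name, model, and prompt are illustrative:

```baml
// Retrieval has already narrowed the category list; the LLM picks the
// best 3 matches from the remaining candidates.
function SelectTopCategories(query: string, candidates: string[]) -> string[] {
  client "openai/gpt-4o-mini"
  prompt #"
    From the candidate categories below, choose the 3 that best match the query.

    Query: {{ query }}

    Candidates:
    {% for c in candidates %}
    - {{ c }}
    {% endfor %}

    {{ ctx.output_format }}
  "#
}
```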

We also see BAML used for question-answering with citations and hallucination detection. The key advantage is that BAML makes it easier to implement robust validation with just a few lines of code, which developers are then willing to add throughout their codebase.

What’s the learning curve for BAML?

It’s comparable to learning Next.js – not something you’ll master in five minutes, but with about two hours of playing around, most developers get comfortable with it. The setup process is simple: just a pip install or npm install, and you’re ready to go.

We’ve grown from fewer than 10 users 90 days ago to over 400 today, with more than 300 self-onboarding through our documentation. The sweet spot for getting started is using BAML for extracting structured information from unstructured text, but developers are also building more complex pipelines including agents.

What’s on the roadmap for BAML?

We’re about to release features for double-checking LLM outputs using mathematical validation. For example, if you extract a receipt with line items and a total, BAML will automatically verify that the sum of items plus tax equals the total and raise warnings when they don’t match.

We’re also working on first-class support for agents, expected by the end of the year. Unlike other frameworks where modifying agent behavior requires digging through complex code, BAML will give you full control over the entire prompt with clean syntax.

We generally take a measured approach to adding features. Rather than rushing to support every new API that comes out, we wait to see how multiple LLMs implement similar functionality before creating abstractions in BAML.

How can developers get started with BAML?

BAML is open source and available through pip or npm. The best way to track our progress is through our Discord community, where we post about events and webinars. We also run live coding sessions and are starting to organize monthly meetups in San Francisco and Seattle.

For developers who want to minimize how much BAML code they write, we’ve also released functionality where LLMs can generate BAML code for you. This works particularly well with coding assistants like Cursor that can integrate with your existing codebase.


[Ben Lorica is an investor and advisor to Boundary and other startups.]