David Hughes on How BAML Transforms AI Development.
Subscribe: Apple • Spotify • Overcast • Pocket Casts • AntennaPod • Podcast Addict • Amazon • RSS.
Our guest is David Hughes, Principal Data & AI Solution Architect at Enterprise Knowledge. Our discussion centers on BAML, a domain-specific language that transforms prompts into structured functions with defined inputs and outputs, enabling developers to create more deterministic and maintainable AI applications. This approach fundamentally changes prompt engineering by focusing on output schemas rather than crafting perfect prompt text, resulting in more robust applications that can adapt to new models without significant refactoring. BAML’s polyglot nature, testing capabilities, and runtime adaptability make it particularly valuable for enterprise environments and agentic AI applications, including multimodal systems.
Interview highlights – key sections from the video version:
- David’s “Aha” Moment with BAML
- BAML’s Structured Approach to Prompt Engineering
- How BAML Changed David’s Approach to Prompt Engineering
- BAML as a Programming Language and its Learning Curve
- BAML as a Replacement for Other Frameworks
- Comparison of BAML with Other Frameworks
- BAML and the Future of Prompt Engineering
- BAML’s Philosophy: Composability and Small Language Models
- BAML’s Approach to Cost Monitoring and Reduction
- BAML’s Schema-Aligned Parser and Reduction of Parsing Errors
- BAML’s Polyglot Nature and Cross-Language Compatibility
- BAML’s Role in Agentic AI Applications
- BAML’s Roadmap and Future Developments
- BAML’s Impact on Testing and Debugging
- BAML’s Community and Support
- Introduction to Multimodal Graph RAG
- Multimodal Graph RAG: Core Concept and Benefits
- Challenges in Knowledge Graph Construction and BAML’s Role
- The Future of Automatic Knowledge Graph Construction and BAML’s Relevance
Related content:
- A video version of this conversation is available on our YouTube channel.
- Seven Features That Make BAML Ideal for AI Developers
- Structure Is All You Need
- GraphRAG: Design Patterns, Challenges, Recommendations
- Faster Iteration, Lower Costs: BAML’s Impact on AI Projects
- What AI Teams Need to Know for 2025
- AI Unlocked – Overcoming The Data Bottleneck
- Vaibhav Gupta → Unleashing the Power of BAML in LLM Applications
- Semih Salihoglu → The Intersection of LLMs, Knowledge Graphs, and Query Generation
- Mars Lan → The Security Debate: How Safe is Open-Source Software
Support our work by subscribing to our newsletter 📩
Transcript
Below is a heavily edited excerpt, in Question & Answer format.
Q: What is BAML and what initially drew you to it?
BAML is a domain-specific language that treats prompts as structured functions with defined inputs and outputs. Like many practitioners, I’ve used popular frameworks like LangChain, LlamaIndex, and Haystack extensively. While I could build innovative solutions with them, they often proved brittle: changing the input data or integrating a new model required significant re-engineering and prompt adjustments.
My “aha moment” came from a podcast episode where prompts were described as functions within BAML. This concept of rigorously managing language model interactions resonated deeply. After trying BAML for just one afternoon, I realized it offered a fundamentally better approach, and I wouldn’t need to touch another framework again.
Q: How does BAML’s approach of treating prompts as functions differ from traditional prompt engineering?
The shift is profound. With BAML, I no longer fixate on crafting the perfect prompt text. Instead, I concentrate on defining clear schemas using classes in BAML, similar to Pydantic. I prioritize defining the desired output structure and ensuring deterministic results for reliable downstream system integration.
Instead of “begging and pleading” with a language model, I spend more time thinking rigorously about my objective. BAML’s Playground, an IDE extension, is also transformative. I can test prompts and schemas directly within my development environment, drastically shortening the iteration cycle compared to traditional approaches where you might need to deploy to containers for testing.
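To make that concrete, here is a minimal sketch of a prompt-as-function in BAML; the `Invoice` schema, the `ExtractInvoice` function, and the model choice are illustrative, not from the conversation:

```baml
// A class defines the output schema, much like a Pydantic model.
class Invoice {
  vendor string
  total float
  line_items string[] @description("One entry per billed item")
}

// A function binds a typed signature to a prompt and a model.
function ExtractInvoice(invoice_text: string) -> Invoice {
  client "openai/gpt-4o"  // shorthand client; any provider/model works
  prompt #"
    Extract the invoice details from the text below.

    {{ invoice_text }}

    {{ ctx.output_format }}
  "#
}
```

The `{{ ctx.output_format }}` directive is where BAML injects formatting instructions derived from the schema, so iteration happens on the class definition rather than on prompt wording.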
Q: Is there a steep learning curve with BAML given that it’s a programming language?
Surprisingly, no. My initial learning curve was incredibly short – about two hours of documentation and Playground exploration. Within that time, I was already refactoring complex codebases.
While BAML is a DSL, this is actually a strength. Many developers are proficient in languages like Rust for production environments, yet AI development often defaults to Python. BAML, being polyglot, bridges this gap. In under an hour, developers can gain access to a deterministic system for language models, regardless of their primary language. The initial investment unlocks significant productivity gains.
Q: How does BAML compare to frameworks like LangChain, LlamaIndex, or agentic frameworks like Crew or LangGraph?
For me, BAML is a replacement, not a supplement. Frameworks like LangChain are heavily prompt-centric. BAML distinguishes itself by focusing on deterministic outputs. This fundamental difference is crucial.
The other frameworks become brittle because they’re built around crafting prompts for specific language models; with the rapid pace of model releases, that means constant refactoring. BAML’s runtime refactoring capabilities (dynamic adjustments to prompts, models, and even output schemas) address this brittleness. This is particularly valuable because different models have different task affinities: with BAML, you can easily match the right model to the right task.
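As a sketch of what that retargeting looks like in practice (the client names and model choices below are illustrative):

```baml
client<llm> FastSmall {
  provider openai
  options {
    model "gpt-4o-mini"
    api_key env.OPENAI_API_KEY
  }
}

client<llm> DeepReasoner {
  provider anthropic
  options {
    model "claude-3-5-sonnet-latest"
    api_key env.ANTHROPIC_API_KEY
  }
}

// Matching a task to a different model is a one-line change;
// the prompt and output schema stay untouched.
function SummarizeTicket(ticket: string) -> string {
  client FastSmall  // swap to DeepReasoner without refactoring
  prompt #"
    Summarize this support ticket in two sentences.
    {{ ticket }}
  "#
}
```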
Q: The “prompt engineer” role emerged with the rise of LLMs. Does BAML diminish the need for specialized prompt engineers?
Largely, yes. You still write prompts in BAML, but it removes the need for “prompt whisperers” who specialize in model-specific prompting nuances. Keeping up with the task affinities of rapidly evolving models was becoming unsustainable for prompt engineers.
BAML’s philosophy encourages breaking down complex tasks into smaller, type-safe, atomic units of operation and reasoning. By emphasizing composability and leveraging smaller, task-aligned language models (which I use about 95% of the time), BAML promotes more resilient and higher-quality outputs, reducing reliance on intricate, model-specific prompt engineering.
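In practice, that composability looks like several small, typed functions, each of which can be bound to a small model. Everything in this sketch is illustrative:

```baml
enum Sentiment {
  Positive
  Negative
  Neutral
}

class Entities {
  people string[]
  organizations string[]
}

// Two atomic, type-safe units instead of one monolithic prompt.
function ExtractEntities(text: string) -> Entities {
  client "openai/gpt-4o-mini"  // a small, task-aligned model
  prompt #"
    List the people and organizations mentioned.
    {{ text }}
    {{ ctx.output_format }}
  "#
}

function ClassifySentiment(text: string) -> Sentiment {
  client "openai/gpt-4o-mini"
  prompt #"
    Classify the sentiment of this text.
    {{ text }}
    {{ ctx.output_format }}
  "#
}
```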
Q: How does BAML help teams manage costs associated with LLM API usage?
BAML offers several cost-saving mechanisms. In the Playground, you can directly see token counts for your prompts, enabling iterative optimization for conciseness while maintaining output quality. BAML’s context injection is also highly efficient, minimizing token usage.
Crucially, BAML’s schema-aligned parser significantly reduces costs by eliminating re-prompting. Unlike approaches that rely on re-prompting to correct output formats, BAML ensures the output conforms to the defined schema, regardless of the raw LLM response, avoiding unnecessary API calls and their associated costs.
Q: LLMs are known for sometimes producing “approximately correct” formats. How does BAML handle output parsing and error handling?
Even with BAML, inspecting the raw LLM output in the Playground reveals that models often don’t produce perfectly formed JSON. However, BAML’s parser consistently delivers well-structured JSON based on the schema.
While some systems constrain LLM outputs, they may still return technically valid JSON with incorrect content. BAML provides transparent visibility into what the language model returns and will generate an error if the content cannot be properly parsed to match your schema. This gives you well-defined errors to check for and allows you to create assertions that validate your output. This rigor significantly cuts down iteration cycles and simplifies error handling compared to dealing directly with unstructured or inconsistently formatted LLM outputs.
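As an illustration of the kind of “approximately correct” output the schema-aligned parser recovers from without a retry call (the raw text and schema below are invented for the example):

```baml
// Raw model text is often not strict JSON -- a prose preamble,
// a trailing comma:
//
//   Sure! Here is the extraction:
//   { "vendor": "ACME", "total": 120.0, }
//
// Parsing is driven by the schema, so BAML can still return a typed
// Invoice; if a required field can't be recovered, you get a
// well-defined parse error to handle instead of silent garbage.
class Invoice {
  vendor string
  total float
}
```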
Q: How does BAML help with testing and debugging AI applications?
Language models are fundamentally probabilistic next-token generators – they weren’t designed to produce structured output. BAML bridges this gap by parsing outputs into the structures you’ve defined in your schemas.
Because BAML gives you programmatic objects, you can perform runtime assertions and checks. For example, if I’m extracting colors from an image, I can assert that the returned list contains at least one color. I can use assertions to verify that the output is an instance of the expected type and to enforce data integrity constraints. This level of programmatic control and testability significantly simplifies debugging.
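The color example might look like this in BAML; the names, the test image URL, and the client shorthand are all illustrative:

```baml
class ImageColors {
  // @assert fails fast with a well-defined error if the
  // model returns an empty list.
  colors string[] @assert(at_least_one_color, {{ this|length > 0 }})
}

function ExtractColors(img: image) -> ImageColors {
  client "openai/gpt-4o"
  prompt #"
    {{ img }}
    List the predominant colors in this image.
    {{ ctx.output_format }}
  "#
}

// Tests like this run directly in the Playground.
test ColorsSmoke {
  functions [ExtractColors]
  args {
    img {
      url "https://example.com/bananas.jpg"
    }
  }
}
```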
Q: What advantages does BAML’s cross-language compatibility offer in enterprise environments?
BAML’s polyglot nature is a major advantage for enterprises. It’s designed to be a DSL accessible from any language. Even if native support isn’t immediately available for a specific language, BAML provides an API that can be used universally.
Native Rust support is underway, and the architecture is designed for extensibility. This language agnosticism makes BAML readily adaptable to legacy systems and diverse technology stacks common in large organizations. For example, I recently worked on projects requiring Rust, and while BAML doesn’t yet have native Rust output, I could use the API and create stubs that will be easy to port when native support arrives.
Q: How does BAML support the development of agentic AI applications?
BAML is crucial for building sophisticated agentic systems. I’ve successfully migrated agentic systems previously built with LangGraph to BAML. It allows you to create agent registries with defined reasoning engines.
Key features include dynamically changing reasoning engines, language models, and prompts for agents. Furthermore, BAML’s ability to update output schemas at runtime is invaluable for self-optimizing and autonomous agents. For example, if I discover during execution that I need to add a field to a “person” object, I can do that dynamically and continue getting that updated output.
These features are key for agents that can adapt to changing environments and data by dynamically modifying their output structures, reasoning processes, and even underlying models based on runtime observations and performance metrics.
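The BAML side of that runtime flexibility is a class marked as dynamic; the “person” schema from the example above might be sketched like this:

```baml
class Person {
  name string
  role string
  // @@dynamic lets the host application attach new fields
  // (say, "email") at runtime via the generated TypeBuilder,
  // without editing or recompiling the BAML source.
  @@dynamic
}
```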
Q: What exciting features are coming to BAML in the near future?
The enhancement of “prompts as functions” is very exciting. BAML is expanding the ability to embed more code logic directly within prompts, including iterations and conditional statements. This will further solidify the concept of prompts as powerful, programmable functions, unlocking significant new capabilities and expressiveness.
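BAML prompts already support Jinja-style control flow, which is the foundation this roadmap item builds on. A small illustrative sketch (all names invented):

```baml
function TriageIssues(issues: string[], urgent_only: bool) -> string {
  client "openai/gpt-4o-mini"
  prompt #"
    {% if urgent_only %}
    Only consider issues explicitly marked urgent.
    {% endif %}

    Open issues:
    {% for issue in issues %}
    - {{ issue }}
    {% endfor %}

    Summarize the single top priority in one sentence.
  "#
}
```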
The broader community around BAML is also growing, with Donew (agentic apps) and Kùzu (for GraphRAG) integrating with BAML. This ecosystem growth will further enhance BAML’s value proposition as a foundational AI development tool.
Q: For technical teams curious about BAML, how should they get started?
I highly recommend investing an hour to explore BAML. Install the plugin for your IDE (VS Code, Cursor, etc.), follow the quickstart in the BAML docs, and jump into the Playground. Focus on defining the output structure (schema) you want.
If you get stuck, the official BAML Discord is responsive, and the community is growing quickly. It might seem like another tool to learn, but the paradigm shift it offers in working with language models is transformative. If you’ve struggled with brittle prompts or big, monolithic frameworks, you might find BAML’s approach of small, composable “prompt functions” more intuitive. In my experience, it also cuts down on cost and iteration time.
Q: Can you describe the concept of multimodal graph RAG, particularly in comparison to text-only graph RAG?
Multimodal graph RAG incorporates non-text modalities – images, audio, and video – into a graph structure that can be used at retrieval time. This approach expands the scope and context for language models to work with and provides explainability for results.
Traditional text-only RAG misses a lot of rich latent signal in images. For example, if you have an image of a table with bananas, a bowl, a wooden surface, a lamp, and a telephone, the caption might simply say “bananas on a table,” which doesn’t capture the full context. Multimodal graph RAG addresses this by decomposing images into their components – predominant colors, objects with bounding boxes, and spatial relationships between objects.
This decomposition allows for more sophisticated queries. For instance, after retrieving five similar banana images from a vector database, you could ask which one shows bright yellow bananas that are predominantly featured and positioned to the right of a phone. The graph structure enables this level of detailed filtering based on visual characteristics and spatial relationships.
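A decomposition schema along these lines could be expressed in BAML roughly as follows (all names are invented for illustration):

```baml
class BoundingBox {
  x int
  y int
  width int
  height int
}

class DetectedObject {
  label string
  box BoundingBox
}

class SpatialRelation {
  subject string
  relation string @description("e.g. 'left_of', 'on_top_of'")
  object string
}

// The typed output maps directly onto graph nodes and edges.
class ImageDecomposition {
  predominant_colors string[]
  objects DetectedObject[]
  relations SpatialRelation[]
}
```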
Q: How is the knowledge graph constructed for multimodal graph RAG? Isn’t that a significant hurdle?
The construction is more straightforward than it might seem. Language models do the heavy lifting by decomposing images into components that can be loaded into a graph database. For my work, I created a pipeline for loading this data into Kùzu (a graph database), which allows for bulk loading via data frames rather than requiring complex Pythonic or Rust-based pipelines.
For automatic knowledge graph construction, I’ve been using BAML with foundation models. BAML provides a type-based system to output exactly what’s needed for graph loading and ingestion. I use it with my LLaVA agent model to create structured output from the decomposition of images in a deterministic way.
I believe the field is moving toward agentic language model systems because they can handle the crucial semantic aspects better. An agentic system can take a preprocessing step to understand a corpus by developing an ontology, which can then define your schema in the graph database and the classes that BAML will use – all potentially at runtime.
Q: How close are we to being able to automatically create decent knowledge graphs from unstructured data like PDFs?
We can create knowledge graphs now using tools like BAML, but getting them to production quality really depends on the ontology piece. BAML can produce them today if you know your ontology in advance and have defined it. To reach production quality, we need better agents for understanding ontologies.
There are teams working on AI solutions for extracting ontologies from unstructured text. Once these integrate with knowledge graph creation using agents based on BAML, I think we’ll get there relatively quickly – we’re not years away from this capability.
Q: Is BAML still relevant in the multimodal world, or is it primarily designed for text prompting?
BAML is already relevant in the multimodal world – I’m currently using it to pass images to foundation models. BAML can handle multimodal inputs in multiple ways: I can either pass a URL to the images I want the agent to process, or for specific use cases, I can encode images in base64 format which BAML takes as input to pass to a language model.
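For example, here is a hedged sketch of an image-taking function; BAML’s built-in `image` type accepts URLs or base64 payloads from the host language, and the names and client here are illustrative:

```baml
class SceneDescription {
  summary string
  objects string[]
}

function DescribeScene(img: image) -> SceneDescription {
  client "openai/gpt-4o"  // any multimodal-capable model
  prompt #"
    {{ img }}
    Describe this scene and list the objects you can identify.
    {{ ctx.output_format }}
  "#
}
```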
BAML’s capabilities are primarily constrained by the underlying foundation model, not by its own design limitations. As foundation models continue to advance in multimodal capabilities, BAML can leverage these advancements.

