
How to Make Your Data Truly AI-Ready

Yoni Leitersdorf on the Semantic Layer, AI for BI, and the Future of Data Analytics.


Subscribe: Apple • Spotify • Overcast • Pocket Casts • AntennaPod • Podcast Addict • Amazon • RSS.

Yoni Leitersdorf, CEO of Solid, joins the podcast to demystify why simply pointing an LLM at a database for text-to-SQL doesn’t work. He explains the critical need for a semantic layer to provide business context, turning raw data into a “Rosetta Stone” that AI can actually understand. Yoni details how to automate the creation of this layer by leveraging signals from across the enterprise — from dbt repos to Slack conversations — and shares practical advice on setting expectations for AI adoption, emphasizing a human-in-the-loop approach to build trust and achieve real value.

Subscribe to the Gradient Flow Newsletter






Transcript

Below is a heavily edited excerpt, in Question & Answer format.

Technical Foundations and Challenges

Why can’t we just point an LLM at a database and have it generate perfect SQL queries?

While foundational models have become remarkably proficient at SQL generation over the past 8-9 months, they lack the business and technical context specific to an organization’s data. Even with a well-structured star schema, models make educated guesses when encountering ambiguous situations. For example, if a data warehouse has three different revenue tables, the model won’t know which one is the authoritative source for a specific business question. It also won’t understand company-specific acronyms—a column named “FSU” might be interpreted as “Florida State University” instead of an internal business term. The bottleneck has shifted from model capability to supplying sufficient structure and context about your data environment.
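To make the ambiguity concrete, here is a minimal sketch of how semantic context can resolve it before a question ever reaches the model. All table names, acronym expansions, and function names are hypothetical illustrations, not any particular product's API:

```python
# Sketch: why a raw schema alone is ambiguous, and how semantic context
# resolves it. Names below are hypothetical.

SCHEMA = ["rev_raw", "rev_adjusted", "finance_revenue_gold"]  # three "revenue" tables

# A semantic layer pins down the authoritative source and expands internal acronyms.
SEMANTIC_CONTEXT = {
    "authoritative_sources": {"revenue": "finance_revenue_gold"},
    "acronyms": {"FSU": "Forecasted Sales Units"},  # not "Florida State University"
}

def resolve_table(metric: str) -> str:
    """Return the authoritative table for a metric; without context, just guess."""
    return SEMANTIC_CONTEXT["authoritative_sources"].get(metric, SCHEMA[0])

def expand_acronyms(question: str) -> str:
    """Rewrite company-specific acronyms before the question reaches the LLM."""
    for short, full in SEMANTIC_CONTEXT["acronyms"].items():
        question = question.replace(short, full)
    return question
```

Without the context dictionary, `resolve_table` falls back to an arbitrary guess, which is exactly the failure mode described above.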

What makes enterprise data warehouses more challenging than people assume?

Even in established enterprise warehouses, you typically encounter multiple layers of data quality: raw “landing zone” data straight from source systems like Salesforce, cleaned-up middle layers, and heavily processed “gold” layers. The challenge is that ownership and responsibility for data quality is distributed across many people and systems. Unlike public internet content where creators have direct incentives to maintain quality, enterprise data often lacks clear ownership, especially when employees change roles or leave. Additionally, you’ll find overlapping data marts, legacy artifacts, and inconsistent naming conventions that create ambiguity even within well-designed schemas.

The Semantic Layer Concept

What exactly is a semantic layer in the context of AI applications?

A semantic layer acts as a translation layer—a “Rosetta Stone” between business questions and warehouse reality. It provides the AI with institutional knowledge that human analysts would typically accumulate over months or years: which tables are the authoritative sources for a given metric, what company-specific terms and acronyms mean, and how key business metrics are defined.

The primary consumer is the AI system, but humans use it for explainability and trust-building.
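As a rough illustration, a single semantic-layer entry might look like the following. The field names and the `matches` helper are assumptions for the sketch, not any specific vendor's format:

```python
# A minimal sketch of what one semantic-layer entry might capture
# (field names are illustrative, not a specific product's schema).

quarterly_revenue = {
    "name": "quarterly_revenue",
    "description": "Recognized revenue, summed by fiscal quarter.",
    "source_table": "finance_revenue_gold",  # the authoritative table
    "measure": "SUM(recognized_amount)",
    "grain": "fiscal_quarter",
    "synonyms": ["sales", "top line"],       # terms business users actually say
    "owner": "finance-data-team",
}

def matches(entry: dict, term: str) -> bool:
    """Check whether a business term maps to this metric."""
    term = term.lower()
    return term == entry["name"] or term in entry["synonyms"]
```

The `synonyms` field is what lets the AI connect a vague business question ("how's top line doing?") to a precise, authoritative query.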

In what form does the semantic layer actually exist across different platforms?

The semantic layer takes different forms depending on your data stack, and automation keeps those forms synchronized as underlying schemas and models evolve, preventing drift.
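A drift check of the kind that automation would run whenever models change can be sketched as a simple set comparison. The function name and record shapes are illustrative assumptions:

```python
# Sketch: detecting drift between warehouse tables and semantic-layer entries,
# the kind of check automation would run as schemas evolve. Hypothetical names.

def find_drift(semantic_tables: set[str], warehouse_tables: set[str]) -> dict:
    """Entries referencing dropped tables, and new tables with no semantics yet."""
    return {
        "stale": sorted(semantic_tables - warehouse_tables),
        "undocumented": sorted(warehouse_tables - semantic_tables),
    }
```

A real pipeline would feed this from the warehouse's information schema and the semantic layer's own manifest, then open review tasks for each discrepancy.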

Building and Automating the Semantic Layer

How do you automate the creation of the semantic layer? What signals and data sources do you use?

Automation is achieved by analyzing a wide range of enterprise signals to infer data quality, relevance, and business context without relying solely on manual documentation. Key sources include SQL query logs, dbt and other code repositories, Slack and similar communication history, and ticketing systems such as Jira.

By building graphs of these interactions, you can identify “star analysts,” understand which datasets are trusted for specific domains, and automatically generate a significant portion of the semantic layer.
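The interaction graph described above can be sketched in a few lines. The log format, threshold, and function names are assumptions for illustration; a real system would mine these pairs from SQL query history:

```python
from collections import defaultdict

# Sketch: inferring "star analysts" from query logs. Each log entry is a
# hypothetical (analyst, table) pair mined from SQL history.

def build_trust_graph(query_log):
    """Count how often each analyst queries each table."""
    edges = defaultdict(int)
    for analyst, table in query_log:
        edges[(analyst, table)] += 1
    return edges

def star_analysts(edges, min_queries=3):
    """Analysts whose total query volume exceeds a threshold."""
    totals = defaultdict(int)
    for (analyst, _), n in edges.items():
        totals[analyst] += n
    return {a for a, n in totals.items() if n >= min_queries}
```

The same graph, read from the table side, tells you which datasets the most trusted analysts actually rely on for a given domain.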

What signals indicate trustworthy data in an enterprise environment?

Key trust indicators include how frequently a dataset is queried, who queries it (datasets favored by respected “star analysts” carry more weight), and whether it is actively maintained.

For new models without usage history, change signals from dbt repositories or Jira tickets indicate intent (e.g., “table X replaces deprecated table Y”).
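One way to combine these signals is a weighted score. The weights, signal names, and thresholds below are illustrative assumptions, not a published formula:

```python
# Sketch: combining usage, freshness, and change-intent signals into a rough
# trust score. Weights and cutoffs are illustrative assumptions.

def trust_score(queries_90d: int, days_since_update: int, replaces_deprecated: bool) -> float:
    usage = min(queries_90d / 100, 1.0)           # heavy recent use -> more trust
    freshness = 1.0 if days_since_update <= 30 else 0.5
    intent = 0.2 if replaces_deprecated else 0.0  # e.g. "table X replaces table Y"
    return round(usage * 0.6 + freshness * 0.3 + intent, 2)
```

Note the `intent` term: it is what lets a brand-new table with no usage history still outrank the deprecated table it replaces.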

Implementation and Rollout Strategy

What does a realistic implementation process look like, and how should teams manage expectations about accuracy?

Implementation should be gradual and typically takes 6-8 weeks. The system will not be 100% accurate on day one—expect 75-85% accuracy initially, with the remaining gap closed through human-in-the-loop validation.

The process includes:

  1. Warm-up period (2 weeks): The system analyzes historical data from connected sources, with lookback periods of roughly 3 months for SQL logs, a year for communication history, multiple years for ticket systems, and complete history for code repositories
  2. Staged rollout:
    • Data engineering teams: Initial validation of technical accuracy
    • Data analysts/modelers: Business context refinement and validation
    • Business stakeholders: Final rollout only after technical teams build trust

This phased approach is essential for maintaining trust. If business users receive incorrect answers early on, they will lose confidence and abandon the tool.
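The warm-up lookback windows and rollout order above can be captured as a small configuration, of the sort an ingestion job might consume. The key names are assumptions; the values mirror the text:

```python
from datetime import timedelta

# Sketch: the warm-up lookback windows and staged rollout described above,
# as a config an ingestion job might read. Key names are assumed.

LOOKBACK = {
    "sql_logs": timedelta(days=90),          # ~3 months of query history
    "communications": timedelta(days=365),   # ~1 year of Slack/email threads
    "tickets": timedelta(days=365 * 3),      # multi-year ticket history
    "code_repos": None,                      # None = complete history
}

ROLLOUT_STAGES = ["data_engineering", "data_analysts", "business_stakeholders"]
```

Encoding the stage order explicitly makes it harder to skip straight to business users before technical validation is done.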

Where should teams start when implementing a semantic layer?

Start narrow and specific—select one or two business domains like sales and marketing, then focus on particular use cases within those domains. You don’t need to “boil the ocean.” Instead, prove value in a focused area and expand gradually. This approach allows you to build expertise and refine processes before tackling broader organizational challenges.

Trust, Accuracy, and Human Validation

How do you build and maintain trust in the system when you know it won’t be perfect?

Trust is established through multiple mechanisms: exposing the generated SQL so it can be inspected, human-in-the-loop validation of outputs, and a staged rollout in which technical teams vet the system before business users touch it.

Business users may not understand the underlying SQL, but when internal data teams validate outputs and champion the system’s use, broader trust follows.

What role do humans play in maintaining these AI analytics systems?

Humans remain essential for validation and continuous improvement. While AI can achieve 75-80% accuracy initially, humans need to validate generated answers, correct errors, and supply the business context the system cannot infer on its own.

Unlike traditional data catalogs where humans had to document everything upfront, AI systems do most of the heavy lifting and request human input only when needed—typically for about 15-20% of semantic understanding.
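The review workflow described above amounts to routing only low-confidence inferences to people. A minimal sketch, assuming a per-entry confidence score and a threshold chosen for illustration:

```python
# Sketch: route only low-confidence semantic inferences to humans, so the AI
# does the heavy lifting and people review the uncertain tail.
# Record shape and threshold are illustrative assumptions.

def route(inferences, threshold=0.8):
    """Split inferred semantic-layer entries into auto-accepted vs. human review."""
    accepted = [i for i in inferences if i["confidence"] >= threshold]
    review = [i for i in inferences if i["confidence"] < threshold]
    return accepted, review
```

Tuning the threshold trades human workload against the risk of shipping a wrong definition, which is the core of the human-in-the-loop design.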

Platform Integration and Data Stack Considerations

How do you handle permissions and row-level security with AI-generated queries?

Leave security enforcement to the underlying data platform (Databricks, Snowflake, BigQuery). The semantic layer tells the AI how to answer questions, while the data platform applies who can see what. This separation of concerns ensures that existing security policies remain intact and are consistently enforced regardless of how data is accessed.
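The separation of concerns can be sketched as follows. `run_as_user` and `toy_engine` are stand-ins for executing a query with the requesting user's own credentials (e.g. credential passthrough), not a real Snowflake, Databricks, or BigQuery API:

```python
# Sketch of the separation of concerns: the semantic layer decides HOW to
# answer; the warehouse enforces WHO sees what. All names are hypothetical.

def generate_sql(question: str) -> str:
    """The semantic layer's job: map a question to the authoritative query."""
    return "SELECT fiscal_quarter, SUM(recognized_amount) FROM finance_revenue_gold GROUP BY 1"

def run_as_user(sql: str, user: str, engine) -> list:
    """The platform's job: row-level security applies because we run as `user`."""
    return engine(sql, user)  # existing RLS policies filter rows automatically

def toy_engine(sql, user):
    """A toy engine that mimics row-level filtering by region."""
    rows = [("Q1", 100, "emea"), ("Q1", 200, "amer")]
    allowed = {"alice": "emea", "bob": "amer"}
    return [r for r in rows if r[2] == allowed[user]]
```

Because the AI never holds a privileged service account, the same generated SQL returns different rows for different users, and no security logic is duplicated in the semantic layer.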

Can you create one semantic layer across multiple platforms like Snowflake, Databricks, and BigQuery?

While enterprises desire unified semantic layers across platforms, most current success comes from domain-scoped layers per platform, aligned through shared business definitions. Cross-system layers are emerging but remain complicated by governance boundaries and organizational ownership patterns. The practical approach is to maintain consistency in business definitions while allowing platform-specific implementations.

Lessons from Previous Approaches

Data catalogs had similar goals but struggled with adoption. What lessons apply, and what makes semantic layers different?

Traditional data catalogs failed for two key reasons:

  1. Manual documentation burden: They required humans to hand-document everything, creating unsustainable maintenance overhead
  2. Wrong demand source: They were primarily data team initiatives that struggled to gain business traction

The current approach addresses both issues: AI automates most of the documentation work, and demand now comes from business users who want direct question-answering, not just from data teams.

Additionally, semantic layers aren’t just about discovery—they enable direct question-answering, providing immediate value rather than requiring users to navigate to separate documentation systems.

User Experience and Education

What user education is required for effective AI analytics adoption?

Users often treat AI chat interfaces like keyword search engines, which reduces effectiveness. Successful implementations require education on phrasing complete, well-formed questions with business context rather than typing bare keywords.

Better prompting reduces back-and-forth interactions and improves system reliability.

Future Applications and Agentic Workflows

Where does this foundational work on semantic layers lead? What advanced capabilities does it unlock?

Making data accessible through reliable AI interfaces is the critical first step toward more autonomous systems. The long-term vision enables AI agents that can proactively use data to drive business outcomes.

For example, organizations are developing AI agents attached to marketing campaigns that continuously monitor performance by analyzing warehouse data. Based on their analysis, these agents propose campaign adjustments, present them for human approval, and automatically implement approved changes. The semantic layer provides the foundational, context-aware data access that makes such agentic behavior both safe and useful.
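The monitor, propose, approve, apply loop in the campaign example can be sketched as follows. All metric names, thresholds, and actions are hypothetical; a real agent would query the warehouse through the semantic layer rather than reading a dict:

```python
# Sketch of the monitor -> propose -> approve -> apply loop described above.
# Metric names, thresholds, and actions are illustrative assumptions.

def monitor_campaign(metrics: dict) -> list:
    """Propose adjustments when performance drifts past a tolerance band."""
    proposals = []
    if metrics["cost_per_lead"] > metrics["target_cpl"] * 1.2:
        proposals.append({"action": "reduce_bid", "by": 0.10})
    return proposals

def run_cycle(metrics, approve):
    """Apply only the proposals a human approves (human-in-the-loop gate)."""
    return [p for p in monitor_campaign(metrics) if approve(p)]
```

The `approve` callback is the safety valve: the agent can analyze and propose continuously, but nothing changes without an explicit human yes.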

Other emerging applications include automated anomaly detection, proactive reporting, and intelligent data quality monitoring—all requiring the same foundational semantic understanding.

Build vs. Buy Considerations

Should teams build semantic layer capabilities internally or work with vendors?

You can build internally, but expect significant ongoing work: assembling signals across systems, maintaining currency as schemas evolve, monitoring chat interactions, and closing feedback loops. Successful internal implementations require treating this as a long-term capability-building exercise rather than a one-time project.

Vendors who observe patterns across multiple deployments tend to reach reliability faster and maintain lower ongoing burdens. However, whether building or buying, plan for the same implementation lifecycle: bootstrap with historical data, stage rollouts, implement human-in-the-loop validation, establish observability, and maintain continuous updates.

How should teams approach ROI evaluation for AI analytics projects?

Set appropriate expectations upfront. Many early pilots failed because organizations expected “magic”—plug-and-play solutions that would work perfectly immediately. Successful implementations treat this as capability-building that requires investment in semantic layers, user education, and ongoing maintenance.

ROI comes from enabling broader access to data insights and reducing bottlenecks on technical analysts, but it requires treating the AI system as a team member that needs training and support rather than a turnkey solution. Organizations that understand this investment model see better outcomes and more sustainable implementations.
