Building the Knowledge Layer Your Agents Need

Philip Rathle on GraphRAG, Knowledge Graphs, AI Agents, and Agentic Memory.

Subscribe: Apple • Spotify • Overcast • Pocket Casts • AntennaPod • Podcast Addict • Amazon • RSS.

This episode features Philip Rathle, CTO of Neo4j, discussing the real-world state of GraphRAG and its journey from prototype to production in enterprise AI systems. Philip shares success stories from companies like Walmart, Uber, and Adobe while addressing the challenges teams face with graph building tools and knowledge graph creation. The conversation explores why knowledge graphs are becoming essential for AI agents, covering topics like agentic memory, deterministic reasoning, governance, and why enterprises should avoid replicating human organizational structures when designing multi-agent systems.

Subscribe to the Gradient Flow Newsletter


Transcript

Below is a heavily edited excerpt, in Question & Answer format.

Understanding GraphRAG

What is GraphRAG, and why did it emerge?

GraphRAG (graph-based retrieval-augmented generation) is a technique that enhances standard vector-based RAG by organizing retrieved context within a knowledge graph structure. The core idea originated from Microsoft Research’s observation that standard vector RAG often lacks sufficient context and structure, leading to less accurate or incomplete answers.

The original process involves taking unstructured data (like documents), extracting key entities and their relationships (subject-verb-object patterns), and modeling this as a graph where entities become nodes connected by relationship edges. The system also creates a graph of document chunks, with explicit links showing which entities appear in which chunks. When a query is made, this structured, connected context is fed to the LLM alongside the retrieved text, enabling higher-quality, more accurate answers.
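The data model described above can be sketched with plain Python dicts standing in for a graph database. This is an illustration only, not Neo4j's implementation; all entity names, chunk IDs, and the `retrieve` helper are invented.

```python
# Minimal sketch of the GraphRAG data model: an entity graph with
# subject-verb-object edges, a chunk store, and explicit links showing
# which entities appear in which chunks. All names are illustrative.

entities = {"Acme Corp", "Jane Doe", "Widget X"}
relations = [
    ("Jane Doe", "WORKS_FOR", "Acme Corp"),
    ("Acme Corp", "MAKES", "Widget X"),
]

chunks = {
    "chunk-1": "Jane Doe joined Acme Corp in 2021.",
    "chunk-2": "Acme Corp makes Widget X.",
}
mentions = {
    "chunk-1": {"Jane Doe", "Acme Corp"},
    "chunk-2": {"Acme Corp", "Widget X"},
}

def retrieve(chunk_id):
    """Return a chunk's text plus the structured context linked to it."""
    linked = mentions[chunk_id]
    facts = [r for r in relations if r[0] in linked or r[2] in linked]
    return {"text": chunks[chunk_id], "entities": linked, "facts": facts}

context = retrieve("chunk-2")
```

Feeding `context` to the LLM alongside the raw chunk text is what gives the model the connected structure that plain vector retrieval lacks.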

The field has since expanded beyond this original pattern to encompass approximately five different approaches working against different types of graphs. Knowledge graphs can contain multiple layers—from metadata to actual data, from single-application to multi-application scope. The key insight is that seemingly unstructured data actually has implied structure. Humans naturally extract relationships when reading, and knowledge graphs formalize this as “a distillation of relationships between things, which is a form of context for the data that matters.”

What’s the current state of GraphRAG adoption—are teams actually moving from prototype to production?

Yes, there is substantial movement from prototype to production, though it’s still early days. Several major companies have publicly shared production deployments: Walmart built an agentic AI system that processes feedback from 1.6 million employees using a graph with hundreds of data sources. Sanofi achieved a 50x improvement in identifying drug compounds, potentially cutting a year off their 10-year drug discovery cycle. Companies like Uber, Adobe, Intuit, and Comcast are also running production systems.

The common pattern is that teams often start with just LLMs and vector-based RAG to build a promising prototype, but then hit walls around hallucinations, lack of explainability, inability to apply access controls, and insufficient accuracy. The unlock for these successes has been adding a knowledge graph as an AI knowledge layer. This graph becomes the component that often makes the difference between the 70-80% of projects that fail and the ones that succeed in reaching production.

What enterprise requirements does GraphRAG enable that vector RAG alone struggles with?

GraphRAG addresses several critical enterprise requirements that cause standard vector RAG implementations to stall:

  • Accuracy and precision: Fine-grained context and deterministic reasoning paths reduce hallucinations and improve answer quality
  • Explainability: You can traverse paths visually and show exactly which relationships and entities were used to arrive at an answer, making decisions auditable
  • Access controls: Graphs naturally support fine-grained permissions and security policies that can be enforced at the entity and relationship level
  • Governance: Deterministic graph queries enable policy checks and rule enforcement that can be inspected and validated
  • Freshness: Knowledge graphs can be updated in real-time, keeping enterprise data current rather than stale
  • Determinism: For questions that require exact answers rather than probabilistic generation, graph queries provide reliable, repeatable results

These capabilities are essential for moving AI applications from prototype to production in regulated or high-stakes enterprise environments.
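One of the requirements above, relationship-level access control, can be illustrated with a small sketch. The role names, edge layout, and `visible_edges` filter are invented for illustration; real systems would enforce this inside the database.

```python
# Hedged sketch of fine-grained access control: each relationship edge
# carries a required role, and traversal deterministically filters to
# what a given role may see. Roles and edges are illustrative.

edges = [
    # (source, relation, target, required_role)
    ("Report Q3", "AUTHORED_BY", "Alice", "employee"),
    ("Report Q3", "CONTAINS", "Salary Data", "hr"),
]

# Which required_role labels each role is allowed to traverse.
ROLE_GRANTS = {"employee": {"employee"}, "hr": {"employee", "hr"}}

def visible_edges(role):
    """Deterministically filter the graph to a role's permitted edges."""
    return [e for e in edges if e[3] in ROLE_GRANTS[role]]
```

Because the filter runs before any context reaches the model, an agent answering on behalf of an `employee` simply never sees the salary edge.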

Success Factors and Patterns

What separates successful GraphRAG implementations from failed ones?

Successful teams consistently exhibit two key characteristics:

First, they avoid “boiling the ocean”—they don’t spend years trying to build a perfect, comprehensive knowledge graph before deriving any value. Instead, they start with a high-value business problem, figure out how to apply AI to solve it, and realize during that journey that their agents need access to knowledge that is up-to-date, low-latency, and able to handle structured, unstructured, and semi-structured data from multiple applications.

Second, they typically don’t even start with graphs in their initial architecture. The pattern is: teams create a decent prototype with just vector RAG, but without a knowledge graph, they can’t meet enterprise requirements for accuracy, security, access controls, and explainability. They discover the need during the prototype phase when struggling with these requirements. The graph becomes the unlock that bridges the gap between prototype and production.

The philosophy is to pick a high-value use case, stand up a starter knowledge graph, and iterate. Successful teams accept that the graph will evolve but insist on enterprise requirements (freshness, access controls, explainability, determinism) from day one. Teams that wait for a perfect, enterprise-wide ontology usually never ship.

Building Knowledge Graphs

What’s the current state of graph-building tools?

The tools have improved dramatically over the past year, though they continue to evolve rapidly. Modern tools like Neo4j’s open-source LLM Graph Builder can now take unstructured content (like a web page) and with a few clicks perform three key tasks:

  1. Entity and link extraction: Use an LLM to identify key entities (people, companies, concepts) and the relationships between them, structuring this into a domain graph
  2. Vector chunking: Break the text into manageable chunks and create vector embeddings for semantic search
  3. Entity-chunk linking: Map which entities are referenced in which chunks, synchronizing retrieval and reasoning

For structured data sources, the tools have become remarkably efficient. You can now move data from systems like Snowflake into a graph model in less than 10 minutes—a process that used to take a professional services team about a week. The tools use AI to infer referential integrity between tables when it’s not explicitly defined, which is common in data warehouses.

These tools provide visual explainability, allowing you to inspect what the model extracted and why a particular answer was given. This transparency is crucial for debugging and refinement.

How accurate are automated graph-building tools, and how do you control hallucinations?

Out of the box, these tools can achieve around 60-70% accuracy on unstructured data. However, you can significantly improve this accuracy by providing schema-like guidance to the LLM. For example, specifying patterns like “look for a Person who works for a Company” or “Board member on the board of a Company” coerces the model to extract the specific structure you care about and substantially reduces hallucinations.

If you have existing structured datasets with defined ontologies—such as product hierarchies or customer taxonomies—you can reference those during unstructured entity extraction to improve accuracy and consistency. This approach minimizes invented entities and relationships by grounding the extraction process in known, validated concepts.

The key principle is to specify what to look for rather than letting the model extract arbitrary entities. This guided extraction approach provides both better precision and reduced hallucination rates.
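The guided-extraction principle can be sketched as a post-filter on LLM output: keep only proposed triples that match an allowed (subject type, relation, object type) pattern. The schema patterns and candidate triples below are invented for illustration.

```python
# Sketch of "specify what to look for": discard any LLM-proposed triple
# whose typed pattern is outside the schema, which bounds hallucinated
# entities and relationships. Types and triples are illustrative.

ALLOWED_PATTERNS = {
    ("Person", "WORKS_FOR", "Company"),
    ("Person", "BOARD_MEMBER_OF", "Company"),
}

def filter_triples(candidates):
    """candidates: (subj, subj_type, relation, obj, obj_type) tuples."""
    return [
        (s, r, o)
        for s, st, r, o, ot in candidates
        if (st, r, ot) in ALLOWED_PATTERNS
    ]

raw = [
    ("Jane Doe", "Person", "WORKS_FOR", "Acme Corp", "Company"),
    ("Jane Doe", "Person", "LIKES", "Tennis", "Hobby"),  # outside schema
]
kept = filter_triples(raw)
```

In practice the same patterns are also passed to the model as a prompt-side schema, so the filter acts as a second, deterministic line of defense.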

Can teams without graph expertise successfully build knowledge graphs today?

Yes, with caveats. Many smart startup founders and teams are self-sufficient and figure it out on their own. The tools have improved to the point where teams can reach a “starter knowledge graph” without deep expertise. However, for production deployments on more serious applications, it’s still recommended to have an expert spend a few hours providing architectural guidance to avoid common modeling and query pitfalls—not a full-time team, just occasional expert input.

The reality check here is important: most developers struggle with SQL and relational databases, let alone graphs. Better tools are essential for accessibility. Within a year, the tools will likely be good enough that teams can achieve production-grade results without any external expertise. The trajectory is toward enabling teams to build graph applications as naturally as other development tasks.

How do graph-building tools handle both structured and unstructured data sources?

For structured data, you can achieve 100% fidelity when bringing data into a graph. The challenge is often that systems like Snowflake don’t have explicit referential integrity defined between tables, so modern tools use AI to infer these relationships based on naming patterns, data types, and content analysis.

For unstructured data, the process involves using models for entity extraction with the guidance mechanisms described earlier—specifying patterns and referencing existing ontologies to minimize hallucinations.

A critical principle is that you don’t need to move 100% of your data into a graph. A common pattern is to take 5-10% of the signal from your various systems—like customer IDs and key facts from eight different customer systems—stitch them together in the graph, and leave the rest of the data where it is. This approach provides the cross-system reasoning capability without the overhead of centralizing all enterprise data. It’s neither a complete data consolidation nor pure federation—it’s selective integration of the most valuable signals.
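The selective-integration pattern can be sketched as stitching a handful of key fields from several systems into one graph node, leaving the bulk of each system's data in place. The system names and fields below are invented for illustration.

```python
# Sketch of taking 5-10% of the signal: one graph node built from key
# fields across three stand-in systems, with everything else left where
# it lives. System names and fields are illustrative.

crm = {"cust-42": {"name": "Jane Doe", "tier": "gold"}}
billing = {"cust-42": {"balance_due": 0}}
support = {"cust-42": {"open_tickets": 2}}

def stitch(customer_id):
    """Build one cross-system graph node from a few high-value fields."""
    return {
        "id": customer_id,
        "name": crm[customer_id]["name"],
        "tier": crm[customer_id]["tier"],
        "balance_due": billing[customer_id]["balance_due"],
        "open_tickets": support[customer_id]["open_tickets"],
    }

node = stitch("cust-42")
```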

How do LLMs perform at generating graph queries compared to SQL?

Surprisingly, LLMs are often better at generating complex graph queries (like Cypher) than equivalent SQL, even though they’ve been trained on vastly more SQL code. Multiple production teams at companies like Comcast, Walmart, and Uber have observed that when users ask complex questions requiring multi-way joins, the resulting SQL queries become extremely long and often can’t be generated or executed properly. The equivalent graph queries are much shorter and more reliable.

This advantage stems from the fact that graph models and queries naturally represent and traverse complex relationships. Multi-hop reasoning patterns—”connect the dots” operations—are intuitive to express in graph query languages, whereas they require intricate, multi-level joins in SQL that can become unwieldy. The graph query structure also mirrors the conceptual structure of the question more directly, making it easier for models to reason about.
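An illustrative comparison (not taken from the episode): the same two-hop question, "which other customers bought products that customer 42 bought?", written once in Cypher and once in SQL. Table names, labels, and relationship types are invented; the point is the relative compactness of the graph pattern.

```python
# The multi-hop pattern reads almost like the question in Cypher, while
# the SQL needs a chain of joins. Schema names are illustrative.

cypher = """
MATCH (:Customer {id: 42})-[:PLACED]->(:Order)-[:CONTAINS]->(:Product)
      <-[:CONTAINS]-(:Order)<-[:PLACED]-(other:Customer)
RETURN DISTINCT other.id
"""

sql = """
SELECT DISTINCT c2.id
FROM customers c1
JOIN orders o1      ON o1.customer_id = c1.id
JOIN order_items i1 ON i1.order_id = o1.id
JOIN order_items i2 ON i2.product_id = i1.product_id
JOIN orders o2      ON o2.id = i2.order_id
JOIN customers c2   ON c2.id = o2.customer_id
WHERE c1.id = 42 AND c2.id <> c1.id
"""
```

Each additional hop adds roughly one pattern element to the Cypher but two more joins to the SQL, which is why generation reliability diverges as questions get deeper.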

Graphs for AI Agents

Why are graphs particularly well-suited for AI agents?

Graphs provide several fundamental advantages for AI agent architectures:

First, agents need to answer arbitrary questions, and graphs make query generation easier, especially for complex queries. As discussed, LLMs generate more reliable graph queries than SQL for multi-hop reasoning tasks, even with less training data.

Second, agents often need a unified view across many different systems. A graph provides an ideal bird’s-eye view by stitching together key data from multiple sources without requiring complete data centralization. This isn’t about moving everything into one system—it’s about connecting the critical 5-10% that provides signal.

Third, agents benefit from different forms of reasoning. Graph-based reasoning allows for complex reasoning paths with multiple forks, where answers can be determined through pattern matching or “connecting the dots”—and this is fully deterministic and explainable. Not every question needs only a probabilistic answer; many business-critical decisions require deterministic, auditable reasoning.

Fourth, as agents perform complex tasks, their reasoning chains, actions, and even failures can be modeled as a graph, creating an auditable, traversable record that is invaluable for debugging and understanding agent behavior.

What specific advantages do graphs provide for agent architectures?

Beyond the fundamental fit, graphs naturally support several critical agent capabilities:

Explainability: You can traverse paths visually and make agent decisions auditable. When an agent answers a question, you can show exactly which graph relationships it used to arrive at that answer, providing transparency for users, auditors, and developers.

Multi-hop reasoning: As agents handle more complex tasks requiring cause-and-effect analysis across multiple steps, graphs provide a natural structure for these reasoning chains. The graph structure explicitly represents the logical dependencies between concepts.

Contextual precision: Graphs provide fine-grained distinction between concepts. Unlike vector embeddings where a tennis ball and an orange might appear similar without knowing why, graphs explicitly capture whether similarity is due to roundness, size, color, or other specific attributes. This precision helps agents retrieve the right context and avoid ambiguous matches.

Composability: Within a company, individual employees can each have memory relative to their work. An agent can access the sum total (or a controlled subset) of what groups of people have been working on, with appropriate access controls. This naturally scales from individual to team to company-wide scope, with the graph structure enabling fine-grained permission boundaries.

What role should databases play in agent memory systems?

There’s currently no consensus on agentic memory implementations—teams are trying many different approaches, and numerous startups are focused on this problem. However, the likely evolution is that databases will play a central role in memory systems. The fundamental capabilities of databases—persistence, access controls, uptime, reliability, and structured storage—are exactly what agent memory systems require.

Several agentic memory startups (Mem0, Zep, Cognosys, Letta) are already using databases as core components, with many specifically using graph databases. The different types of memory that agents need—episodic (events and experiences), procedural (how-to knowledge), semantic (facts and concepts), and temporal (state changes over time)—each benefit from database capabilities.

Graph databases have a particular advantage here: the way data is physically structured in a graph (nodes connected through relationships) mirrors how neurons are physically structured in the brain (connected through synapses). Human memory is inherently associative and relational, connecting concepts, events, and procedures. Graphs excel at representing this kind of connected knowledge.

Furthermore, graphs provide natural composability and access control patterns. You can model the memory of individual agents and then compose them to create shared team or organizational memory, with the persistence, access controls, and reliability that a production database provides. You can store memories as nodes and edges with provenance metadata and time-to-live policies, index text with vectors for recall, and get the best of both worlds—structured reasoning and semantic search.
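The memory pattern above (records with provenance and time-to-live, recalled only while valid) can be sketched in a few lines. The field names and the flat list store are invented for illustration; a real system would persist these as nodes and edges in a database.

```python
import time

# Hedged sketch of agent memory with provenance and TTL: each memory
# records where it came from and when it expires, and recall filters
# out expired entries. Field names and the store are illustrative.

def remember(store, agent, fact, source, ttl_seconds, now=None):
    """Append a memory with provenance and an expiry timestamp."""
    now = time.time() if now is None else now
    store.append({"agent": agent, "fact": fact, "source": source,
                  "created": now, "expires": now + ttl_seconds})

def recall(store, agent, now=None):
    """Return only this agent's still-valid memories."""
    now = time.time() if now is None else now
    return [m["fact"] for m in store
            if m["agent"] == agent and m["expires"] > now]

mem = []
remember(mem, "agent-a", "customer prefers email", "ticket-17",
         ttl_seconds=60, now=0)
remember(mem, "agent-a", "promo code expired", "campaign-3",
         ttl_seconds=10, now=0)
```

Swapping the list for graph storage adds the composability discussed above: team-level memory becomes a union of per-agent subgraphs under access control.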

Architecture and Governance

What’s the right approach to agent architecture, and how do you avoid over-complication?

A critical mistake is anthropomorphizing agent architectures by directly mapping human organizational structures to multi-agent systems. If you had 12 people with different roles working on a project, you don’t necessarily need 12 agents. This is “shipping your org chart as software,” Conway’s Law in action.

The human organizational model can be a useful starting point for decomposing tasks and understanding what skills are needed, but don’t artificially replicate everything a person does. For example, if your designer also drafts emails to senior designers, you don’t need to build an agent that replicates that entire workflow.

Instead, think modularly: start with a single orchestrator model that coordinates different parts of the task, calling deterministic tools when appropriate, and using specialized or fine-tuned models for specific domains. Many capabilities that seem like they need separate “agents” might be better implemented as tools or functions that a single orchestrating system calls. An agent’s “skill” might be better implemented as a reliable, external tool rather than as another agent it has to communicate with.

This approach avoids the complexity of multi-agent communication, which can quickly become unwieldy and difficult to govern and debug. The key principle is that machine workflows don’t need to mirror human workflows. The reasons we have multiple people—scaling limitations or distributed skills—don’t necessarily apply to agents in the same way. Only introduce multiple agents when there’s a clear systems benefit, not to mimic human organizational roles.
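The modular pattern described above can be sketched as a single orchestrator dispatching to tools rather than a mesh of communicating agents. The tool names, routing table, and customer fields are invented for illustration, and the model call is a stand-in stub.

```python
# Sketch of one orchestrator plus tools, instead of agent-to-agent
# messaging. Names and routing logic are illustrative.

def check_eligibility(customer):
    """Deterministic tool: a rule, not a model."""
    return customer["tier"] in {"gold", "platinum"}

def draft_reply(customer):
    """Stand-in for a call to a (possibly fine-tuned) model."""
    return f"Hello {customer['name']}, thanks for reaching out."

TOOLS = {"eligibility": check_eligibility, "reply": draft_reply}

def orchestrate(task, customer):
    """Route each task to the right tool; no inter-agent protocol."""
    return TOOLS[task](customer)

customer = {"name": "Jane", "tier": "gold"}
```

Each capability that might have been its own "agent" is just a function the orchestrator calls, which keeps the system testable and easy to govern.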

How should teams balance deterministic and probabilistic reasoning in their AI systems?

Don’t expect a single large model to handle all reasoning and decision-making. This is especially important for governance and control. For decisions that matter to your business, humans should retain the right to make or understand those decisions, which requires some level of determinism and explainability.

The neuro-symbolic approach combines both: use probabilistic models where appropriate, but also leverage deterministic reasoning (including graph-based reasoning) where you need explainability, accuracy guarantees, and control. Graph databases excel at deterministic “connect the dots” operations—pattern matching across relationships that yields consistent, auditable results.

Think about your AI system as having multiple reasoning components: models for orchestration and probabilistic tasks (natural language understanding, creative generation), deterministic tools including graph queries for structured reasoning (rules, policies, fact verification), and domain-specific fine-tuned models where needed. Each component plays to its strengths, and together they create a more reliable, explainable system than trying to force everything through a single large language model.

The practical implication: when you can solve part of your problem with a deterministic tool you control, you should. Don’t default to asking a large language model to probabilistically reason through something that has a clear, deterministic answer. Use LLMs for what they’re good at (understanding natural language, creative tasks) and graphs for what they’re good at (facts, logic, deterministic reasoning).

How do graphs contribute to AI governance and building more trustworthy systems?

Graphs enable the neuro-symbolic approach described above, which is fundamental to governance. Instead of relying solely on a probabilistic, black-box LLM for all reasoning, you combine it with a deterministic, symbolic system like a knowledge graph. For certain classes of questions, the agent can query the graph and get a fully deterministic, explainable, and auditable answer.

This allows you to build systems where humans retain agency and can understand the basis for critical decisions. You aren’t forced into a binary choice of either trusting a black-box model completely or not using AI at all. You can inspect graph queries, validate reasoning paths, and ensure that business rules and policies are enforced deterministically rather than hoping a model follows instructions.

Policy and rules can be implemented as graph queries that check eligibility, dependencies, access rights, or compliance requirements. These checks are transparent, can be tested independently, and provide audit trails. When an agent makes a decision, you can show both the probabilistic reasoning (LLM generation) and the deterministic checks (graph queries) that validated it, creating a complete, defensible decision record.
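A policy check of the kind just described can be sketched as a deterministic lookup over permission edges that also emits an audit record. The roles, actions, and edge layout are invented for illustration; in a graph database this would be a path query.

```python
# Sketch of policy-as-graph-check: approval requires a permission edge
# from the role to the action, and every decision yields an audit
# record. Roles, actions, and edges are illustrative.

policy_edges = {
    "analyst": {"read:reports"},
    "manager": {"read:reports", "approve:refunds"},
}

def policy_check(role, action):
    """Deterministic, independently testable, auditable; no model."""
    granted = action in policy_edges.get(role, set())
    audit = {"role": role, "action": action, "granted": granted}
    return granted, audit

granted, audit = policy_check("analyst", "approve:refunds")
```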

Keeping knowledge outside the model (in a graph) rather than solely in model weights also ensures freshness and allows updates without retraining, which is crucial for maintaining accurate, trustworthy systems as business conditions change.

What’s the relationship between semantic layers and knowledge graphs?

A semantic layer is effectively a graph of your schema and business concepts—the metadata that describes how data is organized and what terms mean. Many teams start with a semantic layer to help LLMs generate better SQL by understanding table relationships and business definitions.

The natural evolution is to include not just the schema but also selected data and relationships, transforming the semantic layer into a true knowledge layer. This expanded graph contains both the conceptual model (what entities and relationships exist) and actual instances of those entities with their connections. This knowledge layer supports both deterministic graph reasoning (traversing actual data) and improved RAG (entity-linked retrieval), providing richer capabilities than a schema-only semantic layer.