Supercharging AI with Graphs

Ben Lorica

2 years ago

Neo4j’s Philip Rathle on the Rise of GraphRAG and GQL.


Philip Rathle, CTO of Neo4j, joins the podcast to discuss the rising popularity of graph-enhanced retrieval augmented generation (GraphRAG). He shares real-world examples of companies using GraphRAG in production for applications like enterprise search, supply chain risk analysis, and criminal investigations. He also discusses the potential impact of the new GQL graph query language standard.


Related content:

A video version of this conversation is available on our YouTube channel.
Link to the demo that Philip showed.
Enhancing RAG with Knowledge Graphs: Blueprints, Hurdles, and Guidelines
Is Your Data Strategy Ready for Generative AI?
Emil Eifrem → The Future of Graph Databases
Semih Salihoglu → The Intersection of LLMs, Knowledge Graphs, and Query Generation
Joao Moura → Unleashing the Power of AI Agents

Transcript.

Below is a heavily edited excerpt, in Question & Answer format.

What is Neo4j and what does it bring to the AI landscape?

Neo4j is a graph database that powers applications with graphs and knowledge graphs. We’re the only graph database with vector search capabilities built in. This means we can handle both the structured relationship data that graphs excel at and the vector embeddings that are crucial for modern AI applications.

What exactly is GraphRAG and why is there growing interest in this area?

GraphRAG (graph-enhanced retrieval augmented generation) refers to patterns where knowledge graphs play a role in improving LLM outputs. Over the last year, we’ve seen existing Neo4j customers finding ways to integrate their graphs into their RAG pipelines. Many organizations hit a ceiling with basic LLM implementations, then fine-tune and hit another ceiling, then add vector-based RAG and hit yet another ceiling. Knowledge graphs and GraphRAG often emerge as the next logical step to improve performance further.

The interest is growing because knowledge graphs provide context, relationships, and structure that help ground LLM outputs in factual information, making answers more accurate and reliable.

What patterns are you seeing in how organizations implement GraphRAG?

There are multiple patterns emerging. On a functional level, we’re seeing:

Post-filtering – Using vectors to get initial results, then filtering or ranking them with graph-based centrality algorithms (similar to PageRank).
Pre-filtering – Starting with graph navigation to find relevant entities, then using vectors afterward.
Reasoning locus variation – Some implementations bring data back from the graph to inform the LLM, which then makes decisions. Others use the LLM to translate questions into graph queries, let the graph do exact calculations, then have the LLM translate results back to natural language.
Knowledge Graph context injection – Extracting relevant portions of a knowledge graph (e.g., everything three levels out from a particular entity) and injecting this as context for the LLM, allowing for informed creative assistance.
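The post-filtering pattern above can be sketched in a few lines of plain Python. This is a minimal illustration, not Neo4j's implementation: the helper names (`cosine`, `pagerank`, `post_filter`), the toy embeddings, and the link data are all assumptions made for the example. It takes the top-k candidates by vector similarity, then reorders them by a PageRank-style centrality score computed over the graph.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def pagerank(edges, nodes, damping=0.85, iterations=50):
    """Power-iteration PageRank over directed (source, target) edges."""
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            targets = out[n] or nodes  # dangling nodes spread rank evenly
            share = damping * rank[n] / len(targets)
            for t in targets:
                new[t] += share
        rank = new
    return rank

def post_filter(query_vec, embeddings, edges, k=3):
    """Vector search first, then rerank the candidates by graph centrality."""
    nodes = list(embeddings)
    candidates = sorted(
        nodes, key=lambda n: cosine(query_vec, embeddings[n]), reverse=True
    )[:k]
    rank = pagerank(edges, nodes)
    return sorted(candidates, key=lambda n: rank[n], reverse=True)

# Hypothetical toy data: four documents and the links between them.
embeddings = {
    "doc_a": [1.0, 0.0],
    "doc_b": [0.9, 0.1],
    "doc_c": [0.8, 0.2],
    "doc_d": [0.0, 1.0],
}
edges = [
    ("doc_a", "doc_c"),
    ("doc_b", "doc_c"),
    ("doc_d", "doc_c"),
    ("doc_c", "doc_b"),
]
results = post_filter([1.0, 0.0], embeddings, edges, k=3)
print(results)  # -> ['doc_c', 'doc_b', 'doc_a']
```

Here doc_a is the closest vector match, but doc_c is the most linked-to document, so it moves to the top after the graph-based reranking.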

What’s the difference between domain graphs and lexical graphs in the context of RAG?

Domain graphs represent your world model – the entities and relationships in your specific domain (like people, roles, and offices in HR, or computers in a network for cybersecurity).

Lexical graphs, which surprised us in their effectiveness, represent the structure of your documents and chunks. These capture relationships like document → pages → chunks, or the sequence of chunks, or whether content appears in a table, margin, footnote, or header. Microsoft’s research showed substantially better accuracy when representing vectorized content as a graph rather than treating chunks independently.

Our tools create both types of graphs, and they can work together to provide even better results.
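As a rough illustration of what a lexical graph holds, the sketch below builds document → pages → chunks relationships plus the chunk sequence. The relationship names (`HAS_PAGE`, `HAS_CHUNK`, `NEXT`), the ID scheme, and both functions are assumptions chosen for the example, not the schema that Neo4j's tools produce.

```python
def build_lexical_graph(doc_id, pages):
    """Build a lexical graph. `pages` is a list of pages, each a list of chunk strings."""
    nodes = {doc_id: {"label": "Document"}}
    edges = []  # (source, relationship, target)
    previous_chunk = None
    for p, chunks in enumerate(pages, start=1):
        page_id = f"{doc_id}/p{p}"
        nodes[page_id] = {"label": "Page", "number": p}
        edges.append((doc_id, "HAS_PAGE", page_id))
        for c, text in enumerate(chunks, start=1):
            chunk_id = f"{page_id}/c{c}"
            nodes[chunk_id] = {"label": "Chunk", "text": text}
            edges.append((page_id, "HAS_CHUNK", chunk_id))
            if previous_chunk is not None:
                # The reading order is kept even across page boundaries.
                edges.append((previous_chunk, "NEXT", chunk_id))
            previous_chunk = chunk_id
    return nodes, edges

def neighbors_in_sequence(chunk_id, edges):
    """Return the chunk together with the chunks directly before and after it."""
    before = [s for s, r, t in edges if r == "NEXT" and t == chunk_id]
    after = [t for s, r, t in edges if r == "NEXT" and s == chunk_id]
    return before + [chunk_id] + after

# Hypothetical two-page document.
nodes, edges = build_lexical_graph("report", [["Intro.", "Scope."], ["Findings."]])
window = neighbors_in_sequence("report/p1/c2", edges)
print(window)  # -> ['report/p1/c1', 'report/p1/c2', 'report/p2/c1']
```

When a vector search hits one chunk, the `NEXT` relationships make it possible to pull in the surrounding chunks rather than treating the hit as an isolated piece of text.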

How is Neo4j making GraphRAG more accessible to developers who aren’t graph experts?

We’ve developed tools that make it easier to get started with GraphRAG without deep graph expertise. We recently demonstrated a tool that automatically constructs knowledge graphs from various sources. You can upload PDFs, web pages, and YouTube videos, specify entities of interest, and the tool automatically builds both domain and lexical graphs.

This democratizes the technology – developers can try it themselves and compare results between basic RAG and knowledge graph-enhanced RAG approaches. While graph sophisticates can go deeper with custom ontologies and specialized algorithms, our goal is to make the entry point accessible to anyone building AI applications.

What real-world use cases are companies implementing with GraphRAG?

Several customers are approaching production or already in production:

Enterprise knowledge search – Companies in oil & gas and fintech are using graphs to augment vector search, both for ranking results (similar to how PageRank improved web search) and for navigating knowledge graphs to find information.
Supply chain risk assessment – A manufacturing customer is democratizing access to supply chain risk information through a text interface, using the LLM to translate questions into graph queries, with the graph handling complex multi-level calculations.
Criminal investigation assistance – Data Squared helps investigative agencies by extracting relevant portions of knowledge graphs around cases, translating this to context for LLMs, and enabling investigators to chat with their evidence to get creative assistance and potential leads.
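The context-extraction step in the investigation use case can be sketched as a bounded breadth-first walk. This is only an illustration under stated assumptions: the triples, the entity names, and the functions `k_hop_subgraph` and `to_context` are hypothetical, and a production system would run the traversal inside the graph database rather than in application code.

```python
from collections import deque

def k_hop_subgraph(triples, start, max_hops=3):
    """Collect the triples reached by walking at most `max_hops` steps from `start`."""
    adjacency = {}
    for s, r, t in triples:
        adjacency.setdefault(s, []).append((s, r, t))
        adjacency.setdefault(t, []).append((s, r, t))
    depth = {start: 0}
    queue = deque([start])
    selected, seen = [], set()
    while queue:
        node = queue.popleft()
        if depth[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for triple in adjacency.get(node, []):
            if triple not in seen:
                seen.add(triple)
                selected.append(triple)
            s, _, t = triple
            other = t if s == node else s
            if other not in depth:
                depth[other] = depth[node] + 1
                queue.append(other)
    return selected

def to_context(triples):
    """Serialize triples as plain-text lines that can be placed in an LLM prompt."""
    return "\n".join(f"{s} {r} {t}" for s, r, t in triples)

# Hypothetical case graph.
triples = [
    ("Case-17", "INVOLVES", "Alice"),
    ("Alice", "CALLED", "Bob"),
    ("Bob", "OWNS", "Car-9"),
    ("Car-9", "SEEN_AT", "Dock-4"),
    ("Dock-4", "NEAR", "Warehouse-2"),
]
subgraph = k_hop_subgraph(triples, "Case-17", max_hops=3)
context = to_context(subgraph)
print(context)
```

With `max_hops=3`, the walk stops at Car-9, so the facts about Dock-4 and Warehouse-2 stay out of the prompt. Raising the limit widens the context at the cost of a longer prompt.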

How does Neo4j’s vector search capability work with graphs?

From a database perspective, vectors are just another property that can be stored on a node, alongside other properties like names or IDs. We’ve added a vector datatype and indexes that can perform similarity calculations.

While most databases now support vectors, having vectors and graphs in the same database enables powerful combinations. You can perform vector searches and graph searches (pattern matching or algorithm-based) in the same place with fewer calls. This is particularly valuable for lexical graphs where the structure of your vectorized content is represented graphically.
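To make the idea concrete, here is a small in-memory sketch in which an embedding is stored as one more property on a node and a graph pattern match and a vector search run against the same store. The `TinyGraph` class and its methods are hypothetical stand-ins written for this example; they are not Neo4j's API and do not reflect how its vector index works internally.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class TinyGraph:
    """Toy property graph: nodes carry arbitrary properties, including vectors."""

    def __init__(self):
        self.nodes = {}  # node id -> property dict
        self.edges = []  # (source id, relationship type, target id)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_edge(self, source, rel_type, target):
        self.edges.append((source, rel_type, target))

    def match(self, source, rel_type):
        """Graph search: targets reachable from `source` over `rel_type`."""
        return [t for s, r, t in self.edges if s == source and r == rel_type]

    def similar(self, query, candidates, k=2):
        """Vector search restricted to the given candidate nodes."""
        scored = [
            (cosine(query, self.nodes[n]["embedding"]), n)
            for n in candidates
            if "embedding" in self.nodes[n]
        ]
        scored.sort(reverse=True)
        return [n for _, n in scored[:k]]

g = TinyGraph()
g.add_node("handbook", title="HR Handbook")
g.add_node("memo", title="Office Memo")
g.add_node("c1", text="Vacation policy", embedding=[0.9, 0.1])
g.add_node("c2", text="Expense policy", embedding=[0.1, 0.9])
g.add_node("c3", text="Holiday schedule", embedding=[1.0, 0.0])
g.add_edge("handbook", "HAS_CHUNK", "c1")
g.add_edge("handbook", "HAS_CHUNK", "c2")
g.add_edge("memo", "HAS_CHUNK", "c3")

# Narrow by graph structure first, then rank by vector similarity.
candidates = g.match("handbook", "HAS_CHUNK")
hits = g.similar([1.0, 0.0], candidates, k=1)
print(hits)  # -> ['c1']
```

Chunk c3 is the closest vector overall, but it belongs to a different document, so the graph match excludes it before the similarity ranking runs. Both steps happen against one store, with no second system to call.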

Some customers start with separate vector databases and later add Neo4j for graph capabilities, which is also a valid approach. We work well with existing vector stores like Elasticsearch, Pinecone, or Weaviate.

What is GQL and why is it significant for the database world?

GQL (Graph Query Language) is only the second database query language that ISO has deemed important enough to standardize, after SQL. Published just two weeks ago after about five years of development, GQL reaffirms that the graph model is a fundamentally different way of representing data.

This standardization means that graph skills become more reusable, vendors can’t lock customers in, and graphs will become more pervasive as people recognize more problems as graph problems. The standard includes rules for mapping tables into graphs, making it easier to transition between relational and graph models.
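The general idea of mapping tables into graphs can be illustrated with one common convention: each row becomes a node and each foreign key becomes a relationship. The sketch below shows that convention only. It is not the mapping defined in the standard, and the function name, table names, and relationship type are assumptions made for the example.

```python
def rows_to_graph(table_name, rows, key, foreign_keys):
    """Map table rows to nodes and foreign-key columns to relationships.

    `foreign_keys` maps a column name to (relationship type, target table).
    """
    nodes, edges = {}, []
    for row in rows:
        node_id = (table_name, row[key])
        # Ordinary columns become node properties.
        nodes[node_id] = {c: v for c, v in row.items() if c not in foreign_keys}
        # Foreign-key columns become relationships to rows in other tables.
        for column, (rel_type, target_table) in foreign_keys.items():
            if row.get(column) is not None:
                edges.append((node_id, rel_type, (target_table, row[column])))
    return nodes, edges

# Hypothetical HR table with a foreign key to an office table.
employees = [
    {"id": 1, "name": "Ada", "office_id": 10},
    {"id": 2, "name": "Grace", "office_id": None},
]
nodes, edges = rows_to_graph(
    "employee", employees, key="id",
    foreign_keys={"office_id": ("WORKS_IN", "office")},
)
print(edges)  # -> [(('employee', 1), 'WORKS_IN', ('office', 10))]
```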

What future trends do you see involving graphs and AI?

Two major trends:

Agentic workflows – As AI systems involve more complicated workflows with complex orchestration and logic, graphs provide a natural way to store and manage these workflows, offering resilience through database persistence.
Synthetic data generation – Using knowledge graphs to generate training data, especially for scenarios that would be impossible or dangerous to capture in the real world, while maintaining realistic relationships and structure.
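For the agentic workflow point, a workflow can be treated as a dependency graph whose state is saved and later resumed. The sketch below is a minimal illustration using Python's standard library: the step names and the `next_steps` helper are hypothetical, and a JSON string stands in for the database persistence described above.

```python
import json
from graphlib import TopologicalSorter

def next_steps(workflow, completed):
    """Return the steps whose dependencies are all complete.

    `workflow` maps each step to the list of steps it depends on.
    """
    return sorted(
        step
        for step, deps in workflow.items()
        if step not in completed and all(d in completed for d in deps)
    )

# Hypothetical agent workflow stored as a dependency graph.
workflow = {
    "plan": [],
    "retrieve": ["plan"],
    "draft": ["retrieve"],
    "review": ["draft"],
}
order = list(TopologicalSorter(workflow).static_order())
print(order)  # -> ['plan', 'retrieve', 'draft', 'review']

# Persist the graph and progress, then reload and resume after a restart.
saved = json.dumps({"workflow": workflow, "completed": ["plan", "retrieve"]})
state = json.loads(saved)
resume_at = next_steps(state["workflow"], set(state["completed"]))
print(resume_at)  # -> ['draft']
```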

From Ben’s perspective, if “AI is the new electricity,” then graphs will play a crucial role in providing structured knowledge. This could manifest in two extremes: massive graph analytics directly over data lakes without separate graph tools, and small, embedded, ephemeral graphs that run anywhere AI is needed.

Are there any recent developments or partnerships that demonstrate these trends?

We recently announced a partnership with Snowflake that will allow users to take data in Snowflake, spin up Neo4j in memory inside the Snowflake container service, run graph algorithms like PageRank or community detection, and push the results back to annotate the original data. This brings graph capabilities to where data already exists.

Google demonstrated GraphRAG (though they didn’t call it that) at Google I/O for complex travel planning scenarios, showing how graphs can help solve intricate real-world problems that involve multiple entities and relationships.
