The Intersection of LLMs, Knowledge Graphs, and Query Generation

Semih Salihoglu on harnessing LLMs – from queries to knowledge graphs and beyond.

Subscribe: Apple • Spotify • Overcast • Google • AntennaPod • Podcast Addict • Amazon • RSS.

Semih Salihoglu is an Associate Professor at the University of Waterloo and co-creator of Kuzu, an open-source, embeddable property graph database management system. This episode explores the use of large language models (LLMs) for generating queries across different query languages, such as SQL and Cypher for graphs. It examines the potential and limitations of LLMs in handling complex query constructs like recursion, subqueries, and joins.


The episode also covers the integration of knowledge graphs with retrieval-augmented generation (RAG) systems for question answering, and the automated construction of knowledge graphs from text using LLMs. Additionally, it touches on broader observations around combining LLMs with logic-based reasoning, the need for rigorous benchmarking and evaluation, and the role of data modeling in LLM performance. Overall, the episode provides insights into the current state and future directions of leveraging LLMs and knowledge graphs for various data querying, retrieval, and reasoning tasks.

If you enjoyed this episode, please support our work by encouraging your friends and colleagues to subscribe to our newsletter:


Transcript.

Below is a heavily edited excerpt, in Question & Answer format.

What are the key challenges in using LLMs to generate complex queries across languages like SQL, Cypher, and SPARQL?

There are several challenges when using LLMs to generate complex queries. Most current research hasn’t focused on generating queries that use recursion, which is particularly important for graph query languages like Cypher, which have specialized syntax for it. There’s also little work on sub-queries, a construct the traditional database literature studies extensively. Other underexplored areas include unions of joins and arbitrary join paths – capabilities that exist in graph query languages like Cypher but not in SQL. These represent good directions for future research. The sketch below makes the recursion contrast concrete.
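To illustrate that contrast, here is a minimal sketch, with invented table, node, and relationship names, of the same reachability question written once as a recursive SQL common table expression and once as a Cypher variable-length path pattern, the kind of specialized syntax mentioned above.

```python
# Illustrative sketch (hypothetical schema): "which accounts are reachable
# from account 1 via transfers?" as a recursive SQL CTE vs. a Cypher
# variable-length path pattern.

RECURSIVE_SQL = """
WITH RECURSIVE reachable(dst) AS (
    SELECT dst FROM transfers WHERE src = 1
    UNION
    SELECT t.dst FROM transfers t JOIN reachable r ON t.src = r.dst
)
SELECT DISTINCT dst FROM reachable;
"""

# In Cypher the same recursion collapses into a one-line variable-length
# relationship pattern -- the specialized syntax the answer refers to.
VARIABLE_LENGTH_CYPHER = """
MATCH (a:Account {id: 1})-[:TRANSFER*1..]->(b:Account)
RETURN DISTINCT b.id;
"""

print(RECURSIVE_SQL)
print(VARIABLE_LENGTH_CYPHER)
```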

The current benchmarks like WikiSQL and Spider aren’t necessarily representative of enterprise scenarios. They focus on simpler tables that aren’t significantly linked to each other. The one more realistic benchmark I found was from data.world, which uses an insurance company database, but generally, we need more benchmarks that represent real enterprise use cases.

How effective are LLMs at generating queries for simple data models like star schemas?

For simple schemas that are easy to understand and queries that aren’t overly complex, LLMs should perform quite well. The fact that out-of-the-box LLMs can generate queries over a single table, or over two to three tables, is already impressive. However, for large, complex systems – like a 1,000-table OLTP system – we’re very far from full automation. Any task requiring exact accuracy, like ETL pipelines, will likely need human involvement for the foreseeable future. I’d be surprised if, within the next 20 years, we see full automation at that scale, where people completely trust what LLMs generate.
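As a concrete picture of the "simple schema" case, here is a small runnable sketch, with an invented star schema (one fact table, two dimensions) and toy data, showing the kind of two-to-three-table join an LLM would be expected to produce for a natural-language question.

```python
# Illustrative star schema; all table/column names and data are made up.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales   (customer_id INTEGER, product_id INTEGER, amount REAL);

INSERT INTO dim_customer VALUES (1, 'EMEA'), (2, 'APAC');
INSERT INTO dim_product  VALUES (10, 'Books'), (20, 'Games');
INSERT INTO fact_sales   VALUES (1, 10, 12.5), (1, 20, 30.0), (2, 10, 7.5);
""")

# Question: "Total sales of Books per region" -- the query an LLM would be
# expected to generate against this schema.
query = """
SELECT c.region, SUM(f.amount) AS total_sales
FROM fact_sales f
JOIN dim_customer c ON f.customer_id = c.customer_id
JOIN dim_product  p ON f.product_id  = p.product_id
WHERE p.category = 'Books'
GROUP BY c.region;
"""
print(conn.execute(query).fetchall())
```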

What’s the current state of research in LLM-based query generation across different query languages?

There’s significantly more work on SQL than other query languages. There’s a bit of work on SPARQL and virtually no rigorous academic work on Cypher. While graph databases like Neo4j and Kuzu have Cypher-LLM chain integrations, deeper technical academic work focusing on the unique capabilities of graph query languages is missing.

The research I’ve seen is primarily empirical, testing LLMs with different prompting techniques on benchmarks like WikiSQL or Spider. These studies examine how important it is to provide schema information, the effect of providing examples, etc.
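For context, those Cypher-LLM chains generally work by putting the graph schema, and sometimes a few worked examples, into the prompt. Below is a rough sketch of that pattern; the schema, the example, and the `call_llm` placeholder are invented for illustration and are not taken from Neo4j’s or Kuzu’s actual integrations.

```python
# Rough sketch of a text-to-Cypher "chain": schema plus a few-shot example go
# into the prompt, and the model is asked to return only Cypher.

GRAPH_SCHEMA = """
Node tables: Person(name STRING), Movie(title STRING, year INT64)
Relationship tables: ACTED_IN(FROM Person TO Movie)
"""

FEW_SHOT = """
Question: Which movies did Tom Hanks act in?
Cypher: MATCH (p:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) RETURN m.title;
"""

def build_prompt(question: str) -> str:
    return (
        "Generate a Cypher query for the schema below. Return only the query.\n"
        f"Schema:\n{GRAPH_SCHEMA}\n"
        f"Example:\n{FEW_SHOT}\n"
        f"Question: {question}\nCypher:"
    )

def call_llm(prompt: str) -> str:  # placeholder; swap in a real model client
    raise NotImplementedError

print(build_prompt("Who acted in movies released after 2000?"))
```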

What interesting research directions would you recommend for making LLM query generation more robust?

One area I’d like to see more research on is the effects of data modeling. If I were to study a topic in this space, I’d look at how different logical models of the same underlying data affect LLM query generation. You could model the same records in different table schemas, or as an ontology in a more graph-structured, object-oriented manner, and then evaluate which approach makes things clearer to the LLM.
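A hedged sketch of what such a study might look like: the same records presented to an LLM under a relational model and under a graph-style model (the DDL below is illustrative, loosely following the flavor of Kuzu’s node and relationship table syntax), with the generated queries from each prompt scored against the same gold answers.

```python
# Illustrative sketch: one dataset, two logical models, same question.
RELATIONAL_MODEL = """
CREATE TABLE customers (id INT PRIMARY KEY, name TEXT);
CREATE TABLE purchases (id INT PRIMARY KEY, customer_id INT, total REAL);
"""

# Graph-style model of the same records (loosely Kuzu-flavored DDL).
GRAPH_MODEL = """
CREATE NODE TABLE Customer(id INT64, name STRING, PRIMARY KEY(id));
CREATE NODE TABLE Purchase(id INT64, total DOUBLE, PRIMARY KEY(id));
CREATE REL TABLE MADE(FROM Customer TO Purchase);
"""

QUESTION = "Which customers have made purchases totalling more than 100?"

def prompt_for(schema: str, question: str) -> str:
    """Same prompt template, different logical model of the data."""
    return f"Schema:\n{schema}\nQuestion: {question}\nQuery:"

# Generate queries from each prompt with your model of choice, then score
# both against the same gold answers to see which model the LLM handles better.
for schema in (RELATIONAL_MODEL, GRAPH_MODEL):
    print(prompt_for(schema, QUESTION))
```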

This question seems more fundamental than prompting details, as it’s likely to yield longer-term insights. Understanding very complex database schemas with hundreds of tables will continue to be challenging for LLMs, and data modeling should have important effects for the foreseeable future.

How are Knowledge Graphs being integrated into RAG and question answering systems?

Surprisingly, this promising direction doesn’t seem to be where information retrieval experts are focusing their efforts. Since the neural retrieval era began, most work I’ve seen has been on improving standard RAG (taking document chunks, embedding them, and retrieving relevant ones) rather than enhancing it by connecting chunks to a knowledge graph.
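As a toy illustration of the difference (the data structures here are invented for the sketch): standard RAG stops at whatever chunks a vector search returns, while a knowledge-graph layer can pull in additional chunks that mention the same entities.

```python
# Standard RAG index: chunk id -> text (embeddings elided; assume a vector
# search already returned `seed`).
chunks = {
    "c1": "Kuzu is an embeddable property graph database.",
    "c2": "Kuzu was created at the University of Waterloo.",
    "c3": "Cypher supports variable-length path patterns.",
}

# Knowledge graph layer: which entities each chunk mentions, and the inverse.
chunk_entities = {
    "c1": {"Kuzu"},
    "c2": {"Kuzu", "University of Waterloo"},
    "c3": {"Cypher"},
}
entity_chunks = {}
for cid, ents in chunk_entities.items():
    for e in ents:
        entity_chunks.setdefault(e, set()).add(cid)

def expand_with_graph(seed_chunks):
    """Add chunks that share an entity with any seed chunk (one graph hop)."""
    expanded = set(seed_chunks)
    for cid in seed_chunks:
        for ent in chunk_entities[cid]:
            expanded |= entity_chunks[ent]
    return expanded

seed = ["c1"]                            # what plain vector retrieval returned
print(sorted(expand_with_graph(seed)))   # -> ['c1', 'c2'] via the Kuzu entity
```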

Most of the work on integrating knowledge graphs into RAG comes from commercial company blog posts that aren’t technically deep. If you believe knowledge graphs can improve RAG and question answering, this should be subjected to rigorous evaluation similar to how information retrieval papers are evaluated – and that’s currently missing.

The information retrieval community is still focused on improving neural retrieval, re-ranking, chunking, and extraction techniques, which may explain why they haven’t yet explored knowledge graph integration extensively.

What is the potential of automated Knowledge Graph construction for improving RAG systems?

There is promising work on using LLMs to extract triples (subject-predicate-object facts that form the core of knowledge graphs) from text. While the quality isn’t yet on par with specialized models like Rebel, it’s not bad – perhaps B-level rather than A-level.
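A minimal sketch of that approach (the prompt format and the `call_llm` placeholder are assumptions for illustration, not REBEL’s or any particular system’s interface): ask the model for subject-predicate-object lines and parse them into triples.

```python
# Hedged sketch of LLM-based triple extraction.

PROMPT_TEMPLATE = (
    "Extract factual triples from the text as lines of the form "
    "subject | predicate | object.\n\nText:\n{text}\n\nTriples:"
)

def call_llm(prompt: str) -> str:
    # Placeholder for a real model call; hard-coded so the sketch runs end to end.
    return "Kuzu | created at | University of Waterloo"

def extract_triples(text: str):
    raw = call_llm(PROMPT_TEMPLATE.format(text=text))
    triples = []
    for line in raw.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:
            triples.append(tuple(parts))
    return triples

print(extract_triples("Kuzu was created at the University of Waterloo."))
```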

Surprisingly, nobody seems to be studying the economics of this approach. Using LLMs at scale to extract billions of triples would likely be expensive and slow compared to specialized techniques. For serious large-scale triple extraction, specialized techniques or fine-tuned smaller models would probably be more economical.

If we assume knowledge graphs improve RAG (which is plausible but not yet rigorously proven), then automated knowledge graph construction becomes important. The field needs both rigorous studies showing the value of knowledge graph integration and developer-friendly tools that make it easy to implement.

What would “auto-RAG” systems look like and how might they develop?

The field will likely move toward systems where developers don’t need to manually optimize every component of the RAG pipeline. Similar to how Elasticsearch works – you tell it what to index, and it handles the ranking without requiring you to specify every detail – future RAG systems may automatically handle information extraction, chunking, embedding, retrieval methods, and knowledge graph integration.
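Purely as a thought experiment, an "auto-RAG" interface in that Elasticsearch spirit might look something like the sketch below; the spec keys and functions are invented, since no such system exists yet.

```python
# Hypothetical auto-RAG spec: the developer declares sources and quality
# targets; the system chooses chunking, embeddings, retrieval, and whether
# building a knowledge graph is worth it. All names are invented.

rag_spec = {
    "sources": ["s3://corpus/contracts/", "wiki_export.jsonl"],
    "answer_quality": "high",        # lets the system spend more on indexing
    "build_knowledge_graph": "auto", # system decides if KG linking pays off
}

def build_pipeline(spec: dict):
    """Placeholder: would pick a chunker, embedder, retriever, and KG step."""
    raise NotImplementedError

# pipeline = build_pipeline(rag_spec)
# answer = pipeline.ask("Which contracts renew automatically next quarter?")
```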

This will be driven by the fact that while simple RAG setups can work for low-stakes applications, high-stakes tasks requiring precise, accurate answers need much more optimization. Small teams can’t handle this complexity on their own, creating demand for auto-RAG systems.

For this vision to materialize, we need both automated knowledge graph construction tools and developer-friendly workflows that simplify implementation. The industry might lead this development before academic studies validate the approach, or vice versa.

Will knowledge graphs ultimately prove valuable for RAG systems?

While I don’t want to make a definitive prediction, I’m deeply curious about the potential of linking document chunks through a knowledge graph, mapping them to entities, and enabling more inference-related capabilities. It’s clearly a good topic for further research.

What I will speculate on is that we’ll likely see systems combining logic-based reasoning (often done on knowledge bases or knowledge graphs) with LLMs. The work of the late Doug Lenat, the former Stanford professor who founded the Cyc project (released in part as OpenCyc), demonstrates how knowledge bases and LLMs could complement each other. To convince us that LLMs are truly intelligent, they’ll need to demonstrate reasoning capabilities, which may drive this integration of knowledge representation with LLM technology.