The Data Exchange

The Data Layer Enterprise AI Has Been Missing

Andrew Moore on Knowledge Graphs, Entity Resolution, and the Future of Enterprise Agents.

Andrew Moore, CEO of Lovelace, former head of Google Cloud AI, and former dean of Carnegie Mellon’s School of Computer Science, joins the podcast to discuss YottaGraph, a knowledge graph growing by a billion facts a week that serves as a context engine for enterprise AI agents. He explains why fully automatic knowledge graph construction is the only viable path at scale, why entity resolution remains a brutal engineering problem, and how graph theory tricks make million-node queries answerable in under a second. The conversation closes with Moore’s views on the future of computer science education and the open-weight model landscape.

Transcript

Below is a polished and edited transcript.

Ben Lorica. All right, so today we’re in for a treat. We have a computer science luminary — former head of Google Cloud AI and former dean of Carnegie Mellon’s School of Computer Science — Andrew Moore. Andrew, welcome to the podcast.

Andrew Moore. I am really happy to be here. Thank you.

Ben Lorica. For this conversation, Andrew is wearing the hat of CEO of Lovelace AI — love the name — which you can find at lovelace.ai. The tagline is “the context engine for enterprise agents: we make autonomous agents work for mission-critical analysis.” I’m going to start with something you folks just did around deep research. You published work showing that you can match these commercial deep research tools from OpenAI, Claude, Perplexity, and so on, using a lightweight language model. The results are comparable, but the main thing is it’s so much cheaper. And underneath this, powering it, is a graph. Andrew, is it fair — are you okay if I refer to YottaGraph as a knowledge graph?

Andrew Moore. Yes, absolutely. That’s a very well-established term.

Ben Lorica. So basically this is a retrieval engine grounded by a knowledge graph. Give us some high-level stats and facts about YottaGraph.

Andrew Moore. It’s currently growing by about a billion facts a week, and it really is an attempt to understand everything that’s happening in the world. It understands all companies, big and small, in the world. It understands the locations of all the ships and aircraft in the world, and most importantly, it understands events — things reported in the news and on social media — and it also figures out events just by looking at how things move. So it’s meant to be a system of record of human activity. The reason we’ve built it is that when you’re asking a large language model “what’s happening right now?”, I want to make sure it’s got the answer at its fingertips.

Ben Lorica. You’re implying that YottaGraph is updated regularly. Google has its own knowledge graph and prides itself on crawlers that crawl the web almost instantaneously — if someone posts something, it’s pretty much in their index right away. What’s the latency for YottaGraph?

Andrew Moore. Sometimes, especially when we’re doing things with transportation and watching the movement of vehicles, planes, and so forth, we need to be up to date to within minutes. We’re not doing absolute real time at the moment, but what is very important is — for example, suppose someone is watching a large waterway, or a harbor master in a port says, “Hey, what’s that vessel doing? It shouldn’t be doing that — is there a problem?” The AIs can think very quickly, figure out all the relationships and everything they know about what’s happening, and answer that question immediately. It’s not okay to say, “Oh, let’s wait for our next hourly update of the index and we’ll tell you.”

Ben Lorica. The Google Knowledge Graph is often described as the largest knowledge graph around. I assume you’re not trying to compete with it — your emphasis is on different domains and different data sources. Google uses the public web, so obviously the signal-to-noise can be sketchy, especially on social media. Can you differentiate your data sources and focus from that graph?

Andrew Moore. Yes. As you know, I come from Google, and I very much believe in a lot of Google’s big architectural design decisions — I’m a big fan of them. What we’re doing is something pretty different. So many large organizations — from government to big financial institutions all the way down to fellow startups — need to have their own knowledge graph, and they cannot afford to spend months on massive projects to get all their data consolidated. What we’re really focusing on is, if you like, “knowledge graph in a box” — providing the technology so that someone who wants to link everything about their organization can do so. Maybe it’s folks monitoring wildfires, maybe it’s folks monitoring the safety of shipping, maybe it’s folks looking to prevent money laundering or sex trafficking. All of those people need public knowledge, but they need to join it with their own internal knowledge, which no one — including Google — is able to publicly scan.

Ben Lorica. So for now, let’s focus on YottaGraph. What’s the difference between YottaGraph and the Google Knowledge Graph?

Andrew Moore. We’ve designed YottaGraph to be extremely fast to query. When I say extremely fast, you need to be able to jump over sometimes millions of nodes in less than a second while doing certain inferences. What does a graph give you? A graph gives you huge numbers of entities in the world, and then all the observed relationships and links between them — every fact about them, including facts that change over time. And many questions are not just about one link; they’re about the patterns of what’s going on. A common kind of question people will ask our YottaGraph is, “There’s been a massive problem for this company — who’s going to be affected? What’s the blast radius? What’s the impact?” Or, “We now know that this activity is illegal — are there any purportedly legitimate companies that are going to be badly affected by it?” For those kinds of things, you need to suddenly look at millions of pieces of information at once, and we’ve architected the system to be very good at answering those big, broad queries really quickly.

Ben Lorica. It sounds like YottaGraph focuses on things like companies and prominent people. I’m trying to understand, because you’re not trying to rebuild the Google Knowledge Graph, which is meant to power a search engine. You’re more focused on a graph that supplements what I do when I use your platform and tools to build my own internal graph. So it sounds like YottaGraph focuses on things that companies might want to supplement their internal data with.

Andrew Moore. Exactly. Let me give you an example. We’ve been asked to help with monitoring waterways, where you’re observing the movements of all kinds of ships. There’s public information about what they’re carrying, who owns them, and who owns the cargo. The unique kind of question people want to ask is, “Hey, this ship seems to be burning extra fuel right now for no apparent reason — what’s going on?” And sometimes that leads to an identification of weapon smuggling, or just inefficient use of resources. What’s interesting is bringing lots of different pieces of information together at the same time. You’re not just looking at the trajectories of individual objects; you’re looking at swarms of objects, figuring out why they’re doing what they’re doing and why their behavior differs from each other. The kinds of discoveries we’ve made include, for example, groups of objects that shouldn’t have any relationship to each other — and we suddenly realize, “Wait, these are all part of a deal between two governments that was never made public, and that’s why they’re acting together.” So when you ask an AI a deep question like, “What’s going on here? Is there any evidence that someone’s working with this entity? Has anything changed in the last 24 hours that indicates a new threat?” — those questions need to spider out to all kinds of new bits of information, from satellite data gathered in the last 30 minutes all the way through to old Wikipedia articles about the founding philosophy of a political organization. The AI needs to figure all that out and, in at most a minute, come up with an answer so that an operator can decide if they need to act.

Ben Lorica. I’ll ask you a bit more about the graph in a second, but what is the current status of YottaGraph? It’s not like DBpedia, which I can download and run — the only way to use YottaGraph is by engaging Lovelace, right? I actually used the chat to research Oracle, CoreWeave, and Nebius, because I’m interested in AI data centers. I wanted to see what it would tell me about the financials and financial prospects, and I guess the headline is not so good for these AI data center build-outs, based on what I could glean from YottaGraph. One of the things about knowledge graphs — I don’t know if you followed this a couple of years ago — after RAG came Graph RAG.

Andrew Moore. Yes.

Ben Lorica. There was a wave of excitement around Graph RAG, and of course reality set in when people realized, “Oh, this is cool, but I don’t have a graph — and not only do I not have a graph, even if I did, it would be hard to maintain and update.” So this is where your company comes in. Is it fair to describe what you do as almost automatic knowledge graph construction?

Andrew Moore. It has to be completely automatic. It’s fine to say we’ll have a human helping us if we’re building some small knowledge graph with 1,000 nodes that people can point at and say “I like that.” Once you’re into the billions, you’re automating the heck out of it. We have whole suites of agents double-checking what they’ve found, and there’s a huge amount of work that goes into corroborating information. If we’ve got five different channels through which we’re understanding something and four of them agree while one doesn’t, our agents will go out and figure out whether there’s misinformation, noise, or a problem. But the scale of trying to understand what the entire human race is doing on any given day means it has to be 99.9999% automatic.
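
The four-against-one check Moore describes can be sketched in a few lines of Python. This is only an illustration: the function name `corroborate`, the channel names, and the reported values are hypothetical, and a plain majority vote stands in for whatever Lovelace's agents actually do when they investigate a disagreement.

```python
from collections import Counter


def corroborate(reports):
    """Return (consensus_value, votes, dissenting_sources) for one fact.

    `reports` maps a source name to the value that source reported.
    """
    counts = Counter(reports.values())
    consensus, votes = counts.most_common(1)[0]
    # Sources that disagree with the majority are flagged for follow-up.
    dissenters = sorted(s for s, v in reports.items() if v != consensus)
    return consensus, votes, dissenters


# Hypothetical example: five channels report a vessel's flag state.
reports = {
    "port_authority": "Panama",
    "ais_feed": "Panama",
    "news_wire": "Panama",
    "registry_scrape": "Panama",
    "social_post": "Liberia",
}
consensus, votes, dissenters = corroborate(reports)
print(consensus, votes, dissenters)  # prints: Panama 4 ['social_post']
```

In a real pipeline the dissenting source would presumably trigger further checking rather than simply being outvoted; the sketch only shows how the disagreement is detected.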

Ben Lorica. Here’s an imaginary example. I’m JPMorgan Chase — a huge multinational bank with offices all over the world and countless IT systems and IT fiefdoms. I’m assuming when you engage with a company, you probably try to focus first: “Let’s build a knowledge graph for this particular group,” and then if that takes off, they can evangelize inside the company and grow from there. But the claim is you can do that using your tools without a human in the loop.

Andrew Moore. We might still end up with humans involved, but they’re very much like Captain Kirk on the Enterprise. They get to see a top-level summary of what the agents have done, and then there’s a conversation: “Are you worried about this? Do you like what we’re doing?” A human might say, “Show me examples of how you did linkages between 10-K and 10-Q form results” — because a skeptical human wants to verify that performance is correct. But everything has to be done at very high scale, where the human can ask the high-level questions. Once we reach a point where a human says, “Let me just double-check whether the name of this company matches the name of that company” — that is death. And that’s why one of the most unpleasant jobs, in my opinion, in the world of large data in corporations is something called Master Data Management.

Ben Lorica. There are people whose whole jobs it is.

Andrew Moore. Yes, exactly. You spend your whole life down in the data mines with regular expressions, trying to make sure that two or three disparate databases can still link to each other. Getting rid of that toil is what we’ve been working on so incessantly, because all these grand proclamations about the wonders of what AI can do in big, high-responsibility enterprises wither and die if no one can actually manage the data. I was so passionate when I started this company precisely because I knew that AI is great for chatbots or advising you what color sneakers to wear, but when you’re trying to find financial crime from people who are trying to hide, or you’re trying to connect the dots before a terrorist attack happens, you’ve got to make sure the AI has a clear and understandable view of the data.

Ben Lorica. What are the typical sources of data when you engage with companies? PDF files, CRM systems, enterprise software systems? A lot of data also resides in databases — structured data, data warehouses, lakehouses, OLTP systems. Are you specifically optimized for unstructured or semi-structured data, or does it not matter?

Andrew Moore. We have to be good at both. It is not going to work if we are only good at one or the other.

Ben Lorica. Even structured data?

Andrew Moore. Yes. We consume terabytes of structured data every day, reading it and processing it into an intermediate form which we call “fetch messages” — the same intermediate form we produce from unstructured data like legal documents or news reports. You can think of the whole process as a massive production line of incoming sources: PDFs, photographs of documents discovered somewhere, beautifully structured relational tables, snippets of JSON from a captured laptop — many things like that. The first part of the process is these production lines of data wranglers turning everything into this more general-purpose form called fetch messages, which are the embryos that will eventually become nodes and relationships in a knowledge graph. The key point is: you cannot build an AI that looks at just one flavor of data. The whole idea of an AI is that it needs to know how the world works, and it needs to understand the context behind every data element.
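
To make the idea of a shared intermediate form concrete, here is a minimal Python sketch. The interview does not describe what a fetch message contains, so every field name below (`subject`, `predicate`, `obj`, `source`, `observed_at`) and both converter functions are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: a message cannot be modified after creation
class FetchMessage:
    """Hypothetical intermediate record; the real format is not public."""
    subject: str
    predicate: str
    obj: str
    source: str
    observed_at: datetime


def from_table_row(row, source):
    """Convert one row of a (hypothetical) employment table."""
    return FetchMessage(row["employee"], "works_at", row["employer"],
                        source, row["as_of"])


def from_text_extraction(triple, source, observed_at):
    """Convert a (subject, predicate, object) triple that some upstream
    text extractor (not shown) pulled out of a document."""
    subject, predicate, obj = triple
    return FetchMessage(subject, predicate, obj, source, observed_at)


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
row_msg = from_table_row(
    {"employee": "Person A", "employer": "Company B", "as_of": now}, "hr_db")
text_msg = from_text_extraction(
    ("Person A", "works_at", "Company B"), "news_article", now)
```

Both inputs end up as the same record type, which is the point Moore makes: downstream graph construction does not need to know whether a fact started life in a relational table or a PDF.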

Ben Lorica. And of course, the dirty secret in enterprise AI right now is that a lot of enterprises don’t have their data ducks in a row. They want to use AI, but AI can’t really access the data or leverage the relationships within it. By the way, Andrew — in structured data, which often resides in relational databases, there’s a schema. I’m assuming you leverage whatever is in the schema and the relationships between the data as well?

Andrew Moore. Yes, but it’s interesting. A common scenario is we might be working with a company that, over the last four years, has merged with seven or eight other companies. They might even be using the same database product, but each company has a very strong “accent” in what its schema means. One subsidiary might use the notion of the time of an event very differently from another — it looks the same from a pure schema perspective. Our statistical algorithms and agents do have to spend a lot of time figuring out, even for something that looks like perfectly structured relational data, how to actually interpret these nuances if they’re going to reason about it across the board.

Ben Lorica. So obviously an important part of this process is entity resolution — something a lot of people are familiar with. You can scratch that itch and build a working entity resolution system over a weekend, or now with Claude, maybe in an hour. But the problems are maintaining it, scaling it, and making sure it’s performant. If you’re in a domain where you want entities resolved quickly, you want your entity resolution engine to adjust to new information — so if I learn more about someone named “Andrew,” I can resolve it to Andrew Moore, and so on. Entity resolution is a big piece of this. Did you folks build your own engine?

Andrew Moore. We have built our own engine, and it was essential to do so. What I appreciate in your question is that entity resolution is a [bleep] nightmare. It is an awful thing to do, but you’ve got to take it seriously. We’ve put a lot of work into scaling up our system and making sure that —

Ben Lorica. What entities is your system particularly strong at? I’m assuming names of people, names of companies, addresses…

Andrew Moore. Yes — places too.

Ben Lorica. If I’m a company you engage with and I have a specific type of entity you’ve never seen before — say, some kind of novel product category — how quickly can your engine start resolving those entities?

Andrew Moore. It’s only in the last two months that we’ve started letting it do this autonomously, but you’re absolutely right that you need to support many different flavors of entities. Your agents quickly get stuck in a corner if they’ve got a finite world to choose from. We now have technologies called “schemaless flavors” and “schemaless properties,” which operate at a meta-level of resolution to determine whether an entity or a relationship is genuinely novel or should be associated with something that already exists. For example, the simplest case: you might have one database that says “Person A works at Company B,” and another that says “Person A is employed by Company B.” Those are semantically equivalent, and our knowledge graph still records them as separate terms — because there might be something critical to a particular use case that needs to know they’re different — but it’s also clustering them, ready to use them both for any aggregate question that just needs the notion of a person working for a company. We’ve long realized we humans cannot be telling the ingestion systems what flavors they’re allowed to use. Early on, we had a notion of “products” — things shipped in container ships, put onto vessels, sold in stores. It became clear that pharmaceuticals are a special kind of product; as any good data modeler would understand, it doesn’t make sense to lump them together, because they have different semantic roles in different cases. Eventually there’s enough statistical evidence to conclude that pharmaceuticals and other products are distinct and need to be recorded separately. The infrastructure supporting the knowledge graph has to be able to handle massive “undos” — realizing, “I wish I’d given these two things different labels from the start.” As it gathers more evidence, it will introduce new flavors by itself.
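
The "works at" versus "is employed by" example can be sketched as follows. This is a toy illustration under stated assumptions: the cluster table is hand-written here, whereas Moore describes clusters that emerge from statistical evidence, and the names `PREDICATE_CLUSTER`, `people_linked_to`, and `facts_with_exact_predicate` are hypothetical.

```python
# Hypothetical cluster assignments. In the system described these would be
# learned from evidence, not written by hand.
PREDICATE_CLUSTER = {
    "works at": "employment",
    "is employed by": "employment",
}

# Facts keep the exact wording of their source.
facts = [
    ("Person A", "works at", "Company B"),
    ("Person A", "is employed by", "Company B"),
    ("Person C", "is employed by", "Company B"),
]


def people_linked_to(company, cluster):
    """Aggregate query: use any predicate that belongs to the cluster."""
    return sorted({s for s, p, o in facts
                   if o == company and PREDICATE_CLUSTER.get(p) == cluster})


def facts_with_exact_predicate(predicate):
    """Precise query: the original terms are still distinguishable."""
    return [f for f in facts if f[1] == predicate]


print(people_linked_to("Company B", "employment"))  # prints: ['Person A', 'Person C']
print(len(facts_with_exact_predicate("works at")))  # prints: 1
```

Keeping the raw predicate on each fact and layering the cluster on top is what makes a later "undo" cheap: splitting or merging a cluster changes the lookup table, not the stored facts.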

Ben Lorica. How good is your entity resolution across languages? It might be strong at resolving English names, but what about Arabic or Swahili?

Andrew Moore. We’ve primarily been working on the languages our customers care about, and a lot of those happen to be Middle Eastern as well as Euro-American types of entities. When we encounter a completely new language, we now have a workflow for onboarding it, and where it used to take a week, it now takes hours. In general, we try very hard to score ourselves on entity resolution quality using tough golden test sets of very ambiguous objects, and we’re always monitoring to make sure those numbers are improving. We have a 99.5% entity resolution quality metric that we use — though the trouble is that saying 99.5% is meaningless without knowing relative to what.

Ben Lorica. I only care about my own data, right?

Andrew Moore. Exactly. And honestly, you could get four nines, maybe five nines, if you just asked “Given a random pair of entities, are they matched or not?” — that kind of categorization gives you fantastic precision even if you don’t get great recall. Based on our internal metrics, we believe we are exceeding the quality of anyone else, including our progenitor Google, within our domains. But boy, it is such a battle to maintain that.
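
A short calculation shows why scoring on random pairs is so flattering. The numbers below are assumptions chosen for illustration (one million records, 500,000 true matching pairs), and the "resolver" is a deliberately useless one that always answers "no match".

```python
def always_no_match_scores(num_records, num_true_match_pairs):
    """Score a resolver that says 'not a match' for every pair of records."""
    total_pairs = num_records * (num_records - 1) // 2
    # It is wrong only on the pairs that really do match.
    accuracy = 1 - num_true_match_pairs / total_pairs
    # It never finds a single true match.
    recall = 0.0
    return total_pairs, accuracy, recall


total_pairs, accuracy, recall = always_no_match_scores(1_000_000, 500_000)
print(total_pairs)  # prints: 499999500000
print(accuracy > 0.99999, recall)  # prints: True 0.0
```

Because almost every random pair is a non-match, even this resolver scores better than five nines on pairwise accuracy while finding nothing, which is why a headline percentage means little without knowing what it is measured against.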

Ben Lorica. And of course, data itself is messy — inside these systems there could be a mix of Spanish, English, and Chinese. So your entity resolution engine needs to handle that too.

Andrew Moore. Yes, it has to. One of the unique things about the way we built things is that we don’t just look at lexical information to help with identity. For example, aircraft are often referred to colloquially by different names, and in different parts of the world they may even use different identification numbers. Sometimes we can see in the movement data that an aircraft will suddenly disappear, and a whole new aircraft will appear in almost exactly the same place and then carry on. That is a hint that we can resolve these two records together based on their physical characteristics. This especially helps us with corroboration.
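
The disappearing-and-reappearing aircraft hint can be sketched as a simple space-and-time proximity test. This is a rough illustration, not Lovelace's method: the thresholds (`max_gap_s`, `max_km`), the track tuples, and the identifiers are all hypothetical, and a real system would presumably also use heading, speed, and altitude.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def candidate_handoffs(ended, started, max_gap_s=300, max_km=50):
    """Pair tracks that ended with tracks that began shortly after, nearby.

    Each track is (track_id, unix_time, lat, lon): the last known point for
    `ended` tracks and the first known point for `started` tracks.
    """
    pairs = []
    for old_id, t_end, lat1, lon1 in ended:
        for new_id, t_start, lat2, lon2 in started:
            gap = t_start - t_end
            if 0 <= gap <= max_gap_s and haversine_km(lat1, lon1, lat2, lon2) <= max_km:
                pairs.append((old_id, new_id))
    return pairs


ended = [("A123", 1000, 25.0, 55.0)]
started = [("Z999", 1120, 25.1, 55.1), ("Q555", 1100, 40.0, -74.0)]
print(candidate_handoffs(ended, started))  # prints: [('A123', 'Z999')]
```

The output is only a list of candidates. In the workflow Moore describes, a match like this would be one corroborating signal among several, not a final identity decision.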

Ben Lorica. You’ve alluded to multiple agents working together, with some agents checking each other’s work.

Andrew Moore. Exactly. There are two kinds of checking. Verifying that you’ve found a connection is one thing, but — and I’m going to sound mathematical here —

Ben Lorica. Go ahead.

Andrew Moore. If your sources of error are independent, getting four corroborating identities suddenly gives you a massive, multi-nines level of confidence — far more than any single pair of entities would give you.
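
The arithmetic behind that claim is short. Assuming each source is wrong 5% of the time (an illustrative figure, not one from the interview) and that the errors really are independent, the chance that four sources are all wrong at once is the product of the individual error rates.

```python
def prob_all_wrong(per_source_error, num_sources):
    """Probability that every source is wrong, assuming independent errors."""
    return per_source_error ** num_sources


single = 1 - prob_all_wrong(0.05, 1)  # confidence from one source
four = 1 - prob_all_wrong(0.05, 4)    # confidence from four agreeing sources
print(round(single, 8), round(four, 8))  # prints: 0.95 0.99999375
```

This is an upper bound on the joint error, since four wrong sources would also have to agree on the same wrong answer. The whole result depends on the independence condition Moore states: sources that copy from one another add far less confidence.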

Ben Lorica. Without giving away the secret sauce, let’s say one of the agents is based on the Gemma family of models. I’m assuming the agent checking that agent’s work might be based on another family?

Andrew Moore. Occasionally. We’re not very strict on which families we use, and some customers do want us to stay within a particular family.

Ben Lorica. I see. So when you engage a customer — let’s say JPMorgan Chase, which happens to be a big Anthropic customer — and they say, “You can build the knowledge graph for us, but you have to use our Anthropic models”…

Andrew Moore. Yes. Any small AI company like mine — we do not dictate to customers. We couldn’t get away with it anyway.

Ben Lorica. So your system will work regardless of the model family. That’s interesting, because some of these agents are so prompt-sensitive that if you have an entity resolution agent based on Gemma and someone tells you to migrate it to Claude, the prompts no longer work.

Andrew Moore. I agree with that, but the one thing we’ve got on our side is thorough golden test sets. That’s the real lingua franca for jumping between agents. You say, “Agent, your highest-level task in entity resolution is performing sophisticated queries to check entity commonality. Look at this golden set. Make sure you’ve tuned yourself to get 100% on this massive golden test set.” At that point — without my engineers or any humans being involved — we can do that kind of porting. The key is putting the work into massive golden sets for self-tuning and never saying, “Let’s have a human tweak this next time it moves over,” because even within the same model family, things change every day. We cannot make any assumptions that these AI tools are going to be stationary.
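
A golden test set used as the porting contract might look like the sketch below. It is a simplified stand-in: the two candidate "resolvers" are plain string comparisons representing different model or prompt configurations, and the names `score_on_golden_set`, `pick_config`, and the sample pairs are hypothetical.

```python
def score_on_golden_set(resolver, golden_pairs):
    """Fraction of golden pairs where the resolver gives the labelled answer."""
    correct = sum(1 for a, b, same in golden_pairs if resolver(a, b) == same)
    return correct / len(golden_pairs)


def pick_config(candidates, golden_pairs):
    """Return (name, score) of the best-scoring candidate configuration."""
    scored = {name: score_on_golden_set(fn, golden_pairs)
              for name, fn in candidates.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


# Each entry: (record_a, record_b, are_they_the_same_entity)
golden_pairs = [
    ("Acme Corp", "acme corp", True),
    ("Acme Corp", "Acme Corp ", True),
    ("Acme Corp", "Apex Corp", False),
]

# Stand-ins for two agent configurations being compared after a model change.
candidates = {
    "exact": lambda a, b: a == b,
    "normalized": lambda a, b: a.casefold().strip() == b.casefold().strip(),
}

print(pick_config(candidates, golden_pairs))  # prints: ('normalized', 1.0)
```

The selection loop never inspects how a candidate works, only how it scores, which is what lets the same golden set judge agents built on different model families.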

Ben Lorica. By the way, just your entity resolution engine alone could probably be sold standalone, because companies struggle with it — even without the rest of the platform.

Andrew Moore. You’re very incisive. My board has occasionally been kicking me in the head about that — “Just focus on this, it’s gold.” But of course, we’re obsessed with what our customers need right now. They need entity resolution as a means to an end.

Ben Lorica. Means to an end.

Andrew Moore. Dangerous things happening in the world.

Ben Lorica. So, entity resolution is the first step in building the graph. Once the graph is built, I start using it. Basically, the technology is a platform for automatic knowledge graph construction, and the premise, which I’m sure most of our listeners buy, is that providing this level of context to an AI, whether via Graph RAG or whatever you call it, really increases accuracy. In terms of retrieval, ranking, and the ability to add provenance to reassure the user, can you describe anything you’d consider “don’t try this at home” territory?

Andrew Moore. Oh yes, definitely. Let’s say I already have the graph, and I think I don’t need Andrew anymore — I can just go it alone?

Ben Lorica. Exactly.

Andrew Moore. Well, I would love it if we could achieve that, but you’re right — I’ve stubbed my toe on so many problems even when I have a really good graph. One of them is amendability. Knowledge graph construction is so tough that it’s very tempting to treat it as a massive, one-way, monolithic structure.

Ben Lorica. “I’m done — I don’t want to touch it again.”

Andrew Moore. Exactly. You cannot operate that way. You have to let either a user or, 99.9% of the time, an agent say, “That link you’ve got there is wrong. Here’s all the evidence to prove it.” And you’ve got to be able to undo that without breaking the rest of the graph.

Ben Lorica. That brings up another point, Andrew — you need version control. If I’m a regulator using your system, I run a query, present the results, and the regulator says, “This is wrong,” but you’ve already changed the graph — I can no longer point to what the graph said yesterday, which is why it produced that result. So there has to be version control.

Andrew Moore. There absolutely needs to be complete lineage of what the graph was “thinking.”

Ben Lorica. Yes.

Andrew Moore. You may remember that I mentioned one of the first things we do is turn everything into this lingua franca of fetch messages — independent packages of information. Those never die. Any chain of inference, any big spreading-activation function, any big Bayesian probabilistic model always refers back to all the things it used to make its decision. What can happen is, if we’ve had to do a big re-resolution — because it turned out that three different Andrew Moores were actually the same Andrew Moore —

Ben Lorica. Right, that’s the point.

Andrew Moore. Those underlying fetch messages never change. They’re always there. But our record of which fetch messages are associated with which node in the graph — that may have changed. And we always know where we were when those changes happened.
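
The split Moore describes, immutable messages plus a changeable record of which node each message belongs to, can be sketched as an append-only log. This is a minimal illustration under assumptions: the class name `AssignmentLog`, the integer version counter, and the message and node identifiers are all hypothetical.

```python
class AssignmentLog:
    """Append-only history of which graph node each message is attached to.

    The messages themselves are never edited; only new assignments are added.
    """

    def __init__(self):
        self._history = {}  # message_id -> list of (version, node_id)
        self._version = 0

    def assign(self, message_id, node_id):
        """Record a new assignment and return the version it created."""
        self._version += 1
        self._history.setdefault(message_id, []).append((self._version, node_id))
        return self._version

    def node_as_of(self, message_id, version):
        """Which node did this message belong to at the given version?"""
        node = None
        for v, n in self._history.get(message_id, []):
            if v <= version:
                node = n  # later entries override earlier ones
        return node


log = AssignmentLog()
log.assign("msg-1", "andrew_moore_1")
v_before = log.assign("msg-2", "andrew_moore_2")
# Re-resolution: the two nodes turn out to be the same person.
v_after = log.assign("msg-2", "andrew_moore_1")

print(log.node_as_of("msg-2", v_before))  # prints: andrew_moore_2
print(log.node_as_of("msg-2", v_after))   # prints: andrew_moore_1
```

Because old assignments are never overwritten, a query result produced at `v_before` can still be explained after the merge, which is the audit property discussed in this exchange.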

Ben Lorica. So I can always audit.

Andrew Moore. Yes, you have to be able to. The second issue for knowledge graphs is that when you’re doing a chain of inference on the data, the agent performing that chain has to retain exactly what it used. We have back-pointers to all that information. Combine that with the need to undo mistakes: you cannot have a knowledge graph that can never change its mind about how it’s represented.

Ben Lorica. This is a good point, because before — back when the few companies that actually had knowledge graphs had teams of people hand-building them —

Andrew Moore. Yes.

Ben Lorica. It’s hard to blame those people. They worked hard on building those knowledge graphs, but getting them to change it was a whole process.

Andrew Moore. In one of my roles at Google, I had thousands of people in various parts of the world doing manual updates to things like large product catalogs. It’s just like everything else with artificial intelligence: it takes quite a big intake of breath to realize that if I’m actually going to use AI, I cannot live in a world where I’m counting on lots and lots of humans doing this work. I don’t know if you’ve heard this phrase, but people talk about whether someone has taken the “AI red pill” or not. Some customers find it really hard to imagine a world where they don’t have an army of humans checking things. Others understand that they now live in a world where powerful AIs need to do that checking for them, because frankly, no humans enjoy the job of being a knowledge graph checker.

Ben Lorica. So it sounds like the “don’t try this at home” items still pertain to the knowledge graph itself and not to retrieval and search ranking. What else is on that list?

Andrew Moore. Here’s another doozy. I know you’ve got an eclectic audience with different interests, so I’m about to describe something that maybe only a handful of listeners will care about — but it really, really matters for performance and answering questions in real time. In a serious knowledge graph, every node is probably going to have hundreds of thousands of links going out to other nodes, time-stamped with things that are no longer relevant or are relevant only to historical queries. If you start querying your knowledge graph the way people did back in 2015 and ask, “Tell me about all the two-hop links,” you suddenly have your LLM looking at 25 billion paths. You have to use graph theory tricks to make sure that every question that needs to look at a lot of data still only ends up examining millions of pieces of information at most. There’s a whole world of graph theory — really fun if you’re a mathematician — around how you figure out how one node in a graph impacts another without massive compute. That’s been really necessary, because it’s heartbreaking if you’ve worked your butt off to build a good graph and the first time you want to ask it a meaningful question, it grinds to a halt.
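
The blow-up in the two-hop example is simple multiplication. Assuming every node has about 158,000 outgoing links (a figure picked here so the result lands near the 25 billion Moore quotes; he says only "hundreds of thousands"), a naive traversal multiplies the path count by that degree at every hop.

```python
def k_hop_paths(out_degree, hops):
    """Number of paths a naive traversal enumerates from a single node."""
    return out_degree ** hops


print(k_hop_paths(158_000, 2))  # prints: 24964000000
print(k_hop_paths(158_000, 3))  # prints: 3944312000000000
```

Two hops already gives roughly 25 billion paths and a third hop reaches the quadrillions, which is why the query layer has to avoid enumerating paths at all.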

Ben Lorica. And if you don’t apply the graph theory you alluded to, you end up compromising on the query side. Which brings up something I’ve been meaning to ask: is there an equivalent of a graph database underneath all this? Because a lot of them don’t scale to your needs, right?

Andrew Moore. Yes, that’s right. We have a persistence layer and a storage layer, and some of that is our secret sauce. It involves what you might call very low-level database engineering.

Ben Lorica. In other words, you’re not using an off-the-shelf graph database like Neo4j or something like that?

Andrew Moore. No. I have great respect for those, but they cannot possibly — and this is not a dig at them — they cannot possibly operate at the speed you need when a query is going to touch tens of millions of nodes and you’ve got a fraction of a second to come up with an answer.

Ben Lorica. So the graph theory is designed to reduce the search space. Is this a patent-pending technology?

Andrew Moore. We’re going to keep it secret for now. But I’ll tell you the general form of it. Let’s say we’ve noticed that a particular shipping company in Southeast Asia is violating sanctions by going into Russian territory. A human question might be, “Which legitimate members of the seafood supply chain might face a regulatory risk from this?” If you just take that graph and start doing multiple hops out from the offending vessel, not surprisingly — in the world we live in — after three steps you’ve got a billion things. If instead you treat it as a probabilistic question — “Where is the biggest influence under some probabilistic model of who’s going to be affected?” — then there are tricks involving randomized algorithms and spectral decompositions that allow you to answer that question much more computationally efficiently.
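
Moore keeps the actual technique secret, so the sketch below substitutes one well-known randomized approach, Monte Carlo random walks with restart (the sampling form of personalized PageRank), purely to show how a probabilistic framing sidesteps multi-hop enumeration. The graph, node names, and parameters are hypothetical.

```python
import random
from collections import Counter


def influence_by_random_walks(graph, source, num_walks=5000, stop_prob=0.15, seed=0):
    """Estimate how strongly `source` reaches other nodes.

    Runs short random walks from `source`; each step stops with probability
    `stop_prob` or at a dead end. A node's score is its average number of
    visits per walk. Cost depends on the number of walks, not graph size.
    """
    rng = random.Random(seed)
    visits = Counter()
    for _ in range(num_walks):
        node = source
        while True:
            neighbors = graph.get(node, [])
            if not neighbors or rng.random() < stop_prob:
                break
            node = rng.choice(neighbors)
            visits[node] += 1
    return {n: c / num_walks for n, c in visits.items()}


# Hypothetical supply-chain graph: edges point toward who is affected.
supply_graph = {
    "sanctioned_vessel": ["shipping_co"],
    "shipping_co": ["seafood_processor", "cold_storage"],
    "seafood_processor": ["grocery_chain"],
    "cold_storage": [],
    "grocery_chain": [],
    "unrelated_bank": ["unrelated_fund"],
}

scores = influence_by_random_walks(supply_graph, "sanctioned_vessel")
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[0])  # prints: shipping_co
```

The scores are estimates rather than exact probabilities, but they are enough to rank who is most exposed, and nodes with no path from the source (such as `unrelated_bank`) are never touched.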

Ben Lorica. Some of those techniques don’t scale, though, right?

Andrew Moore. I’ll be frank: we are willing to commit what I’d call responsible statistical crimes in order to give a good answer. As your listeners are probably aware, for a big Bayesian model there is the mathematically correct and completely intractable approach, and then there are approximate algorithms that can still give you confident answers to questions like “which is the worst?” or “tell me the five biggest dangers.” They will give good answers even if they can’t provide exact probabilities. At every point in the system, we have to make that trade-off between efficiency and accuracy. And here’s the thing I care about most: what I’m competing against is a pure LLM using completely abstract, embedding-space reasoning to make its own probabilistic judgment.

Ben Lorica. Generally speaking, the LLM will rely on its weights — knowledge encoded in the weights — or maybe some additional context in the context window.

Andrew Moore. Yes. It’s almost impossible for an LLM to take probabilistic, contradictory pieces of information and arrive at a rational answer. That’s why you have to use math to do it. It turns out I’m already so much better off using approximate, tractable Bayesian techniques than trying to use some impossibly slow, massive probabilistic algorithm.

Ben Lorica. So the way this system works is: automatic knowledge graph construction for enterprise data, with the end result being a fairly reliable and trustworthy knowledge graph — but the goal is for that knowledge graph to be used by whom? Agents or humans?

Andrew Moore. Ultimately, other agents. Our main thrust right now is helping other agents get fast, precise answers to investigative queries. While doing this, my team and I are also making sure the ultimate applications built on top of this actually work, so I do spend a lot of time on customer user journeys and understanding the needs of humans in investigative roles. But there’s no way a humble company like ours can figure out everything the Coast Guard needs, everything the FAA needs, everything a huge bank needs, and everything a local municipality needs. We have to be putting tools in place for those folks to get their own agents to do these things for them.

Ben Lorica. So the idea is: I’m an analyst at JPMorgan Chase, I go to my chat window, I type a prompt, that prompt gets sent to an agent, and that agent determines that the best way to help me is to query Lovelace’s knowledge graph.

Andrew Moore. That’s correct.

Ben Lorica. So it’s like tool use — but the tool being used is the knowledge graph.

Andrew Moore. Exactly. And it really is a perfect market: agents have to figure out who they can rely on and who they can ask. If they need to ask 100,000 questions a day, they need to be sure it’s a cent a question, not $5 a question. They will be making those trade-offs themselves, but my responsibility is to make sure we give incredibly accurate, bang-for-the-buck answers.
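The cost trade-off Moore describes can be sketched as a simple selection rule an agent might apply. The query volume and per-question prices come from his example; the tool names, accuracy figures, budget, and the `choose_tool` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    cost_per_query: float     # USD per question
    expected_accuracy: float  # hypothetical estimate between 0 and 1


def daily_cost(tool, queries_per_day):
    """Total daily spend if every query goes to this tool."""
    return tool.cost_per_query * queries_per_day


def choose_tool(tools, queries_per_day, daily_budget, min_accuracy):
    """Pick the cheapest tool that is accurate enough and fits the budget."""
    candidates = [
        t for t in tools
        if t.expected_accuracy >= min_accuracy
        and daily_cost(t, queries_per_day) <= daily_budget
    ]
    return min(candidates, key=lambda t: t.cost_per_query, default=None)


tools = [
    Tool("graph_lookup", 0.01, 0.95),   # a cent a question
    Tool("deep_research", 5.00, 0.95),  # $5 a question
]
for t in tools:
    print(t.name, daily_cost(t, 100_000))
print(choose_tool(tools, 100_000, daily_budget=10_000, min_accuracy=0.9))
```

At 100,000 questions a day, a cent per question is about $1,000 a day, while $5 per question is $500,000 a day, which is the gap an agent making this choice would be weighing.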

Ben Lorica. What’s incredible about this is that — and I hope most enterprise AI leaders understand — their moat is their data, and that data only becomes a moat if you can unlock it. This is a way to unlock that data and truly build an AI-driven company with a real moat. It’s hard for a general-purpose AI to compete with you if you have all this domain-specific data accumulated over many years.

Andrew Moore. I completely agree. In fact, one of the engineering tasks we’ve had to tackle is making sure we never ask any customer to exfiltrate their data to any other cloud, our own servers, or any of our models. The data is so sensitive that we have to deploy our technology inside their own tenant, and it has to run autonomously without a Lovelace engineer ever seeing what happens. That’s why we spend so much time practicing on massive public data — to make sure we’ve achieved that level of quality before we go in.

Ben Lorica. This reminds me of one of my favorite AI apps right now, Real AI, which you can find at real.ai. It’s built by Fundrise, a real estate investment company. They’ve been investing in real estate for many years, they had all this proprietary data, and they wanted to build a real estate analyst agent — essentially automate the job of a real estate analyst. They took all their internal data, started purchasing additional data, built a high-quality and reliable data source, and pointed their engine at it. I’m sure if they had been aware that Lovelace existed, they would have just pointed Lovelace at all their data. The point is: you can really turn many enterprises into true AI enterprises just by using a tool like this.

Andrew Moore. I’m really grateful to you for saying that. It drives me crazy — and it’s very sad — when you see a good AI company send four deployed engineers into a customer, everyone’s enthusiastic, and then everyone’s hearts get broken when they realize, “It’s going to take us six months to get the data ready before we get to do any AI.” I really, really want to fight that problem.

Ben Lorica. And like I said, the open secret in enterprise AI is that data is the problem, not the model. They have all this data and can’t really do anything with it. By the way, Andrew, is multimodality something you folks care about? Enterprises have datasets across different modalities — images, and so on. I assume your engine is neutral in terms of modality?

Andrew Moore. Yes. I’ll give you one example: on one of our SBIR projects, we had to figure out where ships are going when they suddenly turn off their transponders. At that point we switch over to predicting where they’re likely to be and detecting them with satellite imagery. That’s a perfect example of something where your graph mathematics has to do joins with very fuzzy computer vision data.

Ben Lorica. You mentioned the role of “forward-deployed engineer,” which is of course the hot new job title — the new data scientist. If I engage Lovelace, are you going to send 20 forward-deployed engineers? How hard is it to use the system?

Andrew Moore. With our more recent customers, we’re able to walk in — it obviously depends on a lot of things.

Ben Lorica. You need all the plumbing in place first — the data integration, the pipes.

Andrew Moore. I can say with confidence that projects which five years ago I would have expected my teams to spend six months on, we now do in a couple of days. Not because my team or I are any smarter — it’s because we have this massive level of automation.

Ben Lorica. To what extent, when you work with a company and build their knowledge graph, do they supplement it with YottaGraph? What percentage?

Andrew Moore. The majority — it’s pretty essential, though not necessarily for the reason you’d expect. If our algorithm is reading a bunch of legal documents and some obscure law firm is mentioned as a trustee, it’s really handy if the system reading that already knows everything else about that entity — its history, that it was taken over by a particular law group on a particular date, and so forth. YottaGraph is actually most useful during entity extraction for customers, because as the AI is reading their data, it keeps saying, “Oh yeah, I recognize that — I know those guys,” or, “I recognize that those two things are not the same entity. They happen at different times and have different owners — those are two separate companies with the same name.” That’s why we almost insist on having the system know about the rest of the world while it’s reading a customer’s own data. It’s always one-way: we bring the rest of the world to them. We never exfiltrate their data to the rest of the world.
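The "same name, different company" check can be illustrated with a deliberately simple rule: treat a mention as a known entity only if the normalized names match, the owners match, and the active periods overlap. This is a toy sketch under those stated assumptions, not Lovelace's entity resolution; the `Record` fields, the company names, and the matching rule are hypothetical, and a real system would also have to handle ownership changes over time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    name: str
    owner: str
    active_from: int  # first year the entity is known to be active
    active_to: int    # last year the entity is known to be active


def normalize(name):
    """Lowercase the name and drop commas, periods, and extra spaces."""
    cleaned = name.lower().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())


def periods_overlap(a, b):
    """True if the two active periods share at least one year."""
    return a.active_from <= b.active_to and b.active_from <= a.active_to


def same_entity(mention, known):
    """Toy rule: same normalized name, same owner, overlapping periods."""
    if normalize(mention.name) != normalize(known.name):
        return False
    return mention.owner == known.owner and periods_overlap(mention, known)


known = Record("Acme Trust LLC", "Harbor Law Group", 2001, 2012)
mention_a = Record("ACME Trust, LLC.", "Harbor Law Group", 2008, 2010)
mention_b = Record("Acme Trust LLC", "Northside Holdings", 2019, 2024)

print(same_entity(mention_a, known))  # same firm, written differently
print(same_entity(mention_b, known))  # same name, different owner and era
```

Having a background graph of known entities is what supplies the `known` side of this comparison while customer documents are being read.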

Ben Lorica. Right — it’s Hotel California, but in a good way. So this is another reason not to try this at home: you won’t have access to YottaGraph. All right, so in closing, I’m going to take advantage of your former position as a computer science professor to talk about the future of CS. I’m reading headlines about enrollment declining across leading CS departments, and I have many professor friends at top programs who are advising their students, “If you can finish your PhD in three or four years, go for it — otherwise, just take that job at OpenAI or Anthropic.” We might be losing the next generation of CS faculty. What are your thoughts on the future of computer science?

Andrew Moore. It is genuinely unpredictable, so I can’t give people a definitive prescription. But I will say that there are plenty of people who are being really adaptable and imaginative. People who actually embrace these tools, mess around with them, and make things happen are going to prosper. One of the things I was doing at Carnegie Mellon, in conjunction with the business school, was really pushing for product management training within higher education — even at the undergraduate level.

Ben Lorica. For CS majors?

Andrew Moore. Yes. The idea is that you no longer need the skills to write a compiler, but you do need to understand computational complexity. You need to understand what GPUs can and can’t do. As a human, you need to understand the full stack if you’re going to make bold decisions about technology. But the other thing you really do need to know is why people are going to want to use what you’re building. I would like to see CS departments move a little more toward product management and design — so they’re not trying to create people whose only skills are the things that Claude is purportedly good at, but who are also good at looking one level above that.

Ben Lorica. Do you think we’re going to see continued declines in CS enrollment?

Andrew Moore. We might. But unless we’re about to enter a true singularity, which I very much doubt, we absolutely need people who can reason about computation, data, mathematics, probability, and game theory. There’s pretty much no rational probability and game theory being encoded in these agents today. If the United States doesn’t produce those people, other countries will. With DeepSeek, we saw another country come up with really clever tricks because they had a bunch of young engineers asking, “How can we make things cheaper?” and using a combination of their own human brains and coding tools. We cannot afford to see the United States stop thinking creatively about those things.

Ben Lorica. That’s a classic example of constraints, right? If you have constraints, you become more inventive. If you don’t — like here — you just scale, scale, scale.

Andrew Moore. That’s exactly right. Sometimes I feel like the biggest foundation model companies are like giant F-150 trucks: they have huge engines because they could afford them, and eventually we’re going to get surpassed by folks who do nimble tricks.

Ben Lorica. Last question: open-weight models. Unfortunately (or fortunately, for some), most open-weight models now come from China, and I think there’s still some hesitation among Western enterprises about adopting Chinese open-weight models, even when they can install them in their own clusters. I think we do need a source of open-weight models in the West. There’s Google Gemma, but Gemma is purposely not as capable as Gemini; it’s aimed more at edge devices. Are you worried that, in terms of open weights, the US is ceding ground to China?

Andrew Moore. It’s possible, but it’s very hard to legislate against.

Ben Lorica. Big companies —

Andrew Moore. Have to produce open-weight models for the common good.

Ben Lorica. Open weights is a business decision, right?

Andrew Moore. Yes. I think there is always going to be a market for domestically produced open-weight models, and so I’m very happy about Gemma and the work going on there. If Google decided not to do Gemma, I know someone else would jump up immediately. It’s a national security issue, and there are enough people with the capability to do this if necessary.

Ben Lorica. And with that, thank you, Andrew.

Andrew Moore. Thank you very much. I really enjoyed every moment of this interview.
