Tudor Achim on Mathematical Superintelligence, Formal Verification, and the Future of Research Mathematics.
Subscribe: Apple • Spotify • Overcast • Pocket Casts • YouTube • AntennaPod • Podcast Addict • Amazon • RSS.
Ben Lorica speaks with Tudor Achim, cofounder of Harmonic, about the fast progress of AI for mathematical reasoning and what it would take to build “mathematical superintelligence.” They discuss contest math, research math, Lean and Mathlib, formal verification, search-based reasoning, and why AI may soon move from being a collaborator for mathematicians to a system that can independently explore and prove new results. The conversation also explores what remains distinctly human: choosing the problems that matter, setting direction, and deciding where mathematical effort should go.
Interview highlights – key sections from the video version:
- Opening: Harmonic, contest math, and research math
- What “mathematical superintelligence” means
- Contest math in 2025: IMO gold-level systems arrive
- Hybrid reasoning architectures and why consensus may not last
- “Vibe proving” and the AI-mathematician interface
- AI as a discovery engine across papers and subfields
- Human taste, problem selection, and what AI still cannot do
- Lean explained: formal proof, correctness, and reinforcement learning
- Mathlib as the open-source foundation for machine-checked math
- Which areas of math are most ready for these tools
- How mathematicians use Aristotle today
- Beauty versus correctness: reading, de-formalizing, and improving proofs
- Monte Carlo graph search and exploring many ideas in parallel
- Novelty, Millennium Prize problems, and why math matters in practice
- The future role of mathematicians and lessons for other verifiable domains
Related content:
- A video version of this conversation is available on our YouTube channel.
- The Agentic Sweet Spot: Where AI Moves Fast and Humans Stay in the Loop
- Why Your AI Agents Need Operational Memory, Not Just Conversational Memory
- Kay Zhu → Your First AI Employee Is Already Clocking In
- Jeff Hawke → World Models Are Here—But It’s Still the GPT-2 Phase
- Ethan Ouyang → Beyond Vibe Coding: Building Your Entire Business with AI
Support our work by subscribing to our newsletter📩
Transcript
Below is a polished and edited transcript.
Ben Lorica: All right. Today we have Tudor Achim of Harmonic, which you can find at harmonic.fun. The tagline is, “We are building mathematical superintelligence.” And with that, Tudor, welcome to the podcast.
Tudor Achim: Thanks. Thanks for having me. Great to be here.
Ben Lorica: And so math here, for our listeners, is real math, not arithmetic. There are actually two areas that we’ll cover here. There’s an area roughly called contest math, like the International Mathematical Olympiad. And then there’s actual research mathematics, where professional mathematicians are trying to make progress in their field, mainly by proving theorems, though not exclusively.
So, mathematical superintelligence: for our listeners who are not familiar at all with math, superintelligence usually connotes some sort of foundation model. Is that one of the components that you folks are building? We’ll get into the actual system, but is one of the things some form of a foundation model?
Tudor Achim: Well, mathematical superintelligence in itself doesn’t make any reference to how it’s implemented. We’ve defined it as an AI system as a whole that can do math that’s more advanced than any collection of humans on the planet.
And there are two components, in my opinion, to any math superintelligence system: search and pattern recognition. Search is what you use to explore the frontiers of math to see what works and what doesn’t. And then pattern recognition is the component that compresses the results of search so you can benefit from it the next time you try to solve a problem.
So from that perspective, we of course use large language models as part of our system, but if some better pattern-recognition primitive comes along, we’d be happy to use that too. I wouldn’t say that we think of MSI as tied to any particular model architecture, or whether you call it a foundation model or not.
Ben Lorica: So, for listeners who are not following this field closely, is it fair to say, Tudor, that as far as contest math goes—the International Mathematical Olympiad—there are now several systems that are roughly, I would say, at the gold-medal range? Is that a fair characterization of the state?
Tudor Achim: Well, what I know for sure is that at the IMO in 2025, a couple of companies claimed gold-medal performance with internal systems. Those were OpenAI and DeepMind, which used informal reasoning, and Harmonic, which used formal reasoning. I believe ByteDance, with their Seed prover, announced gold performance a little later.
But yeah, I think at that time snapshot, you had some systems that could do it. I’m curious what happens at the next International Math Olympiad. My view is that testing post hoc after the competition is not really the way to measure things. So I think I’ll have to hold my judgment until next July—or this coming July—to see what happens. But yeah, I think broadly speaking, there are systems capable enough of getting IMO-level results these days.
Ben Lorica: But just to put that in context, two or three years ago, these systems were nowhere near this capability, correct?
Tudor Achim: Yeah. Two or three years ago, I think they were struggling with even basic high school math contests. But I think there were signs of life. Maybe we’ll talk about it at some point, but I think the sparks of life on high school math contests, as well as formal verification, were what led to the creation of Harmonic. So yeah, I think there’s been remarkable advancement in the last three years.
Ben Lorica: So let’s briefly discuss, at a high level, the architecture that people seem to have settled on. I’m not that familiar with your folks’ system, but for example, DeepMind has at least at a high level described their AlphaGeometry system, which is a hybrid system seemingly similar to yours, right? So there’s a neural intuition plus some sort of symbolic verification. Even though the details of the implementation differ from system to system, is that more or less the general consensus that this is how to proceed?
Tudor Achim: Well, I wouldn’t say there’s consensus. I think the world changes rapidly. All I can comment on is the state of the Harmonic system at the time of the IMO, which we published in a tech report, which did have a hybrid system. So you had a kind of fuzzy reasoning system that could do informal reasoning, and then a formal reasoning system that could do formal reasoning. And the interplay between them is what led to the search traces that solved the problem.
But you could easily imagine more advanced systems that don’t even bother with human-interpretable reasoning. I think whatever the consensus was at the time, or is now, I don’t think that’s necessarily indicative of what systems will look like even a year from now.
Ben Lorica: But for listeners who are just familiar with AI systems in general, the interface for mathematicians is similar to what people are familiar with, right? So they talk to these systems using text.
Tudor Achim: Yes.
Ben Lorica: And then they interact with it kind of like vibe coding, right?
Tudor Achim: Yeah, we call it vibe proving, which I think is pretty appropriate for the sector.
Ben Lorica: Yeah. When you’re interacting with a coding agent, it’s almost like a pair-programming kind of interaction, right? So in this case, the mathematicians can spitball ideas with the system, and the system may even suggest new ideas. In that case, the number of hypotheses is no longer limited to the imagination or the knowledge of a single mathematician.
For our listeners who are not familiar with research mathematics, research mathematics is somewhat specialized. People who are in one field literally may not even understand the opening paragraph of a paper in another field. But it turns out that a lot of progress in mathematics happens when someone is able to connect, “Hey, there’s an idea from quantum field theory that might apply to topology.” But the problem is there’s only a handful of people who actually scan the literature broadly. And so I think what these systems can do is basically allow you to scan ideas from other fields. Is this a good way to look at it?
Tudor Achim: Yeah, I think even within a field this happens. So in some of the Erdős problems that were solved about two or three months ago, it turned out that there were very similar ideas, if not the same ones, in papers that people had published decades ago. And these ideas would have been completely overlooked if not for the fact that AI found them and effectively made reference to them when solving problems.
So I think that mathematical discovery is a challenge, and AIs help a lot with it. I just wanted to touch on a point you mentioned: right now, AI is a pair programmer for mathematicians. I think that’s a very temporary setup. I think it’s the same thing we saw with so-called centaur systems in chess, where for a while the best chess players were AI plus humans. That’s what we’re seeing now: the best mathematical systems are AI plus humans.
Ben Lorica: Hold on there for a second, because for listeners, there was the Deep Blue breakthrough at IBM where an AI system finally beat a human. And then for a while, the best systems in open competition were machine plus human. And so now what you’re saying is you don’t even need a human anymore.
Tudor Achim: Yeah, adding a human to the mix just hurts the performance of the system. I remember this funny quote by either Bob Mercer or Peter Brown from Renaissance Technologies—or one of them—when they were at IBM back in the day working on speech recognition. That’s actually one of the earliest practical machine learning models, these hidden Markov models. And they had this funny quote that was something like, “Every time I fire a linguist, the performance of my statistical machine translation goes up.”
I think that in the near future, the role of mathematicians will be to define the questions that matter and point these computational systems in the right direction. But I’m not so sure that they’ll be most helpful in giving ideas to the AI, because I think the AI will be able to do a very good job on its own of proposing hypotheses, scoring them, choosing what to prioritize, and allocating its own computational budget to solve them.
Ben Lorica: But then, as you said, taste still matters, right, in terms of what’s an interesting problem?
Tudor Achim: Ultimately, mathematics is a human endeavor and has to be useful to humans, which means it has to be directed by humans. We have to decide—and I’m using the royal “we” here, I’m not a mathematician, I’m a computer scientist—but the mathematical community will have to decide for itself what it cares about. What problems are most important? Do we want to allocate effort toward the Riemann hypothesis or toward finding isomorphisms between QFT and everything else? And I think that’s where the human understanding still has to be there. We have to choose where we put the budget.
Ben Lorica: So that’s looking ahead to the future. For now, at least, the mathematician still plays a role, right? So it interacts with the system.
Tudor Achim: I haven’t yet seen the kind of genius that you have in solving problems like the Poincaré conjecture or Fermat’s Last Theorem or many other problems that are possibly equally hard but not as well known. I just don’t think we’ve seen AIs have that kind of creativity yet. I think they’re extremely strong at combining existing techniques in very strange ways to get results. But this kind of spark of insight and beautiful structure that is found, I just don’t quite see AIs there yet.
Ben Lorica: Yeah, I think actually a group of mathematicians compiled—is it called the First Proof Challenge?—where it’s basically just a bunch of lemmas from their field. And somehow all these systems still can’t tackle these problems. And on the rare occasions where they are able to solve the problem, it turns out that it’s just something they read somewhere. So from that perspective, I think that’s why it’s interesting for you folks: there’s still a lot of building left, right?
Tudor Achim: Yeah, I think it’s a fair question: will continuing to scale reinforcement learning lead to such sparks of insight or not? I think the answer is, more than 50% chance yes, but there is some chance that we’re missing something. For example, maybe you have to train AI systems fully independent of human data to get these kinds of insights. Maybe physical intuition is key and we’re missing that. So I don’t know. I think there’s at least a 30% chance that something different has to happen, but that’s what makes the field exciting.
Ben Lorica: So, for the listeners who are familiar with coding agents, one of the advantages obviously in coding is you can compile or run the code and make sure that it works. It turns out something similar is possible in mathematics, to an extent, because there are formal theorem provers like Lean and Mathlib. But maybe, Tudor, you can explain what these systems are, because a lot of people have never even seen these systems or touched them. Maybe explain it in a way that’s understandable to non-math people. What exactly are these systems?
Tudor Achim: Sure. I will say that they’re much stronger than compilers for programming languages in terms of correctness. But the way to think about Lean is that it’s a programming language, much like C++ or Python, but it has a much more advanced type system which allows you to express very complicated logical notions. Because it has such a rich type system, it turns out you can encode all of mathematics in it.
And because it’s encoded in the type system, that means the compiler can check for itself whether the mathematical statements that are being written follow valid rules of logical deduction. What Lean is not is a language that automatically finds a proof for your theorem. All it does is let you express all of the steps of reasoning one by one in a way that the computer can check in a hallucination-free manner.
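To make that concrete, here is a minimal Lean 4 sketch (a toy example, not from Harmonic’s system). Each theorem states a proposition as a type, and the proof is a sequence of steps the compiler checks mechanically; if any step is invalid, the file simply fails to compile.

```lean
-- A toy theorem in core Lean 4: `n + 0 = n` holds by definitional
-- unfolding, so `rfl` (reflexivity) closes the goal. The kernel
-- verifies this; there is no room for hallucination.
theorem add_zero_example (n : Nat) : n + 0 = n := by
  rfl

-- A statement that needs explicit reasoning: induction on `n`,
-- with each rewrite step checked by the compiler.
theorem zero_add_example (n : Nat) : 0 + n = n := by
  induction n with
  | zero => rfl
  | succ k ih => rw [Nat.add_succ, ih]
```

If the `rw [Nat.add_succ, ih]` line were replaced with an invalid step, Lean would reject the file, which is the “hallucination-free” checking described above.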
We actually settled on using formally verified math for two reasons at the beginning of Harmonic. The first was that we thought it would be useful for humans. So there was a thought experiment at the beginning where, in July 2023, we asked ourselves: “Well, look, if you’ve got the Riemann hypothesis and you ask an AI in 10 years to prove it, based on what systems looked like at the time, you’d get back 100,000 pages of text in some sort of human language.” And I felt that you might as well throw that away, because first of all, it’s probably wrong. And second, even if it’s not wrong, it’s just so much content that you can’t really fathom how it works.
But now imagine that instead of that, you got the proof in Lean. Well, first of all, you know it’s correct because it compiles. But possibly more importantly, it is structured in a way like a codebase that you can jump around in—go to definitions, understand the usage of things, and understand how things interact with each other. And I think that’s very powerful for humans, right? So if you want math that an AI does to be understandable by humans, it felt that Lean is the right language to use.
The second reason we chose it was that we felt that with an airtight reward signal—namely Lean compilation—you’d be able to do reinforcement learning much more efficiently than how people were doing it before. And I think that we did prove that out as well, going from effectively zero in October 2023 to IMO gold performance in 2025. So I think there are a lot of reasons to care about Lean. From an internal perspective, the efficiency is great, but from an external perspective, I just think that this should be the future of mathematics, where people no longer have to worry about correctness but rather about taste and impact.
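The reinforcement-learning setup described here can be sketched abstractly. Everything below is a hypothetical stand-in for illustration, not a real API: `generate_candidates` plays the role of a policy model sampling proof attempts, and `lean_checks` plays the role of the Lean compiler providing the binary, airtight reward.

```python
import random

# Abstract sketch of RL with a verifier-based reward: a "policy"
# proposes candidate proofs, a "compiler" returns a binary reward
# (no partial credit), and only verified candidates become
# positive training signal. Both functions are illustrative stubs.

random.seed(0)

def generate_candidates(theorem: str, n: int = 8) -> list[str]:
    """Stand-in for sampling n proof attempts from a policy model."""
    return [f"attempt-{i} for {theorem}" for i in range(n)]

def lean_checks(candidate: str) -> bool:
    """Stand-in for the Lean compiler: pretend ~half of attempts compile."""
    return random.random() < 0.5

def rl_step(theorem: str):
    candidates = generate_candidates(theorem)
    rewards = [1.0 if lean_checks(c) else 0.0 for c in candidates]
    # Keep only verified proofs as positive training examples:
    verified = [c for c, r in zip(candidates, rewards) if r == 1.0]
    return verified, sum(rewards) / len(rewards)

verified, mean_reward = rl_step("n + 0 = n")
```

The design point is that the reward cannot be gamed: a candidate either compiles or it does not, which is what makes the signal suitable for efficient reinforcement learning.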
Ben Lorica: And then there’s an accompanying community around Lean called Mathlib. Is that something that you folks also leverage? And if you do, maybe explain to our listeners what Mathlib is.
Tudor Achim: Yeah, we use Mathlib, we sponsor people to contribute to it, we update our agent to be compatible with it. I mean, we care about it a lot. Mathlib is the world’s largest open-source repository of mathematics. You can think of it as essentially the undergraduate and graduate level of math. Imagine taking all those textbooks—well, 10% of them—and then formalizing them. That’s essentially what Mathlib is.
Every proof that Harmonic outputs tends to depend on Mathlib because you use those basic mathematical notions in any proof, whether you’re doing contest math or research math. The remarkable thing about Mathlib is that it’s managed to get to hundreds of thousands of definitions and multiple millions of lines of code in an entirely open-source and decentralized way. So you can think of it as a Wikipedia for math, but every edit is computationally certified for correctness.
So it’s remarkable that you have all of this math implemented in a way that’s compatible with all the other math. So when we talk about finding isomorphisms between subfields, having something like Mathlib actually makes it easy because you know that everything fits together.
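As a small illustration of that compatibility, here is a hedged sketch assuming a recent Mathlib: an order-theoretic lemma and a number-theoretic fact are used side by side, under one shared set of definitions.

```lean
import Mathlib

-- A lemma from Mathlib's shared order hierarchy, applied to the
-- real numbers it also defines:
example (a b : ℝ) (h : 0 < a) (h' : a ≤ b) : 0 < b :=
  lt_of_lt_of_le h h'

-- A number-theoretic fact from the same library. Because both
-- results live in one repository, they compose without any
-- translation between incompatible definitions:
example : Nat.Prime 7 := by decide
```

This is what “everything fits together” means in practice: every lemma is stated against the same certified definitions, so results from different subfields can be combined directly.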
Ben Lorica: So have these systems gotten to the point, Tudor, where you can go into Mathlib and point the system at it and tell it, “Hey, can you find a more elegant way of…”
Tudor Achim: Oh yeah. Yeah.
Ben Lorica: So then there’s kind of a—Mathlib is now also being improved by these systems.
Tudor Achim: Yeah, and I think that humans are going to be in charge of the AIs. So one thing that I would love to see more of from humans is simply agreeing on definitions, right? If humans agree on the definitions for certain parts of math, then you can let the AIs go and prove the theorems, build other structures, lemmas, whatever.
But if you don’t have humans make the structures and the definitions, you might get an AI that creates a definition of a manifold in a way that you wouldn’t necessarily like, or it might create some definitions that aren’t compatible with a lot of human math that’s been created. So I think it’s up to humans to put the structure in place for the AIs to contribute, whether it’s to Mathlib or to other math research efforts in the future.
Ben Lorica: So are there specific fields in math where the community seems to be more open to these tools, or is it hard to stereotype—is it just at the individual level? In other words, is the PDE community, which is where I came from, partial differential equations, more open to it than the number theorists or…
Tudor Achim: I think everyone’s open to it. I think everyone sees this as the future. I just got back from a small conference called Exponentiating Mathematics—it was the kickoff of this DARPA program run by Pat Shafto—and the amount of enthusiasm for AI tools in mathematics was unbelievable. I think everyone sees that they’re the future of reasoning.
I think that when it comes to formally verified mathematics, certain parts of math are better represented in Mathlib than others. For example, no one has even attempted to formalize many of the key definitions in algebraic geometry. Or if they have, it’s in isolated projects that don’t make it into Mathlib. And that means that if you wanted to research an algebraic geometry problem and then check it with a computer, you’re just out of luck. You have to do the basics first.
In contrast, I think in number theory, some areas of graph theory, and PDEs, a lot more math has been formalized, and so you can go off to the races with Aristotle to do math there.
Ben Lorica: Yeah, there’s one area that used to drive me crazy in grad school. The people in—what is it called?—geometric topology, where they would draw pictures and say, “I’ll do surgery here,” and… But then when you look at the papers, there are hardly any equations. It’s all words.
Tudor Achim: Yeah, we’re not quite there yet on formalizing that.
Ben Lorica: Yeah, I’m not sure exactly how easy that would be to formalize, right? So you described the vision for the future, right? But let’s stay with the present. So what’s the current workflow for a professional mathematician right now?
Tudor Achim: I think many people have their own workflows. The way we see people use Aristotle, which is the thing I know most about, is they’ll have a math question and sometimes they’ll go to Aristotle and just ask the question, and Aristotle will do a lot of analysis and get back with a formally verified…
Ben Lorica: So wait, can the question be, “Hey, is this still open?”
Tudor Achim: Well, yeah, you could ask it. I’m not sure what it’ll tell you, but yeah, you could definitely ask it. But the cool thing is, there’s a web UI for Aristotle—we just launched that last week—and you can go ask a question and it’ll just go and think a lot and get back to you.
The other thing we see people doing a lot is that, as they start to get closer to having a more well-defined paper and filling in a bunch of the ideas, they will start using Aristotle just to triple-check that it’s correct, right? So you can also use Aristotle if you’re a mathematician or an economist or a statistician in an auto-formalization mode. So there you upload a PDF or a LaTeX file or whatever…
Ben Lorica: Oh, so yeah, yeah, so that’s what I was going to ask. From a laziness perspective, I can just upload the LaTeX or PDF file, it’ll just read it and do all the things it has to do in order to kind of…
Tudor Achim: Yeah. Yeah, and you can go on Twitter, there are plenty of people doing this all the time and posting. Our vision is that with the Aristotle API, within a couple years you’ll take a photo of your chalkboard that has the Riemann hypothesis on it and the computer will just solve it and tweet about it, you know.
Ben Lorica: Starting from OCR, huh?
Tudor Achim: Exactly, yeah. It’s a compact statement, and it’s already in Mathlib, for what it’s worth, so you’ve got the statement already.
Ben Lorica: So then you go ask questions and then you’re brainstorming with the system. And then at some point, do you tell the system, “Okay, I think we have a plan, go do it”? Just like when you’re working with a coding agent: “All right, so spec-driven development, this is what we’re going to do,” and then you interact with the agent and then, “Okay, go ahead, write the code,” right?
Tudor Achim: Again, I think this is getting a bit too much to the centaur system. So I think right now, you can absolutely give it a spec or a scaffold or a blueprint and follow it. But I think that we’re quickly marching toward a future where you don’t bother with that. You just give it a LaTeX file or PDF and you say, “Hey, here are some ideas I’ve had,” or, “Here’s a nearly complete paper, just fix it and finish it.”
Ben Lorica: Yeah, yeah, yeah. I mean, it’s the same thing right now with code anyway, right? So you go, “Here’s code, I’m not happy with it, I want it to be able to do this thing. I’m not even sure I’m using the right libraries. Fix it.”
Tudor Achim: Yeah, I mean, I think the difference is that with these coding agents, I highly recommend people review the output before they rely on it. I think with Lean, so long as the statement is correct—and it takes some time to verify, but it’s not that crazy—once the statement is correct, you can rely on the Lean compiler to assure yourself of correctness.
Ben Lorica: So for people who, including me, have never read the proofs generated by this system, as any mathematician will tell you, you can have a proof but it can be convoluted and hard to understand and so on and so forth, as opposed to an elegant proof, right? So how do these systems work right now in terms of beauty?
Tudor Achim: I mean, I think the proofs are beautiful. We’ve printed out our IMO proofs on dot matrix…
Ben Lorica: No, no, no. But I mean, when you show it to professional mathematicians, what’s the reaction?
Tudor Achim: They’re very happy. I mean, again, you don’t have to take my word for it, you can just look on Twitter. People are very happy with using Aristotle. I think the Lean proofs—I kind of think of them as assembly code. It’s very, very pedantic. It’s very low level. I think the proofs, if you just want a high-level understanding of them, are best interpreted via a de-formalized version where you invert the process.
Ben Lorica: Isn’t it possible for the system to go, “Here’s a Lean proof,” and then for me as a mathematician, it’ll actually write the paper?
Tudor Achim: One hundred percent you could do that. I think you could easily do that with existing models, yeah.
Ben Lorica: Yeah, yeah. So then I can read the paper and then I can say, “You know what, this proof can be improved by doing this,” right?
Tudor Achim: Yeah, and by the way, you can do that today in Aristotle. So you could take an existing Lean project and say, “Hey, I’ve looked at it, I’ve understood it, this thing is good, this thing could be improved, so go improve that thing.” It will just go and work. It’ll modify your Lean files and correct some things, and you’ll get back a better Lean proof that follows your guidance. So I think it’s a very flexible system.
I think the key point is that whether you’re using it to improve proofs or auto-formalize proofs or find proofs, it’s formally verified in the end. That’s the only constraint for Aristotle.
Ben Lorica: So can you explain to us the role of—I think I read somewhere that part of these systems uses Monte Carlo tree search, Monte Carlo graph search. Can you describe how that works at a high level?
Tudor Achim: Yeah, I mean that’s the most important part of the system. We’re quite open—we have a technical report that’s on the arXiv and has a lot of details there. So a brief overview is: the way people typically think of AI search is via chain of thought. So you have one agent and it thinks in human-interpretable language, like English, and it just keeps thinking in a sequential chain. That’s cool. It makes sense.
But what it doesn’t let you do is easily and efficiently explore many ideas in parallel. So as soon as you say, “Well, I want you to explore two different ideas,” now you have a natural question of how much effort you spend on one idea versus the other. That’s where a value function comes in. And once you have a value function from a neural network, then the question is how you aggregate the statistics of that value function across the entire search tree to make decisions about where to explore. So that’s where Monte Carlo graph search comes in. Because it turns out it’s not quite a tree—there are multiple ways to get to a certain state—and so you have a graph.
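The selection, evaluation, and backup loop described above can be sketched in a few lines. This is an illustrative toy, not Harmonic’s system: the “value function” is a hand-written heuristic standing in for a neural network, the domain is reaching a target integer with +1 and ×2 moves, and a transposition table merges states reachable by multiple move orders, which is precisely what makes it graph search rather than tree search.

```python
import math

# Toy Monte Carlo *graph* search. States reachable by several move
# orders are merged through a transposition table, so visit counts
# and value estimates accumulate on a graph rather than a tree.

START, TARGET = 0, 24
MOVES = [lambda s: s + 1, lambda s: s * 2]

def value(state: int) -> float:
    """Heuristic prior standing in for a learned value network."""
    return 1.0 / (1.0 + abs(TARGET - state))

class Node:
    def __init__(self, state: int):
        self.state = state
        self.visits = 0
        self.total = 0.0

def mcgs(iterations: int = 2000, c: float = 1.4):
    table = {START: Node(START)}          # transposition table: state -> node
    for _ in range(iterations):
        node = table[START]
        path = [node]
        # Selection: walk the graph by a UCB rule until we reach the
        # target or a node that has never been evaluated.
        for _ in range(32):
            if node.state == TARGET or node.visits == 0:
                break
            children = []
            for move in MOVES:
                s = move(node.state)
                if s == node.state or s > 4 * TARGET:   # skip self-loops / hopeless states
                    continue
                if s not in table:
                    table[s] = Node(s)    # duplicate states merge here
                children.append(table[s])
            if not children:
                break
            parent_visits = node.visits
            node = max(
                children,
                key=lambda ch: (ch.total / ch.visits if ch.visits else value(ch.state))
                + c * math.sqrt(math.log(parent_visits + 1) / (ch.visits + 1)),
            )
            path.append(node)
        # Evaluation + backup: score the leaf, then propagate the
        # score along the selected path.
        leaf = 1.0 if node.state == TARGET else value(node.state)
        for n in path:
            n.visits += 1
            n.total += leaf
        if node.state == TARGET:
            return [n.state for n in path]  # one successful search trace
    return None

trace = mcgs()
print(trace)  # a sequence of states from 0 to 24 via +1 / *2 moves
```

The value function decides how much effort goes to each branch, and because statistics from different move orders pool at the same node, the search spends its budget on states rather than on redundant paths.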
Ben Lorica: Why not just search everything? You have parallel computing, right?
Tudor Achim: I mean, if I had infinite money, I would. I would. And that’s the point of tree search, right? So tree search can explore as much as you want in parallel. And yeah, I think doing reinforcement learning at scale gets you to very powerful search systems.
Ben Lorica: And then basically the system is exploring many ideas almost simultaneously. And it could well be that more than one of these ideas will pay off, right?
Tudor Achim: Oh yeah. No, that happens all the time.
Ben Lorica: And then somehow it also arrives at which of these ideas it will present to you.
Tudor Achim: Yeah, I mean, typically you stop the search once you’ve solved the problem. But certainly if you ran it more, you would find solutions with different characteristics. For example, they might take less CPU time, or they might be shorter, or they might be longer. It’s an open question what you get when you run it forever. We tend to just stop once we solve the problem, though.
Ben Lorica: So in terms of novelty, right, I think there’s no question these systems probably can be good at, you know, if you ask them a question about a known result, they probably can handle that. So what do we know about novelty?
Tudor Achim: There are a few examples. My understanding is that the First Proof problems are novel, and I think AI systems can start to approach them. I think we might have published some results on that. There are the FrontierMath challenges. But ultimately, I think what matters for us is the Millennium Prize problems, because those are problems people have cared about for a while. And I think it’ll be very exciting when AIs start really making progress on that level of unsolved problem.
Ben Lorica: I mean, for some reason, actually, there are even super-applied and practical problems—for example in nonlinear PDEs—where if progress is made, it could have real impact in commercial applications, right? Math has this stereotype of being somehow far removed from applications, but there are some mathematical problems where the applications can be rather quick if progress is made.
Tudor Achim: Well, yeah, I mean, for example, number theory underlies the entire digital economy. I think that’s a big one. Yeah, I think people don’t quite appreciate how much math is pervasive in our lives.
Ben Lorica: So if you rewind many, many years ago, decades ago, there was the four-color problem. I think that was probably the first instance of an open problem that was solved essentially by—is this an unfair description, Tudor?—brute force.
Tudor Achim: I don’t know the details of it, actually.
Ben Lorica: Yeah. But basically, I think they just kind of went through all the possibilities and then, “Here, we verified it.” But I would imagine a problem like that now would be much more approachable.
Tudor Achim: Well, if I understand correctly, if they used brute force, it sounds like combinatorial optimization. So I don’t think we have better algorithms, but probably it’s still going to take a while to find if that’s how they did it. Or there may have been theoretical progress between then and now. Maybe there are better approaches.
But I think the one key difference is that if you’re talking about 20 years ago, we didn’t have Lean, which I think has at this point kind of won the race of theorem provers. So my sense is that if there were any questions about accepting the proof back then because it used some other language, I’m not sure we would face such questions now. I think people would accept it.
Ben Lorica: So in your future—the way you envision the future—what’s the role of the mathematician? I guess, like you said, the mathematician will kind of decide, “Yeah, this is an interesting problem, we should work on this.” But is that it?
Tudor Achim: I think characterizing it as a small role is not quite the way I think about it. It requires mathematical understanding to determine what’s important and why, right? You’ll need the problem-solving skills to understand what’s hard versus what’s easy. You’ll need the taste to understand how much you care about isomorphisms between fields versus solving this one specific problem. So I think it’s a huge role.
I think the role of the mathematician just elevates from doing the pedantic checking and writing to deciding what to prioritize and why.
Ben Lorica: So maybe, Tudor, we can make an analogy to programming, right? In programming, I think people are essentially going to stop writing code manually. But you’ll still need people who understand software, software systems, architecture. Is that kind of a good analogy to what mathematicians will be doing?
Tudor Achim: They might not be doing math manually anymore. Yeah, there is active and, I think, justifiable debate on what the role of software engineers will be in software engineering. So yeah, I think theoretically you could imagine a superintelligent AI that is truly so intelligent that one doesn’t have to look at its output ever. It feels like we’re very far from that, you know, and we use the latest models all the time. It’s hard for me to tell how close we are. They make so many mistakes all the time, even on basic things.
So I think one difference is that in software engineering you do need to understand the details of the code to validate it, whereas in math, because you have the proof system that can be formally verified, it’s not so much of an issue. So it’s hard to tell, but I think the bulk of the work will be in deciding where you spend time for the computer. And to make good decisions there, you will need to have the mathematical understanding and the ability to connect that understanding to what society at large needs from mathematicians.
Ben Lorica: So I’m sure you’ve read a lot of what Terence Tao has written about this topic, right? So one of the good analogies he makes is that if you imagine solving a mathematical problem as hiking from point A to point B, these systems basically drop you directly at point B. Whereas a professional mathematician will hike from point A to point B and along the way get distracted, find interesting ideas. In other words, it’s not a direct route, but along the way maybe they find connections they hadn’t noticed before, and that inspires them to go a different way or tackle related problems and so on and so forth. So what do you think about that?
Tudor Achim: I think that’s exactly the right way to think about it. I mean, if I related it to thoughts about…
Ben Lorica: In other words, artifacts. A professional mathematician will produce artifacts along the way that may be useful later, right?
Tudor Achim: Yeah, broadly speaking, yes. I mean, you can think of the mathematical journey as essentially learning, right? Now it happens to be the case that right now the only things that can produce the artifacts are humans. In the near future it’ll also be AIs. But the pedagogical value of going on the hike and learning the math so you can contextualize future math in your new understanding from your hike—I just don’t think that goes away.
So I think, to distill it down, ultimately these AI systems are tools. They’re not the principal agent in the future. I think they’re tools for humans to use. And I just don’t see the need for mathematical understanding ever going away among humans. I think we need to have the mathematical understanding to know how to apply the tools to the problems that society cares about.
Ben Lorica: So winding this down, looking at broader lessons for fields outside of math: one of the things you keep reminding us of, of course, is that Lean is kind of a special thing in mathematics, right? Does that mean, Tudor, that many of the things that you folks are building may not be applicable to areas outside of math?
Tudor Achim: Well, I don’t think they’re immediately applicable to, for example, writing a history essay. There’s very little that’s formally verified about that. When you’re making digital art, or when a human’s making art, there’s no point to verification—you just want to make the art and appreciate the beauty.
But I think there are other quantitative things people look into. For physics, for example, it’s not purely mathematical, but there’s a big mathematical component and it’s very helpful to be able to axiomatize parts of physics and reason about them formally in a hallucination-free way. We’ve also found organic usage of our API from software-verification people. So they have ways of encoding software programs in the Lean programming language, and once you’re in that world, Aristotle just goes and cranks through very quickly and either proves the code correct or finds bugs.
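To make the software-verification use case concrete, here is a minimal sketch of the kind of encoding Tudor describes, written in Lean 4. The function `myAbs` and the theorem are illustrative inventions, not from Harmonic's API; the point is that once a program and its specification live in Lean, a system like Aristotle can search for the proof (or a counterexample) automatically.

```lean
-- A toy program encoded in Lean 4, with a correctness property
-- stated as a theorem. `myAbs` is a hypothetical example function.
def myAbs (n : Int) : Int :=
  if n < 0 then -n else n

-- The specification: the result is always nonnegative. A prover
-- either discharges this goal or reports that it cannot.
theorem myAbs_nonneg (n : Int) : 0 ≤ myAbs n := by
  unfold myAbs
  split <;> omega
```

Here the proof happens to be a one-liner for Lean's built-in `omega` tactic, but the same pattern scales: state what the code must satisfy, and let the machine crank through the proof.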
So my sense is there may be a future there. There may be a future in the law, right, in formalizing the legal code or the tax code like they have in France, and then making tax filing cheaper. I think it’s hard to predict exactly where hallucination-free logical reasoning will be helpful, but I think it’s safe to say that it’s going to be helpful outside of just pure math.
Ben Lorica: But the key characteristic of this particular domain that you’re working on is the existence of this formal verification tool. Is that…
Tudor Achim: Yeah, and it’s the fact that these domains are built on a logical foundation. So I think my point is that there are many areas outside of pure math that have logical foundations to which these techniques could be applied.
Ben Lorica: And then honestly, Mathlib also is a special thing, right? So not many fields have the equivalent of Mathlib.
Tudor Achim: Yeah, people try to create that for computer science. I think there’s a recent CS-lib effort. I think Mathlib is special because, you know, I started my career in Silicon Valley at a company called Quora, and…
Ben Lorica: Oh yeah, yeah, yeah. Adam D’Angelo.
Tudor Achim: Yes. And you learn about this thing called a network effect. So I think Mathlib is special because it’s the biggest repository of open-source math, therefore people want to contribute to it versus others, which makes it the biggest one, which makes you want to contribute more, and so on. So I think that Mathlib and Lean, in my opinion, have secured the network effect, and those two are the future of formally verified math and quantitative reasoning.
Ben Lorica: So are there lessons from this domain, Tudor, that map at least in the area of human-computer interaction, user interface, to other areas?
Tudor Achim: I think the interesting thing about this area is that there’s verifiable truth. So I think the design patterns, right, when you work in an area that has truth, are different from ones where you don’t. For example, you can let Aristotle run for 24 hours, 48 hours, and more, and you can be confident that at the end you will still get something correct. In non-verifiable domains, you let it run for that long and it’s probably degenerated and done something completely different from what you asked.
I think the HCI lessons are going to be different between those two areas, but certainly with Aristotle, and possibly other similar systems, this is the first time you can even explore that paradigm. So I’m sure we’re going to learn a lot of lessons about how to build products there.
Ben Lorica: Yeah. And with that, thank you, Tudor.
Tudor Achim: Oh, thanks so much.
