Coding Agents Meet Data Science

Mikio Braun on Vibe Coding, Data Science Agents, Code Review Bottlenecks, and Building Side Projects with AI.


Subscribe: Apple • Spotify • Overcast • Pocket Casts • YouTube • AntennaPod • Podcast Addict • Amazon • RSS.

In this episode, Ben Lorica is joined by Mikio Braun, Senior Principal Applied Scientist at Zalando, for a wide-ranging conversation about the practical realities of AI-powered coding agents. They explore the unique challenges of using coding agents for data science workflows, the team-level implications of dramatically increased developer velocity, and what skills will matter most in an AI-augmented workplace. Mikio also shares his side projects—Talk with Ren, an AI-powered conversational language practice tool, and Bjorn the Bouncer, a text adventure experiment—both vibe-coded from scratch.

Subscribe to the Gradient Flow Newsletter



Transcript

Below is a polished and edited transcript.

Ben Lorica: All right, so today we have my good friend Mikio Braun, Senior Principal Applied Scientist at Zalando. Although today we’ll be talking mostly about personal projects and other things, including a site he put out called talkwithren.com. We’ll talk about that in a second. For those of you who don’t know, Mikio is my silent partner on this podcast, and I hope to have him here on a more regular basis. But with that, Mikio, welcome to the podcast.

Mikio Braun: Yes, nice to be here. People are actually asking me, “What is a silent contributor to a podcast?”

Ben Lorica: And I will talk to you about it offline after. All right, so first topic: coding agents, which I’m assuming both of us now use heavily. The first topic that Mikio wanted to talk about is coding agents in the context of data science. So you go, and then I’ll go.

Mikio Braun: Yes, yes. So I’ve started using coding agents. For a long while, I was a bit like, “Ah, all this AI hype.” But then I had a friend who told me, “You really have to try this.” I think last September I did, and I was really amazed at how well it works. Recently, I’ve started to use it for data science. It’s quite funny because you can really tell it was designed to be a coding agent, right? You do something, and if there are no error messages, it’s happy. It tends to jump to conclusions super quickly. It’s like, “Oh, look at this. Found a correlation. This is amazing. Should we commit this?” I really looked into how you can teach it to be more skeptical about data. Everybody who works with data knows that the first time it looks good, there has to be a bug somewhere. You never believe the good results the first time you get them.
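The skepticism Mikio wants from an agent can be made mechanical. A classic check, sketched here as illustrative stdlib Python (not anything from the episode), is a permutation test: shuffle one variable and ask how often pure chance produces a correlation as strong as the one the agent just got excited about.

```python
import random

def pearson(xs, ys):
    # Plain Pearson correlation coefficient.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

random.seed(1)
# Pure noise: any correlation "found" here is spurious by construction.
xs = [random.gauss(0, 1) for _ in range(30)]
ys = [random.gauss(0, 1) for _ in range(30)]
observed = pearson(xs, ys)

# Permutation test: how often does shuffled data look at least this correlated?
hits = 0
for _ in range(1000):
    shuffled = ys[:]
    random.shuffle(shuffled)
    if abs(pearson(xs, shuffled)) >= abs(observed):
        hits += 1
p_value = hits / 1000

print(f"r = {observed:.3f}, permutation p-value = {p_value:.3f}")
```

If the shuffled data matches the observed correlation often, the "amazing correlation" is exactly the kind of result an agent should refuse to commit.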

Ben Lorica: First of all, if I may ask, what’s your setup? Is it Claude?

Mikio Braun: Yes, mostly Claude Code.

Ben Lorica: Okay. I actually have what I call my “O2 stack”: OpenCode plus OpenRouter. You can hook up any model in OpenCode, so I hook up Claude models there all the time. So, for our listeners, maybe as a bit of background, distinguish data science from coding. What’s the typical workflow for data science?

Mikio Braun: With data science, usually you have some data and then you do some exploratory analysis to understand what is in the data. Then you want to do something classical, like train a model for a forecasting task or a classification task. That’s sort of the workflow. But the thing is, so much can go wrong. You can train a model, and just because the model looks good on the training data doesn’t mean it works well on something else. There’s just a lot of stuff you have to know how to do. You cannot just rely on the code doing the right thing.
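Mikio's point that good training numbers prove nothing can be shown with a tiny self-contained sketch (illustrative only, not from the episode): a model that simply memorizes random noise scores perfectly on its own training data and near coin-flip on held-out data.

```python
import random

random.seed(0)

# Toy setup: 200 points with random features and random binary labels.
# Any model that memorizes the training set will look perfect on it
# while being no better than chance on held-out data.
data = [([random.random() for _ in range(5)], random.randint(0, 1))
        for _ in range(200)]
train, test = data[:100], data[100:]

def predict(x, memory):
    # 1-nearest-neighbour "model": return the label of the closest
    # remembered training point (squared Euclidean distance).
    best = min(memory, key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)))
    return best[1]

def accuracy(dataset, memory):
    return sum(predict(x, memory) == y for x, y in dataset) / len(dataset)

train_acc = accuracy(train, train)  # evaluated on its own memory: perfect
test_acc = accuracy(test, train)    # held-out data: roughly coin-flip

print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

An agent that only sees the first number will happily declare success; the second number is the one that matters.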

Ben Lorica: A couple of questions. First, obviously before the advent of coding agents, there were tools that people broadly called AutoML. AutoML got to the point where, if you knew the data was good—in other words, it was clean and someone had already looked at it—they would use some sort of AutoML tool to fit a model automatically using hyperparameter search algorithms. Those exist. So what you’re talking about, it seems like, is getting the data for the first time. You’re not familiar with it, you’re not even sure if it’s clean or standardized, and you have to do something downstream with it—create a report, build a model, whatever. Is the assumption that this data hasn’t been vetted yet?

Mikio Braun: Yes, it hasn’t been vetted yet. And maybe it’s not even clear what the machine learning problem really is. What is the input? What is the output? How do you evaluate it? What is this data good for? Maybe you have an idea of what your machine learning problem is—which is to improve logistics or forecasting—but you don’t know if this data will help, or whether it’s clean. Now I work in logistics, so let’s say you pull some orders. The question is, is it comparable? Maybe you had lots of orders on the weekend. There are a lot of things that can just go wrong. You have to get to a point where you say clearly, “This is what you want to predict, and here is how you evaluate it.” But before you get there, especially if the data is complex and you need a lot of domain knowledge to understand what’s going on, a lot can go wrong. I found that the coding agent knows how to do Python, it knows how to decompose things, but you can’t really trust it when it says, “Look, I did it. Everything’s good now.” You have to really question, “What did you actually do?” Very often, it makes mistakes you wouldn’t make as a human.

Ben Lorica: It seems like what you’re describing is the fact that you’re actually using a coding agent to do something else, which is data science. What you need is an actual natural-language data science agent—something that understands the data science task more explicitly, which might mean, “Look at this data carefully. Is it standardized? If you’re going to make suggestions to me, did you make sure the model was robust? Did you create synthetic data? Are there edge cases?” In other words, it’s not a tool that the coding agents were specifically built for anyway.

Mikio Braun: Yes, that’s true. Although I think in the training data, there is a lot of information about how to do this properly. The LLM itself has a lot of information about this. But of course, the coding agents have been tuned, the models have been tuned, and they’re not designed for that. I agree.

Ben Lorica: If you look at it at a high level, the training data is likely GitHub repos. But GitHub repos are code; they don’t necessarily explain what happened with the data. I guess you could have a case where you have a collection of Colab notebooks where people annotated, “Okay, I’m loading the data into Pandas, and then, look, I’m doing exploratory data analysis.” If you had a lot of that, maybe it’s possible.

Now, there’s also the question of context. You could point a coding agent to some data, but it may not have enough context. Like, “What is this column? Is there another column in the data warehouse that can help me understand this column better?” I think this problem is largely being solved in the SQL sense. People are building these context stores, or what some call a semantic layer. Databricks, Snowflake, and a bunch of other people have this. Basically, they go into your data lakehouse, understand at a metadata level what your data is, so they can help you formulate your queries better. But that doesn’t help you if the data is completely brand new. You may have to actually provide more context to the agent.

Let’s assume you’re a data scientist and your task is to build a better forecasting model using data we already have in the data warehouse. Someone has already gone through and made sure that data is correct. What’s your sense at that point? Can it build a decent forecasting model?

Mikio Braun: Yes, I think it can, if you can really iterate on that. What I then try to do is also explain the data if something is missing. It tries to infer from column names what the meaning is, and then you can work with it. It gets to a point where it’s actually quite okay. Then usually I ask Claude Code to just write down what it understood about the data.

But there is still this problem that it’s a bit too happy when everything runs through. I also had problems where it was very often giving timeouts. At some point I said, “You know, there’s a lot of data in the data warehouse. Don’t just query all of it.” And then it started adding a timeout. Then it trains a model, and the model has a 30-second timeout on it, and it’s really hard to say, “Okay, no, this is something that will take an hour to run.” It’s very funny because for coding they’re so good, but something else seems to be missing. Maybe it’s the training data or the way they’ve been set up.

Ben Lorica: A couple of elements that might help someone who wants to build such an agent: one interesting class of tools to look at is what Jure Leskovec at Stanford has built around graph neural networks. It’s now a startup called Kumo.ai. Basically, they built a foundation model for structured data. If your data is completely structured, then if you’re using Kumo, I think you can build forecasting models or all the standard data science models rather quickly. So a data scientist who uses Kumo will be, I don’t know, 10, 20, 50x more productive than before because the model building is going to be pretty seamless.

Mikio Braun: Once you have something, iterating is really like sitting next to somebody who’s really good. You say, “Ah, can you show this graph? Can you filter it by this? Can you aggregate it differently? Group it?” And it just does it all.

Ben Lorica: The secret sauce is their foundation model for structured data, which they built rather cleverly. The second ingredient is work that our mutual friend Chetan is about to announce. The working name right now is Dex, but it’s going to change to Rote—R-O-T-E. Basically, he has this notion of a context file system. In many teams—DevOps, data engineering, data science—you don’t really want to start from scratch necessarily, because someone may actually have done something similar to what you’re trying to do before. The problem is, if you’re just doing some sort of ops or data engineering pipeline, you try to use a coding agent and it will burn through so many tokens trying to understand what kind of tools it needs to use, or you have to feed it so much context. But someone before might have built a similar pipeline. What you want to do is capture that operational knowledge—operational context. He has this notion of a context file system that allows you to do that. The end result is you save a lot of money because you’re not burning through tokens trying to reinvent the wheel each time. If it’s a data science project that someone has done before, you capture that somehow, and your system can leverage it moving forward.

It seems like the fact that data science is a hard thing for coding agents to do is because coding agents were not built for data science. I think it’s just a matter of time before someone will be able to do something like that. The end result might become something more like how research mathematics is done. It’s not completely autonomous. In research mathematics, you have the use of AI, but you still very much have the professional mathematician in the loop to check the results. But what it allows them to do is explore more ideas. Same thing with data science. It may well be the case that the data science agent is not going to be completely autonomous, but it will allow the data scientist to explore a lot more ideas, try more new datasets, and build more models.

Mikio Braun: Yes, and I mean, talking to people, that’s also the tedious part. You’ve done a thousand of these pipelines, and you’re still assembling them by hand. I think many of them would also be happy if that could be taken over by somebody.

Ben Lorica: Oh yeah, you should definitely talk to Chetan. It’s very interesting what he’s building. All right, so that’s data science. The other thing you wanted to talk about is: what do these coding agents mean at a team level?

Mikio Braun: Yes. Right now, many people have started using these coding agents, and many are getting very, very productive with them. But the next question is, what does it mean on a team level? If everybody on a team has a coding agent, how do you coordinate then? Right now, you still have the same processes. You still do pull requests, sprint planning, and all that stuff. Right now, it’s a bit like you’re done in a day or in an hour instead of two days, but then you wait three days for somebody to review your pull request. What do we do about these things? A lot of the processes are actually around coordinating people, but now, if one person becomes much more productive, the rest also somehow needs to adjust. The solution will not be that we just have everyone using a coding agent and they are two to five times more productive. Probably these things also have to change.

Ben Lorica: One way I’ve been thinking about this is that, if everyone is using a coding agent, you shouldn’t think of people as individual contributors. They’ve basically become teams. Each person is a team. That means they’re going to be faster, because it’s no longer just one person; you can think of them as five people. Secondly, if that’s the case, you have to manage a series of teams, not a series of individuals. As you said, one of the immediate adjustments is that velocity increases, so code review is now a primary constraint. People need to think through what the right processes are for doing code review in light of the fact that you now have high-velocity teams.

Also, to some extent, in many ways code is becoming free—free up to the point that you have to pay the inference costs. That means you can move faster, which means you can also throw away code faster. You can rebuild everything faster. But velocity introduces risks. If code review is now your primary constraint, how do agents help you with code review? At the end of the day, human review is going to be one of these higher-order skills that will be highly valued. How do you build tools to really help your best human code reviewers?

Mikio Braun: Yeah, and I think testing, for example, could also be AI-supported. You could have AI create all the test cases. Test infrastructure is now a main architectural concern. If everyone is moving faster, review and testing have to move just as fast.

Ben Lorica: One thing you said is that the cost of doing things becomes much smaller. Right now, we often discuss things for a long time because writing them takes months. You only have one chance, so you want to be sure it’s exactly the right solution. But with AI, one person can try out something for a week, learn from it, and do it differently.

So then the question is—we’re talking about code review and testing—but the entire infrastructure for rolling out and rolling back code changes needs to adapt. Because basically, some of these things may or may not work. Just like human code may or may not work, but now the velocity is higher. All of those supporting tools have to keep up.

I guess we don’t have any concrete answers here; we’re bringing some of these issues to the fore. Many teams are grappling with this. Even within teams, you have disparities between the extreme boosters and the extreme skeptics. This is true in research mathematics too—you’ve got extreme boosters and extreme skeptics—but the reality is always somewhere in the middle. Do you see a tension within teams between extremes?

Mikio Braun: Yeah, definitely. I think it’s very interesting. People have very different experiences. For me, it was an incredible productivity boost. But I also know enough people who said they tried it and, I don’t know, it didn’t click with them. I haven’t quite figured out how they work with it or why. But I think in the future, there might also be skill differences—like how well people can work with AI.

Ben Lorica: And if you don’t work with AI, that’s what will make you obsolete. The skill set for working with AI is something you have to develop, just like you developed your skills for coding.

Mikio Braun: I also think you still have to code by hand for a while, but it will be much accelerated. Even if you learn a craft, the first few times you do it by hand just to get the experience, and then you can move to higher-level tools. I think that will be true for coding as well.

Ben Lorica: I think the thing is, you can become good at describing what you want to build—spec-driven development. You can become good at that, but at the end of the day, you will still need some understanding of whether the AI produced something worth using. Just like when an AI produces a paragraph, you read it and you decide whether or not to use it. But in coding, if you’re not fluent in the programming language… like, I can probably create 20,000 lines of code in Rust using Claude Code after this podcast, and I’ve never really become fluent in Rust. But if you’re not fluent in the language, what skill do you have on the evaluation side? How can you evaluate it? Maybe there are ways for you to develop evaluation skills for the output of a coding agent.

Mikio Braun: I usually try to look for architectural patterns. Recently, I produced a lot of Go code, and I was never fluent in Go. But you understand the architectural patterns and so on. That helps me. I look at it and think, “That feels like this should be just one function, but now I see you reimplementing the same thing three times.” Then you can guide it. But the question is, how do you get this architectural knowledge without having done it yourself a couple of times?

Ben Lorica: Right. You might be able to join a company because you know something about the business—maybe you’re a little more on the junior programmer side, but you happen to know something about what that company is doing, whether it’s payments or logistics. Part of it is you have to develop some sort of systems thinking now—an end-to-end view of the problem you’re trying to solve. If you develop that broad systems thinking—“Okay, I’m supposed to improve this payment workflow”—you understand that payment workflow well, but you might not be the best Python programmer. You might still be able to contribute.

Mikio Braun: Yes, that might become more important.

Ben Lorica: All right. I alluded to spec-driven programming. Any other programming paradigms that you think people should read up on in this world of AI?

Mikio Braun: The interesting thing is, one difference I noticed is I mostly do pair programming now. I discuss a lot with Claude Code. Sometimes I discuss just to get an idea of what I want to build. Sometimes I do this “plan mode” thing when I think it’s a bigger feature. But very often, I’m also just like, “Okay, just build something,” and then I say, “Ah, okay, this isn’t good, let’s do it like this.” It’s very collaborative.

I see many people, especially hardcore software engineers, being more like, “No, no, first I want to write down the specs, everything, and then I run the agent.” They’re not even talking to it; they run it like a pipeline. But as I said, the cost of trying things again becomes much smaller, so the more conversational, iterative style works too.

Ben Lorica: Right. So let’s talk about throwing code away.

Mikio Braun: Coding itself will be free, in the sense that you build something, fix it over time, get feedback from users, and then at some point you go, “You know what? Forget that. Based on what I know now, I’m just going to throw it away and start over, because I’ve learned a lot over the last two weeks from that feedback.” And you can: an hour later, you have it again.

Which is largely what played out over the two months before Open Devin joined OpenAI. He was vibe-coding, never reading the code. Then you have security companies saying, “Hey, this is bad, this is bad,” and he’d fix it. You can imagine that if he’d never joined OpenAI, at the end of two months he’d have gone, “You know what? Based on all this feedback from the security people, I’ll throw this away and start over.”

Ben Lorica: I think the code is not so bad. Sometimes the code is even—like if you build a standard web application—it can be very good. But you have to tell it to apply state-of-the-art security measures. The problem is people build something and they don’t think about security, and then it’s not in there.

All right, so let’s close this by briefly talking about your side project, Talk with Ren, which you can find at talkwithren.com. The tagline is: “Practice conversational fluency in a safe, judgment-free space.” Conversational fluency in Japanese, I guess?

Mikio Braun: It’s Japanese, but also other big languages. I just configured it. For me, the use case was Japanese. My mother is Japanese, I was born in Germany, and I always try to catch up and relearn it. At some point, you come to the point where you’re like, “I have to talk to somebody.”

Ben Lorica: Wait, before you proceed, listeners will say, “Well, there are popular apps like Duolingo.”

Mikio Braun: Yeah, but Duolingo is just these exercises you do over and over. It’s more like a quiz. You don’t actually learn to talk or converse about something, because it’s more like you translate something. Which is good if you want to start out, but at some point you get to a point where you have to have more than just quizzes, and you want to talk.

Ben Lorica: A listener will say, “But Mikio, you are a computer scientist. What do you know about teaching someone conversational fluency in any language?”

Mikio Braun: Yes, okay. So this is where I came to it. I was starting to talk to Gemini or ChatGPT and asking questions about Japanese. I realized they have a lot of knowledge about Japanese. They’re all inherently multilingual. You can just ask them, “How do I say this?” or “What is the difference between this word and this word?” and they give you that. That was already helping me. Then I thought, “Okay, now if there were a vocabulary list integrated into that, and I would just discover new words, and then I could also do flashcard training or something.” This was the basic idea. That’s exactly what I built with Claude Code.

Ben Lorica: And this is meant to—when you say conversational fluency, that means the goal is to walk down the street in Tokyo and be able to do simple transactions.

Mikio Braun: Transactions, or yeah, that’s the thing. You can talk about whatever topic it is. It’s an agent: you configure your language, and the agent talks to you in that language. You can talk about whatever you want. It’s sort of prompted to keep asking and keep the discussion going. It also explains words, and if there’s a new word, it adds it to the vocabulary list. Then it’s up to you. You can ask it to roleplay, or you can just talk about your day. It’s not set situations; it adapts.
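Mikio doesn’t describe the implementation, but the vocabulary mechanic he mentions can be sketched with a simple convention: instruct the model to tag new words with a marker and parse them out of each reply. Everything below (the marker format, the prompt wording, the helper names) is a hypothetical illustration, not the actual Talk with Ren code.

```python
import re

# Hypothetical convention: the tutor's system prompt asks the model to tag
# any newly introduced word on its own line as:
#   NEW_WORD: <word> | <reading> | <translation>
SYSTEM_PROMPT = (
    "You are Ren, a friendly language tutor. Converse with the learner in "
    "their target language, keep asking questions to sustain the dialogue, "
    "and whenever you introduce a word the learner may not know, emit a line "
    "formatted as 'NEW_WORD: <word> | <reading> | <translation>'."
)

NEW_WORD_RE = re.compile(r"^NEW_WORD:\s*(.+?)\s*\|\s*(.+?)\s*\|\s*(.+?)\s*$",
                         re.MULTILINE)

def extract_new_words(reply: str) -> list[tuple[str, str, str]]:
    """Pull tagged vocabulary items out of a model reply."""
    return NEW_WORD_RE.findall(reply)

# A made-up reply, just to exercise the parser:
reply = (
    "今日は何をしましたか？\n"
    "NEW_WORD: 今日 | きょう | today\n"
    "NEW_WORD: 何 | なに | what\n"
)
vocab = extract_new_words(reply)
print(vocab)  # [('今日', 'きょう', 'today'), ('何', 'なに', 'what')]
```

The extracted tuples would then feed a flashcard queue, which is the loop Mikio describes: converse, collect words, drill them later.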

Ben Lorica: To develop conversational fluency in a language, is there science behind the modules? How do you go from nothing to being able to walk down the street and talk?

Mikio Braun: I think that’s the gap. Right now, Talk with Ren requires you to already have some basic fluency. What you need is practice actually talking to somebody in the language.

Ben Lorica: Oh, I see. So the assumption here is you’re not starting from zero.

Mikio Braun: Yes, because for things like that, you just do something like Duolingo or something else. That’s a bit more guided.

If you hit that plateau where you’re frustrated, tired of doing quizzes all day, and you want to start generating the language yourself, that’s what I built it for.

Ben Lorica: What would it take to adjust Talk with Ren to start from zero?

Mikio Braun: For that, I think I would actually need to acquire some IP around how to structure that.

Ben Lorica: Let me ask you this. You said Talk with Ren is not just Japanese; you can use it for other languages. Have you tried it in other languages where you were starting from zero?

Mikio Braun: Not from zero, but for example, French. I had some French in school, but I almost forgot everything. I tried to debug whether it works in other languages, so I tried some of my French. You have to have some fluency. It always starts with, “Hello, I’m Ren, your language tutor. What do you want to talk about today?” And then you have to start talking. I mean, you can always switch back to English, and it would understand you, but the idea is really to have something where you can just talk about any topic you want.

Ben Lorica: Tying this back to AI, does this have anything to do with AI?

Mikio Braun: Yes. I think it’s a bit like prompt design; that’s the interesting part. I launched it in November, and around 50 or 60 people signed up. To be honest, a few made it to two lessons; most just did it once. I think the expectation was that you go there, it has a program, you invest 30 minutes each day, and you get better. But right now, it’s not like that. It’s really just that you can talk to it about whatever topic you want, and then you can discuss vocabulary and practice. One thing I found was that it was a bit too boring. It’s a bit like, “What do you want to talk about?”

Ben Lorica: Does it have audio as well? Because pronunciation obviously is important.

Mikio Braun: Yes, you can have things read out to you, and you can also just talk to it: text-to-speech and speech-to-text.

Ben Lorica: What model are you using?

Mikio Braun: I’m using Gemini 3 Flash.

Ben Lorica: And then no fine-tuning?

Mikio Braun: No fine-tuning, no. Since then, I started thinking and finding prompts that are more engaging, which are not just like, “I’m a helpful agent,” blah blah. I figured what they need is to have a bit more agency, like a goal that they want to help you get better in the language. If you talk to Claude, it’s almost a bit addictive because it really tries to help you solve your coding problems. I was trying to adapt that to language.

Ben Lorica: What about if it helps you prepare for a very specific transaction? “I want to go to the subway station, to the counter, ask for this, blah blah blah, and I want to practice with Ren several times in advance.”

Mikio Braun: Yeah, you can ask it to roleplay; that’s something the LLM just does by itself. It’s actually quite interesting. But my idea now is that it needs something more. I got sidetracked trying out all these ideas, but I think it needs to be more engaging; it should really care about whether you get better or not. Maybe it also needs an additional layer on top that talks to you about where you are: “Okay, how good are you with this language? Where are you struggling? What do you want to learn?” Then it creates sessions you can do. You pass these sessions to another LLM, which handles, for example, roleplaying or practicing grammatical constructions you’re not familiar with.

Ben Lorica: Do you have the notion of assessment? In other words, how do I know that I’m getting better?

Mikio Braun: Not yet; it’s a judgment-free space. I’ve mostly been using it for debugging, but also for myself. I would say you become more familiar with the vocabulary as you use it.

Ben Lorica: And do you think it’s improved your Japanese?

Mikio Braun: Yes, I think a bit. But you have to keep doing it, and for me it wasn’t engaging enough either. One thing did come out of it, though: I was experimenting with how engaging LLMs can actually be, and at some point I found a prompt that would create a text adventure for you. I made this one thing, if I can say it, called “Bjorn the Bouncer.” It’s a one-scene text adventure where you’re outside a Berlin club and you have to get in. It’s just a small model, like an 8-billion-parameter Llama. The prompt is sort of, “You are a text adventure engine, and this is the scene,” and it tells you what you have to do. You have to somehow get past the bouncer.
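As a rough illustration of the setup Mikio describes, a fixed scene prompt plus a running dialogue, a single-scene engine can be only a few lines. The function names, message format, and prompt wording below are assumptions, not his code; `call_llm` is a stand-in for whatever model API (he mentions a small ~8B Llama) actually generates the narration.

```python
# One fixed system prompt pins the model to a single scene and instructs it
# to keep narrating no matter what the player tries.
SCENE_PROMPT = (
    "You are a text adventure engine. The scene: a rainy night outside a "
    "Berlin club. Bjorn, an enormous bouncer, blocks the door. Narrate the "
    "consequences of the player's actions in second person, stay in this one "
    "scene, and keep the story going no matter what the player tries."
)

def call_llm(messages):
    # Placeholder: in a real app this would query the model with the
    # full message list and return its generated narration.
    return "Bjorn raises an eyebrow. The door stays shut."

def play_turn(history, player_action):
    """Append the player's action, get the engine's narration, and return it."""
    history.append({"role": "user", "content": player_action})
    narration = call_llm([{"role": "system", "content": SCENE_PROMPT}] + history)
    history.append({"role": "assistant", "content": narration})
    return narration

history = []
print(play_turn(history, "I compliment Bjorn's jacket."))
print(play_turn(history, "I try to juggle three kebabs."))
```

Because the whole history is replayed each turn, the model can react to earlier antics, which is what makes even a tiny model "spin the story" the way Mikio describes.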

Ben Lorica: This is something people can do on their commute. You know how in Asia people engage with their phones—they have all these interactive novels and stuff. This can be something like that, right?

Mikio Braun: Right. Eventually, you could also pull this back into Ren, and then you can sort of have an interactive session where, I don’t know, you’re taking a walk in Shibuya and you buy something to eat or so.

Ben Lorica: Yeah, yeah, this would be better.

Mikio Braun: This would be cool. You can try it out—it’s bjornthebouncer.com. You can try to get past him. Considering how little went into it, it’s really funny what you can do, because it will react to whatever you do. It tries to spin the story, continue with the narrative, even if you do crazy stuff.

Ben Lorica: To put you on the spot, were both these sites vibe-coded?

Mikio Braun: Yes. One hundred percent.

Ben Lorica: In Go?

Mikio Braun: No, that was still my Python phase; the Go is newer. That was all Python, with Vue on the front end.

Ben Lorica: What I want you to do is vibe-code a time-series database or some piece of infrastructure. Everything is possible now; it’s just a matter of how many thousands of dollars you want to spend on inference.

Mikio Braun: I have my whole development setup now. I migrated my blog: I had a Jekyll blog and a WordPress blog, and I moved them all onto my own CMS that I vibe-coded. It was one of those “I’ve wanted to do this forever” things. I had the dump from WordPress, but I never found the time to really dig into it. Then I told Claude, “Okay, just look at the data and try to get it out of there,” and it did.

Ben Lorica: Interesting. Hey, by the way, let’s wrap this up. A question for you: obviously before you went to industry, you were a research scientist at a university. If you were still there, and students came up to you and said, “Should I still study computer science? Or if I do study computer science, how do I get that first job?” what do you say to those two questions?

Mikio Braun: I still think computer science is so interesting, right? So yes. But I agree with you: they have to learn the tools, but they also have to learn about the structure of software. It’s no longer about memorizing libraries or syntax, but just understanding how software is built, and using these tools to build a lot of side projects just for fun.

Ben Lorica: What about the notion that getting that first job may require not only having these side projects, but also knowing something about the company’s domain? If you already have an interest in a specific domain, finance or healthcare or whatever, you can learn that too, besides learning computer science.

Mikio Braun: Yes, that’s true, because that hasn’t changed. I agree with you. Before, maybe just being able to code was something that people were looking for, but now you sort of have to be broader.

Ben Lorica: Yeah, yeah. And with that, thank you, Mikio.

Mikio Braun: Yeah, thanks for having me.