Chang She on Multimodal Data, Agent Memory, and the Future AI Data Stack
Ben Lorica talks with Chang She, co-founder and CEO of LanceDB, about the limitations of traditional analytical data tools like Pandas and Parquet when applied to AI workloads. They discuss the rise of the “multimodal lakehouse,” the unique data storage and retrieval challenges introduced by AI agents, and how LanceDB provides both lightweight embedded memory and massive-scale infrastructure. Chang also shares his perspective on the future of the enterprise data stack and the shifting economics of open-weight models. [This episode originally aired on Generative AI in the Real World, a podcast series I’m hosting for O’Reilly.]
Interview highlights – key sections from the video version:
- Why Traditional Analytics Tools Fall Short for AI Data
- Why Vector Databases Are Too Narrow for Modern AI Workloads
- The Lance Format: A Lakehouse Format for Multimodal Data
- What “Open” Means for Lance and How It Fits the Data Ecosystem
- How Data Architects Should Think About Lance Alongside Parquet and Iceberg
- Core LanceDB Workloads: Search, Curation, Enrichment, and Training
- Why Multimodal Data Is Not Just a Bay Area Bubble
- The Multimodal Lakehouse and Its Multiple Meanings of Multimodality
- Agents, Scale, and the Explosion in Data Infrastructure Demands
- Agentic Workloads, Ephemeral Systems, and Object Storage as the Source of Truth
- Agent Memory, Lightweight LanceDB, and Large-Scale Retrieval
- OpenDevin, Agent Adoption, and Why LanceDB Fit the Use Case
- The Biggest Data Infrastructure Mistake Enterprises Can Make
- What the Future AI Data Stack Might Look Like
- Open-Weight Models, Enterprise AI Economics, and the Next Major Trend
Related content:
- A video version of this conversation is available on our YouTube channel.
- The Rise of the Multimodal Lakehouse
- Trends shaping the future of AI infrastructure
- Inside the race to build agent-native databases
- Richard Garris and Barry Dauber → The Gap Between AI Hype and Enterprise Reality
- Umur Cubukcu → Building the Open Source Alternative to AWS
- Matthew Glickman → The Junior Data Engineer is Now an AI Agent
- Mike Freedman and Ajay Kulkarni → Is Your Database Ready for an Army of AI Agents?
Transcript
Below is a polished and edited transcript.
Ben Lorica: All right, so today we have Chang She, CEO and co-founder of LanceDB, which you can find at lancedb.com. The tagline is “build better models faster.” So Chang, welcome to the podcast.
Chang She: Hey Ben, super excited to be here.
Ben Lorica: All right, so we’ll jump into the core topics. But a bit of background here for our listeners who may not be familiar with you. You worked on Pandas — you were a core member of the Pandas team. You were also very early on with Parquet. And at some point, you became convinced that for AI workloads, those former tools you worked on — Parquet and Pandas — were not enough. So what was the moment of realization for you that these traditional tools, which were foundational for analytics, were lacking?
Chang She: Absolutely. So I worked at a company called Tubi TV, which is video on demand and streaming — movies and TV. It was there that I ended up dealing with a lot of what I would call AI data. We had embeddings for personalization, video assets, image assets, audio, text for subtitles — and none of that really fit into the traditional data stack: Pandas, Spark, Parquet, even Arrow. That was the inspiration for me to start LanceDB.
Ben Lorica: And Chang, at this point, do you think more people are aware of this disconnect between those tools and the kinds of tools they’ll need moving forward?
Chang She: Yeah, absolutely. When I talk to data infrastructure folks who are building and managing that stack for this kind of data, there’s broad recognition that something has to be done — that the existing stack is just not sufficient. And what’s more interesting is that this data is also becoming a lot more valuable because of AI.
Ben Lorica: So obviously, before you came on the scene, there was this wave of vector stores and vector databases, which were optimized for retrieval. Let’s say I’m a listener and all I have is text. Do I need anything beyond a vector database?
Chang She: Absolutely. Even if you just have text and text embeddings, the creation of those embeddings and the management of all those data assets — your metadata, the actual documents, how to serve that — a lot of that falls outside the purview of a vector database. Vector databases tend to be very narrow solutions for a very narrow problem, whereas something like LanceDB takes a broader view: when you have AI data, what are all the things you need to do to it throughout the life cycle of application development or model development? How do you build a tool and a system that simplifies your life by having one system to handle all the major workloads throughout that life cycle?
Ben Lorica: And by the way, for our listeners, there’s LanceDB and then there’s the open Lance file format. I want to ask you about the file format in a second, but you mentioned something about vector databases — you were saying that the vector database folks never really positioned themselves as responsible for creating the embeddings. They just assume you’ll show up with embeddings already in hand.
Chang She: That’s right. But even if you take that narrow view, what we find in enterprises today is that a lot of folks have offline generation processes in the data lake itself, where they chunk up documents, generate the embeddings, and then have what they call an offline store. Then they have to copy that data into a vector database for serving. So there’s a lot of data syncing and data movement, which creates expense and complexity — and that’s even for just text-based workloads, even for pure vector search.
The second issue is that vector databases often don’t pay enough attention to the overall retrieval stack. The task for users is “I want to find the right data in my dataset,” and vector search is just one technique. You have many different techniques — full-text search, SQL queries, filters, regexes — all of which go into a rich and accurate retrieval process. Vector databases generally don’t expand beyond simple semantic or vector search.
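To make the "vector search is just one technique" point concrete, here is a minimal sketch of a retrieval step that applies a metadata filter and a regex first, then fuses a vector ranking with a keyword ranking using reciprocal rank fusion. It is plain Python over a toy corpus, not LanceDB's API; the documents, the two-dimensional vectors, and every function name are assumptions made for illustration.

```python
import math
import re

# Toy corpus; all ids, texts, years, and 2-d "embeddings" are made up.
DOCS = [
    {"id": 1, "text": "Claim form for roof damage, policy P-1042", "year": 2023, "vec": [0.9, 0.1]},
    {"id": 2, "text": "Aerial photo notes: roof intact, fence damaged", "year": 2021, "vec": [0.8, 0.3]},
    {"id": 3, "text": "Marketing newsletter about home insurance", "year": 2023, "vec": [0.1, 0.9]},
]

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def vector_ranking(docs, query_vec):
    """Doc ids ordered by descending cosine similarity to the query vector."""
    ordered = sorted(docs, key=lambda d: cosine(d["vec"], query_vec), reverse=True)
    return [d["id"] for d in ordered]

def keyword_ranking(docs, terms):
    """Doc ids ordered by total occurrences of the query terms; non-matches are dropped."""
    scored = []
    for d in docs:
        tokens = re.findall(r"[a-z0-9-]+", d["text"].lower())
        hits = sum(tokens.count(t) for t in terms)
        if hits:
            scored.append((hits, d["id"]))
    return [doc_id for _, doc_id in sorted(scored, key=lambda p: -p[0])]

def hybrid_search(docs, query_vec, terms, where=None, pattern=None, k=60):
    """Filter first, then fuse the vector and keyword rankings with reciprocal rank fusion."""
    candidates = [
        d for d in docs
        if (where is None or where(d))
        and (pattern is None or re.search(pattern, d["text"]))
    ]
    fused = {}
    for ranking in (vector_ranking(candidates, query_vec), keyword_ranking(candidates, terms)):
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)

# Recent documents only, ranked by both semantic and keyword relevance.
print(hybrid_search(DOCS, [1.0, 0.0], ["roof", "damage"], where=lambda d: d["year"] >= 2022))  # [1, 3]
```

The fusion constant `k=60` is a conventional default for reciprocal rank fusion, not something stated in the conversation; a production system would also add reranking on top of the fused list.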
Ben Lorica: So I mentioned the Lance open file format, which people often shorthand as “Parquet for AI,” but it’s actually both a file and table format. Maybe give our listeners a high-level description of the Lance format and why it’s become so popular.
Chang She: Yeah, absolutely. Lance is what we call a lakehouse format, and it’s quickly becoming the new open-source standard for multimodal data. What I mean by a lakehouse format is that it spans a couple of different layers.
The first is the file format layer — the equivalent of Parquet in the stack, which is about how you lay out data in a particular file. The innovation in Lance at this layer is that it’s really good for random access without sacrificing scan speed. Our files are actually smaller than Parquet for many AI datasets.
The next layer is what we call the table format, which is occupied today by projects like Iceberg, Delta, and Hudi. Lance brings much better designs and more optimizations for machine learning experimentation at this layer — easy backfills, 2D data evolution, handling really large blob data like videos and images, and a branching strategy that supports true Git-for-data semantics, taking the best of Parquet and Iceberg.
And then there’s a third layer around indexing, enabling fast scans, fast searches, and fast queries. When you put all of that together, that’s what we call the Lance lakehouse format.
Ben Lorica: So I describe Lance as open. Can you clarify what that means?
Chang She: Yeah, absolutely. Number one, Lance format is open source — Apache 2.0 licensed. You can find it on our GitHub. We have community governance and PMCs with lots of external contributors. But beyond that, there’s open source and there’s open source. Lance format is designed for a truly open architecture as well. So not only is it open source, it also plays really well into the rest of the data ecosystem.
When people compare us to Parquet and Iceberg, we’re not designed as a head-to-head competitor. We slot into the same Polaris data catalog, for example, so you can have one unified view across all your datasets, with Parquet/Iceberg under the hood for BI data and Lance for your AI data. And Lance itself plugs in natively to Spark, Pandas, Polars, DuckDB, and any open data tooling you’re already using.
Ben Lorica: So operationally, Chang, if I’m a data architect, should I think of Lance as: I have Parquet and table formats like Delta and Iceberg for my structured data, and then for unstructured data — which could mean video, audio, or even text — I bring in Lance? Is that what happens in practice?
Chang She: Yeah, often. What data infra folks and data engineers interact with is the tooling — their data pipelines, their Spark jobs, their search applications. Those are the jobs that actually interact with the underlying storage. And that data transfer process is actually really easy through Apache Arrow. Most of the time it’s really just a one-line code change — the same Spark code, for example, except instead of writing to Parquet, you write to Lance.
Ben Lorica: So you can completely utilize your overall data pipeline and bring all of your tabular data, metadata, multimodal data, and embeddings all in the same place.
And in terms of workloads, Chang — you alluded to the fact that the previous generation of vector stores excelled at something very specific, maybe retrieval. Is Lance equally specialized, in the sense that it’s great for X but doesn’t excel at other things? Describe the kinds of workloads that teams using Lance are actually running.
Chang She: Yeah, absolutely. At a high level, LanceDB — our enterprise data platform — excels at helping customers manage really large-scale AI data: embeddings for search, adding new features and extracting new columns, enriching datasets, doing data curation and exploration, and then feeding that data to GPUs quickly for distributed training jobs to achieve as high GPU and model FLOPS utilization as possible.
Ben Lorica: So you’ve used the word “multimodal” a few times, and I’ve always been a proponent of making sure data infrastructure is positioned for a multimodal world. But sometimes I question that assumption. Is multimodality a Bay Area bubble thing? If I go to the East Coast and talk to Goldman Sachs or an insurance company, are they still grappling with legacy systems that are mostly structured data, and what they really want is to do fancy AI stuff with agents using the old-school data they already have?
Chang She: Yeah, absolutely. When we talk about multimodal data, what often comes to mind first is video generation, image generation, self-driving cars — cutting-edge, high-tech applications. But if you look at more traditional enterprises, they already have a lot of multimodal data. Insurance companies, for example, have millions of documents, PDFs, and contracts. Insurance especially will have top-down aerial views of houses and property boundaries for risk assessment.
Before AI, it was just really hard to get value out of that data, so they hadn’t paid much attention to it. It’s kind of like when I clean up my house — I just move all the mess into a back room or storage so I don’t have to think about it. My wife yells at me all the time because she opens the storage and everything falls out. I feel like that’s what traditional enterprises have done with multimodal data: they didn’t know what to do with it, so they stuck it in some directory in SharePoint and left it there. But there’s actually a tremendous amount of value in it, and AI is helping them unlock all of that.
In the next few years especially, I think we’re going to see a lot more attention paid to: “Okay, if we can get a lot more value out of this data, how do we actually manage it? How do we work with it? How do we combine it with the rest of our data stack so that it’s governed within a single unified entity?”
Ben Lorica: So the hot thing a few years ago in data infrastructure was the lakehouse — a great term that was introduced.
Chang She: Yeah. I wonder who came up with that one.
Ben Lorica: Yeah. So you folks are starting to use the term “multimodal lakehouse.” Compared to the lakehouse, which I think is now widely adopted — where is the multimodal lakehouse today in terms of maturity, and where does it still need work?
Chang She: Right, absolutely. For the audience who may not be as familiar: the really simplified way I think about the lakehouse is that you have all your data in one place in the data lake, and then you have a combined data warehousing layer on top that provides structured tables and structured ways to run workloads on all of that data.
The way we think about the multimodal lakehouse is in a couple of different ways. First, the data changes — you go from purely tabular or clickstream data to all sorts of multimodal data, from embeddings to all your multimedia types. That changes a lot about how you read and write data efficiently, how you manage it, and how you synchronize it with metadata.
Second, the workloads are also multimodal. You’re not just running SQL and analytics workloads anymore. You’re thinking about search, training, feature engineering, and how your lakehouse interacts with GPU clusters — all things that traditional lakehouses are not very good at.
And then the third dimension where “multimodal” applies is that traditional lakehouses tend to be good only at batch offline processing. If you want to do online serving, you typically need to introduce an OLTP database or a dedicated serving system. With LanceDB, because of the innovations in the format, you can actually do both at the same time. So the online-offline scenario also becomes multimodal in this sense.
Ben Lorica: So if I understand correctly, you’re multimodal in multiple senses: multimodal data types and multimodal workloads. Is that right?
Chang She: That’s right.
Ben Lorica: By the way, some companies are already doing that. Databricks does it — they call it HTAP. And recently another company in this space attempted an acquisition. They’re very strong in Postgres… Neon?
Chang She: Oh, yeah, yeah, yeah.
Ben Lorica: So they have transactions and analytics. What you’re saying is that your vision of the multimodal lakehouse includes hybrid transactional-analytics, multiple multimodal data types, and multimodal workloads. Is that a fair summary?
Chang She: Yep.
Ben Lorica: So surely, Chang, certain aspects of what you just described are more fleshed out than others. What areas do you anticipate you’ll be working hardest on in terms of this vision of multimodality?
Chang She: I think number one is actually scale. Scale is the biggest driving factor this past year and into this year, and a lot of that has been driven by the rise of agents. Because of agents, data volume, query throughput, performance, and latency requirements have all just exploded. That’s the area we find we’re uniquely suited for and that we’re pushing hard on. When we talk to customers, we often say a trillion is the new billion. We have folks who are operating at roughly a thousand times the scale they were at just a year or two ago.
Ben Lorica: So I guess the hack that people will try for some of these things is just to put files in S3 and use a database somehow. Are you still seeing a lot of people trying to do that?
Chang She: Yeah. There are a few approaches doing that, and there’s a general trend — because of data scale — toward object storage as the only cost-effective and scalable storage backend for a lot of these newer data storage systems. The challenge for data infrastructure providers is: how do you maintain scalability and high performance while preserving the cost advantages of S3 and object storage? That’s the difficult problem.
We actually have a recent blog post about how we do that at 10 billion vector scale. At smaller scales, it’s actually easy — you just pull all the data from S3 into a caching system and serve it from there. Tons of open-source projects, including Lance, can help you do that pretty effectively. The real challenge is at scale. If you have 10 billion vectors, object storage is essentially your only cost-effective option. But imagine the query times if you were targeting S3 directly. The indexing, search, caching, and distribution challenges become a big distributed systems problem — and that’s what we solve.
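A rough back-of-the-envelope calculation shows why object storage ends up being the practical home for data at this scale. The sketch below counts only the raw vectors; the 768-dimension float32 embedding and the 512 GB machine size are assumptions chosen for illustration, not figures from the conversation.

```python
def raw_vector_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Bytes needed for the raw vectors alone (no index, metadata, or replicas)."""
    return num_vectors * dims * bytes_per_value

def machines_to_hold(total_bytes: int, ram_bytes_per_machine: int) -> int:
    """Machines needed to keep everything in RAM (ceiling division)."""
    return -(-total_bytes // ram_bytes_per_machine)

NUM_VECTORS = 10_000_000_000   # the 10 billion vectors from the conversation
DIMS = 768                     # assumption: a common embedding width
RAM_PER_MACHINE = 512 * 10**9  # assumption: 512 GB of RAM per machine

total = raw_vector_bytes(NUM_VECTORS, DIMS)
print(total / 10**12)                            # 30.72 (decimal terabytes)
print(machines_to_hold(total, RAM_PER_MACHINE))  # 60
```

Under these assumptions the vectors alone come to roughly 30 TB before any index or source documents are counted, which is why serving everything from memory stops being economical and caching hot subsets from object storage becomes the design problem.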
Ben Lorica: So like you said, many data engineering and infrastructure teams are trying to figure out what their infrastructure looks like in a world of agents. Imagine the enterprise equivalent of OpenDevin, where a single employee might have 10 AI delegates or AI assistants. Some of the things that come up are identity management and access control. And maybe some of these AI agents don’t need anything permanent; they just want something ephemeral: spin up a LanceDB for a minute and then tear it down. Are these the kinds of things you’re starting to think about?
Chang She: Yeah, for our cutting-edge customers, that’s already the reality. We specialize a lot in infrastructure for model training. A researcher might have a feature idea with two input features, each with 10 variants, and some output feature that combines them — now you’ve got a hundred different variants. Before, there was a limited number of variants an individual researcher could test manually. Now they can use agents to run all of that automatically overnight. Humans go to sleep, but the agents are putting a lot of load on the underlying data infrastructure.
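The combinatorics in that example are easy to reproduce. This small sketch enumerates every pairing of two input features with 10 variants each; the variant names are placeholders, not real feature definitions.

```python
from itertools import product

# Placeholder variant names for two input features, 10 variants each.
input_a_variants = [f"a_v{i}" for i in range(10)]
input_b_variants = [f"b_v{i}" for i in range(10)]

# One experiment per (a, b) pairing that an agent could run overnight.
experiments = [
    {"feature_a": a, "feature_b": b}
    for a, b in product(input_a_variants, input_b_variants)
]
print(len(experiments))  # 100
```

Adding a third input feature with 10 variants would multiply the count again to 1,000, which is the kind of growth that shifts load from people to the underlying data infrastructure.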
So this year we’re talking about going from hundreds of queries per second with plain RAG a couple of years ago to 100,000 queries per second in this world of agents. And when it comes to security and compliance, there’s a lot of churn around sandboxing and ephemeral systems. With object storage as the source of truth, this actually works out well — when you have hot data, you cache it, serve it for a time, and then let it go. The cache can expire and be replaced by the next hot workload, without having to pay for expensive memory and NVMe for all of your data.
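The "cache it, serve it for a time, and then let it go" pattern can be sketched in a few lines. This is an illustration only, not how LanceDB implements caching: `FakeObjectStore` stands in for S3-style storage, and the class names, TTL, and capacity are all assumptions.

```python
from collections import OrderedDict

class FakeObjectStore:
    """Stand-in for S3-like storage; counts reads so cache hits are visible."""
    def __init__(self, objects):
        self._objects = dict(objects)
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self._objects[key]

class ExpiringCache:
    """Small cache in front of the store: entries expire after a TTL, and the
    least recently used entry is evicted when the cache is full."""
    def __init__(self, store, ttl_seconds, max_items, clock):
        self.store = store
        self.ttl = ttl_seconds
        self.max_items = max_items
        self.clock = clock              # injected so the example is deterministic
        self._entries = OrderedDict()   # key -> (value, expires_at)

    def get(self, key):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            self._entries.move_to_end(key)   # mark as recently used
            return entry[0]
        value = self.store.get(key)          # miss or expired: go to the source of truth
        self._entries[key] = (value, now + self.ttl)
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_items:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return value

now = [0.0]  # fake clock
store = FakeObjectStore({"shard-1": b"hot data", "shard-2": b"other data"})
cache = ExpiringCache(store, ttl_seconds=60, max_items=1, clock=lambda: now[0])

cache.get("shard-1")
cache.get("shard-1")
print(store.reads)  # 1: the second read was served from cache
now[0] = 61.0
cache.get("shard-1")
print(store.reads)  # 2: the entry expired and was re-read from the store
```

Because the object store always holds the full dataset, losing or expiring a cache entry costs only a re-read, which is what lets the expensive tier stay small.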
Ben Lorica: The other thing that comes up with agents right now — the hot topic that seemingly a gazillion people are working on — is memory. If I have a bunch of agents, a multimodal lakehouse, and now a memory system, I have three different systems to maintain. What’s your take on agent memory?
Chang She: Yeah, absolutely. LanceDB open source is actually the main memory plugin for OpenDevin and a number of other agent frameworks like CrewAI. For a lot of these agent frameworks, there are a couple of key requirements. Number one is just being lightweight and super easy to use. LanceDB is the only one that supports hybrid search, reranking, and fairly sophisticated retrieval mechanisms without having to run a separate service.
Ben Lorica: Before you continue — this notion of lightweight. On one hand, there’s the multimodal lakehouse, and a lakehouse is never lightweight. But it seems like you’re also positioning yourselves in the DuckDB or SQLite world of very lightweight, embedded tools. Can you clarify what you mean by lightweight when you’re supposedly a lakehouse?
Chang She: What I mean by lightweight in this context is that from an agent’s perspective, it simplifies a lot of things if you don’t have to connect to another service or talk to another system to access your memory and retrieve from it. That’s what we mean. The open-source, lightweight version…
Ben Lorica: But then you’re also large-scale infrastructure. How can you be both?
Chang She: Right — so why would you bring along a big piece of infrastructure if you’re a lightweight agent?
The LanceDB open source is actually very lightweight. There’s no heavy infrastructure involved. This is why it’s perfect for memory — because memory is often very ephemeral. You interact with a session, and when that session ends, you don’t want to retain all of it. At most, you might compress some of it and retain it for downstream historical processing, but most of the time it’s just gone. That’s what we mean by lightweight.
And then for large-scale retrieval — if you have a large historical corpus, if you’re working in a corporate environment, or if you have an agent searching through patent history, for example — that’s where the infrastructure comes in. If you have a petabyte of data to search through, an embedded library isn’t going to cut it. You need a scalable system. But from the agent’s perspective, it’s the same interface — just as easy to use, with a scalable system hidden beneath the surface.
I think for agents, that’s just one of the requirements. The other is having sophisticated enough retrieval that agents can actually find what they’re looking for. Different agents will want to look for data in different ways, and being able to support all of that without a million different plugins for each modality is also very important.
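The "same interface, different backend" idea can be sketched with a small protocol. Everything here is hypothetical: `MemoryStore`, `EmbeddedSessionMemory`, and `recall` are illustrative names, and the word-overlap search is a placeholder for real retrieval. A scalable backend would implement the same two methods against a remote system.

```python
from typing import Protocol

class MemoryStore(Protocol):
    """What the agent depends on; it never sees which backend is behind it."""
    def add(self, text: str) -> None: ...
    def search(self, query: str, limit: int = 3) -> list[str]: ...

class EmbeddedSessionMemory:
    """In-process, ephemeral memory: nothing to connect to, gone when closed."""
    def __init__(self) -> None:
        self._notes: list[str] = []

    def add(self, text: str) -> None:
        self._notes.append(text)

    def search(self, query: str, limit: int = 3) -> list[str]:
        # Placeholder relevance: number of words shared with the query.
        terms = set(query.lower().split())
        scored = [(len(terms & set(n.lower().split())), n) for n in self._notes]
        ranked = sorted((p for p in scored if p[0] > 0), key=lambda p: -p[0])
        return [note for _, note in ranked[:limit]]

    def close(self) -> None:
        # End of session: most of the memory is simply dropped.
        self._notes.clear()

def recall(memory: MemoryStore, question: str) -> list[str]:
    """Agent-side code; works unchanged with any MemoryStore implementation."""
    return memory.search(question, limit=2)

session = EmbeddedSessionMemory()
session.add("user prefers dark mode")
session.add("deploy failed on tuesday")
session.add("user timezone is UTC")
print(recall(session, "user dark mode"))
session.close()
```

The design choice being illustrated is that the agent code depends only on the interface, so moving from a laptop-sized session store to a petabyte-scale service does not change how the agent asks for memory.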
Ben Lorica: By the way, I was playing devil’s advocate there — I actually use LanceDB every day on my laptop. So it really can be something you run entirely in memory on your local machine.
Chang She: Yeah. What we find is that when you make it really easy for agents to use, scale really takes off. The way we look at it is that agents are kind of like an ideal gas — if you make it easy for them, they will expand to fill whatever compute, data, and infrastructure you give them. We’ve seen explosive growth in query throughput, and because of complex agent pipelines, there’s also compression in latency requirements. Agents now want 20 to 100 millisecond latencies.
We also see a lot of data proliferation. One of our largest users told us they’re now managing something like a billion tables, just because they have so many agents and so much data to manage. On any computational or data management dimension you can think of, agents will expand to whatever capacity you give them.
Ben Lorica: So this is a two-part question. LanceDB kind of blew up around the launch of OpenDevin. How did the OpenDevin community land on Lance, and have they told you what they liked about it?
Chang She: A lot of it is what we just talked about — lightweight, easy to use, the model…
Ben Lorica: But how did it actually happen? How did they land on Lance specifically?
Chang She: My recollection is that initially it came as a recommendation from Claude or something like that. And I think it was the only option out there that met their requirements: embedded, lightweight, sophisticated retrieval, and support for in-memory, local NVMe, and object storage.
Ben Lorica: Interesting. And since then, has that relationship continued?
Chang She: Yeah, we continue to see strong engagement from the OpenDevin community. Our open source continues to grow: at last count, we’re at around 14 million downloads a month across our open-source projects. We’re super excited about supporting that community.
What we’re seeing now is demand for a more file system-like interface. It’s often easier for agents to interact with a file system-style interface — and I’m choosing my words carefully, I don’t mean an actual file system, just an interface. This is something we’re actively exploring: what would it look like to put a file system interface over LanceDB or the Lance format. Based on the usage patterns we see from agents, it’s fairly straightforward to do. If you’re listening and this sounds interesting, we’d love to have early users come check it out and test it with us.
Ben Lorica: It’s interesting — as you were talking, it dawned on me that the various notions of multimodality you described earlier might actually be another reason people landed on Lance. There are other vector search systems you can run embedded or in memory, but if you want to build more capable agents going forward, the multiple dimensions of multimodality you described might come in very handy.
Chang She: Yeah, absolutely. I’ll say this: I’m sort of a multimodal maximalist. My prediction is that in five years, “multimodal” won’t even be a word anymore. People will just say “data,” and it’ll be inclusive of all modalities by default. And when we talk about data engineering, there won’t be “multimodal data engineering”; it’ll just be data engineering.
Ben Lorica: Interesting. Which actually, as we’re winding down, brings me to my next question. If I’m a CXO or an architect at an enterprise, what’s the dumbest data infrastructure decision I can make right now that will hurt my team in the next year?
Chang She: I think we’re already seeing big pain points around creating new AI data silos. One pattern — I wouldn’t call it an anti-pattern, but definitely a pain point — is that if you’re a CIO, CDO, or similar, chances are many of your teams have charged forward with their own AI applications and AI stacks. Now the centralized data platform team is faced with managing 10 different vector databases and maybe five different ways to store AI data — some in images, some just embeddings, many different modalities.
That becomes a big pain point as companies move from “let’s try out AI in this one area” to mass-market AI transformation, where large swaths of the enterprise are AI-assisted or AI-native. So if I were a CIO, CDO, or CTO at a larger enterprise, I would be thinking ahead about how to set up all of my teams across the enterprise for success — how to allow them to move quickly and iterate fast, without creating an untenable challenge for the central platform team. That’s actually exactly what we’re building for at LanceDB.
Ben Lorica: So if your thesis holds — that multimodal data matures over the next few years, and so do agents and everything that comes with them including memory — what does the data stack look like in a few years?
Chang She: In broad strokes, I think the base layers won’t change all that much. The infrastructure layer stays roughly the same — there’s going to be object storage, a storage layer — and then the compute layer will start to change. I think what we’ll see is…
Ben Lorica: Ray! Ray!
Chang She: What I think we’ll see is that the middle layer of data tooling will start to melt away a little bit because of agents.
Ben Lorica: Define data tooling.
Chang She: I don’t want to name names, but there’s a lot of developer middleware for data that sits neither at the infrastructure layer nor at the layer that interfaces directly with agents and users. That middle layer, I think, will melt away or at least be significantly refactored. There’s going to be a lot of churn there.
I think the UI workflow layer will largely go away and be replaced by agents. But useful data models will still be useful and will stick around. Yes, you can have agents talk directly to random bits on S3, but why waste all that intelligence? It’s not worth the token cost. A well-formed data model is the right base layer for agents to interact with. So I think what we’ll see is a melting away and reformatting of that middle layer. When I talk to data builders and AI infrastructure builders today, I think we’re all seeing this at the same time.
Ben Lorica: So what I describe to people as the forward-looking stack has two main parts. One is the multimodal lakehouse built around LanceDB and the Lance format. The other is the AI compute layer — what I call the PARK stack: PyTorch, AI foundation models, Ray, and Kubernetes. I see that combination quite a bit. Do you think of these as complementary?
Chang She: Yeah, absolutely. We have close relationships and native-level integrations with Ray, Spark, and PyTorch. I don’t think any of those are going away. PyTorch interfaces directly with developers, while Spark and Ray are very much infrastructure-layer projects. They’re definitely still around.
Ben Lorica: And Kubernetes is still around.
Chang She: Yeah, Kubernetes is definitely still around.
Ben Lorica: Yeah, yeah. So what big trend are you paying attention to right now that we haven’t talked about yet? This is how we close.
Chang She: What’s been really interesting that we haven’t touched on is the rise of open-source models — or more precisely, open-weight models. I think that’s going to have a big impact starting later this year or next year, as AI adoption within enterprises matures.
Ben Lorica: Open-weight models, right. And who’s the main source? Right now the better ones are largely coming from China.
Chang She: That’s right. And I think the gap between proprietary state-of-the-art models and open-weight models that everyone can access and run reasonably cheaply is closing. That makes it easier for large enterprises to adopt generative AI across all their different workloads. It becomes more of a unit economics play — I don’t have to spend a billion dollars a year on API calls to whoever the provider may be.
Ben Lorica: Which is why China is pushing this, right? They can’t afford those costs, and they may not even have access to the top proprietary models. Plus they have compute restrictions, so their models have to be as efficient as possible.
Chang She: Exactly. The economics are very different. In the West, we basically have an oligopoly of a few well-funded major AI labs. Their incentive is to charge forward with state-of-the-art closed models and monetize via API, which is why you see Google, Anthropic, and OpenAI licensing data from all over the place.
The Chinese labs tend to do distillation, which means your ceiling is basically the model you’re distilling from. And then there’s the flywheel effect — OpenAI, Anthropic, and Google have a lot of users, so their models get better as more people interact with them.
Ben Lorica: That’s right. But don’t forget the open-weight models from China also have a lot of users.
Chang She: Yeah. Here’s how I think about it: as AI adoption grows exponentially within enterprises, they’re going to be extremely motivated to invest in their own inference on open-weight models, just because of the drastic difference in token costs. Because of that economic incentive, I think there’s going to be a lot more motivation for companies to create better open-weight models.
If you look at the open-weight models coming out of China, the fact that they can produce models of this quality on really limited hardware is telling. A team in the US theoretically should be able to create much better open-weight models given the resources available.
I also don’t think the distillation argument is actually that strong anymore. If you look at the report Anthropic put out, the amount of distillation they accused DeepSeek of doing is actually pretty negligible. MiniMax is a more legitimate offender, but DeepSeek basically didn’t do that much. So I don’t think distillation is a major factor in the quality of open-weight models.
There is a remaining quality gap — maybe a three-to-four-month gap between open-weight models and state-of-the-art. But what’s interesting is that experiments have shown open-weight models are cheaper and much faster. For a coding agent task, you can do a one-shot with a SOTA model, or you can do multiple rounds and iterations with an open-weight model and get the same quality, at lower total token cost, in roughly the same time or even faster.
The barrier right now is really a familiarity and skill gap: running multiple rounds and iterations is more complex than most people want to deal with right now. So the current pattern is to go into production with SOTA models, and then reach a cost-prohibitive moment where you identify areas that don’t require heavy intelligence but still carry high token costs, and swap those out for open-weight models. I think that will happen more and more across enterprises, and it’s going to be a big trend to watch this year and next.
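The unit-economics argument comes down to simple arithmetic. The sketch below compares a single pass on a more expensive model with several rounds on a cheaper one. The prices and token counts are hypothetical, picked only to keep the numbers easy to follow, and it assumes each round uses the same number of tokens.

```python
def task_cost(rounds: int, tokens_per_round: int, price_per_million_tokens: float) -> float:
    """Total cost of a task that runs `rounds` passes of `tokens_per_round` tokens."""
    return rounds * tokens_per_round * price_per_million_tokens / 1_000_000

def break_even_rounds(sota_price: float, open_weight_price: float) -> float:
    """Rounds on the cheaper model that cost the same as one round on the
    pricier model, assuming equal tokens per round."""
    return sota_price / open_weight_price

# Hypothetical numbers, not real provider prices.
SOTA_PRICE = 10.0          # dollars per million tokens
OPEN_WEIGHT_PRICE = 1.0    # dollars per million tokens
TOKENS_PER_ROUND = 50_000

one_shot_sota = task_cost(1, TOKENS_PER_ROUND, SOTA_PRICE)
four_rounds_open = task_cost(4, TOKENS_PER_ROUND, OPEN_WEIGHT_PRICE)

print(one_shot_sota)     # 0.5
print(four_rounds_open)  # 0.2
print(break_even_rounds(SOTA_PRICE, OPEN_WEIGHT_PRICE))  # 10.0
```

With these made-up inputs, the cheaper model stays ahead on cost for anything under ten rounds; whether quality and latency hold up over those rounds is the empirical question raised in the conversation.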
Ben Lorica: And actually, as you mentioned, maybe my conversations are a product of where we are in the adoption curve — which is still early. At this stage, you deploy with state-of-the-art models because you’re just getting started. As your agent or application gets used, you start paying attention to cost and latency, and then you can think about swapping models. Hopefully we’ll also see some Western labs start cranking on open-weight models again. Meta seems to be off the table for now, the Gemma folks produce models but they seem aimed at on-device use, and maybe there’s an opening there for someone to step in — especially as people get more clever about training, and tools like Lance make training more affordable. We’ll see what happens. And with that, thank you, Chang.
Chang She: Thank you, Ben.

