Advancing AI: Scaling, Data, Agents, Testing, and Ethical Considerations

Andrew Ng on Scaling AI, Data-Centric Approaches, and Agentic Workflows.

Subscribe: Apple • Spotify • Overcast • Pocket Casts • AntennaPod • Podcast Addict • Amazon • RSS.

Dr. Andrew Ng is a globally recognized AI leader, founder of DeepLearning.AI and Landing AI, General Partner at AI Fund, Chairman and Co-Founder of Coursera, and Adjunct Professor at Stanford University. This episode explores the latest trends and challenges in scaling deep learning models, data-centric AI approaches, agentic workflows with language models, evaluating AI system performance, the open-source vs. proprietary model landscape, the role of reinforcement learning, and ethical considerations around AI training data and intellectual property. [This episode originally aired on Generative AI in the Real World, a podcast series I’m hosting for O’Reilly.]

Subscribe to the Gradient Flow Newsletter

If you enjoyed this episode, please support our work by encouraging your friends and colleagues to subscribe to our newsletter.


Transcript.

Below is a heavily edited excerpt, in Question & Answer format.

What was your most controversial contribution to deep learning in the early days?

The most controversial thing I did was advocating for scaling up deep learning algorithms. When I started Google Brain, I wanted to use Google’s computing resources to scale neural networks, which was very controversial at the time. Senior people in AI would question why I was focusing on scaling rather than algorithms. Fortunately, I had data from my Stanford lab that convinced me larger neural networks with more computing power would drive performance improvements—and that worked out. It’s actually a great compliment when work that was once controversial becomes an accepted fact, like using GPUs for deep learning or scaling neural networks, which are now obvious approaches.

You were an early advocate for using GPUs in machine learning. How did that come about?

I believe my team wrote the first research paper recommending CUDA for scaling deep learning. While we weren’t the first to look at GPUs for AI, CUDA was the tipping point that made general-purpose GPU computing practical. We started publishing papers encouraging everyone to look at CUDA and GPUs for deep learning. Now it’s just an accepted practice. Currently, I’m thinking about whether inference hardware might need to be different from training hardware, as there are multiple businesses developing specialized inference hardware, which is exciting.

Can you explain the concept of data-centric AI that you’ve championed?

I’m pleased to see how many people are working on advancing data-centric AI systematically. At NeurIPS last December, they even had big banners for data-centric AI alongside topics like deep learning and reinforcement learning. I played a small role in bringing together a community of people who have realized—sometimes for decades—that data is really important, sometimes even more important than tuning the model for certain applications. The number of research papers mentioning data-centric AI has been increasing rapidly, which shows it’s an important area with much ongoing work.

You’ve been writing about AI agents recently. What makes the current generation of AI agents different from previous approaches?

What’s new is that LLM-based agents can do things no prior technology has been able to accomplish. Most people use generative AI by writing a prompt and getting an output in one go—this is like asking someone to write an essay from start to finish without ever using backspace. But that’s not how humans work best. We write outlines, do research, improve outlines, write drafts, critique drafts—it’s an iterative process.

Agentic workflows enable this iterative approach, using LLMs repeatedly to write, think, research, critique, and iterate. This produces much better results. When my team looked at coding benchmarks, we found that the improvement from GPT-3.5 to GPT-4 was significant, but the improvement you get by adding agentic workflows on top of even GPT-3.5 was twice as large. We’ve been seeing this pattern for about half a year to a year now, with agentic workflows substantially outperforming zero-shot prompting.
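To make the "outline, draft, critique, revise" idea concrete, here is a minimal sketch of that workflow as a chain of LLM calls. It assumes the OpenAI Python SDK and an API key in the environment; the model name and the essay topic are illustrative choices, not ones from the episode, and any chat-completion provider could be swapped into the ask() helper.

```python
# Minimal sketch of an outline -> draft -> critique -> revise workflow.
# Assumes the OpenAI Python SDK and OPENAI_API_KEY; model name is illustrative.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    """Single chat-completion call; swap in any LLM provider here."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

topic = "How agentic workflows improve LLM output quality"
outline = ask(f"Write a short outline for an essay on: {topic}")
draft = ask(f"Using this outline, write a first draft:\n\n{outline}")
critique = ask(f"Critique this draft and list concrete weaknesses:\n\n{draft}")
final = ask(
    "Rewrite the draft, addressing this critique.\n\n"
    f"Draft:\n{draft}\n\nCritique:\n{critique}"
)
print(final)
```

The point of the sketch is the structure, not the prompts: each stage reuses the model's previous output, which is what distinguishes this from a single zero-shot prompt.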

How do AI agents differ from traditional Robotic Process Automation (RPA)?

Agents can be used for RPA, but they’re actually even bigger than that. Just as many people had a “ChatGPT moment” when they first saw it surpass expectations, I think many will have an “agent moment” when they see a single agent or multi-agent system perform a task in a way they wouldn’t have instructed it to, but that amazes them.

What differentiates agents from RPA is their planning capabilities—they can decide which actions to take without needing explicit instructions for each step. They might choose to do a web search or use a tool that you didn’t specifically tell them to use, but that helps accomplish the task. It’s not perfect 100% of the time as it’s still a maturing technology, but there are moments that are truly impressive.
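A rough sketch of that planning step, under simplifying assumptions: the model is shown a task and a list of available tools, and it decides which one to call. The tool implementations below are stubs invented for illustration; a real agent would wire in actual search, database, or API calls, and would validate the model's reply rather than trusting its format.

```python
# Minimal sketch of tool selection: the model picks an action, the host code
# executes it. Tools here are stubs; replace them with real integrations.
from openai import OpenAI

client = OpenAI()

def web_search(query: str) -> str:
    return f"(stub) search results for: {query}"      # placeholder tool

def lookup_order(order_id: str) -> str:
    return f"(stub) order {order_id}: shipped"        # placeholder tool

TOOLS = {"web_search": web_search, "lookup_order": lookup_order}

def choose_tool(task: str) -> tuple[str, str]:
    """Ask the model which tool to call and with what argument."""
    prompt = (
        f"Task: {task}\n"
        f"Available tools: {list(TOOLS)}\n"
        "Reply on one line as: tool_name | argument"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    # A production agent would parse and validate this reply defensively.
    name, arg = resp.choices[0].message.content.split("|", 1)
    return name.strip(), arg.strip()

name, arg = choose_tool("Find the shipping status of order 1234")
print(TOOLS[name](arg))
```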

How are agentic workflows enabling new types of software applications?

With agentic workflows, we can build enterprise applications that were difficult before, like accurately reading complicated documents and drawing inferences for specific business verticals. Even simple design patterns like “reflection”—where an agent generates output, then critiques its own work and uses that feedback to rewrite it—can produce significantly better results than zero-shot prompting.

I’ve been surprised by how much this simple pattern improves accuracy, especially for tricky document understanding tasks. The growing portfolio of agentic design patterns is enabling us to build capabilities that I would struggle to create any other way.
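Here is one way the reflection pattern might look for a document-understanding task, sketched under the same assumptions as the earlier example (OpenAI SDK, illustrative model name). The invoice text and field names are made up for illustration; the shape of the loop—extract, self-critique against the source, rewrite—is the part that matters.

```python
# Minimal sketch of the "reflection" pattern: extract fields, have the model
# critique its own extraction against the source, then produce a corrected pass.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

document = "Invoice 784 from Acme Corp, dated 2024-03-02, total due $1,250.00."

first_pass = ask(f"Extract invoice_number, vendor, date, total as JSON:\n\n{document}")
critique = ask(
    "Check this extraction against the source text and list any errors or omissions.\n\n"
    f"Source:\n{document}\n\nExtraction:\n{first_pass}"
)
final = ask(
    "Produce a corrected JSON extraction using this critique.\n\n"
    f"Source:\n{document}\n\nDraft:\n{first_pass}\n\nCritique:\n{critique}"
)
print(final)
```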

What are your thoughts on copyright issues in AI training?

This is both a legal and a philosophical question. One of the tricky issues society needs to wrestle with is whether it’s okay for generative AI to train on freely published content on the internet and whether that constitutes fair use. Personally, I’d like it to be considered fair use, but that’s for legislators and courts to determine.

I’ve noticed a fundamental difference in how people view AI. I view AI as a tool that I use—if I as a human am allowed to read content, learn from it, and later write my own synthesis in my own words, why can’t I use AI to help me do that more efficiently? Others view AI as something distinct from humans, almost like a different entity or species, which leads to different conclusions about what rights it should have. This philosophical divide affects how people feel about using AI to process and learn from content.

What’s the state of best practices and tooling for deploying generative AI in enterprises?

This is still very early. Companies like Weights & Biases, OpenAI, and Anthropic are all working on evaluation frameworks, but deployment practices are very application-dependent. For low-risk internal applications like email routing, minimal testing might be sufficient. For safety-critical applications like medical triage, much more rigorous evaluation is needed.

One reason we’re falling behind on evaluations is that generative AI makes building applications so easy. If you can build an application in a week, spending 10 weeks collecting data and creating evaluation datasets feels disproportionate. But without good evaluation, it’s hard to know if changing your model or workflow actually improves performance. This is especially challenging for subjective outputs like blog posts where human evaluation is expensive. Better evaluation tools would unlock much faster development.
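Even a tiny evaluation set makes it possible to tell whether a change to the prompt, model, or workflow actually helps. Below is a minimal sketch for the email-routing example mentioned above; the labeled emails and categories are invented purely for illustration, and the same assumptions about the OpenAI SDK apply.

```python
# Minimal sketch of an eval harness: a handful of labeled examples and a
# scoring loop, so prompt or workflow changes can be compared to a baseline.
from openai import OpenAI

client = OpenAI()

EVAL_SET = [
    {"email": "My invoice total looks wrong, please recheck it.", "label": "billing"},
    {"email": "The app crashes whenever I upload a photo.", "label": "support"},
    {"email": "Do you offer discounts for annual plans?", "label": "sales"},
]

def route(email: str) -> str:
    prompt = (
        "Classify this email as billing, support, or sales. "
        f"Reply with one word.\n\n{email}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip().lower()

correct = sum(route(ex["email"]) == ex["label"] for ex in EVAL_SET)
print(f"accuracy: {correct}/{len(EVAL_SET)}")
```

In practice the evaluation set would be much larger and drawn from real traffic, but even a fixed handful of examples gives a repeatable baseline to measure changes against.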

Should we be concerned about the limited number of suppliers for open source LLMs?

While Meta has done a really nice job with Llama 2 and Llama 3, I’m optimistic others will step into the space. Cohere has released Command R, Alibaba has released open models, and even institutions in the Middle East have contributed models like Falcon.

What makes me cautiously optimistic is that training costs are falling rapidly—perhaps by 75% per year according to some estimates. This means models that cost $100 million to train this year might cost $25 million next year. As costs fall, more players can enter the market. If open source models end up slightly behind proprietary ones, that’s probably not the end of the world, but we should continue investing in and supporting open source efforts.

You’ve seemed less enthusiastic about reinforcement learning compared to other AI approaches. Has your view changed?

My PhD thesis actually involved using reinforcement learning to fly autonomous helicopters, so I’ve been involved with RL for a long time. For certain applications like RLHF in foundation models, AlphaGo, and some robotics applications, RL has been very important.

However, I’ve had a harder time seeing a clear path for how RL would scale and drive commercial value across wider business applications compared to other techniques. That said, I think society should absolutely continue working on reinforcement learning, as technologies that don’t seem commercially viable today can suddenly become valuable. I’m seeing encouraging signs in robotics with foundation models, though much of that is using imitation learning rather than pure reinforcement learning.

With research becoming more closed, is it possible the next innovation like Transformers never gets published?

It’s possible, but with so many people working on AI now, it’s actually very difficult to keep technical ideas secret for long, even if some things remain proprietary temporarily. As talent circulates between companies, ideas tend to spread. Data might stay secret longer than technical approaches.

I’m optimistic, but it’s up to all of us to keep pushing for openness and to continue advancing AI together. It’s a collective responsibility to ensure knowledge sharing continues.