2025 Artificial Intelligence Index

Nestor Maslej on Efficiency, Transparency, and Real-world Applications.

Every year, I rely on the Stanford HAI AI Index Report to cut through the noise and understand the real state of artificial intelligence. The 2025 edition continues this vital work, offering rigorously validated data on AI’s expanding influence across society, the economy, and governance. To delve deeper, I recently sat down with the report’s editor-in-chief, Nestor Maslej, Research Manager at HAI.

Transcript

Below is a heavily edited excerpt, in Question & Answer format.

AI Models and Capabilities

What are the main developments in AI models over the past year?

The key developments have been reasoning-enhanced models and multi-modal capabilities. Particularly noteworthy is the rise of smaller models delivering impressive performance. The parameter count needed to clear a given performance bar has dropped dramatically: PaLM, with 540 billion parameters, scored above 60% on the MMLU benchmark in May 2022, and by May 2024 Phi-3 Mini cleared the same threshold with roughly 4 billion parameters. As these technical capabilities mature, the field is shifting from purely pursuing technological advancement to focusing on practical integration of these powerful tools into business workflows.
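
To put that reduction in perspective, here is a minimal Python sketch of the arithmetic. It uses only the rounded parameter counts quoted in the answer above; the variable names are illustrative and do not come from the report.

```python
# Parameter counts in billions, as quoted in the interview (rounded figures).
palm_params_b = 540        # PaLM, 2022
phi3_mini_params_b = 4     # Phi-3 Mini, 2024 (rounded; an assumption of this sketch)

# How many times smaller the later model is.
fold_reduction = palm_params_b / phi3_mini_params_b

# The same change expressed as a percentage drop in parameter count.
percent_reduction = (1 - phi3_mini_params_b / palm_params_b) * 100

print(f"{fold_reduction:.0f}x fewer parameters")
print(f"{percent_reduction:.1f}% reduction")
```

With these rounded inputs, the smaller model has about 135 times fewer parameters, a drop of roughly 99%, while clearing the same 60% MMLU threshold.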

How are businesses approaching these smaller models?

Model selection depends on specific use cases. Legal teams and high-stakes applications prioritize accuracy and are willing to pay more for slower but more thorough models. In contrast, customer-facing applications often prioritize speed and responsiveness, making smaller models more valuable. The clear trend is that models are getting smaller while maintaining strong performance levels, which excites many businesses due to the potential cost savings and efficiency gains.

What advancements have we seen in multi-modal capabilities?

Multi-modal capabilities have advanced, but interaction with graphical user interfaces remains challenging. Benchmarking AI’s agentic capabilities is difficult because tasks are complex and multifaceted. Research suggests AI systems perform better than humans on simple tasks with short time budgets, but humans still outperform AI on complex tasks that take longer. One study on computer science research tasks found that AI excels under tight time constraints, while humans maintain a significant advantage when tasks require several hours of work. The key challenge isn’t just advancing technological capability, but creating interfaces and workflows that make AI truly useful for businesses beyond the current chatbot format.

Open Weights and Model Transparency

What trends are emerging around model transparency and open weights?

Two significant trends have emerged. First, open weight models are much stronger than a year ago. According to Chatbot Arena, the performance gap between the best closed weight model and the best open weight model narrowed from about 8% in January 2024 to about 2% by February 2025. Second, the ecosystem is becoming more transparent according to Stanford’s Foundation Model Transparency Index, though significant room for improvement remains: many providers still do not fully disclose details about how their models are developed and trained.

How has the competitive landscape changed between closed and open weight models?

The competitive landscape has tightened considerably. While closed weight models from OpenAI, Anthropic, and Google previously held a clear performance advantage, newer open weight models like Llama 3, DeepSeek, and Alibaba’s Qwen have effectively closed that gap. We now have a highly competitive ecosystem in which four or five developers are all releasing capable models that score similarly on benchmarks. This shift makes open weight models a much more viable option for businesses seeking to build AI applications.

For a full transcript, see our newsletter.