Heiko Hotz and Sokratis Kartakis on A2A, MCP, Router Agents, Security.
Subscribe: Apple • Spotify • Overcast • Pocket Casts • AntennaPod • Podcast Addict • Amazon • RSS.
Heiko Hotz and Sokratis Kartakis of Google Cloud discuss the Agent-to-Agent (A2A) protocol, a new standard for enabling collaboration between AI agents built on different frameworks. They cover the practical differences between A2A and tool-use protocols like MCP, outlining how multi-agent systems can be architected like microservices for better performance and scalability. The conversation also delves into critical security considerations and the future of agentic systems in the enterprise. [This episode originally aired on Generative AI in the Real World, a podcast series I’m hosting for O’Reilly.]
Interview highlights – key sections from the video version:
- Why Agent-to-Agent Communication? The Need for Multi-Agent Integration
- Forward-Looking Protocol – Organizational Agent Sharing and Complex Task Requirements
- Relationship Between MCP and A2A – Complementary Protocols for Tools vs Agents
- Making Agent-to-Agent Communication Concrete – Business Card Analogy and Real-World Examples
- The Agentic Stack in Multi-Agent Systems – Memory, Storage, and Architecture
- From Monolithic to Microservices – Benefits of Specialized Agent Architecture
- Model Selection and Multi-Agent Architecture – Local vs Central Model Usage
- Security Challenges in Agent-to-Agent Communication – Attack Vectors and Authentication
- Large Action Models and Screen-Based Agents – Project Mariner and UI Interaction
- Protocol Design and Extensions – Flexibility and Security Enhancement Mechanisms
- Learning Curve and Technical Implementation – Ease of Adoption for Developers
- Current State of Adoption – Real-World Use Cases and Customer Examples
- Linux Foundation Governance and Closing Remarks – Open Source Future of A2A
Related content:
- A video version of this conversation is available on our YouTube channel.
- Agentic AI Applications: A Field Guide
- Why Your Multi-Agent AI Keeps Failing
- Beyond RL: A New Paradigm for Agent Optimization
- Sagar Batchu → From Human-Readable to Machine-Usable: The New API Stack
- Andrew Rabinovich → Why Digital Work is the Perfect Training Ground for AI Agents
- Jakub Zavrel → How to Build and Optimize AI Research Agents
Support our work by subscribing to our newsletter 📩
Transcript
Below is a heavily edited excerpt, in Question & Answer format.
Protocol Overview and Motivation
Why do we need agent-to-agent (A2A) communication when single-agent systems are still maturing?
The need for A2A arises from the reality of current development practices in large organizations. Teams are already building agents using different frameworks like LangGraph, CrewAI, and Google’s Agent Development Kit (ADK) across various business units. When these teams want to integrate their agents, they face a choice: either modify existing code to accommodate different frameworks, or establish a common protocol that allows agents to communicate seamlessly.
A2A provides this standard “language” that enables agents to interoperate without rewriting application code for each new integration. It’s a forward-looking protocol designed for enterprise AI where one team might develop an agent for financial analysis while another builds one for travel booking. This allows other teams or even external partners to leverage these capabilities through a protocol that goes beyond traditional APIs. Unlike classic stateless request/response APIs, agents handle complex, stateful, and asynchronous tasks—they accept a task, work asynchronously, emit intermediate updates, and return results. A2A standardizes this stateful exchange between agents.
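To make that task lifecycle concrete, here is a minimal client-side sketch in Python. The agent URL is hypothetical, and while the JSON-RPC method names (`message/send`, `tasks/get`) follow the public A2A spec, the payload shapes are simplified; check the spec or an official SDK for the authoritative contract.

```python
# Minimal sketch of a stateful A2A exchange over JSON-RPC/HTTP.
# AGENT_URL is hypothetical; payload shapes are simplified from the spec.
import time
import uuid
import httpx

AGENT_URL = "https://agents.example.com/travel"  # hypothetical remote agent

def send_message(text: str) -> dict:
    """Submit a task-creating message and return the resulting Task object."""
    payload = {
        "jsonrpc": "2.0",
        "id": str(uuid.uuid4()),
        "method": "message/send",  # method name per the A2A spec
        "params": {
            "message": {
                "role": "user",
                "messageId": str(uuid.uuid4()),
                "parts": [{"kind": "text", "text": text}],
            }
        },
    }
    return httpx.post(AGENT_URL, json=payload).json()["result"]

def poll_task(task_id: str) -> dict:
    """Fetch the current state of a long-running task."""
    payload = {
        "jsonrpc": "2.0",
        "id": str(uuid.uuid4()),
        "method": "tasks/get",
        "params": {"id": task_id},
    }
    return httpx.post(AGENT_URL, json=payload).json()["result"]

task = send_message("Book me a trip to Paris next week")
while task["status"]["state"] not in ("completed", "failed", "canceled"):
    time.sleep(2)  # the agent works asynchronously; poll (or stream) for updates
    task = poll_task(task["id"])
print(task["status"]["state"])
```

In production you would typically use streaming updates rather than polling, but the shape of the exchange is the same: submit a task, observe intermediate states, collect a final result.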
How does A2A differ from Anthropic’s Model Context Protocol (MCP)? Are they competing?
A2A and MCP are complementary rather than competing protocols, solving different problems:
MCP focuses on single-agent tool integration using structured schemas and one-shot interactions. It’s designed for an agent communicating with a tool through fixed, well-typed input/output contracts. Think of it like a vending machine: you provide specific parameters and get a defined output.
A2A connects agents with each other using natural language descriptions and stateful, asynchronous interactions. It’s for agent-to-agent communication handling complex, multi-step tasks with ongoing communication. Think of it like a concierge service: you make open-ended requests and receive updates over time.
The key distinction is whether you’re wrapping a simple, structured function in a tool (use MCP) or a complex, stateful capability in another agent (use A2A). In practice, you may have agents that use MCP tools internally while coordinating with other agents via A2A.
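The vending machine vs. concierge distinction shows up directly in the contracts. Below is a simplified side-by-side sketch; field names approximate the public MCP and A2A specs and are illustrative, not authoritative.

```python
# MCP: a tool with a fixed, typed input schema -- one-shot, vending-machine style.
mcp_tool = {
    "name": "get_flight_price",
    "description": "Return the price for a specific flight.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "origin": {"type": "string"},
            "destination": {"type": "string"},
            "date": {"type": "string", "format": "date"},
        },
        "required": ["origin", "destination", "date"],
    },
}

# A2A: an open-ended, natural-language request to another agent -- concierge style.
# The remote agent may emit progress updates before returning a final result.
a2a_message = {
    "role": "user",
    "parts": [{
        "kind": "text",
        "text": "Find me the cheapest way to get to Paris in the first "
                "week of June; I'm flexible on dates.",
    }],
}
```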
Architecture and Multi-Agent Systems
What does a typical multi-agent architecture look like using A2A?
A common pattern involves a router (or “boss”) agent serving as the primary user entry point. This router agent interprets requests, decomposes them, and delegates to specialized “server” agents. For example, when booking a trip to Paris, the router would coordinate with separate agents for flight booking, hotel reservations, and car rentals.
Each specialist agent has specific capabilities and tools, making them more focused and less likely to make errors compared to a single agent trying to handle all tasks. The router maintains context and ensures information flows between specialist agents as needed—for instance, ensuring the hotel agent knows the flight dates booked by the flight agent. While some frameworks have built-in shared memory concepts, the router can manage this information flow even when different agents use different frameworks or technologies.
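A hedged sketch of this router pattern follows. All names and the `a2a_send` stub are illustrative stand-ins for a real A2A client call (such as the `send_message` helper sketched earlier); the point is the delegation and explicit context-passing, not the transport details.

```python
# Hypothetical router ("boss") agent delegating subtasks to specialists over A2A.
SPECIALISTS = {            # discovered via agent cards in a real system
    "flights": "https://agents.example.com/flights",
    "hotels":  "https://agents.example.com/hotels",
    "cars":    "https://agents.example.com/cars",
}

def a2a_send(agent_url: str, text: str) -> dict:
    # Stub: in practice this issues a JSON-RPC message/send and polls or
    # streams the resulting task, as sketched earlier.
    return {"agent": agent_url, "request": text, "dates": "2025-06-02..2025-06-06"}

def book_trip(request: str) -> dict:
    """Decompose a trip request and keep shared context between hops."""
    context: dict = {}
    flight = a2a_send(SPECIALISTS["flights"], f"{request}. Book the best flight.")
    context["flight_dates"] = flight["dates"]  # carry dates forward explicitly
    hotel = a2a_send(
        SPECIALISTS["hotels"],
        f"Book a hotel in Paris for these dates: {context['flight_dates']}",
    )
    car = a2a_send(SPECIALISTS["cars"], f"Rent a car for {context['flight_dates']}")
    return {"flight": flight, "hotel": hotel, "car": car}

print(book_trip("Book me a trip to Paris next week"))
```

Note how the router, not a shared memory store, is what guarantees the hotel agent sees the flight dates; this is what lets specialists built on different frameworks interoperate.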
Why use multiple specialized agents instead of one powerful “super agent”?
This approach mirrors the shift from monolithic applications to microservices and offers several practical advantages:
Improved performance and reliability: LLMs can get confused when given too many tools or responsibilities. A specialized agent with a limited toolset is more likely to reason correctly and select the right tool for a given task, reducing errors and increasing accuracy.
Easier development and maintenance: Small, dedicated agents are simpler to build, test, evaluate, and deploy independently. Different teams can work in parallel on their respective specialists without blocking each other. Smaller agents are also easier to stage, evaluate, and isolate when faults occur.
Better testability: Focused agents with narrow responsibilities can be thoroughly tested and validated before production deployment, improving overall system reliability.
Scalability: You can create complex systems by composing smaller, reusable agents. Eventually, you might see multi-agent systems communicating with other multi-agent systems (“systems of systems”) for even greater scale and domain expertise.
Do all agents in a multi-agent system have to use the same LLM?
No, and this flexibility is a key advantage. Different tasks require different model capabilities. You can choose the right model per task—lightweight local or open-source models can handle simple steps while heavier proprietary models (like frontier-class models) can be reserved for complex reasoning. This “right-sizing” of models optimizes both cost and performance across the entire system. A2A doesn’t force a central model choice, allowing each agent to use the most appropriate model for its specific responsibilities.
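One simple way to express this right-sizing is a per-agent model mapping; the model identifiers below are placeholders, not recommendations.

```python
# Illustrative "right-sizing": each agent uses the model its task needs.
MODEL_BY_AGENT = {
    "router":    "frontier-large",  # complex decomposition and reasoning
    "flights":   "mid-size",        # structured lookup plus light reasoning
    "formatter": "small-local",     # cheap, simple transformation steps
}

def model_for(agent_name: str) -> str:
    """Fall back to a mid-size default for unlisted agents."""
    return MODEL_BY_AGENT.get(agent_name, "mid-size")
```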
How do agents share information and maintain state across the system?
Information sharing depends on the architecture. In a simple setup, the router agent holds context for all specialist agents. If all agents are built within the same framework (like Google’s ADK or LangChain), shared memory concepts are built-in. However, even when different agents use different frameworks, the router agent can manage information flow by determining what data each specialist agent needs and passing relevant context between them explicitly. The protocol supports both approaches, giving teams flexibility in how they implement state management.
Security, Trust, and Governance
What are the main security considerations when building multi-agent systems with A2A?
Security becomes more complex with multiple agents since each may use different models and have different exposure points. The A2A protocol addresses this with built-in authentication support including API keys, HTTP authentication, OAuth 2.0, and OpenID Connect. Each agent declares its required security method in its Agent Card, ensuring only authenticated clients can interact with it.
However, security is a shared responsibility between the protocol and the developer. The protocol provides the framework, but developers must implement robust guardrails at multiple levels: before and after calling LLMs, before and after calling agents, before and after calling tools, and at the business logic level. The attack surface expands with multiple agents, introducing risks like prompt injection via messages, tool misuse, data exfiltration between agents, and cascading errors.
Mitigate these risks with layered guardrails at each hop, schema validation on inter-agent payloads, least-privilege credentials per agent, and strong observability tracking who called what, with which inputs, and why.
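Here is a hedged sketch of two of those mitigations, assuming pydantic v2: schema validation on an inter-agent payload and a logged "hop" wrapper. The model, field names, and the injection check are illustrative; real systems use dedicated classifiers and policy engines.

```python
from pydantic import BaseModel, ValidationError, field_validator

class HotelBookingRequest(BaseModel):
    city: str
    check_in: str   # ISO date
    check_out: str  # ISO date

    @field_validator("city")
    @classmethod
    def no_prompt_injection_markers(cls, v: str) -> str:
        # Crude illustrative check; production guardrails are more robust.
        if "ignore previous instructions" in v.lower():
            raise ValueError("suspicious content in field")
        return v

def guarded_hop(caller: str, callee: str, raw_payload: dict) -> HotelBookingRequest:
    """Validate and log one inter-agent hop before forwarding it."""
    try:
        payload = HotelBookingRequest(**raw_payload)
    except ValidationError as e:
        # Reject rather than forward malformed or suspicious input.
        raise RuntimeError(f"Blocked hop {caller} -> {callee}: {e}") from e
    print(f"audit: {caller} -> {callee}: {payload.model_dump()}")  # who/what/why
    return payload

guarded_hop("router", "hotels",
            {"city": "Paris", "check_in": "2025-06-02", "check_out": "2025-06-06"})
```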
How can organizations establish trust with third-party or untrusted agents?
Vetting is essential, similar to evaluating Python packages or MCP servers. Organizations need processes for agent validation, including allowlists and registries of approved agents. The Agent Card can signal trustworthiness by including metadata beyond capabilities, such as uptime statistics (e.g., 99.5%), success rates (e.g., 87%), certifications from third-party security audits, pen-test status, and compliance evidence.
Organizations should implement agent registries where vetted agents are cataloged with their security credentials and performance metrics. The A2A protocol includes a concept of extensions that allows developers to define custom “profiles” enforcing specific rules for groups of agents. For example, a company could create a profile mandating certain business-level encryption for all messages, ensuring consistent security standards across their multi-agent ecosystem.
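As a minimal illustration of the registry idea, consider the in-memory allowlist below; a production registry would be a service backed by vetting workflows, and all fields here are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class RegistryEntry:
    url: str
    uptime: float        # e.g., 0.995
    success_rate: float  # e.g., 0.87
    audited: bool        # passed a third-party security audit

REGISTRY: dict[str, RegistryEntry] = {
    "hotels": RegistryEntry("https://agents.example.com/hotels", 0.995, 0.87, True),
}

def resolve_agent(name: str) -> str:
    """Return the URL of a vetted agent; refuse to talk to unknown ones."""
    entry = REGISTRY.get(name)
    if entry is None or not entry.audited:
        raise PermissionError(f"Agent '{name}' is not on the approved allowlist")
    return entry.url

print(resolve_agent("hotels"))
```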
Observability and Large Action Models
What role does observability play in multi-agent systems, especially in regulated environments?
End-to-end observability is non-negotiable, particularly in high-stakes environments like banking or finance. Organizations need complete audit trails of all agent interactions, including every LLM request, decision, tool call, inter-agent message, task timelines, and intermediate updates. This is a regulatory and safety requirement to explain exactly what happened during any agent interaction.
Even if comprehensive logging slows down processes, it’s essential for compliance and debugging. Teams should instrument everything with task IDs, progress events, and tool traces. Some organizations deliberately accept slower throughput to gain the auditability and explainability required for their use cases.
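A minimal audit-trail sketch is shown below: structured, timestamped records keyed by task ID. In real deployments these records would flow to a tracing backend (OpenTelemetry is a common choice); the event names here are illustrative.

```python
import json
import logging
import uuid
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("agent-audit")

def audit(task_id: str, event: str, **fields) -> None:
    """Emit one structured record in a task's timeline."""
    record = {
        "task_id": task_id,
        "event": event,  # llm_call, tool_call, a2a_message, progress, ...
        "ts": datetime.now(timezone.utc).isoformat(),
        **fields,
    }
    log.info(json.dumps(record))

task_id = str(uuid.uuid4())
audit(task_id, "a2a_message", caller="router", callee="hotels")
audit(task_id, "tool_call", tool="search_hotels", args={"city": "Paris"})
audit(task_id, "progress", state="working", note="awaiting hotel results")
```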
How do Large Action Models (LAMs) that can interact with UIs impact agent development and security?
Large Action Models that can interact with computer screens represent both opportunities and significant challenges. While direct API integrations are preferred when available, many knowledge worker tools lack such integrations, making screen-based interactions necessary. This enables powerful use cases like automating UI testing for developers or providing hands-on technical support for non-technical users.
However, LAMs introduce substantial security and trust challenges. They require enhanced observability—logging every action taken on behalf of users. Organizations should treat screen-control as high-risk: isolate environments, record sessions, enforce explicit user consent, and maintain clear policy disclosures. It’s critical to inform users when they’re interacting with an agentic solution and obtain their consent. Prefer headless/API paths when possible and fall back to screen-driving only with strict controls and complete transparency.
Implementation and Protocol Details
What are the key technical concepts developers need to understand for A2A?
For developers familiar with RPC or REST, the learning curve is gentle—the protocol can be learned in about half a day. The core concepts are intuitive:
Agent Card: Like a business card that describes the agent’s capabilities, security requirements, communication methods (live streaming or messaging), and how to discover/connect/authenticate with the agent.
Task: The central concept where clients send requests to remote agents. This is the stateful unit of work where agents send progress updates asynchronously and eventually return a final result.
Extensions: Allows customization of data structures, methods, and profiles to meet specific organizational requirements. This enables platforms to enforce cross-cutting policies without forking the specification.
Official SDKs are available for JavaScript/TypeScript, Python, Java, and .NET, so developers don’t need to implement the wire protocol themselves. In Google’s ADK, defining an A2A-compatible agent can be accomplished with a single line of code plus an agent card configuration.
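To ground the Agent Card concept, here is a minimal card written as a Python dict for readability (in practice it is served as JSON, conventionally from a well-known path on the agent's host). Field names approximate the public A2A spec and the endpoint is hypothetical; consult the spec for the authoritative schema.

```python
agent_card = {
    "name": "Hotel Booking Agent",
    "description": "Finds and books hotels; handles changes and cancellations.",
    "url": "https://agents.example.com/hotels",   # hypothetical endpoint
    "version": "1.0.0",
    "capabilities": {"streaming": True, "pushNotifications": False},
    "securitySchemes": {                          # how clients must authenticate
        "oauth": {"type": "oauth2"}
    },
    "defaultInputModes": ["text/plain"],
    "defaultOutputModes": ["text/plain"],
    "skills": [
        {
            "id": "book_hotel",
            "name": "Book a hotel",
            "description": "Book a hotel given a city and a date range.",
            "tags": ["travel", "hotels"],
        }
    ],
}
```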
Can organizations extend the protocol for their specific enterprise needs?
Yes, A2A supports extensions to add data fields, methods, and profiles. Organizations can define custom requirements like “all agents must use encryption X” or “emit metric Y” for compliance or operational needs. This extensibility lets platforms enforce cross-cutting policies while maintaining compatibility with the core protocol, providing flexibility without fragmentation.
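A hedged sketch of what such an extension declaration might look like in an agent card follows; the spec lets a card advertise extensions by URI, but the URI and params below are invented for illustration.

```python
# Declaring a custom organizational extension ("profile") in an agent card.
extension_declaration = {
    "capabilities": {
        "extensions": [
            {
                "uri": "https://example.com/a2a/profiles/encrypted-payloads/v1",
                "required": True,   # clients must comply to interact
                "params": {"cipher": "business-level-encryption-X"},  # hypothetical
            }
        ]
    }
}
```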
Practical Adoption and Implementation
What’s the current adoption status of A2A?
Interest and adoption are growing, particularly among organizations with multiple business units building agents independently. Dozens of customers are already using it, with major companies like Microsoft announcing their intention to adopt the protocol. Real-world implementations include insurance platforms where broker and underwriter companies create agents that communicate through the A2A protocol.
The protocol’s recent move to the Linux Foundation is significant because it removes single-entity control and establishes shared governance across multiple companies. This multi-vendor governance and vendor neutrality make the protocol more attractive for enterprise adoption: no single vendor can unilaterally change its direction, which provides the long-term stability enterprises require for strategic implementations.
What are the first practical steps for a team exploring A2A?
- Identify 2-3 specialist capabilities to expose as agents with clear inputs/outputs and narrow scopes
- Stand up a router agent and define minimal shared context or explicit context-passing rules
- Publish agent cards with authentication requirements, capability descriptions in natural language, SLAs, and security notes
- Add guardrails at each hop including input/output schemas, PII policies, and rate limits
- Instrument everything with task IDs, progress events, and tool traces for audit and debugging
- Pilot with one cross-team workflow such as quote-to-bind in insurance or travel booking subflows
- Iterate by splitting overloaded agents, right-sizing models per task, and promoting vetted agents to a registry
When should teams stick with a single agent plus tools instead of multi-agent systems?
If the job consists mostly of atomic tool calls with deterministic schemas and minimal orchestration, you’ll achieve simpler operations and fewer failure modes with one agent using tools. Single-agent systems are appropriate when there’s no need for cross-team reuse, tasks are relatively simple and short-running, and there are no clear boundaries between specialized responsibilities.
Reach for A2A and multi-agent architectures once you see cross-team reuse requirements, long-running or complex multi-step tasks, clear boundaries between specialized domains, or when different teams need to maintain and evolve their agents independently.

