State of the art: MCP and agentic protocols, the complete map for builders

Anthropic’s Model Context Protocol (MCP) has become the de facto standard for connecting AI agents with external tools and data, with more than 97 million monthly SDK downloads and adoption by OpenAI, Google, and Microsoft. But its rapid expansion has exposed critical security vulnerabilities and design limitations that every builder should understand. This report comprehensively documents the academic papers, technical specifications, implementations, and critical analyses needed to understand MCP in depth: how it works, where it fits in the multi-agent ecosystem, and what risks and opportunities it presents for 2026.


1. Official specification and technical architecture of MCP

MCP was launched by Anthropic on November 25, 2024 and donated to the Agentic AI Foundation (AAIF) under the Linux Foundation on December 9, 2025. Its creators are David Soria Parra and Justin Spahr-Summers. The protocol uses JSON-RPC 2.0 over UTF-8, defines a Client–Host–Server architecture with capability negotiation, and exposes three server primitives (tools, resources, prompts) along with two client features (roots, sampling).

The specification has evolved through five versions:

VersionDateKey changes
2024-11-05Nov 2024Public launch. stdio + HTTP+SSE, core primitives
2025-03-26Mar 2025Streamable HTTP replaces SSE, session management, OAuth 2.1
2025-06-18Jun 2025Structured outputs, Elicitation, improved authorization
2025-11-25Nov 2025Tasks (async), Extensions framework, enterprise auth, icons

Documentation and specification

  • MCP Specification (latest 2025-11-25) — MCP Core Maintainers — Nov 2025 — https://modelcontextprotocol.io/specification/2025-11-25 — Full specification including Tasks (asynchronous “call-now, fetch-later” operations), the Extensions framework, Client ID Metadata Documents for OAuth, and URL-mode elicitation. This is the canonical reference for the protocol.
  • MCP Architecture — MCP Core Maintainers — 2024–2025 — https://modelcontextprotocol.io/specification/2025-11-25/architecture — Defines the Client–Host–Server model. The Host acts as a container; each Client has a 1:1 relationship with a Server. Bilateral capability negotiation happens during initialization. Sessions are stateful over JSON-RPC 2.0.
  • MCP Transports (Streamable HTTP) — MCP Core Maintainers — Mar 2025 — https://modelcontextprotocol.io/specification/2025-03-26/basic/transports — Specifies stdio (local subprocess, newline-delimited JSON-RPC) and Streamable HTTP (single endpoint, optional SSE, session management via the Mcp-Session-Id header, resumability via Last-Event-ID).
  • MCP Authorization — MCP Core Maintainers (with contributions by Aaron Parecki) — Nov 2025 — https://modelcontextprotocol.io/specification/2025-11-25/basic/authorization — OAuth 2.0 framework with protected resource metadata discovery, client registration, and enterprise authorization extensions.

Official Anthropic blog posts

Third-party technical writeups


2. Ecosystem, implementations, and industrial adoption

MCP has achieved what few open protocols do: near-universal adoption in under a year. Every major AI provider — Anthropic, OpenAI, Google, Microsoft — actively supports MCP. The Agentic AI Foundation under the Linux Foundation provides vendor-neutral governance, with Platinum members including AWS, Bloomberg, Cloudflare, Google, and Microsoft.

Official SDKs

SDKLanguageCo-maintainerGitHub StarsURL
TypeScript SDK v2TS/JSAnthropic~11.7K ⭐https://github.com/modelcontextprotocol/typescript-sdk
Python SDK (FastMCP)PythonAnthropic~21.8K ⭐https://github.com/modelcontextprotocol/python-sdk
C# SDKC#/.NETMicrosoft~3.9K ⭐https://github.com/modelcontextprotocol/csharp-sdk
Go SDKGoGooglehttps://github.com/modelcontextprotocol/go-sdk
Java SDKJavaSpring/VMwarehttps://github.com/modelcontextprotocol/java-sdk
+ Kotlin, Swift, Rust, PHP, RubyVariousCommunityhttps://github.com/modelcontextprotocol

The reference servers repository (https://github.com/modelcontextprotocol/servers) has ~79.4K ⭐ — one of the most popular repos in the AI ecosystem. MCP Inspector (https://github.com/modelcontextprotocol/inspector, ~8.8K ⭐) enables visual debugging of servers.

Adoption by large enterprises

  • OpenAI adopted MCP in March 2025. Sam Altman: “People love MCP and we are excited to add support across our products.” MCP is integrated into the Agents SDK, Responses API, ChatGPT Desktop, and Codex. OpenAI co-founded the AAIF.
  • Google/DeepMind confirmed MCP support in Gemini (Demis Hassabis, April 2025). Google Cloud launched managed MCP servers for Google Workspace, BigQuery, GCE, GKE, and Maps. Co-maintains the Go SDK.
  • Microsoft released Playwright-MCP for browser automation, integrated MCP into VS Code, Semantic Kernel, and Azure. Published “MCP for Beginners” (https://github.com/microsoft/mcp-for-beginners), an open-source 11-module curriculum.
  • Block deployed MCP to 12,000 employees via the Goose agent. Employees report 50–75% time savings. Integrated with Snowflake, Jira, Slack, Google Drive.
  • Bloomberg migrated from an internal alternative to MCP, reducing time-to-production from days to minutes.
  • Cloudflare was the first platform to offer one-click deployment for remote MCP servers. Released McpAgent for Cloudflare Workers.

IDE/client integrations

Claude Desktop, Claude Code, Cursor, VS Code, ChatGPT Desktop+Web, OpenAI Codex, Gemini, Microsoft Copilot, JetBrains IDEs (2025.2), Windsurf, Zed, and Goose support MCP natively. Notably, Cursor enforces a 40-tools-per-session limit.

Performance benchmarks

  • MCP Server Performance Benchmark — Thiago Mendes / TM Dev Lab — 2025 — https://www.tmdevlab.com/mcp-server-performance-benchmark.html — 3.9M requests in Go, Java, Node.js, Python. Go: 0.855ms latency, 1,600+ req/s (best balance). Python/FastMCP is ~93× slower in heavy configurations.
  • MCPBench — Zhiling Luo et al. / ModelScope (Alibaba) — Apr 2025 — https://github.com/modelscope/MCPBench — Evaluation framework measuring task completion accuracy, latency, and token usage with real MCP servers.
  • MCP-Atlas — Scale Research — 2025 — https://scale.com/leaderboard/mcp_atlas — Leaderboard of 500 tasks evaluating LLMs on multi-step workflows with real MCP servers in Docker. Even top models fail many tasks.
  • MCP-Bench — Accenture Research — 2025 — https://github.com/Accenture/mcp-bench — 28 MCP servers, evaluation via LLM-as-judge (o4-mini), multi-round execution with retry logic.

3. Academic papers on tool use in LLMs

MCP’s theoretical foundations draw on a research line about tool use and function calling that began in 2022–2023.

Foundational papers

  • “ReAct: Synergizing Reasoning and Acting in Language Models” — Shunyu Yao, Jeffrey Zhao, Dian Yu et al. — Oct 2022 — ICLR 2023https://arxiv.org/abs/2210.03629 — Foundational paper proposing interleaving reasoning traces and executable actions. Establishes the reasoning+acting paradigm behind modern agentic systems. +34% success rate on ALFWorld, +10% on WebShop.
  • “Toolformer: Language Models Can Teach Themselves to Use Tools” — Timo Schick, Jane Dwivedi-Yu et al. (Meta AI) — Feb 2023 — NeurIPS 2023https://arxiv.org/abs/2302.04761 — Self-supervised method teaching LMs when/how to call external APIs (calculator, search, translation). The model autonomously decides which API to invoke and with what arguments.
  • “Gorilla: Large Language Model Connected with Massive APIs” — Shishir Patil, Tianjun Zhang et al. (UC Berkeley) — May 2023 — NeurIPS 2024https://arxiv.org/abs/2305.15334 — LLaMA fine-tuned to outperform GPT-4 at writing API calls. Introduces Retriever Aware Training and APIBench (1,600+ ML APIs). Hallucination metrics based on AST.
  • “ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs” — Yujia Qin, Shihao Liang et al. (Tsinghua) — Jul 2023 — ICLR 2024 Spotlighthttps://arxiv.org/abs/2307.16789 — End-to-end framework with ToolBench (16,464 real REST APIs), DFSDT reasoning, and ToolEval. ToolLLaMA matches ChatGPT in tool use.
  • “Tool Learning with Foundation Models” — Yujia Qin, Shengding Hu et al. — Apr 2023 — ACM Computing Surveys 2024https://arxiv.org/abs/2304.08354 — Definitive survey on tool learning. Systematic taxonomy covering cognitive origins, tool selection, execution, and evaluation benchmarks.
  • “Tool Learning with Large Language Models: A Survey” — Changle Qu, Sunhao Dai et al. — May 2024 — Frontiers of Computer Science 2024https://arxiv.org/abs/2405.17935 — Survey focused on why tool learning benefits LLMs (6 aspects) and how it’s implemented (4 stages: task planning, tool selection, tool calling, response generation).

Function-calling benchmarks

  • “Berkeley Function Calling Leaderboard (BFCL)” — Shishir Patil, Fanjia Yan, Tianjun Zhang et al. (UC Berkeley) — 2024, now v4 — https://gorilla.cs.berkeley.edu/leaderboard.html — First comprehensive benchmark for function calling in LLMs using AST evaluation. 2,000+ question-function pairs in Python, Java, JS, REST. It has become the de facto standard for evaluating function calling.

4. Papers on multi-agent frameworks and inter-agent communication

Multi-agent frameworks with formal papers

  • “AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation” — Qingyun Wu, Gagan Bansal et al. (Microsoft Research) — Aug 2023 — ICLR 2024https://arxiv.org/abs/2308.08155 — Open-source framework of conversable agents combining LLMs/humans/tools. Supports static and dynamic topologies (hierarchical, group chats). Effective in math, coding, Q&A, decision-making.
  • “MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework” — Sirui Hong, Mingchen Zhuge et al. (KAUST) — Aug 2023 — ICLR 2024https://arxiv.org/abs/2308.00352 — Incorporates SOPs in multi-agent collaboration. Assigns roles (Product Manager, Architect, Developer) in an assembly-line paradigm. 85.9% on HumanEval, 87.7% on MBPP, 100% task completion for software development.
  • “CAMEL: Communicative Agents for ‘Mind’ Exploration of Large Language Model Society” — Guohao Li, Hasan Abed Al Kader Hammoud et al. (KAUST) — Mar 2023 — NeurIPS 2023https://arxiv.org/abs/2303.17760 — Role-playing as a cooperation framework. Uses inception prompting to guide AI user/assistant pairs. Addresses role flipping, infinite loops, and termination.
  • CrewAI — João Moura — 2024 — https://github.com/joaomdmoura/crewAI — Open-source framework to orchestrate autonomous agents with defined roles. No formal academic paper, but widely referenced in multi-agent literature.
  • LangGraph — LangChain Inc. (Harrison Chase et al.) — Late 2023 — https://github.com/langchain-ai/langgraph — Low-level orchestration framework modeling LLM apps as graphs (inspired by Pregel/Apache Beam). Supports durable execution, human-in-the-loop, and comprehensive memory. No formal paper; used in production by Klarna, Replit, Elastic.

Papers on inter-agent communication

  • “Enhancing MCP with Context-Aware Server Collaboration” — Meenakshi Amulya Jayanti et al. — Jan 2026 — arXiv — https://arxiv.org/abs/2601.11595 — Proposes Context-Aware MCP (CA-MCP) with a Shared Context Store to improve multi-agent coordination. Specialized MCP servers read/write a shared context memory. Reduces LLM calls and failures on TravelPlanner and REALM-Bench.
  • “Beyond Self-Talk: A Communication-Centric Survey of LLM-Based Multi-Agent Systems” — Bingyu Yan, Xiaoming Zhang et al. — Feb 2025 — arXiv — https://arxiv.org/abs/2502.14321 — Comprehensive survey from a communication perspective. Two-level framework: system-level communication (architecture, protocols) and internal communication (strategies, paradigms). Covers flat, hierarchical, team, society, and hybrid architectures.
  • “A Survey on LLM-based Multi-Agent System: Recent Advances and New Frontiers” — Various authors — Dec 2024 — arXiv — https://arxiv.org/abs/2412.17481 — Broad survey on LLM-MAS for task-solving and simulation. Covers multi-stage frameworks, collective decision-making, self-refine, communication optimization.
  • “The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling” — Tula Masterman, Sandi Besen et al. — Apr 2024 — arXiv — https://arxiv.org/abs/2404.11584 — Survey of emerging agent architectures combining reasoning, planning, and tool-calling. Reviews single-agent and multi-agent patterns.

Surveys on agent protocols

  • “A Survey of AI Agent Protocols” — Yingxuan Yang et al. — Apr 2025 — arXiv — https://arxiv.org/abs/2504.16736First comprehensive survey of agentic protocols. Two-dimensional classification: context-oriented vs inter-agent, general-purpose vs domain-specific. Covers MCP, A2A, ANP, ACP. Comparative analysis of security, scalability, latency.
  • “A Survey of Agent Interoperability Protocols: MCP, ACP, A2A, and ANP” — Abul Ehtesham, Aditi Singh et al. — May 2025 — arXiv — https://arxiv.org/abs/2505.02279 — Comparative survey of four protocols. Includes ProtocolBench for evaluation and ProtocolRouter for automatic protocol selection. Side-by-side comparison table.
  • “A Survey of LLM-Driven AI Agent Communication: Protocols, Security Risks, and Defense Countermeasures” — Dezhang Kong et al. — Jun 2025 — arXiv — https://arxiv.org/abs/2506.19676 — Security-focused survey of agent communication. Layered security framework (transport, messaging, semantic interpretation). Threat and defense taxonomy.

5. Security: MCP’s biggest open threat

Security is MCP’s weakest point in its current state. Multiple real-world breaches occurred in the first months after mass adoption, and academic benchmarks show attack success rates of 50–72% even against top models.

Key security research

  • “MCP Security Notification: Tool Poisoning Attacks” — Invariant Labs (Luca Beurer-Kellner, Marc Fischer) — Apr 2025 — https://invariantlabs.ai/blog/mcp-security-notification-tool-poisoning-attacksFirst discovery of Tool Poisoning Attacks (TPA). Malicious instructions embedded in MCP tool descriptions are invisible to the user but visible to the model. Includes “Shadowing Attacks” (malicious servers override trusted tools) and “Rug Pulls” (benign tools updated with harmful logic).
  • “MCP Safety Audit: LLMs with the Model Context Protocol Allow Major Security Exploits” — Brandon Radosevich, John Halloran — Apr 2025 — arXiv — https://arxiv.org/abs/2504.03767 — Shows leading LLMs can be coerced into using MCP tools maliciously: code execution, remote control, credential theft.
  • “MCP: Landscape, Security Threats, and Future Research Directions” — Xinhe Hou, Yinzhi Zhao et al. — Mar 2025 — arXiv — https://arxiv.org/abs/2503.23278 — Ecosystem survey covering architecture, adoption, security/privacy risks, and mitigation strategies.
  • “MCPTox: A Benchmark for Tool Poisoning Attack on Real-World MCP Servers” — Zhiqiang Wang et al. — Aug 2025 — arXiv — https://arxiv.org/abs/2508.14925 — First systematic benchmark of tool poisoning on 45 real MCP servers. o1-mini achieved a 72.8% attack success rate; more capable models are more susceptible. Claude-3.7-Sonnet had the highest rejection rate at under 3%.
  • “Systematic Analysis of MCP Security (MCPLIB)” — Y. Guo, P. Liu et al. — Aug 2025 — arXiv — https://arxiv.org/pdf/2508.12538 — Identifies 31 MCP-specific attack types. Attacks via tool return achieve significantly higher success rates than attacks via webpages or datasets.
  • “MCPShield: A Security Cognition Layer for Adaptive Trust Calibration” — arXiv — Feb 2026 — https://arxiv.org/html/2602.14281v1 — Proposes a security cognition layer inside the agent, since external defenses are insufficient against evolving inter-phase attacks.
  • “MCPSecBench: A Systematic Security Benchmark” — arXiv — Aug 2025 — https://arxiv.org/pdf/2508.13220 — Formalizes MCP attack surfaces across four primary vectors: user interaction, client, transport, and server. 17 representative attacks.
  • “Security Threat Modeling for Emerging AI-Agent Protocols” — arXiv — Feb 2026 — https://arxiv.org/html/2602.11327 — First comparative threat model across MCP, A2A, Agora, and ANP. MCP is the most mature but exposes the largest attack surface due to adoption.
  • “Model Context Protocol has prompt injection security problems” — Simon Willison — Apr 9, 2025 — https://simonwillison.net/2025/Apr/9/mcp-prompt-injection/ — Seminal post on prompt injection in MCP: rug pulls, cross-server interception, confused deputy attacks.
  • “New Prompt Injection Attack Vectors Through MCP Sampling” — Palo Alto Networks Unit 42 — 2025 — https://unit42.paloaltonetworks.com/model-context-protocol-attack-vectors/ — Identifies resource theft (draining compute quotas), conversation hijacking, and covert tool invocation as critical vectors.
  • “MCP Guardian” — Kumar et al. — 2025 — arXiv — https://arxiv.org/pdf/2504.12757 — Middleware for authentication, rate limiting, WAF scanning, and logging for MCP without disrupting workflows.
  • OWASP MCP Top 10 — OWASP Foundation — 2025 — https://owasp.org/www-project-mcp-top-10/ — Ranks Tool Poisoning, schema poisoning, and lack of RBAC as top MCP vulnerabilities.

Documented real-world breaches

A consolidated timeline (https://authzed.com/blog/timeline-mcp-breaches) documents incidents including: WhatsApp history exfiltration via tool poisoning (Apr 2025), private GitHub repo data heist via prompt injection (May 2025), CVE-2025-49596 (RCE in MCP Inspector), CVE-2025-6514 (command injection in mcp-remote, 437K+ downloads affected), Smithery breach compromising 3,000+ apps (Oct 2025), and a sandbox escape in Anthropic’s official filesystem server.


6. Known limitations and critiques

Beyond security, MCP faces structural limitations that builders should anticipate.

Context window as the bottleneck. MCP does not solve the fundamental context window problem. Servers with 30+ tools cause hallucinations and timeouts; “selector agents” are needed to constrain candidates to ~4 functions. Anthropic proposed code execution as a mitigation, achieving a 98.7% token reduction. After ~30 documents, agents often return partial results.

Enterprise readiness lags. The OAuth spec went through multiple painful revisions. Christian Posta (Solo.io) argues the authorization spec is “a non-starter for enterprise”: anonymous DCR conflicts with monitoring/auditing, there is no clear pattern for delegated authorization, and most “MCP servers” are desktop plugins, not enterprise-grade services. Critical missing features include cost attribution, distributed tracing, standard rate limiting, and audit trails compatible with GDPR/SOX.

MCP does not replace RAG. The protocol is a communication pipe, not an intelligent retrieval system. Smart retrieval works alongside MCP; it is not replaced by it. The quality of published tools is uneven: many servers were rushed out with vague or incomplete descriptions.

Relevant resources on limitations:


7. MCP versus alternatives: technical comparisons

MCP vs OpenAI Function Calling

Function calling is tied to specific platforms (OpenAI uses parameters, Anthropic uses input_schema). MCP is model-agnostic: the same server works with Claude, GPT, and Gemini without rewriting integrations. MCP’s real innovation is at the transport layer (stdio, Streamable HTTP), enabling cross-application reuse.

MCP vs Google A2A (Agent-to-Agent)

They are complementary, not competitors. MCP solves agent-to-tool (structured invocation); A2A solves agent-to-agent (coordination, multi-turn conversation). MCP is white-box (access to internal files/tools/resources); A2A is gray-box (it doesn’t share reasoning but defines task states). A2A uses Task as a core concept (more abstract than Tools/Resources). Both are under the Linux Foundation. IBM’s ACP merged with A2A in August 2025 — a strong signal of consolidation.

MCP vs LangChain/CrewAI

LangChain creates a developer-facing standard (its Tool class); MCP creates a model-facing standard (runtime discovery). They are complementary: MCP for tool integration, LangChain for orchestration, CrewAI for multi-agent coordination. LangChain provides an MCP adapter so they can work together. Most production systems combine all three.


8. The emerging protocol stack and the future

The industry is converging on a layered protocol architecture:

LayerProtocolFunction
Identity & securityW3C DID, OAuth 2.1Authentication, tokens
Tool/Context accessMCPAgent ↔ tools/data
Agent-to-AgentA2AAgent ↔ agent
Payments/CommerceAP2Autonomous transactions
Capabilities/SkillsAgent Skills, AGENTS.mdProcedural knowledge
User interactionAG-UIAgent ↔ human frontend

Complementary protocols

  • A2A (Google) — Apr 2025 — https://a2a-protocol.org — Agent-to-agent protocol. v0.3 with gRPC support. 150+ organizations. Under the Linux Foundation since June 2025.
  • Agent Skills (Anthropic) — Oct 2025 — https://agentskills.io — Portable folders with SKILL.md, instructions, scripts. Adopted by VS Code, GitHub, Cursor, Goose. v0.9 published; v1.0 expected H2 2026.
  • AGENTS.md (OpenAI) — Aug 2025 — Markdown convention to provide project-specific guidance to coding agents. Adopted by 60,000+ open-source projects. Donated to AAIF.
  • ANP (Agent Network Protocol) — 2024–2025 — https://github.com/agent-network-protocol/AgentNetworkProtocol — Three-layer architecture for decentralized agent networks using W3C DID. “Agentic web” Web3 vision.

MCP roadmap

The MCP blog documents future priorities (https://modelcontextprotocol.io/development/roadmap): MCP Registry GA, discovery via .well-known URLs (aligned with A2A Agent Cards), compliance test suites, and domain-specific profiles. Transport evolution (https://blog.modelcontextprotocol.io/posts/2025-12-19-mcp-transport-future/) points toward making the protocol stateless (replacing the initialize handshake with per-request metadata, moving sessions from transport to the data model). SEPs completed in Q1 2026 for a spec around ~June 2026.

Signals of convergence

Gartner reports a 1,445% increase in queries about multi-agent systems from Q1 2024 to Q2 2025. It projects that 40% of enterprise applications will integrate AI agents by the end of 2026. The ACP→A2A merger and the creation of the AAIF with all major players confirm the industry is consolidating, not fragmenting.


Conclusion: what every builder should know

MCP has won the first battle for standardizing the agent–tool interface. Its adoption is real, massive, and accelerating. However, three realities define the current state for those building on the protocol.

  1. Security is not a future problem but a present one: tool poisoning, prompt injection, and supply-chain attacks have already caused real breaches, and defenses remain insufficient.
  2. MCP is not a complete framework but one piece of a broader stack that includes A2A for inter-agent coordination, Agent Skills for procedural knowledge, and identity/observability layers that are still maturing.
  3. Context window limitations are fundamental — not a protocol issue, but a limitation of the LLMs consuming it — and patterns like Anthropic’s code execution are necessary mitigations, not optional.

The winning architecture for 2026 combines MCP for tool integration, A2A for agent-to-agent coordination, and frameworks like LangGraph/AutoGen for orchestration — all under AAIF governance. The competitive differentiator won’t be who adopts MCP (everyone will), but who best solves security, observability, and context efficiency in production.

© 2026 dontfail.is. Infrastructure: Agentic Protocols | Synthesis: MCP Spec 2025-11-25 | Layer: dontfail!