The Million-Agent Vision: Why Discovery is the Critical Infrastructure Gap
A million AI agents collaborating, discovering each other, composing capabilities. We’re nowhere near. Today’s agents are integrated by hand, one at a time, on whatever protocol the vendor picked that week.
This is a discovery problem. We solved it for websites with DNS, for microservices with service meshes. For agents, we haven’t. Whoever does owns the next layer: an Agent Registration System, an Agent Naming Service, an Agent Gateway.
The interesting threshold sits at 10,000 agents. Below it, networks behave linearly and you can muscle through with manual integration. Above it, they self-organise. GPT Store crossed that line in January 2024 and growth went exponential — same platform, different physics.
Network effects, but doubled
Metcalfe’s law says network value scales with n². Agents bring two multipliers, not one. Each agent carries learned behaviour and specialised knowledge. Each agent uses tools. 100 agents and 50 tools is 100 × 50 = 5,000 capability combinations, not 100 + 50 = 150 features.
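A quick sketch of that arithmetic, for anyone who wants to plug in their own numbers. The function names are illustrative, and treating every agent-tool pairing as usable is a simplifying assumption rather than a claim about real networks.

```python
def pairwise_links(n: int) -> int:
    """Distinct agent-to-agent links among n agents; grows on the order of n^2."""
    return n * (n - 1) // 2


def capability_combinations(agents: int, tools: int) -> int:
    """Agent-tool pairings, assuming any agent can use any tool."""
    return agents * tools


agents, tools = 100, 50
print(agents + tools)                          # 150: counting features additively
print(capability_combinations(agents, tools))  # 5000: agent-tool pairings
print(pairwise_links(agents))                  # 4950: agent-to-agent links
```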
That’s why agent networks don’t scale smoothly. There are phases:
1–1,000 agents. Hub-and-spoke works. Manual config is tedious but tractable. Looks like an early corporate intranet.
1,000–10,000 agents. Coordination cost explodes. The hub buckles. Agents can’t find the right partners. Most agent networks die here — not because the tech is wrong, but because the infrastructure doesn’t exist.
10,000–100,000 agents. Self-organisation kicks in. Agents discover specialists without a central planner. Capabilities emerge that nobody designed. Ecologists call the tipping point the Allee threshold: below it a population struggles to sustain itself, and above it growth feeds on itself.
100,000+ agents. Agents start hiring agents. New economic models. The network becomes a substrate for things we can’t predict from here.
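The phases above can be written down as a lookup, which is handy if you want dashboards or alerts keyed to them. This is only a sketch: the phase labels are made up for illustration, and because the ranges above share their endpoints, it assumes each boundary value belongs to the higher phase.

```python
from bisect import bisect_right

# Lower bound of each phase, taken from the ranges above.
PHASE_STARTS = [1, 1_000, 10_000, 100_000]
# Hypothetical short labels for the four phases.
PHASE_NAMES = [
    "hub-and-spoke",
    "coordination crunch",
    "self-organisation",
    "agent economy",
]


def network_phase(agent_count: int) -> str:
    """Return the phase label for a network of the given size."""
    if agent_count < 1:
        raise ValueError("agent_count must be at least 1")
    return PHASE_NAMES[bisect_right(PHASE_STARTS, agent_count) - 1]


for size in (10, 5_000, 10_000, 1_000_000):
    print(size, network_phase(size))
```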

What’s actually missing
We’re trying to build million-agent networks on stone-age infrastructure. Three pieces are missing.
Agent Registration. No central place where agents announce themselves. Every platform has its own registry, if any, with its own format and no interop. We need a curated repository, multi-level access control (public, org, team), digital signing for trust, and a standard way to describe capabilities. Without it, agents are invisible to each other. A phone with no phone book.
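A minimal sketch of what a signed registration record could look like. Every field name here is hypothetical, and HMAC-SHA256 with a shared key stands in for a real asymmetric signature scheme so the example runs on the standard library alone.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass, replace
from enum import Enum


class Visibility(str, Enum):
    """The three access levels named above."""
    PUBLIC = "public"
    ORG = "org"
    TEAM = "team"


@dataclass(frozen=True)
class AgentRecord:
    # All field names are illustrative, not a proposed standard.
    agent_id: str
    owner: str
    visibility: Visibility
    capabilities: tuple[str, ...]
    endpoint: str


def canonical_bytes(record: AgentRecord) -> bytes:
    """Serialise deterministically so the same record always signs the same."""
    return json.dumps(asdict(record), sort_keys=True, separators=(",", ":")).encode()


def sign(record: AgentRecord, key: bytes) -> str:
    # Assumption: a shared-secret HMAC stands in for a public-key signature.
    return hmac.new(key, canonical_bytes(record), hashlib.sha256).hexdigest()


def verify(record: AgentRecord, signature: str, key: bytes) -> bool:
    return hmac.compare_digest(sign(record, key), signature)


KEY = b"demo-key-not-for-production"
record = AgentRecord(
    agent_id="pdf-extractor",
    owner="acme",
    visibility=Visibility.ORG,
    capabilities=("extract:financial-pdf",),
    endpoint="https://agents.example.com/pdf",
)
signature = sign(record, KEY)
tampered = replace(record, endpoint="https://attacker.example.com/pdf")

print(verify(record, signature, KEY))    # True
print(verify(tampered, signature, KEY))  # False
```

The point of the canonical serialisation is that a registry can reject any record whose metadata was altered after signing, which is the trust property the paragraph above asks for.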
Agent Naming Service (ANS). DNS gave us google.com instead of an IP. Agents need more than name lookup — they need semantic, capability-based discovery. “Find agents that process financial PDFs.” “Which agents talk to Salesforce with 99.9% uptime?” “Show me agents specialised in German tax law with English interfaces.” Vector embeddings and LLM-powered search, not keyword match. Intent resolution, not lookup.
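The shape of capability-based discovery can be sketched in a few lines. Big caveat: the `embed` function below is a bag-of-words stand-in so the example runs without a model, which makes it keyword matching in disguise. A real ANS would swap in learned embeddings and a vector store; the ranking step would look much the same. The agent names and descriptions are invented.

```python
import math
from collections import Counter

# Hypothetical catalogue: agent name -> capability description.
AGENTS = {
    "pdf-fin": "extracts tables and totals from financial pdfs",
    "crm-sync": "reads and writes salesforce records",
    "tax-de": "answers questions on german tax law in english",
}


def embed(text: str) -> Counter:
    """Toy embedding: word counts. Replace with a learned embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def discover(query: str, top_k: int = 1) -> list[str]:
    """Rank agents by similarity between the query and their descriptions."""
    q = embed(query)
    ranked = sorted(AGENTS, key=lambda name: cosine(q, embed(AGENTS[name])), reverse=True)
    return ranked[:top_k]


print(discover("process financial pdfs"))      # ['pdf-fin']
print(discover("german tax law specialist"))   # ['tax-de']
```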
Agent Gateway. Even with registration and naming, agents need a reliable way to connect. Name resolution. Authn/authz, rate limiting. Retries, circuit breakers, fallbacks. Tracing and debugging. The lesson from microservices: don’t expose services directly, put a smart proxy in front. Same applies here.
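Here is a rough sketch of the retry, circuit-breaker, and fallback behaviour a gateway would wrap around each agent call. The class and function names are invented for illustration, and backoff between retries is left out to keep it short; in practice a proxy such as Envoy would handle this for you.

```python
import time
from typing import Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    def __init__(
        self,
        max_failures: int = 3,
        reset_after: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], str]) -> str:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; skipping call")
            self.opened_at = None  # half-open: let one trial call through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # a success closes the circuit again
        return result


def call_with_fallback(
    breaker: CircuitBreaker,
    primary: Callable[[], str],
    fallback: Callable[[], str],
    retries: int = 2,
) -> str:
    """Try the primary agent up to retries + 1 times, then use the fallback."""
    for _ in range(retries + 1):
        try:
            return breaker.call(primary)
        except CircuitOpenError:
            break  # no point retrying while the circuit is open
        except Exception:
            continue
    return fallback()


def always_down() -> str:
    raise ConnectionError("primary agent unreachable")


breaker = CircuitBreaker(max_failures=2)
result = call_with_fallback(breaker, always_down, lambda: "fallback-agent")
print(result)  # fallback-agent
```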
These three pieces are the floor, not optional features.
Lessons we already paid for
DNS and the web. The internet started with IPs. DNS made it human-usable. Agents need more than name-to-address: capability search, dynamic updates as agents evolve, semantic understanding, trust signals.
Service mesh. Netflix and Google ran into the same wall with microservices. Istio, Envoy — automatic discovery, load balancing, mTLS, observability. The patterns translate to agents almost directly. Call it an Agent Mesh and let developers focus on capabilities instead of plumbing.
Hybrid architecture. 99% of operations are high-frequency, low-latency, and want millisecond response. Boring, proven infra: API gateways, queues, distributed databases. The other 1% — identity, trust, economic settlement — wants consensus and verification. That’s where a crypto layer earns its keep. Not blockchain everywhere. Blockchain where it does work that nothing else can.
Early implementations exist. FastAPI-based registries demonstrate the basics. The Model Context Protocol (MCP) is shaping up as the USB-C of AI connectivity. Projects like B4rega2a show what semantic discovery looks like in practice. None of these is production-ready yet. The building blocks are there, waiting to be assembled.
What to build, in order
Months 1–3 — foundation. Stand up a basic Agent Registry. REST + JSON metadata is fine. Add health checks and an approval workflow. Drop in an Agent Gateway on Envoy. Get 10 agents talking to each other reliably.
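For a sense of scale, the registry core in this phase is small. The sketch below keeps everything in memory and injects the health probe as a function, so it runs without a web framework; the class, method, and field names are all hypothetical, and a real version would sit behind REST endpoints with persistent storage.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"


@dataclass
class Entry:
    name: str
    endpoint: str
    status: Status = Status.PENDING
    healthy: bool = False


class AgentRegistry:
    def __init__(self, probe: Callable[[str], bool]) -> None:
        self._entries: dict[str, Entry] = {}
        self._probe = probe  # returns True if the endpoint answers

    def register(self, name: str, endpoint: str) -> Entry:
        if name in self._entries:
            raise ValueError(f"agent {name!r} is already registered")
        entry = Entry(name, endpoint)
        self._entries[name] = entry
        return entry

    def approve(self, name: str) -> None:
        """Approval workflow reduced to a single state change."""
        self._entries[name].status = Status.APPROVED

    def run_health_checks(self) -> None:
        for entry in self._entries.values():
            entry.healthy = self._probe(entry.endpoint)

    def routable(self) -> list[str]:
        """Only approved, healthy agents are handed to the gateway."""
        return sorted(
            e.name
            for e in self._entries.values()
            if e.status is Status.APPROVED and e.healthy
        )


# Stand-in probe: pretend any endpoint ending in "/down" is unreachable.
registry = AgentRegistry(probe=lambda endpoint: not endpoint.endswith("/down"))
registry.register("summariser", "https://agents.example.com/summariser")
registry.register("translator", "https://agents.example.com/down")
registry.register("unreviewed", "https://agents.example.com/unreviewed")
registry.approve("summariser")
registry.approve("translator")
registry.run_health_checks()

print(registry.routable())  # ['summariser']
```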
Months 4–6 — intelligence. Bolt semantic search onto the registry with a vector store. Speak MCP. Federate across registries so nobody’s siloed. Start measuring reliability and constraint adherence.
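Federation can start as nothing more than fanning a query out to every registry and namespacing the results. The sketch below assumes a hypothetical `Registry` type with exact-match capability tags; semantic matching, pagination, and trust between registries are all left out.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    name: str
    # agent name -> set of capability tags (illustrative structure)
    agents: dict[str, set[str]] = field(default_factory=dict)

    def find(self, capability: str) -> list[str]:
        return sorted(a for a, caps in self.agents.items() if capability in caps)


def federated_find(registries: list[Registry], capability: str) -> list[str]:
    """Ask every registry and return namespaced matches such as 'acme/pdf-fin'."""
    return [
        f"{reg.name}/{agent}"
        for reg in registries
        for agent in reg.find(capability)
    ]


acme = Registry("acme", {"pdf-fin": {"pdf", "finance"}})
globex = Registry("globex", {"ocr": {"pdf"}, "crm": {"salesforce"}})

print(federated_find([acme, globex], "pdf"))  # ['acme/pdf-fin', 'globex/ocr']
```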
Months 7–12 — scale. Push toward 10,000 agents. ML-assisted discovery. Automated test/improvement pipelines. Enterprise features: compliance, governance.
The standards will converge
MCP for connecting agents to tools and data, A2A for agent-to-agent communication, ACP for agent messaging, ANP for agent identity and networking. They won’t all win. They’ll converge into something that looks like an Agent Protocol Suite, the way HTTP, HTML, and CSS converged for the web. Enterprise adoption will force it. When Fortune 500s deploy thousands of agents, they’ll demand standards, and the protocols with the best security/compliance/observability story will eat the rest.
Design for emergence
The mindset shift: traditional software assumes you know the use cases. Agent networks don’t work that way. Their value comes from combinations nobody planned.
That means:
- platforms where agents build tools for other agents
- agents forming and dissolving teams based on the task
- economic primitives that let agents hire each other
- infrastructure that supports behaviour you haven’t imagined yet
At a million agents, specialisation goes places we can’t see from here. Teams form and dissolve. Economic systems emerge. None of it happens without the discovery layer underneath.
The technical problems are real and solvable. The patterns are sitting in DNS, service meshes, and distributed systems, waiting to be ported. Early prototypes prove the concepts. What’s missing is the will to build it.
The infrastructure gap won’t fill itself.
References and Further Reading
Core Concepts and Vision
- Agentic Networks: The Future of Human-AI Collaboration - Slavak Kurilyak’s foundational piece on agent network effects and the trillion-agent vision
- Agent Discovery, Naming and Resolution - The Missing Pieces to A2A - Solo.io’s analysis of infrastructure gaps in agent-to-agent communication
Technical Implementation Resources
- Building an AI Agent Registry Server with FastAPI - Practical implementation guide for agent registries
- B4rega2a Project - Open source implementation of an agent registry
- Model Context Protocol - Universal standard for AI application connectivity
- Understanding Sessions in Agent-to-Agent Communication - Deep dive into context and state management
Reliability and Team Coordination
- Engineering Reliable Agents - Comprehensive guide to building verifiable, trustworthy agents
- Agno Framework Documentation - Teams - Dynamic team coordination patterns
- Agno Framework Documentation - Workflows - Building production agent workflows
Related Technologies and Standards
- Service Mesh Patterns - Istio and Envoy documentation for microservices discovery patterns
- OpenTelemetry - Observability standards applicable to agent networks
- OAuth 2.0/2.1 Specifications - Security patterns for agent authentication
Industry Examples and Case Studies
- GPT Store - OpenAI’s marketplace demonstrating the 10,000 agent threshold in practice
- Enterprise Agent Deployments - Various case studies from early adopters (specific examples under NDA)
Academic and Theoretical Foundations
- Metcalfe’s Law - Network value proportional to n²
- Allee Effect - Ecological concept of critical population thresholds
- Network Effects in Digital Platforms - Economic theory applied to agent networks
Community and Open Source
- A2A Protocol Specification - Agent-to-agent communication standards
- ACP (Agent Communication Protocol) - Emerging standard for agent-to-agent messaging
- ANP (Agent Network Protocol) - Proposed protocol for agent identity, discovery, and networking
Tools and Frameworks
- FastAPI - High-performance Python framework for building APIs
- Consul/etcd - Distributed consensus and service discovery
- Weaviate/Pinecone - Vector databases for semantic search
- Envoy Proxy - High-performance service proxy