The Million-Agent Vision: Why Discovery is the Critical Infrastructure Gap

Christoph Görn included in AI Software Development Innovation

2025-09-29

A million AI agents collaborating, discovering each other, composing capabilities. We’re nowhere near. Today’s agents are integrated by hand, one at a time, on whatever protocol the vendor picked that week.

This is a discovery problem. We solved it for websites with DNS, for microservices with service meshes. For agents, we haven’t. Whoever solves it owns the next layer, and that layer has three parts: an Agent Registration System, an Agent Naming Service, and an Agent Gateway.

The interesting threshold sits at 10,000 agents. Below it, networks behave linearly and you can muscle through with manual integration. Above it, they self-organise. GPT Store crossed that line in January 2024 and growth went exponential — same platform, different physics.

Network effects, but doubled

Metcalfe says network value scales with n². Agents bring two multipliers, not one. Each agent carries learned behaviour and specialised knowledge. Each agent uses tools. 100 agents and 50 tools is 5,000 capability combinations, not 150 features.
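The arithmetic behind that comparison, as a small sketch. The function names are illustrative, and counting every agent-tool pair as a usable combination is an upper bound assumed here, not a measurement.

```python
def additive_features(agents: int, tools: int) -> int:
    """Treat agents and tools as independent features and add them up."""
    return agents + tools


def capability_combinations(agents: int, tools: int) -> int:
    """Treat every agent-tool pairing as one potential capability."""
    return agents * tools


def metcalfe_links(agents: int) -> int:
    """Distinct agent-to-agent links, n(n-1)/2, which grows on the order of n^2."""
    return agents * (agents - 1) // 2


print(additive_features(100, 50))        # 150
print(capability_combinations(100, 50))  # 5000
print(metcalfe_links(100))               # 4950
```

Doubling both inputs doubles the additive count but quadruples the combinations, which is the gap the phases below are describing.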

That’s why agent networks don’t scale smoothly. There are phases:

1–1,000 agents. Hub-and-spoke works. Manual config is tedious but tractable. Looks like an early corporate intranet.

1,000–10,000 agents. Coordination cost explodes. The hub buckles. Agents can’t find the right partners. Most agent networks die here — not because the tech is wrong, but because the infrastructure doesn’t exist.

10,000–100,000 agents. Self-organisation kicks in. Agents discover specialists without a central planner. Capabilities emerge that nobody designed. Ecologists have a name for this kind of tipping point, the Allee threshold: below it a population dwindles, above it the population sustains itself and grows.

100,000+ agents. Agents start hiring agents. New economic models. The network becomes a substrate for things we can’t predict from here.

What’s actually missing

We’re trying to build million-agent networks on stone-age infrastructure. Three pieces are missing.

Agent Registration. No central place where agents announce themselves. Every platform has its own registry, if any, with its own format and no interop. We need a curated repository, multi-level access control (public, org, team), digital signing for trust, and a standard way to describe capabilities. Without it, agents are invisible to each other. A phone with no phone book.
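What a registry record along those lines might look like, as a minimal sketch. Everything here is an assumption for illustration: the field names, the three visibility levels mapped to an enum, and especially the signing, which uses an HMAC with a shared key as a stand-in. A real registry would more likely use asymmetric signatures so that anyone can verify a record without holding the secret.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass
from enum import Enum


class Visibility(str, Enum):
    PUBLIC = "public"
    ORG = "org"
    TEAM = "team"


@dataclass(frozen=True)
class AgentRecord:
    # Hypothetical schema: the post asks for a standard, it does not define one.
    name: str
    org: str
    team: str
    visibility: Visibility
    capabilities: tuple
    endpoint: str


def canonical_bytes(record: AgentRecord) -> bytes:
    """Serialise deterministically so the same record always signs the same."""
    payload = asdict(record)
    payload["visibility"] = record.visibility.value
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(record: AgentRecord, key: bytes) -> str:
    """HMAC-SHA256 over the canonical form (stand-in for a real signature)."""
    return hmac.new(key, canonical_bytes(record), hashlib.sha256).hexdigest()


def verify(record: AgentRecord, signature: str, key: bytes) -> bool:
    """Constant-time comparison against a freshly computed signature."""
    return hmac.compare_digest(sign(record, key), signature)


def visible_to(record: AgentRecord, caller_org: str, caller_team: str) -> bool:
    """Multi-level access: public, same organisation, or same team."""
    if record.visibility is Visibility.PUBLIC:
        return True
    if record.visibility is Visibility.ORG:
        return caller_org == record.org
    return caller_org == record.org and caller_team == record.team


record = AgentRecord(
    name="pdf-parser",
    org="acme",
    team="finance",
    visibility=Visibility.ORG,
    capabilities=("parse_financial_pdf",),
    endpoint="https://agents.example/pdf-parser",  # placeholder URL
)
signature = sign(record, b"demo-key")
print(verify(record, signature, b"demo-key"))  # True
print(visible_to(record, "acme", "legal"))     # True
print(visible_to(record, "other", "finance"))  # False
```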

Agent Naming Service (ANS). DNS gave us google.com instead of an IP. Agents need more than name lookup — they need semantic, capability-based discovery. “Find agents that process financial PDFs.” “Which agents talk to Salesforce with 99.9% uptime?” “Show me agents specialised in German tax law with English interfaces.” Vector embeddings and LLM-powered search, not keyword match. Intent resolution, not lookup.
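The ranking step behind that kind of query can be sketched in a few lines. This is a toy under loud assumptions: the agent names and descriptions are invented, and embed() is a bag-of-words counter standing in for a learned embedding model, so it only matches shared words and is not semantic in any real sense. The shape is the point: embed the query, embed the capability descriptions, rank by cosine similarity.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical registry contents: agent name -> capability description.
AGENTS = {
    "invoice-reader": "extract tables and totals from financial pdf documents",
    "crm-sync": "read and write salesforce accounts and contacts",
    "steuer-helper": "answer questions about german tax law in english",
}


def discover(query: str, top_k: int = 2) -> list:
    """Return the top_k (agent, score) pairs for a natural-language query."""
    q = embed(query)
    scored = [(name, cosine(q, embed(desc))) for name, desc in AGENTS.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


print(discover("process financial pdf files")[0][0])  # invoice-reader
print(discover("german tax law")[0][0])               # steuer-helper
```

Swapping embed() for a real model and the dictionary for a vector store changes the quality of the match, not the structure of the lookup.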

Agent Gateway. Even with registration and naming, agents need a reliable way to connect. Name resolution. Authn/authz, rate limiting. Retries, circuit breakers, fallbacks. Tracing and debugging. The lesson from microservices: don’t expose services directly, put a smart proxy in front. Same applies here.
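The retry, circuit-breaker, and fallback trio is the part of the gateway that is easiest to show in code. A minimal sketch, assuming a single upstream agent per breaker and synchronous calls; the class and function names are made up for illustration, and a gateway built on a real proxy would configure this rather than hand-roll it.

```python
import time
from typing import Callable, Optional


class CircuitOpen(Exception):
    """Raised when the breaker refuses to call an unhealthy upstream."""


class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], object]) -> object:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpen("upstream agent is marked unhealthy")
            # Half-open: let one trial call through; one more failure reopens.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result


def call_with_fallback(breaker: CircuitBreaker, primary: Callable[[], object],
                       fallback: Callable[[], object], retries: int = 2) -> object:
    """Retry the primary agent, stop early if the circuit opens, then fall back."""
    for _ in range(retries + 1):
        try:
            return breaker.call(primary)
        except CircuitOpen:
            break
        except Exception:
            continue
    return fallback()


def broken_agent() -> str:
    raise RuntimeError("agent unreachable")


breaker = CircuitBreaker(max_failures=2)
print(call_with_fallback(breaker, broken_agent, lambda: "served by fallback agent"))
# served by fallback agent
```

The breaker takes its clock as a parameter so the timeout behaviour can be tested without sleeping.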

These three pieces are the floor, not optional features.

Lessons we already paid for

DNS and the web. The internet started with IPs. DNS made it human-usable. Agents need more than name-to-address: capability search, dynamic updates as agents evolve, semantic understanding, trust signals.

Service mesh. Netflix and Google ran into the same wall with microservices. Meshes built on Istio and Envoy answered it with automatic discovery, load balancing, mTLS, and observability. The patterns translate to agents almost directly. Call it an Agent Mesh and let developers focus on capabilities instead of plumbing.

Hybrid architecture. 99% of operations are high-frequency, low-latency, and want millisecond response. Boring, proven infra: API gateways, queues, distributed databases. The other 1% — identity, trust, economic settlement — wants consensus and verification. That’s where a crypto layer earns its keep. Not blockchain everywhere. Blockchain where it does work that nothing else can.

Early implementations exist. FastAPI-based registries demonstrate the basics. The Model Context Protocol (MCP) is shaping up as the USB-C of AI connectivity. Projects like B4rega2a show what semantic discovery looks like in practice. None of them adds up to a production discovery layer yet. The building blocks are there, waiting to be assembled.

What to build, in order

Months 1–3 — foundation. Stand up a basic Agent Registry. REST + JSON metadata is fine. Add health checks and an approval workflow. Drop in an Agent Gateway on Envoy. Get 10 agents talking to each other reliably.
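The core of that first registry fits in one class. A sketch under assumptions: the REST layer is left out, storage is an in-memory dict, and the metadata fields and method names are invented for illustration. What it shows is the rule that matters in phase one: an agent is routable only once it has been approved and its last health check passed.

```python
import json
from dataclasses import dataclass
from enum import Enum


class Status(str, Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Entry:
    metadata: dict
    status: Status = Status.PENDING
    healthy: bool = False


class Registry:
    """In-memory stand-in for the service behind a REST API."""

    def __init__(self) -> None:
        self._entries = {}

    def register(self, metadata_json: str) -> str:
        """Accept JSON metadata; new agents start as pending and unhealthy."""
        metadata = json.loads(metadata_json)
        name = metadata["name"]
        self._entries[name] = Entry(metadata=metadata)
        return name

    def review(self, name: str, approve: bool) -> None:
        """Approval workflow: a reviewer accepts or rejects a pending agent."""
        self._entries[name].status = Status.APPROVED if approve else Status.REJECTED

    def record_health(self, name: str, healthy: bool) -> None:
        """Store the result of the most recent health check."""
        self._entries[name].healthy = healthy

    def routable(self) -> list:
        """Names of agents that are both approved and currently healthy."""
        return sorted(
            name for name, entry in self._entries.items()
            if entry.status is Status.APPROVED and entry.healthy
        )


registry = Registry()
registry.register('{"name": "pdf-parser", "endpoint": "https://agents.example/pdf"}')
registry.register('{"name": "crm-sync", "endpoint": "https://agents.example/crm"}')
registry.review("pdf-parser", approve=True)
registry.record_health("pdf-parser", True)
registry.record_health("crm-sync", True)  # healthy but never approved
print(registry.routable())  # ['pdf-parser']
```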

Months 4–6 — intelligence. Bolt semantic search onto the registry with a vector store. Speak MCP. Federate across registries so nobody’s siloed. Start measuring reliability and constraint adherence.
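Federation is the least familiar item on that list, so here is one way it could look. A sketch only: each registry is modelled as a plain lookup function, and the "id" field, the registry names, and the merge rule (first registry to return an agent wins) are assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, List

# A registry is modelled as a function from capability to matching agents.
Lookup = Callable[[str], Iterable[dict]]


def federated_search(capability: str, registries: Dict[str, Lookup]) -> List[dict]:
    """Query every registry, skip the ones that fail, de-duplicate by agent id."""
    seen = set()
    results = []
    for registry_name, lookup in registries.items():
        try:
            hits = list(lookup(capability))
        except Exception:
            # One unreachable peer should not fail the whole query.
            continue
        for hit in hits:
            if hit["id"] in seen:
                continue
            seen.add(hit["id"])
            results.append({**hit, "source": registry_name})
    return results


def internal_registry(capability: str) -> list:
    return [{"id": "pdf-parser"}] if capability == "parse_pdf" else []


def partner_registry(capability: str) -> list:
    return [{"id": "pdf-parser"}, {"id": "ocr-agent"}] if capability == "parse_pdf" else []


def offline_registry(capability: str) -> list:
    raise ConnectionError("registry unreachable")


found = federated_search("parse_pdf", {
    "internal": internal_registry,
    "offline": offline_registry,
    "partner": partner_registry,
})
print([(agent["id"], agent["source"]) for agent in found])
# [('pdf-parser', 'internal'), ('ocr-agent', 'partner')]
```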

Months 7–12 — scale. Push toward 10,000 agents. ML-assisted discovery. Automated test/improvement pipelines. Enterprise features: compliance, governance.

The standards will converge

MCP for connections, A2A for agent-to-agent communication, ACP for capabilities, ANP for naming. They won’t all win. They’ll converge into something that looks like an Agent Protocol Suite, the way HTTP, HTML, and CSS converged for the web. Enterprise adoption will force it. When Fortune 500s deploy thousands of agents, they’ll demand standards, and the protocols with the best security/compliance/observability story will eat the rest.

Design for emergence

The mindset shift: traditional software assumes you know the use cases. Agent networks don’t work that way. Their value comes from combinations nobody planned.

That means:

platforms where agents build tools for other agents
agents forming and dissolving teams based on the task
economic primitives that let agents hire each other
infrastructure that supports behaviour you haven’t imagined yet

At a million agents, specialisation goes places we can’t see from here. Teams form and dissolve. Economic systems emerge. None of it happens without the discovery layer underneath.

The technical problems are real and solvable. The patterns are sitting in DNS, service meshes, and distributed systems, waiting to be ported. Early prototypes prove the concepts. What’s missing is the will to build it.

The infrastructure gap won’t fill itself.

References and Further Reading

Core Concepts and Vision

Agentic Networks: The Future of Human-AI Collaboration - Slava Kurilyak’s foundational piece on agent network effects and the trillion-agent vision
Agent Discovery, Naming and Resolution - The Missing Pieces to A2A - Solo.io’s analysis of infrastructure gaps in agent-to-agent communication

Technical Implementation Resources

Building an AI Agent Registry Server with FastAPI - Practical implementation guide for agent registries
B4rega2a Project - Open source implementation of an agent registry
Model Context Protocol - Universal standard for AI application connectivity
Understanding Sessions in Agent-to-Agent Communication - Deep dive into context and state management

Reliability and Team Coordination

Engineering Reliable Agents - Comprehensive guide to building verifiable, trustworthy agents
Agno Framework Documentation - Teams - Dynamic team coordination patterns
Agno Framework Documentation - Workflows - Building production agent workflows

Service Mesh Patterns - Istio and Envoy documentation for microservices discovery patterns
OpenTelemetry - Observability standards applicable to agent networks
OAuth 2.0/2.1 Specifications - Security patterns for agent authentication

Industry Examples and Case Studies

GPT Store - OpenAI’s marketplace demonstrating the 10,000 agent threshold in practice
Enterprise Agent Deployments - Various case studies from early adopters (specific examples under NDA)

Academic and Theoretical Foundations

Metcalfe’s Law - Network value proportional to n²
Allee Effect - Ecological concept of critical population thresholds
Network Effects in Digital Platforms - Economic theory applied to agent networks

Community and Open Source

A2A Protocol Specification - Agent-to-agent communication standards
ACP (Agent Communication Protocol) - Emerging standard for agent capabilities
ANP (Agent Network Protocol) - Proposed protocol for identity, discovery, and naming in agent networks

Tools and Frameworks

FastAPI - High-performance Python framework for building APIs
Consul/etcd - Distributed consensus and service discovery
Weaviate/Pinecone - Vector databases for semantic search
Envoy Proxy - High-performance service proxy