Building Production-Grade AI Agents with MCP: A Complete Guide for 2026
Source: nebula gg on dev.to
## Why MCP Beats Custom Integrations

Before MCP, connecting an AI agent to three external systems meant three custom integrations, each with its own auth flow, error handling, retry logic, and data parser. When one API changed its response format, your agent broke silently.

MCP standardizes this at the protocol level. Every MCP server speaks JSON-RPC 2.0. Every tool declares an input schema and, optionally, an output schema. The agent discovers capabilities at runtime, with no hardcoded endpoint lists and no stale documentation. Add a new tool to your server, and connected agents can pick it up the next time they list tools.
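
To make runtime discovery concrete, here is a minimal sketch of the messages involved. The `tools/list` method and the `inputSchema` field come from the MCP spec; the helper names and the sample weather tool are illustrative assumptions, not part of any SDK.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC requests need a unique id per request


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request object with a fresh integer id."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


def tool_names(list_response):
    """Extract the tool names from a tools/list response."""
    return [tool["name"] for tool in list_response["result"]["tools"]]


# Request a client sends to discover tools at runtime.
discover = jsonrpc_request("tools/list")

# Hypothetical response from a weather server (only the shape matters here).
sample_response = {
    "jsonrpc": "2.0",
    "id": discover["id"],
    "result": {
        "tools": [
            {
                "name": "get_forecast",
                "description": "Get a multi-day weather forecast for a city.",
                "inputSchema": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string"},
                        "days": {"type": "integer"},
                    },
                    "required": ["city"],
                },
            }
        ]
    },
}

print(json.dumps(discover))
print(tool_names(sample_response))
```

Because the schema travels with the tool, the client never needs a hardcoded list of endpoints or parameters.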

The three primitives cover every integration pattern:

- Tools: executable functions the agent invokes (API calls, database queries, file writes)
- Resources: read-only context the agent consumes (configs, logs, documentation)
- Prompts: reusable workflow templates that structure interactions

The distinction between tools and resources matters more than most guides acknowledge. Tools act and can change state; they're your agent's verbs. Resources provide context; they're your agent's nouns. When an agent conflates the two, you get agents that try to "read" a database by calling write functions, or worse, agents that mutate state thinking they're just gathering information.
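
The verb/noun split shows up on the wire as two different methods. The sketch below builds a `tools/call` and a `resources/read` request (both method names are from the MCP spec) and applies a hypothetical client-side policy, `needs_confirmation`, that only asks for approval when a request might change state. The tool name and file URI are made up for illustration.

```python
# Methods that only read; everything else is treated as potentially mutating.
READ_ONLY_METHODS = {
    "resources/list",
    "resources/read",
    "prompts/list",
    "prompts/get",
    "tools/list",
}


def call_tool(request_id, name, arguments):
    """Build a tools/call request: the agent's verb."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def read_resource(request_id, uri):
    """Build a resources/read request: the agent's noun."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "resources/read",
        "params": {"uri": uri},
    }


def needs_confirmation(message):
    """Hypothetical policy: only requests that may change state need approval."""
    return message["method"] not in READ_ONLY_METHODS


write = call_tool(1, "write_file", {"path": "notes.txt", "content": "hello"})
read = read_resource(2, "file:///var/log/app.log")
print(needs_confirmation(write), needs_confirmation(read))  # True False
```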

## Choosing Your Transport: The stdio vs Streamable HTTP Decision

MCP defines two transport mechanisms. Picking the wrong one causes architecture headaches later.

stdio runs locally. Your client spawns the MCP server as a child process and talks through stdin/stdout. Zero infrastructure, zero network config, and process-level isolation. Use stdio for CLI tools, desktop applications, and local dev workflows.

Streamable HTTP is for everything else. The client sends JSON-RPC messages via HTTP POST, and the server responds with either an SSE stream or a single JSON object. It works through firewalls, load balancers, and CDNs. Multi-client deployments, where one server handles hundreds of concurrent agent sessions, require Streamable HTTP.
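
Since the server may answer a POST with either content type, a client has to branch on the response's `Content-Type`. Here is a minimal, stdlib-only sketch of that branch. The function names are illustrative, and the SSE parser only handles `data:` lines, which is a simplification of the full event-stream format.

```python
import json

# Headers a Streamable HTTP client sends with each POST: it must accept both
# a plain JSON reply and an SSE stream.
REQUEST_HEADERS = {
    "Content-Type": "application/json",
    "Accept": "application/json, text/event-stream",
}


def parse_sse(body):
    """Parse a text/event-stream body into a list of decoded JSON messages."""
    messages, data_lines = [], []
    for line in body.splitlines():
        if line.startswith("data:"):
            # Per the SSE format, one optional space follows the colon.
            data_lines.append(line[5:].removeprefix(" "))
        elif line == "" and data_lines:
            # A blank line ends the current event.
            messages.append(json.loads("\n".join(data_lines)))
            data_lines = []
    if data_lines:  # stream ended without a trailing blank line
        messages.append(json.loads("\n".join(data_lines)))
    return messages


def decode_response(content_type, body):
    """Return the JSON-RPC messages in a response, whichever form it took."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json":
        return [json.loads(body)]
    if media_type == "text/event-stream":
        return parse_sse(body)
    raise ValueError(f"unexpected content type: {content_type!r}")


sse_body = 'event: message\ndata: {"jsonrpc": "2.0", "id": 1, "result": {}}\n\n'
print(decode_response("text/event-stream; charset=utf-8", sse_body))
print(decode_response("application/json", '{"jsonrpc": "2.0", "id": 2, "result": {}}'))
```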

The old HTTP+SSE transport was deprecated in the March 2025 spec revision (2025-03-26), which introduced Streamable HTTP. If you're starting a new project, use Streamable HTTP. Full stop.

## Building a Production MCP Server in Python

The Python SDK's FastMCP pattern is the fastest path from zero to working server. But toy examples skip the patterns that matter in production.

Here's a server template that handles real-world requirements:

[CODE BLOCK]

Run it with `uv run weather_server.py` and any MCP client can connect. The server exposes `get_forecast` as a tool and `weather://current/{city}` as a dynamic resource.

### What Makes This Different from Tutorial Examples

Three patterns separate production servers from demos:

1. Validation at startup. If `WEATHER_API_KEY` isn't set, the server exits before accepting any connection. Silent missing-config failures during deployment are harder to debug than explicit startup errors.

2. Input validation in tools. The `days` and `units` parameters are validated before any external call. Bad input raises a ValueError, which the Python SDK reports back as a tool error result (`isError: true`) rather than a protocol-level failure. Your agent gets a clear error message instead of a 500 from the upstream API.

3. Timeouts on every external call. The `timeout=10.0` on `httpx.get` prevents a slow upstream API from hanging your MCP server. In production, you'd add retry logic with exponential backoff, but the timeout is the minimum safety net.
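
The three patterns can be sketched without any MCP dependency. Everything below is an assumption for illustration: the helper names (`require_env`, `validate_forecast_args`, `with_retries`), the 1 to 7 day range, and the unit names are not taken from the original server. The built-in `TimeoutError` and `ConnectionError` stand in for your HTTP client's own exception types (with httpx you would catch its timeout exceptions instead).

```python
import os
import time

ALLOWED_UNITS = ("metric", "imperial")  # assumed unit names
MAX_DAYS = 7                            # assumed forecast limit


def require_env(name, environ=None):
    """Return the variable's value, or fail fast before the server starts."""
    environ = os.environ if environ is None else environ
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; refusing to start")
    return value


def validate_forecast_args(days, units):
    """Reject bad input before any external call is made."""
    if isinstance(days, bool) or not isinstance(days, int) or not 1 <= days <= MAX_DAYS:
        raise ValueError(f"days must be an integer from 1 to {MAX_DAYS}, got {days!r}")
    if units not in ALLOWED_UNITS:
        raise ValueError(f"units must be one of {ALLOWED_UNITS}, got {units!r}")


def with_retries(fetch, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fetch(), retrying timeouts and connection errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fetch()
        except (TimeoutError, ConnectionError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
```

A usage sketch: call `require_env("WEATHER_API_KEY")` at import time, call `validate_forecast_args(days, units)` as the first line of the tool, and wrap the upstream request (which still needs its own timeout) in `with_retries(lambda: ...)`.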

## Advanced Tool Design: What Good MCP Servers Get Right

The most common failure mode in MCP deployments isn't the server code; it's tool descriptions. Agents rely on tool names, descriptions, and input schemas to decide which tool to call. Vague descriptions produce wrong tool selections, and wrong tool selections produce wasted tokens and confused users.

### The Description Formula

Every tool description should answer three questions:

1. When should the agent call this tool?
2. What data does it need as input?
3. What structure will it get back?

[CODE BLOCK]
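
One way to apply the formula is to write the docstring so each paragraph answers one of the three questions. The `search_orders` tool and its in-memory data below are hypothetical; in a FastMCP server the function would additionally be registered with the tool decorator, which to my knowledge uses the docstring as the description when none is given explicitly.

```python
# Hypothetical in-memory data standing in for a real order database.
ORDERS = [
    {"id": "A-100", "email": "ada@example.com", "status": "open", "total_cents": 4200},
    {"id": "A-101", "email": "ada@example.com", "status": "shipped", "total_cents": 1999},
    {"id": "B-200", "email": "bob@example.com", "status": "open", "total_cents": 500},
]


def search_orders(customer_email: str, status: str = "open", limit: int = 20) -> dict:
    """Find a customer's existing orders by email address.

    When to call: the user asks about the state of orders they already
    placed. Do not call this to create, change, or cancel an order.

    Input: customer_email (exact match), status ("open", "shipped", or
    "cancelled"), limit (1 to 100 results).

    Returns: {"orders": [{"id", "status", "total_cents"}], "count": int}
    """
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    matches = [
        {"id": o["id"], "status": o["status"], "total_cents": o["total_cents"]}
        for o in ORDERS
        if o["email"] == customer_email and o["status"] == status
    ][:limit]
    return {"orders": matches, "count": len(matches)}


print(search_orders("ada@example.com"))
```

Compare that with a description like "Searches orders": the agent has no way to tell when the tool applies or what comes back.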

### Multi-Tool Orchestration

Production agents rarely call just one tool. The sequence matters, and your server should make correct sequences obvious through tool design.

Consider a deployment pipeline where the agent needs to check the current state, build, then deploy. If these are three separate tools with no hints, the agent might deploy without checking state first.

The solution is a composite tool that encapsulates the workflow:

[CODE BLOCK]

The agent calls one tool. The server orchestrates four steps. The agent gets a single structured result. This pattern reduces token usage, prevents partial-state errors, and makes the agent's behavior predictable.
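
The core of a composite tool is a small step runner that enforces the order and stops at the first failure. The sketch below is an assumption about how such a tool could be structured: `run_pipeline` and `deploy_service` are hypothetical names, the step functions are injected so the example stays self-contained, and it wires up only the check, build, and deploy steps named earlier. A real pipeline may add more, and this version does not attempt a rollback.

```python
def run_pipeline(steps):
    """Run (name, callable) steps in order and stop at the first failure."""
    completed = []
    for name, step in steps:
        try:
            detail = step()
        except Exception as exc:
            # Report which step failed and what had already finished.
            return {
                "status": "failed",
                "failed_step": name,
                "error": str(exc),
                "completed": completed,
            }
        completed.append({"step": name, "detail": detail})
    return {"status": "succeeded", "completed": completed}


def deploy_service(service, check_state, build, deploy):
    """Composite tool body: the agent makes one call, the server fixes the order."""
    return run_pipeline([
        ("check_state", lambda: check_state(service)),
        ("build", lambda: build(service)),
        ("deploy", lambda: deploy(service)),
    ])


def failing_build(service):
    raise RuntimeError("compile error")


# Stub steps for demonstration only.
result = deploy_service(
    "api",
    check_state=lambda s: "healthy",
    build=failing_build,
    deploy=lambda s: "deployed",
)
print(result["status"], result["failed_step"])  # failed build
```

Because the runner returns as soon as `build` fails, `deploy` is never invoked, and the agent sees exactly which step to report.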

## Authentication: OAuth 2.1 for MCP Servers

Every MCP server that accesses protected resources needs authentication. The MCP spec supports OAuth 2.1 for remote servers, and this is the pattern production deployments should use.

The flow works like this:

1. Client connects to your MCP server via Streamable HTTP
2. Server responds with 401 Unauthorized and a WWW-Authenticate header pointing to the server's protected resource metadata, which names the authorization server
3. Client redirects the user through the OAuth consent flow
4. User grants permission, returns with an authorization code
5. Client exchanges the code for an access token
6. All subsequent requests include `Authorization: Bearer <token>`

Implementing this correctly means:

[CODE BLOCK]

The key insight: token validation happens at the transport layer, not inside individual tools. Every tool call arrives already authenticated. Your tool code can assume the identity of the caller is known.
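
A transport-layer check can be a single function that runs before any request is dispatched to a tool. The sketch below is framework-agnostic and makes several assumptions: `authenticate` and `verify_token` are hypothetical names, headers arrive as a plain dict with canonical casing, and the 401 challenge carries a `resource_metadata` URL in the style of RFC 9728. Real token verification (signature, expiry, audience) is left to the injected `verify_token`.

```python
def authenticate(headers, verify_token, metadata_url):
    """Check the bearer token once, before any tool code runs.

    Returns (status, response_headers, identity). identity is None on failure.
    verify_token is an injected callable returning an identity or None.
    """
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        # No credentials: tell the client where to start the OAuth flow.
        challenge = f'Bearer resource_metadata="{metadata_url}"'
        return 401, {"WWW-Authenticate": challenge}, None
    identity = verify_token(token)
    if identity is None:
        challenge = f'Bearer error="invalid_token", resource_metadata="{metadata_url}"'
        return 401, {"WWW-Authenticate": challenge}, None
    return 200, {}, identity


# Hypothetical verifier and metadata URL, for demonstration only.
def demo_verify(token):
    return {"sub": "user-1"} if token == "good" else None


URL = "https://mcp.example.com/.well-known/oauth-protected-resource"
print(authenticate({}, demo_verify, URL)[0])                                  # 401
print(authenticate({"Authorization": "Bearer good"}, demo_verify, URL)[2])    # {'sub': 'user-1'}
```

Only when the status is 200 does the request reach the tool dispatcher, with the identity attached to the request context.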

## Deployment Patterns

### Single-Server Architecture

For small teams and single-tenant deployments, a single MCP server process works fine:

[CODE BLOCK]

### Multi-Server Orchestration

Production systems with multiple MCP servers need orchestration. This is where platforms like Nebula come in: you can spin up a unified agent workspace that connects to multiple MCP servers, manages their lifecycle, and provides a single interface for the agent to discover and call tools across all of them.

The architecture looks like:

[CODE BLOCK]

Each MCP client connection is isolated. If the database server goes down, the GitHub server continues working. The agent degrades gracefully instead of crashing entirely.
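
Graceful degradation mostly comes down to catching failures per server while building the tool catalog. In the sketch below, each server is represented by a callable that returns its tool names; that interface, the `discover_tools` name, and the `server.tool` naming scheme are assumptions for illustration, not how any particular orchestrator works.

```python
def discover_tools(servers):
    """Collect tools from every reachable server and record the ones that fail.

    servers maps a server name to a callable returning a list of tool names.
    Returns (catalog, unavailable): catalog maps "server.tool" to its server,
    unavailable maps a server name to the error message.
    """
    catalog, unavailable = {}, {}
    for server_name, list_tools in servers.items():
        try:
            tools = list_tools()
        except Exception as exc:
            # One broken server must not take the others down with it.
            unavailable[server_name] = str(exc)
            continue
        for tool in tools:
            catalog[f"{server_name}.{tool}"] = server_name
    return catalog, unavailable


def database_down():
    raise ConnectionError("connection refused")


# Stub servers for demonstration only.
catalog, unavailable = discover_tools({
    "github": lambda: ["create_issue", "list_prs"],
    "database": database_down,
})
print(sorted(catalog))   # ['github.create_issue', 'github.list_prs']
print(unavailable)       # {'database': 'connection refused'}
```

The agent can then be told which capabilities are temporarily missing instead of failing the whole session.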

### Scaling Streamable HTTP Servers

Streamable HTTP servers can handle thousands of concurrent connections, but you need proper session management:

[CODE BLOCK]

For high-traffic deployments, put your Streamable HTTP servers behind a load balancer with sticky sessions. MCP sessions maintain state, so a given agent should always reach the same server instance unless you keep session state in a shared store such as Redis.
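
To see what "session state" means in practice, here is a minimal in-memory store with a sliding expiry. The `SessionStore` class is a hypothetical sketch: the id it issues would travel in the `Mcp-Session-Id` header defined by the spec, and the dict would be swapped for a shared store if you want to drop sticky sessions. The clock is injectable so expiry can be tested without waiting.

```python
import time
import uuid


class SessionStore:
    """In-memory session state with a sliding time-to-live."""

    def __init__(self, ttl_seconds=1800, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._sessions = {}  # session_id -> (expires_at, state)

    def create(self, state=None):
        """Start a session and return the id to send back to the client."""
        session_id = uuid.uuid4().hex
        state = {} if state is None else state
        self._sessions[session_id] = (self._clock() + self._ttl, state)
        return session_id

    def get(self, session_id):
        """Return the session state, or None if it is unknown or expired."""
        entry = self._sessions.get(session_id)
        if entry is None:
            return None
        expires_at, state = entry
        if self._clock() >= expires_at:
            del self._sessions[session_id]
            return None
        # Each successful lookup extends the session.
        self._sessions[session_id] = (self._clock() + self._ttl, state)
        return state
```

When `get` returns `None`, the server should reject the request so the client starts a fresh session; with this store living in process memory, that is exactly what happens when a load balancer routes an agent to a different instance.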

## Common Pitfalls and How to Avoid Them

### Too Many Tools

I've seen MCP servers with 47 registered tools. Agents get confused. They call the wrong tool, or they can't find the right one among 47 options.

Rule of thumb: a single MCP server should expose 3-10 tools. If you need more, split into multiple servers. One server for GitHub operations. Another for database queries. A third for monitoring and alerting.

### Missing Error Contracts

Every tool should return structured errors, not raw exceptions:

[CODE BLOCK]
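
One possible error contract, sketched as a decorator: expected failures carry a machine-readable code and a retry hint, and unexpected ones are logged server-side and replaced with a generic message. The `ToolError` class, the `structured_errors` decorator, and the `ok`/`error` envelope are assumptions chosen for illustration, not a format the MCP spec prescribes.

```python
import functools
import logging

logger = logging.getLogger("tools")


class ToolError(Exception):
    """An expected failure the agent can act on."""

    def __init__(self, code, message, retryable=False):
        super().__init__(message)
        self.code = code
        self.retryable = retryable


def structured_errors(func):
    """Wrap a tool so it always returns the same envelope shape."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return {"ok": True, "data": func(*args, **kwargs)}
        except ToolError as exc:
            error = {"code": exc.code, "message": str(exc), "retryable": exc.retryable}
            return {"ok": False, "error": error}
        except Exception:
            # Keep stack traces in the logs, not in the agent's context.
            logger.exception("unexpected failure in %s", func.__name__)
            error = {
                "code": "internal_error",
                "message": "Unexpected failure; see server logs.",
                "retryable": False,
            }
            return {"ok": False, "error": error}
    return wrapper


@structured_errors
def get_user(user_id):
    """Hypothetical tool used to exercise the decorator."""
    if user_id != 1:
        raise ToolError("not_found", f"No user with id {user_id}")
    return {"id": 1, "name": "Ada"}


print(get_user(1))
print(get_user(2))
```

With a stable envelope, the agent can branch on `error.code` and `error.retryable` instead of parsing free-form exception text.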