MCP Builder: Building High-Quality MCP Servers

mcp-builder guides developers in creating high-quality MCP (Model Context Protocol) servers, enabling Claude to discover and use external tools and services.

  • 📡 MCP server development best practices (Python + Node.js)
  • 🔌 Connection management and tool registration patterns
  • ✅ Built-in evaluation framework for MCP server quality validation
  • 📚 Reference docs covering protocol spec and implementation patterns

mcp-builder is a script-driven Skill, organized around 2 core Python scripts + 4 reference documents.

The SKILL.md, about 200 lines long, follows a clear four-phase workflow:

  1. Phase 1: Deep Research and Planning (lines 20-75) — API Coverage vs Workflow Tools design, protocol research, framework study, implementation planning
  2. Phase 2: Implementation (lines 78-124) — Project setup, core infrastructure, tool implementation (input/output schemas, descriptions, annotations)
  3. Phase 3: Review and Test (lines 127-148) — Code quality checks, build verification, MCP Inspector testing
  4. Phase 4: Create Evaluations (lines 151-192) — Create 10 evaluation questions, XML output format
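As a rough illustration of Phase 4, the sketch below writes a small evaluation file and reads it back. The `<evaluation>`/`<qa_pair>`/`<question>`/`<answer>` layout follows the format that evaluation.py parses; the helper names and the two question/answer pairs are invented placeholders, and a real file would hold 10 pairs.

```python
import xml.etree.ElementTree as ET

# Placeholder question/answer pairs (invented for illustration only).
QA_PAIRS = [
    ("How many open issues does the demo project have?", "7"),
    ("Which user created the oldest ticket?", "alice"),
]


def build_evaluation_xml(pairs: list[tuple[str, str]]) -> str:
    """Serialize (question, answer) pairs into the qa_pair XML layout."""
    root = ET.Element("evaluation")
    for question, answer in pairs:
        qa_pair = ET.SubElement(root, "qa_pair")
        ET.SubElement(qa_pair, "question").text = question
        ET.SubElement(qa_pair, "answer").text = answer
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


def parse_evaluation_xml(xml_text: str) -> list[dict[str, str]]:
    """Read the pairs back using the same element lookup as evaluation.py."""
    root = ET.fromstring(xml_text)
    return [
        {
            "question": (qa.findtext("question") or "").strip(),
            "answer": (qa.findtext("answer") or "").strip(),
        }
        for qa in root.findall(".//qa_pair")
    ]


xml_text = build_evaluation_xml(QA_PAIRS)
parsed = parse_evaluation_xml(xml_text)
print(xml_text)
```

Round-tripping the file like this is a cheap way to confirm it is well-formed before spending API calls on an evaluation run.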

The recommended language is TypeScript (rationale: good SDK support, AI models generate high-quality TypeScript code, and static type checking), and the recommended transport is Streamable HTTP.

connections.py provides the low-level MCP transport layer (stdio/SSE/HTTP), and evaluation.py builds the evaluation framework on top of it:

mcp-builder module relationship diagram

graph TD
  SKILL[SKILL.md] -->|guides| Claude
  Claude -->|invokes| eval[evaluation.py]
  eval -->|import| conn[connections.py]
  conn -->|MCP Transport| Stdio[stdio]
  conn -->|MCP Transport| SSE[sse]
  conn -->|MCP Transport| HTTP[streamable HTTP]
  eval -->|tool discovery| MCP_Server[MCP Server]
  eval -->|runs evaluation| Claude_API[Anthropic Claude]
  eval -->|outputs| Report[Markdown Report]

  subgraph scripts [scripts/]
      conn
      eval
  end

  subgraph reference [reference/]
      BP[mcp_best_practices.md]
      PY[python_mcp_server.md]
      TS[node_mcp_server.md]
      EG[evaluation.md]
  end

  SKILL --> reference

  style SKILL fill:#4fc3f7,stroke:#0288d1,color:#000
  style conn fill:#81c784,stroke:#388e3c,color:#000
  style eval fill:#81c784,stroke:#388e3c,color:#000
  style Claude_API fill:#ffb74d,stroke:#f57c00,color:#000
  style Report fill:#ce93d8,stroke:#7b1fa2,color:#000

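| Script | Language | Lines | Complexity | Purpose |
|--------|----------|-------|------------|---------|
| connections.py | Python | ~150 | ⭐⭐⭐ | MCP connection management: supports three transports (stdio, SSE, HTTP) |
| evaluation.py | Python | ~373 | ⭐⭐⭐⭐ | Evaluation framework: uses Claude to run test questions and generate a report |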


connections.py provides a set of pluggable MCP connection classes. The core design is an abstract base class MCPConnection + three concrete implementations (stdio/SSE/HTTP), with a factory function create_connection() for unified creation.

connections.py: MCP connection manager
```python
"""Lightweight connection handling for MCP servers."""

from abc import ABC, abstractmethod
from contextlib import AsyncExitStack
from typing import Any

from mcp import ClientSession, StdioServerParameters
from mcp.client.sse import sse_client
from mcp.client.stdio import stdio_client
from mcp.client.streamable_http import streamablehttp_client


class MCPConnection(ABC):
    """Base class for MCP server connections."""

    def __init__(self):
        self.session = None
        self._stack = None

    @abstractmethod
    def _create_context(self):
        """Create the connection context based on connection type."""

    async def __aenter__(self):
        """Initialize MCP server connection."""
        self._stack = AsyncExitStack()
        await self._stack.__aenter__()

        try:
            ctx = self._create_context()
            result = await self._stack.enter_async_context(ctx)

            if len(result) == 2:
                read, write = result
            elif len(result) == 3:
                read, write, _ = result
            else:
                raise ValueError(f"Unexpected context result: {result}")

            session_ctx = ClientSession(read, write)
            self.session = await self._stack.enter_async_context(session_ctx)
            await self.session.initialize()
            return self
        except BaseException:
            await self._stack.__aexit__(None, None, None)
            raise

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        """Clean up MCP server connection resources."""
        if self._stack:
            await self._stack.__aexit__(exc_type, exc_val, exc_tb)
        self.session = None
        self._stack = None

    async def list_tools(self) -> list[dict[str, Any]]:
        """Retrieve available tools from the MCP server."""
        response = await self.session.list_tools()
        return [
            {
                "name": tool.name,
                "description": tool.description,
                "input_schema": tool.inputSchema,
            }
            for tool in response.tools
        ]

    async def call_tool(self, tool_name: str, arguments: dict[str, Any]) -> Any:
        """Call a tool on the MCP server with provided arguments."""
        result = await self.session.call_tool(tool_name, arguments=arguments)
        return result.content


class MCPConnectionStdio(MCPConnection):
    """MCP connection using standard input/output."""

    def __init__(self, command: str, args: list[str] = None, env: dict[str, str] = None):
        super().__init__()
        self.command = command
        self.args = args or []
        self.env = env

    def _create_context(self):
        return stdio_client(
            StdioServerParameters(command=self.command, args=self.args, env=self.env)
        )


class MCPConnectionSSE(MCPConnection):
    """MCP connection using Server-Sent Events."""

    def __init__(self, url: str, headers: dict[str, str] = None):
        super().__init__()
        self.url = url
        self.headers = headers or {}

    def _create_context(self):
        return sse_client(url=self.url, headers=self.headers)


class MCPConnectionHTTP(MCPConnection):
    """MCP connection using Streamable HTTP."""

    def __init__(self, url: str, headers: dict[str, str] = None):
        super().__init__()
        self.url = url
        self.headers = headers or {}

    def _create_context(self):
        return streamablehttp_client(url=self.url, headers=self.headers)


def create_connection(
    transport: str,
    command: str = None,
    args: list[str] = None,
    env: dict[str, str] = None,
    url: str = None,
    headers: dict[str, str] = None,
) -> MCPConnection:
    """Factory function to create the appropriate MCP connection."""
    transport = transport.lower()

    if transport == "stdio":
        if not command:
            raise ValueError("Command is required for stdio transport")
        return MCPConnectionStdio(command=command, args=args, env=env)

    elif transport == "sse":
        if not url:
            raise ValueError("URL is required for sse transport")
        return MCPConnectionSSE(url=url, headers=headers)

    elif transport in ["http", "streamable_http", "streamable-http"]:
        if not url:
            raise ValueError("URL is required for http transport")
        return MCPConnectionHTTP(url=url, headers=headers)

    else:
        raise ValueError(f"Unsupported transport type: {transport}. Use 'stdio', 'sse', or 'http'")
```
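Code walkthrough

  • from abc import ABC, abstractmethod: imports the standard-library building blocks of Python's abstract base class pattern.
  • AsyncExitStack: a stack-based form of async with that lets you enter context managers dynamically and unwinds them in reverse order. It is the key to managing the MCP connection lifecycle, where a transport context and a session context are nested.
  • ClientSession: the core of the MCP Python SDK client. It wraps the transport streams and implements the MCP protocol message exchange.
  • MCPConnection(ABC): the abstract base class. This is the Template Method pattern: __aenter__ fixes the overall connection sequence, and _create_context() is the one step that each subclass supplies.
  • @abstractmethod: forces subclasses to implement _create_context(). In dependency-inversion terms, the high-level policy (lifecycle management) does not depend on any concrete transport.
  • __aenter__: part of the async context manager protocol. It creates the stack first, then enters the transport context and the session context in turn.
  • The len(result) check: handles both shapes a transport context can yield. stdio_client and sse_client yield (read, write), while streamablehttp_client yields a third element (a session-ID callback in current SDK versions) that the base class ignores.
  • ClientSession(read, write): the session is pushed onto the stack, then initialize() completes the MCP protocol handshake.
  • list_tools(): wraps MCP tool discovery and extracts name, description, and inputSchema from the SDK's Tool objects.
  • call_tool(): wraps tool invocation. Together with list_tools(), it forms the core interface of an MCP connection, and every subclass inherits both.
  • MCPConnectionStdio: stdio transport for local processes. It takes command, args, and env and wraps the launch information in StdioServerParameters.
  • MCPConnectionSSE: Server-Sent Events transport for remote servers, with support for custom HTTP headers.
  • MCPConnectionHTTP: Streamable HTTP transport for remote servers. This is the transport from the newer MCP specification and the recommended choice for remote scenarios.
  • create_connection(): a Simple Factory. The transport argument decides which concrete class is instantiated, which keeps creation logic in one place.
  • Factory validation: the factory accepts one unified parameter list for all transports and creates an instance only when the required parameters are present. A missing command or URL, or an unknown transport name, raises ValueError at the factory level.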
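The result-shape handling inside __aenter__ can be tried without an MCP server. The sketch below is a minimal illustration, assuming two made-up transports (`fake_two_tuple_transport` and `fake_three_tuple_transport` are hypothetical stand-ins, not SDK functions); it reuses the same `len()` check and records the order in which `AsyncExitStack` opens and closes the contexts.

```python
import asyncio
from contextlib import AsyncExitStack, asynccontextmanager

events: list[str] = []  # records open/close order for inspection


@asynccontextmanager
async def fake_two_tuple_transport():
    """Hypothetical stand-in for a transport that yields (read, write)."""
    events.append("open two-tuple")
    try:
        yield ("read", "write")
    finally:
        events.append("close two-tuple")


@asynccontextmanager
async def fake_three_tuple_transport():
    """Hypothetical stand-in for a transport that yields a third element."""
    events.append("open three-tuple")
    try:
        yield ("read", "write", lambda: "session-id")
    finally:
        events.append("close three-tuple")


async def open_streams(stack: AsyncExitStack, ctx):
    """Enter a transport context and normalize its result to (read, write)."""
    result = await stack.enter_async_context(ctx)
    if len(result) == 2:
        read, write = result
    elif len(result) == 3:
        read, write, _ = result  # the extra element is ignored
    else:
        raise ValueError(f"Unexpected context result: {result}")
    return read, write


async def main():
    async with AsyncExitStack() as stack:
        first = await open_streams(stack, fake_two_tuple_transport())
        second = await open_streams(stack, fake_three_tuple_transport())
    return first, second


streams = asyncio.run(main())
print(events)
```

The recorded events show the stack closing contexts in the reverse of the order they were entered, which is why the session (entered last) is torn down before the transport.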

evaluation.py is the evaluation engine of mcp-builder. It uses connections.py to connect to MCP servers, runs test questions through Claude, and generates Markdown evaluation reports.

evaluation.py: evaluation framework (key logic)
```python
"""MCP Server Evaluation Harness"""
import argparse, asyncio, json, re, sys, time, traceback
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Any

from anthropic import Anthropic
from connections import create_connection


EVALUATION_PROMPT = """..."""  # System prompt instructing Claude to use XML tags


def parse_evaluation_file(file_path: Path) -> list[dict[str, Any]]:
    """Parse XML evaluation file with qa_pair elements."""
    try:
        tree = ET.parse(file_path)
        root = tree.getroot()
        evaluations = []
        for qa_pair in root.findall(".//qa_pair"):
            question_elem = qa_pair.find("question")
            answer_elem = qa_pair.find("answer")
            if question_elem is not None and answer_elem is not None:
                evaluations.append({
                    "question": (question_elem.text or "").strip(),
                    "answer": (answer_elem.text or "").strip(),
                })
        return evaluations
    except Exception as e:
        print(f"Error parsing evaluation file {file_path}: {e}")
        return []


def extract_xml_content(text: str, tag: str) -> str | None:
    """Extract content from XML tags."""
    pattern = rf"<{tag}>(.*?)</{tag}>"
    matches = re.findall(pattern, text, re.DOTALL)
    return matches[-1].strip() if matches else None


async def agent_loop(client, model, question, tools, connection):
    """Run the agent loop with MCP tools."""
    messages = [{"role": "user", "content": question}]

    response = await asyncio.to_thread(
        client.messages.create,
        model=model, max_tokens=4096,
        system=EVALUATION_PROMPT,
        messages=messages, tools=tools,
    )
    messages.append({"role": "assistant", "content": response.content})
    tool_metrics = {}

    while response.stop_reason == "tool_use":
        tool_use = next(block for block in response.content
                        if block.type == "tool_use")
        tool_name = tool_use.name
        tool_input = tool_use.input

        tool_start_ts = time.time()
        try:
            tool_result = await connection.call_tool(tool_name, tool_input)
            tool_response = json.dumps(tool_result) if isinstance(tool_result, (dict, list)) else str(tool_result)
        except Exception as e:
            tool_response = f"Error: {str(e)}\n{traceback.format_exc()}"
        tool_duration = time.time() - tool_start_ts

        if tool_name not in tool_metrics:
            tool_metrics[tool_name] = {"count": 0, "durations": []}
        tool_metrics[tool_name]["count"] += 1
        tool_metrics[tool_name]["durations"].append(tool_duration)

        messages.append({
            "role": "user",
            "content": [{
                "type": "tool_result",
                "tool_use_id": tool_use.id,
                "content": tool_response,
            }]
        })

        response = await asyncio.to_thread(
            client.messages.create,
            model=model, max_tokens=4096,
            system=EVALUATION_PROMPT,
            messages=messages, tools=tools,
        )
        messages.append({"role": "assistant", "content": response.content})

    response_text = next(
        (block.text for block in response.content
         if hasattr(block, "text")), None)
    return response_text, tool_metrics


async def run_evaluation(eval_path, connection, model):
    """Run evaluation with MCP server tools."""
    client = Anthropic()
    tools = await connection.list_tools()
    qa_pairs = parse_evaluation_file(eval_path)
    results = []

    for i, qa_pair in enumerate(qa_pairs):
        # evaluate_single_task() is defined elsewhere in the file (not shown in this excerpt)
        result = await evaluate_single_task(
            client, model, qa_pair, tools, connection, i)
        results.append(result)

    # Generate report
    correct = sum(r["score"] for r in results)
    accuracy = (correct / len(results)) * 100 if results else 0
    total_duration = sum(r["total_duration"] for r in results)
    # ... build markdown report string ...
    return report
```
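Code walkthrough

  • from connections import create_connection: the key dependency. evaluation.py reuses the transport layer of connections.py through this import, a layered design built on code reuse.
  • EVALUATION_PROMPT: defines how Claude behaves during evaluation, including the required XML tags (<summary>, <feedback>, <response>). It is the core of the Claude-as-Judge pattern.
  • parse_evaluation_file(): parses the XML evaluation file with xml.etree.ElementTree. The file format is an <evaluation> root containing <qa_pair> elements, each with a <question> and an <answer>.
  • extract_xml_content(): uses a regular expression to pull the content of a given tag out of Claude's response, returning the last match. It is simple but effective because EVALUATION_PROMPT requires Claude to use a fixed tag format.
  • agent_loop(): the heart of the evaluation. It runs an agentic loop of Claude plus MCP tools and supports multiple rounds of tool calls, a standard ReAct-style loop.
  • asyncio.to_thread(): runs the synchronous Anthropic SDK call in a worker thread so the event loop is not blocked. This is the standard way to integrate a synchronous SDK into async code.
  • while response.stop_reason == "tool_use": the loop condition. As long as Claude asks for a tool call the loop continues, and it ends when Claude gives its final answer.
  • connection.call_tool(): comes from connections.py and executes the tool over the MCP protocol. tool_metrics records the call count and the duration of each call for performance analysis.
  • run_evaluation(): the top-level coordinator. It discovers tools over the MCP connection, loads the evaluation cases, runs each one, and generates the report.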
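agent_loop() returns tool_metrics as a dict that maps each tool name to a call count and a list of durations. The sketch below shows one way such a dict could be turned into a Markdown table for the report. The report-building code is elided in the excerpt, so `summarize_tool_metrics` and the column layout are assumptions for illustration, not the script's actual output format.

```python
def summarize_tool_metrics(tool_metrics: dict[str, dict]) -> str:
    """Render per-tool call counts and timings as a Markdown table."""
    lines = [
        "| Tool | Calls | Total (s) | Avg (s) |",
        "|------|-------|-----------|---------|",
    ]
    for name, stats in sorted(tool_metrics.items()):
        durations = stats["durations"]
        total = sum(durations)
        avg = total / len(durations) if durations else 0.0
        lines.append(f"| {name} | {stats['count']} | {total:.2f} | {avg:.2f} |")
    return "\n".join(lines)


# Example input in the same shape that agent_loop() accumulates;
# the tool names and timings are invented.
sample_metrics = {
    "search_issues": {"count": 2, "durations": [0.5, 1.5]},
    "get_user": {"count": 1, "durations": [0.25]},
}
table = summarize_tool_metrics(sample_metrics)
print(table)
```

A per-tool view like this makes it easier to spot a tool that Claude calls repeatedly or that dominates the total run time.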

mcp-builder script dependency diagram

graph LR
  A[evaluation.py] -->|import create_connection| B[connections.py]
  B -->|MCP| C[MCP Server via stdio/sse/http]
  A -->|Anthropic SDK| D[Claude API]
  A -->|parse| E[eval XML File]
  A -->|generates| F[Markdown Report]
  B -->|ClientSession| G[list_tools / call_tool]

  style A fill:#81c784,stroke:#388e3c,color:#000
  style B fill:#81c784,stroke:#388e3c,color:#000
  style D fill:#ffb74d,stroke:#f57c00,color:#000
  style F fill:#ce93d8,stroke:#7b1fa2,color:#000

  1. ABC Pluggable Transport Layer: MCPConnection abstract base class defines a unified interface, with three concrete subclasses instantiated via factory function — a classic Strategy pattern
  2. Claude-as-Judge Evaluation: evaluation.py uses Claude itself to evaluate MCP server effectiveness, controlling output format through structured XML tags (summary/feedback/response)
  3. Script Layering: connections.py is pure “infrastructure”, evaluation.py is “business logic” — clear separation of concerns
  4. AsyncExitStack Lifecycle: Ensures MCP connections are properly cleaned up even on exceptions

“If you want to build connectors for other APIs…”

  1. Keep ABC Pattern: Extract APIConnection abstract base class + factory function — the most universal pattern
  2. Replace Transport Layer: Swap stdio/SSE/HTTP for your protocol (gRPC, WebSocket, REST)
  3. Keep Evaluation Framework: The Claude-as-Judge pattern in evaluation.py is generic for any tool evaluation
  4. Replace Evaluation XML: Swap MCP test cases for your API test cases
  5. Remove language-specific references: Replace reference/ language guides with your tech stack docs
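The first two steps above can be sketched as follows. This is a minimal illustration under stated assumptions: `APIConnection`, `EchoWebSocketConnection`, and `create_api_connection` are hypothetical names, and the "WebSocket" transport only yields a placeholder string where a real implementation would open a socket with a library of your choice.

```python
import asyncio
from abc import ABC, abstractmethod
from contextlib import AsyncExitStack, asynccontextmanager


class APIConnection(ABC):
    """Base class: owns the lifecycle, delegates transport creation."""

    def __init__(self):
        self.channel = None
        self._stack = None

    @abstractmethod
    def _create_context(self):
        """Return an async context manager that yields a channel object."""

    async def __aenter__(self):
        self._stack = AsyncExitStack()
        await self._stack.__aenter__()
        try:
            ctx = self._create_context()
            self.channel = await self._stack.enter_async_context(ctx)
            return self
        except BaseException:
            await self._stack.__aexit__(None, None, None)
            raise

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        if self._stack:
            await self._stack.__aexit__(exc_type, exc_val, exc_tb)
        self.channel = None
        self._stack = None


class EchoWebSocketConnection(APIConnection):
    """Hypothetical transport; a real one would open a WebSocket here."""

    def __init__(self, url: str):
        super().__init__()
        self.url = url

    @asynccontextmanager
    async def _create_context(self):
        yield f"ws-channel:{self.url}"  # placeholder channel object


def create_api_connection(transport: str, url: str = "") -> APIConnection:
    """Simple factory in the style of create_connection()."""
    if transport.lower() == "websocket":
        if not url:
            raise ValueError("URL is required for websocket transport")
        return EchoWebSocketConnection(url)
    raise ValueError(f"Unsupported transport type: {transport}")


async def main() -> str:
    async with create_api_connection("websocket", url="wss://example.invalid") as conn:
        return conn.channel


channel = asyncio.run(main())
print(channel)
```

Adding gRPC or REST support would then mean writing one more subclass and one more branch in the factory, with no change to the lifecycle code in the base class.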

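⚠️ AsyncExitStack must clean up properly: If __aenter__ raises partway through setup, Python never calls __aexit__, so anything already entered on the stack would leak. The try/except BaseException in connections.py unwinds the stack before re-raising, which is the recommended way to handle this.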
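To see why the guard in __aenter__ matters, the sketch below simulates a setup failure with and without it. Everything here is a stand-in written for this illustration (`Transport`, `Connection`, and the simulated "handshake failed" error are hypothetical); the point is only that `async with` does not call `__aexit__` when `__aenter__` itself raises.

```python
import asyncio
from contextlib import AsyncExitStack

closed: list[str] = []  # names of transports that were actually released


class Transport:
    """Hypothetical resource that records when it is released."""

    def __init__(self, name: str):
        self.name = name

    async def __aenter__(self):
        return self.name

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        closed.append(self.name)


class Connection:
    def __init__(self, name: str, guarded: bool):
        self.name = name
        self.guarded = guarded
        self._stack = None

    async def __aenter__(self):
        self._stack = AsyncExitStack()
        await self._stack.__aenter__()
        try:
            await self._stack.enter_async_context(Transport(self.name))
            raise RuntimeError("handshake failed")  # simulated setup error
        except BaseException:
            if self.guarded:
                # Unwind everything entered so far, then re-raise.
                await self._stack.__aexit__(None, None, None)
            raise

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        await self._stack.__aexit__(exc_type, exc_val, exc_tb)


async def try_connect(conn: Connection) -> None:
    try:
        async with conn:
            pass
    except RuntimeError:
        pass  # __aexit__ was not called because __aenter__ never completed


async def main() -> None:
    await try_connect(Connection("unguarded", guarded=False))
    await try_connect(Connection("guarded", guarded=True))


asyncio.run(main())
print(closed)
```

Only the guarded connection releases its transport; the unguarded one leaves it open even though the caller used `async with` correctly.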

⚠️ SDK Compatibility: MCP Python SDK’s ClientSession API may change — pin versions in requirements.txt

⚠️ Evaluation Prompt Design: If XML tag descriptions in EVALUATION_PROMPT are not precise enough, Claude’s output may be unparseable
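The failure mode is easy to reproduce with the regex helper from evaluation.py. In the sketch below, `extract_xml_content` matches the excerpt shown earlier, while `score_response` and its exact-match comparison are assumptions made for this example; the actual scoring step is not part of the excerpt.

```python
import re


def extract_xml_content(text: str, tag: str):
    """Return the content of the last <tag>...</tag> pair, or None."""
    matches = re.findall(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
    return matches[-1].strip() if matches else None


def score_response(raw_output: str, expected: str) -> dict:
    """Hypothetical scoring step: a missing tag counts as a failed parse."""
    answer = extract_xml_content(raw_output, "response")
    if answer is None:
        return {"score": 0, "parse_error": True}
    return {"score": int(answer == expected), "parse_error": False}


well_formed = "<summary>Looked up the count.</summary>\n<response> 7 </response>"
malformed = "The answer is 7."  # the model ignored the tag instructions

good = score_response(well_formed, "7")
bad = score_response(malformed, "7")
print(good, bad)
```

Tracking parse errors separately from wrong answers helps distinguish a weak MCP server from a prompt that fails to elicit the expected tags.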

⚠️ Don’t run eval on production MCP servers: evaluation.py calls tools and may cause side effects — ensure test environments are isolated
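One extra safeguard, beyond isolating the environment, is to expose only an explicit allowlist of tools to Claude during evaluation. The sketch below assumes tool dicts in the shape returned by MCPConnection.list_tools(); `filter_tools`, the allowlist, and the tool names are hypothetical and not part of the scripts.

```python
def filter_tools(tools: list[dict], allowed_names: set[str]) -> list[dict]:
    """Keep only the tools whose names are explicitly allowed."""
    return [tool for tool in tools if tool["name"] in allowed_names]


# Tool dicts in the shape returned by MCPConnection.list_tools();
# the tool names themselves are invented for this example.
discovered = [
    {"name": "search_issues", "description": "Search issues", "input_schema": {}},
    {"name": "delete_issue", "description": "Delete an issue", "input_schema": {}},
]
READ_ONLY_ALLOWLIST = {"search_issues"}

safe_tools = filter_tools(discovered, READ_ONLY_ALLOWLIST)
print([tool["name"] for tool in safe_tools])
```

Passing the filtered list as the tools argument means Claude is never offered the destructive tools, though this does not replace running against an isolated test server.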

| Pattern | Description | Applies to... |
|---------|-------------|---------------|
| ABC + Factory | Abstract base class defines the interface; a factory function creates instances | Any connection library needing multiple transport implementations |
| Claude-as-Judge | Use Claude to evaluate output, with XML tags controlling the format | AI output quality evaluation scenarios |
| Plug-and-Play Transport | Swap the transport implementation (stdio/SSE/HTTP) without changing business code | Connection tools needing multi-environment deployment |
| Evaluation-Driven Development | Define test cases in XML first, then implement the server | API development such as MCP servers |