MCP Builder: Building High-Quality MCP Servers

One-Line Summary

mcp-builder guides developers in creating high-quality MCP (Model Context Protocol) servers, enabling Claude to discover and use external tools and services.

Core Capabilities

📡 MCP server development best practices (Python + Node.js)
🔌 Connection management and tool registration patterns
✅ Built-in evaluation framework for MCP server quality validation
📚 Reference docs covering protocol spec and implementation patterns

File Inventory

mcp-builder
- SKILL.md main entry point
- LICENSE.txt Apache 2.0
- scripts
  - connections.py connection management
  - evaluation.py evaluation framework
  - requirements.txt Python dependencies
- reference

Directory Structure Analysis

mcp-builder is a script-driven Skill, organized around 2 core Python scripts + 4 reference documents.

mcp-builder
- SKILL.md main entry point · ~200 lines
- LICENSE.txt Apache 2.0
- scripts Python scripts and support files · 4 files
  - connections.py ~150 lines · ⭐⭐⭐ · MCP connection management
  - evaluation.py ~373 lines · ⭐⭐⭐⭐ · evaluation framework
  - requirements.txt Python dependencies
  - example_evaluation.xml sample evaluation cases
- reference reference docs · 4 files
  - mcp_best_practices.md MCP design best practices
  - python_mcp_server.md Python FastMCP implementation guide
  - node_mcp_server.md TypeScript MCP SDK implementation guide
  - evaluation.md evaluation authoring guide

SKILL.md Structure Analysis

The SKILL.md file is about 200 lines long and follows a clear four-phase workflow:

Phase 1, Deep Research and Planning (lines 20-75): API Coverage vs Workflow Tools design, protocol research, framework study, implementation planning
Phase 2, Implementation (lines 78-124): project setup, core infrastructure, tool implementation (input/output schemas, descriptions, annotations)
Phase 3, Review and Test (lines 127-148): code quality checks, build verification, MCP Inspector testing
Phase 4, Create Evaluations (lines 151-192): create 10 evaluation questions, XML output format

The recommended language is TypeScript (rationale: good SDK support, higher-quality AI-generated TypeScript code, static type checking), and the recommended transport is Streamable HTTP.
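The XML output format from Phase 4 can be sketched as follows. This is a minimal sketch, assuming the evaluation/qa_pair/question + answer layout that evaluation.py parses; the sample questions and the helper names build_evaluation_xml and load_qa_pairs are hypothetical and not part of the skill.

```python
import xml.etree.ElementTree as ET


def build_evaluation_xml(qa_pairs: list[tuple[str, str]]) -> str:
    """Serialize (question, answer) pairs into an <evaluation> document."""
    root = ET.Element("evaluation")
    for question, answer in qa_pairs:
        pair = ET.SubElement(root, "qa_pair")
        ET.SubElement(pair, "question").text = question
        ET.SubElement(pair, "answer").text = answer
    return ET.tostring(root, encoding="unicode")


def load_qa_pairs(xml_text: str) -> list[dict[str, str]]:
    """Parse the document back, skipping pairs that lack a question or answer."""
    root = ET.fromstring(xml_text)
    pairs = []
    for qa_pair in root.findall(".//qa_pair"):
        question = qa_pair.find("question")
        answer = qa_pair.find("answer")
        if question is not None and answer is not None:
            pairs.append({
                "question": (question.text or "").strip(),
                "answer": (answer.text or "").strip(),
            })
    return pairs


# Hypothetical sample questions; a real evaluation file would hold 10.
SAMPLE = [
    ("How many open issues mention 'timeout'?", "7"),
    ("Which user created the oldest project?", "alice"),
]

xml_text = build_evaluation_xml(SAMPLE)
qa_pairs = load_qa_pairs(xml_text)
print(qa_pairs[0]["question"])
```

Keeping each answer short and unambiguous, as in the sample, makes it easier to compare against the model's final response.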

Module Relationships

connections.py provides the low-level MCP transport layer (stdio/SSE/HTTP), and evaluation.py builds the evaluation framework on top of it:

mcp-builder module relationship diagram

graph TD
  SKILL[SKILL.md] -->|guides| Claude
  Claude -->|invokes| eval[evaluation.py]
  eval -->|import| conn[connections.py]
  conn -->|MCP Transport| Stdio[stdio]
  conn -->|MCP Transport| SSE[sse]
  conn -->|MCP Transport| HTTP[streamable HTTP]
  eval -->|tool discovery| MCP_Server[MCP Server]
  eval -->|runs evaluation| Claude_API[Anthropic Claude]
  eval -->|outputs| Report[Markdown Report]

  subgraph scripts [scripts/]
      conn
      eval
  end

  subgraph reference [reference/]
      BP[mcp_best_practices.md]
      PY[python_mcp_server.md]
      TS[node_mcp_server.md]
      EG[evaluation.md]
  end

  SKILL --> reference

  style SKILL fill:#4fc3f7,stroke:#0288d1,color:#000
  style conn fill:#81c784,stroke:#388e3c,color:#000
  style eval fill:#81c784,stroke:#388e3c,color:#000
  style Claude_API fill:#ffb74d,stroke:#f57c00,color:#000
  style Report fill:#ce93d8,stroke:#7b1fa2,color:#000

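Full Script Inventory

| Script | Language | Lines | Complexity | Purpose |
|------|------|------|--------|------|
| connections.py | Python | ~150 | ⭐⭐⭐ | MCP connection management: supports three transports (stdio, SSE, HTTP) |
| evaluation.py | Python | ~373 | ⭐⭐⭐⭐ | Evaluation framework: runs test questions through Claude and generates a report |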

connections.py: MCP Connection Management

connections.py provides a set of pluggable MCP connection classes. The core design is an abstract base class, MCPConnection, with three concrete implementations (stdio, SSE, HTTP) and a factory function, create_connection(), that creates them uniformly.

connections.py listing (MCP connection manager)

  1  """Lightweight connection handling for MCP servers."""
  2
  3  from abc import ABC, abstractmethod
  4  from contextlib import AsyncExitStack
  5  from typing import Any
  6
  7  from mcp import ClientSession, StdioServerParameters
  8  from mcp.client.sse import sse_client
  9  from mcp.client.stdio import stdio_client
 10  from mcp.client.streamable_http import streamablehttp_client
 11
 12
 13  class MCPConnection(ABC):
 14      """Base class for MCP server connections."""
 15
 16      def __init__(self):
 17          self.session = None
 18          self._stack = None
 19
 20      @abstractmethod
 21      def _create_context(self):
 22          """Create the connection context based on connection type."""
 23
 24      async def __aenter__(self):
 25          """Initialize MCP server connection."""
 26          self._stack = AsyncExitStack()
 27          await self._stack.__aenter__()
 28
 29          try:
 30              ctx = self._create_context()
 31              result = await self._stack.enter_async_context(ctx)
 32
 33              if len(result) == 2:
 34                  read, write = result
 35              elif len(result) == 3:
 36                  read, write, _ = result
 37              else:
 38                  raise ValueError(f"Unexpected context result: {result}")
 39
 40              session_ctx = ClientSession(read, write)
 41              self.session = await self._stack.enter_async_context(session_ctx)
 42              await self.session.initialize()
 43              return self
 44          except BaseException:
 45              await self._stack.__aexit__(None, None, None)
 46              raise
 47
 48      async def __aexit__(self, exc_type, exc_val, exc_tb):
 49          """Clean up MCP server connection resources."""
 50          if self._stack:
 51              await self._stack.__aexit__(exc_type, exc_val, exc_tb)
 52          self.session = None
 53          self._stack = None
 54
 55      async def list_tools(self) -> list[dict[str, Any]]:
 56          """Retrieve available tools from the MCP server."""
 57          response = await self.session.list_tools()
 58          return [
 59              {
 60                  "name": tool.name,
 61                  "description": tool.description,
 62                  "input_schema": tool.inputSchema,
 63              }
 64              for tool in response.tools
 65          ]
 66
 67      async def call_tool(self, tool_name: str, arguments: dict[str, Any]) -> Any:
 68          """Call a tool on the MCP server with provided arguments."""
 69          result = await self.session.call_tool(tool_name, arguments=arguments)
 70          return result.content
 71
 72
 73  class MCPConnectionStdio(MCPConnection):
 74      """MCP connection using standard input/output."""
 75
 76      def __init__(self, command: str, args: list[str] = None, env: dict[str, str] = None):
 77          super().__init__()
 78          self.command = command
 79          self.args = args or []
 80          self.env = env
 81
 82      def _create_context(self):
 83          return stdio_client(
 84              StdioServerParameters(command=self.command, args=self.args, env=self.env)
 85          )
 86
 87
 88  class MCPConnectionSSE(MCPConnection):
 89      """MCP connection using Server-Sent Events."""
 90
 91      def __init__(self, url: str, headers: dict[str, str] = None):
 92          super().__init__()
 93          self.url = url
 94          self.headers = headers or {}
 95
 96      def _create_context(self):
 97          return sse_client(url=self.url, headers=self.headers)
 98
 99
100  class MCPConnectionHTTP(MCPConnection):
101      """MCP connection using Streamable HTTP."""
102
103      def __init__(self, url: str, headers: dict[str, str] = None):
104          super().__init__()
105          self.url = url
106          self.headers = headers or {}
107
108      def _create_context(self):
109          return streamablehttp_client(url=self.url, headers=self.headers)
110
111
112  def create_connection(
113      transport: str,
114      command: str = None,
115      args: list[str] = None,
116      env: dict[str, str] = None,
117      url: str = None,
118      headers: dict[str, str] = None,
119  ) -> MCPConnection:
120      """Factory function to create the appropriate MCP connection."""
121      transport = transport.lower()
122
123      if transport == "stdio":
124          if not command:
125              raise ValueError("Command is required for stdio transport")
126          return MCPConnectionStdio(command=command, args=args, env=env)
127
128      elif transport == "sse":
129          if not url:
130              raise ValueError("URL is required for sse transport")
131          return MCPConnectionSSE(url=url, headers=headers)
132
133      elif transport in ["http", "streamable_http", "streamable-http"]:
134          if not url:
135              raise ValueError("URL is required for http transport")
136          return MCPConnectionHTTP(url=url, headers=headers)
137
138      else:
139          raise ValueError(f"Unsupported transport type: {transport}. Use 'stdio', 'sse', or 'http'")

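Code Walkthrough

L3: Imports ABC and abstractmethod from the standard library, the usual way to declare an abstract base class in Python.
L4: AsyncExitStack is the stack-based counterpart of async with. It lets code enter a variable number of async context managers and unwinds them in reverse order, which is the key to managing the MCP connection lifecycle (transport context first, then session context).
L7: ClientSession is the core of the MCP Python SDK. It wraps the transport streams and implements the MCP message exchange.
L13: MCPConnection is declared as an abstract base class. This is the Template Method pattern: __aenter__ fixes the lifecycle steps, and _create_context() is the hook that each subclass fills in.
L20: The @abstractmethod decorator forces subclasses to implement _create_context(). This follows the dependency inversion principle: the high-level policy (lifecycle management) does not depend on any concrete transport.
L24: __aenter__ is part of the async context manager protocol. It creates the stack, then enters the transport context and the session context in turn.
L33: Accepts two shapes of context result. In the MCP Python SDK, stdio_client and sse_client yield (read, write), while streamablehttp_client yields a third element (a session ID callback), which is discarded here. The len() check lets the base class handle all three transports.
L40: Creates the ClientSession, pushes it onto the stack, then calls initialize() to complete the MCP protocol handshake.
L55: list_tools() wraps MCP tool discovery and extracts name, description, and inputSchema from the SDK's Tool objects.
L67: call_tool() wraps tool invocation. Together with list_tools() it forms the core interface of an MCP connection, and every subclass inherits both.
L73: MCPConnectionStdio is the stdio transport for local processes. It takes command, args, and env and packages them in StdioServerParameters.
L88: MCPConnectionSSE is the Server-Sent Events transport for remote servers. It supports custom HTTP headers.
L100: MCPConnectionHTTP is the Streamable HTTP transport for remote servers, the transport from the newer MCP specification and the one recommended for remote use.
L112: create_connection() is the factory function. The transport argument decides which concrete class is instantiated. This is the Simple Factory pattern, which keeps creation logic in one place.
L121: The factory gives every transport a single parameter list and validates before instantiating. A missing command or URL, or an unknown transport name, raises ValueError at the factory layer.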
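The lifecycle described in the walkthrough can be exercised without an MCP server. The following is a minimal sketch, assuming a hypothetical fake_transport context manager in place of stdio_client or sse_client and a hypothetical _handshake() hook in place of session.initialize(); it only illustrates how the AsyncExitStack unwinds when setup fails.

```python
import asyncio
from abc import ABC, abstractmethod
from contextlib import AsyncExitStack, asynccontextmanager

events: list[str] = []  # records open/close order for the demo


@asynccontextmanager
async def fake_transport(name: str):
    """Hypothetical stand-in for stdio_client / sse_client."""
    events.append(f"open {name}")
    try:
        yield (f"{name}-read", f"{name}-write")
    finally:
        events.append(f"close {name}")


class Connection(ABC):
    """Template method: __aenter__ owns the lifecycle, subclasses supply the context."""

    def __init__(self):
        self.streams = None
        self._stack = None

    @abstractmethod
    def _create_context(self):
        """Return an async context manager that yields (read, write)."""

    def _handshake(self):
        """Placeholder for session.initialize(); overridden to simulate failure."""

    async def __aenter__(self):
        self._stack = AsyncExitStack()
        await self._stack.__aenter__()
        try:
            self.streams = await self._stack.enter_async_context(self._create_context())
            self._handshake()
            return self
        except BaseException:
            # Unwind whatever was already entered, then re-raise.
            await self._stack.__aexit__(None, None, None)
            raise

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        if self._stack:
            await self._stack.__aexit__(exc_type, exc_val, exc_tb)
        self.streams = None
        self._stack = None


class GoodConnection(Connection):
    def _create_context(self):
        return fake_transport("good")


class FailingConnection(Connection):
    def _create_context(self):
        return fake_transport("bad")

    def _handshake(self):
        raise RuntimeError("handshake failed")


async def main():
    async with GoodConnection() as conn:
        events.append(f"using {conn.streams[0]}")
    try:
        async with FailingConnection():
            pass
    except RuntimeError as exc:
        events.append(f"error: {exc}")


asyncio.run(main())
print(events)
```

The failing connection still records a close event before the error reaches the caller, which is the behavior the try/except BaseException block in __aenter__ is there to guarantee.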

evaluation.py: Evaluation Framework

evaluation.py is the evaluation engine of mcp-builder. It uses connections.py to connect to MCP servers, runs test questions through Claude, and generates Markdown evaluation reports.

evaluation.py listing (evaluation framework, key logic)

  1  """MCP Server Evaluation Harness"""
  2  import argparse, asyncio, json, re, sys, time, traceback
  3  import xml.etree.ElementTree as ET
  4  from pathlib import Path
  5  from typing import Any
  6
  7  from anthropic import Anthropic
  8  from connections import create_connection
  9
 10
 11  EVALUATION_PROMPT = """..."""  # System prompt instructing Claude to use XML tags
 12
 13
 14  def parse_evaluation_file(file_path: Path) -> list[dict[str, Any]]:
 15      """Parse XML evaluation file with qa_pair elements."""
 16      try:
 17          tree = ET.parse(file_path)
 18          root = tree.getroot()
 19          evaluations = []
 20          for qa_pair in root.findall(".//qa_pair"):
 21              question_elem = qa_pair.find("question")
 22              answer_elem = qa_pair.find("answer")
 23              if question_elem is not None and answer_elem is not None:
 24                  evaluations.append({
 25                      "question": (question_elem.text or "").strip(),
 26                      "answer": (answer_elem.text or "").strip(),
 27                  })
 28          return evaluations
 29      except Exception as e:
 30          print(f"Error parsing evaluation file {file_path}: {e}")
 31          return []
 32
 33
 34  def extract_xml_content(text: str, tag: str) -> str | None:
 35      """Extract content from XML tags."""
 36      pattern = rf"<{tag}>(.*?)</{tag}>"
 37      matches = re.findall(pattern, text, re.DOTALL)
 38      return matches[-1].strip() if matches else None
 39
 40
 41  async def agent_loop(client, model, question, tools, connection):
 42      """Run the agent loop with MCP tools."""
 43      messages = [{"role": "user", "content": question}]
 44
 45      response = await asyncio.to_thread(
 46          client.messages.create,
 47          model=model, max_tokens=4096,
 48          system=EVALUATION_PROMPT,
 49          messages=messages, tools=tools,
 50      )
 51      messages.append({"role": "assistant", "content": response.content})
 52      tool_metrics = {}
 53
 54      while response.stop_reason == "tool_use":
 55          tool_use = next(block for block in response.content
 56                          if block.type == "tool_use")
 57          tool_name = tool_use.name
 58          tool_input = tool_use.input
 59
 60          tool_start_ts = time.time()
 61          try:
 62              tool_result = await connection.call_tool(tool_name, tool_input)
 63              tool_response = json.dumps(tool_result) if isinstance(tool_result, (dict, list)) else str(tool_result)
 64          except Exception as e:
 65              tool_response = (f"Error: {str(e)}\n"
 66                               f"{traceback.format_exc()}")
 67          tool_duration = time.time() - tool_start_ts
 68
 69          if tool_name not in tool_metrics:
 70              tool_metrics[tool_name] = {"count": 0, "durations": []}
 71          tool_metrics[tool_name]["count"] += 1
 72          tool_metrics[tool_name]["durations"].append(tool_duration)
 73
 74          messages.append({
 75              "role": "user",
 76              "content": [{
 77                  "type": "tool_result",
 78                  "tool_use_id": tool_use.id,
 79                  "content": tool_response,
 80              }]
 81          })
 82
 83          response = await asyncio.to_thread(
 84              client.messages.create,
 85              model=model, max_tokens=4096,
 86              system=EVALUATION_PROMPT,
 87              messages=messages, tools=tools,
 88          )
 89          messages.append({"role": "assistant", "content": response.content})
 90
 91      response_text = next(
 92          (block.text for block in response.content
 93           if hasattr(block, "text")), None)
 94      return response_text, tool_metrics
 95
 96
 97  async def run_evaluation(eval_path, connection, model):
 98      """Run evaluation with MCP server tools."""
 99      client = Anthropic()
100      tools = await connection.list_tools()
101      qa_pairs = parse_evaluation_file(eval_path)
102      results = []
103
104      for i, qa_pair in enumerate(qa_pairs):
105          result = await evaluate_single_task(
106              client, model, qa_pair, tools, connection, i)
107          results.append(result)
108
109      # Generate report
110      correct = sum(r["score"] for r in results)
111      accuracy = (correct / len(results)) * 100 if results else 0
112      total_duration = sum(r["total_duration"] for r in results)
113      # ... build markdown report string ...
114      return report

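Code Walkthrough

L8: The import from connections is the key dependency. evaluation.py reuses the transport layer in connections.py through create_connection, which reflects a layered design built on code reuse.
L11: EVALUATION_PROMPT defines how Claude behaves during evaluation, including the required XML tag format (<summary>, <feedback>, <response>). It is the core of the Claude-as-judge pattern.
L14: parse_evaluation_file() parses the XML evaluation file with xml.etree.ElementTree. The file format is <evaluation>/<qa_pair>/<question> + <answer>.
L34: extract_xml_content() uses a regular expression to pull tag content out of Claude's response, returning the last match. It is simple but effective because EVALUATION_PROMPT requires Claude to use a fixed tag format.
L41: agent_loop() is the heart of the evaluation. It runs the agentic loop of Claude plus MCP tools and supports multiple rounds of tool calls. This is a standard ReAct-style loop.
L45: asyncio.to_thread() runs the synchronous Anthropic SDK call in a worker thread so the event loop is not blocked. This is the standard technique for using a synchronous SDK from asynchronous code.
L54: while response.stop_reason == "tool_use": is the loop condition. As long as Claude asks for a tool call the loop continues, and it ends when Claude gives its final answer.
L62: connection.call_tool() comes from connections.py and executes the tool over the MCP protocol. tool_metrics records the duration of each call for performance analysis.
L97: run_evaluation() is the top-level coordinator. Given an open connection, it discovers tools, loads the evaluation cases, runs each one, and builds the report.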
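The loop structure described above can be tried offline with stubs. The following is a minimal sketch, assuming a hypothetical FakeClient in place of the Anthropic client and a hypothetical call_tool coroutine in place of connection.call_tool(); the Block and Response classes only mimic the attributes the loop reads and are not the SDK's real types.

```python
import asyncio
import time
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical stand-in for an SDK content block."""
    type: str
    text: str = ""
    name: str = ""
    input: dict = field(default_factory=dict)
    id: str = ""


@dataclass
class Response:
    stop_reason: str
    content: list


class FakeClient:
    """Synchronous stub: asks for one tool call, then answers."""

    def create(self, messages):
        time.sleep(0.01)  # simulate blocking network I/O
        if len(messages) == 1:
            call = Block("tool_use", name="add", input={"a": 2, "b": 3}, id="t1")
            return Response("tool_use", [call])
        tool_output = messages[-1]["content"][0]["content"]
        return Response("end_turn", [Block("text", text=f"<response>{tool_output}</response>")])


async def call_tool(name: str, arguments: dict) -> str:
    """Hypothetical tool backend standing in for connection.call_tool()."""
    return str(arguments["a"] + arguments["b"])


async def agent_loop(client: FakeClient, question: str):
    messages = [{"role": "user", "content": question}]
    # to_thread keeps the event loop free while the sync client blocks.
    response = await asyncio.to_thread(client.create, messages)
    messages.append({"role": "assistant", "content": response.content})
    tool_metrics: dict[str, dict] = {}

    while response.stop_reason == "tool_use":
        tool_use = next(b for b in response.content if b.type == "tool_use")
        started = time.time()
        result = await call_tool(tool_use.name, tool_use.input)
        metrics = tool_metrics.setdefault(tool_use.name, {"count": 0, "durations": []})
        metrics["count"] += 1
        metrics["durations"].append(time.time() - started)
        messages.append({
            "role": "user",
            "content": [{"type": "tool_result", "tool_use_id": tool_use.id, "content": result}],
        })
        response = await asyncio.to_thread(client.create, messages)
        messages.append({"role": "assistant", "content": response.content})

    text = next((b.text for b in response.content if b.type == "text"), None)
    return text, tool_metrics


final_text, metrics = asyncio.run(agent_loop(FakeClient(), "What is 2 + 3?"))
print(final_text, metrics["add"]["count"])
```

Stubbing the client this way makes it possible to check the loop's message bookkeeping and metrics before spending API calls on a real evaluation run.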

Script Relationship Diagram

mcp-builder script dependency diagram

graph LR
  A[evaluation.py] -->|import create_connection| B[connections.py]
  B -->|MCP| C[MCP Server via stdio/sse/http]
  A -->|Anthropic SDK| D[Claude API]
  A -->|parse| E[eval XML File]
  A -->|generates| F[Markdown Report]
  B -->|ClientSession| G[list_tools / call_tool]

  style A fill:#81c784,stroke:#388e3c,color:#000
  style B fill:#81c784,stroke:#388e3c,color:#000
  style D fill:#ffb74d,stroke:#f57c00,color:#000
  style F fill:#ce93d8,stroke:#7b1fa2,color:#000

Design Highlights

ABC Pluggable Transport Layer: MCPConnection abstract base class defines a unified interface, with three concrete subclasses instantiated via factory function — a classic Strategy pattern
Claude-as-Judge Evaluation: evaluation.py uses Claude itself to evaluate MCP server effectiveness, controlling output format through structured XML tags (summary/feedback/response)
Script Layering: connections.py is pure “infrastructure”, evaluation.py is “business logic” — clear separation of concerns
AsyncExitStack Lifecycle: Ensures MCP connections are properly cleaned up even on exceptions

Reusable Patterns

Porting Guide

“If you want to build connectors for other APIs…”

Keep ABC Pattern: Extract APIConnection abstract base class + factory function — the most universal pattern
Replace Transport Layer: Swap stdio/SSE/HTTP for your protocol (gRPC, WebSocket, REST)
Keep Evaluation Framework: The Claude-as-Judge pattern in evaluation.py is generic for any tool evaluation
Replace Evaluation XML: Swap MCP test cases for your API test cases
Remove language-specific references: Replace reference/ language guides with your tech stack docs

Common Pitfalls

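⚠️ AsyncExitStack must clean up properly: If setup fails partway through __aenter__, any context already entered has to be unwound before the exception propagates. connections.py does this with try/except BaseException, closing the stack and then re-raising.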

⚠️ SDK Compatibility: MCP Python SDK’s ClientSession API may change — pin versions in requirements.txt

⚠️ Evaluation Prompt Design: If XML tag descriptions in EVALUATION_PROMPT are not precise enough, Claude’s output may be unparseable

⚠️ Don’t run eval on production MCP servers: evaluation.py calls tools and may cause side effects — ensure test environments are isolated

Pattern	Description	Applies to...
ABC + Factory	Abstract base class defines interface + factory creates instances	Any connection library needing multiple transport implementations
Claude-as-Judge	Use Claude to evaluate output, with XML tag format control	AI output quality evaluation scenarios
Plug-and-Play Transport	Swap transport implementation without changing business code	Connection tools needing multi-environment deployment
Evaluation-Driven Development	Define test cases in XML first, then implement server	API development like MCP servers