One-Line Summary
BibiGPT gives AI Agents the ability to “watch videos and listen to audio”: paste a URL and get structured AI summaries, transcript extraction, article rewrites, or even TikTok-style music videos.
Core Capabilities
- 🎬 Smart Summarization: One-click AI summary for any video/audio URL
- 📋 Chapter Breakdown: Timestamped chapter-level deep analysis with Q&A
- 📝 Transcript Extraction: Full subtitles/transcripts with speaker identification
- ✍️ Article Rewrite: Video → blog post / newsletter / Twitter thread / social copy
- 🎵 Short-form MV: Turn long-form video content into TikTok-style vertical music videos
- 📚 Batch Processing: Multi-URL queue processing + cross-source research synthesis
- 💾 Notes Export: One-click save to Notion/Obsidian/local Markdown
- 🔍 Visual Analysis: Analyze on-screen content, slides, and text (Pro)
- 📡 Channel Subscriptions: Subscribe to YouTube/Bilibili channels for latest content
- 💰 Smart Payment: Alipay AI收 protocol (HTTP 402) for autonomous per-call agent payment
Trigger Scenarios
Triggered when the user says “summarize this video”, “what’s this video about”, “extract subtitles”, “turn this podcast into an article”, “make a TikTok MV from this”, “batch summarize these 5 videos”, “export to Notion”, or makes any other request involving video/audio content understanding.
File Inventory
- bibigpt
  - README.md
  - README_zh.md
  - CLAUDE.md
  - COZE.md
  - .gitignore
  - skills/bibi
    - SKILL.md
    - scripts
    - references
    - workflows
Directory Structure Analysis
BibiGPT’s skill structure follows the classic Intent Router + Workflow Dispatch pattern. The core entry point, SKILL.md, contains no execution logic. It is purely a “routing table” that maps the user’s natural language intent to one of 16 independent workflow files. A key advantage of this design: adding a new feature only requires a new workflow file and a routing table entry, without touching existing workflows.
The references/ directory holds 6 reference documents covering CLI commands, API endpoints, installation & auth, platform support, and the unique Alipay AI收 billing protocol. The scripts/ directory has a single environment detection script that determines whether the bibi CLI is installed or an API token is configured — this is the “mode switch” for the entire skill.
SKILL.md Structure Analysis
SKILL.md uses a clean four-layer structure:
- Environment Detection Layer: Runs `bibi-check.sh` first to determine the available mode (CLI or API), which is the prerequisite for all subsequent workflows
- Intent Routing Table: A 17-row table that maps natural language intents to the corresponding workflow files. The routing design follows the “ask before guessing” principle: if the intent is ambiguous, the Agent must ask the user
- Disambiguation Rules: Defines the handling for “multiple matches” → ask, “zero matches” → ask, and “just a URL with no context” → default to quick summary
- Reference Index: A table listing all reference docs with content summaries, so the Agent can jump to them on demand
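The routing and disambiguation rules above can be sketched in a few lines of Python. This is only an illustration: the keyword lists and the `route` function are hypothetical, and in the actual skill the matching is done by the Agent reading the table in SKILL.md, not by keyword search.

```python
import re
from typing import Optional

# Hypothetical keyword -> workflow mapping; the real 17-row table lives in SKILL.md.
ROUTES = {
    "quick-summary": ("summarize", "summary"),
    "transcript-extract": ("subtitle", "transcript"),
    "article-rewrite": ("article", "rewrite"),
    "export-notes": ("export", "notion"),
}

# A message that is nothing but a single http(s) link.
URL_ONLY = re.compile(r"^\s*https?://\S+\s*$")


def route(message: str) -> Optional[str]:
    """Return a workflow name, or None when the Agent should ask the user."""
    if URL_ONLY.match(message):
        return "quick-summary"  # "just a URL with no context" -> default
    text = message.lower()
    matches = [
        name
        for name, keywords in ROUTES.items()
        if any(keyword in text for keyword in keywords)
    ]
    if len(matches) == 1:
        return matches[0]
    return None  # zero or multiple matches -> ask before guessing
```

Returning `None` for both the zero-match and multi-match cases keeps the “ask before guessing” rule in one place: the caller only has to check for `None` before prompting the user.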
Module Relationships
BibiGPT’s module dependencies form a clear star topology: SKILL.md (the intent router) sits at the center, radiating outward to 16 workflows and 5 reference documents. Every workflow’s first step is to run the environment detection script, ensuring that subsequent operations run in the correct mode (CLI or API).
BibiGPT module relationship diagram

    graph TD
      SKILL["SKILL.md<br/>Intent router"]
      CHECK["scripts/bibi-check.sh<br/>Environment detection"]
      CLI["references/cli.md<br/>CLI command reference"]
      API["references/api.md<br/>OpenAPI endpoint reference"]
      INSTALL["references/installation.md<br/>Installation and auth guide"]
      PLATFORM["references/supported-platforms.md<br/>Platform support"]
      BILLING["references/billing-aipay.md<br/>Alipay AI收 protocol"]
      QS["quick-summary<br/>Quick summary"]
      DD["deep-dive<br/>Deep analysis"]
      TE["transcript-extract<br/>Transcript extraction"]
      AR["article-rewrite<br/>Article rewrite"]
      MV["video-to-tiktok-mv<br/>TikTok MV"]
      BP["batch-process<br/>Batch processing"]
      RC["research-compile<br/>Cross-source research"]
      EN["export-notes<br/>Notes export"]
      VA["visual-analysis<br/>Visual analysis"]
      AC["account-check<br/>Account lookup"]
      LB["library-browse<br/>Content library"]
      CM["channels-manage<br/>Channel management"]
      FL["feed-latest<br/>Subscription feed"]
      CL["collections-manage<br/>Collection management"]
      NM["notes-manage<br/>Notes management"]
      AT["advanced-tools<br/>Advanced tools"]
      SKILL --> CHECK
      SKILL --> QS
      SKILL --> DD
      SKILL --> TE
      SKILL --> AR
      SKILL --> MV
      SKILL --> BP
      SKILL --> RC
      SKILL --> EN
      SKILL --> VA
      SKILL --> AC
      SKILL --> LB
      SKILL --> CM
      SKILL --> FL
      SKILL --> CL
      SKILL --> NM
      SKILL --> AT
      CHECK --> CLI
      CHECK --> API
      QS --> CLI
      QS --> API
      QS --> PLATFORM
      DD --> CLI
      DD --> API
      TE --> CLI
      TE --> API
      AR --> TE
      MV --> DD
      MV --> TE
      BP --> QS
      RC --> QS
      EN --> QS
      EN --> INSTALL
      SKILL --> BILLING
      style SKILL fill:#e1f5fe,stroke:#0288d1
      style CHECK fill:#fff3e0,stroke:#f57c00
      style BILLING fill:#fce4ec,stroke:#c62828
Script Inventory
| Script | Language | Lines | Complexity | Purpose |
|---|---|---|---|---|
| `bibi-check.sh` | Bash | ~30 | ⭐⭐ | Environment detection: auto-detects CLI/API mode availability and gives OS-specific install guidance |
bibi-check.sh: Environment Mode Detector
`bibi-check.sh` is the “mode switch” for the entire skill. It detects whether the bibi CLI is installed and whether the `BIBI_API_TOKEN` environment variable is set, which determines the invocation mode all subsequent workflows will use. If neither is available, it prints OS-specific installation commands.
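The detection order described above can be sketched in Python. This is not the contents of `bibi-check.sh` (which is a Bash script); `detect_mode` and its return values are hypothetical names, and only the `bibi` binary name and the `BIBI_API_TOKEN` variable come from the description.

```python
import os
import shutil
from typing import Mapping, Optional


def detect_mode(
    env: Optional[Mapping[str, str]] = None,
    path: Optional[str] = None,
) -> str:
    """Return "cli", "api", or "none", checking the richer mode first."""
    env = os.environ if env is None else env
    # shutil.which searches the given PATH string, or the process PATH if None.
    if shutil.which("bibi", path=path) is not None:
        return "cli"
    # An empty token counts as "not configured".
    if env.get("BIBI_API_TOKEN"):
        return "api"
    return "none"


if __name__ == "__main__":
    print(f"mode={detect_mode()}")
```

Passing `env` and `path` explicitly is a choice made for this sketch so the priority order can be exercised without touching the real environment.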
Design Highlights
- Intent Routing Table Pattern: SKILL.md is a pure routing layer with no execution logic; a 17-row table maps natural language intents to workflow files. Adding a feature only requires a new workflow file plus a table row, following the open/closed principle.
- Three-Tier Access Architecture: The same capabilities are exposed through the CLI (which can process local files), OpenAPI (direct HTTP, suited to containers and CI), and a Remote MCP Server (zero-install, for any MCP client). All three share the same backend but fit different usage scenarios.
- Environment Auto-Detection: The detection priority in `bibi-check.sh` (CLI → API Token → none) follows descending feature richness. The CLI supports local files while the API does not, so the CLI is checked first.
- Agent-Native Creative Workflows: `article-rewrite.md` and `video-to-tiktok-mv.md` have the Agent itself perform the creative work (article rewriting, lyric writing) rather than calling an external API. This plays to the LLM’s strength in text generation while keeping the backend API’s responsibilities clearly bounded.
- Alipay AI收 (HTTP 402) Protocol Integration: This is the only Anthropic Skill that implements a per-call payment protocol. When the user has no subscription, the Agent can handle the HTTP 402 response on its own and trigger the Alipay payment flow, a forward-looking design for the Agent Economy.
- Progressive Help Discovery: The CLI’s help is layered (`bibi --help` → `bibi summarize --help` → execute), so Agents can discover available commands at runtime instead of loading all the documentation into context.
- Contract Documentation (CLAUDE.md): The CLAUDE.md in the project root clearly separates “what the Agent needs to know” (the contract) from “internal team state” (what the Agent should not see), drawing the boundary with Hard No / Hard Yes rules. This is a reusable documentation governance pattern.
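The pay-per-call flow in the HTTP 402 highlight can be illustrated with a generic pay-then-retry loop. This is a sketch under loud assumptions: the Alipay AI收 message format is not described here, so `send`, `pay`, and the response shape are all hypothetical stand-ins, and only the use of status code 402 comes from the text.

```python
from typing import Callable, Tuple

Response = Tuple[int, dict]  # (HTTP status code, decoded body); assumed shape


def call_with_payment(
    send: Callable[[], Response],
    pay: Callable[[dict], bool],
    max_payments: int = 1,
) -> Response:
    """Call send(); on HTTP 402, settle via pay() and retry.

    At most max_payments payments are attempted, so a misbehaving
    server cannot make the Agent pay in an unbounded loop.
    """
    status, body = send()
    payments = 0
    while status == 402 and payments < max_payments:
        if not pay(body):  # payment declined: surface the 402 to the caller
            break
        payments += 1
        status, body = send()
    return status, body
```

Capping the number of payments is a design choice of this sketch, not something the source specifies; for an autonomous Agent it seems a sensible default.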
Reusable Patterns
| Pattern | Use Case | Key Point |
|---|---|---|
| Intent Routing Table | Feature-rich skills with multiple independent workflows | SKILL.md only routes and dispatches, contains no execution logic |
| Environment Auto-Detection | Skills that need to work differently across environments | Script detects env, degrades by priority (rich → basic functionality) |
| Agent-Native Generation | Skills requiring creative output (articles/lyrics/design) | Agent uses LLM for creative work, API only provides raw data |
| Three-Tier Access Architecture | Skills that need to work across multiple platforms | CLI + HTTP API + MCP Server, shared backend, different entry points |
Porting Guide
To build a similar “AI Service + Agent Skill” project, the core approach is:
- Build the backend service first (REST API / MCP Server) so it can be called independently
- Then build the Skill wrapper: SKILL.md defines intent routing, workflows/ contains step-by-step operations, references/ provides API/CLI references
- Finally, environment adaptation: ensure API fallback when CLI is unavailable, and clear installation guidance when nothing is configured
BibiGPT’s three-tier architecture (CLI → API → MCP) can also be simplified: if your service is only exposed via API, skip the CLI layer and build a two-tier “API + MCP” skill.
Common Pitfalls
- ⚠️ URL Encoding Trap: OpenAPI mode requires the target URL to be percent-encoded before it is passed as a query parameter; otherwise special characters (`?`, `&`) are mishandled by the shell or HTTP parser. Every API call in the workflows must first encode the URL with `python3 -c 'import urllib.parse...'`.
- ⚠️ Local File vs API Asymmetry: CLI mode accepts local file paths directly (`bibi summarize "/path/to/file.mp4"`), but API mode does not support file uploads. If a user supplies a local file in API mode, guide them to upload it to a public URL first.
- ⚠️ CLI stdout/stderr Separation: `bibi summarize` writes the Markdown summary to stdout and progress messages to stderr. Agent pipe operations (`> summary.md`, `| jq`) capture only stdout, so progress output never mixes in. This is an intentional pipe-friendly design.
- ⚠️ Manifest Hot-Cache Delay: `bibi --help` reads a locally cached command manifest (24h TTL), so newly released commands may not appear immediately. Use `bibi commands` to force-refresh the cache.
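The URL encoding pitfall is easy to see in a small example. The sketch below uses `urllib.parse.quote` from the standard library; the endpoint `API_BASE` and the `url` parameter name are placeholders invented for illustration, not the real BibiGPT API (see references/api.md for that).

```python
from urllib.parse import quote

# Hypothetical endpoint and parameter name, for illustration only.
API_BASE = "https://api.example.com/summarize"


def build_request_url(video_url: str) -> str:
    """Percent-encode video_url so its own ? and & stay inside one query value."""
    # safe="" also encodes "/" and ":", which quote() would otherwise leave as-is.
    return f"{API_BASE}?url={quote(video_url, safe='')}"


if __name__ == "__main__":
    print(build_request_url("https://example.com/watch?v=abc&t=10"))
```

Without the encoding step, the `&t=10` in the video link would be parsed as a second query parameter of the API request and silently dropped from the video URL.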