BibiGPT: AI Audio and Video Summarization Assistant

One-Line Summary

BibiGPT gives AI Agents the ability to “watch videos and listen to audio” — paste a URL and get structured AI summaries, transcript extraction, article rewrites, or even TikTok-style music videos.

Core Capabilities

🎬 Smart Summarization: One-click AI summary for any video/audio URL
📋 Chapter Breakdown: Timestamped chapter-level deep analysis with Q&A
📝 Transcript Extraction: Full subtitles/transcripts with speaker identification
✍️ Article Rewrite: Video → blog post / newsletter / Twitter thread / social copy
🎵 Short-form MV: Turn long-form video content into TikTok-style vertical music videos
📚 Batch Processing: Multi-URL queue processing + cross-source research synthesis
💾 Notes Export: One-click save to Notion/Obsidian/local Markdown
🔍 Visual Analysis: Analyze on-screen content, slides, and text (Pro)
📡 Channel Subscriptions: Subscribe to YouTube/Bilibili channels for latest content
💰 Smart Payment: Alipay AI收 protocol (HTTP 402) for autonomous per-call agent payment

Trigger Scenarios

Triggered when the user says “summarize this video”, “what’s this video about”, “extract subtitles”, “turn this podcast into an article”, “make a TikTok MV from this”, “batch summarize these 5 videos”, or “export to Notion”, or makes any other request that involves understanding video or audio content.

File Inventory

bibigpt
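- README.md Project documentation (EN)
- README_zh.md Project documentation (Chinese)
- CLAUDE.md External contract document
- COZE.md Coze platform adaptation
- .gitignore
- skills/bibi
  - SKILL.md Intent router (main entry point)
  - scripts
    - bibi-check.sh Environment detection script
  - references
    - api.md OpenAPI endpoint reference (10+)
    - cli.md CLI command reference
    - installation.md Installation and authentication guide
    - supported-platforms.md Supported platforms and limitations
    - billing-aipay.md Alipay AI收 (HTTP 402) protocol
    - endpoints.md Auto-generated endpoint list
  - workflows
    - quick-summary.md Quick summary
    - deep-dive.md In-depth chapter-by-chapter analysis
    - transcript-extract.md Subtitle/transcript extraction
    - article-rewrite.md Article rewrite
    - video-to-tiktok-mv.md Video → TikTok MV
    - batch-process.md Batch processing
    - research-compile.md Cross-source research
    - export-notes.md Notes export
    - visual-analysis.md Visual analysis
    - account-check.md Account lookup
    - library-browse.md Library browsing
    - channels-manage.md Channel management
    - feed-latest.md Subscription feed
    - collections-manage.md Collection management
    - notes-manage.md Notes management
    - advanced-tools.md Advanced tools (mind maps / custom prompts)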

Directory Structure Analysis

BibiGPT’s skill structure follows the classic Intent Router + Workflow Dispatch pattern. The core entry point SKILL.md contains no execution logic — it’s purely a “routing table” that maps natural language user intents to one of 16 independent workflow files. A key advantage: adding a new feature only requires a new workflow file and a routing table entry, without touching existing workflows.

The references/ directory holds 6 reference documents covering CLI commands, API endpoints, installation & auth, platform support, and the unique Alipay AI收 billing protocol. The scripts/ directory has a single environment detection script that determines whether the bibi CLI is installed or an API token is configured — this is the “mode switch” for the entire skill.

SKILL.md Structure Analysis

SKILL.md uses a very clean four-layer structure:

Environment Detection Layer: Runs bibi-check.sh first to determine available mode (CLI or API) — the prerequisite for all subsequent workflows
Intent Routing Table: A 17-row table that precisely maps various natural language intents to corresponding workflow files. The routing design follows the “ask before guessing” principle — if intent is ambiguous, the Agent must ask the user
Disambiguation Rules: Clearly defines the logic for “multiple matches” → ask, “zero matches” → ask, “just a URL with no context” → default to quick summary
Reference Index: A table listing all reference docs with content summaries, making it easy for the Agent to navigate on demand
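
The routing and disambiguation rules above can be sketched as a small Bash function. This is only an illustration: in the real skill the Agent reads the table in SKILL.md rather than executing a script, and the function name `route_intent`, the `ASK_USER` marker, and the four keyword patterns are assumptions standing in for the 17 rows of the actual table.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of "route, or ask when unsure".
# Patterns and names are illustrative; they are not taken from SKILL.md.

route_intent() {
  local msg="$1"
  local -a matches=()

  # Each pattern stands in for one row of the routing table.
  local re_summary='[Ss]ummar'
  local re_transcript='[Ss]ubtitle|[Tt]ranscript'
  local re_article='[Aa]rticle'
  local re_export='Notion|Obsidian'
  # A message that is nothing but a single http(s) URL.
  local re_url_only='^https?://[^[:space:]]+$'

  [[ $msg =~ $re_summary ]]    && matches+=("workflows/quick-summary.md")
  [[ $msg =~ $re_transcript ]] && matches+=("workflows/transcript-extract.md")
  [[ $msg =~ $re_article ]]    && matches+=("workflows/article-rewrite.md")
  [[ $msg =~ $re_export ]]     && matches+=("workflows/export-notes.md")

  if (( ${#matches[@]} > 1 )); then
    echo "ASK_USER"                      # multiple matches: ask
  elif (( ${#matches[@]} == 1 )); then
    echo "${matches[0]}"                 # exactly one match: dispatch
  elif [[ $msg =~ $re_url_only ]]; then
    echo "workflows/quick-summary.md"    # bare URL: default to quick summary
  else
    echo "ASK_USER"                      # zero matches: ask
  fi
}
```

Calling `route_intent "extract subtitles"` prints the transcript workflow path, while a message that hits two rows prints `ASK_USER`, which mirrors the "ask before guessing" rule.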

SKILL.md: YAML Frontmatter

1  ---
2  name: bibi
3  description: >
4    AI video & audio summarizer + repackager. Summarize YouTube, Bilibili,
5    podcasts, TikTok, Twitter/X, Xiaohongshu, and any online video or audio,
6    then optionally turn the takeaway into a TikTok-style vertical music video.
7    Triggers: "summarize this video", "总结这个视频", "video summary",
8    "podcast notes", "YouTube summary", "B站总结", "get transcript",
9    "video to TikTok MV", "把视频变成 TikTok"...
10 ---

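Code Walkthrough

L1: The name field uses a short lowercase identifier, "bibi", following the skill naming convention (and matching the name in plugin.json).
L3: The description is the core of trigger matching. Besides describing what the skill does, it packs in many Chinese and English trigger phrases, which helps the Agent match intents more accurately.
L8: The trigger phrases span Chinese and English, colloquial and technical wording. A colloquial trigger such as "把视频变成 TikTok" ("turn the video into a TikTok") can catch requests from non-technical users.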

Module Relationships

BibiGPT’s module dependencies form a clear star topology: SKILL.md (the intent router) sits at the center, radiating outward to 16 workflows and the reference documents. The diagram below shows five of the six files in references/; the auto-generated endpoints.md is omitted. Every workflow’s first step is executing the environment detection script, ensuring subsequent operations run in the correct mode (CLI/API).

BibiGPT Module Relationship Diagram

graph TD
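  SKILL["SKILL.md<br/>Intent router"]
  CHECK["scripts/bibi-check.sh<br/>Environment detection"]
  CLI["references/cli.md<br/>CLI command reference"]
  API["references/api.md<br/>OpenAPI endpoint reference"]
  INSTALL["references/installation.md<br/>Installation and auth guide"]
  PLATFORM["references/supported-platforms.md<br/>Platform support"]
  BILLING["references/billing-aipay.md<br/>Alipay AI收 protocol"]

  QS["quick-summary<br/>Quick summary"]
  DD["deep-dive<br/>In-depth analysis"]
  TE["transcript-extract<br/>Transcript extraction"]
  AR["article-rewrite<br/>Article rewrite"]
  MV["video-to-tiktok-mv<br/>TikTok MV"]
  BP["batch-process<br/>Batch processing"]
  RC["research-compile<br/>Cross-source research"]
  EN["export-notes<br/>Notes export"]
  VA["visual-analysis<br/>Visual analysis"]
  AC["account-check<br/>Account lookup"]
  LB["library-browse<br/>Content library"]
  CM["channels-manage<br/>Channel management"]
  FL["feed-latest<br/>Subscription feed"]
  CL["collections-manage<br/>Collection management"]
  NM["notes-manage<br/>Notes management"]
  AT["advanced-tools<br/>Advanced tools"]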

  SKILL --> CHECK
  SKILL --> QS
  SKILL --> DD
  SKILL --> TE
  SKILL --> AR
  SKILL --> MV
  SKILL --> BP
  SKILL --> RC
  SKILL --> EN
  SKILL --> VA
  SKILL --> AC
  SKILL --> LB
  SKILL --> CM
  SKILL --> FL
  SKILL --> CL
  SKILL --> NM
  SKILL --> AT

  CHECK --> CLI
  CHECK --> API
  QS --> CLI
  QS --> API
  QS --> PLATFORM
  DD --> CLI
  DD --> API
  TE --> CLI
  TE --> API
  AR --> TE
  MV --> DD
  MV --> TE
  BP --> QS
  RC --> QS
  EN --> QS
  EN --> INSTALL
  SKILL --> BILLING

  style SKILL fill:#e1f5fe,stroke:#0288d1
  style CHECK fill:#fff3e0,stroke:#f57c00
  style BILLING fill:#fce4ec,stroke:#c62828

Full Script Inventory

Script	Language	Lines	Complexity	Purpose
`bibi-check.sh`	Bash	~30	⭐⭐	Environment detection: auto-detect CLI/API mode availability with OS-specific install guidance

bibi-check.sh: Environment Mode Detector

bibi-check.sh is the “mode switch” for the entire skill. It detects whether the bibi CLI is installed and whether the BIBI_API_TOKEN env var is set, determining which invocation mode all subsequent workflows will use. If neither is available, it provides OS-specific installation commands.

1  #!/usr/bin/env bash
2  # Detect available BibiGPT mode: CLI or OpenAPI
3
4  if command -v bibi &>/dev/null; then
5    echo "✓ bibi CLI found: $(bibi --version 2>/dev/null || echo 'unknown version')"
6    echo "Mode: CLI — use 'bibi summarize <URL>' commands."
7  elif [ -n "$BIBI_API_TOKEN" ]; then
8    echo "✓ BIBI_API_TOKEN is set (CLI not installed)."
9    echo "Mode: OpenAPI — call https://api.bibigpt.co/api/v1/ endpoints with curl."
10   echo ""
11   echo "Quick test:"
12   echo ' curl -sf "https://api.bibigpt.co/api/version" -H "Authorization: Bearer $BIBI_API_TOKEN"'
13 else
14   echo "bibi CLI not found and BIBI_API_TOKEN is not set."
15   echo ""
16   if [[ "$OSTYPE" == "darwin"* ]]; then
17     echo "Option 1 (CLI): brew install --cask jimmylv/bibigpt/bibigpt"
18   elif [[ "$OSTYPE" == "msys"* || "$OSTYPE" == "cygwin"* ]]; then
19     echo "Option 1 (CLI): winget install BibiGPT --source winget"
20   elif [[ "$OSTYPE" == "linux"* ]]; then
21     echo "Option 1 (CLI): curl -fsSL https://bibigpt.co/install.sh | bash"
22   fi
23   exit 1
24 fi

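Code Walkthrough

L2: The comment doubles as documentation. One sentence states the script's job, so the Agent can quickly see what it is for.
L4: Uses command -v rather than which to detect the command, a POSIX-compatible best practice that works in most shell environments.
L5: A failing --version does not block the script even when the CLI exists. The || echo fallback keeps it robust, so a failed version lookup does not break the whole check.
L7: CLI mode takes priority. The CLI can process local files directly and is more capable than API mode, so the detection order is CLI > API token > none.
L8: Clear mode hints. The output tells the Agent not only the current mode but also a typical invocation for that mode, reducing how often it has to consult the reference docs.
L11: Provides a quick test command so the Agent can verify the API connection immediately, cutting debugging time.
L16: Prefix match on OSTYPE. macOS (darwin), Windows (msys/cygwin), and Linux each get a different install command.
L23: Fallback. The walkthrough describes a generic guidance URL for an OSTYPE that matches none of the known types, so the Agent is never left without options. The listing here shows only the exit 1 that reports failure, so that fallback is presumably in a part of the script not reproduced.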
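
A workflow that needs to branch on the detected mode could condense the same priority order (CLI, then API token, then nothing) into a helper that prints a single word. The sketch below is an assumption-laden illustration, not part of the original script: the function name `detect_mode`, the labels `cli`/`api`/`none`, and the optional command-name argument (added only so the function can be exercised without the real CLI) are all hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper that mirrors bibi-check.sh's decision order.
# Names and labels are illustrative assumptions.

detect_mode() {
  local cli_name="${1:-bibi}"   # command to look for; overridable for testing

  if command -v "$cli_name" >/dev/null 2>&1; then
    echo "cli"                  # richest mode: CLI is installed
  elif [ -n "${BIBI_API_TOKEN:-}" ]; then
    echo "api"                  # fallback: token present, use HTTP endpoints
  else
    echo "none"                 # nothing configured
    return 1                    # mirror the script's non-zero exit
  fi
}
```

A caller might then write `case "$(detect_mode)" in cli) ... ;; api) ... ;; none) ... ;; esac` and keep the install guidance in the `none` branch.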

Design Highlights

Intent Routing Table Pattern: SKILL.md is a pure routing layer with no execution logic — a 17-row table maps natural language to workflow files precisely. Adding features only requires a new workflow file + table row, following the open/closed principle.
Three-Tier Access Architecture: The same capabilities are exposed through the CLI (which can process local files), OpenAPI (direct HTTP, suited to containers and CI), and a Remote MCP Server (zero-install, for any MCP client). All three share one backend and differ only in the usage scenario they fit.
Environment Auto-Detection: bibi-check.sh’s detection priority (CLI → API Token → None) reflects descending functionality richness. CLI supports local files while API doesn’t — hence CLI is checked first.
Agent-Native Creative Workflows: article-rewrite.md and video-to-tiktok-mv.md have the Agent itself perform creative work (article rewriting, lyric composition) rather than calling external APIs. This leverages the LLM’s strongest capability (text generation) while keeping the backend API’s responsibility boundaries clean.
Alipay AI收 (HTTP 402) Protocol Integration: By this analysis, it is the only Anthropic Skill that implements a pay-per-call billing protocol. When the user has no subscription, the Agent can handle the HTTP 402 response itself and trigger the Alipay payment flow, a forward-looking design for the Agent Economy.
Progressive Help Discovery: The CLI's help is layered (bibi --help, then bibi summarize --help, then the actual command), so an Agent can discover available commands at runtime instead of loading all the documentation into context.
Contract Documentation (CLAUDE.md): The project root’s CLAUDE.md clearly distinguishes “what the Agent needs to know” (contract) from “internal team state” (what the Agent shouldn’t see), using Hard No / Hard Yes boundaries — a reusable documentation governance pattern.
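
To make the HTTP 402 point concrete, here is a rough sketch of how a caller might branch on the status code of an API response. Only the 402 branch reflects the text above. The function names, the action labels, and the handling of other status codes are assumptions, and the contents of the 402 response and the payment steps themselves are defined in references/billing-aipay.md and are not reproduced here.

```bash
#!/usr/bin/env bash
# Sketch only: map an HTTP status code to the caller's next step.
# Labels and the non-402 branches are illustrative assumptions.

next_action_for_status() {
  case "$1" in
    2??)     echo "use_response" ;;        # success: consume the body
    402)     echo "start_payment_flow" ;;  # Payment Required: see billing-aipay.md
    401|403) echo "check_token" ;;         # assumed: auth problem
    *)       echo "report_error" ;;        # anything else, including curl's 000
  esac
}

# Hypothetical wrapper: perform the request and print "<action> <body-file>".
call_api() {
  local url="$1" body_file status
  body_file="$(mktemp)"
  # -w '%{http_code}' prints only the status code; the body goes to the file.
  status="$(curl -s -o "$body_file" -w '%{http_code}' \
    -H "Authorization: Bearer ${BIBI_API_TOKEN:-}" "$url")"
  echo "$(next_action_for_status "$status") $body_file"
}
```

Keeping the status-to-action mapping in its own function means the payment branch can be checked without making a network call.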

Reusable Patterns

Pattern	Use Case	Key Point
Intent Routing Table	Feature-rich skills with multiple independent workflows	SKILL.md only routes and dispatches, contains no execution logic
Environment Auto-Detection	Skills that need to work differently across environments	Script detects env, degrades by priority (rich → basic functionality)
Agent-Native Generation	Skills requiring creative output (articles/lyrics/design)	Agent uses LLM for creative work, API only provides raw data
Three-Tier Access Architecture	Skills that need to work across multiple platforms	CLI + HTTP API + MCP Server, shared backend, different entry points

Porting Guide

To build a similar “AI Service + Agent Skill” project, the core approach is:

Build the backend service first (REST API / MCP Server) so it can be called independently

Then build the Skill wrapper: SKILL.md defines intent routing, workflows/ contains step-by-step operations, references/ provides API/CLI references

Finally, environment adaptation: ensure API fallback when CLI is unavailable, and clear installation guidance when nothing is configured

BibiGPT’s three-tier architecture (CLI → API → MCP) can also be simplified: if your service is only exposed via API, skip the CLI layer and build a two-tier “API + MCP” skill.
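
As a starting point for the wrapper layer, the directory layout from the file inventory could be scaffolded with a few lines of Bash. This is a minimal sketch under stated assumptions: the function name `scaffold_skill`, the placeholder file names `quick-start.md` and `<name>-check.sh`, and the TODO contents are hypothetical, and only the scripts/references/workflows split and the SKILL.md frontmatter shape come from the text above.

```bash
#!/usr/bin/env bash
# Hypothetical scaffold that mirrors the skills/<name> layout.
# File names other than SKILL.md are placeholders.

scaffold_skill() {
  local root="$1" name="$2"
  local dir="$root/skills/$name"

  mkdir -p "$dir/scripts" "$dir/references" "$dir/workflows"

  # Frontmatter skeleton: a name plus a description carrying trigger phrases.
  printf -- '---\nname: %s\ndescription: >\n  TODO: what the skill does, plus trigger phrases.\n---\n' \
    "$name" > "$dir/SKILL.md"

  # Environment check stub: detect CLI, then API token, then print guidance.
  printf '#!/usr/bin/env bash\n# TODO: detect CLI, then API token, then print install guidance.\n' \
    > "$dir/scripts/$name-check.sh"

  # Empty placeholders for the first workflow and reference document.
  : > "$dir/workflows/quick-start.md"
  : > "$dir/references/api.md"
}
```

Running `scaffold_skill . my-skill` would create `skills/my-skill/` in the current directory; an API-only service could simply leave the CLI branch out of the check script.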

Common Pitfalls

⚠️ URL Encoding Trap: OpenAPI mode requires URLs to be percent-encoded before passing as query params, otherwise special characters (?, &) are mishandled by shell or HTTP parsers. Every API call in workflows must first encode the URL with python3 -c 'import urllib.parse...'.
⚠️ Local File vs API Asymmetry: CLI mode supports direct local file paths (bibi summarize "/path/to/file.mp4"), but API mode doesn’t support file uploads. If users provide local files in API mode, guide them to first upload to a public URL.
⚠️ CLI stdout/stderr Separation: bibi summarize outputs Markdown summaries to stdout and progress messages to stderr. Agent pipe operations (> summary.md, | jq) only capture stdout without mixing in progress info — an intentional pipe-friendly design.
⚠️ Manifest Hot-Cache Delay: bibi --help uses locally cached command listings (24h TTL), so newly released commands may not appear immediately. Use bibi commands to force-refresh the cache.
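
The URL-encoding pitfall is the easiest one to guard against with a helper. The text above elides the exact python3 one-liner the workflows use, so the version below is one plausible form rather than the original; the function names `urlencode` and `build_query_url` are hypothetical, and the endpoint and parameter name are passed in by the caller because the source does not spell them out.

```bash
#!/usr/bin/env bash
# Sketch: percent-encode a video URL before placing it in a query string.
# Assumes python3 is on PATH; the one-liner is an illustration, not the
# command used in the actual workflows.

urlencode() {
  # safe="" also encodes "/" and ":", so the whole URL becomes one opaque value.
  python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1], safe=""))' "$1"
}

# Hypothetical builder: endpoint and parameter name are supplied by the caller.
build_query_url() {
  local endpoint="$1" param="$2" video_url="$3"
  printf '%s?%s=%s\n' "$endpoint" "$param" "$(urlencode "$video_url")"
}
```

Passing the raw URL as a quoted argument, rather than interpolating it into the Python source, keeps `?` and `&` away from both the shell and the Python parser.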