youtube-anycaption-summarizer

# YouTube AnyCaption Summarizer **The YouTube summarizer that still works when captions are broken, missing, or inconsistent.** Outputs: raw markdown transcript + polished markdown summary + session-ready result block. Unlike caption-only tools, this skill still works when subtitles are missing by falling back to local Whisper transcription. Generate a raw transcript markdown file and a polished summary markdown file from one or more YouTube videos. This skill is self-contained. It does not require any other YouTube summarizer skill or prior workflow context. ## Best for - founder videos, operator walkthroughs, and technical explainers - long tutorial videos that need transcript + implementation summary - private/internal YouTube uploads that may require cookies - mixed-caption environments where some videos have CC, some only have auto-captions, and some have no usable subtitles - batch research workflows where many YouTube links need standardized markdown outputs - users who want reliable markdown artifacts, not just a one-off chat summary ## Why choose this over simpler transcript skills? - manual CC first, auto-captions second, local Whisper fallback last - keeps working when subtitle coverage is weak or missing - supports private/restricted YouTube videos via cookies - returns durable markdown artifacts, not just chat text - supports batch processing and session-ready completion reporting ## Install dependencies For a fresh macOS setup, new users should be able to copy-paste the following exactly: ```bash brew install yt-dlp ffmpeg whisper-cpp MODELS_DIR="$HOME/.openclaw/workspace" MODEL_PATH="$MODELS_DIR/ggml-medium.bin" mkdir -p "$MODELS_DIR" if [ ! -f "$MODEL_PATH" ]; then curl -L https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-medium.bin \ -o "$MODEL_PATH.part" && mv "$MODEL_PATH.part" "$MODEL_PATH" else echo "Model already exists at $MODEL_PATH — leaving it unchanged." fi command -v python3 yt-dlp ffmpeg whisper-cli ls -lh "$MODEL_PATH" ``` What this does: - installs `yt-dlp`, `ffmpeg`, and `whisper-cli` - creates the default models directory used by this skill if it does not already exist: `~/.openclaw/workspace` - downloads the default Whisper model file only if it is missing - avoids touching `~/.openclaw/openclaw.json` or any other OpenClaw config file - does not delete, replace, or overwrite other files in your existing workspace folder - verifies that the required binaries and model file are present If you want to store models elsewhere, pass `--models-dir /path/to/models` when running the workflow. ## Example requests - “Summarize this YouTube video into markdown.” - “Generate a transcript and polished summary for this YouTube link.” - “Process this private YouTube video with my browser cookies.” - “Batch summarize these YouTube links and give me transcript + summary files.” - “Use subtitles when available, otherwise transcribe locally.” - “Create a Chinese summary from this English YouTube video.” ## Quick start ### Single video ```bash python3 scripts/run_youtube_workflow.py "https://www.youtube.com/watch?v=VIDEO_ID" ``` This creates a dedicated per-video folder, writes the raw transcript markdown, creates the summary placeholder markdown, and prints JSON describing the outputs plus the exact follow-up commands/prompts needed to finish the summary step. Important: the workflow script alone is not the finished deliverable. The current OpenClaw session must still: 1. infer/backfill the language if the workflow left it as `unknown` 2. overwrite the placeholder `Summary.md` with a real polished summary 3. run `scripts/complete_youtube_summary.py` to validate/finalize the result ### Force simplified Chinese summary ```bash python3 scripts/run_youtube_workflow.py "https://www.youtube.com/watch?v=VIDEO_ID" \ --summary-language zh-CN ``` ### Restricted video with cookies ```bash python3 scripts/run_youtube_workflow.py "https://www.youtube.com/watch?v=VIDEO_ID" \ --cookies /path/to/cookies.txt ``` or ```bash python3 scripts/run_youtube_workflow.py "https://www.youtube.com/watch?v=VIDEO_ID" \ --cookies-from-browser chrome ``` ### Batch / queue mode See `references/batch-input-format.md`. ```bash python3 scripts/run_youtube_workflow.py --batch-file ./youtube-urls.txt ``` ## Why this skill stands out This skill is designed to keep working across the messy reality of YouTube: - if a video has **manual closed captions (CC)**, use them first - if it only has **auto-generated subtitles**, use those next - if it has **no usable subtitles at all**, fall back to **local Whisper transcription** That makes it materially more reliable than caption-only workflows. It works well for caption-rich videos, caption-poor videos, and private/internal uploads where subtitle coverage is inconsistent. Core capabilities: - fetch YouTube metadata first and derive safe output paths - support single-video mode and batch / queue mode - handle manual CC, auto-generated subtitles, or no subtitles via subtitle-first extraction with local Whisper fallback - support restricted/private videos via cookies or browser-cookie extraction - normalize noisy transcript text before summarization - create a placeholder summary file, overwrite it with the final summary, and finalize end-to-end timing - clean up only known intermediates created by the workflow unless explicitly told otherwise ## What this skill produces For each video, create exactly one dedicated output folder containing these final deliverables: - `SANITIZED_VIDEO_NAME_transcript_raw.md` - `SANITIZED_VIDEO_NAME_Summary.md` By default, delete only the known intermediate media, subtitle, and WAV files created by the workflow. Do not wipe unrelated files that may already exist in the per-video folder. ## Required local tools Verify these tools exist before running the workflow: - `yt-dlp` - `ffmpeg` - `whisper-cli` - `python3` The workflow also requires a supported Whisper ggml model file in the configured models directory. ## Bundled scripts Use these scripts directly: - `scripts/run_youtube_workflow.py` — main deterministic workflow for metadata, download/subtitles, transcription, placeholder summary creation, cleanup, and workflow metadata emission - `scripts/backfill_detected_language.py` — update `transcript_raw.md`, `Summary.md`, and workflow metadata after the current session LLM decides the major transcript language - `scripts/complete_youtube_summary.py` — validate that `Summary.md` is no longer a placeholder, optionally backfill language, compute the final end-to-end timing report for one item, and emit a session-ready result block - `scripts/normalize_transcript_text.py` — convert raw timestamped transcript text into cleaner summary input without modifying the raw transcript file - `scripts/finalize_youtube_summary.py` — lower-level timing helper used by the completion flow - `scripts/prepare_video_paths.py` — derive sanitized folder and output file paths from a title and video ID Useful references: - `references/detailed-workflow.md` — full operational workflow, completion rules, batch guidance, naming rules, and practical notes - `references/summary-template.md` — required structure and writing rules for the final `Summary.md` - `references/session-output-template.md` — required user-facing output format to return to the current OpenClaw session after completion - `references/batch-input-format.md` — input format for queue / batch processing ## Defaults - Default parent output folder: `~/Downloads` - Default whisper model: `ggml-medium` - Supported whisper models: `ggml-base`, `ggml-small`, `ggml-medium` - Default media mode: audio-only - Default transcript language: auto-detect if transcription is needed - Default summary language: `source` - Raw transcript keeps timestamps ## Public workflow overview At a high level, the skill does this: 1. fetch metadata first and create safe output paths 2. try manual subtitles, then auto-captions, then local Whisper fallback 3. write `SANITIZED_VIDEO_NAME_transcript_raw.md` 4. create `SANITIZED_VIDEO_NAME_Summary.md` as a placeholder 5. have the current OpenClaw session overwrite the placeholder with a real summary 6. run `scripts/complete_youtube_summary.py` to validate completion, backfill language if needed, and emit a session-ready result block ## What counts as completion For a normal end-to-end request, completion means all of the following are true: 1. the workflow script succeeded 2. if language was initially `unknown`, the language was backfilled into both markdown files 3. the placeholder summary file was overwritten with a real summary 4. `scripts/complete_youtube_summary.py` was run successfully 5. the user received the resulting output paths and timing/result status If the workflow script succeeded but the summary/completion step did not happen yet, describe the state as partial/in-progress rather than complete. ## When to read the deeper references Read these as needed: - `references/detailed-workflow.md` when you need the full implementation contract, batch guidance, naming rules, cleanup rules, timing flow, or debugging details - `references/summary-template.md` before writing the final polished `Summary.md` - `references/session-output-template.md` before returning the final user-facing per-video result block - `references/batch-input-format.md` when handling `--batch-file` ## Practical public promise This skill is optimized for dependable end-to-end output, not just quick transcript extraction: - raw transcript markdown - polished summary markdown - session-ready completion report

youtube-anycaption-summarizer

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载 Zip 包

youtube-anycaption-summarizer