whisper-stt

Author: admin | Source: ClawHub | Version: V 1.0.0 | Security check: passed | Downloads: 1,078 | Favorites: 0

# Whisper STT Skill

Free, local speech-to-text using OpenAI Whisper.

## Prerequisites

Install dependencies (one-time setup):

```bash
pip install openai-whisper torch
```

Optional: install ffmpeg for broader format support:

- macOS: `brew install ffmpeg`
- Ubuntu: `sudo apt install ffmpeg`

## Usage

### Transcribe an audio file

```bash
python ~/.openclaw/skills/whisper-stt/scripts/transcribe.py <audio_file>
```

### Options

| Option | Description |
|--------|-------------|
| `--model` | Model size: tiny, base, small, medium, large, large-v3-turbo (default: base) |
| `--language, -l` | Language code: zh, en, ja, etc. (auto-detect if not specified) |
| `--output, -o` | Output format: json, txt, srt, vtt (default: json) |

### Examples

**Chinese audio to text:**

```bash
python ~/.openclaw/skills/whisper-stt/scripts/transcribe.py recording.m4a --language zh --output txt
```

**Generate subtitles (SRT):**

```bash
python ~/.openclaw/skills/whisper-stt/scripts/transcribe.py video.mp4 --output srt > subtitles.srt
```

**Use a faster model:**

```bash
python ~/.openclaw/skills/whisper-stt/scripts/transcribe.py audio.mp3 --model tiny --output txt
```

**High accuracy (slower):**

```bash
python ~/.openclaw/skills/whisper-stt/scripts/transcribe.py audio.mp3 --model large-v3 --output txt
```

## Model Selection Guide

| Model | Speed | Accuracy | VRAM/RAM | Best For |
|-------|-------|----------|----------|----------|
| tiny | ~32x | Basic | ~1GB | Quick tests, low resources |
| base | ~16x | Good | ~1GB | Balanced speed/accuracy |
| small | ~6x | Better | ~2GB | Better accuracy |
| medium | ~2x | Very good | ~5GB | High accuracy |
| large | 1x | Excellent | ~10GB | Best quality |
| large-v3-turbo | ~8x | Excellent | ~6GB | Fast and accurate (recommended) |

## Troubleshooting

**"ModuleNotFoundError: No module named 'whisper'"**
→ Run `pip install openai-whisper torch`.

**"ffmpeg not found"**
→ Install ffmpeg, or convert the audio to WAV format first.

**Slow transcription**
→ Use a smaller model (tiny/base) or ensure a GPU is available (Apple Silicon MPS, NVIDIA CUDA).

**Poor accuracy on Chinese**
→ Pass `--language zh` explicitly and consider a larger model (medium/large).

## Output Formats

- **json**: full result with segments, timestamps, and metadata
- **txt**: plain text transcription only
- **srt**: SubRip subtitle format with timing
- **vtt**: WebVTT subtitle format for web players

## Credits

Powered by [OpenAI Whisper](https://github.com/openai/whisper), an open-source speech recognition model.
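The srt and vtt outputs are essentially Whisper's segment list rewrapped in subtitle timing syntax. As a rough illustration of that conversion (a sketch, not the skill's actual `transcribe.py`, whose source is not shown here), segments shaped like Whisper's `result["segments"]` can be rendered to SRT like this:

```python
# Sketch: render Whisper-style segments as SubRip (SRT) text.
# Each segment dict is assumed to carry "start"/"end" (seconds) and "text",
# mirroring the shape of whisper's result["segments"].

def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as the SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments: list[dict]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for i, seg in enumerate(segments, start=1):
        cues.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(cues)

# Hypothetical segments, for demonstration only.
demo = [
    {"start": 0.0, "end": 2.5, "text": " Hello there."},
    {"start": 2.5, "end": 5.0, "text": " Second line."},
]
print(segments_to_srt(demo))
```

Running this prints two numbered cues, the first spanning `00:00:00,000 --> 00:00:02,500`; piping real Whisper segments through the same shape is what `--output srt > subtitles.srt` amounts to.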

Tags

skill ai

Install via Conversation

This skill can be installed via conversation on the following platforms:

OpenClaw WorkBuddy QClaw Kimi Claude

Option 1: Install SkillHub and the skill

Help me install SkillHub and the whisper-stt-1776293880 skill

Option 2: Set SkillHub as the preferred skill installation source

Set SkillHub as my preferred skill installation source, then help me install the whisper-stt-1776293880 skill

Install via Command Line

skillhub install whisper-stt-1776293880

Download Zip Package

⬇ Download whisper-stt v1.0.0

File size: 4.29 KB | Published: 2026-4-16 17:57

v1.0.0 (latest), 2026-4-16 17:57
- Initial release of the whisper-stt skill for free, local speech-to-text transcription using OpenAI Whisper.
- Supports a range of audio/video input formats (mp3, wav, m4a, ogg, etc.) without API costs.
- Multiple output formats available: json, txt, srt, and vtt (for subtitles).
- Configurable model sizes for performance vs. accuracy tradeoffs.
- Option to specify target language and leverage GPU acceleration if available.
- Comprehensive usage instructions and troubleshooting included.
