faster-whisper-gpu

# 🎙️ Faster Whisper GPU

High-performance local speech-to-text transcription using [Faster Whisper](https://github.com/SYSTRAN/faster-whisper) with NVIDIA GPU acceleration.

## ✨ Features

- **🚀 GPU Accelerated**: Uses NVIDIA CUDA for fast transcription
- **🔒 100% Local**: No data leaves your machine. Complete privacy.
- **💰 Free Forever**: No API costs. Run unlimited transcriptions.
- **🌍 Multilingual**: Supports 99 languages with automatic detection
- **📁 Multiple Formats**: Input: MP3, WAV, FLAC, OGG, M4A. Output: TXT, SRT, VTT, JSON
- **🎯 Multiple Models**: From tiny (fast) to large-v3 (most accurate)
- **🎬 Subtitle Generation**: Create SRT files, optionally with word-level timestamps

## 📋 Requirements

### Hardware

- **NVIDIA GPU** with CUDA support (recommended: 4GB+ VRAM)
- Or CPU-only mode (slower but works on any machine)

### Software

- Python 3.8+
- NVIDIA drivers (for GPU support)
- CUDA Toolkit 11.8+ or 12.x

## 🚀 Quick Start

### Installation

```bash
# Install dependencies
pip install faster-whisper torch

# Verify GPU is available
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
```

### Basic Usage

```bash
# Transcribe an audio file (auto-detects GPU)
python transcribe.py audio.mp3

# Specify language explicitly
python transcribe.py audio.mp3 --language pt

# Output as SRT subtitles
python transcribe.py audio.mp3 --format srt --output subtitles.srt

# Use larger model for better accuracy
python transcribe.py audio.mp3 --model large-v3
```

## 🔧 Advanced Usage

### Command Line Options

```bash
python transcribe.py <audio_file> [options]

Options:
  --model {tiny,base,small,medium,large-v1,large-v2,large-v3}
                        Model size to use (default: base)
  --language LANG       Language code (e.g., 'pt', 'en', 'es'). Auto-detect if not specified.
  --format {txt,srt,json,vtt}
                        Output format (default: txt)
  --output FILE         Output file path (default: stdout)
  --device {cuda,cpu}   Device to use (default: cuda if available)
  --compute_type {int8,int8_float16,int16,float16,float32}
                        Computation precision (default: float16)
  --task {transcribe,translate}
                        Task: transcribe or translate to English (default: transcribe)
  --vad_filter          Enable voice activity detection filter
  --vad_parameters MIN_DURATION_ON,MIN_DURATION_OFF
                        VAD parameters as comma-separated values
  --condition_on_previous_text
                        Condition on previous text (default: True)
  --initial_prompt PROMPT
                        Initial prompt to guide transcription
  --word_timestamps     Include word-level timestamps (for SRT/JSON)
  --hotwords WORDS      Comma-separated hotwords to boost recognition
```

### Examples

#### Portuguese Transcription with SRT Output

```bash
python transcribe.py meeting.mp3 --language pt --format srt --output meeting.srt
```

#### English Translation from Any Language

```bash
python transcribe.py japanese_audio.mp3 --task translate --format txt
```

#### High-Accuracy Mode with Large Model

```bash
python transcribe.py podcast.mp3 --model large-v3 --vad_filter --word_timestamps
```

#### CPU-Only Mode (no GPU)

```bash
python transcribe.py audio.mp3 --device cpu --compute_type int8
```

## 🐍 Python API

```python
from faster_whisper import WhisperModel

# Load model
model = WhisperModel("base", device="cuda", compute_type="float16")

# Transcribe (language is optional; omit it to auto-detect)
segments, info = model.transcribe("audio.mp3", language="pt")

print(f"Detected language: {info.language} (probability: {info.language_probability:.2f})")

# segments is a generator, so transcription runs as you iterate
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```

## 📊 Model Sizes & VRAM Requirements

| Model    | Parameters | VRAM Required | Relative Speed | Accuracy |
|----------|------------|---------------|----------------|----------|
| tiny     | 39 M       | ~1 GB         | ~32x           | Basic    |
| base     | 74 M       | ~1 GB         | ~16x           | Good     |
| small    | 244 M      | ~2 GB         | ~6x            | Better   |
| medium   | 769 M      | ~5 GB         | ~2x            | Great    |
| large-v3 | 1550 M     | ~10 GB        | 1x             | Best     |

*Benchmarks measured on NVIDIA RTX 4090. Relative speed is given against large-v3.*

## 🔍 Supported Languages

Faster Whisper supports 99 languages including:

- **Portuguese** (`pt`)
- **English** (`en`)
- **Spanish** (`es`)
- **French** (`fr`)
- **German** (`de`)
- **Italian** (`it`)
- **Japanese** (`ja`)
- **Chinese** (`zh`)
- **Russian** (`ru`)
- **And 90+ more...**

## 🛠️ Troubleshooting

### CUDA Out of Memory

```bash
# Use smaller model
python transcribe.py audio.mp3 --model tiny

# Or use CPU
python transcribe.py audio.mp3 --device cpu

# Or reduce precision
python transcribe.py audio.mp3 --compute_type int8
```

### Model Download Issues

Models are automatically downloaded on first use to `~/.cache/huggingface/hub/`. To store them somewhere else (for example, if the default cache directory is not writable or is too small), point `HF_HOME` at a custom location:

```bash
export HF_HOME=/path/to/custom/cache
```

Note that `HF_HOME` only changes where models are cached. If you are behind a proxy, also make sure the standard `HTTPS_PROXY` environment variable is set so the download can reach the Hugging Face Hub.

### Slow Transcription

- Ensure GPU is being used: check `nvidia-smi` during transcription
- Use smaller model for faster results
- Enable VAD filter to skip silent parts

## 🤝 Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch
3. Submit a pull request

## 📜 License

MIT License - See [LICENSE](LICENSE) for details.

Faster Whisper is developed by [SYSTRAN](https://github.com/SYSTRAN/faster-whisper) and based on OpenAI's Whisper.

## 🙏 Acknowledgments

- [OpenAI Whisper](https://github.com/openai/whisper) - Original model
- [Faster Whisper](https://github.com/SYSTRAN/faster-whisper) - Optimized implementation
- [CTranslate2](https://github.com/OpenNMT/CTranslate2) - Fast inference engine

---

**Made with ❤️ for the OpenClaw community**
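
As a supplement to the Python API section: the segments yielded by `model.transcribe()` expose `start`, `end`, and `text`, which is enough to build SRT output by hand. The block below is a minimal sketch of that conversion, not the code `transcribe.py` actually uses. The `Segment` dataclass and the helper names `srt_timestamp` and `segments_to_srt` are hypothetical and exist only so the example runs without a model or audio file.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Segment:
    """Hypothetical stand-in for the objects yielded by model.transcribe()."""
    start: float  # segment start, in seconds
    end: float    # segment end, in seconds
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable) -> str:
    """Build SRT text from any iterable of objects with start, end, and text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    # SRT cues are separated by a blank line
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [Segment(0.0, 2.5, " Hello there."), Segment(2.5, 5.0, " Goodbye.")]
    print(segments_to_srt(demo))
```

With a real model, passing the `segments` generator from `model.transcribe(...)` directly to `segments_to_srt` should work under the same assumptions, since only the three attributes above are read.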
