
voice-clone-bot

Synthesize speech by cloning a user's voice from a reference audio sample, then reading generated text aloud in that cloned voice. Use this skill whenever the user sends a voice message and expects an audio reply, asks to "speak", "clone my voice", "read this aloud", "reply with audio", or any context where a spoken voice response is appropriate. Also use when the user wants to switch into "voice mode" for conversation. Even if the user doesn't explicitly say "voice clone", use this skill if the

Author: admin | Source: ClawHub
Version: V 1.1.0
Security check: Passed
Downloads: 78
Favorites: 0

# Voice Clone Skill

A self-initializing, zero-configuration voice cloning skill. It manages a background TTS daemon that keeps heavy model weights in memory for fast inference. Supports multiple engines and unlimited text length.

## Quick reference

| Item | Value |
| --- | --- |
| Entry script | `bash scripts/run_tts.sh --text "..." --ref_audio "..." [--speed 1.0] [--output_dir "..."]` |
| Output | Single line: absolute path to generated `.ogg` file |
| Attachment format | `MEDIA:<output_path>` |
| Default engine | F5-TTS (env `TTS_BACKEND=f5`) |
| Host/Port config | `.env` (`TTS_SERVER_HOST`, `TTS_SERVER_PORT`) |

## When to use this skill

- The user sends a voice memo or audio file and you need to reply with audio.
- The user says "read this aloud", "speak to me", "use my voice", "voice mode".
- The conversation context implies a spoken reply is expected.
- The user provides a reference audio and asks you to mimic their voice.

## Step-by-step usage

### 1. Identify inputs

You need two things:

- **`ref_audio`**: The absolute local path to the user's reference audio file (the voice to clone). This is typically the audio file the user just sent, saved by the ASR system (e.g., openai-whisper).
- **`text`**: The text content you want to speak. Generate this as you normally would: think of your reply, then voice it.

### 2. Run the synthesis

Execute this command:

```bash
bash scripts/run_tts.sh --text "Your reply text here." --ref_audio "/absolute/path/to/reference.ogg"
```

Optional parameters:

- `--speed 1.2`: Speak faster. Range: 0.5 to 2.0. Default: 1.0.
- `--output_dir "/tmp/"`: Save the generated audio file to a specific absolute folder path. Default: `server/generated_audio/`.

**Example with an optional parameter:**

```bash
# The sample text means: "Nice to meet you, this is my cloned voice."
bash scripts/run_tts.sh \
  --text "很高兴认识你,这是我克隆后的声音。" \
  --ref_audio "/tmp/user_voice_msg.ogg" \
  --speed 0.9
```

### 3. Handle the output

The script prints a single absolute path on stdout (e.g., `/path/to/reply_a1b2c3d4.ogg`).
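Turning the printed path into the `MEDIA:<output_path>` attachment line from the quick reference can be sketched as follows. `media_line` is a hypothetical helper, not part of the skill's scripts; it only assumes what the text states, namely that the output is one absolute path to an `.ogg` file.

```bash
# Hypothetical helper (not shipped with the skill): format the path printed
# by run_tts.sh as an attachment line. Anything that is not an absolute path
# ending in .ogg is rejected, since that usually means the run failed.
media_line() {
  case "$1" in
    /*.ogg) printf 'MEDIA:%s\n' "$1" ;;
    *) echo "unexpected TTS output: $1" >&2; return 1 ;;
  esac
}

# Usage, left commented out so the sketch has no side effects:
# out_path="$(bash scripts/run_tts.sh --text "Hello" --ref_audio "/tmp/user_voice_msg.ogg")" \
#   && media_line "$out_path"
```

Checking the shape of the output before attaching it avoids sending an error message as if it were a file path.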
Append the printed path to your response using the attachment format:

```
MEDIA:/path/to/reply_a1b2c3d4.ogg
```

### 4. Important constraints

- **Do NOT** manually start `python app.py` or manage the backend. The `run_tts.sh` script auto-detects, auto-installs, and auto-starts everything.
- **First run is slow** (~30-60 seconds) because it downloads model weights and loads them into memory. Subsequent calls are fast.
- **Long texts work automatically.** The engine splits text into sentences, synthesizes each chunk, and stitches them seamlessly. No length limit.

## Controlling voice characteristics

### Speed (all engines)

The `--speed` parameter adjusts speaking rate:

| Value | Effect |
| --- | --- |
| `0.7` | Slow, deliberate, suitable for elderly listeners |
| `1.0` | Natural conversational speed (default) |
| `1.3` | Brisk, suitable for news or briefings |
| `1.5+` | Fast, compressed delivery |

F5-TTS supports speed natively. Other engines use ffmpeg post-processing (atempo filter), which gives good results but may slightly affect quality at extreme values.

### Emotion and tone

These models use **acoustic feature extraction** from the reference audio. They do not accept text-based emotion tags like `[happy]` or `[sad]`; ChatTTS is a partial exception, covered at the end of this section. **The emotion of the output is determined entirely by the reference audio.**

To control emotion, select or prepare reference audio that carries the desired tone:

| Desired tone | Reference audio strategy |
| --- | --- |
| Calm, neutral | Use a reference clip where the speaker talks normally |
| Excited, happy | Use a reference clip where the speaker sounds enthusiastic |
| Angry, intense | Use a reference clip with raised voice and sharp intonation |
| Sad, melancholic | Use a reference clip with slow, downcast delivery |
| Whispering | Use a reference clip where the speaker whispers |

**Practical approach for Agents:** If the user has sent multiple voice messages, choose the one whose emotional tone best matches the context of your reply.
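One way an agent might apply this selection rule is sketched below. Everything about the layout is an assumption made for illustration: the skill does not label or store clips by tone, so the `<dir>/<tone>.ogg` naming, the `neutral.ogg` catch-all, and the `pick_ref_audio` helper are all hypothetical.

```bash
# Hypothetical layout (not defined by the skill): the agent has saved one clip
# per tone as <dir>/<tone>.ogg, with <dir>/neutral.ogg as the catch-all.
pick_ref_audio() {
  # $1 = directory of saved voice messages, $2 = desired tone label
  if [ -f "$1/$2.ogg" ]; then
    printf '%s\n' "$1/$2.ogg"
  elif [ -f "$1/neutral.ogg" ]; then
    printf '%s\n' "$1/neutral.ogg"
  else
    echo "no reference audio found in $1" >&2
    return 1
  fi
}
```

The result can be passed straight to the entry script, for example `--ref_audio "$(pick_ref_audio /tmp/voice_msgs excited)"`, provided the directory is given as an absolute path.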
If only one reference is available, use it as-is; the model will approximate the speaker's general style.

**ChatTTS specifics:** This engine supports inline emotion tags in text: `[laugh]`, `[uv_break]` (pause). It also supports voice cloning when a reference audio is provided.

## Available engines

| Engine | ID | Install | Size | Clone | Speed support | Best for |
| --- | --- | --- | --- | :---: | --- | --- |
| **F5-TTS** | `f5` | `bash scripts/auto_installer.sh` | ~1.5GB | ✅ | Native | Highest quality cloning |
| **CosyVoice** | `cosyvoice` | `bash scripts/install_cosyvoice.sh` | ~1.5GB | ✅ | ffmpeg | Natural Chinese prosody |
| **ChatTTS** | `chattts` | `bash scripts/install_chattts.sh` | ~400MB | ✅ | ffmpeg | Dialogue with emotion tags |
| **OpenVoice** | `openvoice` | `bash scripts/install_openvoice.sh` | ~300MB | ✅ | ffmpeg | Ultra fast, tiny footprint |

Switch engines by setting the environment variable before the server starts:

```bash
export TTS_BACKEND=cosyvoice
```

## Uninstalling

```bash
# Remove everything (venv, daemon, registration)
bash scripts/uninstall.sh

# Remove only one engine's source code
bash scripts/uninstall.sh --engine cosyvoice

# Remove everything INCLUDING downloaded model weights (several GB)
bash scripts/uninstall.sh --purge
```

## File structure

```
scripts/
├── run_tts.sh            # Main entry point (auto-heals, auto-starts daemon)
├── tts_client.py         # HTTP client that talks to the backend
├── auto_installer.sh     # Install F5-TTS (default) + register skill
├── install_cosyvoice.sh  # Install CosyVoice engine
├── install_chattts.sh    # Install ChatTTS engine
├── install_openvoice.sh  # Install OpenVoice engine
└── uninstall.sh          # Cleanup script
server/
├── app.py                # FastAPI daemon (auto-managed, do not start manually)
├── core_tts.py           # Multi-engine factory + long text chunking
└── requirements.txt      # Base dependencies
```

## References

- Read `references/architecture.md` for system architecture and design rationale.
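The engine IDs in the table and the documented `--speed` range (0.5 to 2.0) lend themselves to a quick pre-flight check before calling `run_tts.sh`. The sketch below is an illustration only: `is_known_backend` and `is_valid_speed` are hypothetical helpers, and how the skill's own scripts react to an unknown engine or an out-of-range speed is not documented here.

```bash
# Hypothetical pre-flight helpers (not part of the skill's scripts).

# Engine IDs copied from the "Available engines" table.
is_known_backend() {
  case "$1" in
    f5|cosyvoice|chattts|openvoice) return 0 ;;
    *) return 1 ;;
  esac
}

# Accept a plain decimal such as 1 or 0.9 within the documented 0.5-2.0 range.
is_valid_speed() {
  case "$1" in
    ''|*[!0-9.]*|*.*.*|.*|*.) return 1 ;;  # empty, non-numeric, or malformed
  esac
  awk -v s="$1" 'BEGIN { exit !(s >= 0.5 && s <= 2.0) }'
}

# Usage, left commented out so the sketch has no side effects:
# is_known_backend "${TTS_BACKEND:-f5}" || echo "unknown engine" >&2
# is_valid_speed "1.3" || echo "speed out of range" >&2
```

Failing early in the caller keeps a typo in `TTS_BACKEND` or `--speed` from surfacing later as a less obvious error in the daemon.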

Tags

skill, ai

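Install via chat

This skill can be installed via chat on the following platforms:

OpenClaw, WorkBuddy, QClaw, Kimi, Claude

Option 1: Install SkillHub and the skill

Help me install SkillHub and the voice-clone-bot-1775912342 skill

Option 2: Set SkillHub as the preferred skill installation source

Set SkillHub as my preferred skill installation source, then help me install the voice-clone-bot-1775912342 skill

Install via command line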

skillhub install voice-clone-bot-1775912342

Download Zip package

Download voice-clone-bot v1.1.0

File size: 26.53 KB | Published: 2026-4-12 11:53

v1.1.0 (latest) 2026-4-12 11:53
Synchronized the rename to voice-clone-bot, aligned skill metadata, synced naming across docs and scripts, and packaged the release in skill-creator format (no TTS logic changes).
