返回顶部
d

deapi-audio

Text-to-speech, voice cloning, voice design, and transcribe audio files via deAPI GPU network. Trigger on 'text to speech', 'TTS', 'generate voice', 'read aloud', 'voice clone', 'clone voice', 'voice design', 'design voice', 'custom voice', 'transcribe audio', 'STT'. For video/YouTube transcription use deapi-video instead.

作者: admin | 来源: ClawHub
源自
ClawHub
版本
V 1.0.1
安全检测
已通过
118
下载量
1
收藏
概述
安装方式
版本历史

deapi-audio

# deAPI Audio Text-to-speech, voice cloning, voice design, and audio transcription via deAPI decentralized GPU network. ## Scripts | Script | Use when... | |--------|-------------| | `scripts/text-to-speech.sh` | User wants to convert text to spoken audio | | `scripts/voice-clone.sh` | User wants to clone/replicate a voice from a sample audio file | | `scripts/voice-design.sh` | User wants to generate speech with a voice described in natural language | | `scripts/speech-to-text.sh` | User wants to transcribe an audio file (AAC, MP3, OGG, WAV, WebM, FLAC, max 10MB) | ## Your config ! cat ${CLAUDE_SKILL_DIR}/config.json 2>/dev/null || echo "NOT_CONFIGURED" If the config above is NOT_CONFIGURED, ask the user: - What is your deAPI API key? (get one at https://deapi.ai, free $5 credit) Then write the answer to ${CLAUDE_SKILL_DIR}/config.json as `{ "api_key": "their_key" }`. Alternatively, the user can set the `DEAPI_API_KEY` environment variable directly, which takes priority over config.json. ## Gotchas - For YouTube/video transcription, use the `deapi-video` skill instead. This skill handles audio-only files (.mp3, .wav, .m4a, .flac, .ogg). - Three TTS models: `Kokoro` (default), `Chatterbox`, `Qwen3`. Use `--model Chatterbox` or `--model Qwen3` to switch. - Kokoro: Voice ID format is `{lang}{gender}_{name}`. Language is auto-detected from voice prefix if `--lang` is omitted. - Chatterbox: voice is always `default`, speed is fixed at `1`, supports 22 languages. Text limit 10-2000 chars. - Kokoro: text limit 3-10001 chars. Long text may timeout — split into segments and generate separately. - TTS output format defaults to mp3. WAV files are much larger but lossless. - Kokoro: `speed` range is 0.5-2.0. Values outside this range cause errors. - Qwen3 Voice Clone (`voice-clone.sh`): ref audio must be 5-15 seconds. Too short or too long degrades quality. Formats: MP3, WAV, FLAC, OGG, M4A. URLs are downloaded automatically. - Qwen3 Voice Design (`voice-design.sh`): quality depends on the `--instruct` description. Encourage specific details: gender, age, accent, speaking style, emotion. - Qwen3 models use full language names (`English`, `French`, etc.) NOT language codes. 10 supported languages: English, Italian, Spanish, Portuguese, Russian, French, German, Korean, Japanese, Chinese. - Qwen3 TTS (`--model Qwen3`): 9 voices available, default `Vivian`. Chinese language lacks `Ryan` voice. - Qwen3 text limit is 10-5000 chars. Speed is fixed at 1. Voice Clone and Voice Design use voice=`default`. - Audio transcription accepts a local file path or URL (`--audio`). Formats: AAC, MP3, OGG, WAV, WebM, FLAC. Max 10 MB. - Result URLs expire in 24 hours. Download promptly. ## Quick examples ```bash # Basic TTS bash scripts/text-to-speech.sh --text "Hello world" # British voice bash scripts/text-to-speech.sh --text "Good morning" --voice bf_emma # Chatterbox model (multilingual) bash scripts/text-to-speech.sh --model Chatterbox --text "Bonjour le monde" --lang fr # Qwen3 model bash scripts/text-to-speech.sh --model Qwen3 --text "Hello world" --voice Serena --lang English # Clone a voice from a sample bash scripts/voice-clone.sh --text "Hello, this is my cloned voice" --ref-audio /path/to/sample.mp3 # Clone with reference transcript for better accuracy bash scripts/voice-clone.sh --text "Welcome to the show" --ref-audio /path/to/sample.wav --ref-text "This is the original transcript" # Design a custom voice from description bash scripts/voice-design.sh --text "Good morning everyone" --instruct "A warm, deep male voice with a slight British accent" # Voice design in another language bash scripts/voice-design.sh --text "Bonjour tout le monde" --instruct "A cheerful young female voice" --lang French # Transcribe audio file (local or URL) bash scripts/speech-to-text.sh --audio /path/to/recording.mp3 bash scripts/speech-to-text.sh --audio "https://example.com/podcast.mp3" ``` For the full voice list and language codes, see [references/voices.md](references/voices.md).

标签

skill ai

通过对话安装

该技能支持在以下平台通过对话安装:

OpenClaw WorkBuddy QClaw Kimi Claude

方式一:安装 SkillHub 和技能

帮我安装 SkillHub 和 deapi-audio-1776056855 技能

方式二:设置 SkillHub 为优先技能安装源

设置 SkillHub 为我的优先技能安装源,然后帮我安装 deapi-audio-1776056855 技能

通过命令行安装

skillhub install deapi-audio-1776056855

下载 Zip 包

⬇ 下载 deapi-audio v1.0.1

文件大小: 12.5 KB | 发布时间: 2026-4-14 10:55

v1.0.1 最新 2026-4-14 10:55
No file or functionality changes detected in this release.

- Version updated to 1.0.1 with no modifications to files or documentation.

Archiver·手机版·闲社网·闲社论坛·羊毛社区· 多链控股集团有限公司 · 苏ICP备2025199260号-1

Powered by Discuz! X5.0   © 2024-2025 闲社网·线报更新论坛·羊毛分享社区·http://xianshe.com

p2p_official_large
返回顶部