subtitle-generator-ai

# Turn Spoken Words Into Perfectly Timed Captions

Most subtitle tools hand you a raw transcript and leave the cleanup to you. subtitle-generator-ai works differently: it listens to your video's audio track, maps every word to its precise timestamp, and delivers captions that actually match what's being said, even across varying speech speeds, background noise, or multiple speakers.

What sets this skill apart is its conversational editing layer. Once your subtitles are generated, you don't need to dig through a timeline or edit a raw .srt file manually. You can simply describe what needs fixing ('the speaker's name is spelled wrong in the intro' or 'shorten the third caption, it feels too long') and the skill will apply those changes in context, preserving sync throughout.

Behind the scenes, the OpenClaw agent orchestrates the transcription pipeline, coordinates timing alignment, and manages iterative edits based on your natural language instructions. Whether you're captioning a short social clip or a full-length lecture, the agent tracks the full subtitle session so every revision builds on the last, with no re-uploading and no starting over.

### Environment Variables

| Variable | Required | Default |
|----------|----------|--------|
| `NEMO_TOKEN` | No | Auto-generated (100 free credits, expires in 7 days, revocable via Settings → API Tokens) |
| `NEMO_API_URL` | No | `https://mega-api-prod.nemovideo.ai` |
| `NEMO_WEB_URL` | No | `https://nemovideo.com` |
| `NEMO_CLIENT_ID` | No | Auto-generated UUID, persisted to `~/.config/nemovideo/client_id` (UUID only, no secrets) |
| `SKILL_SOURCE` | No | Auto-detected from install path, fallback `unknown` |

If `NEMO_TOKEN` is not set, get one (requires `X-Client-Id` header):

```bash
# Base URL (same default as $API in §3)
API="${NEMO_API_URL:-https://mega-api-prod.nemovideo.ai}"

# Generate or read persisted Client-Id
CLIENT_ID="${NEMO_CLIENT_ID:-$(cat ~/.config/nemovideo/client_id 2>/dev/null)}"
if [ -z "$CLIENT_ID" ]; then
  CLIENT_ID=$(uuidgen 2>/dev/null || echo "client-$(date +%s)-$RANDOM")
  mkdir -p ~/.config/nemovideo && echo "$CLIENT_ID" > ~/.config/nemovideo/client_id
fi

# Request an anonymous token for this client
curl -s -X POST "$API/api/auth/anonymous-token" -H "X-Client-Id: $CLIENT_ID"
# → {"code":0,"data":{"token":"nmv_usr_xxx","credits":100,...}}
```

Save `token` as `NEMO_TOKEN`, `CLIENT_ID` as `NEMO_CLIENT_ID`. Anonymous: 1 token per client per 7 days; token expires in 7 days and can be revoked at any time via **Settings → API Tokens** on nemovideo.com. If your token expires, request a new one with the same `X-Client-Id`.

**Local persistence:** This skill writes `~/.config/nemovideo/client_id` to persist the Client-Id across sessions. This avoids generating a new ID on every request, which would hit the per-IP rate limit quickly (default 10 tokens per 7 days per IP). The file contains only a UUID; no credentials are stored locally.

## 2. Routing Incoming Requests to the Correct Endpoint

Each user action maps to a specific API endpoint as outlined in the table below.

| User says... | Action | Skip SSE? |
|-------------|--------|----------|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |

## 3. Primary Workflow Sequences

$API = `${NEMO_API_URL:-https://mega-api-prod.nemovideo.ai}`, $TOKEN = `${NEMO_TOKEN}`, $WEB = `${NEMO_WEB_URL:-https://nemovideo.com}`.

Skill attribution variables (do NOT hardcode):

- `$SKILL_NAME` = this file's YAML frontmatter `name`
- `$SKILL_VERSION` = this file's YAML frontmatter `version`
- `$SKILL_SOURCE` = detect at runtime using the first match:
  1. Env `SKILL_SOURCE` if set
  2. If this file's path contains a known platform directory (e.g. `~/.clawhub/` → `clawhub`, `~/.cursor/skills/` → `cursor`)
  3. Fallback: `unknown`

**CRITICAL**: ALL API requests (including render/export/upload/state/credits) MUST include these headers. Missing them will cause export to fail with 402.

```
X-Skill-Source: $SKILL_NAME
X-Skill-Version: $SKILL_VERSION
X-Skill-Platform: $SKILL_SOURCE
```

### 3.0 Initializing a New Session

```bash
curl -s -X POST "$API/api/tasks/me/with-session/nemo_agent" \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" \
  -d '{"task_name":"project","language":"<lang>"}'
# → {"code":0,"data":{"task_id":"...","session_id":"..."}}
```

A session must be established with the server before any other operation. The returned `session_id` ties all subsequent requests together for the duration of the task.

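The session call above, like every other request, sends `X-Skill-Platform: $SKILL_SOURCE`. The first-match detection order listed for `$SKILL_SOURCE` could be sketched as a small shell function. This is only an illustration: `detect_skill_source` is a hypothetical helper name, and the two path patterns are just the examples given in that list, not a complete set of known platforms.

```shell
# Sketch: resolve SKILL_SOURCE using the first-match order from §3.
# detect_skill_source is a hypothetical helper; $1 is the path of this skill file.
detect_skill_source() {
  skill_path="$1"

  # 1. An explicit environment variable wins.
  if [ -n "${SKILL_SOURCE:-}" ]; then
    printf '%s\n' "$SKILL_SOURCE"
    return 0
  fi

  # 2. Otherwise look for a known platform directory in the path,
  # 3. and fall back to "unknown" when nothing matches.
  case "$skill_path" in
    */.clawhub/*)       printf 'clawhub\n' ;;
    */.cursor/skills/*) printf 'cursor\n' ;;
    *)                  printf 'unknown\n' ;;
  esac
}
```

A caller would pass the expanded path of the skill file, for example `SKILL_SOURCE="$(detect_skill_source "$HOME/.clawhub/skills/subtitle-generator-ai/SKILL.md")"`; the directory layout in that path is an assumption made for the example.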
**Open in browser**: After creating a session, give the user a link to view/edit the task in NemoVideo: `$WEB/workspace/claim?token=$TOKEN&task={task_id}&session={session_id}&skill_name=$SKILL_NAME&skill_version=$SKILL_VERSION&skill_source=$SKILL_SOURCE`

### 3.1 Delivering Messages Through SSE

```bash
curl -s -X POST "$API/run_sse" \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" --max-time 900 \
  -d '{"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}}'
```

All conversational exchanges with the AI go through Server-Sent Events, which stream responses back to the client in real time.

#### SSE Handling

| Event | Action |
|-------|--------|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Wait silently, don't forward |
| `heartbeat` / empty `data:` | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |

Typical durations: text 5-15s, video generation 100-300s, editing 10-30s.

**Timeout**: 10 min heartbeats-only → assume timeout. **Never re-send** during generation (duplicates + double-charge). Ignore trailing "I encountered a temporary issue" if prior responses were normal.

#### Silent Response Fallback (CRITICAL)

Roughly 30% of editing operations complete without returning any text in the SSE stream. When no text content is received, do not treat this as an error or prompt the user to retry. Instead, immediately call the state query endpoint (§3.4) to retrieve the updated project status, then confirm success to the user based on the returned data.

**Two-stage generation**: When a raw video is submitted, the backend automatically runs a two-stage enhancement pipeline. Stage one processes the core video output, and stage two appends background music and a title sequence without any additional input required. Wait for both stages to complete before presenting results to the user.

### 3.2 Handling File Uploads

**File upload**: `curl -s -X POST "$API/api/upload-video/nemo_agent/me/<sid>" -H "Authorization: Bearer $TOKEN" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" -F "files=@/path/to/file"`

**URL upload**: `curl -s -X POST "$API/api/upload-video/nemo_agent/me/<sid>" -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" -d '{"urls":["<url>"],"source_type":"url"}'`

Use **me** in the path; backend resolves user from token.

Supported: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac. The upload endpoint accepts video, image, and audio files in these formats.

### 3.3 Checking Available Credits

```bash
curl -s "$API/api/credits/balance/simple" -H "Authorization: Bearer $TOKEN" \
  -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"
# → {"code":0,"data":{"available":XXX,"frozen":XX,"total":XXX}}
```

Query the credits endpoint before initiating any billable operation to confirm the user has a sufficient balance.

### 3.4 Retrieving Current Project State

```bash
curl -s "$API/api/state/nemo_agent/me/<sid>/latest" -H "Authorization: Bearer $TOKEN" \
  -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"
```

Use **me** for user in path; backend resolves from token. Key fields: `data.state.draft`, `data.state.video_infos`, `data.state.canvas_config`, `data.state.generated_media`.

**Draft field mapping**: `t`=tracks, `tt`=track type (0=video, 1=audio, 7=text), `sg`=segments, `d`=duration(ms), `m`=metadata.

**Draft ready for export** when `draft.t` exists with at least one track with non-empty `sg`.

**Track summary format**:

```
Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)
```

### 3.5 Exporting and Delivering the Final Output

**Export does NOT cost credits.** Only generation/editing consumes credits.

To complete delivery, work through the lettered steps below: submit the render job, poll until it completes, download the file from `output.url`, and send it to the user together with the task detail link.

**b)** Submit: `curl -s -X POST "$API/api/render/proxy/lambda" -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" -d '{"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}'`

Note: `sessionId` is **camelCase** (exception). On failure → new `id`, retry once.

**c)** Poll (every 30s, max 10 polls): `curl -s "$API/api/render/proxy/lambda/<id>" -H "Authorization: Bearer $TOKEN" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"`

Status at top-level `status`: pending → processing → completed / failed. Download URL at `output.url`.

**d)** Download from `output.url` → send to user. Fallback: `$API/api/render/proxy/<id>/download`.

**e)** When delivering the video, **always also give the task detail link**: `$WEB/workspace/claim?token=$TOKEN&task={task_id}&session={session_id}&skill_name=$SKILL_NAME&skill_version=$SKILL_VERSION&skill_source=$SKILL_SOURCE`

Progress messages: start "⏳ Rendering ~30s" → "⏳ 50%" → "✅ Video ready!" + file + **task detail link**.

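The polling rule in step c) (every 30s, at most 10 polls, stopping on `completed` or `failed`) could be sketched as the loop below. It is a minimal sketch under stated assumptions: `get_render_status` is a hypothetical hook that would wrap the polling `curl` call and print the top-level `status` field, and `POLL_INTERVAL` and `MAX_POLLS` are names invented for this example.

```shell
# Sketch: poll a render job until it finishes, following the limits in step c).
# get_render_status is a hypothetical hook (not defined here) that prints the
# job's top-level status: pending, processing, completed, or failed.
POLL_INTERVAL="${POLL_INTERVAL:-30}"   # seconds between polls
MAX_POLLS="${MAX_POLLS:-10}"           # give up after this many polls

poll_render() {
  render_id="$1"
  polls=0
  while [ "$polls" -lt "$MAX_POLLS" ]; do
    job_status="$(get_render_status "$render_id")"
    case "$job_status" in
      completed) echo "completed"; return 0 ;;
      failed)    echo "failed";    return 1 ;;
    esac
    polls=$((polls + 1))
    # pending / processing: wait, unless this was the last allowed poll
    if [ "$polls" -lt "$MAX_POLLS" ]; then
      sleep "$POLL_INTERVAL"
    fi
  done
  echo "timeout"
  return 2
}
```

The three return codes let the caller tell a finished render (0) from a failed one (1, where the source says to retry once with a new `id`) and from a job that outlasted the polling budget (2).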
### 3.6 Recovering from an SSE Disconnection

If the SSE connection drops unexpectedly, follow these five steps to recover gracefully:

1. Detect the disconnection event and log the last received event ID.
2. Wait a minimum of two seconds before attempting to reconnect to avoid hammering the server.
3. Re-establish the SSE connection using the stored session ID and pass the last event ID in the reconnect header.
4. If the reconnection fails after three attempts, call the state query endpoint directly to determine current job status.
5. Resume normal operation or notify the user only if the job itself has failed, not merely the connection.

## 4. Mapping Backend Responses to the User Interface

The backend operates under the assumption that a graphical interface is present, so GUI-specific instructions from the backend must never be forwarded verbatim to the user.

| Backend says | You do |
|-------------|--------|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Show state via §3.4 |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute §3.5 |
| "check account/billing" | Check §3.3 |

**Keep** content descriptions. **Strip** GUI actions.

## 5. Recommended Interaction Patterns

- Always confirm a session is active before sending any message or file to the API.
- When a silent response is received, query project state immediately rather than asking the user to repeat their request.
- Present credit balance proactively before any operation that consumes credits, giving the user a chance to cancel.
- After a two-stage processing pipeline completes, summarize both stages in a single, concise status update to avoid overwhelming the user.
- On export completion, provide the download URL alongside a brief description of the output file so the user knows exactly what they are receiving.

## 6. Known Constraints and Limitations

- Subtitle generation is limited to videos no longer than the maximum duration specified in the current plan tier.
- Only one active export job per session is permitted at a time; concurrent exports are not supported.
- The SSE stream does not guarantee ordered delivery during high-load periods, so always reconcile results against the state endpoint.
- Credit balances are cached for up to 60 seconds, meaning very recent transactions may not reflect immediately.
- File uploads are subject to size caps defined by the account tier and cannot be chunked across multiple requests.

## 7. Error Codes and Handling Procedures

The table below lists all error codes the API may return along with the recommended recovery action for each.

| Code | Meaning | Action |
|------|---------|--------|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with `?bind=<id>` (get `<id>` from create-session or state response when needed). Registered: "Top up at nemovideo.ai" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see Environment Variables) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register at nemovideo.ai to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |

**Common**: no video → generate first; render fail → retry new `id`; SSE timeout → §3.6; silent edit → §3.1 fallback.

## 8. API Version and Required Token Scopes

Always verify the API version header in every response to ensure compatibility with the current skill implementation. The access token must include the following scopes for full functionality: read, write, export, and credits.
If any scope is missing, the affected endpoint will return a 403 and the user should be directed to reauthorize the integration.
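
The recovery actions from the error table in §7, together with the 403 case just described, could be collected in one dispatch helper so that every response is handled the same way. This is a sketch only: `recovery_action` is a hypothetical function, and the action names it prints are labels invented for this example, not values returned by the API.

```shell
# Sketch: map an API error code to the recovery action listed in §7 and §8.
# recovery_action is a hypothetical helper; the printed labels are illustrative.
recovery_action() {
  case "$1" in
    0)    echo "continue" ;;
    1001) echo "reauth_anonymous_token" ;;          # bad/expired token
    1002) echo "new_session" ;;                     # session not found, see §3.0
    2001) echo "show_topup_or_registration" ;;      # no credits
    4001) echo "show_supported_formats" ;;          # unsupported file
    4002) echo "suggest_compress_or_trim" ;;        # file too large
    400)  echo "generate_client_id_and_retry" ;;    # missing X-Client-Id
    402)  echo "prompt_registration_for_export" ;;  # plan tier issue, not credits
    403)  echo "reauthorize_integration" ;;         # missing token scope
    429)  echo "retry_once_after_30s" ;;            # rate limit
    *)    echo "unknown_code" ;;
  esac
}
```

Keeping the mapping in one place makes it harder to confuse 402 (a subscription tier issue) with 2001 (exhausted credits), which the table treats as separate cases with separate user messages.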
