podcast-clip-maker
# From Full Episode to Highlight Reel in Minutes
Podcast episodes are packed with quotable moments, strong arguments, and emotional peaks — but finding and clipping them by hand is a time sink that most creators can't afford. The podcast-clip-maker skill was built specifically to solve this problem for audio-video podcast content, not generic video files.
When you submit a podcast recording, the OpenClaw agent analyzes the transcript and audio energy patterns together, pinpointing segments where speaker engagement is highest — think punchy one-liners, surprising revelations, or heated debates. It then trims those segments with clean in and out points, optionally adds burned-in captions, and packages each clip at the right aspect ratio for the platform you're targeting.
Under the hood, the AI backend combines speech-to-text alignment with prosody scoring to rank candidate clips by shareability before a single frame is exported. This means you're not just getting random cuts — you're getting the moments most likely to make someone stop scrolling. The conversational editing model lets you refine results through follow-up instructions, so you stay in control of tone, length, and branding without ever touching a timeline editor.
### Environment Variables
| Variable | Required | Default |
|----------|----------|--------|
| `NEMO_TOKEN` | No | Auto-generated (100 free credits, expires in 7 days, revocable via Settings → API Tokens) |
| `NEMO_API_URL` | No | `https://mega-api-prod.nemovideo.ai` |
| `NEMO_WEB_URL` | No | `https://nemovideo.com` |
| `NEMO_CLIENT_ID` | No | Auto-generated UUID, persisted to `~/.config/nemovideo/client_id` (UUID only, no secrets) |
| `SKILL_SOURCE` | No | Auto-detected from install path, fallback `unknown` |
If `NEMO_TOKEN` is not set, get one (requires `X-Client-Id` header):
```bash
# Generate or read persisted Client-Id
CLIENT_ID="${NEMO_CLIENT_ID:-$(cat ~/.config/nemovideo/client_id 2>/dev/null)}"
if [ -z "$CLIENT_ID" ]; then
CLIENT_ID=$(uuidgen 2>/dev/null || echo "client-$(date +%s)-$RANDOM")
mkdir -p ~/.config/nemovideo & echo "$CLIENT_ID" > ~/.config/nemovideo/client_id
fi
curl -s -X POST "$API/api/auth/anonymous-token" -H "X-Client-Id: $CLIENT_ID"
# → {"code":0,"data":{"token":"nmv_usr_xxx","credits":100,...}}
```
Save `token` as `NEMO_TOKEN`, `CLIENT_ID` as `NEMO_CLIENT_ID`. Anonymous: 1 token per client per 7 days; token expires in 7 days and can be revoked at any time via **Settings → API Tokens** on nemovideo.com. If your token expires, request a new one with the same `X-Client-Id`.
**Local persistence:** This skill writes `~/.config/nemovideo/client_id` to persist the Client-Id across sessions. This avoids generating a new ID on every request, which would hit the per-IP rate limit quickly (default 10 tokens per 7 days per IP). The file contains only a UUID — no credentials are stored locally.
## 2. Routing Incoming Requests
Match each user intent to the appropriate endpoint using the table below.
| User says... | Action | Skip SSE |
|-------------|--------|----------|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
## 3. Primary Operation Flows
$API = `${NEMO_API_URL:-https://mega-api-prod.nemovideo.ai}`, $TOKEN = `${NEMO_TOKEN}`, $WEB = `${NEMO_WEB_URL:-https://nemovideo.com}`.
Skill attribution variables — do NOT hardcode:
- `$SKILL_NAME` = this file's YAML frontmatter `name`
- `$SKILL_VERSION` = this file's YAML frontmatter `version`
- `$SKILL_SOURCE` = detect at runtime using the first match:
1. Env `SKILL_SOURCE` if set
2. If this file's path contains a known platform directory (e.g. `~/.clawhub/` → `clawhub`, `~/.cursor/skills/` → `cursor`)
3. Fallback: `unknown`
**CRITICAL**: ALL API requests (including render/export/upload/state/credits) MUST include these headers. Missing them will cause export to fail with 402.
```
X-Skill-Source: $SKILL_NAME
X-Skill-Version: $SKILL_VERSION
X-Skill-Platform: $SKILL_SOURCE
```
### 3.0 Initializing a Session
```bash
curl -s -X POST "$API/api/tasks/me/with-session/nemo_agent" \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" \
-d '{"task_name":"project","language":"<lang>"}'
# → {"code":0,"data":{"task_id":"...","session_id":"..."}}
```
A session must be established before any other operations can proceed. The session ID returned here is required for all subsequent requests.
**Open in browser**: After creating a session, give the user a link to view/edit the task in NemoVideo:
`$WEB/workspace/claim?task={task_id}&session={session_id}&skill_name=$SKILL_NAME&skill_version=$SKILL_VERSION&skill_source=$SKILL_SOURCE`
### 3.1 Delivering Messages Over SSE
```bash
curl -s -X POST "$API/run_sse" \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-H "Accept: text/event-stream" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" --max-time 900 \
-d '{"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}}'
```
All conversational exchanges with the backend are transmitted through a Server-Sent Events stream.
#### SSE Handling
| Event | Action |
|-------|--------|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Wait silently, don't forward |
| `heartbeat` / empty `data:` | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
Typical durations: text 5-15s, video generation 100-300s, editing 10-30s.
**Timeout**: 10 min heartbeats-only → assume timeout. **Never re-send** during generation (duplicates + double-charge).
Ignore trailing "I encountered a temporary issue" if prior responses were normal.
#### Silent Response Fallback (CRITICAL)
Approximately 30% of edit operations return no visible text in the stream. When this occurs: first, wait for the SSE stream to close naturally; then call the state endpoint to check job status; next, poll at 5-second intervals until a terminal status is reached; finally, surface the result to the user only after a completed state is confirmed.
**Two-stage generation**: The backend automatically runs a two-stage pipeline after raw video is produced. Stage one delivers the unprocessed video clip. Stage two, triggered automatically, overlays background music and attaches a generated title. Do not prompt the user for these additions — they are applied without any additional input.
### 3.2 Handling File Uploads
**File upload**: `curl -s -X POST "$API/api/upload-video/nemo_agent/me/<sid>" -H "Authorization: Bearer $TOKEN" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" -F "files=@/path/to/file"`
**URL upload**: `curl -s -X POST "$API/api/upload-video/nemo_agent/me/<sid>" -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" -d '{"urls":["<url>"],"source_type":"url"}'`
Use **me** in the path; backend resolves user from token.
Supported: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
Both local files and remote URLs are accepted for uploading source podcast media.
### 3.3 Checking Available Credits
```bash
curl -s "$API/api/credits/balance/simple" -H "Authorization: Bearer $TOKEN" \
-H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"
# → {"code":0,"data":{"available":XXX,"frozen":XX,"total":XXX}}
```
Query the credits endpoint before initiating any processing job to confirm the user has a sufficient balance.
### 3.4 Polling Job Status
```bash
curl -s "$API/api/state/nemo_agent/me/<sid>/latest" -H "Authorization: Bearer $TOKEN" \
-H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"
```
Use **me** for user in path; backend resolves from token.
Key fields: `data.state.draft`, `data.state.video_infos`, `data.state.canvas_config`, `data.state.generated_media`.
**Draft field mapping**: `t`=tracks, `tt`=track type (0=video, 1=audio, 7=text), `sg`=segments, `d`=duration(ms), `m`=metadata.
**Draft ready for export** when `draft.t` exists with at least one track with non-empty `sg`.
**Track summary format**:
```
Timeline (3 tracks): 1. Video: city timelapse (0-10s) 2. BGM: Lo-fi (0-10s, 35%) 3. Title: "Urban Dreams" (0-3s)
```
### 3.5 Exporting and Delivering the Final Clip
**Export does NOT cost credits.** Only generation/editing consumes credits.
Exporting a finished clip consumes no credits. The delivery sequence is: (a) confirm the job has reached a completed state; (b) call the export endpoint with the job ID; (c) receive the signed download URL from the response; (d) present the URL to the user with a clear download prompt; (e) optionally include clip metadata such as duration and title in the same message.
**b)** Submit: `curl -s -X POST "$API/api/render/proxy/lambda" -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" -d '{"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}'`
Note: `sessionId` is **camelCase** (exception). On failure → new `id`, retry once.
**c)** Poll (every 30s, max 10 polls): `curl -s "$API/api/render/proxy/lambda/<id>" -H "Authorization: Bearer $TOKEN" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"`
Status at top-level `status`: pending → processing → completed / failed. Download URL at `output.url`.
**d)** Download from `output.url` → send to user. Fallback: `$API/api/render/proxy/<id>/download`.
**e)** When delivering the video, **always also give the task detail link**: `$WEB/workspace/claim?task={task_id}&session={session_id}&skill_name=$SKILL_NAME&skill_version=$SKILL_VERSION&skill_source=$SKILL_SOURCE`
Progress messages: start "⏳ Rendering ~30s" → "⏳ 50%" → "✅ Video ready!" + file + **task detail link**.
### 3.6 Recovering From an SSE Disconnection
If the SSE connection drops unexpectedly, follow these steps: (1) Record the last event ID received before the disconnect. (2) Attempt to reconnect to the SSE endpoint immediately, passing the last event ID in the request header. (3) If reconnection fails within 10 seconds, fall back to polling the state endpoint. (4) Continue polling at 5-second intervals until a terminal status is returned. (5) Resume normal delivery flow once a completed or failed state is confirmed.
## 4. Translating GUI Behavior
The backend is built with a graphical interface in mind, so never pass GUI-specific instructions or button labels through to the user.
| Backend says | You do |
|-------------|--------|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Show state via §3.4 |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute §3.5 |
| "check account/billing" | Check §3.3 |
**Keep** content descriptions. **Strip** GUI actions.
## 5. Recommended Interaction Patterns
• Confirm the upload has succeeded and the job has been queued before telling the user that processing has started.
• When a silent response occurs, communicate progress proactively rather than leaving the user without feedback.
• Always present credit balance information before starting an operation that will consume credits.
• After export, deliver the download link alongside a brief summary of the clip's key details.
• If a job reaches a failed state, explain the issue in plain language and suggest a concrete next step.
## 6. Known Limitations
• A single session cannot process more than one job concurrently; queue additional requests until the active job finishes.
• Source audio or video files must not exceed the documented maximum file size; advise users to compress large files before uploading.
• Background music and title overlays are applied automatically and cannot be disabled or customized through this interface.
• SSE streams may time out on long-running jobs; the silent response fallback and polling logic must always be implemented.
• Credit balances are read at the time of the check and may not reflect simultaneous usage from other active sessions.
## 7. Error Handling Reference
Use the table below to map HTTP status codes and API error responses to appropriate user-facing messages and retry behaviors.
| Code | Meaning | Action |
|------|---------|--------|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with `?bind=<id>` (get `<id>` from create-session or state response when needed). Registered: "Top up at nemovideo.ai" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register at nemovideo.ai to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
**Common**: no video → generate first; render fail → retry new `id`; SSE timeout → §3.6; silent edit → §3.1 fallback.
## 8. API Version and Token Scopes
Before making any requests, verify that the API version in the base URL matches the version this skill was built against. Token scopes must include read access for session and state endpoints, and write access for upload, message, and export endpoints. If a 403 is returned, the most likely cause is a missing or insufficient scope on the access token rather than an invalid credential.
标签
skill
ai