youtube-video-clip-maker
## 0. First Contact
When the user opens this skill or sends their first message, **greet them immediately**:
> 🎥 Ready to make YouTube video clips! Just send me a video or describe your project.
**Try saying:**
- "help me create a short video"
- "edit my video"
- "add effects to this clip"
**IMPORTANT**: Always greet the user proactively on first contact, and let them know you're setting up while connecting.
### Auto-Setup
When the user first interacts, set up the connection:
1. **Check token**: If the `NEMO_TOKEN` env var is set, use it and skip to step 4. Otherwise, continue with steps 2 and 3:
2. **Read or generate Client-ID**:
- Read `~/.config/nemovideo/client_id` if it exists
- Otherwise generate a UUID, save it to `~/.config/nemovideo/client_id`
3. **Acquire anonymous token**:
```bash
curl -s -X POST "https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token" -H "X-Client-Id: $CLIENT_ID"
```
Store the returned `token` as `NEMO_TOKEN` for this session. The anonymous token comes with 100 free credits.
4. **Create a session** (§3.0) so you're ready to work immediately.
Let the user know briefly: "Setting things up… ready!" then proceed with their request.
# Cut the Right Moments Without Touching a Timeline
Most video editors demand you already know exactly where your clip lives — scrub to the timestamp, drag the handles, export, repeat. The youtube-video-clip-maker skill flips that workflow entirely. Instead of hunting through footage manually, you describe the segment you want: a topic, a speaker's remark, an action sequence, or a rough time range. The skill interprets your intent and isolates the relevant portion of the video for you.
Under the hood, this skill is powered by an OpenClaw agent that coordinates between a video analysis layer and a frame-accurate trimming engine. The OpenClaw agent handles the conversational back-and-forth — asking clarifying questions when a clip boundary is ambiguous, confirming output duration, and managing format conversion so the final file matches your target platform or upload spec.
This makes the skill especially useful for YouTube creators who regularly need to produce Shorts from existing long-form uploads, or for social media teams that repurpose webinar recordings into digestible clips. Because the interaction is conversational rather than interface-driven, you can refine a clip multiple times in the same session without re-uploading or starting over. Input files can be mp4, mov, avi, webm, or mkv.
### Environment Variables
| Variable | Required | Default |
|----------|----------|--------|
| `NEMO_TOKEN` | No | Auto-generated (100 free credits, expires in 7 days, revocable via Settings → API Tokens) |
| `NEMO_API_URL` | No | `https://mega-api-prod.nemovideo.ai` |
| `NEMO_WEB_URL` | No | `https://nemovideo.com` |
| `NEMO_CLIENT_ID` | No | Auto-generated UUID, persisted to `~/.config/nemovideo/client_id` (UUID only, no secrets) |
| `SKILL_SOURCE` | No | Auto-detected from install path, fallback `unknown` |
If `NEMO_TOKEN` is not set, get one (requires `X-Client-Id` header):
```bash
# Generate or read persisted Client-Id
CLIENT_ID="${NEMO_CLIENT_ID:-$(cat ~/.config/nemovideo/client_id 2>/dev/null)}"
if [ -z "$CLIENT_ID" ]; then
CLIENT_ID=$(uuidgen 2>/dev/null || echo "client-$(date +%s)-$RANDOM")
mkdir -p ~/.config/nemovideo && echo "$CLIENT_ID" > ~/.config/nemovideo/client_id
fi
curl -s -X POST "https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token" -H "X-Client-Id: $CLIENT_ID"
# → {"code":0,"data":{"token":"nmv_usr_xxx","credits":100,...}}
```
Save `token` as `NEMO_TOKEN`, `CLIENT_ID` as `NEMO_CLIENT_ID`. Anonymous: 1 token per client per 7 days; token expires in 7 days and can be revoked at any time via **Settings → API Tokens** on nemovideo.com. If your token expires, request a new one with the same `X-Client-Id`.
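Saving the token means pulling it out of the JSON response first. The following is a minimal sketch that does this without `jq`, assuming the compact response shape shown above; `extract_token` is a hypothetical helper name, not part of the API.

```bash
# Sketch: extract data.token from the anonymous-token response.
# Assumes the response is a single line of JSON with one "token" field.
extract_token() {
  # $1 = raw JSON response; prints the token value, or nothing if absent.
  printf '%s' "$1" \
    | sed -n 's/.*"token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' \
    | head -n 1
}

# Usage (network call, shown for illustration only):
#   RESP=$(curl -s -X POST "https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token" -H "X-Client-Id: $CLIENT_ID")
#   export NEMO_TOKEN="$(extract_token "$RESP")" NEMO_CLIENT_ID="$CLIENT_ID"
```

If the function prints nothing, treat the request as failed rather than exporting an empty `NEMO_TOKEN`.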
**Local persistence:** This skill writes `~/.config/nemovideo/client_id` to persist the Client-Id across sessions. This avoids generating a new ID on every request, which would hit the per-IP rate limit quickly (default 10 tokens per 7 days per IP). The file contains only a UUID — no credentials are stored locally.
## 2. Routing Incoming Requests to the Correct Endpoint
Use the table below to determine which API endpoint should handle each type of incoming user request.
| User says... | Action | Skip SSE |
|-------------|--------|----------|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
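The routing table above can be sketched as a keyword match. This is an illustrative approximation only: `route_request` is a hypothetical helper, it uses simple substring matching, and it cannot detect the "user sends file" case, which has to be handled by whatever receives attachments.

```bash
# Sketch: map a user message to a route name from the table above.
# Prints one of: export | credits | state | upload | sse
route_request() {
  local msg
  # Lowercase ASCII letters only, leaving multi-byte characters untouched.
  msg=$(printf '%s' "$1" | LC_ALL=C tr 'A-Z' 'a-z')
  case "$msg" in
    *export*|*导出*|*download*|*"send me the video"*) echo export ;;
    *credits*|*积分*|*balance*|*余额*)                echo credits ;;
    *status*|*状态*|*"show tracks"*)                  echo state ;;
    *upload*|*上传*)                                  echo upload ;;
    *)                                                echo sse ;;
  esac
}
```

Only the `sse` route opens a stream (§3.1); the other four are plain request/response calls.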
## 3. Primary Workflow Sequences
$API = `${NEMO_API_URL:-https://mega-api-prod.nemovideo.ai}`, $TOKEN = `${NEMO_TOKEN}`, $WEB = `${NEMO_WEB_URL:-https://nemovideo.com}`.
Skill attribution variables — do NOT hardcode:
- `$SKILL_NAME` = this file's YAML frontmatter `name`
- `$SKILL_VERSION` = this file's YAML frontmatter `version`
- `$SKILL_SOURCE` = detect at runtime using the first match:
1. Env `SKILL_SOURCE` if set
2. If this file's path contains a known platform directory (e.g. `~/.clawhub/` → `clawhub`, `~/.cursor/skills/` → `cursor`)
3. Fallback: `unknown`
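The three-step lookup for `$SKILL_SOURCE` might look like the sketch below. `detect_skill_source` is a hypothetical helper, and only the two platform directories named above are matched; other platforms would need their own patterns.

```bash
# Sketch: resolve SKILL_SOURCE using the first match.
# $1 = absolute path of this skill file.
detect_skill_source() {
  # 1. An explicit env override wins.
  if [ -n "${SKILL_SOURCE:-}" ]; then
    echo "$SKILL_SOURCE"
    return 0
  fi
  # 2. Known platform directories in the install path.
  case "$1" in
    */.clawhub/*)       echo clawhub ;;
    */.cursor/skills/*) echo cursor ;;
    # 3. Fallback.
    *)                  echo unknown ;;
  esac
}
```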
**CRITICAL**: ALL API requests (including render/export/upload/state/credits) MUST include these headers. Missing them will cause export to fail with 402.
```
X-Skill-Source: $SKILL_NAME
X-Skill-Version: $SKILL_VERSION
X-Skill-Platform: $SKILL_SOURCE
```
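Because every request needs the same headers, one way to avoid forgetting them is a thin wrapper around `curl`. The sketch below assumes `NEMO_TOKEN`, `SKILL_NAME`, `SKILL_VERSION`, and `SKILL_SOURCE` are already set; `nemo_curl` is a hypothetical name.

```bash
# Sketch: call curl with the auth and attribution headers always attached.
# Any extra arguments (method, URL, body) are passed through unchanged.
nemo_curl() {
  curl -s \
    -H "Authorization: Bearer $NEMO_TOKEN" \
    -H "X-Skill-Source: $SKILL_NAME" \
    -H "X-Skill-Version: $SKILL_VERSION" \
    -H "X-Skill-Platform: $SKILL_SOURCE" \
    "$@"
}

# Usage (network call, shown for illustration only):
#   nemo_curl "$API/api/credits/balance/simple"
```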
### 3.0 Initializing a New Session
```bash
curl -s -X POST "https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent" \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" \
-d '{"task_name":"project","language":"<lang>"}'
# → {"code":0,"data":{"task_id":"...","session_id":"..."}}
```
A session must be established with the API before any other operation. Store the returned `task_id` and `session_id` immediately: every subsequent request depends on the session, and the browser link below needs both values.
**Open in browser**: After creating a session, give the user a link to view/edit the task in NemoVideo:
`$WEB/workspace/claim?task={task_id}&session={session_id}&skill_name=$SKILL_NAME&skill_version=$SKILL_VERSION&skill_source=$SKILL_SOURCE`
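As a rough illustration, the link can be assembled from the two IDs in the create-session response. `build_claim_url` is a hypothetical helper; it inserts values verbatim and does no URL encoding, which is assumed to be safe for the ID and version formats in use.

```bash
# Sketch: build the workspace link from task_id ($1) and session_id ($2).
# Assumes SKILL_NAME, SKILL_VERSION and SKILL_SOURCE are already set.
build_claim_url() {
  local web
  web="${NEMO_WEB_URL:-https://nemovideo.com}"
  printf '%s/workspace/claim?task=%s&session=%s&skill_name=%s&skill_version=%s&skill_source=%s\n' \
    "$web" "$1" "$2" "$SKILL_NAME" "$SKILL_VERSION" "$SKILL_SOURCE"
}
```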
### 3.1 Delivering Messages Over an SSE Connection
```bash
curl -s -X POST "https://mega-api-prod.nemovideo.ai/run_sse" \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-H "Accept: text/event-stream" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" --max-time 900 \
-d '{"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}}'
```
All conversational messages to the backend are transmitted through a persistent Server-Sent Events channel.
#### SSE Handling
| Event | Action |
|-------|--------|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| `heartbeat` / empty `data:` | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
Typical durations: text 5-15s, video generation 100-300s, editing 10-30s.
**Timeout**: 10 min heartbeats-only → assume timeout. **Never re-send** during generation (duplicates + double-charge).
Ignore trailing "I encountered a temporary issue" if prior responses were normal.
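The handling table can be approximated by a small stream filter. This sketch assumes standard SSE framing (`event:` and `data:` lines, blank line between events); the exact event names and payload shapes the backend emits are not documented here, so the `heartbeat` match is an assumption, and `filter_sse` is a hypothetical name.

```bash
# Sketch: read an SSE stream on stdin and print only payloads that need handling.
# Heartbeats and empty data: lines are dropped (the caller keeps waiting).
filter_sse() {
  local event line payload
  event=""
  tr -d '\r' | while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      "")      event="" ;;                          # blank line ends one event
      event:*) event="${line#event:}"; event="${event# }" ;;
      data:*)
        payload="${line#data:}"; payload="${payload# }"
        [ -z "$payload" ] && continue               # empty data: keep waiting
        [ "$event" = "heartbeat" ] && continue      # heartbeat: keep waiting
        printf '%s\n' "$payload"
        ;;
    esac
  done
}

# Usage (network call, shown for illustration only):
#   curl -sN -X POST "$API/run_sse" ... | filter_sse
```

Separating text responses from tool calls still requires inspecting each printed payload; that step is left out because the payload schema is not specified in this document.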
#### Silent Response Fallback (CRITICAL)
Approximately 30% of editing operations complete without returning any text in the response body. When this occurs: (1) do not treat the empty response as a failure, (2) immediately call the state query endpoint to confirm job status, (3) retrieve the output URL from the state payload, (4) present the result to the user as a successfully completed edit.
**Two-stage generation**: After delivering the raw edited video, the backend automatically triggers a second processing stage that overlays background music and renders a title card. Treat this as a two-phase pipeline: Phase 1 produces the undecorated clip, and Phase 2 — initiated without any additional input from the AI — produces the fully decorated final output.
### 3.2 Handling File Uploads
**File upload**: `curl -s -X POST "https://mega-api-prod.nemovideo.ai/api/upload-video/nemo_agent/me/<sid>" -H "Authorization: Bearer $TOKEN" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" -F "files=@/path/to/file"`
**URL upload**: `curl -s -X POST "https://mega-api-prod.nemovideo.ai/api/upload-video/nemo_agent/me/<sid>" -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" -d '{"urls":["<url>"],"source_type":"url"}'`
Use **me** in the path; backend resolves user from token.
Supported: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
The API accepts direct binary file uploads for source video material submitted by the user.
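A quick local check against the supported list can save a failed upload (error 4001). The sketch below looks only at the file extension, not the actual container format; `is_supported_file` is a hypothetical helper.

```bash
# Sketch: succeed (exit 0) if the file extension is in the supported list.
is_supported_file() {
  local ext
  case "$1" in
    *.*) ;;            # has an extension, continue
    *)   return 1 ;;   # no extension at all
  esac
  ext=$(printf '%s' "${1##*.}" | LC_ALL=C tr 'A-Z' 'a-z')
  case "$ext" in
    mp4|mov|avi|webm|mkv|jpg|png|gif|webp|mp3|wav|m4a|aac) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: is_supported_file "$FILE" || echo "Unsupported format"
```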
### 3.3 Retrieving Credit Balance
```bash
curl -s "https://mega-api-prod.nemovideo.ai/api/credits/balance/simple" -H "Authorization: Bearer $TOKEN" \
-H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"
# → {"code":0,"data":{"available":XXX,"frozen":XX,"total":XXX}}
```
Query the credits endpoint before beginning any edit operation to confirm the user has a sufficient balance to proceed.
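The pre-flight check might be sketched as follows. It assumes `available` is an integer in the compact response shown above; the minimum passed as the second argument is a caller-chosen value, since per-operation costs are not specified here. Both function names are hypothetical.

```bash
# Sketch: read data.available from the balance response ($1 = raw JSON).
available_credits() {
  printf '%s' "$1" \
    | sed -n 's/.*"available"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' \
    | head -n 1
}

# Succeeds if the available balance is at least $2 credits.
has_enough_credits() {
  local avail
  avail=$(available_credits "$1")
  [ -n "$avail" ] && [ "$avail" -ge "$2" ]
}

# Usage (network call, shown for illustration only):
#   BAL=$(curl -s "$API/api/credits/balance/simple" -H "Authorization: Bearer $TOKEN" ...)
#   has_enough_credits "$BAL" 1 || echo "Not enough credits"
```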
### 3.4 Polling Current Job State
```bash
curl -s "https://mega-api-prod.nemovideo.ai/api/state/nemo_agent/me/<sid>/latest" -H "Authorization: Bearer $TOKEN" \
-H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"
```
Use **me** for user in path; backend resolves from token.
Key fields: `data.state.draft`, `data.state.video_infos`, `data.state.canvas_config`, `data.state.generated_media`.
**Draft field mapping**: `t`=tracks, `tt`=track type (0=video, 1=audio, 7=text), `sg`=segments, `d`=duration(ms), `m`=metadata.
**Draft ready for export** when `draft.t` exists with at least one track with non-empty `sg`.
**Track summary format**:
```
Timeline (3 tracks): 1. Video: city timelapse (0-10s) 2. BGM: Lo-fi (0-10s, 35%) 3. Title: "Urban Dreams" (0-3s)
```
### 3.5 Triggering Export and Delivering the Final File
**Export does NOT cost credits.** Only generation/editing consumes credits.
To complete delivery: (a) confirm via §3.4 that the draft is ready for export, (b) submit the render job, (c) poll the job until its `status` is `completed`, (d) download the file from `output.url`, verifying the URL is reachable before surfacing it, and (e) present the result to the user with a clear call to action and the task detail link.
**b)** Submit: `curl -s -X POST "https://mega-api-prod.nemovideo.ai/api/render/proxy/lambda" -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" -d '{"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}'`
Note: `sessionId` is **camelCase** (exception). On failure → new `id`, retry once.
**c)** Poll (every 30s, max 10 polls): `curl -s "https://mega-api-prod.nemovideo.ai/api/render/proxy/lambda/<id>" -H "Authorization: Bearer $TOKEN" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"`
Status at top-level `status`: pending → processing → completed / failed. Download URL at `output.url`.
**d)** Download from `output.url` → send to user. Fallback: `https://mega-api-prod.nemovideo.ai/api/render/proxy/<id>/download`.
**e)** When delivering the video, **always also give the task detail link**: `$WEB/workspace/claim?task={task_id}&session={session_id}&skill_name=$SKILL_NAME&skill_version=$SKILL_VERSION&skill_source=$SKILL_SOURCE`
Progress messages: start "⏳ Rendering ~30s" → "⏳ 50%" → "✅ Video ready!" + file + **task detail link**.
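The poll step (every 30s, at most 10 polls) can be sketched as a bounded loop. To keep the example testable offline, the status lookup is passed in as a command name that prints the current status; `poll_render` and that calling convention are assumptions of this sketch, not part of the API.

```bash
# Sketch: poll until the render reaches a terminal state.
# $1 = command that prints the current status (pending|processing|completed|failed)
# $2 = seconds between polls (default 30), $3 = max polls (default 10)
# Prints the outcome; exit code 0 = completed, 1 = failed, 2 = gave up.
poll_render() {
  local status_cmd interval max i status
  status_cmd=$1
  interval=${2:-30}
  max=${3:-10}
  i=0
  while [ "$i" -lt "$max" ]; do
    status=$("$status_cmd")
    case "$status" in
      completed) echo completed; return 0 ;;
      failed)    echo failed;    return 1 ;;
    esac
    i=$((i + 1))
    # Sleep only if another poll will follow.
    [ "$i" -lt "$max" ] && sleep "$interval"
  done
  echo timeout
  return 2
}
```

In real use the status command would wrap the poll `curl` call and extract the top-level `status` field. On `failed`, submit once more with a new `id` as described above.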
### 3.6 Recovering from an SSE Disconnection
If the SSE stream drops unexpectedly, follow these five steps:
1. Wait 2 seconds before attempting any recovery action to avoid a thundering-herd retry loop.
2. Re-establish the SSE connection using the original session identifier; do not create a new session.
3. Poll the state endpoint once to determine whether the in-flight job completed during the disconnection window.
4. If the job shows a completed status, retrieve and deliver the output URL to the user without resubmitting the edit request.
5. Only if the job status is absent or shows a hard failure should you prompt the user to resubmit their request.
## 4. Translating Backend GUI References for the User
The backend is designed around a graphical interface and will occasionally reference UI controls or screen elements — never relay these GUI-specific instructions directly to the user.
| Backend says | You do |
|-------------|--------|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Show state via §3.4 |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute §3.5 |
| "check account/billing" | Check §3.3 |
**Keep** content descriptions. **Strip** GUI actions.
## 5. Recommended Conversational Interaction Patterns
- Always confirm the desired trim points (start time and end time) before submitting any edit request, rather than making assumptions about intent.
- After submitting a job, set the user's expectation that processing may take a moment and that you will report back once the status resolves.
- When a silent response is received, transition directly to polling without alerting the user to the technical detail.
- Present the final export URL using plain, action-oriented language such as "Your clip is ready. Here is the download link" rather than exposing raw API output.
- If a credit balance is insufficient, explain the limitation clearly and suggest the user top up before retrying.
## 6. Known Constraints and Limitations
- A single session cannot process more than one edit job concurrently; queue additional requests until the active job reaches a terminal state.
- Source video files must not exceed the documented maximum file size; advise users to compress or trim source material if the limit is approached.
- Background music and title overlays applied during Phase 2 are selected automatically by the backend and cannot be customized through the API at this time.
- Export URLs are time-limited and will expire after the documented TTL window; instruct users to download their clip promptly.
- The API does not support real-time preview generation; the user will only see output after the full processing pipeline completes.
## 7. Error Recognition and Response Guidance
The table below maps API error codes to their probable causes and the recommended recovery action the AI should take or communicate to the user.
| Code | Meaning | Action |
|------|---------|--------|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with `?bind=<id>` (get `<id>` from create-session or state response when needed). Registered: "Top up at nemovideo.ai" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see Environment Variables) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register at nemovideo.ai to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
**Common**: no video → generate first; render fail → retry new `id`; SSE timeout → §3.6; silent edit → §3.1 fallback.
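For dispatching on the `code` field, the table above can be condensed into a lookup. The action labels printed here are hypothetical names chosen for this sketch; only the code-to-meaning mapping comes from the table.

```bash
# Sketch: map an API/HTTP error code to a short action label.
action_for_code() {
  case "$1" in
    0)    echo continue ;;
    1001) echo reauth ;;              # bad or expired token
    1002) echo new_session ;;         # session not found, see §3.0
    2001) echo no_credits ;;
    4001) echo show_formats ;;
    4002) echo suggest_compress ;;
    400)  echo generate_client_id ;;  # missing X-Client-Id
    402)  echo register_to_export ;;  # subscription tier, not credits
    429)  echo retry_after_30s ;;     # retry once only
    *)    echo unknown ;;
  esac
}
```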
## 8. API Version Compatibility and Required Token Scopes
Before integrating, confirm that the API version declared in the response headers matches the version this skill was authored against; a mismatch may cause undocumented behavior. The OAuth token supplied with every request must include all scopes listed in the skill manifest — at minimum the read, write, and export scopes — or the API will return a 403 on privileged operations.