# youtube-video-editor-online
## 0. First Contact
When the user opens this skill or sends their first message, **greet them immediately**:
> 📹 Youtube Video Editor Online at your service! Upload a video or tell me what you're looking for.
**Try saying:**
- "make it look cinematic"
- "add a fade-in transition"
- "speed up by 2x"
**IMPORTANT**: Always greet the user proactively on first contact, and let them know you're setting up while connecting.
### Auto-Setup
When the user first interacts, set up the connection:
1. **Check token**: If `NEMO_TOKEN` env var is set, use it. Otherwise:
2. **Read or generate Client-ID**:
- Read `~/.config/nemovideo/client_id` if it exists
- Otherwise generate a UUID, save it to `~/.config/nemovideo/client_id`
3. **Acquire anonymous token**:
```bash
curl -s -X POST "https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token" -H "X-Client-Id: $CLIENT_ID"
```
Store the returned `token` as `NEMO_TOKEN` for this session. You get 100 free credits.
4. **Create a session** (§3.0) so you're ready to work immediately.
Let the user know briefly: "Setting things up… ready!" then proceed with their request.
## Edit YouTube Videos by Just Describing What You Want
Most video editors demand you learn a timeline, keyframes, and export presets before you produce a single frame. The youtube-video-editor-online skill flips that model entirely. You describe the result you want — 'cut the first 8 seconds, add captions in white bold text, and export at 1080p' — and the skill handles the technical execution behind the scenes.
The OpenClaw agent acts as the intelligent coordinator between your natural-language instructions and the underlying processing pipeline. It interprets intent, resolves ambiguities (like asking which segment to keep when you say 'the good take'), and sequences operations in the correct order so your final file is YouTube-ready without manual intervention.
This skill is purpose-built for YouTube workflows specifically: it understands platform conventions like chapter markers, thumbnail-frame extraction, and the 16:9 display standard. Whether you're cleaning up a talking-head vlog, trimming a gaming clip, or assembling a product review from multiple takes, the skill adapts to your content type rather than forcing you into a generic editing template. Accepted input formats include mp4, mov, avi, webm, and mkv.
## 1. Environment Variables
| Variable | Required | Default |
|----------|----------|--------|
| `NEMO_TOKEN` | No | Auto-generated (100 free credits, expires in 7 days, revocable via Settings → API Tokens) |
| `NEMO_API_URL` | No | `https://mega-api-prod.nemovideo.ai` |
| `NEMO_WEB_URL` | No | `https://nemovideo.com` |
| `NEMO_CLIENT_ID` | No | Auto-generated UUID, persisted to `~/.config/nemovideo/client_id` (UUID only, no secrets) |
| `SKILL_SOURCE` | No | Auto-detected from install path, fallback `unknown` |
If `NEMO_TOKEN` is not set, get one (requires `X-Client-Id` header):
```bash
# Generate or read persisted Client-Id
CLIENT_ID="${NEMO_CLIENT_ID:-$(cat ~/.config/nemovideo/client_id 2>/dev/null)}"
if [ -z "$CLIENT_ID" ]; then
CLIENT_ID=$(uuidgen 2>/dev/null || echo "client-$(date +%s)-$RANDOM")
  mkdir -p ~/.config/nemovideo && echo "$CLIENT_ID" > ~/.config/nemovideo/client_id
fi
curl -s -X POST "https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token" -H "X-Client-Id: $CLIENT_ID"
# → {"code":0,"data":{"token":"nmv_usr_xxx","credits":100,...}}
```
Save `token` as `NEMO_TOKEN`, `CLIENT_ID` as `NEMO_CLIENT_ID`. Anonymous: 1 token per client per 7 days; token expires in 7 days and can be revoked at any time via **Settings → API Tokens** on nemovideo.com. If your token expires, request a new one with the same `X-Client-Id`.
**Local persistence:** This skill writes `~/.config/nemovideo/client_id` to persist the Client-Id across sessions. This avoids generating a new ID on every request, which would hit the per-IP rate limit quickly (default 10 tokens per 7 days per IP). The file contains only a UUID — no credentials are stored locally.
## 2. Request Routing Reference
The table below routes each category of user request to the section that handles it and indicates whether the SSE channel (§3.1) can be skipped.
| User says... | Action | Skip SSE |
|-------------|--------|----------|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
## 3. Primary Operation Workflows
$API = `${NEMO_API_URL:-https://mega-api-prod.nemovideo.ai}`, $TOKEN = `${NEMO_TOKEN}`, $WEB = `${NEMO_WEB_URL:-https://nemovideo.com}`.
Skill attribution variables — do NOT hardcode:
- `$SKILL_NAME` = this file's YAML frontmatter `name`
- `$SKILL_VERSION` = this file's YAML frontmatter `version`
- `$SKILL_SOURCE` = detect at runtime using the first match:
1. Env `SKILL_SOURCE` if set
2. If this file's path contains a known platform directory (e.g. `~/.clawhub/` → `clawhub`, `~/.cursor/skills/` → `cursor`)
3. Fallback: `unknown`
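The detection order above can be sketched as a small helper. `detect_skill_source` is a hypothetical function name (not part of the skill API), and the two path-to-platform pairs are just the examples given; extend the `case` as needed:

```shell
# Sketch: resolve $SKILL_SOURCE using the documented priority order.
detect_skill_source() {
  local path="$1"
  # 1. Env SKILL_SOURCE wins if set
  if [ -n "$SKILL_SOURCE" ]; then
    echo "$SKILL_SOURCE"
    return
  fi
  # 2. Known platform directories in this file's install path
  case "$path" in
    *"/.clawhub/"*)        echo "clawhub" ;;
    *"/.cursor/skills/"*)  echo "cursor" ;;
    # 3. Fallback
    *)                     echo "unknown" ;;
  esac
}
```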
**CRITICAL**: ALL API requests (including render/export/upload/state/credits) MUST include these headers. Missing them will cause export to fail with 402.
```
X-Skill-Source: $SKILL_NAME
X-Skill-Version: $SKILL_VERSION
X-Skill-Platform: $SKILL_SOURCE
```
### 3.0 Initialize a New Session
```bash
curl -s -X POST "$API/api/tasks/me/with-session/nemo_agent" \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" \
-d '{"task_name":"project","language":"<lang>"}'
# → {"code":0,"data":{"task_id":"...","session_id":"..."}}
```
Before any editing can begin, a session must be established to obtain a unique session identifier. All subsequent requests within the same editing context must reference this identifier.
**Open in browser**: After creating a session, give the user a link to view/edit the task in NemoVideo:
`$WEB/workspace/claim?task={task_id}&session={session_id}&skill_name=$SKILL_NAME&skill_version=$SKILL_VERSION&skill_source=$SKILL_SOURCE`
### 3.1 Deliver Messages Over SSE
```bash
curl -s -X POST "$API/run_sse" \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-H "Accept: text/event-stream" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" --max-time 900 \
-d '{"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}}'
```
User instructions are transmitted to the backend as SSE requests, which stream the assistant's response back in real time.
#### SSE Handling
| Event | Action |
|-------|--------|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| `heartbeat` / empty `data:` | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
Typical durations: text 5-15s, video generation 100-300s, editing 10-30s.
**Timeout**: 10 min heartbeats-only → assume timeout. **Never re-send** during generation (duplicates + double-charge).
Ignore trailing "I encountered a temporary issue" if prior responses were normal.
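The event rules above can be sketched as a line classifier. `classify_sse_line` is a hypothetical helper, not part of the skill; the exact payload field names (`functionCall`, `functionResponse`) are assumptions and may differ in the real stream:

```shell
# Sketch: classify one raw SSE line into an agent-side action.
classify_sse_line() {
  local line="$1"
  case "$line" in
    "data:"|"data: ")  echo "wait" ;;      # empty keep-alive payload: keep waiting
    *heartbeat*)       echo "wait" ;;      # heartbeat event: keep waiting
    data:*functionCall*|data:*functionResponse*)
                       echo "internal" ;;  # tool traffic: process, don't forward
    data:*)            echo "present" ;;   # text: apply GUI translation (§4), show user
    *)                 echo "ignore" ;;    # comments, event names, blank lines
  esac
}
```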
#### Silent Response Fallback (CRITICAL)
Approximately 30% of editing operations complete without returning any text in the SSE stream. When no text content is received, do not treat this as a failure. Instead: (1) confirm the task was submitted successfully, (2) poll the session state endpoint to verify the edit was applied, (3) retrieve the updated timeline, and (4) summarize the detected changes to the user as confirmation.
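The fallback steps can be sketched as a short polling loop. `fetch_state` is a stub standing in for the real §3.4 state query, and the `grep` check is a deliberately crude stand-in for inspecting the draft; treat this as an outline, not the exact verification logic:

```shell
# Stub — replace with the real §3.4 call:
#   curl -s "$API/api/state/nemo_agent/me/<sid>/latest" -H "Authorization: Bearer $TOKEN" ...
fetch_state() {
  echo '{"code":0,"data":{"state":{"draft":{"t":[{"tt":0,"sg":[{"d":10000}]}]}}}}'
}

# Sketch: after a silent SSE response, poll state until the edit shows up.
verify_edit_applied() {
  local tries=0
  while [ "$tries" -lt 5 ]; do
    # Crude draft-ready check: at least one track with a non-empty segment list
    if fetch_state | grep -q '"sg":\[{'; then
      echo "applied"
      return 0
    fi
    tries=$((tries + 1)); sleep 2
  done
  echo "no change detected"
  return 1
}
```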
**Two-stage generation**: When the user requests a raw video output, the backend automatically queues a second processing stage that appends background music and an intro title. Communicate both stages to the user: first notify them when the raw cut is ready, then notify them again once the enhanced version with BGM and title overlay has been finalized.
### 3.2 Asset Upload Handling
**File upload**: `curl -s -X POST "$API/api/upload-video/nemo_agent/me/<sid>" -H "Authorization: Bearer $TOKEN" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" -F "files=@/path/to/file"`
**URL upload**: `curl -s -X POST "$API/api/upload-video/nemo_agent/me/<sid>" -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" -d '{"urls":["<url>"],"source_type":"url"}'`
Use **me** in the path; backend resolves user from token.
Supported: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
Video, image, and audio assets can all be uploaded directly through the API for use within the editing session.
### 3.3 Credit Balance Verification
```bash
curl -s "$API/api/credits/balance/simple" -H "Authorization: Bearer $TOKEN" \
-H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"
# → {"code":0,"data":{"available":XXX,"frozen":XX,"total":XXX}}
```
Query the credits endpoint to retrieve the user's current balance before initiating any operation that consumes credits.
### 3.4 Retrieve Current Session State
```bash
curl -s "$API/api/state/nemo_agent/me/<sid>/latest" -H "Authorization: Bearer $TOKEN" \
-H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"
```
Use **me** for user in path; backend resolves from token.
Key fields: `data.state.draft`, `data.state.video_infos`, `data.state.canvas_config`, `data.state.generated_media`.
**Draft field mapping**: `t`=tracks, `tt`=track type (0=video, 1=audio, 7=text), `sg`=segments, `d`=duration(ms), `m`=metadata.
**Draft ready for export** when `draft.t` exists with at least one track with non-empty `sg`.
**Track summary format**:
```
Timeline (3 tracks): 1. Video: city timelapse (0-10s) 2. BGM: Lo-fi (0-10s, 35%) 3. Title: "Urban Dreams" (0-3s)
```
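The compact draft fields (`t`, `tt`, `sg`, `d`) can be unpacked into that summary with a short script. A sketch, assuming `jq` is available; the sample draft below is illustrative only, not a real API response:

```shell
# Illustrative draft: one video, one audio, one text track (durations in ms).
DRAFT='{"t":[{"tt":0,"sg":[{"d":10000}]},{"tt":1,"sg":[{"d":10000}]},{"tt":7,"sg":[{"d":3000}]}]}'

# Map tt codes to names (§3.4) and total each track's segment durations.
SUMMARY=$(echo "$DRAFT" | jq -r '
  def kind: if .tt == 0 then "Video" elif .tt == 1 then "Audio"
            elif .tt == 7 then "Text" else "Other" end;
  .t | to_entries[]
  | "\(.key + 1). \(.value | kind): \((.value.sg | map(.d) | add) / 1000)s"')
echo "$SUMMARY"
# → 1. Video: 10s
#   2. Audio: 10s
#   3. Text: 3s
```

The real summary line also names each track's content ("city timelapse", volume, title text), which lives in the segment metadata (`m`) rather than the fields shown here.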
### 3.5 Render, Export, and Deliver the Final Video
**Export does NOT cost credits.** Only generation/editing consumes credits.
The export sequence: (a) call the export endpoint with the desired output settings, (b) receive the render job ID in the response, (c) poll the job status endpoint until the status returns as complete, (d) extract the download URL from the completed job payload, and (e) present the download link to the user along with the file format and resolution details.
**b)** Submit: `curl -s -X POST "$API/api/render/proxy/lambda" -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" -d '{"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}'`
Note: `sessionId` is **camelCase** (exception). On failure → new `id`, retry once.
**c)** Poll (every 30s, max 10 polls): `curl -s "$API/api/render/proxy/lambda/<id>" -H "Authorization: Bearer $TOKEN" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"`
Status at top-level `status`: pending → processing → completed / failed. Download URL at `output.url`.
**d)** Download from `output.url` → send to user. Fallback: `$API/api/render/proxy/<id>/download`.
**e)** When delivering the video, **always also give the task detail link**: `$WEB/workspace/claim?task={task_id}&session={session_id}&skill_name=$SKILL_NAME&skill_version=$SKILL_VERSION&skill_source=$SKILL_SOURCE`
Progress messages: start "⏳ Rendering ~30s" → "⏳ 50%" → "✅ Video ready!" + file + **task detail link**.
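Steps (c) and (d) can be sketched as one poll loop. `fetch_render_job` is a stub standing in for the real poll curl in step (c); the JSON it returns is an assumed shape matching the `status`/`output.url` fields described above:

```shell
# Stub — replace with the real step-(c) curl:
#   curl -s "$API/api/render/proxy/lambda/<id>" -H "Authorization: Bearer $TOKEN" ...
fetch_render_job() {
  echo '{"status":"completed","output":{"url":"https://example.com/out.mp4"}}'
}

# Sketch: poll until completed/failed, print the download URL on success.
poll_render() {
  local polls=0 body status
  while [ "$polls" -lt 10 ]; do
    body=$(fetch_render_job)
    status=$(printf '%s' "$body" | sed -n 's/.*"status":"\([a-z]*\)".*/\1/p')
    case "$status" in
      completed)
        printf '%s\n' "$body" | sed -n 's/.*"url":"\([^"]*\)".*/\1/p'
        return 0 ;;
      failed)
        echo "render failed: retry once with a new id" >&2
        return 1 ;;
    esac
    polls=$((polls + 1)); sleep 30   # pending/processing: wait and poll again
  done
  echo "render timed out after 10 polls" >&2
  return 1
}
```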
### 3.6 Handling SSE Connection Drops
If the SSE stream disconnects before a response is fully received, follow these recovery steps: (1) wait two seconds before attempting any action to allow transient network issues to resolve; (2) re-query the session state endpoint to determine whether the operation completed despite the dropped connection; (3) if the state reflects the expected change, treat the operation as successful and inform the user; (4) if the state shows no change, resubmit the original request using the same session identifier; (5) if a second disconnection occurs, surface an error message to the user and recommend they check their network connection before retrying.
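The five recovery steps above can be sketched as follows. `state_changed` and `resend_request` are placeholder functions standing in for the real §3.4 state comparison and §3.1 re-send; only the control flow is the point here:

```shell
# Stub — replace with a §3.4 state query compared against the expected edit.
state_changed() { return 0; }
# Stub — replace with the original §3.1 SSE request, same session id.
resend_request() { return 0; }

# Sketch: recover from an SSE drop per steps (1)-(5).
recover_from_drop() {
  sleep 2                                    # (1) let transient issues settle
  if state_changed; then                     # (2)+(3) edit landed despite the drop
    echo "operation completed despite drop"
    return 0
  fi
  if resend_request; then                    # (4) one resubmission, same session
    echo "resubmitted"
    return 0
  fi
  echo "second failure: ask the user to check their network" >&2   # (5)
  return 1
}
```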
## 4. Mapping GUI Elements to API Calls
The backend is designed with the assumption that a graphical interface is present, so GUI-specific instructions or references must never be forwarded verbatim in API requests.
| Backend says | You do |
|-------------|--------|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Show state via §3.4 |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute §3.5 |
| "check account/billing" | Check §3.3 |
**Keep** content descriptions. **Strip** GUI actions.
## 5. Recommended Interaction Patterns
- Always confirm the session is active and retrieve the current timeline state before processing a new user instruction.
- When an SSE response returns empty, rely on state polling rather than assuming the operation failed.
- Translate natural-language editing requests into precise API parameters; avoid passing conversational phrasing directly to endpoints.
- After each successful edit, proactively summarize what changed in the timeline so the user stays informed without needing to ask.
- For multi-step edits, chain operations sequentially and confirm each step before proceeding to the next to prevent conflicting state changes.
## 6. Known Constraints and Limitations
- Real-time collaborative editing by multiple users within a single session is not supported.
- The maximum allowable video file size for upload is capped; files exceeding this limit will be rejected by the API.
- Certain advanced effects and transitions are only accessible to accounts with a qualifying subscription tier.
- Undo operations are limited to a fixed number of steps within a session and cannot be performed after an export has been initiated.
- Audio track replacement and BGM customization are restricted to the formats and codecs specified in the API documentation.
## 7. Error Response Handling
The table below lists expected error codes, their meanings, and the recommended recovery action for each.
| Code | Meaning | Action |
|------|---------|--------|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with `?bind=<id>` (get `<id>` from create-session or state response when needed). Registered: "Top up at nemovideo.com" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register at nemovideo.com to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
**Common**: no video → generate first; render fail → retry new `id`; SSE timeout → §3.6; silent edit → §3.1 fallback.
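The table's recovery actions can be sketched as a single dispatch. `handle_error_code` is an illustrative helper, and its messages paraphrase the "Action" column rather than being official API strings:

```shell
# Sketch: map an API error code to the recommended recovery action.
handle_error_code() {
  case "$1" in
    0)    echo "ok" ;;
    1001) echo "re-auth: request a new anonymous token" ;;
    1002) echo "create a new session (§3.0)" ;;
    2001) echo "out of credits: offer registration link" ;;
    4001) echo "unsupported file: list supported formats" ;;
    4002) echo "file too large: suggest compressing or trimming" ;;
    400)  echo "missing X-Client-Id: generate one and retry" ;;
    402)  echo "export blocked on free plan: suggest registering" ;;
    429)  echo "rate limited: retry once in 30s" ;;
    *)    echo "unknown code $1" ;;
  esac
}
```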
## 8. API Version and Required Token Scopes
Before establishing a session, verify that the API version returned by the version check endpoint matches the version this skill was built against. If a mismatch is detected, surface a warning to the user rather than proceeding silently. The API token provided must include all required scopes for the operations being performed; missing scopes will result in 403 responses that cannot be resolved by retrying.
**Tags:** skill, ai