ai-image-to-video-generator
## 0. First Contact
When the user opens this skill or sends their first message, **greet them immediately**:
> 📱 Ready to image to video generator! Just send me a video or describe your project.
**Try saying:**
- "help me create a short video"
- "add effects to this clip"
- "edit my video"
**IMPORTANT**: Do NOT wait silently. Always greet the user proactively on first contact.
### Auto-Setup (do this SILENTLY before anything else)
Before responding to the user, complete these steps invisibly:
1. **Check token**: If `NEMO_TOKEN` env var is set, use it. Otherwise:
2. **Read or generate Client-ID**:
- Read `~/.config/nemovideo/client_id` if it exists
- Otherwise generate a UUID, save it to `~/.config/nemovideo/client_id`
3. **Acquire anonymous token**:
```bash
curl -s -X POST "$API/api/auth/anonymous-token" -H "X-Client-Id: $CLIENT_ID"
```
Store the returned `token` as `NEMO_TOKEN` for this session. You get 100 free credits.
4. **Create a session** (§3.0) so you're ready to work immediately.
**CRITICAL**: Never mention tokens, authentication, API setup, or Client-IDs to the user. The user should feel like the skill "just works".
# Turn Static Images Into Cinematic Motion Stories
Still images carry meaning, but video captures attention. This skill bridges that gap by letting you describe exactly how you want your image to move — whether that's a gentle parallax drift, a dramatic zoom, flowing particle effects, or scene transitions — and delivering a rendered video that matches your creative intent.
The experience is fully conversational. Rather than wrestling with timelines, keyframes, or motion curves, you simply describe what you want in plain language. The OpenClaw agent interprets your instructions, communicates with the underlying AI video synthesis backend, and iterates with you until the result feels right. You can refine motion speed, direction, loop behavior, and visual tone across multiple turns in the same session.
Under the hood, the AI backend analyzes depth, subject boundaries, and visual composition within your source image to generate plausible, coherent motion that respects the original scene. The result is not a slideshow or a pan-and-scan effect — it is a genuinely animated video derived from a single frame. Final exports are available in mp4, mov, avi, webm, and mkv formats to fit any platform or publishing workflow.
### Environment Variables
| Variable | Required | Default |
|----------|----------|--------|
| `NEMO_TOKEN` | No | Auto-generated (100 free credits, expires in 7 days, revocable via Settings → API Tokens) |
| `NEMO_API_URL` | No | `https://mega-api-prod.nemovideo.ai` |
| `NEMO_WEB_URL` | No | `https://nemovideo.com` |
| `NEMO_CLIENT_ID` | No | Auto-generated UUID, persisted to `~/.config/nemovideo/client_id` (UUID only, no secrets) |
| `SKILL_SOURCE` | No | Auto-detected from install path, fallback `unknown` |
If `NEMO_TOKEN` is not set, get one (requires `X-Client-Id` header):
```bash
# Generate or read persisted Client-Id
CLIENT_ID="${NEMO_CLIENT_ID:-$(cat ~/.config/nemovideo/client_id 2>/dev/null)}"
if [ -z "$CLIENT_ID" ]; then
CLIENT_ID=$(uuidgen 2>/dev/null || echo "client-$(date +%s)-$RANDOM")
mkdir -p ~/.config/nemovideo && echo "$CLIENT_ID" > ~/.config/nemovideo/client_id
fi
curl -s -X POST "$API/api/auth/anonymous-token" -H "X-Client-Id: $CLIENT_ID"
# → {"code":0,"data":{"token":"nmv_usr_xxx","credits":100,...}}
```
Save `token` as `NEMO_TOKEN`, `CLIENT_ID` as `NEMO_CLIENT_ID`. Anonymous: 1 token per client per 7 days; token expires in 7 days and can be revoked at any time via **Settings → API Tokens** on nemovideo.com. If your token expires, request a new one with the same `X-Client-Id`.
**Local persistence:** This skill writes `~/.config/nemovideo/client_id` to persist the Client-Id across sessions. This avoids generating a new ID on every request, which would hit the per-IP rate limit quickly (default 10 tokens per 7 days per IP). The file contains only a UUID — no credentials are stored locally.
## 2. Routing Incoming Requests
Use the following table to determine which endpoint handles each type of incoming request.
| User says... | Action | Skip SSE? |
|-------------|--------|----------|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
## 3. Primary Operation Flows
$API = `${NEMO_API_URL:-https://mega-api-prod.nemovideo.ai}`, $TOKEN = `${NEMO_TOKEN}`, $WEB = `${NEMO_WEB_URL:-https://nemovideo.com}`.
Skill attribution variables — do NOT hardcode:
- `$SKILL_NAME` = this file's YAML frontmatter `name`
- `$SKILL_VERSION` = this file's YAML frontmatter `version`
- `$SKILL_SOURCE` = detect at runtime using the first match:
1. Env `SKILL_SOURCE` if set
2. If this file's path contains a known platform directory (e.g. `~/.clawhub/` → `clawhub`, `~/.cursor/skills/` → `cursor`)
3. Fallback: `unknown`
**CRITICAL**: ALL API requests (including render/export/upload/state/credits) MUST include these headers. Missing them will cause export to fail with 402.
```
X-Skill-Source: $SKILL_NAME
X-Skill-Version: $SKILL_VERSION
X-Skill-Platform: $SKILL_SOURCE
```
### 3.0 Establishing a Session
```bash
curl -s -X POST "$API/api/tasks/me/with-session/nemo_agent" \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" \
-d '{"task_name":"project","language":"<lang>"}'
# → {"code":0,"data":{"task_id":"...","session_id":"..."}}
```
A session must be initialized before any other operations can proceed. This creates the context that all subsequent requests will be tied to.
**Open in browser**: After creating a session, give the user a link to view/edit the task in NemoVideo:
`$WEB/workspace/claim?token=$TOKEN&task={task_id}&session={session_id}&skill_name=$SKILL_NAME&skill_version=$SKILL_VERSION&skill_source=$SKILL_SOURCE`
### 3.1 Delivering Messages via SSE
```bash
curl -s -X POST "$API/run_sse" \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-H "Accept: text/event-stream" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" --max-time 900 \
-d '{"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}}'
```
All conversational messages are transmitted to the backend through a Server-Sent Events connection.
#### SSE Handling
| Event | Action |
|-------|--------|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Wait silently, don't forward |
| `heartbeat` / empty `data:` | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
Typical durations: text 5-15s, video generation 100-300s, editing 10-30s.
**Timeout**: 10 min heartbeats-only → assume timeout. **Never re-send** during generation (duplicates + double-charge).
Ignore trailing "I encountered a temporary issue" if prior responses were normal.
#### Silent Response Fallback (CRITICAL)
Approximately 30% of edit operations return no visible text in the response. When this occurs: (1) do not treat the absence of text as a failure, (2) poll the task state endpoint to confirm processing is underway, (3) once the task reaches a completed state, proceed directly to the export step, and (4) inform the user that their edit is being processed without alarming them about the lack of a text reply.
**Two-stage generation**: After the raw video is produced, the backend automatically initiates a second processing stage that layers in background music and a title sequence. Do not treat the first completed video as the final deliverable — wait for both stages to finish before presenting the result to the user.
### 3.2 Handling File Uploads
**File upload**: `curl -s -X POST "$API/api/upload-video/nemo_agent/me/<sid>" -H "Authorization: Bearer $TOKEN" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" -F "files=@/path/to/file"`
**URL upload**: `curl -s -X POST "$API/api/upload-video/nemo_agent/me/<sid>" -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" -d '{"urls":["<url>"],"source_type":"url"}'`
Use **me** in the path; backend resolves user from token.
Supported: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
The upload endpoint accepts image and video files that will serve as source material for generation tasks.
### 3.3 Checking Available Credits
```bash
curl -s "$API/api/credits/balance/simple" -H "Authorization: Bearer $TOKEN" \
-H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"
# → {"code":0,"data":{"available":XXX,"frozen":XX,"total":XXX}}
```
Query the credits endpoint before initiating any generation task to confirm the user has a sufficient balance.
### 3.4 Polling Task Status
```bash
curl -s "$API/api/state/nemo_agent/me/<sid>/latest" -H "Authorization: Bearer $TOKEN" \
-H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"
```
Use **me** for user in path; backend resolves from token.
Key fields: `data.state.draft`, `data.state.video_infos`, `data.state.canvas_config`, `data.state.generated_media`.
**Draft field mapping**: `t`=tracks, `tt`=track type (0=video, 1=audio, 7=text), `sg`=segments, `d`=duration(ms), `m`=metadata.
**Draft ready for export** when `draft.t` exists with at least one track with non-empty `sg`.
**Track summary format**:
```
Timeline (3 tracks): 1. Video: city timelapse (0-10s) 2. BGM: Lo-fi (0-10s, 35%) 3. Title: "Urban Dreams" (0-3s)
```
### 3.5 Exporting and Delivering the Final Asset
**Export does NOT cost credits.** Only generation/editing consumes credits.
Triggering an export does not deduct any credits from the user's balance. To deliver the finished asset: (a) call the export endpoint once the task is confirmed complete, (b) retrieve the download URL from the response, (c) verify the URL is accessible, (d) present the link or embed the asset directly in the chat, and (e) confirm successful delivery to the user.
**b)** Submit: `curl -s -X POST "$API/api/render/proxy/lambda" -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" -d '{"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}'`
Note: `sessionId` is **camelCase** (exception). On failure → new `id`, retry once.
**c)** Poll (every 30s, max 10 polls): `curl -s "$API/api/render/proxy/lambda/<id>" -H "Authorization: Bearer $TOKEN" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"`
Status at top-level `status`: pending → processing → completed / failed. Download URL at `output.url`.
**d)** Download from `output.url` → send to user. Fallback: `$API/api/render/proxy/<id>/download`.
**e)** When delivering the video, **always also give the task detail link**: `$WEB/workspace/claim?token=$TOKEN&task={task_id}&session={session_id}&skill_name=$SKILL_NAME&skill_version=$SKILL_VERSION&skill_source=$SKILL_SOURCE`
Progress messages: start "⏳ Rendering ~30s" → "⏳ 50%" → "✅ Video ready!" + file + **task detail link**.
### 3.6 Recovering from an SSE Disconnection
If the SSE stream drops unexpectedly, follow these steps to recover: (1) detect the disconnection event and log it internally without surfacing an error to the user prematurely, (2) attempt to re-establish the SSE connection using the existing session ID, (3) if reconnection succeeds, resume listening for task progress events from where the stream left off, (4) if reconnection fails after the maximum number of retries, fall back to polling the task state endpoint at a regular interval, and (5) once a terminal task state is confirmed, proceed with the export flow as normal.
## 4. Translating GUI Elements
The backend operates under the assumption that a graphical interface is present, so GUI-specific instructions must never be passed through directly to the user.
| Backend says | You do |
|-------------|--------|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Show state via §3.4 |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute §3.5 |
| "check account/billing" | Check §3.3 |
**Keep** content descriptions. **Strip** GUI actions.
## 5. Recommended Interaction Patterns
• Acknowledge the user's request immediately and set clear expectations about processing time before the generation task begins.
• Provide incremental progress updates during long-running tasks so users remain informed without needing to ask.
• When a task completes, always surface the final exported asset rather than an intermediate result.
• If the user submits an ambiguous prompt, ask a single focused clarifying question rather than making assumptions.
• After delivering the finished video, invite the user to request edits or refinements to keep the conversation moving forward.
## 6. Known Limitations
• Generation tasks can take several minutes to complete; real-time delivery is not possible.
• The system does not support more than one concurrent generation task per session.
• Source images must meet minimum resolution requirements or the upload will be rejected.
• Background music and title overlays are applied automatically and cannot be individually disabled through the API.
• Credit balances are read-only via the API; top-ups must be handled through the platform's billing interface.
## 7. Error Handling Reference
The table below maps common error codes to their likely causes and the recommended recovery action for each.
| Code | Meaning | Action |
|------|---------|--------|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with `?bind=<id>` (get `<id>` from create-session or state response when needed). Registered: "Top up at nemovideo.ai" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register at nemovideo.ai to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
**Common**: no video → generate first; render fail → retry new `id`; SSE timeout → §3.6; silent edit → §3.1 fallback.
## 8. API Version and Required Scopes
Always verify that the API version header matches the version documented in this skill before making requests, as older versions may not support all endpoints described here. The access token provided at session creation must include the required scopes for generation, upload, export, and credits reading; requests made with tokens missing any of these scopes will return a 403 response.
标签
skill
ai