grok-ai-video-generator

## 0. First Contact When the user opens this skill or sends their first message, **greet them immediately**: > 📹 Ready to grok ai video generator! Just send me a video or describe your project. **Try saying:** - "add effects to this clip" - "edit my video" - "help me create a short video" **IMPORTANT**: Always greet the user proactively on first contact. Let them know you're setting up while connecting. Always greet the user proactively on first contact. ### Auto-Setup When the user first interacts, set up the connection: 1. **Check token**: If `NEMO_TOKEN` env var is set, use it. Otherwise: 2. **Read or generate Client-ID**: - Read `~/.config/nemovideo/client_id` if it exists - Otherwise generate a UUID, save it to `~/.config/nemovideo/client_id` 3. **Acquire anonymous token**: ```bash curl -s -X POST "https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token" -H "X-Client-Id: $CLIENT_ID" ``` Store the returned `token` as `NEMO_TOKEN` for this session. You get 100 free credits. 4. **Create a session** (§3.0) so you're ready to work immediately. Let the user know briefly: "Setting things up… ready!" then proceed with their request. # Conversational Video Creation Powered by Grok Most video tools demand you learn their interface before you create anything useful. The grok-ai-video-generator skill flips that assumption entirely — you describe what you want in plain language, and the system figures out how to build it. Whether you're assembling a product walkthrough, generating a short narrative clip, or iterating on an existing cut, the workflow stays conversational from start to finish. Under the hood, the OpenClaw agent acts as the orchestration layer between your instructions and Grok's multimodal reasoning capabilities. It interprets your intent, breaks down complex video requests into discrete generation and editing steps, and manages the sequencing so outputs arrive coherent and on-spec. The agent doesn't just pass prompts — it tracks context across your session, meaning follow-up instructions like 'make that transition slower' or 'swap the background music' land correctly without you re-explaining the full project. This skill is particularly effective for iterative work — rough cuts that need rapid refinement, social content that requires multiple format variants, or scripted sequences where timing and visual phrasing matter. Because the pipeline handles mp4, mov, avi, webm, and mkv natively, you're never forced to convert files before or after generation. ### Environment Variables | Variable | Required | Default | |----------|----------|--------| | `NEMO_TOKEN` | No | Auto-generated (100 free credits, expires in 7 days, revocable via Settings → API Tokens) | | `NEMO_API_URL` | No | `https://mega-api-prod.nemovideo.ai` | | `NEMO_WEB_URL` | No | `https://nemovideo.com` | | `NEMO_CLIENT_ID` | No | Auto-generated UUID, persisted to `~/.config/nemovideo/client_id` (UUID only, no secrets) | | `SKILL_SOURCE` | No | Auto-detected from install path, fallback `unknown` | If `NEMO_TOKEN` is not set, get one (requires `X-Client-Id` header): ```bash # Generate or read persisted Client-Id CLIENT_ID="${NEMO_CLIENT_ID:-$(cat ~/.config/nemovideo/client_id 2>/dev/null)}" if [ -z "$CLIENT_ID" ]; then CLIENT_ID=$(uuidgen 2>/dev/null || echo "client-$(date +%s)-$RANDOM") mkdir -p ~/.config/nemovideo & echo "$CLIENT_ID" > ~/.config/nemovideo/client_id fi curl -s -X POST "https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token" -H "X-Client-Id: $CLIENT_ID" # → {"code":0,"data":{"token":"nmv_usr_xxx","credits":100,...}} ``` Save `token` as `NEMO_TOKEN`, `CLIENT_ID` as `NEMO_CLIENT_ID`. Anonymous: 1 token per client per 7 days; token expires in 7 days and can be revoked at any time via **Settings → API Tokens** on nemovideo.com. If your token expires, request a new one with the same `X-Client-Id`. **Local persistence:** This skill writes `~/.config/nemovideo/client_id` to persist the Client-Id across sessions. This avoids generating a new ID on every request, which would hit the per-IP rate limit quickly (default 10 tokens per 7 days per IP). The file contains only a UUID — no credentials are stored locally. ## 2. Routing Logic — Matching User Requests to the Correct Endpoint Use the table below to determine which API endpoint to invoke based on the nature of the incoming user request. | User says... | Action | Skip SSE | |-------------|--------|----------| | "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ | | "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ | | "status" / "状态" / "show tracks" | → §3.4 State | ✅ | | "upload" / "上传" / user sends file | → §3.2 Upload | ✅ | | Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ | ## 3. Primary Operation Flows $API = `${NEMO_API_URL:-https://mega-api-prod.nemovideo.ai}`, $TOKEN = `${NEMO_TOKEN}`, $WEB = `${NEMO_WEB_URL:-https://nemovideo.com}`. Skill attribution variables — do NOT hardcode: - `$SKILL_NAME` = this file's YAML frontmatter `name` - `$SKILL_VERSION` = this file's YAML frontmatter `version` - `$SKILL_SOURCE` = detect at runtime using the first match: 1. Env `SKILL_SOURCE` if set 2. If this file's path contains a known platform directory (e.g. `~/.clawhub/` → `clawhub`, `~/.cursor/skills/` → `cursor`) 3. Fallback: `unknown` **CRITICAL**: ALL API requests (including render/export/upload/state/credits) MUST include these headers. Missing them will cause export to fail with 402. ``` X-Skill-Source: $SKILL_NAME X-Skill-Version: $SKILL_VERSION X-Skill-Platform: $SKILL_SOURCE ``` ### 3.0 Initialize a Session ```bash curl -s -X POST "https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent" \ -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \ -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" \ -d '{"task_name":"project","language":"<lang>"}' # → {"code":0,"data":{"task_id":"...","session_id":"..."}} ``` A session must be established before any other operations can proceed. Each session maintains its own context and state throughout the conversation lifecycle. **Open in browser**: After creating a session, give the user a link to view/edit the task in NemoVideo: `$WEB/workspace/claim?task={task_id}&session={session_id}&skill_name=$SKILL_NAME&skill_version=$SKILL_VERSION&skill_source=$SKILL_SOURCE` ### 3.1 Deliver Messages Over SSE ```bash curl -s -X POST "https://mega-api-prod.nemovideo.ai/run_sse" \ -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \ -H "Accept: text/event-stream" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" --max-time 900 \ -d '{"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}}' ``` All conversational messages and generation requests are transmitted to the backend via a persistent Server-Sent Events connection. #### SSE Handling | Event | Action | |-------|--------| | Text response | Apply GUI translation (§4), present to user | | Tool call/result | Wait silently, don't forward | | `heartbeat` / empty `data:` | Keep waiting. Every 2 min: "⏳ Still working..." | | Stream closes | Process final response | Typical durations: text 5-15s, video generation 100-300s, editing 10-30s. **Timeout**: 10 min heartbeats-only → assume timeout. **Never re-send** during generation (duplicates + double-charge). Ignore trailing "I encountered a temporary issue" if prior responses were normal. #### Silent Response Fallback (CRITICAL) Approximately 30% of edit operations complete without returning any text in the SSE stream. When no textual response is detected after the stream closes, do not treat this as a failure. Instead: (1) poll the task state endpoint to confirm completion, (2) retrieve the updated video artifact directly, (3) confirm success to the user based on the retrieved asset rather than the absent text response. **Two-stage generation**: After a raw video is produced, the backend automatically initiates a second processing stage that layers in background music and a title sequence. This means every generation produces two distinct output events — the initial raw clip followed by the fully composed final version. Both stages must be tracked before reporting completion to the user. ### 3.2 Asset Upload **File upload**: `curl -s -X POST "https://mega-api-prod.nemovideo.ai/api/upload-video/nemo_agent/me/<sid>" -H "Authorization: Bearer $TOKEN" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" -F "files=@/path/to/file"` **URL upload**: `curl -s -X POST "https://mega-api-prod.nemovideo.ai/api/upload-video/nemo_agent/me/<sid>" -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" -d '{"urls":["<url>"],"source_type":"url"}'` Use **me** in the path; backend resolves user from token. Supported: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac. The upload endpoint accepts user-provided media files that can be referenced as source material within subsequent generation or editing requests. ### 3.3 Credit Balance Check ```bash curl -s "https://mega-api-prod.nemovideo.ai/api/credits/balance/simple" -H "Authorization: Bearer $TOKEN" \ -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" # → {"code":0,"data":{"available":XXX,"frozen":XX,"total":XXX}} ``` Query the credits endpoint to retrieve the user's current balance before initiating any operation that consumes credits. ### 3.4 Task Status Polling ```bash curl -s "https://mega-api-prod.nemovideo.ai/api/state/nemo_agent/me/<sid>/latest" -H "Authorization: Bearer $TOKEN" \ -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" ``` Use **me** for user in path; backend resolves from token. Key fields: `data.state.draft`, `data.state.video_infos`, `data.state.canvas_config`, `data.state.generated_media`. **Draft field mapping**: `t`=tracks, `tt`=track type (0=video, 1=audio, 7=text), `sg`=segments, `d`=duration(ms), `m`=metadata. **Draft ready for export** when `draft.t` exists with at least one track with non-empty `sg`. **Track summary format**: ``` Timeline (3 tracks): 1. Video: city timelapse (0-10s) 2. BGM: Lo-fi (0-10s, 35%) 3. Title: "Urban Dreams" (0-3s) ``` ### 3.5 Export and Deliver the Final Asset **Export does NOT cost credits.** Only generation/editing consumes credits. Exporting a completed video does not deduct credits from the user's balance. The export flow proceeds as follows: (a) confirm the task has reached a completed state, (b) call the export endpoint with the relevant task identifier, (c) await the export job confirmation, (d) retrieve the download URL from the export response, (e) present the URL to the user as the deliverable. **b)** Submit: `curl -s -X POST "https://mega-api-prod.nemovideo.ai/api/render/proxy/lambda" -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" -d '{"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}'` Note: `sessionId` is **camelCase** (exception). On failure → new `id`, retry once. **c)** Poll (every 30s, max 10 polls): `curl -s "https://mega-api-prod.nemovideo.ai/api/render/proxy/lambda/<id>" -H "Authorization: Bearer $TOKEN" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"` Status at top-level `status`: pending → processing → completed / failed. Download URL at `output.url`. **d)** Download from `output.url` → send to user. Fallback: `https://mega-api-prod.nemovideo.ai/api/render/proxy/<id>/download`. **e)** When delivering the video, **always also give the task detail link**: `$WEB/workspace/claim?task={task_id}&session={session_id}&skill_name=$SKILL_NAME&skill_version=$SKILL_VERSION&skill_source=$SKILL_SOURCE` Progress messages: start "⏳ Rendering ~30s" → "⏳ 50%" → "✅ Video ready!" + file + **task detail link**. ### 3.6 Handling SSE Connection Drops If the SSE stream disconnects before a final result is received, execute the following recovery sequence: (1) record the task ID that was active at the time of disconnection; (2) wait a minimum of five seconds before attempting any follow-up request; (3) call the task state endpoint using the stored task ID to determine current progress; (4) if the task is still running, continue polling at a reasonable interval until a terminal state is reached; (5) once a completed or failed state is confirmed, resume the normal delivery or error-handling flow accordingly. ## 4. Backend GUI Abstraction Layer The backend operates under the assumption that all interactions occur through a graphical interface, so GUI-specific instructions must never be forwarded verbatim to the API. | Backend says | You do | |-------------|--------| | "click [button]" / "点击" | Execute via API | | "open [panel]" / "打开" | Show state via §3.4 | | "drag/drop" / "拖拽" | Send edit via SSE | | "preview in timeline" | Show track summary | | "Export button" / "导出" | Execute §3.5 | | "check account/billing" | Check §3.3 | **Keep** content descriptions. **Strip** GUI actions. ## 5. Recommended Interaction Patterns • Always confirm the user's intent before invoking a credit-consuming operation, summarizing what will be generated or modified. • After each generation completes, present a concise summary of what was produced along with any available preview or download link. • When a task is running, proactively communicate progress using polled state data rather than leaving the user without feedback. • If a user request is ambiguous, ask one clarifying question before proceeding rather than making assumptions that could waste credits. • Surface credit balance information proactively when the remaining balance is low or when the requested operation has a high cost. ## 6. Known Constraints and Limitations • Video generation tasks are asynchronous and may take several minutes; synchronous responses should never be assumed. • Editing operations have no guaranteed text response in the SSE stream; silent completions are expected and must be handled gracefully. • A single session context cannot be shared across multiple concurrent generation tasks. • Uploaded assets are subject to format and size restrictions defined by the API; unsupported files will be rejected at the upload stage. • Credit deductions are applied at the point of task initiation and are non-refundable for tasks that reach a processing state. ## 7. Error Recognition and Response Handling Refer to the table below to identify the appropriate recovery action for each error code or failure condition returned by the API. | Code | Meaning | Action | |------|---------|--------| | 0 | Success | Continue | | 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) | | 1002 | Session not found | New session §3.0 | | 2001 | No credits | Anonymous: show registration URL with `?bind=<id>` (get `<id>` from create-session or state response when needed). Registered: "Top up at nemovideo.ai" | | 4001 | Unsupported file | Show supported formats | | 4002 | File too large | Suggest compress/trim | | 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) | | 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register at nemovideo.ai to unlock export." | | 429 | Rate limit (1 token/client/7 days) | Retry in 30s once | **Common**: no video → generate first; render fail → retry new `id`; SSE timeout → §3.6; silent edit → §3.1 fallback. ## 8. API Version and Required Token Scopes Before establishing a session, verify that the API version in use matches the version this skill was built against; a mismatch may cause undocumented behavior. The OAuth token supplied with each request must include all scopes required for the intended operations — at minimum, read access for status polling and write access for generation, editing, upload, and export endpoints. Tokens missing required scopes will result in authorization errors that cannot be resolved at runtime without re-authenticating.

grok-ai-video-generator

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载 Zip 包

grok-ai-video-generator