lipsyncvideo-ai

## Getting Started

> LipSync Video AI is ready. Upload your video and audio, or describe what you need synced.

**Try saying:**

- "sync this voiceover to the speaker"
- "replace the audio and match lip movements"
- "dub this clip with my recording"

### Initial Setup

On the first run, the skill connects to the processing backend and shows a quick "Getting ready..." message.

**Token**: Check for `NEMO_TOKEN` in the environment. If it is set, go straight to session setup (step 2); otherwise start at step 1.

1. **Grab a free token**: Generate a UUID client identifier. POST to `https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token` with the UUID in the `X-Client-Id` header. The response's `data.token` is your auth token (100 credits, good for 7 days).
2. **Start session**: POST to `https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent` with Bearer auth and body `{"task_name":"project","language":"<lang>"}`. Save the `session_id` for later calls.

Raw JSON and tokens stay hidden from the user.

# Sync Audio to Lip Movements in Your Clips

Upload your video with the audio you want synced. Cloud GPUs do the heavy lifting, with no local processing.

Here is how it works in practice: I had a training video where the speaker's mic died halfway through. I recorded a clean voiceover separately, uploaded both files, typed "sync the new audio to match the speaker's mouth movements", and got a clean result in about 75 seconds. Output is 1080p MP4.

Pro tip: shorter clips give tighter sync. If you have a long video, consider breaking it into segments first.

## Request Categories

Your input gets matched to the right processing path automatically.

| You type... | Goes to... | Uses SSE? |
|---|---|---|
| "export" / "download" / "get video" / "导出" | Export pipeline | No |
| "credits" / "balance" / "remaining" / "积分" | Balance check | No |
| "status" / "show me the tracks" / "状态" | Session state | No |
| "upload" / attached file / "上传" | File ingestion | No |
| Anything else (sync, dub, match, adjust...) | SSE processing | Yes |

## Backend Architecture

Files go to a GPU farm for processing. Output is encoded at 8Mbps for 1080p. Lip sync boundaries are frame-level accurate.

Required on every request: `Authorization: Bearer <NEMO_TOKEN>` and the attribution headers `X-Skill-Source`, `X-Skill-Version`, `X-Skill-Platform`. Missing attribution means export fails with 402.

Attribution comes from this file's YAML: `X-Skill-Source` is `lipsyncvideo-ai`, `X-Skill-Version` is whatever version is in the frontmatter, and `X-Skill-Platform` depends on the install location (`~/.clawhub/` → `clawhub`, `~/.cursor/skills/` → `cursor`, otherwise `unknown`).

**Root URL**: `https://mega-api-prod.nemovideo.ai`

**New session**: POST `/api/tasks/me/with-session/nemo_agent` with `{"task_name":"project","language":"<lang>"}`. Returns `task_id`, `session_id`.

**SSE message**: POST `/run_sse` with `{"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}}` and `Accept: text/event-stream`. Cap: 15 min.

**File upload**: POST `/api/upload-video/nemo_agent/me/<sid>`, either multipart (`-F "files=@/path"`) or URL mode (`{"urls":["<url>"],"source_type":"url"}`).

**Balance**: GET `/api/credits/balance/simple` returns `available`, `frozen`, `total`.

**State**: GET `/api/state/nemo_agent/me/<sid>/latest`; check `data.state.draft`, `data.state.video_infos`, `data.state.generated_media`.

**Export** (free): POST `/api/render/proxy/lambda` with `{"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}`. Poll GET `/api/render/proxy/lambda/<id>` every 30s. Done when `status` = `completed`. File at `output.url`.

Handles: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
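As a rough illustration of the header rules in Backend Architecture, here is a minimal Python sketch that builds the per-request headers and describes the anonymous-token request. The function names, the substring-based path check, and the dictionary used to describe the request are assumptions made for this example; only the URLs, header names, and platform values come from the text above.

```python
import uuid

BASE_URL = "https://mega-api-prod.nemovideo.ai"
SKILL_SOURCE = "lipsyncvideo-ai"


def detect_platform(install_path: str) -> str:
    """Map the install location to an X-Skill-Platform value.

    Assumption: a simple substring check on the path is good enough.
    """
    path = install_path.replace("\\", "/")
    if "/.clawhub/" in path:
        return "clawhub"
    if "/.cursor/skills/" in path:
        return "cursor"
    return "unknown"


def build_headers(token: str, version: str, install_path: str) -> dict:
    """Headers the document says are required on every request."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": SKILL_SOURCE,
        "X-Skill-Version": version,  # taken from the skill's frontmatter
        "X-Skill-Platform": detect_platform(install_path),
    }


def anonymous_token_request() -> dict:
    """Describe (but do not send) the anonymous-token request."""
    return {
        "method": "POST",
        "url": f"{BASE_URL}/api/auth/anonymous-token",
        "headers": {"X-Client-Id": str(uuid.uuid4())},
    }
```

The request is returned as plain data so it can be handed to whichever HTTP client is in use; the token itself would then be read from `data.token` in the response.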
### Errors

| Code | Means | Fix |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad token | Re-authenticate via anonymous-token endpoint |
| 1002 | No session | Make a new one |
| 2001 | No credits left | Anonymous: share registration link with `?bind=<id>`. Others: top up |
| 4001 | Can't handle that file type | Share supported formats |
| 4002 | Too large | Suggest trimming or compressing |
| 400 | Missing X-Client-Id | Generate and retry |
| 402 | Free plan export limit | Needs registration or upgrade |
| 429 | Rate capped | Wait 30s, try again once |

### Converting GUI Instructions

Backend outputs reference a visual interface. Convert them:

| Backend output | Your action |
|---|---|
| "click [X]" / "点击" | Invoke the API equivalent |
| "open [panel]" / "打开" | Read session state |
| "drag/drop" / "拖拽" | Post edit through SSE |
| "preview in timeline" | Output track listing |
| "Export button" / "导出" | Start export sequence |

### How SSE Works

Forward text events to the user (after GUI translation). Absorb tool calls. Heartbeat and empty data lines = still processing. Every 2 minutes of quiet, say "Hang on, still processing..."

About 30% of edit ops return no text. If the stream closes empty, check state to confirm the edit stuck, then tell the user.

**Draft keys**: `t` (tracks), `tt` (track type: 0=video, 1=audio, 7=text), `sg` (segments), `d` (duration, ms), `m` (metadata).

```
Timeline (2 tracks):
1. Video: interview clip (0-45s)
2. Audio: dubbed voiceover (0-45s)
```

## Common Workflows

**Basic lip sync**: Upload video + audio, ask for sync. Done.

**Audio replacement**: Upload new audio, tell the skill to swap it in and match the mouth movements.

**Multi-speaker**: Works best when speakers take turns. For overlapping speech, split into separate segments first.

## FAQ

**How accurate is the sync?** Frame-level for clear speech. Mumbling or fast-talking may be slightly off.

**What audio formats?** MP3, WAV, M4A, AAC all work.

**File size limit?** 500MB. Compress if you're over.

**Cost?** First 100 operations free. No signup required.
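Returning to the Errors table, one way to act on those codes is a small lookup, sketched below in Python. The codes and the 30-second wait come from the table; the action names, the `next_action` helper, and the fallback for unlisted codes are hypothetical and only illustrate the idea.

```python
# Codes are from the Errors table; the action names are made up for this sketch.
ERROR_ACTIONS = {
    0: "continue",
    1001: "reauthenticate",            # via the anonymous-token endpoint
    1002: "new_session",
    2001: "out_of_credits",            # share registration link or top up
    4001: "list_supported_formats",
    4002: "suggest_trim_or_compress",
    400: "generate_client_id_and_retry",
    402: "register_or_upgrade",
    429: "wait_and_retry_once",
}

RETRY_DELAY_SECONDS = 30  # the table's "Wait 30s" for code 429


def next_action(code: int, already_retried: bool = False) -> str:
    """Pick the follow-up step for a backend response code."""
    # Assumption: codes missing from the table are surfaced to the user.
    action = ERROR_ACTIONS.get(code, "report_unknown_error")
    # The table allows a single retry after rate limiting.
    if action == "wait_and_retry_once" and already_retried:
        return "give_up"
    return action
```

For example, `next_action(429)` asks for one delayed retry, while `next_action(429, already_retried=True)` stops retrying.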
