leonardo-ai

## Getting Started > This skill connects directly to Leonardo AI so you can generate images right here in the chat. Drop your prompt and let's get something made. **Try saying:** - "Generate a portrait in oil painting style" - "Create 4 logo concept variations now" - "Turn my sketch into realistic art" ### Getting Connected Before handling any user request, establish a connection to the backend API. Show a brief status like "Connecting...". **If `NEMO_TOKEN` is in the environment**, use it directly and create a session. Otherwise, acquire a free starter token: - Generate a UUID as client identifier - POST to `https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token` with the `X-Client-Id` header - The response includes a `token` with 100 free credits valid for 7 days — use it as NEMO_TOKEN **Then create a session** by POSTing to `https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent` with Bearer authorization and body `{"task_name":"project","language":"en"}`. The `session_id` in the response is needed for all following requests. Tell the user you're ready. Keep the technical details out of the chat. # Turn Text Prompts Into Generated Images Fast Say you're building a pitch deck and need a fantasy cityscape at dusk — you type that description here, pick a model like Leonardo Diffusion XL, and get a 1536x1024 PNG back in roughly 20 seconds. No fussing with sliders on a separate website. You can also pass in a reference image and ask for a variation. The skill sends your base image plus the style prompt to Leonardo AI and returns up to 4 variations in a single batch. It's not just for art. Product designers use it to mock up packaging at 512x512 before committing to a real shoot, and that alone cuts early-stage review cycles down to one afternoon instead of three days. ## Routing Prompts To Actions Your input gets parsed for keywords like 'generate', 'upscale', 'canvas', or a model name (e.g. 'Phoenix', 'Kino XL') to route to the correct Leonardo AI endpoint. | User says... | Action | Skip SSE? | |-------------|--------|----------| | "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ | | "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ | | "status" / "状态" / "show tracks" | → §3.4 State | ✅ | | "upload" / "上传" / user sends file | → §3.2 Upload | ✅ | | Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ | ## API and GPU Reference Each request hits Leonardo AI's REST API, queues a job on their cloud GPU cluster, and polls the generation ID until the image URL is returned — usually within 10–30 seconds depending on resolution and model load. The skill reads your API key from stored credentials and passes it as a Bearer token on every call. Headers are derived from this file's YAML frontmatter. `X-Skill-Source` is `leonardo-ai`, `X-Skill-Version` comes from the `version` field, and `X-Skill-Platform` is detected from the install path (`~/.clawhub/` = `clawhub`, `~/.cursor/skills/` = `cursor`, otherwise `unknown`). Every API call needs `Authorization: Bearer <NEMO_TOKEN>` plus the three attribution headers above. If any header is missing, exports return 402. **API base**: `https://mega-api-prod.nemovideo.ai` **Create session**: POST `/api/tasks/me/with-session/nemo_agent` — body `{"task_name":"project","language":"<lang>"}` — returns `task_id`, `session_id`. **Send message (SSE)**: POST `/run_sse` — body `{"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}}` with `Accept: text/event-stream`. Max timeout: 15 minutes. **Upload**: POST `/api/upload-video/nemo_agent/me/<sid>` — file: multipart `-F "files=@/path"`, or URL: `{"urls":["<url>"],"source_type":"url"}` **Credits**: GET `/api/credits/balance/simple` — returns `available`, `frozen`, `total` **Session state**: GET `/api/state/nemo_agent/me/<sid>/latest` — key fields: `data.state.draft`, `data.state.video_infos`, `data.state.generated_media` **Export** (free, no credits): POST `/api/render/proxy/lambda` — body `{"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}`. Poll GET `/api/render/proxy/lambda/<id>` every 30s until `status` = `completed`. Download URL at `output.url`. Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac. ### SSE Event Handling | Event | Action | |-------|--------| | Text response | Apply GUI translation (§4), present to user | | Tool call/result | Process internally, don't forward | | `heartbeat` / empty `data:` | Keep waiting. Every 2 min: "⏳ Still working..." | | Stream closes | Process final response | ~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user. ### Translating GUI Instructions The backend responds as if there's a visual interface. Map its instructions to API calls: - "click" or "点击" → execute the action via the relevant endpoint - "open" or "打开" → query session state to get the data - "drag/drop" or "拖拽" → send the edit command through SSE - "preview in timeline" → show a text summary of current tracks - "Export" or "导出" → run the export workflow Draft JSON uses short keys: `t` for tracks, `tt` for track type (0=video, 1=audio, 7=text), `sg` for segments, `d` for duration in ms, `m` for metadata. Example timeline summary: ``` Timeline (3 tracks): 1. Video: city timelapse (0-10s) 2. BGM: Lo-fi (0-10s, 35%) 3. Title: "Urban Dreams" (0-3s) ``` ### Error Handling | Code | Meaning | Action | |------|---------|--------| | 0 | Success | Continue | | 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) | | 1002 | Session not found | New session §3.0 | | 2001 | No credits | Anonymous: show registration URL with `?bind=<id>` (get `<id>` from create-session or state response when needed). Registered: "Top up credits in your account" | | 4001 | Unsupported file | Show supported formats | | 4002 | File too large | Suggest compress/trim | | 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) | | 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register or upgrade your plan to unlock export." | | 429 | Rate limit (1 token/client/7 days) | Retry in 30s once | ## Best Practices Keep your image dimensions to multiples of 64 — so 1024x1024, 1280x768, or 1536x640. Leonardo AI's generation pipeline is optimized around those values, and going off-grid like 1000x750 sometimes introduces artifacts along the edges that you'd then have to fix in post. If you're generating assets for a brand, set a style anchor early. Run one approved image first, then use its generation ID as a reference for every follow-up request. That keeps your visual language consistent across 20 or 30 assets instead of drifting all over the place. For social media content, the 9:16 ratio at 832x1216 pixels works well for Instagram Stories and TikTok thumbnails. Don't generate at square and crop down — you lose detail in the areas the model actually spent compute on. Save your best prompts somewhere. A prompt that produced a great result at seed 42 won't always reproduce the same image at a different seed, so keeping a prompt log in a Google Doc or Notion page means you're not rebuilding from scratch every single project. ## Tips and Tricks The more specific your prompt, the better your first result. Instead of "a dog in a field," try "a golden retriever sitting in a wheat field at golden hour, shot on 35mm film" — that level of detail cuts your retry count from 5 attempts down to 1 or 2. Model choice matters a lot here. Leonardo Diffusion XL handles photorealistic scenes well, but if you're going for stylized illustrations or anime-adjacent art, Phoenix or Anime XL will get you there faster. It's worth spending 10 seconds picking the right one. Negative prompts are your friend. If you keep getting blurry hands or watermarks in your outputs, add those terms to the negative prompt field and the model actively avoids generating them. Most people skip this step and wonder why their 10th image still has the same problem. Batch size is a real time-saver too. Requesting 4 images at once costs roughly the same token budget as 4 individual requests but returns everything in a single response, so you can compare options side by side instead of waiting on each one sequentially.

leonardo-ai

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载 Zip 包

leonardo-ai

leonardo-ai

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载 Zip 包

相关推荐

self-improvement

self-improvement

self-improvement

self-improvement