ai-task-hub
# AI Task Hub
Formerly `skill-hub-gateway`.
Public package boundary:
- Only orchestrates `portal.skill.execute`, `portal.skill.poll`, `portal.skill.presentation`, `portal.account.connect`, `portal.account.balance`, and `portal.account.ledger`.
- Does not exchange `api_key` or `userToken` inside this package.
- Does not handle recharge or payment flows inside this package.
- Optional env hints `PUBLIC_BRIDGE_ENTRY_HOST` and `AI_TASK_HUB_ENTRY_HOST` are only host-side `entry_host` fallbacks, not API secrets, auth tokens, or billing credentials.
- Prefers attachment URLs, and when host runtime explicitly exposes attachment bytes for the current request, forwards only that explicit attachment material through the public bridge before execution.
- When the published skill is invoked directly by a third-party agent runtime, it uses `POST /agent/public-bridge/invoke`.
- published skill persistence = disabled.
- continuity owner = `host_or_private_wrapper`.
## Data Handling Boundary
- Only forwards attachment bytes that the host runtime explicitly provides for the current request.
- Off-host media transfer is limited to the gateway-controlled host `https://gateway-api.binaryworks.app`.
- Public upload handoff is limited to `POST /agent/public-bridge/upload-file` for the same request flow.
- Does not read local paths, scan the local filesystem, or guess files outside explicit host-provided attachment material.
- Does not persist uploaded bytes or credentials to local disk, and does not write skill/config state.
- Host/runtime should obtain user consent before forwarding media and should avoid sending sensitive or regulated data unless the user explicitly approved that transfer.
## Read This First
- Do not mix connector lifecycle commands with published skill actions.
- `connect` / `status` / `invoke` / `logout` are connector lifecycle commands for host/runtime installation state.
- `portal.account.connect` / `portal.skill.execute` / `portal.skill.poll` / `portal.skill.presentation` / `portal.account.balance` / `portal.account.ledger` are the published skill actions.
- Do not treat `portal.skill.execute -> portal.skill.poll -> portal.skill.presentation` as the default path for every capability. Default behavior must follow the capability `delivery mode`.
- Public capability inventory below lists only enabled capabilities intentionally advertised to agents. Disabled or internal-only routes may still exist in backend code but are not part of the advertised package surface.
## Two Operational Surfaces
- Connector lifecycle commands:
- `connect`: start or resume browser authorization for one connector installation
- `status`: read whether the connector installation already has continuity
- `invoke`: call one published skill action through the connector runtime
- `logout`: clear continuity for that connector installation
- Published skill actions:
- `portal.account.connect`: explicit account bind or connection-status check
- `portal.skill.execute`: submit a capability run
- `portal.skill.poll`: poll a submitted run
- `portal.skill.presentation`: fetch rendered outputs for a run
- `portal.account.balance`: read current points balance
- `portal.account.ledger`: read points ledger rows
## Public Capability Inventory (Enabled And Available)
- Image analysis:
- `human_detect`
- `image_tagging`
- `face-detect`
- `body-keypoints-2d`
- `face-emotion-recognition`
- Background removal / cutout / mask:
- `person-instance-segmentation`
- `person-semantic-segmentation`
- `concert-cutout`
- `full-body-matting`
- `head-matting`
- `product-cutout`
- Audio:
- `asr`
- `tts_report`
- Document:
- `markdown_convert`
- Retrieval:
- `embeddings`
- `reranker`
- Generation:
- `image-generation`
- Video:
- `Video Face Generation`
## Delivery Mode Guidance
- `instant_result`
- Default host behavior: prefer a result-first entrypoint and return the final result in the same interaction.
- Typical capabilities: `tts_report`, image analysis, retrieval.
- Default product path: use result-first host flow; `portal.skill.execute` is usually enough.
- `aggregated_short_wait`
- Default host behavior: prefer a result-first entrypoint with short internal waiting, then fall back to poll only if the short wait does not complete.
- Typical capabilities: `image-generation`.
- Default product path: start with result-first host flow; only use `portal.skill.poll` when the short wait does not complete.
- `asset_delivery`
- Default host behavior: keep `presentation` or rendered files as part of the formal result surface.
- Typical capabilities: segmentation, matting, cutout.
- Default product path: `portal.skill.execute` -> `portal.skill.poll`, then `portal.skill.presentation` when files or rendered outputs are needed.
- `long_running`
- Default host behavior: treat the capability as an explicit async job and do not promise single-turn completion.
- Typical capabilities: `asr`, `markdown_convert`, `video-face-generation`.
- Default product path: `portal.skill.execute` with later `portal.skill.poll`; use `portal.skill.presentation` only when that capability later exposes a rendered result surface.
## Minimal Architecture Rules
- Default path stays connector-first. Do not invent a host-specific continuity or identity model beside connector/runtime.
- Default path stays result-first. Do not teach every capability as `execute -> poll -> presentation`.
- `portal.skill.poll` and `portal.skill.presentation` are follow-up surfaces, not universal default steps.
- Published skill keeps compatibility and asset-delivery surfaces, but it does not own long-lived continuity.
- Do not ask end users for manual URLs, JSON field names, local paths, or internal bridge details unless they explicitly ask for debugging.
## Non-Recommended Patterns
- Do not bypass connector/runtime as the default host product path when hosted connector or local bootstrap is available.
- Do not describe raw published skill actions as if they were the primary UX for every capability.
- Do not add host-specific auth, continuity, or capability-routing rules outside the existing connector/runtime and gateway contracts.
- Do not expose debugging payload structure, bridge layering, or upload choreography to end users unless troubleshooting is explicitly requested.
## Official Host Integration
- Current built-in host integrations for `mobileclaw`, `openclaw`, `codex`, and `claude` require the AI Task Hub connector/runtime to own continuity from the first call.
- Future public hosts should follow the same connector/runtime continuity contract instead of inventing a host-specific identity model.
- For remote URL / OAuth / connection-record style hosts, use the hosted connector runtime: `POST /agent/hosted-connector/install`, `POST /agent/hosted-connector/connect`, `POST /agent/hosted-connector/invoke`, `POST /agent/hosted-connector/status`, `POST /agent/hosted-connector/logout`.
- For local command-only hosts, use the shared connector/runtime bootstrap outside this published package.
- `POST /agent/public-bridge/invoke` is the underlying transport used by that connector/runtime, and also the compatibility fallback when a host bypasses the connector during debugging or manual integration.
- Trusted host runtime that can safely hold bridge assertion secret may still use `POST /agent/skill/bridge/invoke`.
- published skill persistence remains disabled even when connector/runtime is present; long-lived continuity belongs in host or connector state, not in this package.
## User-Facing Response Policy
- When users upload images, audio, documents, or video and ask for a capability, prefer executing immediately only when the host runtime has already supplied an explicit attachment object or explicit attachment bytes for that request.
- Do not explain `image_url`, `attachment.url`, storage URLs, bridge layers, host uploads, input normalization, or controlled media domain details to end users unless they explicitly ask for technical debugging.
- Do not ask end users to provide manual URLs, JSON field names, or upload-chain instructions; those are internal host-to-skill mechanics.
- If the runtime supports attachment handling, limit processing to the explicit attachment object supplied for the current request and keep the upload/URL handoff scoped to execute/poll/presentation for that same request.
- Only when execution actually fails and the user must intervene should you mention missing processable files, incomplete authorization, or retry guidance, using user-oriented language without exposing internal layering.
Chinese documentation: `SKILL.zh-CN.md`
## When to Use This Skill
Use this skill when the user asks to:
- detect faces, human presence, body keypoints, image tags, or facial emotion from images
- generate person/product segmentation, mask, cutout, or matting outputs that this public package explicitly exposes
- transcribe uploaded audio into text (`speech to text`, `audio transcription`)
- generate speech from text input (`text to speech`, `voice generation`)
- convert uploaded files into markdown (`document to markdown`)
- start async jobs and check status later (`poll`, `check job status`)
- fetch rendered visual outputs such as `overlay`, `mask`, and `cutout`
- run embedding or reranking tasks for retrieval workflows
- check current account points balance or recent points ledger rows
## Common Requests
Example requests that should trigger this skill:
- "Detect faces in this image and return bounding boxes."
- "Tag this image and summarize the main objects."
- "Remove the background from this product photo."
- "Create a clean cutout from this portrait image."
- "Transcribe this meeting audio into text."
- "Generate speech from this paragraph."
- "Convert this PDF file into markdown."
- "Start this job now and let me poll the run status later."
- "Fetch overlay and mask files for run_456."
- "Generate embeddings for this text list and rerank the candidates."
- "Check my current points balance."
- "Show my recent points ledger from 2026-03-01 to 2026-03-15."
## Search-Friendly Capability Aliases
- `vision` aliases: face detection, human detection, person detection, image tagging
- `background` aliases: remove background, background removal, cutout, matting, product-cutout
- `asr` aliases: speech to text, audio transcription, transcribe audio
- `tts` aliases: text to speech, voice generation, speech synthesis
- `markdown_convert` aliases: document to markdown, file to markdown, markdown conversion
- `poll` aliases: check job status, poll long-running task, async run status
- `presentation` aliases: rendered output, overlay, mask, cutout files
- `account.balance` aliases: points balance, credits balance, remaining points
- `account.ledger` aliases: points ledger, credits history, points statement
- `embeddings/reranker` aliases: vectorization, semantic vectors, relevance reranking
Public discovery boundary for visual capabilities:
- This published skill only advertises visual capabilities whose backing services are currently enabled for public delivery.
- Disabled or internally retained legacy routes are intentionally omitted from discovery references and capability manifests even if related backend code still exists.
## Runtime Contract
Default API base URL: `https://gateway-api.binaryworks.app`
Published package policy: outbound base URL is locked to the default API base URL to reduce token exfiltration risk.
Action to endpoint mapping:
- `portal.skill.execute` -> `POST /agent/skill/execute`
- `portal.skill.poll` -> `GET /agent/skill/runs/:run_id`
- `portal.skill.presentation` -> `GET /agent/skill/runs/:run_id/presentation`
- `portal.account.connect` -> `POST /agent/public-bridge/invoke` (explicit connect/status check only)
- `portal.account.balance` -> `GET /agent/skill/account/balance`
- `portal.account.ledger` -> `GET /agent/skill/account/ledger`
## Install Mechanism & Runtime Requirements
- This skill is instruction-first and does not define a remote installer flow.
- Runtime execution is limited to bundled local scripts under `scripts/*.mjs`.
- Required runtime binary is `node` (as declared in `metadata.openclaw.requires.bins`).
- No remote download-to-exec install chain is used (`curl|wget ... | sh|bash|python|node` is not part of this package).
## Auth Contract
Third-party agent entry mode (official host integrations should route through connector/runtime):
- Officially supported host integrations should install or provision the connector/runtime and let it manage `entry_user_key`; do not ask end users to manage continuity manually.
- If a host bypasses connector/runtime and invokes the published skill directly, use `POST /agent/public-bridge/invoke` and persist the same `entry_user_key` outside this published skill package.
- Do not require end users to provide any credential.
- Use `portal.account.connect` when host/runtime wants an explicit browser-connect preflight instead of waiting for a protected action to fail.
- Connector/runtime remains required for the official host integration path even when browser authorization is not yet required.
- With `TRIAL_ENABLED` and available trial points, first-time calls may proceed without browser authorization.
- On first use without an existing binding, gateway can proceed without browser authorization when TRIAL_ENABLED and trial points are available; `AUTHORIZATION_REQUIRED` is returned only for conditional upgrade paths (for example trial exhausted or trial-disabled rollback).
- The returned `authorization_url` may include `gateway_api_base_url`; preserve it when completing browser authorization so `/agent-auth/complete` is posted back to the same API environment that created the auth session.
- Host/runtime should show `authorization_url` to the user, persist `entry_user_key`, then retry the same action with that same `entry_user_key`.
- Connector/runtime must preserve that same `entry_user_key` before and after browser authorization.
- When `AUTHORIZATION_REQUIRED` or `portal.account.connect` returns `connector_install`, treat it as the official npm connector/runtime guidance for hosts that have not yet provisioned the supported continuity layer.
- `connector_install` refers to the official connector package outside this published skill package.
- That connector is the required continuity owner for officially supported host integrations, even though account authorization may still be deferred until trial exhaustion or policy upgrade.
- For OpenClaw/MobileClaw-style local hosts, follow the connector guide referenced by `connector_install.guide_url`, then continue the same browser `authorization_url` flow with the same `entry_user_key`.
- If gateway later returns `AUTHORIZATION_REQUIRED` with `details.likely_cause=ENTRY_USER_KEY_NOT_REUSED`, `details.recovery_action=REUSE_ENTRY_USER_KEY`, and `details.reauthorization_required=false`, host should restore the previously persisted `entry_user_key` and retry without sending the user through browser authorization again.
Identifier format constraints used by gateway auth:
- `agent_uid` must match `^agent_[a-z0-9][a-z0-9_-]{5,63}$`.
- `conversation_id` must match `^[A-Za-z0-9._:-]{8,128}$`.
- In deployed bridge mode, host may pass its own stable runtime agent identifier and the gateway bridge will canonicalize it server-side.
Host-side token bridge (outside published package):
- To keep this package compliant and low-privilege, this published runtime does not issue or accept caller-managed task tokens.
- Preferred deployed bridge endpoint for third-party agent entry: connector/runtime should call `POST /agent/public-bridge/invoke`.
- Trusted host runtime that can safely hold bridge assertion secret may continue to use `POST /agent/skill/bridge/invoke`.
- These bridge endpoints are served by gateway runtime, not bundled into this published package, and do not require caller-managed credentials.
- published skill persistence = disabled; continuity must stay in `host_or_private_wrapper`, not inside this published package.
- Bridge request body should include `action`, `agent_uid`, `conversation_id`, and optional `payload`.
- `conversation_id` should be a host-generated opaque session/install identifier, not a public chat ID, raw thread ID, or PII.
- Public bridge should resolve a stable external user binding when available; if the binding is missing and trial conditions are satisfied, first-time onboarding can continue without browser authorization, while conditional upgrade paths return a host-owned authorization URL plus `entry_user_key`.
- Cross-conversation account continuity requires reusing the same `entry_user_key`; public bridge intentionally does not accept owner overrides.
- Gateway bridge will canonicalize `agent_uid`, repair binding when missing, issue short-lived internal task token, and run the action server-side.
- `portal.skill.execute` through public bridge is write-capable and should send `options.confirm_write=true` after user confirmation; otherwise gateway may return `ACTION_CONFIRMATION_REQUIRED`.
- `base_url`, `gateway_api_key`, `api_key`, `user_token`, `agent_task_token`, `owner_uid_hint`, and `install_channel` overrides are rejected by the deployed bridge endpoint.
- Recommended host behavior: persist `entry_user_key`, normalize `agent_uid`, and re-run the same bridge action after authorization completes.
Host integration modes:
- `connector-managed interactive` (recommended): connector/runtime calls `POST /agent/public-bridge/invoke`, surfaces the returned host-owned authorization URL to the user when needed, persists returned `entry_user_key`, and retries after authorization completes.
- `trusted host bridge` (secondary): a trusted backend you control may call `POST /agent/skill/bridge/invoke` with its own bridge assertion secret.
- Published skill package itself does not open browser, persist credentials, or perform OAuth/token exchange flows.
- The authorization URL above is owned by deployed gateway/admin-web pages, not by this skill package runtime.
- Successful public bridge responses add `data.agent_guidance.bridge_auth` with `continuity_owner=host_or_private_wrapper`, `published_skill_persistence=disabled`, and the returned `bridge_context`.
- Public bridge failures that include entry context add `error.details.bridge_auth` so host/runtime can recover continuity outside the published skill package.
## Compatibility and Debug Transport Reference
Default product path for official hosts remains connector-first and result-first.
Use the raw transport reference below only when:
- a host is in compatibility mode and can preserve the same `entry_user_key` itself
- a trusted backend is integrating directly
- debugging requires checking the raw bridge contract
Preferred raw transport for third-party agent entry (normally owned by connector/runtime, and also usable as a compatibility fallback when a host can preserve continuity itself):
- Deployed bridge API:
```json
{
"entry_host": "<host_runtime>",
"action": "portal.account.balance",
"agent_uid": "support_assistant",
"conversation_id": "host_session_20260316_opaque_001",
"payload": {}
}
```
- Send that body to `POST /agent/public-bridge/invoke`.
- `entry_host` must match the active host runtime. Current built-in examples are `mobileclaw`, `openclaw`, `codex`, `claude`; future lowercase host slugs can follow the same contract when connector/runtime is configured for them.
- Example mappings: MobileClaw -> `mobileclaw`, OpenClaw -> `openclaw`, Codex -> `codex`, Claude -> `claude`.
- If the host cannot pass `entry_host` explicitly, export `PUBLIC_BRIDGE_ENTRY_HOST` or `AI_TASK_HUB_ENTRY_HOST`, or install the package under the host-specific runtime path so host inference stays correct.
- This is the underlying production transport for connector-managed third-party integration.
- With `TRIAL_ENABLED` and available trial points, first-time onboarding can complete without browser authorization.
- On first use, gateway may return `AUTHORIZATION_REQUIRED` with `authorization_url` and `entry_user_key` only when conditional authorization upgrade is required (for example trial exhausted).
- Persist `entry_user_key` and retry with the same value after user authorization completes.
- Preserve any `gateway_api_base_url` embedded in the authorization flow so the completion request lands on the same gateway API environment.
- `agent_uid` should be your host-defined stable runtime agent identifier.
- `conversation_id` should be your host-generated opaque session/install identifier; it is not tied to Telegram or any single tool and does not determine account ownership.
- Use the same `entry_user_key` across conversations when those conversations should share one account.
Trusted host runtime secondary mode:
- If you control the upstream backend and it can safely hold bridge assertion secret, use `POST /agent/skill/bridge/invoke`.
- This path is for trusted host runtime only, not OpenClaw / MobileClaw / Codex / Claude style third-party entry.
Action payload templates (same for public bridge and trusted host bridge mode):
- `portal.skill.execute`
```json
{
"capability": "human_detect",
"input": { "image_url": "https://files.example.com/demo.png" },
"request_id": "optional_request_id"
}
```
- `portal.skill.poll`
```json
{ "run_id": "run_123" }
```
- `portal.skill.presentation`
```json
{ "run_id": "run_123", "channel": "web", "include_files": true }
```
- `portal.account.connect`
```json
{ "connect_mode": "browser", "auth_session_id": "optional_existing_auth_session" }
```
- `portal.account.balance`
```json
{}
```
- `portal.account.ledger`
```json
{ "date_from": "2026-03-01", "date_to": "2026-03-15" }
```
Agent-side decision flow:
- For official host integration, prefer connector install/connect/invoke/status/logout or the hosted connector lifecycle over direct published-skill invocation.
- Only bypass connector/runtime and call `POST /agent/public-bridge/invoke` directly when you are in a compatibility or debugging path and can preserve the same `entry_user_key` yourself.
- Default capability path must follow `delivery mode`, not a universal three-step recipe:
- `instant_result`: prefer result-first flow; use `portal.skill.execute` only when dropping to published skill actions.
- `aggregated_short_wait`: prefer result-first flow; fall back to `portal.skill.poll` only if the short wait does not complete.
- `asset_delivery`: use `portal.skill.presentation` only when the capability needs formal rendered assets.
- `long_running`: keep explicit async expectations and use `portal.skill.poll` as the normal follow-up.
- Explicit account linking: call `portal.account.connect`, surface the returned `authorization_url` when present, and keep reusing the same `entry_user_key`.
- Account query: call `portal.account.balance` or `portal.account.ledger` directly.
- Keep `conversation_id` as session context only; do not use it as the account key.
- For cross-conversation continuity in third-party entry mode, persist and reuse the same `entry_user_key`; do not pass `owner_uid_hint` to the public bridge endpoint.
- If `AUTHORIZATION_REQUIRED` is returned, show `authorization_url`, persist `entry_user_key`, then retry the same action after user authorization completes.
- If `AUTHORIZATION_REQUIRED` includes `details.likely_cause=ENTRY_USER_KEY_NOT_REUSED`, do not open a new auth flow yet; first restore the previously persisted `entry_user_key` and retry the same bridge call.
- Treat `details.reauthorization_required=false` as a recovery hint that browser re-login is unnecessary for this failure mode.
- If `AUTH_UNAUTHORIZED` + `agent_uid claim format is invalid`: use canonical `agent_uid` (`agent_...`) instead of a short host alias (`assistant`, `planner`).
- If `SYSTEM_NOT_FOUND` + `agent binding not found`: restart the same bridge flow once and let gateway repair binding.
Output parsing contract:
- Always parse standard gateway envelope: `request_id`, `data`, `error`.
- Treat non-empty `error` as failure even when HTTP tooling hides status code.
## Visualization Playbooks (Agent Guidance)
- For successful visual actions (`portal.skill.execute`, `portal.skill.poll`, `portal.skill.presentation`), the script enriches responses with `data.agent_guidance.visualization.playbook`.
- Playbook mapping covers the visual capabilities currently exposed by this published skill (detection/classification/keypoints/segmentation/matting families).
- For `image-generation`, user delivery should be image-first: present the generated image itself and omit structured fields unless the user explicitly asks for source data or debugging details.
- Global rendering guardrail for all visual capabilities:
- Must use skill-native rendered assets first (`overlay`/`mask`/`cutout`/`view_url`) when available.
- Manual local drawing fallback is disabled by default (`allow_manual_draw=false`) to avoid inconsistent agent-side rendering.
- If rendered assets are missing, fallback is summary-only from structured output (`raw`/`visual.spec`), not local drawing.
- Example special rule:
- `body-contour-63pt` -> when both rendered assets and geometry are absent, playbook marks `status=degraded` and recommends fallback capability `body-keypoints-2d`.
## Payload Contract
- `portal.skill.execute`: payload requires `capability` and `input`.
- `payload.request_id` is optional and passed through.
- `portal.skill.poll` and `portal.skill.presentation`: payload requires `run_id`.
- `portal.skill.presentation` supports `include_files` (defaults to `true`).
- `portal.account.connect`: payload may include `connect_mode` and optional `auth_session_id` when host/runtime is checking an existing browser bind.
- `portal.account.balance`: payload is optional and ignored.
- `portal.account.ledger`: payload may include `date_from` + `date_to` (`YYYY-MM-DD`, must be provided together).
Attachment normalization:
- Prefer explicit `image_url` / `audio_url` / `file_url` / `video_url`.
- `attachment.url` is mapped to target media field by capability.
- When host runtime exposes attachment bytes, this published package forwards only that explicit attachment material through the public bridge and injects the returned URL before execute.
- There is no separate `portal.upload` action in this package; for third-party agent entry, callers should keep using `portal.skill.execute`, and the bundled runtime will only forward explicit attachment bytes already supplied by the host for the current request.
- If a host bypasses the bundled auto-upload helper and implements upload itself, use `POST /agent/public-bridge/upload-file` for third-party/public entry, not `POST /agent/skill/bridge/upload-file`.
- Local `file_path` handling is disabled in the published public skill.
- The runtime does not scan the local filesystem, guess file locations, expand directories/globs, or read local paths from `payload.file_path`, `input.file_path`, `attachment.path`, or `attachment.file_path`.
- Arbitrary unmanaged local filesystem access remains unsupported; hosts should provide bytes or a bridge-managed URL instead.
- Example host upload endpoint: `/agent/public-bridge/upload-file`.
- `Video Face Generation` requires 2 uploaded files from the user before execution:
- source video -> `input.video_url`
- merge face image -> `input.merge_infos[0].merge_face_image.url`
- If either required file is missing, agent should ask the user to upload both files first.
- Prefer a short source video for testing or smoke runs because these video-generation jobs are asynchronous and slower than image-only tasks.
- Do not rely on a single `attachment.url` auto-mapping for `Video Face Generation`; host must pass both structured URL fields explicitly.
## Error Contract
- Preserve gateway envelope: `request_id`, `data`, `error`.
- Preserve `POINTS_INSUFFICIENT` and pass through `error.details.recharge_url`.
## Bundled Files
- `scripts/skill.mjs`
- `scripts/agent-task-auth.mjs`
- `scripts/base-url.mjs`
- `scripts/attachment-normalize.mjs`
- `scripts/telemetry.mjs` (compatibility shim)
- `references/capabilities.json`
- `references/openapi.json`
- `SKILL.zh-CN.md`
标签
skill
ai