
robotics-vla

Author: admin | Source: ClawHub
Version: V 1.1.0
Security check: passed
Downloads: 120
Favorites: 1


# Robotics VLA Skill

Expert guidance for building generalist robot policies with Vision-Language-Action (VLA) flow models, based on the π0 architecture.

## Core Architecture

**π0 model = VLM backbone + action expert + flow matching**

| Component | Detail |
|---|---|
| VLM backbone | PaliGemma (3B): provides visual + language understanding |
| Action expert | Separate transformer weights (~300M) for robot state + actions |
| Total params | ~3.3B |
| Action output | Chunks of H=50 actions; 50 Hz or 20 Hz robots |
| Inference speed | ~73 ms on an RTX 4090 |

See `references/architecture.md` for full technical details (attention masks, flow-matching math, MoE design).

## Training Pipeline

**Two-phase approach (mirrors LLM training):**

1. **Pre-training** → broad physical capabilities and recovery behaviors across many tasks and robots
2. **Fine-tuning** → fluent, task-specific execution on the target task

Key rule: combining both phases outperforms either alone. Pre-training gives robustness; fine-tuning gives precision.

See `references/training.md` for data-mixture ratios, loss functions, and fine-tuning dataset sizing.

## Action Representation

**Use flow matching, not autoregressive discretization.**

- Flow matching models continuous action distributions, which is essential for high-frequency dexterous control
- Autoregressive token prediction (e.g. RT-2 style) cannot produce action chunks efficiently
- Action chunks allow open-loop execution at 50 Hz without temporal ensembling

## Multi-Embodiment Support

A single model handles 7+ robot configurations via:

- Zero-padding smaller action spaces to match the largest (17-dim)
- A shared VLM backbone, with embodiment-specific behavior learned from the data
- Weighted task sampling (weight ∝ n^0.43) to handle imbalanced data across robot types

See `references/embodiments.md` for robot platform specs and action-space details.
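The multi-embodiment recipe above (zero-padding to the shared 17-dim action space, weighting tasks by n^0.43) can be sketched in a few lines. This is an illustrative sketch only; `pad_action` and `task_sampling_weights` are hypothetical names, not part of the π0 codebase:

```python
import numpy as np

def pad_action(action: np.ndarray, target_dim: int = 17) -> np.ndarray:
    """Zero-pad one robot's action vector up to the shared 17-dim space."""
    padded = np.zeros(target_dim, dtype=action.dtype)
    padded[: action.shape[0]] = action
    return padded

def task_sampling_weights(example_counts, alpha: float = 0.43) -> np.ndarray:
    """Weight each task proportional to n^alpha, normalized to sum to 1.

    alpha < 1 down-weights over-represented robots/tasks relative to
    plain proportional (alpha = 1) sampling.
    """
    w = np.asarray(example_counts, dtype=float) ** alpha
    return w / w.sum()
```

For example, a 7-DoF arm's action gets 10 trailing zeros, and a task with 100× more examples is sampled only 100^0.43 ≈ 7.2× more often.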
## High-Level Policy Integration

For long-horizon tasks, use a two-tier approach:

- **High-level VLM**: decomposes the task ("bus the table") into subtasks ("pick up napkin")
- **Low-level π0**: executes each subtask as a language-conditioned action sequence

This is analogous to SayCan. Intermediate language commands significantly boost performance over flat task descriptions.

## Related & Complementary Research (2025)

π0 has been extended and complemented by several key works. See `references/related-work.md` for the full landscape, including:

- **π0-FAST / π0.5 / π0.6**: direct successors with faster training, open-world generalization, and RL fine-tuning
- **RTC**: async action chunking to eliminate inference pauses (plug-in, no retraining)
- **UniVLA**: unsupervised action extraction from raw video (no action labels needed)
- **ManiFlow / Streaming Flow**: smoother action generation
- **GR00T N1, Helix, OpenVLA-OFT, DiVLA, RDT-1B**: parallel approaches from NVIDIA, Figure AI, and academia

## Evaluation Checklist

When evaluating a robot manipulation policy:

- [ ] Out-of-box generalization (no fine-tuning) vs. baselines
- [ ] Language-following accuracy with flat / human-guided / high-level commands
- [ ] Fine-tuning efficiency (success rate vs. hours of data)
- [ ] Complex multi-stage tasks (5–20 min, recovery from failure)
- [ ] Comparison against OpenVLA, Octo, ACT, and Diffusion Policy as baselines
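The two-tier integration described above can be sketched as a simple loop. The functions below are hypothetical stubs standing in for the VLM planner and the low-level π0 policy, not real APIs:

```python
def high_level_plan(task: str) -> list[str]:
    """Stub for the high-level VLM: decompose a task into language subtasks."""
    # A real system would query a VLM with the current image + task prompt;
    # the canned plan here is purely illustrative.
    canned = {"bus the table": ["pick up napkin", "put napkin in trash"]}
    return canned.get(task, [task])

def execute_subtask(subtask: str, horizon: int = 50) -> list[str]:
    """Stub for the low-level policy: emit one H=50 action chunk per call."""
    # pi0 would return a (50, action_dim) chunk executed open-loop at 50 Hz;
    # strings stand in for actions here.
    return [f"{subtask}/t={t}" for t in range(horizon)]

def run_task(task: str) -> list[str]:
    """Two-tier loop: plan subtasks, then execute each as an action chunk."""
    trajectory: list[str] = []
    for subtask in high_level_plan(task):
        trajectory.extend(execute_subtask(subtask))
    return trajectory
```

The design point is that each subtask reaching the low-level policy is a short language command, which is what makes intermediate commands outperform a single flat task description.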

Tags

skill ai

Install via Conversation

This skill can be installed via conversation on the following platforms:

OpenClaw WorkBuddy QClaw Kimi Claude

Option 1: install SkillHub and the skill

Help me install SkillHub and the robotics-vla-1775969721 skill

Option 2: set SkillHub as the preferred skill installation source

Set SkillHub as my preferred skill installation source, then help me install the robotics-vla-1775969721 skill

Install via Command Line

skillhub install robotics-vla-1775969721

Download Zip Package

⬇ Download robotics-vla v1.1.0

File size: 8.45 KB | Released: 2026-4-13 11:49

v1.1.0 (latest) · 2026-4-13 11:49
Add 2025 research landscape: pi0-FAST/0.5/0.6 successors, RTC async chunking, UniVLA unsupervised actions, ManiFlow, GR00T N1, Helix, OpenVLA-OFT
