orche

# 🎼 Orche — Multi-Phase Orchestration Engine > A multi-agent orchestration engine that executes complex tasks in **Query → Plan (Debate) → Execute → Verify** — 4 phases. > A sub-agent panel debates, critiques, executes, and verifies, while the watchdog monitors the entire process. --- ## 📌 Table of Contents 1. [Why Orche?](#-why-orche) 2. [Comparison with Existing Orchestration Skills](#-comparison-with-existing-orchestration-skills) 3. [Quick Start — First Orchestration in 5 Minutes](#-quick-start--first-orchestration-in-5-minutes) 4. [Phase Overview](#-phase-overview) 5. [Phase 0: Query](#phase-0-query) 6. [Phase 1: Planning](#phase-1-planning) 7. [Phase 2: Execution](#phase-2-execution) 8. [Phase 3: Verification](#phase-3-verification) 9. [State Management](#-state-management) 10. [Directory Structure](#-directory-structure) 11. [Harness Rules (Safety Mechanisms)](#-harness-rules-safety-mechanisms) 12. [Watchdog Integration](#-watchdog-integration) 13. [Hallucination Checklist (14 Items)](#-hallucination-checklist-14-items) 14. [Cost Management](#-cost-management) 15. [Exception Handling Summary](#-exception-handling-summary) 16. [User Intervention Points](#-user-intervention-points) 17. [Abort Procedure](#-abort-procedure) 18. [Session Disconnection Recovery](#-session-disconnection-recovery) 19. [Demo Scenarios](#-demo-scenarios) 20. [Production Run Records](#-production-run-records) 21. [hallucination-guard Integration](#-hallucination-guard-integration) 22. [Advanced Configuration](#-advanced-configuration) 23. [FAQ](#-faq) 24. [Changelog](#-changelog) --- ## 🎯 Why Orche? ### Problem: Limitations of Existing Multi-Agent Execution When complex tasks are delegated to AI agents, the following problems arise: | Problem | Symptom | Result | |---------|---------|--------| | **Hallucination** | Uses non-existent APIs/libraries | Non-executable code | | **Execution without planning** | Starts writing code immediately | Direction change costs explode | | **No verification** | Deliverables "look plausible" | Don't actually work | | **Linear-only progression** | Keeps going forward even on failure | Root cause unresolved | | **Cost explosion** | Indiscriminate agent deployment | Budget exhaustion | ### Solution: Orche's 4-Phase Engine ``` ┌─────────────────────────────────────────────────────────────┐ │ │ │ Phase 0 Phase 1 Phase 2 Phase 3 │ │ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ │ │ │Query │─G0─▶│ Plan │──G1─▶│Execute│─G2─▶│Verify│ │ │ │ │ │(Debate)│ │(Parall)│ │ │ │ │ └──────┘ └──────┘ └──────┘ └──┬───┘ │ │ │ ▲ │ │ │ │ │ fail? │ │ │ └──── YES ◀──┘ │ │ │ │ │ └── Watchdog monitors the entire process ───────────┘ │ │ │ └─────────────────────────────────────────────────────────────┘ ``` **Key Differentiators:** - ✅ **Phase Gates**: Explicit condition fulfillment required at each phase transition - ✅ **Critic Debate**: Advocate vs Critic debate improves plan quality - ✅ **14-Item Hallucination Check**: Fact, consistency, completeness, and hallucination pattern verification - ✅ **Auto Regression**: Automatically returns to Phase 2 on verification failure - ✅ **Watchdog Integration**: Monitors entire process, auto-detects deadlocks/loss/budget overruns - ✅ **Cost Management**: 4-tier budget alerts + concurrent agent limit --- ## 📊 Comparison with Existing Orchestration Skills | Feature | `dispatching-parallel-agents` | `parallel-agent-management` | `executing-plans` | **`orche`** | |---------|:---:|:---:|:---:|:---:| | **Parallel execution** | ✅ | ✅ | ❌ (sequential) | ✅ | | **Pre-query phase** | ❌ | ❌ | ❌ | ✅ Phase 0 | | **Debate/discussion** | ❌ | ❌ | ❌ | ✅ Includes Critic | | **Phase gates** | ❌ | ❌ | ✅ (checkpoints) | ✅ 4 gates | | **Hallucination check** | ❌ | ❌ | ❌ | ✅ 14 items | | **Auto regression on verification failure** | ❌ | ❌ | ❌ | ✅ Up to 3 retries | | **Watchdog integration** | ❌ | ❌ | ❌ | ✅ | | **Cost management** | ❌ | ❌ | ❌ | ✅ 4-tier budget | | **Session disconnection recovery** | ❌ | ❌ | ❌ | ✅ State file based | | **Best for** | Simple parallel tasks | Codebase splitting | Sequential plan execution | Complex multi-step projects | ### When to Use Orche? **✅ Orche is appropriate when:** - Complex tasks requiring 3+ different roles - Tasks where output accuracy matters (research, analysis, code design) - Tasks where hallucination is critical - Tasks with interdependent deliverables **❌ Other skills are better when:** - Simple 2-3 independent tasks in parallel → `dispatching-parallel-agents` - File-by-file codebase splitting → `parallel-agent-management` - Sequential execution with an already-confirmed plan → `executing-plans` --- ## 🚀 Quick Start — First Orchestration in 5 Minutes ### Step 1: Install the Skill (30 Seconds) Place the skill file in the OpenClaw skills directory: ```bash # Copy to skills directory mkdir -p ~/.agents/skills/orche cp SKILL.md ~/.agents/skills/orche/SKILL.md ``` ### Step 2: Trigger (10 Seconds) Enter the `/orche` command in chat and describe your task: ``` /orche "Write a market analysis report on Korean AI startups" ``` Or in natural language: ``` Orchestrate this project systematically ``` ### Step 3: Phase 0 — Answer the Queries (2 Minutes) Orche will ask 3~10 questions. Answer clearly: ``` 🎼 /orche started — Phase 0: Query 📋 Questions to clarify requirements: 1. 📐 Scope: All AI fields? Specific segment (LLM, robotics, etc.)? 2. 🎯 Priority: Market size vs competitive analysis vs investment trends? 3. 📏 Success criteria: Report length? Number of data sources? 4. 🔧 Technical constraints: Specific data sources needed? 5. ⏰ Deadline: Is there a due date? ``` ### Step 4: Auto-Progress (Remaining) Once you answer, Orche automatically: - **Phase 1**: Builds plan through expert + Critic debate - **Phase 2**: Sub-agents research and write in parallel - **Phase 3**: Verification team runs hallucination check + cross-validation ### Step 5: Receive Results ``` 🎼 /orche Completion Report 📋 Task: orche-1711432800 ⏱️ Total duration: 23 minutes 📊 By phase: Query 3m → Plan 5m → Execute 10m → Verify 5m 🤖 Agents deployed: 11 ✅ Tasks completed: 5/5 📁 Deliverables: - workspace/orche-1711432800/requirements.md - workspace/orche-1711432800/final-plan.md - workspace/orche-1711432800/tasks/ - workspace/orche-1711432800/verification/final-verdict.md ``` --- ## 🔄 Phase Overview ``` Phase 0: Query — Clarify requirements (3~10 questions) Phase 1: Plan — 3~7 sub-agents debate → finalize plan Phase 2: Execute — Split into tasks → 3~8 sub-agents execute in parallel Phase 3: Verify — 3~5 verifiers deployed → regress to Phase 2 if issues found ``` ### Phase Gate System Each phase transition requires **gate conditions** to be met before proceeding. Automatic blocking on non-compliance. | Gate | Transition | Key Conditions | |------|-----------|----------------| | **G0** | Phase 0 → 1 | Requirements structured + ambiguous expressions removed + user approval | | **G1** | Phase 1 → 2 | final-plan.md created + hallucination check passed + no circular dependencies | | **G2** | Phase 2 → 3 | All tasks completed + deliverables exist + no zombie agents | | **Verification** | Phase 3 → Done | 14-item hallucination check passed + all required deliverables present | **On gate failure:** - G0 not passed → Phase 1 entry blocked. Re-confirmation requested from user. - G1 not passed → Phase 2 entry blocked. Plan re-debate forced. - G2 not passed → Phase 3 entry blocked. Missing tasks re-executed. - Verification fail → Auto regression to Phase 2 (up to 3 times). --- ## Phase 0: Query ### Purpose Remove ambiguity from user instructions and specify them to an actionable level. ### Procedure 1. Analyze original request → identify missing/ambiguous areas 2. Generate 3~10 questions (at least 1 from each of 6 categories below) 3. Collect user responses → finalize requirements document (max 2 additional rounds) 4. On user approval → save `requirements.md` ### Question Categories | Category | Example | |----------|---------| | **Scope** | What's in scope? What should be excluded? | | **Technical constraints** | Language/framework/environment limitations? | | **Priority** | What matters most? (speed/quality/cost) | | **Success criteria** | How to judge completion? Deliverables? | | **Dependencies** | External systems/APIs/data? | | **Risk** | Impact of failure? Rollback needed? | ### Gate G0: Phase 0 → 1 Transition Conditions - ✅ Requirements are structured (goal, constraints, deliverables specified) - ✅ Ambiguous expressions removed ("appropriately", "etc." type expressions eliminated) - ✅ Feasibility assessment complete (required tools/access confirmed) - ✅ User confirmation received ### Phase 0: When Deemed Infeasible If G0 feasibility check determines the task is **impossible**: 1. Report specific reasons to user 2. Suggest alternatives (if possible) 3. Await user decision → modification instructions or termination 4. On termination → `orche-state.json` status → `"rejected"` --- ## Phase 1: Planning ### Debate Panel Composition (3~7 members) | Role | Responsibility | |------|---------------| | **Domain Expert** (1~2) | Draft plan based on domain expertise | | **Advocate** | Strengthen plan merits, argue feasibility | | **Critic** | Point out weaknesses/risks/gaps + hallucination check | | **Moderator** | Synthesize debate, build consensus, write final plan | ### Sub-Agent Spawning ``` sessions_spawn: task: "<role-specific prompt>" label: "orche-<role>-<taskId>" model: "anthropic/claude-sonnet-4-6" # Default (use opus for high complexity) mode: "run" ``` > **Model Selection Guide:** > - Planning/debate: `sonnet` (cost-efficient) > - Complex domain experts: `opus` (when deep reasoning needed) > - Simple execution tasks: `sonnet` or `haiku` (fast processing) ### Debate Protocol (Round-based, File-async) **Round 1:** 1. Expert agents → spawn in parallel → each writes a proposal 2. Advocate → analyze strengths of drafts 3. Critic → point out weaknesses + hallucination check: - [ ] References to non-existent APIs/libraries? - [ ] Unrealistic performance/time estimates? - [ ] Logical contradictions? - [ ] Missing dependencies/prerequisites? 4. Moderator → synthesize → write `final-plan.md` If unresolved issues remain → **Round 2** (up to 3 rounds max). ### Gate G1: Phase 1 → 2 Transition Conditions - ✅ `final-plan.md` created - ✅ Task list + dependencies + success criteria specified - ✅ Hallucination check passed - ✅ No circular dependencies - ✅ Rollback strategy defined for each task --- ## Phase 2: Execution ### Task Splitting Split from `final-plan.md` into independently executable task units: ```json { "id": "task-1", "title": "...", "description": "...", "dependencies": [], "assignedAgent": null, "status": "pending", "output": null } ``` **Task Status Flow:** `pending → running → completed / failed` ### Dependency-Based Scheduling - **Independent tasks** → spawn in parallel immediately - **Dependent tasks** → spawn after predecessor completion - **Max concurrent agents:** 8 (cost explosion prevention: **3 concurrent recommended**) ### Required Inclusions in Execution Agent Prompts All execution agent prompts must include the following: 1. Overall plan summary (brief) 2. Detailed task specification 3. Predecessor task deliverables (file paths) 4. Deliverable save path 5. Success criteria 6. Harness rules (see below) ### Immediate Checks on Task Completion - [ ] Deliverable file exists? - [ ] Not empty? - [ ] Success criteria met? - [ ] No obvious hallucinations? - On failure → retry that task only (max 2 times). 2 failures → alert user. ### Gate G2: Phase 2 → 3 Transition Conditions - ✅ All tasks `completed` or `skipped_with_reason` - ✅ All deliverable files exist + non-empty - ✅ No zombie agents/sessions - ✅ No critical errors --- ## Phase 3: Verification ### Verification Team Composition (3~5 members) | Role | Responsibility | |------|---------------| | **Completeness Verifier** | All tasks executed, nothing missing | | **Accuracy Verifier** | Deliverables match task spec + success criteria | | **Hallucination Verifier** | Detect factual errors, logical contradictions, non-existent references | | **Integration Verifier** | Cross-deliverable consistency, interface compatibility | | **Final Reviewer** (optional) | Final review against original requirements | ### Verification Result Handling | Result | Action | |--------|--------| | **All PASS** | Phase 3 complete → normal termination | | **HIGH issue** | Regress to Phase 2 (re-execute affected tasks only) | | **MED issue** | Report to user, confirm re-execution | | **LOW issue** | Record in report, proceed | ### Regression Conditions (Phase 3 → Phase 2) Automatic regression to Phase 2 if any of the following apply: - `hallucination_score >= 2` (2+ failures in 14-item check) - Required deliverable missing/corrupted - Test failure rate > 30% - User-mandatory conditions unmet - Mutual contradiction between deliverables ### Regression Limits - `retryCount` maximum **3 times** - Same reason repeated 2 times → escalate to user - Token budget 80% consumed → escalate to user --- ## 📦 State Management All orchestration state is recorded in `workspace/orche-state.json`. The watchdog and each phase share this file to track progress. ```json { "taskId": "orche-<timestamp>", "phase": 0, "status": "active", "startedAt": "<ISO>", "phaseStartedAt": "<ISO>", "requirements": null, "plan": null, "tasks": [], "agents": [], "watchdog": { "enabled": true, "lastCheck": null }, "retryCount": 0, "maxRetries": 3, "tokenBudget": { "total": 500000, "used": 0, "currency": "tokens" }, "errors": [] } ``` ### State Field Descriptions | Field | Type | Description | |-------|------|-------------| | `taskId` | string | Unique identifier (`orche-<unix_timestamp>`) | | `phase` | number | Current phase (0~3) | | `status` | string | `active` / `completed` / `aborted` / `rejected` | | `requirements` | object | Requirements finalized in Phase 0 | | `plan` | object | Plan finalized in Phase 1 | | `tasks` | array | Phase 2 task list + status | | `agents` | array | Currently active sub-agent session key list | | `retryCount` | number | Phase 3 → 2 regression count | | `tokenBudget` | object | Token budget management | | `errors` | array | Error log | --- ## 📁 Directory Structure Each orchestration creates an independent working directory: ``` workspace/orche-<taskId>/ ├── requirements.md ← Phase 0 deliverable ├── round-1/ ← Phase 1 debate records │ ├── expert-1-proposal.md ← Expert proposal │ ├── expert-2-proposal.md ← (if applicable) │ ├── advocate-review.md ← Advocate review │ ├── critic-review.md ← Critic review │ └── moderator-summary.md ← Moderator synthesis ├── round-2/ ← (2nd round debate if needed) ├── final-plan.md ← Phase 1 final plan ├── tasks/ ← Phase 2 deliverables │ ├── task-1/ │ │ └── output.md │ ├── task-2/ │ │ └── output.md │ └── ... └── verification/ ← Phase 3 verification results ├── completeness-report.md ← Completeness verification ├── accuracy-report.md ← Accuracy verification ├── hallucination-report.md ← Hallucination verification ├── integration-report.md ← Integration verification └── final-verdict.md ← Final verdict ``` --- ## 🛡️ Harness Rules (Safety Mechanisms) ### On Orche Start — Execute Immediately 1. **Watchdog registration:** Register task in `watchdog.md` (when using OpenClaw watchdog) 2. **State initialization:** Create `orche-state.json` 3. **Report start to user** ### Phase Gate Harness (Mandatory Check Before Each Phase Transition) ``` G0 not passed → Phase 1 entry blocked. Re-confirmation requested from user. G1 not passed → Phase 2 entry blocked. Plan re-debate forced. G2 not passed → Phase 3 entry blocked. Missing tasks re-executed. Verification fail → Auto regression to Phase 2 (retryCount < 3). ``` ### Sub-Agent Harness Rules that must be included in every sub-agent spawn prompt: ``` [Harness Rules] 1. You must save results to the designated file path upon completion 2. Mark uncertain facts with "needs verification:" — no fabrication 3. Do not use non-existent APIs/libraries 4. On task failure, record failure reason in output file and terminate ``` These rules are **injected into all** sub-agents. This ensures: - File-based communication is guaranteed (Rule 1) - Hallucinations are caught early (Rules 2, 3) - Failure causes are traceable (Rule 4) ### Completion Harness ``` On Phase 3 completion, you must: 1. Set orche-state.json status → "completed" 2. Mark task [x] in watchdog.md (if using watchdog) 3. Send final report to user ``` --- ## 🔍 Watchdog Integration Orche integrates with OpenClaw's watchdog system to monitor the entire process. ### What the Watchdog Does | Detection Item | Auto Response | |---------------|---------------| | Agent 15min+ no change (hang) | kill + re-spawn | | Empty deliverable (file size 0) | Retry with reinforced prompt | | All agents lost (deadlock) | Immediate user notification | | Token budget 80% reached | User warning | ### Watchdog Registration Method Register the task in `watchdog.md` at Orche start: ```markdown - [ ] orche <taskId> — Phase 0 (Query) session_keys: agent:main:subagent:xxxx state_file: workspace/orche-state.json evidence_paths: - workspace/orche-<taskId>/requirements.md - workspace/orche-<taskId>/final-plan.md done_when: - orche-state.json status == "completed" ``` ### Watchdog Design Principles > These principles reflect lessons learned in production. - ✅ Read only `watchdog.md` → query `sessions_list` using `session_keys` - ✅ Do not judge failure by "no session" alone — if deliverables/state files exist, assume progress/completion first - ✅ Judge normal termination only when: no tasks + 0 active sessions - ⚠️ If anything is active, report status and wait > **No watchdog in your environment?** Orche works without a watchdog. > The watchdog is an **additional safety net**; phase gates and harness rules function as core safety mechanisms without it. --- ## ✅ Hallucination Checklist (14 Items) Used by Phase 1's Critic and Phase 3's verification team. ### Factual Verification | ID | Item | Verification Method | |----|------|-------------------| | **H-1** | File path existence | Verify with `stat()` or `ls` | | **H-2** | Command existence | Verify with `which` or `command -v` | | **H-3** | URL validity | Verify with HTTP request (optional) | | **H-4** | Code syntax validity | Grammar check with linter/parser | | **H-5** | Numerical data cross-verification | Confirm with 2+ sources | ### Consistency | ID | Item | Verification Method | |----|------|-------------------| | **H-6** | No self-contradiction | Detect conflicting statements within document | | **H-7** | Plan-result alignment | 1:1 mapping verification | | **H-8** | Terminology consistency | Same term for same concept | ### Completeness | ID | Item | Verification Method | |----|------|-------------------| | **H-9** | No remaining TODO/FIXME | `grep -r "TODO\|FIXME"` | | **H-10** | No placeholders | Detect `[INSERT]`, `TBD`, `...` | | **H-11** | All deliverables exist | Cross-check deliverable list against requirements | ### Hallucination Patterns | ID | Item | Verification Method | |----|------|-------------------| | **H-12** | No fictional library/API references | Verify existence on npm/pip/crates.io etc. | | **H-13** | No unsourced statistics | Verify source citation or "needs verification:" label | | **H-14** | No overconfident expressions | Detect "always", "never", "100%" type claims | ### Judgment Criteria ``` 0~1 failures → PASS ✅ 2~3 failures → WARNING ⚠️ (MED — user judgment) 4+ failures → FAIL ❌ (HIGH — auto regression) ``` --- ## 💰 Cost Management Orche is a multi-agent system, so cost management is essential. ### Budget Tiers | Tier | Ratio | Action | |------|-------|--------| | 🟢 **Normal** | < 50% | Normal operation | | 🟡 **Caution** | 50~80% | Limit concurrent agents, prefer `sonnet` | | 🔴 **Warning** | 80~95% | Stop new spawns, complete in-progress tasks only | | ⛔ **Exceeded** | > 95% | Full stop, request user judgment | ### Cost Saving Tips 1. **Actively leverage model selection:** - Simple execution: `haiku` (lowest cost) - Planning/debate/general execution: `sonnet` (best value) - Complex reasoning: `opus` (only when needed) 2. **Limit concurrent agents to 3** (default). 8 is the maximum; typically 3~4 is sufficient. 3. **Set token budget according to task scale:** | Task Scale | Recommended Budget (tokens) | Expected Agents | Estimated Cost (Claude Sonnet) | |-----------|---------------------------|----------------|-------------------------------| | Small (simple research) | 200,000 | 6~8 | ~$0.6 | | Medium (code refactoring) | 500,000 | 10~15 | ~$1.5 | | Large (market analysis report) | 1,000,000 | 15~20 | ~$3.0 | > **Note:** Costs above are estimates based on Claude Sonnet. Actual costs vary by model, prompt length, and retry count. 4. **Use Phase 0 well.** Clear scope definition during initial queries reduces unnecessary tasks and cuts costs. ### Token Budget Adjustment Adjust the budget by modifying `tokenBudget.total` in `orche-state.json`: ```json { "tokenBudget": { "total": 1000000, "used": 0 } } ``` --- ## 🚨 Exception Handling Summary | Exception | Detection Method | Auto Response | User Notification | |-----------|-----------------|---------------|-------------------| | Agent hang | Watchdog 15min no change | kill + re-spawn | On 2 consecutive occurrences | | Empty deliverable | File size 0 | Retry with reinforced prompt | On 2 failures | | API error (429/500) | HTTP status | Retry after 30s | On 3 failures | | Hallucination detected | Critic/verification team | Regenerate affected portion | On HIGH severity | | Total deadlock | All agents lost | — | Immediately | | Context exceeded | Error message | Retry with reduced input | On failure | | maxRetries exceeded | Counter check | — | Immediate judgment request | | Token budget exceeded | used/total ratio | Model downgrade | At 80% | ### Error Logging All errors are recorded in `orche-state.json`'s `errors` array: ```json { "errors": [ { "timestamp": "2026-03-26T10:30:00Z", "phase": 2, "taskId": "task-3", "type": "AGENT_HANG", "message": "Agent orche-exec-task-3 unresponsive for 15+ minutes", "action": "kill + re-spawn" } ] } ``` --- ## 🙋 User Intervention Points | Timing | Required? | Content | |--------|-----------|---------| | Phase 0 complete | ✅ **Required** | Approve requirements | | Phase 1 complete | Optional | Review plan (default: auto-proceed) | | Phase 2 failure | ✅ **Required** | Judgment after 2 retry failures | | Phase 3 MED issue | ✅ **Required** | Decide on re-execution | | Phase 3 complete | Notification | Final results report | ### Auto-Proceed Mode Auto-proceeding from Phase 1 to Phase 2 is the default behavior. If you want to review the plan directly, specify "I want approval at each step" during Phase 0 queries. --- ## 🛑 Abort Procedure When the user requests task cancellation: 1. **Terminate all active sub-agents** (session kill) 2. **Remove task from watchdog** (if in use) 3. **Set `orche-state.json` status → `"aborted"`** 4. **Report abort to user** (including list of completed tasks) ``` 🎼 /orche Abort Report 📋 Task: orche-1711432800 🛑 Abort reason: User request ✅ Completed tasks: 3/7 📁 Completed deliverables: - workspace/orche-1711432800/tasks/task-1/output.md - workspace/orche-1711432800/tasks/task-2/output.md - workspace/orche-1711432800/tasks/task-3/output.md ``` --- ## 🔄 Session Disconnection Recovery ### Problem If the main session disconnects mid-process, the orchestration stops. ### Recovery Mechanism 1. **`orche-state.json` is the recovery key.** - Phase/task progress is recorded in the file, so after session restart, the state file can be read to resume. 2. **State updated immediately on each task completion:** - `completedAt` recorded in `orche-state.json` upon receiving each sub-agent result 3. **Watchdog detects hangs:** (when using watchdog) - If task exists in `watchdog.md` and main session is unresponsive for 20+ minutes in `sessions_list`, reports to user 4. **Manual recovery:** - User instructs orche to resume → reads `orche-state.json` → continues from incomplete tasks ### Recovery Command Example ``` /orche resume ``` This command reads `orche-state.json` and resumes execution from the last completed phase/task. --- ## 🎮 Demo Scenarios ### Scenario 1: Research Report Writing **User Input:** ``` /orche "Write a 2025 Korean GenAI market analysis report. Must include market size, key players, investment trends, and regulatory landscape." ``` **Phase 0 — Query (2~3 min):** ``` 🎼 Phase 0: Query 1. 📐 Scope: All GenAI? (LLM, image generation, code assistants, etc.) 2. 🎯 Depth: Individual company analysis? Or market overview level? 3. 📊 Data: Public data only? Paid research sources? 4. 📏 Length: How many pages (A4 equivalent)? 5. 🎨 Format: Markdown? PDF? 6. ⏰ Purpose: Internal report? Investment decision? Blog? ``` **Phase 1 — Planning (5 min):** - Expert A (AI market analyst): Proposes report structure - Expert B (Korean market specialist): Suggests Korea-specific data sources - Critic: Points out "Market size estimates need cross-verification from 2+ sources" - Moderator: Finalizes split into 5 tasks **Phase 2 — Execution (10~15 min):** | Task | Agent | Deliverable | |------|-------|-------------| | Market overview | agent-1 | `tasks/task-1/market-overview.md` | | Key players | agent-2 | `tasks/task-2/key-players.md` | | Investment trends | agent-3 | `tasks/task-3/investment-trends.md` | | Regulatory landscape | agent-4 | `tasks/task-4/regulatory.md` | | Consolidated report | agent-5 | `tasks/task-5/final-report.md` | **Phase 3 — Verification (5 min):** - H-5 check: Market size figures cross-verified → 2 sources confirmed ✅ - H-13 check: Investment amounts have cited sources → ✅ - H-11 check: All sections complete → ✅ - Final verdict: **PASS** ✅ --- ### Scenario 2: Code Refactoring **User Input:** ``` /orche "Refactor our project's API layer. Currently Express monolithic — want to separate modules by domain and increase test coverage." ``` **Phase 0 — Query (3 min):** ``` 🎼 Phase 0: Query 1. 📐 Scope: Entire API? Specific domains only? 2. 🔧 Tech stack: Express + TypeScript? ORM? 3. 📁 Current structure: Source directory path? 4. 🎯 Target structure: Have an example? (NestJS-style? Layered?) 5. ✅ Tests: Current coverage? Test framework? 6. ⚠️ Constraints: Backward compatibility? Zero-downtime deployment? ``` **Phase 1 — Planning (7 min):** - Expert A (Node.js architect): Proposes module separation strategy - Expert B (Testing specialist): Proposes testing strategy - Critic: Points out "Need to analyze dependency graph first to ensure no circular dependencies" - Advocate: Argues "Gradual migration is safer" - Moderator: Finalizes 6 tasks in order: dependency analysis → module separation → test writing **Phase 2 — Execution (15~20 min):** | Task | Agent | Deliverable | |------|-------|-------------| | Dependency analysis | agent-1 | `tasks/task-1/dependency-graph.md` | | User module separation | agent-2 | `tasks/task-2/user-module/` | | Auth module separation | agent-3 | `tasks/task-3/auth-module/` | | Order module separation | agent-4 | `tasks/task-4/order-module/` | | Shared utilities cleanup | agent-5 | `tasks/task-5/shared/` | | Test writing | agent-6 | `tasks/task-6/tests/` | **Phase 3 — Verification (7 min):** - H-1 check: All file paths exist → ✅ - H-4 check: TypeScript compilation successful → ✅ - H-12 check: No non-existent npm package references → ✅ - H-7 check: Plan-result alignment → Task 5 gap found ⚠️ - **1st regression:** Task 5 re-executed → completed - Final verdict: **PASS** ✅ --- ### Scenario 3: Market Analysis **User Input:** ``` /orche "Do a SaaS competitor analysis. Target Notion, Coda, Slite — compare features, pricing strategy, and market positioning." ``` **Phase 0 — Query (2 min):** ``` 🎼 Phase 0: Query 1. 📐 Scope: Any additional competitors beyond these 3? 2. 🎯 Analysis depth: Feature-level detail? Strategic level? 3. 📊 Data: Public pricing only? Actual usage experience needed? 4. 📏 Deliverables: Comparison table + strategy report? 5. 🔍 Our product: Is there a reference product to compare against? ``` **Phase 1 — Planning (5 min):** - Expert (SaaS market specialist): Proposes analysis framework (Porter's Five Forces + Feature Matrix) - Critic: Warns "Pricing should be based on public pages only — no guessing enterprise pricing" - Moderator: Finalizes 4 tasks **Phase 2 — Execution (10 min):** | Task | Agent | Deliverable | |------|-------|-------------| | Notion analysis | agent-1 | `tasks/task-1/notion-analysis.md` | | Coda analysis | agent-2 | `tasks/task-2/coda-analysis.md` | | Slite analysis | agent-3 | `tasks/task-3/slite-analysis.md` | | Comparative summary | agent-4 | `tasks/task-4/comparison-report.md` | **Phase 3 — Verification (5 min):** - H-5 check: Pricing data cross-verified → public page basis ✅ - H-13 check: MAU/ARR figures have cited sources → "needs verification:" properly labeled ✅ - H-14 check: "Notion will definitely win" type overconfidence → none found ✅ - Final verdict: **PASS** ✅ --- ## 📈 Production Run Records > Below are summaries of actual projects completed with Orche. ### Run #1: ClawHub Skill Packaging | Item | Details | |------|---------| | **Task** | Repackage internal skills for ClawHub publishing | | **Duration** | 25 minutes | | **Agents deployed** | 12 (planning 4 + execution 5 + verification 3) | | **By phase** | Query 3m → Plan 6m → Execute 10m → Verify 6m | | **Regressions** | 0 (passed first time) | | **Notable** | Critic flagged "environment-dependent hardcoded cron IDs" → fixed during planning phase | ### Run #2: Multilingual Documentation Translation + Review | Item | Details | |------|---------| | **Task** | Technical doc translation (Korean/English/Japanese) + terminology unification | | **Duration** | 35 minutes | | **Agents deployed** | 15 (planning 3 + execution 8 + verification 4) | | **By phase** | Query 2m → Plan 5m → Execute 18m → Verify 10m | | **Regressions** | 1 (H-8 terminology consistency failure → glossary-based retranslation) | | **Notable** | Terminology inconsistency auto-detected → Phase 2 regression → fixed and passed | ### Run #3: API Design + Implementation + Testing | Item | Details | |------|---------| | **Task** | REST API design → implementation → test code → documentation | | **Duration** | 42 minutes | | **Agents deployed** | 18 (planning 5 + execution 8 + verification 5) | | **By phase** | Query 4m → Plan 8m → Execute 20m → Verify 10m | | **Regressions** | 1 (H-12 non-existent middleware reference → affected task re-executed) | | **Notable** | Critic debate on "implement auth middleware vs use library" → Moderator decided on verified library | ### Run Statistics Summary ``` Total completions: 3 Average duration: 34 minutes Average agents: 15 First-pass rate: 33% (1/3) Resolved via auto-regression: 100% (2/2 regressions resolved within 3 retries) Manual escalations: 0 ``` --- ## 🔗 hallucination-guard Integration > If the **hallucination-guard** skill is installed, Orche can leverage it during the verification phase for more precise hallucination detection. ### Integration Method 1. **Install hallucination-guard skill:** ```bash # Install from ClawHub (needs verification: install command may vary by environment) mkdir -p ~/.agents/skills/hallucination-guard # Place SKILL.md in that directory ``` 2. **Auto-utilized in Phase 3 verification:** - When hallucination-guard is installed, Phase 3's **hallucination verifier** additionally applies that skill's checklist. - Particularly improves detection accuracy for "non-existent APIs/libraries" in code-related tasks. 3. **Integration Architecture:** ``` Phase 3: Verification ├── Completeness Verifier ├── Accuracy Verifier ├── Hallucination Verifier ──── Uses hallucination-guard skill │ ├── Orche's built-in 14-item check │ └── + hallucination-guard additional checks (when installed) ├── Integration Verifier └── Final Reviewer ``` 4. **When hallucination-guard is not installed:** - Orche functions normally with its built-in 14-item checklist alone. - hallucination-guard is an **optional enhancement** tool. ### Recommended Combinations | Task Type | Orche Alone | + hallucination-guard | |-----------|:---:|:---:| | Research report | ✅ Sufficient | ✅✅ Enhanced numerical verification | | Code refactoring | ✅ Sufficient | ✅✅✅ API existence check essential | | Market analysis | ✅ Sufficient | ✅✅ Enhanced statistics source verification | | Document translation | ✅ Sufficient | ✅ Minimal difference | --- ## ⚙️ Advanced Configuration ### Adjusting Debate Round Count Default is up to 3 rounds. Complex domains may require more rounds. Specify "I want thorough plan debate" during Phase 0 to adjust the round count. ### Verification Intensity Adjustment | Intensity | Verifier Count | Hallucination Check | Recommended For | |-----------|---------------|-------------------|-----------------| | **light** | 2 | Essential 5 items only | Simple tasks, time savings | | **standard** (default) | 3~4 | All 14 items | General tasks | | **thorough** | 5 | 14 items + additional cross-verification | High-risk tasks | You can specify verification intensity during Phase 0: ``` /orche "Refactor the API. Use thorough verification." ``` ### Agent Model Configuration Specify different models per phase based on task characteristics: ```json { "modelConfig": { "planning": "anthropic/claude-sonnet-4-6", "execution": "anthropic/claude-sonnet-4-6", "verification": "anthropic/claude-sonnet-4-6", "complex": "anthropic/claude-opus-4-6" } } ``` ### Concurrent Agent Limit Default is 3 (maximum 8). Adjust in `orche-state.json`: ```json { "concurrency": { "default": 3, "max": 8 } } ``` --- ## ❓ FAQ ### Q: Can Phase 0 be skipped? **A:** No. Phase 0 is **mandatory**. Proceeding with ambiguous requirements causes direction change costs in Phase 2 to grow exponentially. Clear answers in Phase 0 actually reduce total time. ### Q: How much does it cost? **A:** Depends on task scale. Small tasks (research report) use approximately 200K tokens (Claude Sonnet ~$0.6), large tasks (API design + implementation) approximately 1M tokens (~$3.0). See the cost management section for budget settings. ### Q: What happens if verification keeps failing? **A:** Auto-regression up to 3 times. If all 3 fail, escalated to user for direct judgment. Also escalated if the same reason fails 2 consecutive times. ### Q: Can it work without a watchdog? **A:** Yes. The watchdog is an additional safety net; phase gates and harness rules are the core safety mechanisms. Orche functions normally without a watchdog. ### Q: Can it be used alongside other orchestration skills? **A:** Orche operates independently. However, handling simple parallel tasks within Phase 2 using `dispatching-parallel-agents` patterns is naturally compatible. ### Q: Is progress lost on session disconnect? **A:** No. All state is recorded in `orche-state.json` and the file system. Use `/orche resume` to continue from the last progress point. ### Q: What if the Critic over-opposes in the debate panel? **A:** The Moderator builds consensus. Debate runs up to 3 rounds; if consensus isn't reached, the Moderator has final decision authority. ### Q: Do I have to use the Opus model? **A:** No. Sonnet alone can handle most tasks. Opus is used selectively only for domain expert roles requiring complex reasoning. Sonnet is recommended as default for cost savings. --- ## 📋 Changelog ### v1.0.0 (2026-03-26) - 🎉 Initial ClawHub release - Full Phase 0~3 pipeline implementation - 14-item hallucination checklist - Phase gate system - Critic debate protocol - Auto regression mechanism (up to 3 times) - Watchdog integration (optional) - 4-tier cost management - Session disconnection recovery (state file based) - 3 demo scenarios + production run records --- ## Important Notes 1. **Sub-agents are created via `sessions_spawn`** (`mode: "run"`) 2. **Do not pass full session context to agents** — pass only the minimum necessary information 3. **File-based communication** — no direct inter-agent communication 4. **Concurrent agent limit** — recommended 3, maximum 8 5. **Phase 0 cannot be skipped** 6. **Be cost-conscious** — don't deploy 7 agents for a simple task 7. **Harness rules are injected into all sub-agents** — this is the key to hallucination prevention --- *Made with 🎼 by [reikys](https://github.com/reikys) — Battle-tested in production environments.*

orche

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载 Zip 包

orche

orche

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载 Zip 包

相关推荐

self-improvement

self-improvement

self-improvement

self-improvement