clawcheck

# ClawCheck Two-phase audit: a fast deterministic scan catches structural issues, then you (the agent) do a deep quality evaluation on the flagged areas. ## When to Use - After initial setup or major config changes - Before publishing skills to ClawHub (quality gate) - Periodic health check (weekly cron or manual) - When something feels off but `openclaw doctor` says "ok" - After installing new skills or updating OpenClaw ## What This Checks vs Built-in | This skill | `openclaw doctor` (built-in) | |---|---| | Secrets exposure + token hygiene | Config JSON schema validation | | Cron ops health + prompt quality review | Plugin/skill eligibility | | Config optimization + value assessment | Channel connectivity | | Skill structural + content quality audit | State migrations, browser detection | ## How It Works: Two Phases ### Phase 1: Deterministic Scan (fast, free) Run the script to get a structural baseline: ```bash python3 {baseDir}/scripts/audit.py ``` Individual modules: ```bash python3 {baseDir}/scripts/audit.py --security python3 {baseDir}/scripts/audit.py --cron python3 {baseDir}/scripts/audit.py --config python3 {baseDir}/scripts/audit.py --skills ``` This produces JSON with scores, findings, and the bottom/top skill lists. Use this as your triage map for Phase 2. ### Phase 2: Deep Quality Audit (you, the agent) After running the script, perform these evaluations. Budget your depth based on what the user asked for ("quick check" = Phase 1 only, "full audit" or "quality review" = both phases). #### 2a. Config Quality Review Read `~/.openclaw/openclaw.json` and evaluate: - **Heartbeat prompt**: Read `agents.defaults.heartbeat.prompt`. Is it specific enough to catch real issues? Does it avoid heavy operations? A good heartbeat prompt is < 200 words, checks 2-3 things, and has clear escalation criteria. - **Model choices**: Is the primary model appropriate for the workload? Are fallbacks a meaningful step-down (not the same tier)? Is the subagent model cheaper than primary? - **Compaction thresholds**: Are `reserveTokens` and `keepRecentTokens` reasonable for the context window size? Rule of thumb: reserve should be 15-20% of contextTokens. - **Session maintenance**: Are `pruneAfter`, `maxEntries`, `rotateBytes` set to values that match the usage pattern? Heavy cron usage needs more aggressive pruning. - **Cron maxConcurrentRuns**: Is it high enough for the number of frequent jobs? Count jobs with `*/` in their schedule expression. Score each aspect 1-5. Report specific improvements. #### 2b. Cron Prompt Quality Review Read `~/.openclaw/cron/jobs.json`. Select the 5 most important enabled jobs using this heuristic: 1. Any job in error state (from Phase 1 findings) 2. Jobs with highest `frequency x timeoutSeconds` (most resource-consuming) 3. Jobs running on expensive models (opus/primary) 4. If still under 5, pick by business impact (backups, monitoring, user-facing) For each selected job evaluate: - **Prompt clarity**: Specific enough to execute without guessing? Clear steps, expected output format, error handling? - **Safety**: Has guardrails? ("NEVER run git push", "read-only", "do not edit files directly") - **Efficiency**: Token-efficient? Flag prompts > 1500 chars that run on expensive models. Could the prompt reference a skill file instead of inlining instructions? - **Output value**: Produces actionable output or just noise? - **Timeout**: `payload.timeoutSeconds` set and reasonable for scope? Score each job 1-5 on: purpose, prompt quality, safety, efficiency. Flag jobs scoring below 3. **Cross-reference**: Check if any cron prompts reference skills that scored below 70 in Phase 1. A cron job is only as reliable as the skills it depends on. #### 2c. Skill Content Quality Review From the Phase 1 results, pick: - The 3 lowest-scoring skills (from `bottom_5`) - Any skills the user specifically asks about - Skills used by failing cron jobs (cross-reference cron findings) For each selected skill, read its full SKILL.md and evaluate: - **Accuracy** (2x weight): Would following these instructions produce correct behavior? Are API references current? Are file paths real? - **Completeness** (1.5x): Are all use cases covered? Edge cases? What happens when dependencies are missing? - **Clarity** (1x): Can an agent follow this without ambiguity? No hedging, clear steps, good examples? - **Efficiency** (1x): Is the SKILL.md bloated? Could it be shorter without losing information? Does it suggest efficient patterns (batching, caching)? - **Voice alignment** (1x, content-producing skills only): Does the output match the brand/user's tone? Scoring formula depends on skill type: - **Content/marketing skills** (has voice component): `(accuracy*2 + completeness*1.5 + clarity + efficiency + voice) / 6.5` - **Utility/tool skills** (no voice): `(accuracy*2 + completeness*1.5 + clarity + efficiency) / 5.5` For skills scoring below 4.0, write specific improvement recommendations with concrete examples. #### 2d. Security Assessment Phase 1 now scans workspace files for common secret patterns (sk-, ghp_, AIzaSy, Bearer tokens, hex private keys, etc.). In Phase 2, go deeper: - Review any secrets the script found in workspace files. Are they real credentials or false positives (e.g., example/placeholder values)? - Check if any skill `scripts/` contain hardcoded credentials or API URLs with embedded tokens - Check if `.env` files exist inside skill directories - Look for credentials in cron job prompts (some prompts inline API keys instead of referencing env vars) - Check if any workspace knowledge files contain customer data, passwords, or access tokens ## Output Format ### Phase 1 (script output) ```json { "score": 82, "score_type": "structural_hygiene", "status": "healthy", "sections": { "security": {"score": 65, "finding_count": 3}, "cron": {"score": 95, "finding_count": 1}, "config": {"score": 88, "finding_count": 2}, "skills": {"score": 80, "finding_count": 1} }, "findings": [...] } ``` ### Phase 2 (your evaluation) Present as a readable report to the user: ``` ## ClawCheck Report ### Structural Baseline (Phase 1) Overall: 82/100 (healthy) Security: 65 | Cron: 95 | Config: 88 | Skills: 80 ### Deep Quality Findings (Phase 2) **Config:** - Heartbeat prompt: 4/5 (clear but could add Telegram alert on critical) - Model choices: 5/5 (opus primary, sonnet fallback, sonnet subagent) - Compaction: 4/5 (reserveTokens=150k for 800k context = 19%, good) **Cron (top concerns):** - "Morning Brief" (3/5): prompt is 400 words but lacks output format spec - "Bleeding Edge Scanner" (2/5): no safety guardrails, no error handling **Skills (bottom 3):** - marketing-automation: BROKEN (no SKILL.md) - apple-notes (62/100 structural): [content evaluation] - blucli (62/100 structural): [content evaluation] ### Recommended Actions (priority order) 1. [most impactful fix] 2. [next fix] 3. [next fix] ``` ## Scoring Weights (Phase 1 script) Security 30%, cron 25%, config 20%, skills 25%. Skill structure formula: `(structure*2 + completeness*1.5 + clarity + efficiency) / 5.5 * 20` ## Remediation For detailed fix patterns with real config examples, see `{baseDir}/references/remediation.md`. Quick fixes for common findings: ### Inline secrets ```json "GAMMA_API_KEY": {"source": "exec", "provider": "op-gamma", "id": "value"} ``` ### Plaintext bot token ```json "botToken": {"source": "exec", "provider": "op-telegram", "id": "value"} ``` ### Missing heartbeat ```json "heartbeat": {"every": "1h", "model": "sonnet", "prompt": "HEARTBEAT: Quick check..."} ``` ### Missing timezone on cron ```json "schedule": {"kind": "cron", "expr": "0 9 * * *", "tz": "Europe/Madrid"} ``` ## Error Handling - If OpenClaw dir not found: script exits with error JSON and exit code 1. - If `openclaw.json` is missing or invalid: script exits with error JSON. - If individual module fails: caught and reported as warning, other modules still run. - If bundled skills dir not accessible: skipped silently. - Phase 2 failures: if you can't read a file, note it and move on. Don't stop the whole audit. ## Non-Goals - No direct edits to config or skills (report only, user decides) - No network calls (everything is local file inspection) - No overlap with `openclaw doctor` schema validation or channel connectivity checks

clawcheck

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载 Zip 包

clawcheck

clawcheck

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载 Zip 包

相关推荐

self-improvement

self-improvement

self-improvement

self-improvement