eeat-openclaw-skill-audit

# EEAT OpenClaw Skill Audit

> **AI Agent Skill Quality Assurance**: This skill adapts the CORE-EEAT framework to evaluate OpenClaw Skills, ensuring they deliver meaningful utility while maintaining security and reliability.

## Skill Overview

OpenClaw Skills are modular capability extensions for AI agents, defined by `SKILL.md` files with YAML frontmatter and prompt instructions. This skill evaluates skill quality through 80 standardized criteria across 8 core dimensions, generating comprehensive audit reports including utility scores, security assessments, and actionable improvement recommendations.

**Core Transformation**:
- **From**: Install Skills blindly → Hope they work
- **To**: Systematic vetting → Data-driven skill selection

## OpenClaw Skill Structure

Every OpenClaw Skill consists of:

```
my-skill/
├── SKILL.md          # Core definition (YAML + Markdown instructions)
├── scripts/          # Optional executable scripts
│   └── main.py
└── references/       # Optional configuration and resources
    └── config.json
```

**Key Components**:
- **YAML Frontmatter**: Skill metadata (name, description, version, dependencies, gates)
- **Prompt Instructions**: How the AI should use this skill
- **Scripts**: Optional executable code for complex operations
- **Gating Mechanism**: Conditional activation (bins, env, os checks)

## Applicable Scenarios

Use this skill when users request:

### Skill Selection
- "Evaluate this skill before installing"
- "Compare two skills for the same task"
- "Which skill should I use for X?"

### Security Vetting
- "Is this skill safe to run?"
- "Scan for security vulnerabilities"
- "Check permission boundaries"

### Skill Development
- "Audit my skill for quality issues"
- "How to improve my skill's documentation?"
- "What security best practices am I missing?"

### Skill Maintenance
- "Review installed skills for quality"
- "Identify deprecated or risky skills"
- "Prioritize skill updates"

## Core Capabilities

This skill can:

1. **Complete 80-Item Audit**: Score each CORE-EEAT item adapted for OpenClaw Skills
2. **Utility Scoring**: Evaluate task completion efficiency and comparative value
3. **Security Assessment**: Three-level security evaluation (Pass/Caution/Risk)
4. **Gating Validation**: Check conditional activation requirements (bins, env, os)
5. **Veto Item Detection**: Flag critical security violations (command injection, data leakage)
6. **Priority Ranking**: Identify top 5 improvements by impact
7. **Comparative Analysis**: Compare skills for same use case

## Skill Categories

This skill supports 6 OpenClaw Skill types, each with different evaluation priorities:

### Productivity Skills
- **Definition**: Gmail, Calendar, Google Drive, Microsoft Office integration
- **Focus**: Task completion accuracy, API reliability, error handling
- **Weights**: C: 30% | R: 25% | Exp: 20% | Ept: 15% | O: 5% | E: 0% | A: 5% | T: 0%

### Development Skills
- **Definition**: Code generation, debugging, GitHub automation, CI/CD
- **Focus**: Code quality, correctness, testing, security best practices
- **Weights**: C: 25% | O: 20% | R: 20% | Ept: 20% | E: 5% | Exp: 5% | A: 5% | T: 0%

### Research Skills
- **Definition**: Web search, web fetch, document summarization, data analysis
- **Focus**: Information accuracy, source credibility, citation quality
- **Weights**: C: 25% | R: 25% | A: 20% | E: 15% | O: 10% | Exp: 0% | Ept: 0% | T: 5%

### Automation Skills
- **Definition**: Browser automation, file operations, shell commands, task scheduling
- **Focus**: Security, error handling, robustness, permissions
- **Weights**: T: 30% | C: 25% | R: 20% | O: 15% | Exp: 5% | Ept: 5% | E: 0% | A: 0%

### Content Skills
- **Definition**: Text generation, translation, image generation, audio processing
- **Focus**: Output quality, style consistency, creative value
- **Weights**: C: 30% | E: 25% | Exp: 20% | O: 15% | Ept: 5% | R: 5% | A: 0% | T: 0%

### System Skills
- **Definition**: System monitoring, resource management, network tools, debugging
- **Focus**: Performance, reliability, security, compatibility
- **Weights**: T: 25% | C: 20% | R: 20% | E: 15% | O: 10% | Ept: 5% | Exp: 5% | A: 0%

---

## 8 Progressive Quality Gates

### Gate 1: Metadata Validation (Pre-Installation)

**When**: Before installing any skill

**Duration**: 2-5 minutes

**Items**:
- **C01**: YAML frontmatter present and valid
- **C02**: Skill name and description clear
- **O01**: Skill structure follows OpenClaw convention
- **T04**: No suspicious dependencies or permissions

**Deliverable**: Metadata Validation Report

**Failure**: Do not install. Contact skill author or fix manually.

---

### Gate 2: Gating Mechanism Check

**When**: After metadata validation, before activation

**Duration**: 1-2 minutes

**Items**:
- **O02**: Required tools exist (bins check)
- **O03**: Required environment variables set (env check)
- **O04**: OS compatibility verified (os check)
- **T07**: No conflicting permissions

**Deliverable**: Gating Compatibility Report

**Failure**: Skill will not activate. Fix environment or choose alternative.

---

### Gate 3: Security Pre-Check

**When**: Before first execution

**Duration**: 3-5 minutes

**Items**:
- **T01**: No command injection vulnerabilities
- **T02**: No data leakage risks
- **T03**: Input validation present
- **T04**: Permissions follow the principle of least privilege

**Deliverable**: Security Pre-Check Report

**Failure**: Do not execute. Review code or choose alternative.

---

### Gate 4: Prompt Quality Review

**When**: During skill development or installation

**Duration**: 5-10 minutes

**Items**:
- **C03**: Instructions are clear and actionable
- **C04**: Tool usage patterns are explicit
- **Ept03**: Professional terminology used
- **Ept05**: Follows prompt engineering best practices

**Deliverable**: Prompt Quality Report

**Failure**: Skill may misbehave. Refine `SKILL.md` instructions.

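
The three Gate 2 environment checks (bins, env, os) can be sketched in a few lines of Python. This is a minimal illustration, not OpenClaw's actual gating code: the function name `check_gates` and the shape of the `gates` dictionary (keys `bins`, `env`, `os`, each holding a list of strings) are assumptions made for the example, and the real frontmatter schema may differ.

```python
import os
import platform
import shutil
from typing import Callable, List, Mapping, Optional, Sequence


def check_gates(
    gates: Mapping[str, Sequence[str]],
    environ: Optional[Mapping[str, str]] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
    current_os: Optional[str] = None,
) -> List[str]:
    """Return gating failures; an empty list means every gate passed.

    `gates` is a hypothetical structure, e.g.
    {"bins": ["git"], "env": ["API_KEY"], "os": ["linux", "darwin"]}.
    """
    environ = os.environ if environ is None else environ
    # platform.system() returns names such as "Linux", "Darwin", "Windows".
    current_os = (current_os or platform.system()).lower()

    failures: List[str] = []
    # O02: every required executable must be found on PATH.
    for binary in gates.get("bins", []):
        if which(binary) is None:
            failures.append(f"missing binary: {binary}")
    # O03: every required environment variable must be set and non-empty.
    for name in gates.get("env", []):
        if not environ.get(name):
            failures.append(f"missing environment variable: {name}")
    # O04: if an OS allow-list is given, the current OS must be on it.
    allowed = [name.lower() for name in gates.get("os", [])]
    if allowed and current_os not in allowed:
        failures.append(f"unsupported OS: {current_os}")
    return failures
```

The `environ`, `which`, and `current_os` parameters exist only so the check can be exercised without touching the real machine; in normal use they can be left at their defaults.
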

---

### Gate 5: Script Security Audit

**When**: For skills with executable scripts

**Duration**: 10-20 minutes

**Items**:
- **T05**: No hardcoded secrets or API keys
- **T06**: No eval/exec of user input
- **T07**: Proper error handling and logging
- **T08**: Resource cleanup implemented

**Deliverable**: Script Security Audit

**Failure**: Security risk. Audit scripts or avoid skill.

---

### Gate 6: Utility Testing

**When**: During skill evaluation

**Duration**: 15-30 minutes

**Items**:
- **C05**: Completes intended tasks successfully
- **C06**: Output quality meets expectations
- **Exp01**: Usage examples provided and working
- **Exp02**: Performance characteristics documented

**Deliverable**: Utility Test Report

**Failure**: Skill doesn't deliver value. Consider alternatives.

---

### Gate 7: Comparative Analysis

**When**: Selecting between multiple skills for same task

**Duration**: 10-15 minutes

**Items**:
- **E01**: Offers unique value vs alternatives
- **E02**: Better performance or efficiency
- **A01**: Active maintenance and updates
- **A02**: Community adoption and reviews

**Deliverable**: Comparative Analysis Report

**Failure**: Alternative skill may be better. Choose based on scores.

---

### Gate 8: Full 80-Item Audit

**When**: Critical skills, major updates, security reviews

**Duration**: 30-60 minutes

**Items**: All 80 items across all 8 dimensions

**Deliverable**: Comprehensive EEAT Audit Report

**Failure**: Address critical issues before production use.

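
Two of the Gate 5 items, T05 (hardcoded secrets) and T06 (eval/exec), lend themselves to a rough automated first pass. The sketch below is only a heuristic under stated assumptions: the helper names and the secret-name pattern are invented for illustration, the eval/exec check flags every direct call rather than only those fed by user input, and it does not replace a manual review.

```python
import ast
import re
from typing import List

# Assumed heuristic: a secret-looking name assigned a quoted literal.
# Values containing $, { or } (e.g. "${SMTP_PASSWORD}") are treated as
# environment references and are not flagged.
SECRET_PATTERN = re.compile(
    r"\b\w*(password|secret|api_key|token)\w*\s*[:=]\s*['\"][^'\"${}]+['\"]",
    re.IGNORECASE,
)


def find_hardcoded_secrets(text: str) -> List[int]:
    """Return 1-based line numbers that look like hardcoded secrets (T05)."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if SECRET_PATTERN.search(line)
    ]


def find_dynamic_exec(source: str) -> List[int]:
    """Return line numbers of direct eval()/exec() calls in Python source (T06).

    Conservative: any direct call is reported, whether or not its
    argument comes from user input.
    """
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id in {"eval", "exec"}
    )
```

`find_hardcoded_secrets` works on any text, so it can be run over `SKILL.md` as well as over files in `scripts/`; `find_dynamic_exec` only accepts valid Python source.
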

---

## Usage Examples

### Example 1: Pre-Installation Security Check

**User Request**:

```
Check if this skill is safe to install: clawhub install agent-browser
```

**Response**:

```markdown
## EEAT Skill Security Audit: agent-browser

### Gate 1: Metadata Validation ✅
- [x] YAML frontmatter valid
- [x] Description clear: "Browser automation for web scraping"
- [x] Version: 1.2.3
- [x] Dependencies: playwright (verified safe)

### Gate 2: Gating Check ⚠️
- [x] Bins: playwright installed
- [x] Env: PLAYWRIGHT_BROWSERS_PATH set
- [ ] OS: macOS/Windows only (Linux limited support)

### Gate 3: Security Pre-Check ✅
- [x] T01: No command injection detected
- [x] T02: No data leakage risks
- [x] T03: Input validation on URLs
- [x] T04: Minimal permissions (browser control only)

### Security Score: 92/100

**Security Status**: CAUTION ⚠️
- Linux support limited
- Requires browser installation (system-level change)

### Recommendation

✅ **Install with Caution**
- Safe for macOS/Windows users
- Linux users should check compatibility
- Review permissions before first use
```

---

### Example 2: Compare Two Skills

**User Request**:

```
Compare tavily-search vs web-search for web search
```

**Response**:

```markdown
## Skill Comparative Analysis: Web Search

### Skill A: tavily-search

**Utility Score**: 85/100
- C05: High accuracy search results ✅
- C06: Clean, structured output ✅
- Exp01: Excellent usage examples ✅
- E01: AI-optimized for agents 🎯

**Security Score**: 95/100
- T01-T04: All checks pass ✅
- T07: No external dependencies ✅

**Overall**: 88/100 (Very Good)

### Skill B: web-search (bundled)

**Utility Score**: 70/100
- C05: Basic search functionality ✅
- C06: Output quality varies ⚠️
- Exp01: Limited examples ⚠️
- E01: Generic implementation ⚠️

**Security Score**: 90/100
- T01-T04: Safe but basic ✅
- T07: Uses external APIs ⚠️

**Overall**: 75/100 (Good)

### Recommendation

✅ **Choose tavily-search**
- +15 points higher utility
- AI-optimized for agent workflows
- Better structured output for downstream processing
- Higher security score

**Trade-off**: Requires API key setup vs. no setup for web-search
```

---

### Example 3: Full Skill Audit

**User Request**:

```
Audit my custom skill: email-automation
[Paste SKILL.md content]
```

**Response**:

````markdown
## EEAT Skill Audit: email-automation

### Overview
- **Skill Type**: Productivity
- **Total Score**: 64/100 (Fair)
- **Utility Score**: 75/100 (Good)
- **Security Score**: 45/100 (CAUTION ⚠️)
- **Veto Item**: ⚠️ T02 Triggered (Data Leakage Risk)

### Dimension Scores

| Dimension | Score | Rating | Weight | Weighted Score |
|-----------|-------|--------|--------|----------------|
| C (Correctness) | 70/100 | Good | 30% | 21 |
| O (Organization) | 65/100 | Fair | 5% | 3.25 |
| R (Referenceability) | 50/100 | Fair | 25% | 12.5 |
| Exp (Experience) | 80/100 | Good | 20% | 16 |
| Ept (Expertise) | 55/100 | Fair | 15% | 8.25 |
| E (Exclusivity) | 40/100 | Poor | 0% | 0 |
| A (Authority) | 60/100 | Fair | 5% | 3 |
| T (Trust) | 45/100 | Poor | 0% | 0 |
| **Weighted Total** | | | | **64** |

### Critical Issues (Veto Items)

⚠️ **T02: Data Leakage Risk**

**Issue**: Skill stores API credentials in plain text in SKILL.md

```yaml
# SKILL.md
credentials:
  smtp_password: "mypassword123"  # ⚠️ SECURITY RISK
```

**Action**: Move credentials to environment variables

```yaml
credentials:
  smtp_password: "${SMTP_PASSWORD}"  # ✅ SECURE
```

### Top 5 Priority Improvements

1. **T02 Data Leakage**: Remove hardcoded credentials
   - Current: Fail | Potential Gain: 8 weighted points
   - Action: Use environment variables for all secrets
2. **R02 Coverage**: Add error handling examples
   - Current: Fail | Potential Gain: 6.25 weighted points
   - Action: Document error scenarios and recovery
3. **Ept01 Documentation**: Improve prompt instructions
   - Current: Partial | Potential Gain: 4.5 weighted points
   - Action: Add step-by-step usage examples
4. **R03 Source Authority**: Verify email library security
   - Current: Partial | Potential Gain: 3.75 weighted points
   - Action: Audit nodemailer dependency for vulnerabilities
5. **O01 Structure**: Add scripts/ directory for complex logic
   - Current: Partial | Potential Gain: 2.5 weighted points
   - Action: Move complex operations to Python scripts

### Action Plan

#### Quick Wins (Fix immediately)
- [ ] Move all credentials to environment variables
- [ ] Add error handling documentation

#### Medium Investment (This week)
- [ ] Add comprehensive usage examples
- [ ] Implement proper logging in scripts

#### Strategic (Next sprint)
- [ ] Add test suite with edge cases
- [ ] Implement retry logic for failed sends
- [ ] Add HTML email support

### Recommendation

⚠️ **Do Not Install Until Fixed**
- Security risk (T02 veto) must be addressed
- After fixes, expected score: 78/100 (Good)
````

---

## Reference Documents

- `references/openclaw-skill-benchmark.md`: Complete 80-item benchmark adapted for OpenClaw Skills
- `references/skill-security-checklist.md`: Security-specific evaluation criteria
- `references/utility-testing-guide.md`: How to test skill utility and comparative value
- `workflow-optimization-analysis.md`: Adaptation strategy from code to skills

---

## Key Differences: Code vs. Skill Audit

| Aspect | Code Audit | Skill Audit |
|--------|-----------|-------------|
| **Primary Focus** | Code correctness, maintainability | Utility, security, reliability |
| **Security Emphasis** | SQL injection, XSS | Command injection, data leakage, permissions |
| **Evaluation Method** | Static analysis + testing | Comparative utility + security probes |
| **Output Format** | Code quality report | Utility score + security status label |
| **Key Metrics** | Test coverage, complexity | Task completion, risk level |
| **Veto Items** | Security bugs, logic errors | Security vulnerabilities, data risks |
| **Automation Level** | High (linters, type checkers) | Medium (requires manual security review) |
| **Comparative Analysis** | Code vs. requirements | Skill vs. baseline or alternative skills |

---

## Success Points

1. **Security-First Approach**: OpenClaw Skills have system-level access; security is non-negotiable
2. **Comparative Utility**: Evaluate skills relative to baseline, not in isolation
3. **Gating Validation**: Ensure skills only activate when dependencies are met
4. **Prompt Quality**: `SKILL.md` instructions determine skill behavior; quality matters
5. **Minimal Permissions**: Skills should only request necessary access
6. **Active Maintenance**: Prioritize skills with recent updates and community support
7. **Real-World Testing**: Test with actual use cases, not synthetic scenarios

---

## Optimization Recommendations

Based on OpenClaw's architecture and community best practices:

### 1. Add Skill Registry Integration
- Integrate with ClawHub API for real-time skill metadata
- Auto-fetch version history, update frequency, download counts
- Community signals: stars, issues, last commit date

### 2. Implement Automated Security Scanning
- Integrate with `clawsec` (ClawHub security scanner)
- Auto-scan scripts/ directory for vulnerabilities
- Check for hardcoded secrets, eval/exec patterns

### 3. Add Utility Benchmarking
- Run comparative tests: baseline vs. with-skill
- Measure task completion time, token efficiency, success rate
- Generate utility scores similar to SkillTester framework

### 4. Create Skill Dependency Graph
- Map skill dependencies (some skills require others)
- Detect circular dependencies
- Recommend optimal skill combinations

### 5. Implement Skill Conflict Detection
- Detect skills with conflicting tool usage
- Identify resource contention (browser, file locks)
- Suggest skill compatibility matrix

### 6. Add Performance Profiling
- Track skill execution time over sessions
- Monitor API usage and costs
- Identify bottlenecks in skill chains

### 7. Create Skill Reputation System
- Track skill reliability across users
- Aggregate success/failure rates
- Community-rated skill quality scores

### 8. Implement A/B Testing Framework
- Compare two skills for same task
- Measure which completes faster/better
- Data-driven skill selection

### 9. Add Skill Update Notifications
- Monitor skill updates in ClawHub
- Alert on breaking changes
- Suggest upgrade timing

### 10. Create Skill Usage Analytics
- Track which skills are used most frequently
- Identify skill chains and workflows
- Optimize skill loading order

---

## Notes

- This skill adapts EEAT from content/code evaluation to AI agent skill vetting
- Security is elevated to critical importance due to system-level access
- Utility is evaluated comparatively (skill vs. baseline), not absolutely
- Gating mechanisms ensure skills only activate when dependencies are met
- Community signals (downloads, stars, issues) inform Authority dimension
- Prompt quality in `SKILL.md` directly impacts skill behavior
- Skills with executable scripts require deeper security audit
- This framework is designed for OpenClaw's modular skill architecture

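
For readers who want to check the arithmetic, the "Weighted Total" row in Example 3 can be reproduced from the Productivity weights and the per-dimension scores. The sketch below is illustrative only; the function name `weighted_total` is an assumption of this example, and weights are kept as integer percentages so the sum stays exact.

```python
from typing import Dict

# Productivity weights in percent, as listed under "Skill Categories".
PRODUCTIVITY_WEIGHTS: Dict[str, int] = {
    "C": 30, "R": 25, "Exp": 20, "Ept": 15, "O": 5, "E": 0, "A": 5, "T": 0,
}


def weighted_total(scores: Dict[str, float], weights: Dict[str, int]) -> float:
    """Combine 0-100 dimension scores using percentage weights."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    return sum(scores[dim] * pct for dim, pct in weights.items()) / 100


# Dimension scores from the email-automation audit in Example 3.
example_scores = {
    "C": 70, "O": 65, "R": 50, "Exp": 80, "Ept": 55, "E": 40, "A": 60, "T": 45,
}
print(weighted_total(example_scores, PRODUCTIVITY_WEIGHTS))  # prints 64.0
```

With a 0% weight on T, the Trust score does not move this total for Productivity skills; the T02 veto in Example 3 is reported separately from the weighted score.
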
