self-improving-intent-security-agent

# Self-Improving Intent Security Agent

## Install

```bash
npx skills add nishantapatil3/self-improving-intent-security-agent
```

Use this skill to structure and document intent validation workflows. It does not ship a production runtime engine that automatically intercepts agent actions; instead, it provides templates, examples, and local scripts that help you build, simulate, or document that workflow.

## Scope Clarification

- This package includes markdown templates, examples, and helper shell scripts
- The helper shell scripts operate on local files only
- Automatic enforcement, anomaly detection, rollback execution, and learning application must be implemented by the host agent or surrounding system

## Quick Reference

| Situation | Action |
|-----------|--------|
| Starting autonomous task | Capture intent specification (goal, constraints, expected behavior) |
| Before each action | Validate against intent, check authorization |
| Action violates intent | Document the violation and follow the rollback workflow |
| Unusual behavior detected | Log an anomaly, assess severity, and decide whether to halt or roll back |
| Task completes | Analyze outcome, extract patterns, update strategies |
| High-risk operation | Require human approval before execution |
| Need transparency | Review audit log with full action history |
| Strategy improves | A/B test new approach, adopt if better |
| Recurring violation | Promote to permanent constraint in CLAUDE.md |

## Setup

Create `.agent/` directory in project root:

```bash
mkdir -p .agent/{intents,violations,learnings,audit}
```

Copy templates from `assets/` or create files with headers. Review the included shell scripts before running them if you want to understand exactly what they do.

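
If you choose to create the files by hand rather than copying from `assets/`, the setup step could be scripted roughly as follows. This is a minimal sketch, not one of the packaged scripts: `init_agent_tree` is a hypothetical helper, and the header titles it writes are assumptions. Only the directory layout and the three log file names come from this document.

```shell
#!/bin/sh
set -eu

# Hypothetical helper (not shipped with the skill): create the .agent/ tree
# under a project root and seed the shared log files with a title header.
# The body runs in a subshell, so its variables do not leak to the caller.
init_agent_tree() (
  root="${1:-.}"

  for dir in intents violations learnings audit; do
    mkdir -p "$root/.agent/$dir"
  done

  # path:title pairs. The paths come from this document; the titles are assumed.
  for entry in \
    "violations/ANOMALIES.md:Anomalies" \
    "audit/ROLLBACKS.md:Rollbacks" \
    "learnings/STRATEGIES.md:Strategies"; do
    file="$root/.agent/${entry%%:*}"
    title="${entry##*:}"
    # Never overwrite a log that already exists.
    [ -f "$file" ] || printf '# %s\n\n' "$title" > "$file"
  done
)

# Demo in a throwaway directory so the real project is not touched.
demo_root="$(mktemp -d)"
init_agent_tree "$demo_root"
(cd "$demo_root" && find .agent -type f | sort)
```

Running the helper a second time leaves existing logs untouched, so it can be re-run safely after entries have been appended.
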
For a complete conversation-driven working folder, scaffold a run pack:

```bash
./scripts/scaffold-run.sh examples/my-demo customer_feedback medium
```

This creates:

- `conversation.md` for the user/agent transcript
- `report.md` for the final summary
- a local `.agent/` tree with intent, audit, violation, rollback, learning, and strategy files

## Intent Specification Format

Before executing autonomous tasks, capture structured intent:

```markdown
## [INT-YYYYMMDD-XXX] task_name

**Created**: ISO-8601 timestamp
**Risk Level**: low | medium | high
**Status**: active | completed | violated

### Goal
What you want to achieve (single clear objective)

### Constraints
- Boundary 1 (e.g., "Only modify files in ./src")
- Boundary 2 (e.g., "Do not make network calls")
- Boundary 3 (e.g., "Preserve existing test coverage")

### Expected Behavior
- Pattern 1 (e.g., "Read files before modifying")
- Pattern 2 (e.g., "Run tests after changes")
- Pattern 3 (e.g., "Create backups of modified files")

### Context
- Relevant files: path/to/file.ext
- Environment: development | staging | production
- Previous attempts: INT-20250115-001 (if retry)

---
```

Save to `.agent/intents/INT-YYYYMMDD-XXX.md`.

## Conversation-Driven Workflow

Use this when you want the skill to document not just the intent, but the full user and agent interaction over time.

### Recommended Sequence

1. Capture the user request in `conversation.md`
2. Translate it into a structured intent in `.agent/intents/`
3. Record allowed and blocked actions in `.agent/audit/`
4. Log suspicious behavior in `.agent/violations/ANOMALIES.md`
5. Log hard validation failures in `.agent/violations/`
6. Record recovery steps in `.agent/audit/ROLLBACKS.md`
7. Extract reusable learnings in `.agent/learnings/`
8. Promote stable improvements into `.agent/learnings/STRATEGIES.md`
9. Summarize the run in `report.md`

### Good Fit

- High-risk or privacy-sensitive tasks
- Tasks where you need a human-readable transcript
- Demos and evaluations
- Incident reviews and postmortems

### Example

See `examples/customer-feedback-demo/` for a full run showing:

- intent capture
- per-action validation
- anomaly detection
- blocked violation
- rollback
- learning promotion

## Validation Workflow

### Pre-Execution Validation

Before each action, validate:

1. **Goal Alignment**: Does this action serve the stated goal?
2. **Constraint Check**: Does it respect all boundaries?
3. **Behavior Match**: Does it fit expected patterns?
4. **Authorization**: Do we have permission for this?

If ANY check fails → block action, log violation.

### Example Validation

```yaml
Intent: "Process customer feedback files"
Constraints: ["Only read ./feedback", "No file modifications"]
Action: "delete ./feedback/temp.txt"
Validation:
  - Goal Alignment: ❌ Deleting isn't "processing"
  - Constraint Check: ❌ Violates "no modifications"
  - Behavior Match: ❌ Not expected for this task
  - Authorization: ✓ (but blocked by other checks)
Result: BLOCKED → Log violation → Consider rollback
```

## Logging Violations

When validation fails, log to `.agent/violations/`:

```markdown
## [VIO-YYYYMMDD-XXX] violation_type

**Logged**: ISO-8601 timestamp
**Severity**: low | medium | high | critical
**Intent**: INT-20250115-001
**Status**: pending_review

### What Happened
Action that was attempted

### Validation Failures
- Goal Alignment: [reason]
- Constraint Check: [which constraint violated]
- Behavior Match: [how it deviated]

### Action Taken
- [ ] Action blocked
- [ ] Checkpoint rollback
- [ ] Alert sent
- [ ] Execution halted

### Root Cause
Why the agent attempted this (if analyzable)

### Prevention
How to prevent this in the future

### Metadata
- Related Intent: INT-20250115-001
- Action Type: file_delete | api_call | command_execution
- Risk Level: high
- See Also: VIO-20250110-002 (if recurring)

---
```

## Anomaly Detection

Monitor execution for behavioral anomalies:

### Anomaly Types

| Type | Description | Response |
|------|-------------|----------|
| **Goal Drift** | Actions diverging from stated goal | Halt, request clarification |
| **Capability Misuse** | Using tools inappropriately | Rollback to checkpoint |
| **Side Effects** | Unexpected consequences detected | Log warning, continue with monitoring |
| **Resource Exceeded** | CPU/memory/time limits breached | Throttle or halt |
| **Pattern Deviation** | Behavior differs from expected | Log for analysis |

### Anomaly Logging

Log to `.agent/violations/ANOMALIES.md`:

```markdown
## [ANO-YYYYMMDD-XXX] anomaly_type

**Detected**: ISO-8601 timestamp
**Severity**: low | medium | high
**Intent**: INT-20250115-001

### Anomaly Details
What unusual behavior was detected

### Evidence
- Metric that triggered alert
- Baseline vs. actual values
- Timeline of deviation

### Assessment
Why this is anomalous

### Response Taken
- [ ] Continued with monitoring
- [ ] Applied constraints
- [ ] Rolled back
- [ ] Halted execution

---
```

## Learning Workflow

After task completion, log learnings to `.agent/learnings/`:

```markdown
## [LRN-YYYYMMDD-XXX] category

**Logged**: ISO-8601 timestamp
**Intent**: INT-20250115-001
**Outcome**: success | failure | partial

### What Was Learned
Pattern or insight discovered

### Evidence
- Success rate: 95%
- Execution time: 2.3s
- Actions taken: 15
- Checkpoints: 3

### Strategy Impact
How this affects future executions

### Application Scope
- Tasks: file_processing, data_transformation
- Risk Levels: low, medium
- Conditions: when X and Y are true

### Safety Check
- Complexity: low | medium | high
- Performance: baseline_comparison
- Risk: assessment

### Metadata
- Category: pattern | optimization | error_handling | security
- Confidence: low | medium | high
- Sample Size: N tasks observed
- Pattern-Key: file.batch_processing (if recurring)

---
```

## Rollback Operations

### Creating Checkpoints

Before risky operations, have the host agent create a checkpoint. The `agent.checkpoint` and `agent.rollback` calls in this section are illustrative; this package does not ship that runtime API:

```typescript
const checkpoint = await agent.checkpoint.create({
  intent: currentIntent,
  reason: "Before bulk file operations"
});
```

### Rollback on Violation

Restore the checkpoint when an intent is violated. Automatic rollback must be implemented by the host agent; the same call can also be triggered manually:

```typescript
// Triggered automatically by the host agent, or manually:
await agent.rollback.restore(checkpointId, {
  reason: "Detected constraint violation",
  notify: true
});
```

### Rollback Log

Track in `.agent/audit/ROLLBACKS.md`:

```markdown
## [RBK-YYYYMMDD-XXX] checkpoint_id

**Executed**: ISO-8601 timestamp
**Intent**: INT-20250115-001
**Trigger**: automatic | manual

### Reason
Why rollback was necessary

### Actions Reversed
- Action 1 (reversed successfully)
- Action 2 (reversed successfully)
- Action 3 (reversal failed - manual intervention needed)

### Checkpoint Restored
- Checkpoint: CHK-20250115-001
- Created: 2025-01-15T10:00:00Z
- Actions since checkpoint: 15

### Status
- [ ] Fully restored
- [ ] Partially restored (see notes)
- [ ] Manual intervention required

---
```

## Strategy Evolution

When the agent learns better approaches:

### A/B Testing

1. **Baseline**: Current strategy (90% of tasks)
2. **Candidate**: New strategy (10% of tasks)
3. **Measure**: Compare success rate, time, resource usage
4. **Validate**: Safety checks pass
5. **Adopt**: Roll out if candidate is 10%+ better
6. **Rollback**: Revert if candidate degrades performance

### Strategy Log

Track in `.agent/learnings/STRATEGIES.md`:

```markdown
## [STR-YYYYMMDD-XXX] strategy_name

**Created**: ISO-8601 timestamp
**Domain**: file_processing | api_interaction | error_handling
**Status**: testing | adopted | rejected | superseded

### Approach
What this strategy does differently

### Performance
- Baseline: 85% success, 3.2s avg
- Candidate: 92% success, 2.1s avg
- Improvement: +7% success, -34% time

### A/B Test Results
- Test Tasks: 50
- Candidate Used: 5 tasks
- Wins: 4, Losses: 1, Ties: 0

### Safety Validation
- Complexity: within limits (complexity: 45/100)
- Permissions: no expansion
- Risk: acceptable (no high-risk changes)

### Adoption Decision
- [ ] Adopt (outperforms baseline)
- [ ] Reject (underperforms baseline)
- [ ] Extend testing (inconclusive)

---
```

## Promoting to Permanent Memory

When learnings are broadly applicable, promote to project files:

### Promotion Targets

| Target | What Belongs There |
|--------|-------------------|
| `CLAUDE.md` | Intent patterns, common constraints for this project |
| `AGENTS.md` | Agent-specific workflows, validation rules |
| `.github/copilot-instructions.md` | Security guidelines, constraint templates |
| `SECURITY.md` | Security-critical constraints and validation rules |

### When to Promote

Promote when:

- Violation occurs 3+ times (recurring constraint)
- Learning applies across multiple task types
- Strategy is adopted and proven (success rate 90%+)
- Security pattern prevents entire class of violations

### Promotion Examples

**Violation** (recurring):

> VIO-20250115-001: Attempted to modify files outside ./src
> VIO-20250118-002: Attempted to modify files outside ./src
> VIO-20250120-003: Attempted to modify files outside ./src

**Promote to CLAUDE.md**:

```markdown
## File Modification Constraints
- Only modify files within `./src` directory
- Other directories are read-only unless explicitly authorized
```

**Learning** (proven strategy):

> LRN-20250115-005: Batch processing with checkpoints every 10 files
> Results: 95% success, 40% faster, easy rollback on failures

**Promote to AGENTS.md**:

```markdown
## File Processing Strategy
- Use batch processing (10 files per batch)
- Create checkpoint before each batch
- Enables fast rollback on errors
```

## Configuration

### Environment Variables

**Important**: All environment variables are **optional**. The skill works with sensible defaults without any configuration.

**Security Note**: This skill does NOT require any credentials or secrets. All data stays local in the `.agent/` directory. No data is transmitted externally.

```bash
# Paths (optional - defaults shown)
export AGENT_INTENT_PATH=".agent/intents"   # Default: .agent/intents
export AGENT_AUDIT_PATH=".agent/audit"      # Default: .agent/audit

# Security Settings (optional tuning)
export AGENT_RISK_THRESHOLD="medium"        # low | medium | high
export AGENT_AUTO_ROLLBACK="true"           # true | false
export AGENT_ANOMALY_THRESHOLD="0.8"        # 0.0 - 1.0

# Learning Settings (optional tuning)
export AGENT_LEARNING_ENABLED="true"        # true | false
export AGENT_MIN_SAMPLE_SIZE="10"           # Min observations before adopting
export AGENT_AB_TEST_RATIO="0.1"            # 10% of tasks for A/B testing

# Monitoring (optional tuning)
export AGENT_METRICS_INTERVAL="1000"        # Metrics collection (ms)
export AGENT_AUDIT_LEVEL="detailed"         # minimal | standard | detailed
```

### Configuration File

Create `.agent/config.json`:

```json
{
  "security": {
    "requireApproval": ["file_delete", "api_write", "command_execution"],
    "autoRollback": true,
    "anomalyThreshold": 0.8,
    "maxPermissionScope": "read-write"
  },
  "learning": {
    "enabled": true,
    "minSampleSize": 10,
    "abTestRatio": 0.1,
    "maxStrategyComplexity": 100
  },
  "monitoring": {
    "metricsInterval": 1000,
    "auditLevel": "detailed",
    "retentionDays": 90
  }
}
```

## ID Generation

Format: `TYPE-YYYYMMDD-XXX`

- `INT`: Intent specification
- `VIO`: Violation (failed validation)
- `ANO`: Anomaly (behavioral deviation)
- `LRN`: Learning (insight from execution)
- `STR`: Strategy (new approach)
- `RBK`: Rollback operation
- `CHK`: Checkpoint

Examples: `INT-20250115-001`, `VIO-20250115-A3F`, `LRN-20250115-002`

## Priority Guidelines

| Priority/Severity | When to Use |
|-------------------|-------------|
| `critical` | Immediate security risk, data loss, system compromise |
| `high` | Intent violation, unauthorized action, goal drift |
| `medium` | Anomaly detected, suboptimal strategy, warning condition |
| `low` | Minor deviation, optimization opportunity, observation |

## Best Practices

### Intent Specification

1. **Be specific** - Vague goals lead to validation failures
2. **List all constraints** - Implicit boundaries often get violated
3. **Define expected behavior** - Helps catch deviations early
4. **Set correct risk level** - Triggers appropriate approval gates

### Validation

1. **Validate early** - Before execution, not after
2. **Fail safe** - Block on doubt, don't assume permission
3. **Log all violations** - Even if they seem minor
4. **Review regularly** - Patterns emerge over time

### Learning

1. **Let it learn** - Learning needs a sufficient sample size to be effective
2. **Monitor A/B tests** - Don't adopt blindly
3. **Safety first** - Reject strategies that reduce safety
4. **Promote proven patterns** - Turn learnings into permanent rules

### Audit

1. **Keep detailed logs** - Debugging requires context
2. **Archive old logs** - Retention policies prevent bloat
3. **Review anomalies** - Often reveal edge cases
4. **Share learnings** - Team benefits from documented patterns

## Detection Triggers

Automatically apply intent security when:

**High-Risk Operations**:

- File deletion or bulk modifications
- API calls with write permissions
- Command execution with elevated privileges
- Database modifications
- Deployment operations

**Autonomous Workflows**:

- Multi-step task sequences
- Background job execution
- Scheduled automation
- Agent-initiated operations

**Learning Opportunities**:

- Task completes successfully
- Failure with identifiable cause
- User provides correction
- Better approach discovered

## Hook Integration (Optional)

Enable automatic intent validation through agent hooks.

### Setup (Claude Code / Codex)

Create `.claude/settings.json`:

```json
{
  "hooks": {
    "UserPromptSubmit": [{
      "matcher": "",
      "hooks": [{
        "type": "command",
        "command": "./skills/self-improving-intent-security-agent/scripts/intent-capture.sh"
      }]
    }],
    "PostToolUse": [{
      "matcher": "Bash|Edit|Write",
      "hooks": [{
        "type": "command",
        "command": "./skills/self-improving-intent-security-agent/scripts/action-validator.sh"
      }]
    }]
  }
}
```

### Available Hook Scripts

| Script | Hook Type | Purpose |
|--------|-----------|---------|
| `scripts/intent-capture.sh` | UserPromptSubmit | Prompts for intent specification |
| `scripts/action-validator.sh` | PostToolUse | Validates actions against intent |
| `scripts/learning-capture.sh` | TaskComplete | Captures learnings after tasks |

See `references/hooks-setup.md` for detailed configuration.

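
The packaged `action-validator.sh` is not reproduced here. As a rough illustration of the kind of constraint check a validator hook could run for a boundary such as "Only modify files in ./src", the sketch below tests whether a target path stays inside an allowed directory. `path_within` is a hypothetical helper, not part of the skill, and it deliberately follows the "fail safe" practice: any path containing a `..` component is rejected rather than resolved.

```shell
#!/bin/sh

# Hypothetical helper: exit 0 only when target ($2) lies inside directory ($1).
# Both arguments are compared as plain strings after stripping a leading "./"
# and a trailing "/" from the allowed directory; symlinks are NOT resolved.
path_within() (
  allowed="${1%/}"
  allowed="${allowed#./}"
  target="${2#./}"

  # Fail safe: refuse anything that tries to climb out with "..".
  case "/$target/" in
    */../*) exit 1 ;;
  esac

  # The quoted prefix is literal; only the trailing /* is a glob.
  case "$target" in
    "$allowed"/*) exit 0 ;;
    *) exit 1 ;;
  esac
)

# Demo: classify a few candidate paths against the ./src boundary.
for candidate in ./src/app.ts ./src/../secrets.env ./docs/readme.md; do
  if path_within ./src "$candidate"; then
    echo "ALLOW $candidate"
  else
    echo "BLOCK $candidate"
  fi
done
```

A string comparison like this is cheap and predictable, but it cannot see through symlinks; a host agent that needs stronger guarantees would have to resolve real paths before comparing.
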
## Quick Commands

```bash
# Initialize agent structure
mkdir -p .agent/{intents,violations,learnings,audit}

# Count active intents (-F matches the literal "**" instead of treating it as a regex)
grep -hF "Status**: active" .agent/intents/*.md | wc -l

# List high-severity violations (-h drops filename prefixes so the heading filter matches)
grep -hF -B5 "Severity**: high" .agent/violations/*.md | grep "^## \["

# Find learnings for file processing
grep -lF "Domain**: file_processing" .agent/learnings/*.md

# Review the five most recent rollback entries (assumes entries are appended in order)
grep "^## \[RBK-" .agent/audit/ROLLBACKS.md | tail -n 5

# Count adopted strategies
grep -cF "Status**: adopted" .agent/learnings/STRATEGIES.md
```

## Examples

See [examples/README.md](examples/README.md) for detailed usage examples:

- Basic intent specification and validation
- Handling violations and rollbacks
- Learning from task outcomes
- Strategy evolution through A/B testing
- Security monitoring and anomaly detection

## References

- [Architecture](references/architecture.md) - System design and components
- [Intent Security](references/intent-security.md) - Validation and authorization
- [Self-Improvement](references/self-improvement.md) - Learning mechanisms
- [Hooks Setup](references/hooks-setup.md) - Automation configuration
- [API Reference](references/api.md) - Programmatic usage

## Multi-Agent Support

Works with Claude Code, Codex CLI, GitHub Copilot, and OpenClaw. See `references/multi-agent.md` for agent-specific configurations.

## Safety Guarantees

- ✓ Intent Alignment - Every action validated against goal
- ✓ Permission Boundaries - Cannot exceed authorized scope
- ✓ Reversibility - Checkpoint-based rollback
- ✓ Auditability - Complete action history
- ✓ Bounded Learning - Safety-constrained improvements
- ✓ Human Oversight - Approval gates for high-risk operations

## License

MIT

---

**Note**: This skill provides strong safety mechanisms but requires proper configuration and usage. Always:

- Define clear, specific intents
- Review violation logs regularly
- Monitor learning effectiveness
- Keep approval gates enabled for high-risk operations
- Test in non-production environments first

**Intent-based security is a powerful approach, but human judgment remains essential.**
