
guard

A deep AI safety guardrails workflow covering policy definition, input/output filtering, monitoring, escalation, and false-positive handling. Use it to reduce harmful outputs, misuse, and policy violations in LLM products.

Author: admin | Source: ClawHub
Version: V 1.0.0
Security check: Passed
Downloads: 91
Favorites: 0


# AI Guardrails (Deep Workflow)

Guardrails turn **product and legal policy** into **enforced behavior**: blocking, rewriting, logging, and human review, with attention to **false positives** and **latency**.

## When to Offer This Workflow

**Trigger conditions:**

- Launching consumer-facing LLM features
- Jailbreak attempts, policy violations, or PII leakage risks
- Region-specific compliance (minors, regulated advice)

**Initial offer:** Use **six stages**: (1) policy scope, (2) threat model, (3) controls stack, (4) implementation patterns, (5) monitoring & review, (6) iteration & appeals. Confirm the latency budget and target jurisdictions.

---

## Stage 1: Policy Scope

**Goal:** Define prohibited categories (hate, sexual content, violence, self-harm, malware instructions, etc.) and required disclaimers for sensitive domains (medical, legal).

**Exit condition:** Policy document owned by legal/product; escalation path for gray areas.

---

## Stage 2: Threat Model

**Goal:** Identify adversaries (prompt injection, data exfiltration, tool abuse) and assets (user data, system prompts, connectors).

---

## Stage 3: Controls Stack

**Goal:** Layer defenses: input screening, model safety APIs, output classifiers, tool sandboxing, and allowlists for tools and URLs.

---

## Stage 4: Implementation Patterns

**Goal:** Structured refusal messages; telemetry on every block; distinguish block vs. rewrite vs. warn; avoid silent failures.

---

## Stage 5: Monitoring & Review

**Goal:** Sample borderline cases for human review; dashboards on block rates by category; alerts on abuse spikes.

---

## Stage 6: Iteration & Appeals

**Goal:** User appeals path where appropriate; version policy changes; measure false positives by locale and use case.

---

## Final Review Checklist

- [ ] Policy categories and owners defined
- [ ] Threat model aligned with the product
- [ ] Layered controls with clear responsibilities
- [ ] Telemetry and review for edge cases
- [ ] Appeals and iteration process where applicable

## Tips for Effective Guidance

- Defense in depth: no single classifier is sufficient.
- Pair with **moderation** for UGC and **tool-calling** for agent safety.

## Handling Deviations

- Enterprise internal bots: emphasize data-leak prevention and connector scope over public "safety" categories alone.
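---

## Example Sketches

The sketches below are illustrative only: function names, field names, and thresholds are assumptions, not part of this skill, and the keyword checks stand in for real classifiers or safety APIs.

For Stage 1, keeping the policy as data rather than hard-coding it lets legal/product own and version it separately from enforcement code. A minimal sketch, with hypothetical category names:

```python
# Hypothetical policy table. Legal/product own this file; enforcement
# code only reads it, so policy changes do not require code changes.
POLICY_VERSION = "1.0.0"

# Prohibited categories -> default enforcement action.
PROHIBITED_CATEGORIES = {
    "hate": "block",
    "violence": "block",
    "malware_instructions": "block",
    "self_harm": "rewrite",  # redirect to support resources, not a bare refusal
}

# Sensitive domains -> disclaimer that must accompany any answer.
REQUIRED_DISCLAIMERS = {
    "medical": "This is general information, not medical advice.",
    "legal": "This is general information, not legal advice.",
}
```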
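For Stage 3, layering is easiest to see end to end. In this sketch, `screen_input` and `classify_output` are hypothetical placeholders; a production stack would back them with trained classifiers or a model safety API rather than substring checks:

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    WARN = "warn"        # deliver, but with a caution banner
    REWRITE = "rewrite"  # replace with a safe completion
    BLOCK = "block"      # refuse, log, and surface a refusal message


# Layer 1 (hypothetical): input screening before the model is called.
def screen_input(prompt: str) -> Action:
    if "ignore previous instructions" in prompt.lower():
        return Action.BLOCK  # crude prompt-injection signal
    return Action.ALLOW


# Layer 2 (hypothetical): output classification on the model response.
def classify_output(text: str) -> Action:
    if "malware" in text.lower():
        return Action.BLOCK
    return Action.ALLOW


def guarded_generate(prompt: str, model_call) -> tuple[Action, str]:
    """Layered stack: input screen -> model -> output classifier.

    `model_call` is any callable str -> str. The layers are independent
    on purpose: defense in depth, so no single check is a point of failure.
    """
    if screen_input(prompt) is Action.BLOCK:
        return Action.BLOCK, "This request conflicts with our usage policy."

    response = model_call(prompt)

    if classify_output(response) is Action.BLOCK:
        return Action.BLOCK, "The generated answer was withheld by policy."
    return Action.ALLOW, response
```

A tool allowlist is a third layer of the same shape: check the tool name and target URL against a fixed set before any call executes.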
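Stage 4's rules ("telemetry on every block", "avoid silent failures") amount to emitting a structured event with every refusal. The event fields below are assumed names for illustration:

```python
import json
import logging
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

logger = logging.getLogger("guardrails")


@dataclass
class GuardrailEvent:
    """One record per enforcement decision; nothing is blocked silently."""
    action: str          # "block" | "rewrite" | "warn"
    category: str        # policy category that fired, e.g. "malware_instructions"
    policy_version: str  # lets an appeal reference the exact policy text
    request_id: str
    timestamp: str


def refuse(request_id: str, category: str, policy_version: str) -> str:
    event = GuardrailEvent(
        action="block",
        category=category,
        policy_version=policy_version,
        request_id=request_id,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
    logger.info(json.dumps(asdict(event)))  # feeds the Stage 5 dashboards
    # Structured refusal: names the policy and offers a path forward,
    # instead of failing silently or returning an empty response.
    return (
        f"I can't help with that because it falls under our {category} "
        "policy. If you believe this is a mistake, you can appeal."
    )
```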
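For Stage 5, borderline sampling and per-category block rates take only a few lines; the confidence band and sample rate here are arbitrary illustration values:

```python
import random
from collections import Counter

review_queue: list[dict] = []  # sampled cases routed to human reviewers
block_counts: Counter = Counter()
total_counts: Counter = Counter()


def record_decision(category: str, blocked: bool, confidence: float,
                    sample_rate: float = 0.25) -> None:
    """Track per-category rates and sample borderline cases for review."""
    total_counts[category] += 1
    if blocked:
        block_counts[category] += 1
    # "Borderline" = the classifier was unsure; send a sample to humans.
    if 0.4 <= confidence <= 0.6 and random.random() < sample_rate:
        review_queue.append(
            {"category": category, "blocked": blocked, "confidence": confidence}
        )


def block_rates() -> dict[str, float]:
    """Per-category block rate: the core dashboard metric. A sudden spike
    in one category suggests an abuse wave or a broken classifier."""
    return {c: block_counts[c] / total_counts[c] for c in total_counts}
```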

Tags

skill ai

Install via Conversation

This skill can be installed via conversation on the following platforms:

OpenClaw WorkBuddy QClaw Kimi Claude

Option 1: Install SkillHub and the skill

Help me install SkillHub and the guard-1775975221 skill

Option 2: Set SkillHub as the preferred skill installation source

Set SkillHub as my preferred skill installation source, then help me install the guard-1775975221 skill

Install via Command Line

skillhub install guard-1775975221

Download Zip Package

⬇ Download guard v1.0.0

File size: 1.84 KB | Published: 2026-04-13 10:31

v1.0.0 (latest) – 2026-04-13 10:31
Version 1.0.0 – Initial Release

- Introduces a comprehensive deep AI safety guardrails workflow for LLM-based products.
- Details a six-stage process: policy scope, threat modeling, controls stack, implementation patterns, monitoring & review, and iteration & appeals.
- Provides specific guidance on policy definition, input/output filtering, monitoring, escalation, and false-positive handling.
- Includes a review checklist and best-practice tips for deploying safety guardrails for AI features.
- Addresses enterprise-specific considerations (e.g., data-leak prevention for internal bots).

