返回顶部
w

watchdog-heartbeat

Monitor service health, heartbeat freshness, stuck workflows, and trigger recovery or degraded mode. Use on: high-frequency schedule, after system startup, when a workflow stalls, when heartbeat freshness must be verified. Triggered by watchdog cron jobs or health check requests.

作者: admin | 来源: ClawHub
源自
ClawHub
版本
V 1.0.0
安全检测
已通过
75
下载量
0
收藏
概述
安装方式
版本历史

watchdog-heartbeat

# Watchdog Heartbeat Provide observability and recovery awareness for a resident OpenClaw system. Verify process aliveness, heartbeat freshness, and workflow integrity. ## Input Required: - `service_list` — list of monitored services and their expected health states - `health_endpoints` — map of service → health check endpoint or method - `heartbeat_records` — recent heartbeat timestamps per agent/skill - `workflow_status_records` — current status of all active workflows - `restart_records` — history of service restarts and recovery events ## Output Schema ``` service_health_summary: { service: string status: "healthy" | "degraded" | "down" | "unknown" last_check: string # ISO-8601 latency_ms: number | null error: string | null }[] expired_heartbeat_list: { agent_or_skill: string last_heartbeat: string # ISO-8601 seconds_expired: number severity: "warning" | "critical" }[] stuck_workflow_list: { workflow_id: string workflow_name: string stuck_since: string # ISO-8601 stuck_duration_min: number last_progress: string | null severity: "warning" | "critical" }[] recovery_recommendation: { action: "restart" | "notify" | "escalate" | "no_action" | "degraded_mode" target: string reason: string }[] degraded_mode_recommendation: { affected_services: string[] degraded_features: string[] estimated_recovery_time: string | null user_impact: string } watchdog_log: { check_id: string check_time: string # ISO-8601 services_checked: number heartbeats_checked: number workflows_checked: number issues_found: number observability_gap: string[] | null } ``` ## Rules 1. **Process alive ≠ healthy.** Check recent success, not just process existence. 2. **Expired heartbeat triggers attention.** Do not ignore stale heartbeats. 3. **Stuck workflows must be explicitly surfaced.** Don't let them disappear into silence. 4. **Silent failure is unacceptable.** If something fails and no one is notified, that's a system failure. 5. **Distinguish warning from critical.** Warning = may self-recover. Critical = requires intervention. ## Heartbeat Expiry Thresholds | Seconds Expired | Severity | |----------------|----------| | < 60s | healthy | | 60s – 300s | warning | | > 300s | critical | ## Workflow Stuck Thresholds | Duration | Severity | |----------|----------| | < 10 min | healthy (in progress) | | 10 – 30 min | warning | | > 30 min | critical | ## Recovery Actions - `no_action` — within normal parameters - `notify` — alert human, no automatic restart - `restart` — attempt automatic restart - `escalate` — human intervention required - `degraded_mode` — reduce functionality, maintain partial service ## Failure Handling If monitoring data is incomplete: - Set `observability_gap` with missing field names - Report `status = "unknown"` for affected services - Do not fabricate health states - Recommend `escalate` if critical services have observability gaps

标签

skill ai

通过对话安装

该技能支持在以下平台通过对话安装:

OpenClaw WorkBuddy QClaw Kimi Claude

方式一:安装 SkillHub 和技能

帮我安装 SkillHub 和 watchdog-heartbeat-1775978641 技能

方式二:设置 SkillHub 为优先技能安装源

设置 SkillHub 为我的优先技能安装源,然后帮我安装 watchdog-heartbeat-1775978641 技能

通过命令行安装

skillhub install watchdog-heartbeat-1775978641

下载 Zip 包

⬇ 下载 watchdog-heartbeat v1.0.0

文件大小: 2.04 KB | 发布时间: 2026-4-13 12:31

v1.0.0 最新 2026-4-13 12:31
Initial release: monitor service health, heartbeat freshness, stuck workflows

Archiver·手机版·闲社网·闲社论坛·羊毛社区· 多链控股集团有限公司 · 苏ICP备2025199260号-1

Powered by Discuz! X5.0   © 2024-2025 闲社网·线报更新论坛·羊毛分享社区·http://xianshe.com

p2p_official_large
返回顶部