observability-slos

Deep SLO/SLI workflow: user-centric SLIs, SLO targets and windows, error budgets, multi-window burn-rate alerts, and policy for when the budget is exhausted. Use when defining reliability targets or aligning engineering and product on trade-offs.

Author: admin | Source: ClawHub
Version: V 1.0.0 | Security check: Passed | Downloads: 89 | Favorites: 0


# Observability & SLOs (Deep Workflow)

SLOs connect **engineering work** to **user-perceived reliability**. SLIs must be **measurable from systems** but **grounded in user journeys**.

## When to Offer This Workflow

**Trigger conditions:**

- Someone proposes **99.9%** without defining **for what**
- Too many pages, or none at all; the team needs **error budget** discipline
- **Product** wants features while **stability** degrades

**Initial offer:** Use **six stages**: (1) pick user journeys, (2) define SLIs, (3) set SLO targets & windows, (4) agree an error budget policy, (5) alert on budget burn, (6) review & iterate. Confirm the **metrics** stack and any **dependency** SLOs from vendors.

---

## Stage 1: User Journeys

**Goal:** Identify the **critical paths** that matter if broken (checkout, login, API sync), not "CPU low".

### Output

3–10 journeys ranked by **business impact** and **frequency**.

**Exit condition:** One paragraph per journey: user intent + failure symptom.

---

## Stage 2: Define SLIs

**Goal:** Express each SLI as a **ratio** of good events to total events over a window, with the **implementation** made explicit.

### Examples

- **Availability**: successful requests / valid requests (define "valid")
- **Latency**: proportion of requests faster than **T** ms

### Good SLIs

- **Objective**, and **low-cardinality** enough to measure reliably

**Exit condition:** SLI formula + data source (metrics, logs, probes).

---

## Stage 3: SLO Targets & Windows

**Goal:** A **target** (e.g., 99.9% monthly) implies a number of **allowed** bad minutes; make that number explicit.

### Practices

- A **rolling** 30-day window is common; align it with **release** cadence
- **Tier** services: not everything needs the same SLO

**Exit condition:** Published table: journey → SLI → target → window.

---

## Stage 4: Error Budget Policy

**Goal:** Decide **what we do** when the budget is healthy vs exhausted.

### Policy ideas

- Budget healthy → ship features; budget low → freeze risky changes and focus on reliability
- **Escalate** when the budget burns fast (multi-window alerts)

**Exit condition:** Written policy with product sign-off.
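
The arithmetic behind Stages 2–4 (SLI ratio, allowed bad minutes, remaining budget, and the healthy/low policy split) can be sketched in Python. This is a minimal illustration, not part of any SLO tool: `SloWindow`, `policy_action`, and the 25% "low budget" threshold are hypothetical names and values, and the event counts are assumed to come from your metrics stack.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """Hypothetical container for one journey's counts over one SLO window."""
    good_events: int       # e.g. successful requests
    valid_events: int      # requests that count toward the SLI ("valid")
    target: float          # e.g. 0.999 for a 99.9% SLO
    window_minutes: int    # e.g. 30 * 24 * 60 for a rolling 30-day window

    def sli(self) -> float:
        # SLI = good / valid; an empty window is treated as fully good.
        if self.valid_events == 0:
            return 1.0
        return self.good_events / self.valid_events

    def allowed_bad_minutes(self) -> float:
        # Time-based view of the budget: (1 - target) of the whole window.
        return (1.0 - self.target) * self.window_minutes

    def budget_remaining(self) -> float:
        # Event-based view: 1.0 = untouched, 0.0 = exhausted, negative = overspent.
        allowed_bad = (1.0 - self.target) * self.valid_events
        actual_bad = self.valid_events - self.good_events
        if allowed_bad == 0:
            return 1.0 if actual_bad == 0 else 0.0
        return 1.0 - actual_bad / allowed_bad


def policy_action(budget_remaining: float, low_threshold: float = 0.25) -> str:
    # Two-state policy from Stage 4; the 25% threshold is a placeholder assumption.
    if budget_remaining < low_threshold:
        return "freeze risky changes"
    return "ship features"


window = SloWindow(good_events=999_500, valid_events=1_000_000,
                   target=0.999, window_minutes=30 * 24 * 60)
print(round(window.sli(), 4))                    # 0.9995
print(round(window.allowed_bad_minutes(), 1))    # 43.2
print(round(window.budget_remaining(), 2))       # 0.5
print(policy_action(window.budget_remaining()))  # ship features
```

Here 500 bad events out of 1,000,000 spend half of a 99.9% budget (which allows 1,000), and the same target allows about 43.2 bad minutes in a 30-day window.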

---

## Stage 5: Alerting on Burn

**Goal:** Page on **budget burn rate**, not every blip. Use the **multi-window, multi-burn-rate** pattern when following Google-style SLO alerting.

### Practices

- **Fast burn** = page soon; **slow burn** = ticket/track

**Exit condition:** Alert rules linked to runbooks.

---

## Stage 6: Review & Iterate

**Goal:** SLOs **drift** as the architecture changes; review them **quarterly** and adjust targets with data.

---

## Final Review Checklist

- [ ] Journeys and SLIs tied to real user pain
- [ ] Targets realistic vs dependencies and cost
- [ ] Error budget policy agreed with product
- [ ] Alerts on burn, not noisy symptom spam
- [ ] Review cadence scheduled

## Tips for Effective Guidance

- Translate **99.9%** into **minutes/month** of allowed badness (about 43 minutes in a 30-day month).
- **SLA** (contract) vs **SLO** (internal target): don't confuse the two.
- Dependency SLOs cap what you can promise; surface that early.

## Handling Deviations

- **No metrics yet**: start with a **proxy SLI** (synthetic probes) and improve instrumentation.
- **Batch systems**: use event-processing lag as the SLI instead of HTTP request success or latency.
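
The Stage 5 multi-window, multi-burn-rate check can be sketched as below. Burn rate is the observed error ratio divided by the error budget (1 − target). The window pairs and thresholds (14.4 for 1h/5m, 6 for 6h/30m, 1 for 3d/6h) follow the pattern published in the Google SRE Workbook for a 30-day SLO, but treat them as assumptions to tune. `BurnRule`, `RULES`, and `evaluate` are hypothetical helpers; in practice these rules usually live in your alerting system, not in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BurnRule:
    """One multi-window rule: both windows must exceed the threshold to fire."""
    long_window: str
    short_window: str
    threshold: float  # multiple of the pace that would exactly use up the budget
    action: str       # "page" or "ticket"


# Assumed rule set for a 30-day SLO, modeled on the Google SRE Workbook pattern.
RULES = [
    BurnRule("1h", "5m", 14.4, "page"),    # fast burn
    BurnRule("6h", "30m", 6.0, "page"),
    BurnRule("3d", "6h", 1.0, "ticket"),   # slow burn
]


def burn_rate(error_ratio: float, target: float) -> float:
    # 1.0 means the budget would run out exactly at the end of the SLO window
    # if this error ratio were sustained for the whole window.
    return error_ratio / (1.0 - target)


def evaluate(error_ratios: dict[str, float], target: float) -> str:
    """Return "page", "ticket", or "none"; error_ratios maps window -> bad/valid."""
    actions = set()
    for rule in RULES:
        long_burn = burn_rate(error_ratios[rule.long_window], target)
        short_burn = burn_rate(error_ratios[rule.short_window], target)
        # Long window: enough budget was really spent (filters blips).
        # Short window: the burn is still happening (alert clears after recovery).
        if long_burn > rule.threshold and short_burn > rule.threshold:
            actions.add(rule.action)
    if "page" in actions:
        return "page"
    if "ticket" in actions:
        return "ticket"
    return "none"


fast = {"5m": 0.03, "30m": 0.02, "1h": 0.02, "6h": 0.004, "3d": 0.0008}
slow = {"5m": 0.002, "30m": 0.002, "1h": 0.002, "6h": 0.002, "3d": 0.0015}
print(evaluate(fast, target=0.999))  # page
print(evaluate(slow, target=0.999))  # ticket
```

Requiring both windows to exceed the same threshold is what lets a fast burn page within minutes while a slow, steady burn only opens a ticket.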

Tags

skill ai

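Install via conversation

This skill can be installed through conversation on the following platforms:

OpenClaw, WorkBuddy, QClaw, Kimi, Claude

Option 1: Install SkillHub and the skill

Help me install SkillHub and the observability-slos-1776028882 skill

Option 2: Set SkillHub as the preferred skill installation source

Set SkillHub as my preferred skill installation source, then help me install the observability-slos-1776028882 skill

Install via command line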

skillhub install observability-slos-1776028882

Download the Zip package

Download observability-slos v1.0.0

File size: 2.31 KB | Published: 2026-4-13 11:14

v1.0.0 (latest) 2026-4-13 11:14
- Initial release of the observability-slos skill with a deep, user-centric SLO/SLI workflow.
- Guides users through six structured stages: selecting user journeys, defining SLIs, setting SLO targets and windows, establishing error budget policy, configuring burn-rate alerting, and continuous review.
- Emphasizes actionable output, practical exit conditions, and alignment between engineering and product.
- Includes a final review checklist, real-world tips, and deviation handling for systems without metrics or with batch processing.

