
data-pipelines

Deep data pipeline workflow—ingestion, orchestration, idempotency, data quality, SLAs, observability, and lineage. Use when building batch/stream pipelines, debugging job failures, or hardening ETL/ELT.

Author: admin | Source: ClawHub
Version: v1.0.0
Security check: passed
Downloads: 81
Favorites: 0


# Data Pipelines

Pipelines fail on silent schema drift, partial writes, and unclear ownership. Design for at-least-once delivery, idempotent sinks, and observable stages.

## When to Offer This Workflow

**Trigger conditions:**
- Batch or streaming ingestion (Kafka, Fivetran, Airflow, Dagster, Spark, etc.)
- Late data, backfills, or schema changes breaking jobs
- SLA misses on freshness or row counts

**Initial offer:** Use **six stages**: (1) requirements & SLAs, (2) source contracts, (3) transforms & idempotency, (4) orchestration & dependencies, (5) quality & monitoring, (6) lineage & operations. Confirm batch vs stream and the cloud stack.

---

## Stage 1: Requirements & SLAs

**Goal:** Freshness (latency), completeness expectations, cost ceiling, failure tolerance (quarantine vs stop-the-line).

**Exit condition:** SLA table: pipeline → metric → threshold.

---

## Stage 2: Source Contracts

**Goal:** Schema versioning; CDC vs snapshot pulls; API rate limits.

### Practices
- Keep the raw landing zone immutable; build curated layers downstream

---

## Stage 3: Transforms & Idempotency

**Goal:** Deterministic transforms; upsert keys; a partition strategy that allows rewinds.

### Practices
- Watermark progress for incremental loads

---

## Stage 4: Orchestration & Dependencies

**Goal:** Clear DAG; retry policy; backfill without double counting; alerts on SLA misses.

---

## Stage 5: Quality & Monitoring

**Goal:** Data quality checks (null spikes, row bounds, referential checks); metrics on lag, duration, and error rate.

---

## Stage 6: Lineage & Operations

**Goal:** Column-level lineage where valuable; an on-call runbook; an owner per pipeline.
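The core of Stage 3 — idempotent upserts plus a watermark for incremental loads — can be sketched as follows. This is a minimal illustration using SQLite's `ON CONFLICT` upsert (available in the SQLite bundled with modern Python); the `events` table and its columns are hypothetical, not part of this skill:

```python
import sqlite3

def upsert_events(conn, rows):
    """Idempotent sink: rows are keyed on event_id, so replaying the
    same batch (at-least-once delivery) leaves the table unchanged."""
    conn.executemany(
        """INSERT INTO events (event_id, payload, updated_at)
           VALUES (?, ?, ?)
           ON CONFLICT(event_id) DO UPDATE SET
               payload = excluded.payload,
               updated_at = excluded.updated_at""",
        rows,
    )
    conn.commit()

def incremental_load(conn, source_rows, watermark):
    """Process only rows newer than the watermark, then advance it.
    A rewind is just a replay from an older watermark value."""
    new_rows = [r for r in source_rows if r[2] > watermark]
    upsert_events(conn, new_rows)
    return max((r[2] for r in new_rows), default=watermark)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (event_id TEXT PRIMARY KEY, payload TEXT, updated_at INTEGER)"
)

batch = [("e1", "a", 10), ("e2", "b", 20)]
wm = incremental_load(conn, batch, watermark=0)
wm = incremental_load(conn, batch, watermark=0)  # replay: no duplicates
count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count, wm)  # 2 20
```

The same shape applies to warehouse `MERGE` statements or Spark `foreachBatch` sinks: the upsert key makes retries safe, and the watermark is the checkpoint you persist between runs.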
---

## Final Review Checklist

- [ ] SLAs and failure policy explicit
- [ ] Source contracts and a schema evolution path
- [ ] Idempotent writes and checkpointing
- [ ] Orchestration with retries and safe backfill
- [ ] Data quality checks and alerts
- [ ] Lineage and ownership documented

## Tips for Effective Guidance

- Track compute and storage costs separately when sizing large shuffles.
- Pair with **etl-design** for batch patterns and **message-queues** for streaming handoffs.

## Handling Deviations

- Single-script pipelines: still document inputs, outputs, and schedule.
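The Stage 5 gates (null spikes, row bounds, referential checks) can be sketched as a plain function that a pipeline runs before publishing a batch. The column names, thresholds, and reference-key set here are hypothetical placeholders:

```python
def run_quality_checks(rows, min_rows, max_null_fraction, known_user_ids):
    """Gate a batch before publishing: row-count floor, null-rate
    spike detection per column, and a simple referential check."""
    failures = []
    if len(rows) < min_rows:
        failures.append(f"row count {len(rows)} below floor {min_rows}")
    for col in ("user_id", "amount"):
        nulls = sum(1 for r in rows if r.get(col) is None)
        if rows and nulls / len(rows) > max_null_fraction:
            failures.append(f"null spike in {col}: {nulls}/{len(rows)}")
    orphans = {
        r["user_id"] for r in rows if r.get("user_id") is not None
    } - known_user_ids
    if orphans:
        failures.append(f"referential check failed: {sorted(orphans)}")
    return failures

batch = [
    {"user_id": "u1", "amount": 5},
    {"user_id": "u2", "amount": None},
    {"user_id": "u9", "amount": 3},  # u9 is unknown upstream
]
failures = run_quality_checks(
    batch, min_rows=2, max_null_fraction=0.5, known_user_ids={"u1", "u2"}
)
print(failures)
```

A non-empty `failures` list is the decision point from Stage 1: quarantine the batch or stop the line, and emit the list to your alerting channel either way.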

Tags

skill ai

Install via Conversation

This skill can be installed via conversation on the following platforms:

OpenClaw WorkBuddy QClaw Kimi Claude

Method 1: Install SkillHub and the skill

Help me install SkillHub and the data-pipelines-1775984118 skill

Method 2: Set SkillHub as the preferred skill installation source

Set SkillHub as my preferred skill installation source, then help me install the data-pipelines-1775984118 skill

Install via Command Line

skillhub install data-pipelines-1775984118

Download Zip Package

⬇ Download data-pipelines v1.0.0

File size: 1.84 KB | Published: 2026-04-13 09:58

v1.0.0 (latest) · 2026-04-13 09:58
- Initial release of the "data-pipelines" skill.
- Provides a comprehensive workflow covering ingestion, orchestration, idempotency, data quality, SLAs, observability, and lineage.
- Includes six structured stages: requirements & SLAs, source contracts, transforms & idempotency, orchestration & dependencies, quality & monitoring, and lineage & operations.
- Offers trigger conditions for when the workflow is relevant and a detailed checklist for final review.
- Contains practical tips and guidance for both batch and streaming pipelines, with emphasis on reliability and clarity.
