
etl-design

Deep ETL/ELT design workflow—extract patterns, transforms, loading strategies, idempotency, validation, and reconciliation. Use when designing batch data flows between systems or hardening pipelines for correctness.

Author: admin | Source: ClawHub
Version: V 1.0.0
Security check: Passed
Downloads: 104
Favorites: 0

# ETL Design

ETL is **correctness under change**: schema drift, partial loads, retries, and reconciliation with upstream systems.

## When to Offer This Workflow

**Trigger conditions:**

- Batch loads into warehouse or data lake
- Choosing between CDC, snapshots, and incremental watermarks
- Missing rows, duplicates, or inconsistent aggregates downstream

**Initial offer:**

Use **six stages**: (1) source contract, (2) extract strategy, (3) transform rules, (4) load & dedupe, (5) validation, (6) operations & backfill. Confirm batch window and SLA.

---

## Stage 1: Source Contract

**Goal:** Document schema, primary keys, change indicators (`updated_at`, CDC log position), and access constraints (rate limits, read replicas).

---

## Stage 2: Extract Strategy

**Goal:** Choose between full dump, incremental watermark, and CDC, trading off freshness, source load, and complexity.

### Practices

- CDC for large sources; snapshots for small or infrequent tables

---

## Stage 3: Transform Rules

**Goal:** Deterministic transforms; surrogate keys; versioned business rules; explicit handling of deletes (tombstones vs hard deletes).

---

## Stage 4: Load & Dedupe

**Goal:** Upsert keys; partitions; rerunnable jobs where the same batch id produces the same outcome (idempotent load).

---

## Stage 5: Validation

**Goal:** Row counts, checksums, key uniqueness, referential checks; alert on threshold breaches.

---

## Stage 6: Operations & Backfill

**Goal:** Replay by date range; monitor lag; dead-letter or quarantine bad rows with reason codes.

---

## Final Review Checklist

- [ ] Source contract and keys documented
- [ ] Extract mode matches SLA and source constraints
- [ ] Transforms deterministic and versioned
- [ ] Idempotent load strategy
- [ ] Validation and reconciliation defined

## Tips for Effective Guidance

- Plan for late-arriving facts and slowly changing dimensions in analytics paths.
- Pair with **data-pipelines** for orchestration and monitoring.
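
As one illustration of the Stage 4 goal (upsert keys, rerunnable batches), the sketch below loads a batch into an in-memory SQLite table. The table name `customers`, the helpers `load_batch` and `snapshot`, and the column layout are assumptions made for this example only; a real warehouse would use its own merge or upsert syntax. It also assumes a SQLite build with `ON CONFLICT ... DO UPDATE` support (3.24 or later).

```python
import sqlite3


def load_batch(conn, batch_id, rows):
    """Upsert rows keyed on id; rerunning the same batch leaves the table unchanged."""
    # Hypothetical target table, created on first use.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS customers (
               id INTEGER PRIMARY KEY,
               name TEXT NOT NULL,
               updated_at TEXT NOT NULL,
               batch_id TEXT NOT NULL
           )"""
    )
    with conn:  # one transaction: the batch lands fully or not at all
        conn.executemany(
            """INSERT INTO customers (id, name, updated_at, batch_id)
               VALUES (?, ?, ?, ?)
               ON CONFLICT(id) DO UPDATE SET
                   name = excluded.name,
                   updated_at = excluded.updated_at,
                   batch_id = excluded.batch_id
               WHERE excluded.updated_at >= customers.updated_at""",
            [(r["id"], r["name"], r["updated_at"], batch_id) for r in rows],
        )


def snapshot(conn):
    """Return the table contents in a stable order for comparison."""
    return conn.execute(
        "SELECT id, name, updated_at, batch_id FROM customers ORDER BY id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
batch = [
    {"id": 1, "name": "Ada", "updated_at": "2024-01-01T00:00:00"},
    {"id": 2, "name": "Lin", "updated_at": "2024-01-02T00:00:00"},
]
load_batch(conn, "b1", batch)
first = snapshot(conn)
load_batch(conn, "b1", batch)  # retry of the same batch
second = snapshot(conn)
```

Keying the upsert on the primary key means a retry cannot create duplicates, and the `updated_at` guard keeps a late replay of an older batch from overwriting newer data. Comparing `first` and `second` is a simple way to confirm the load is idempotent.
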

## Handling Deviations

- Near-real-time: document micro-batch or streaming semantics separately.
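
The Stage 5 checks (row counts, checksums, key uniqueness) could be sketched as follows. The function names `table_checksum` and `reconcile`, the `max_count_drift` threshold, and the representation of rows as dictionaries are assumptions for illustration; in practice these checks would more likely run as SQL against the source and target.

```python
import hashlib
from collections import Counter


def table_checksum(rows):
    """Order-independent checksum: hash each row, then hash the sorted digests."""
    digests = sorted(
        hashlib.sha256(repr(sorted(r.items())).encode("utf-8")).hexdigest()
        for r in rows
    )
    return hashlib.sha256("".join(digests).encode("utf-8")).hexdigest()


def reconcile(source_rows, target_rows, key="id", max_count_drift=0.0):
    """Return a list of human-readable issues; an empty list means all checks passed."""
    issues = []

    # Row count check, expressed as relative drift against the source.
    src_n, tgt_n = len(source_rows), len(target_rows)
    drift = abs(src_n - tgt_n) / max(src_n, 1)
    if drift > max_count_drift:
        issues.append(f"row count drift {drift:.2%}: source={src_n}, target={tgt_n}")

    # Key uniqueness check on the target.
    dupes = [k for k, n in Counter(r[key] for r in target_rows).items() if n > 1]
    if dupes:
        issues.append(f"duplicate keys in target: {sorted(dupes)}")

    # Content check that ignores row order.
    if table_checksum(source_rows) != table_checksum(target_rows):
        issues.append("checksum mismatch between source and target")

    return issues


src = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
clean = reconcile(src, list(reversed(src)))
dirty = reconcile(src, src + [{"id": 2, "amount": 20}])
```

Returning a list of issues rather than raising on the first failure lets the caller decide which breaches should alert and which should block the load.
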

Tags

skill ai

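Install via chat

This skill can be installed via chat on the following platforms:

OpenClaw WorkBuddy QClaw Kimi Claude

Option 1: Install SkillHub and the skill

Help me install SkillHub and the etl-design-1776028621 skill

Option 2: Set SkillHub as the preferred skill installation source

Set SkillHub as my preferred skill installation source, then help me install the etl-design-1776028621 skill

Install via command line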

skillhub install etl-design-1776028621

Download Zip package

Download etl-design v1.0.0

File size: 1.75 KB | Published: 2026-4-13 10:11

v1.0.0 (latest) 2026-4-13 10:11
Initial release of the etl-design skill, providing a structured ETL/ELT workflow for robust data pipeline design.

- Introduces a six-stage ETL design framework: source contract, extract strategy, transform rules, load & dedupe, validation, and operations & backfill.
- Covers best practices for handling schema drift, partial loads, retries, and data reconciliation.
- Includes detailed guidance for batch data loads, choosing extract modes, and ensuring idempotency and correctness.
- Provides a comprehensive final review checklist and operational tips.
- Clarifies when to use this workflow and considerations for deviations like near-real-time scenarios.
