
rag-pipelines

Deep RAG workflow—document ingestion, chunking, metadata, retrieval and reranking, grounding and citations, evaluation, and failure modes (hallucination, staleness). Use when building or debugging retrieval-augmented generation systems.

Author: admin | Source: ClawHub
Version: 1.0.0
Security check: passed
Downloads: 129
Favorites: 0


# RAG Pipelines (Deep Workflow)

RAG quality is dominated by **chunking**, **retrieval**, and **evaluation**, not by the LLM alone. Treat the system as data engineering plus generation with explicit failure modes.

## When to Offer This Workflow

**Trigger conditions:**

- Building Q&A over internal docs, support assistants, or copilots
- Hallucinations, wrong citations, or stale answers
- New content types (PDF, HTML, code repositories)

**Initial offer:** Use **six stages**: (1) task & success criteria, (2) ingestion & cleaning, (3) chunking & metadata, (4) retrieval & rerank, (5) generation & grounding, (6) evaluation & monitoring. Confirm the embedding model and retrieval stack (vector DB, search engine, hybrid).

---

## Stage 1: Task & Success Criteria

**Goal:** Define what a “good” answer contains: required citations, length, tone, and when to refuse.

**Exit condition:** Written rubric with examples of acceptable vs unacceptable answers.

---

## Stage 2: Ingestion & Cleaning

**Goal:** Deterministic text extraction (strip boilerplate, handle PDF/OCR if needed); deduplicate documents; track source URL and `updated_at` for staleness.

### Practices

- Version pipelines when parsers change (a parser change calls for a re-embed job)

---

## Stage 3: Chunking & Metadata

**Goal:** Tune chunk size and overlap to query patterns rather than applying one global token count to all content.

### Practices

- Attach metadata for ACL filtering (tenant, product area)
- Prefer structure-aware splits for docs (headings, sections)

---

## Stage 4: Retrieval & Rerank

**Goal:** Hybrid lexical + dense retrieval often beats vector-only for keyword-heavy queries.

### Practices

- Cross-encoder reranking on top-k for quality (watch latency)
- Query rewriting for multi-turn contexts

---

## Stage 5: Generation & Grounding

**Goal:** System prompts that require using only provided context; explicit “not found” behavior; optional citation format (snippet, doc id, link).
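
The Stage 5 goal (context-only answers, an explicit "not found" path, and doc-id citations) can be sketched roughly as follows. This is a minimal illustration, not a prescribed implementation: the `Chunk` type, the `NOT_FOUND` sentinel, the bracketed `[doc_id]` citation format, and both function names are assumptions made for this example.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str  # hypothetical identifier carried in the chunk metadata
    text: str


# Assumed sentinel for the explicit "not found" behavior.
NOT_FOUND = "NOT_FOUND"


def build_grounded_prompt(question: str, chunks: list[Chunk]) -> str:
    """Assemble a prompt that restricts the model to the retrieved context."""
    context = "\n\n".join(f"[{c.doc_id}] {c.text}" for c in chunks)
    return (
        "Answer using ONLY the context below. "
        "Cite the supporting doc id in square brackets after each claim. "
        f"If the context does not contain the answer, reply exactly {NOT_FOUND}.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


def check_citations(answer: str, chunks: list[Chunk]) -> dict:
    """Flag answers that cite unknown doc ids or carry no citation at all."""
    if answer.strip() == NOT_FOUND:
        return {"refused": True, "unknown_ids": [], "uncited": False}
    known = {c.doc_id for c in chunks}
    cited = re.findall(r"\[([^\[\]]+)\]", answer)
    unknown = sorted(set(cited) - known)
    return {"refused": False, "unknown_ids": unknown, "uncited": not cited}
```

A check like `check_citations` only confirms that cited ids exist in the provided context; it does not verify that the cited text actually supports the claim, which still needs the rubric from Stage 1 or human review.
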
---

## Stage 6: Evaluation & Monitoring

**Goal:** Offline golden questions with expected supporting docs; online thumbs-down reasons; monitor retrieval hit rate, nDCG@k, and age of sources used.

---

## Final Review Checklist

- [ ] Rubric and refusal behavior defined
- [ ] Ingestion deterministic; dedupe and versioning
- [ ] Chunking and metadata match queries and ACLs
- [ ] Hybrid retrieval and rerank tuned with metrics
- [ ] Grounding and citation behavior enforced in prompts
- [ ] Offline eval plus production monitoring

## Tips for Effective Guidance

- Debug retrieval before blaming the LLM.
- Long chunks hurt precision; short chunks hurt context. Run sweep experiments to find the balance.
- See also **vector-databases** and **llm-evaluation** skills for depth.

## Handling Deviations

- **Code RAG:** symbol- or AST-aware chunking often beats line-based splits.
- **High-stakes domains:** add human review gates and audit logs for sources cited.
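
The offline metrics named in Stage 6 (retrieval hit rate and nDCG@k over golden questions with expected supporting docs) could be computed along these lines. This is a sketch under stated assumptions: relevance is binary (a doc is either an expected supporting doc or not), retrieved doc ids are unique within a ranking, and the golden-set layout (`question`, `expected_docs`) and the `retrieve` callable are hypothetical stand-ins for whatever retrieval stack is in use.

```python
import math
from typing import Callable


def hit_rate_at_k(retrieved: list[str], expected: set[str], k: int) -> float:
    """1.0 if any expected supporting doc appears in the top k, else 0.0."""
    return 1.0 if any(d in expected for d in retrieved[:k]) else 0.0


def ndcg_at_k(retrieved: list[str], expected: set[str], k: int) -> float:
    """Binary-relevance nDCG@k: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)  # rank is 0-based, so position 1 gets log2(2)
        for rank, d in enumerate(retrieved[:k])
        if d in expected
    )
    ideal_hits = min(len(expected), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def evaluate(
    golden: list[dict],
    retrieve: Callable[[str], list[str]],  # hypothetical: question -> ranked doc ids
    k: int = 5,
) -> dict:
    """Average hit rate and nDCG@k over a golden question set."""
    if not golden:
        raise ValueError("golden set is empty")
    hits = 0.0
    ndcgs = 0.0
    for item in golden:
        ranked = retrieve(item["question"])
        hits += hit_rate_at_k(ranked, item["expected_docs"], k)
        ndcgs += ndcg_at_k(ranked, item["expected_docs"], k)
    n = len(golden)
    return {"hit_rate": hits / n, "ndcg": ndcgs / n}
```

Running `evaluate` on the same golden set before and after a chunking or reranking change gives a like-for-like comparison, in line with the tip to debug retrieval before blaming the LLM.
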

Tags: skill, ai

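Install via chat

This skill can be installed via chat on the following platforms:

OpenClaw, WorkBuddy, QClaw, Kimi, Claude

Option 1: Install SkillHub and the skill

Help me install SkillHub and the rag-pipelines-1776028941 skill

Option 2: Set SkillHub as the preferred skill installation source

Set SkillHub as my preferred skill installation source, then help me install the rag-pipelines-1776028941 skill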

Install via command line

skillhub install rag-pipelines-1776028941

Download Zip package

Download rag-pipelines v1.0.0

File size: 2.16 KB | Published: 2026-4-13 11:43

v1.0.0 (latest) 2026-4-13 11:43
- Initial release of the "rag-pipelines" skill, featuring a comprehensive six-stage workflow for building and debugging retrieval-augmented generation (RAG) systems.
- Covers document ingestion, chunking, metadata, retrieval and reranking, grounding with citations, evaluation, and handling of failure modes like hallucination and staleness.
- Includes practical checkpoints, best practices, and a review checklist to ensure robust pipeline construction.
- Provides targeted guidance for debugging and optimizing RAG pipelines, with special notes for handling code and high-stakes domains.
