Search Engine

## Setup On first use, read `setup.md` and establish activation behavior, system scope, and data constraints before proposing implementation steps. ## When to Use User needs to create, redesign, or scale a search engine for applications, documentation, products, or internal knowledge bases. Agent handles architecture planning, indexing strategy, retrieval design, relevance controls, evaluation loops, and rollout safety. ## Architecture Memory lives in `~/search-engine/`. See `memory-template.md` for baseline structure and status values. ```text ~/search-engine/ |-- memory.md # Persistent context, constraints, and active priorities |-- requirements.md # Retrieval goals, latency targets, and relevance expectations |-- experiments.md # Offline experiments and tuning decisions `-- incidents.md # Production issues, root cause, and remediation notes ``` ## Quick Reference Use the smallest relevant file for the task. | Topic | File | |-------|------| | Setup and activation behavior | `setup.md` | | Memory template and status model | `memory-template.md` | | Architecture options and component choices | `architecture-blueprint.md` | | Retrieval and ranking strategy patterns | `retrieval-patterns.md` | | Quality measurement and evaluation loops | `evaluation-metrics.md` | | Delivery and rollout gates | `implementation-checklist.md` | ## Data Storage Local notes stay in `~/search-engine/`: - requirements and relevance objectives - data source assumptions and indexing decisions - experiment outcomes and deployment safeguards ## Core Rules ### 1. Start with a Retrieval Contract, Not with Tools Before selecting engines, define the contract: - query types to support (keyword, phrase, semantic, hybrid) - response format, latency budget, and freshness target - error tolerance and fallback behavior A search engine without a contract becomes an untestable collection of features. ### 2. Design Ingestion and Indexing as a Deterministic Pipeline Every document should pass explicit stages: - ingestion source validation and deduplication - normalization and field extraction - chunking policy with stable identifiers - indexing with repeatable transforms Deterministic pipelines reduce drift between environments and simplify debugging. ### 3. Separate Recall Layers from Precision Layers Treat retrieval as a staged system: - broad candidate retrieval first (lexical, vector, or hybrid) - reranking and business rules second - formatting and explanation last Mixing all concerns in one step hides failures and makes tuning unpredictable. ### 4. Define Relevance Features as Versioned Policy Relevance changes must be tracked as policy versions: - feature weights and boosts - typo tolerance and synonym policy - filtering, faceting, and tie-break rules Never ship silent relevance changes without versioned notes and measured deltas. ### 5. Evaluate Offline Before Production Writes For each relevance or indexing change: - run benchmark queries with labeled expectations - measure hit quality, ordering quality, and coverage - compare against current baseline and note regressions If evaluation evidence is weak, keep the current configuration and iterate. ### 6. Build Idempotent Index Operations and Safe Rollback Index updates must be replay-safe: - stable document ids and version checks - resumable batch jobs with checkpoints - alias-based or dual-index rollback plan Without idempotency and rollback, incident recovery becomes guesswork. ### 7. Match Complexity to Workload Reality Use the minimum architecture that meets requirements: - avoid distributed complexity for small datasets - avoid simplistic models for multilingual or high-noise corpora - revisit design as scale and usage patterns change Over-engineering and under-engineering both create expensive rework. ## Common Traps - Starting with vendor selection before defining retrieval requirements -> architecture lock-in with unclear success criteria - Indexing raw data without field-level normalization -> poor filters, weak facets, and noisy matching - Tuning relevance on one happy-path query set -> brittle results in real user traffic - Applying business boosts without guardrails -> top results become commercially biased and less useful - Shipping retrieval changes without offline baseline comparison -> regressions discovered only by users - Running full reindex jobs without resumability -> long outages and partial data corruption - Ignoring multilingual tokenization differences -> severe precision drop for non-English users ## Security & Privacy Data that leaves your machine: - none by default in this instruction set - only user-approved integration traffic when the user explicitly connects external services Data that stays local: - planning notes and experiment logs under `~/search-engine/` - constraints, relevance decisions, and rollback records This skill does NOT: - collect unrelated files or credentials - require hidden network calls - bypass user-confirmed environment boundaries ## Related Skills Install with `clawhub install <slug>` if user confirms: - `api` - Define stable APIs for indexing, querying, and retrieval orchestration - `elasticsearch` - Implement production indexing and query execution on Elasticsearch - `meilisearch` - Ship lightweight retrieval stacks with fast iteration cycles - `engineering` - Structure implementation workstreams and technical decision logs - `software-engineer` - Improve delivery quality with testable architecture and rollout discipline ## Feedback - If useful: `clawhub star search-engine` - Stay updated: `clawhub sync`

Search Engine

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载 Zip 包

Search Engine

Search Engine

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载 Zip 包

相关推荐

self-improvement

self-improvement

self-improvement

self-improvement