
Embeddings

Generate, store, and search vector embeddings with provider selection, chunking strategies, and similarity search optimization.

Author: admin | Source: ClawHub
Version: V 1.0.0
Security check: Passed
Downloads: 1,007
Favorites: 2


## When to Use

User wants to convert text/images to vectors, build semantic search, or integrate embeddings into applications.

## Quick Reference

| Topic | File |
|-------|------|
| Provider comparison & selection | `providers.md` |
| Chunking strategies & code | `chunking.md` |
| Vector database patterns | `storage.md` |
| Search & retrieval tuning | `search.md` |

## Core Capabilities

1. **Generate embeddings** — Call provider APIs (OpenAI, Cohere, Voyage, local models)
2. **Chunk content** — Split documents with overlap, semantic boundaries, token limits
3. **Store vectors** — Insert into Pinecone, Weaviate, Qdrant, pgvector, Chroma
4. **Similarity search** — Query with top-k, filters, hybrid search
5. **Batch processing** — Handle large datasets with rate limiting and retries
6. **Model comparison** — Evaluate embedding quality for specific use cases

## Decision Checklist

Before recommending an approach, ask:

- [ ] What content type? (text, code, images, multimodal)
- [ ] Volume and update frequency?
- [ ] Latency requirements? (real-time vs batch)
- [ ] Budget constraints? (API costs vs self-hosted)
- [ ] Existing infrastructure? (cloud provider, database)

## Critical Rules

- **Same model everywhere** — Query embeddings MUST use the identical model as document embeddings
- **Normalize before storage** — Most similarity metrics assume unit vectors
- **Chunk with overlap** — 10-20% overlap prevents context loss at boundaries
- **Batch API calls** — Never embed one item at a time in production
- **Cache embeddings** — Regenerating is expensive; store with a source hash
- **Monitor dimensions** — Higher isn't always better; 768-1536 is usually optimal

## Provider Quick Selection

| Need | Provider | Why |
|------|----------|-----|
| Best quality, any cost | OpenAI `text-embedding-3-large` | Top benchmarks |
| Cost-sensitive | OpenAI `text-embedding-3-small` | 5x cheaper, ~80% of the quality |
| Multilingual | Cohere `embed-multilingual-v3` | 100+ languages |
| Code/technical | Voyage `voyage-code-2` | Optimized for code |
| Privacy/offline | Local (e5, bge, nomic) | No data leaves the machine |
| Images | OpenAI CLIP, Cohere multimodal | Cross-modal search |

## Common Patterns

```python
from itertools import batched  # Python 3.12+; on older versions use a grouping helper

# Batch embedding: requests are grouped to stay under the API's per-call limit.
# Retry/backoff logic is omitted here for brevity; `client` is an initialized
# OpenAI client instance.
def embed_batch(texts, model="text-embedding-3-small"):
    results = []
    for chunk in batched(texts, 100):  # API batch-size limit
        response = client.embeddings.create(input=list(chunk), model=model)
        results.extend([e.embedding for e in response.data])
    return results

# Similarity search with a metadata filter
results = index.query(
    vector=query_embedding,
    top_k=10,
    filter={"category": "technical"},
    include_metadata=True,
)
```
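The "chunk with overlap" rule above can be sketched in plain Python. This is a minimal illustration using a hypothetical `chunk_text` helper that counts whitespace tokens; a real pipeline would count model tokens with the provider's tokenizer instead.

```python
def chunk_text(text, chunk_size=200, overlap=30):
    """Split text into word-based chunks whose boundaries overlap.

    chunk_size and overlap are counted in whitespace-separated words,
    a simplification standing in for real token counts.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    words = text.split()
    step = chunk_size - overlap  # each chunk starts `step` words after the last
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last chunk already covers the tail
    return chunks
```

With `chunk_size=200` and `overlap=30`, the overlap is 15% — inside the 10-20% band the rule recommends.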
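The "normalize before storage" rule deserves a concrete illustration: once vectors are scaled to unit length, cosine similarity reduces to a plain dot product. A dependency-free sketch (helper names are illustrative, not from any particular library):

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit L2 norm so dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in vec]

def dot(a, b):
    """Dot product; on unit vectors this is the cosine similarity."""
    return sum(x * y for x, y in zip(a, b))
```

Two vectors pointing in the same direction, e.g. `[3, 4]` and `[6, 8]`, normalize to the same unit vector and score a cosine similarity of 1.0.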
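Behind the `index.query(...)` call in the pattern above, top-k retrieval is conceptually just scoring and sorting. A brute-force sketch (fine for small corpora; real vector databases use approximate-nearest-neighbor indexes instead):

```python
def top_k_search(query_vec, docs, k=3):
    """Return the ids of the k most similar documents.

    docs is a list of (id, vector) pairs; vectors are assumed to be
    L2-normalized already, so the dot product is cosine similarity.
    """
    scored = [
        (sum(q * d for q, d in zip(query_vec, vec)), doc_id)
        for doc_id, vec in docs
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]
```

Metadata filters like `{"category": "technical"}` would simply restrict `docs` before scoring.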
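The "cache embeddings, store with source hash" rule can be sketched as a content-addressed lookup. `embed_fn` here is a hypothetical callable standing in for the provider API; keying by model plus content hash means unchanged text is never re-embedded, and a model switch naturally misses the cache.

```python
import hashlib

_cache = {}  # in production this would be a persistent store (e.g. SQLite, Redis)

def cached_embed(text, embed_fn, model="text-embedding-3-small"):
    """Return an embedding, reusing a cached result keyed by model + content hash."""
    key = (model, hashlib.sha256(text.encode("utf-8")).hexdigest())
    if key not in _cache:
        _cache[key] = embed_fn(text, model)  # only hit the API on a cache miss
    return _cache[key]
```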

Tags

skill ai

Install via Conversation

This skill can be installed via conversation on the following platforms:

OpenClaw WorkBuddy QClaw Kimi Claude

Method 1: Install SkillHub and the skill

Help me install SkillHub and the embeddings-1776420006 skill

Method 2: Set SkillHub as the preferred skill source

Set SkillHub as my preferred skill installation source, then help me install the embeddings-1776420006 skill

Install via Command Line

skillhub install embeddings-1776420006

Download Zip Package

⬇ Download Embeddings v1.0.0

File size: 8.69 KB | Published: 2026-4-17 18:25

v1.0.0 (latest) 2026-4-17 18:25
Initial release
