semantic-cache

# Semantic Cache

Cache LLM responses by meaning using Redis vector search. Semantically similar questions return a cached answer instead of triggering another expensive API call.

## How It Works

1. The user asks a question or makes an LLM request.
2. The question is embedded into a vector using OpenAI `text-embedding-3-small`.
3. Redis vector search looks for semantically similar cached queries (cosine similarity > 0.80).
4. **Cache hit**: Return the cached response (~100ms end to end, most of which is embedding time).
5. **Cache miss**: Pass the request through to the LLM, then cache the response for future similar queries.

## Commands

### Cache a query and response

```bash
node scripts/cache.js store "What is the capital of France?" "The capital of France is Paris."
```

### Check cache for a similar query

```bash
node scripts/cache.js lookup "What's France's capital city?"
```

### Cache stats

```bash
node scripts/cache.js stats
```

### Clear all cached entries

```bash
node scripts/cache.js clear
```

### Interactive mode: wrap any LLM call with caching

```bash
node scripts/cache.js query "Your question here"
```

This checks the cache first. On a miss, it calls OpenAI, caches the result, and returns it.

## When to Use This Skill

- Before making any LLM API call, to check whether a semantically similar query has already been answered
- When building agents that answer repetitive questions (support bots, FAQ systems)
- When you want to reduce OpenAI/Anthropic API costs by 40-80% on repetitive workloads
- When you need faster response times for common queries

## Configuration

Set these environment variables:

- `REDIS_URL`: Redis connection string for an instance with vector search support (Redis Cloud or Redis Stack)
- `OPENAI_API_KEY`: Used to generate embeddings
- `SEMANTIC_CACHE_THRESHOLD`: Similarity threshold from 0 to 1 (default: 0.80; higher means stricter matching)
- `SEMANTIC_CACHE_TTL`: Cache TTL in seconds (default: 86400, i.e. 24 hours)

## Example Workflow

```
User: "How do I reset my password?"
-> Embed query -> Search Redis -> MISS -> Call LLM -> Get response -> Cache it -> Return response

User: "I forgot my password, how do I change it?"
-> Embed query -> Search Redis -> HIT (92.7% similar) -> Return cached response after an 8ms lookup (saved ~2 seconds + API cost)
```

## Performance

- Cache lookup: ~5-15ms (vs 1-5 seconds for an LLM call)
- Embedding generation: ~50-100ms
- Storage per entry: ~6KB (1536-dim vector + metadata)
- Supports millions of cached entries
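## Sketch: The Hit/Miss Decision

The hit/miss decision described under How It Works can be illustrated with a small in-memory sketch. This is not the code in `scripts/cache.js`: the class `InMemorySemanticCache`, the helper `cosineSimilarity`, and the toy 3-dimensional vectors are all hypothetical names chosen for illustration. A real deployment would obtain vectors from the embedding API and delegate the nearest-neighbour search to Redis. The sketch only assumes the documented defaults (threshold 0.80, TTL 86400 seconds).

```javascript
// In-memory illustration of the semantic-cache hit/miss rule.
// Assumption: vectors are supplied by the caller; no Redis or OpenAI calls are made.

const DEFAULT_THRESHOLD = 0.80;    // mirrors SEMANTIC_CACHE_THRESHOLD
const DEFAULT_TTL_SECONDS = 86400; // mirrors SEMANTIC_CACHE_TTL

// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

class InMemorySemanticCache {
  constructor({
    threshold = DEFAULT_THRESHOLD,
    ttlSeconds = DEFAULT_TTL_SECONDS,
    now = () => Date.now(), // injectable clock, useful for testing TTL expiry
  } = {}) {
    this.threshold = threshold;
    this.ttlMs = ttlSeconds * 1000;
    this.now = now;
    this.entries = [];
  }

  // Store a query vector together with the response it produced.
  store(vector, query, response) {
    this.entries.push({ vector, query, response, expiresAt: this.now() + this.ttlMs });
  }

  // Return the most similar unexpired entry if it clears the threshold.
  lookup(vector) {
    const t = this.now();
    this.entries = this.entries.filter((entry) => entry.expiresAt > t);

    let best = null;
    for (const entry of this.entries) {
      const similarity = cosineSimilarity(vector, entry.vector);
      if (best === null || similarity > best.similarity) {
        best = { similarity, entry };
      }
    }

    if (best !== null && best.similarity > this.threshold) {
      return { hit: true, similarity: best.similarity, response: best.entry.response };
    }
    return { hit: false };
  }
}

// Usage with toy vectors standing in for real embeddings.
const cache = new InMemorySemanticCache();
cache.store([1, 0, 0], "How do I reset my password?", "Use the reset link on the login page.");

const near = cache.lookup([0.9, 0.1, 0]); // points almost the same way as the stored vector
const far = cache.lookup([0, 1, 0]);      // orthogonal to the stored vector

console.log(near.hit, far.hit); // prints: true false
```

The linear scan here is only practical for a handful of entries; a vector index is what makes the same rule workable at the scale described under Performance.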
