
extract

Extract content from specific URLs using Tavily's extraction API. Returns clean markdown/text from web pages. Use when you have specific URLs and need their content without writing code.

Author: admin | Source: ClawHub
Version: V 1.0.0
Security check: Passed
Downloads: 321
Favorites: 1

# Extract Skill

Extract clean content from specific URLs. Ideal when you know which pages you want content from.

## Authentication

The script uses OAuth via the Tavily MCP server. **No manual setup required** - on first run, it will:

1. Check for existing tokens in `~/.mcp-auth/`
2. If none found, automatically open your browser for OAuth authentication

> **Note:** You must have an existing Tavily account. The OAuth flow only supports login; account creation is not available through this flow. [Sign up at tavily.com](https://tavily.com) first if you don't have an account.

### Alternative: API Key

If you prefer using an API key, get one at https://tavily.com and add it to `~/.claude/settings.json`:

```json
{
  "env": {
    "TAVILY_API_KEY": "tvly-your-api-key-here"
  }
}
```

## Quick Start

### Using the Script

```bash
./scripts/extract.sh '<json>'
```

**Examples:**

```bash
# Single URL
./scripts/extract.sh '{"urls": ["https://example.com/article"]}'

# Multiple URLs
./scripts/extract.sh '{"urls": ["https://example.com/page1", "https://example.com/page2"]}'

# With query focus and chunks
./scripts/extract.sh '{"urls": ["https://example.com/docs"], "query": "authentication API", "chunks_per_source": 3}'

# Advanced extraction for JS pages
./scripts/extract.sh '{"urls": ["https://app.example.com"], "extract_depth": "advanced", "timeout": 60}'
```

### Basic Extraction

```bash
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": ["https://example.com/article"]
  }'
```

### Multiple URLs with Query Focus

```bash
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": [
      "https://example.com/ml-healthcare",
      "https://example.com/ai-diagnostics"
    ],
    "query": "AI diagnostic tools accuracy",
    "chunks_per_source": 3
  }'
```

## API Reference

### Endpoint

```
POST https://api.tavily.com/extract
```

### Headers

| Header | Value |
|--------|-------|
| `Authorization` | `Bearer <TAVILY_API_KEY>` |
| `Content-Type` | `application/json` |

### Request Body

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `urls` | array | Required | URLs to extract (max 20) |
| `query` | string | null | Reranks chunks by relevance |
| `chunks_per_source` | integer | 3 | Chunks per URL (1-5, requires `query`) |
| `extract_depth` | string | `"basic"` | `basic` or `advanced` (for JS pages) |
| `format` | string | `"markdown"` | `markdown` or `text` |
| `include_images` | boolean | false | Include image URLs |
| `timeout` | float | varies | Max wait (1-60 seconds) |

### Response Format

```json
{
  "results": [
    {
      "url": "https://example.com/article",
      "raw_content": "# Article Title\n\nContent..."
    }
  ],
  "failed_results": [],
  "response_time": 2.3
}
```

## Extract Depth

| Depth | When to Use |
|-------|-------------|
| `basic` | Simple text extraction, faster |
| `advanced` | Dynamic/JS-rendered pages, tables, structured data |

## Examples

### Single URL Extraction

```bash
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": ["https://docs.python.org/3/tutorial/classes.html"],
    "extract_depth": "basic"
  }'
```

### Targeted Extraction with Query

```bash
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": [
      "https://example.com/react-hooks",
      "https://example.com/react-state"
    ],
    "query": "useState and useEffect patterns",
    "chunks_per_source": 2
  }'
```

### JavaScript-Heavy Pages

```bash
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": ["https://app.example.com/dashboard"],
    "extract_depth": "advanced",
    "timeout": 60
  }'
```

### Batch Extraction

```bash
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": [
      "https://example.com/page1",
      "https://example.com/page2",
      "https://example.com/page3",
      "https://example.com/page4",
      "https://example.com/page5"
    ],
    "extract_depth": "basic"
  }'
```

## Tips

- **Max 20 URLs per request** - batch larger lists
- **Use `query` + `chunks_per_source`** to get only relevant content
- **Try `basic` first**, fall back to `advanced` if content is missing
- **Set longer `timeout`** for slow pages (up to 60s)
- **Check `failed_results`** for URLs that couldn't be extracted
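Because the API caps each request at 20 URLs, longer lists have to be split client-side before calling the endpoint. A minimal Python sketch of that batching (the helper name `chunk_urls` is illustrative, not part of the Tavily API):

```python
def chunk_urls(urls, size=20):
    """Split a URL list into batches no larger than the 20-URL request cap."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]

# 45 URLs become three requests: 20 + 20 + 5
pages = [f"https://example.com/page{n}" for n in range(1, 46)]
batches = chunk_urls(pages)
```

Each batch can then be sent as the `urls` field of a separate request, as in the curl examples above.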

Tags

skill ai

Install via Conversation

This skill can be installed via conversation on the following platforms:

OpenClaw WorkBuddy QClaw Kimi Claude

Option 1: Install SkillHub and the skill

Help me install SkillHub and the tavily-extract-1776284020 skill

Option 2: Set SkillHub as the preferred skill source

Set SkillHub as my preferred skill installation source, then help me install the tavily-extract-1776284020 skill

Install via Command Line

skillhub install tavily-extract-1776284020

Download Zip Package

⬇ Download extract v1.0.0

File size: 4.49 KB | Published: 2026-04-16 17:54

v1.0.0 (latest) 2026-04-16 17:54
Initial release of the extract skill for Tavily's extraction API.

- Extracts clean markdown or text from up to 20 URLs per request.
- Supports both OAuth authentication (automatic browser flow) and API key for flexible setup.
- Offers options for targeted extraction using query and chunking, as well as depth selection for dynamic/JavaScript-heavy pages.
- Provides example usage with both shell scripts and direct curl commands.
- Response includes extracted content, failed URLs, and response time for full transparency.

