firecrawl-local

Author: admin | Source: ClawHub
Version: V 1.0.0
Security check: passed (94)
Downloads: 0
# Firecrawl Local Skill

Self-hosted Firecrawl integration using the **v1 REST API**. Tests connectivity first, executes scrape/crawl/map, and handles async crawl polling automatically.

## Setup (one-time)

```bash
mkdir -p ~/.openclaw/skills/firecrawl-local
cp run.sh ~/.openclaw/skills/firecrawl-local/run.sh
chmod +x ~/.openclaw/skills/firecrawl-local/run.sh
```

The script lives at `scripts/run.sh` in this skill folder — copy it into place as above.

**Prerequisites:** `curl` and `jq` installed; Firecrawl running at `localhost:3002`.

**Optional env vars:**

```bash
export FIRECRAWL_LOCAL_URL="http://localhost:3002"  # default
export FIRECRAWL_API_KEY="fc-your-key"              # only needed if auth is enabled
```

---

## Commands

### Default — scrape a single page (URL only, no subcommand needed)

```bash
firecrawl-local https://docs.example.com/api
```

### Scrape — explicit, with format options

```bash
firecrawl-local scrape https://docs.example.com/api
firecrawl-local scrape https://docs.example.com/api --formats markdown,html
```

### Map — discover all URLs on a site

```bash
firecrawl-local map https://docs.example.com
firecrawl-local map https://docs.example.com --limit 200
```

### Crawl — bulk extract multiple pages (async, auto-polled)

```bash
firecrawl-local crawl https://docs.example.com
firecrawl-local crawl https://docs.example.com --limit 30 --max-depth 2
firecrawl-local crawl https://docs.example.com --include /docs --exclude /blog
```

---

## Agent Instructions

### When to use each command

| Goal | Command |
|------|---------|
| Get content from one URL (quickest) | `firecrawl-local <url>` |
| Discover what pages exist | `map` |
| Get content from one URL with format control | `scrape` |
| Ingest an entire docs site | `crawl` |
| RAG pipeline ingestion | `map` → targeted `scrape` or `crawl` |

### Optimal workflows

**Documentation RAG pipeline:**

```
1. map https://docs.example.com   → get the full URL list
2. scrape <specific key pages>    → targeted extraction
3. Pass the markdown to the embedding pipeline
```

**Full site ingestion:**

```
1. crawl https://docs.example.com --limit 50 --max-depth 3
2. Results are auto-polled and returned as a JSON array of {url, markdown}
```

### Parameters

| Flag | Applies to | Description |
|------|-----------|-------------|
| `--limit N` | map, crawl | Max pages (default: 50 for crawl, 500 for map) |
| `--max-depth N` | crawl | How deep to follow links (default: 2) |
| `--include /path` | crawl | Only crawl URLs matching this path prefix |
| `--exclude /path` | crawl | Skip URLs matching this path prefix |
| `--formats list` | scrape | Comma-separated: `markdown`, `html`, `rawHtml`, `links` |

### Reading the output

- **scrape**: returns `{success, data: {markdown, html, metadata}}`
- **map**: returns `{success, links: [...]}`
- **crawl**: returns `{success, data: [{url, markdown, metadata}, ...]}` after polling completes

### Failure signals and fixes

| Error | Cause | Fix |
|-------|-------|-----|
| `Local Firecrawl unavailable` | Service not running | Start Firecrawl; check port 3002 |
| `success: false` | Bad URL or blocked | Check the URL is reachable; try `--formats html` |
| Empty `markdown` field | JS-rendered page | Firecrawl handles most JS — check whether the site blocks bots |
| Crawl times out | Site is large | Reduce `--limit` or `--max-depth` |

---

## Script reference

See `scripts/run.sh` for the full implementation. Key design decisions:

- The health check uses the `/health` endpoint with a 3s timeout
- The auth header is sent only when `FIRECRAWL_API_KEY` is set
- Crawl polling retries every 5s, up to 60 attempts (5 minutes)
- All parameters are passed via `jq` to prevent shell injection in the JSON
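The last design note (parameters passed via `jq`) can be sketched as follows. This is a minimal illustration, not code from `run.sh` itself; it assumes the v1 scrape endpoint accepts a JSON body with `url` and `formats` fields, and the variable names are illustrative:

```shell
# Build the scrape request body with jq so user input is JSON-escaped,
# instead of interpolating the URL into a JSON string by hand.
url='https://docs.example.com/api?q=a"b'   # embedded quote survives escaping
formats="markdown,html"

body=$(jq -n --arg url "$url" --arg formats "$formats" \
  '{url: $url, formats: ($formats | split(","))}')

printf '%s\n' "$body"
```

Because `--arg` passes values in as data, quotes or backslashes in the URL cannot break out of the JSON string — which is the injection-prevention point the script reference makes.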

Tags

skill ai

Install via conversation

This skill can be installed via conversation on the following platforms:

OpenClaw WorkBuddy QClaw Kimi Claude

Method 1: Install SkillHub and the skill

Help me install SkillHub and the firecrawl-local-1776019561 skill

Method 2: Set SkillHub as the preferred skill installation source

Set SkillHub as my preferred skill installation source, then help me install the firecrawl-local-1776019561 skill

Install via command line

skillhub install firecrawl-local-1776019561

Download Zip package

⬇ Download firecrawl-local v1.0.0

File size: 4.51 KB | Published: 2026-04-13 10:18

v1.0.0 (latest) 2026-04-13 10:18
- Initial release of Firecrawl Local skill for web scraping and site crawling with a self-hosted Firecrawl instance.
- Supports commands for single-page scraping, site mapping, and async multi-page crawling with format and filtering options.
- Automatically detects Firecrawl availability and handles crawl polling.
- Easy command-line integration with robust parameterization (URL filtering, limits, depth, output format).
- Clear agent guidance for documentation ingestion and RAG pipeline workflows.
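The automatic crawl polling mentioned above (every 5s, up to 60 attempts per the script reference) can be sketched like this. `check_status` is a hypothetical stand-in for the real status request the script would make against the local Firecrawl instance with `curl` and `jq`; it is not part of the actual `run.sh`:

```shell
# Sketch of the crawl polling loop: retry every 5s, up to 60 attempts
# (about 5 minutes). check_status should print the crawl job's status
# string, e.g. "completed" when the job is done.
poll_crawl() {
  attempts=0
  while [ "$attempts" -lt 60 ]; do
    status=$(check_status)
    if [ "$status" = "completed" ]; then
      return 0                  # job finished; caller can fetch results
    fi
    attempts=$((attempts + 1))
    sleep 5
  done
  return 1                      # gave up after ~5 minutes
}
```

The bounded attempt counter is what produces the "crawl times out" failure signal listed in the README rather than hanging forever on a very large site.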
