firecrawl-local

# Firecrawl Local Skill Self-hosted Firecrawl integration using the **v1 REST API**. Tests connectivity first, executes scrape/crawl/map, handles async crawl polling automatically. ## Setup (one-time) ```bash mkdir -p ~/.openclaw/skills/firecrawl-local cp run.sh ~/.openclaw/skills/firecrawl-local/run.sh chmod +x ~/.openclaw/skills/firecrawl-local/run.sh ``` The script lives at `scripts/run.sh` in this skill folder — copy it into place as above. **Prerequisites:** `curl`, `jq` installed. Firecrawl running at `localhost:3002`. **Optional env vars:** ```bash export FIRECRAWL_LOCAL_URL="http://localhost:3002" # default export FIRECRAWL_API_KEY="fc-your-key" # only needed if auth enabled ``` --- ## Commands ### Default — scrape a single page (URL only, no subcommand needed) ```bash firecrawl-local https://docs.example.com/api ``` ### Scrape — explicit, with format options ```bash firecrawl-local scrape https://docs.example.com/api firecrawl-local scrape https://docs.example.com/api --formats markdown,html ``` ### Map — discover all URLs on a site ```bash firecrawl-local map https://docs.example.com firecrawl-local map https://docs.example.com --limit 200 ``` ### Crawl — bulk extract multiple pages (async, auto-polled) ```bash firecrawl-local crawl https://docs.example.com firecrawl-local crawl https://docs.example.com --limit 30 --max-depth 2 firecrawl-local crawl https://docs.example.com --include /docs --exclude /blog ``` --- ## Agent Instructions ### When to use each command | Goal | Command | |------|---------| | Get content from one URL (quickest) | `firecrawl-local <url>` | | Discover what pages exist | `map` | | Get content from one URL with format control | `scrape` | | Ingest an entire docs site | `crawl` | | RAG pipeline ingestion | `map` → targeted `scrape` or `crawl` | ### Optimal workflows **Documentation RAG pipeline:** ``` 1. map https://docs.example.com → get full URL list 2. scrape <specific key pages> → targeted extraction 3. Pass markdown to embedding pipeline ``` **Full site ingestion:** ``` 1. crawl https://docs.example.com --limit 50 --max-depth 3 2. Results auto-polled and returned as JSON array of {url, markdown} ``` ### Parameters | Flag | Applies to | Description | |------|-----------|-------------| | `--limit N` | map, crawl | Max pages (default: 50 for crawl, 500 for map) | | `--max-depth N` | crawl | How deep to follow links (default: 2) | | `--include /path` | crawl | Only crawl URLs matching this path prefix | | `--exclude /path` | crawl | Skip URLs matching this path prefix | | `--formats list` | scrape | Comma-separated: `markdown`, `html`, `rawHtml`, `links` | ### Reading the output - **scrape**: Returns `{success, data: {markdown, html, metadata}}` - **map**: Returns `{success, links: [...]}` - **crawl**: Returns `{success, data: [{url, markdown, metadata}, ...]}` ← after polling completes ### Failure signals and fixes | Error | Cause | Fix | |-------|-------|-----| | `Local Firecrawl unavailable` | Service not running | Start Firecrawl, check port 3002 | | `success: false` | Bad URL or blocked | Check URL is reachable, try `--formats html` | | Empty `markdown` field | JS-rendered page | Firecrawl handles most JS — check if site blocks bots | | Crawl times out | Site is large | Reduce `--limit` or `--max-depth` | --- ## Script reference See `scripts/run.sh` for the full implementation. Key design decisions: - Health check uses `/health` endpoint with 3s timeout - Auth header only sent when `FIRECRAWL_API_KEY` is set - Crawl polling retries every 5s up to 60 attempts (5 minutes) - All parameters are passed via `jq` to prevent shell injection in JSON

firecrawl-local

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载 Zip 包

firecrawl-local

firecrawl-local

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载 Zip 包

相关推荐

self-improvement

self-improvement

self-improvement

self-improvement