返回顶部
a

arxiv-summarizer-orchestrator

End-to-end orchestration skill for periodic arXiv collection and reporting using three sub-skills: arxiv-search-collector, arxiv-paper-processor, and arxiv-batch-reporter. Supports manual language control across all markdown outputs and Stage-B processing strategy (`subagent_parallel` default max 5, or serial)."

作者: admin | 来源: ClawHub
源自
ClawHub
版本
V 0.1.1
安全检测
已通过
767
下载量
0
收藏
概述
安装方式
版本历史

arxiv-summarizer-orchestrator

# ArXiv Summarizer Orchestrator Run the full pipeline by composing three sub-skills. ## Sub-skill Order 1. `arxiv-search-collector` 2. `arxiv-paper-processor` 3. `arxiv-batch-reporter` ## Workflow Parameters - `language`: manual language parameter used by all stages. Default is English when omitted. - `paper_processing_mode`: `subagent_parallel` or `serial`. - `max_parallel_papers`: default `5` when `paper_processing_mode=subagent_parallel`. ## Workflow ### Stage A: Collection Setup + Query Retrieval 1. Initialize one run with `arxiv-search-collector/scripts/init_collection_run.py`. 2. Model generates multiple focused queries from original topic and writes a minimal `query_plan.json` (`label` + `query` only). 3. Run `arxiv-search-collector/scripts/fetch_queries_batch.py` with the plan file (recommended). 4. (Optional fallback) call `arxiv-search-collector/scripts/fetch_query_metadata.py` manually for one-by-one fetch. 5. Model reads each indexed query list and decides keep indexes. 6. Merge selected items with `arxiv-search-collector/scripts/merge_selected_papers.py`. 7. If relevance/coverage is still not good, iterate Stage A: - generate another query plan with new labels, - fetch again, - re-merge with `--incremental` and updated `selection-json`. - set weak labels to empty keep list (`[]`) to explicitly drop them. Pass `--language <LANG>` to collector scripts so all generated markdown files in Stage A follow the selected language. Use serial query fetch in Stage A with conservative controls (for example `--min-interval-sec 5`, `--retry-max 4`). Default collector settings already include retries/backoff and run-local throttle state (`<run_dir>/.runtime/arxiv_api_state.json`), so manual tuning is usually unnecessary. Prefer cache reuse (no `--force`) unless query parameters changed or data refresh is required. Output: one run directory with per-paper metadata subdirectories. ### Stage B: Per-paper Artifact Download + Manual Summary For each paper directory, invoke sub-skill `arxiv-paper-processor` once and let that skill produce `<paper_dir>/summary.md`. Recommended pre-step for many papers: 1. Run one batch artifact download before per-paper reading: ```bash python3 arxiv-paper-processor/scripts/download_papers_batch.py \ --run-dir /path/to/run \ --artifact source_then_pdf \ --max-workers 3 \ --min-interval-sec 5 \ --language <LANG> ``` Per-paper execution steps (inside `arxiv-paper-processor`): 1. If `<paper_dir>/summary.md` already exists and is complete, skip this paper. 2. If usable source (`source/source_extract/*.tex`) or PDF (`source/paper.pdf`) already exists, skip download. 3. If artifacts are missing, download source with `arxiv-paper-processor/scripts/download_arxiv_source.py`. 4. If source is unusable, download PDF with `arxiv-paper-processor/scripts/download_arxiv_pdf.py`. 5. Model reads content and manually writes `<paper_dir>/summary.md` by reference format, in `language`. Parallel strategy for many papers: - Default: `paper_processing_mode=subagent_parallel` with `max_parallel_papers=5`. - Optional: `paper_processing_mode=serial` to process one paper at a time. - In parallel mode, run multiple `arxiv-paper-processor` instances in batches; concurrent papers must not exceed `max_parallel_papers`. - Wait for one batch to finish before starting the next batch. - In serial mode, run exactly one `arxiv-paper-processor` instance at a time. - Subagent workers should only own one paper directory each to avoid file conflicts. - Do not use scripts to auto-compose summary text; scripts are download-only tools. Output: all paper directories contain `summary.md`. ### Stage C: Bundle + Final Hierarchical Report 1. Run `arxiv-batch-reporter/scripts/collect_summaries_bundle.py --language <LANG>`. 2. Model reads `summaries_bundle.md` and writes `collection_report_template.md` in base dir. 3. In template, each paper leaf entry must include one standalone placeholder line: `{{ARXIV_BRIEF:<arxiv_id>}}`. 4. Run `arxiv-batch-reporter/scripts/render_collection_report.py` to generate final `collection_report.md`. 5. Do not manually paraphrase per-paper conclusion lines in final report; they must come from per-paper `summary.md` section 10 via script injection. If `language` is non-English (for example Chinese), all intermediate markdown files and final reports should follow that language. ## Periodic Scheduling This orchestrator is suitable for cron/scheduled execution in OpenClaw: - Frequency examples: daily, weekly, monthly. - For rolling windows, use lookback (`1d`, `7d`, `30d`) when initializing runs. ## Output Layout `<output-root>/<topic>-<timestamp>-<range>/` - `task_meta.json`, `task_meta.md` - `query_results/`, `query_selection/` - `<arxiv_id>/metadata.md` + downloaded source/pdf + `summary.md` - `summaries_bundle.md` - `collection_report_template.md` - final rendered collection report (e.g. `collection_report.md`) Use `references/workflow-checklist.md` as execution checklist. ## Related Skills This is the top-level orchestration skill. Before using it, install and enable these three sub-skills: - `arxiv-search-collector` - `arxiv-paper-processor` - `arxiv-batch-reporter` Execution order inside this orchestrator: 1. `arxiv-search-collector` (Stage A) 2. `arxiv-paper-processor` (Stage B) 3. `arxiv-batch-reporter` (Stage C)

标签

skill ai

通过对话安装

该技能支持在以下平台通过对话安装:

OpenClaw WorkBuddy QClaw Kimi Claude

方式一:安装 SkillHub 和技能

帮我安装 SkillHub 和 arxiv-summarizer-orchestrator-1776419948 技能

方式二:设置 SkillHub 为优先技能安装源

设置 SkillHub 为我的优先技能安装源,然后帮我安装 arxiv-summarizer-orchestrator-1776419948 技能

通过命令行安装

skillhub install arxiv-summarizer-orchestrator-1776419948

下载 Zip 包

⬇ 下载 arxiv-summarizer-orchestrator v0.1.1

文件大小: 4.87 KB | 发布时间: 2026-4-17 18:16

v0.1.1 最新 2026-4-17 18:16
Document cross-skill relationships in all SKILL.md files

Archiver·手机版·闲社网·闲社论坛·羊毛社区· 多链控股集团有限公司 · 苏ICP备2025199260号-1

Powered by Discuz! X5.0   © 2024-2025 闲社网·线报更新论坛·羊毛分享社区·http://xianshe.com

p2p_official_large
返回顶部