semanticscholar-skill

# Semantic Scholar Search Workflow Search academic papers via the Semantic Scholar API using a structured 4-phase workflow. **Critical rule:** NEVER make multiple sequential Bash calls for API requests. Always write ONE Python script that runs all searches, then execute it once. All rate limiting is handled inside `s2.py` automatically. ## Phase 1: Understand & Plan Parse the user's intent and choose a search strategy: ### Decision Tree | User wants... | Strategy | Function | |---------------|----------|----------| | Broad topic exploration | Relevance search | `search_relevance()` | | Precise technical terms, exact phrases | Bulk search with boolean operators | `search_bulk()` with `build_bool_query()` | | Specific passages or methods | Snippet search | `search_snippets()` | | Known paper by title | Title match | `match_title()` | | Known paper by DOI/PMID/ArXiv | Direct lookup | `get_paper()` | | Papers citing a known work | Citation traversal | `get_citations()` | | Related to one paper | Single-seed recommendations | `find_similar()` | | Related to multiple papers | Multi-seed recommendations | `recommend()` | | Find a researcher | Author search | `search_authors()` | | Researcher's profile | Author details | `get_author()` | | Researcher's publications | Author papers | `get_author_papers()` | ### Query Construction Rules - **Ambiguous terms** (e.g., "stem cells" could mean mesenchymal or stem-like T cells): Use `build_bool_query()` with exact phrases and exclusions - Example: `build_bool_query(phrases=["stem-like T cells"], required=["CD4", "TCF7"], excluded=["mesenchymal", "hematopoietic stem cell"])` - **Multi-context queries** (e.g., "topic X in cancer AND autoimmunity"): Plan separate searches, deduplicate with `deduplicate()` - **Broad topics**: Use `search_relevance()` with filters (year, venue, fieldsOfStudy, minCitationCount) ### Plan Filters | Filter | Use when | |--------|----------| | `year="2020-"` | Recent work only | | `publication_date="2024-01-01:2024-06-30"` | Precise date range (YYYY-MM-DD) | | `fields_of_study="Medicine"` | Restrict to domain | | `min_citations=10` | Only established papers | | `pub_types="Review"` | Find reviews/meta-analyses | | `pub_types="ClinicalTrial"` | Clinical trials only | | `open_access=True` | Only open access papers | **Checkpoint:** Before proceeding, verify: (1) search strategy matches user intent, (2) filters are appropriate, (3) query is specific enough to avoid irrelevant results. ## Phase 2: Execute Search Write ONE Python script. Example: ```python import sys, os SKILL_DIR = next((p for p in [ os.path.expanduser("~/.claude/skills/semanticscholar-skill"), os.path.expanduser("~/.openclaw/skills/semanticscholar-skill"), ] if os.path.isdir(p)), ".") sys.path.insert(0, SKILL_DIR) from s2 import * # Build precise query q = build_bool_query( phrases=["stem-like T cells"], required=["CD4", "IBD"], excluded=["mesenchymal"] ) papers = search_bulk(q, max_results=30, year="2018-", fields_of_study="Medicine") papers = deduplicate(papers) print(format_results(papers, "Stem-like CD4 T cells in IBD")) ``` Execute with: `python3 /tmp/s2_search.py` **Rules:** - Import everything from s2: `from s2 import *` - Write script to `/tmp/s2_search.py` (or similar temp path) - One Bash call to execute. Never chain multiple API calls via separate Bash invocations. - Rate limiting, retries, and backoff are automatic inside s2.py **Checkpoint:** Verify the script ran successfully (no exceptions) and returned results. If 0 results, broaden the query or relax filters before presenting. ### Worked Examples **Example 1: Author workflow** — "Find papers by Yann LeCun on self-supervised learning" ```python import sys, os SKILL_DIR = next((p for p in [ os.path.expanduser("~/.claude/skills/semanticscholar-skill"), os.path.expanduser("~/.openclaw/skills/semanticscholar-skill"), ] if os.path.isdir(p)), ".") sys.path.insert(0, SKILL_DIR) from s2 import * authors = search_authors("Yann LeCun", max_results=5) print(format_authors(authors)) # Use the first match's ID to get their papers author_id = authors[0]["authorId"] papers = get_author_papers(author_id, max_results=50) # Filter locally for topic ssl_papers = [p for p in papers if "self-supervised" in (p.get("title") or "").lower()] print(format_results(ssl_papers, "Yann LeCun - Self-Supervised Learning")) ``` **Example 2: Citation chain** — "Who cited the Transformer paper and what did they build on?" ```python import sys, os SKILL_DIR = next((p for p in [ os.path.expanduser("~/.claude/skills/semanticscholar-skill"), os.path.expanduser("~/.openclaw/skills/semanticscholar-skill"), ] if os.path.isdir(p)), ".") sys.path.insert(0, SKILL_DIR) from s2 import * paper = get_paper("DOI:10.48550/arXiv.1706.03762") print(f"Title: {paper['title']}, Citations: {paper['citationCount']}") # Get top-cited papers that cite this one citing = get_citations(paper["paperId"], max_results=50) citing_papers = [c["citingPaper"] for c in citing if c.get("citingPaper")] citing_papers.sort(key=lambda p: p.get("citationCount", 0), reverse=True) print(format_results(citing_papers, "Most-cited papers citing Attention Is All You Need")) ``` **Example 3: Multi-seed recommendations with BibTeX export** — "Find papers like these two but not about NLP" ```python import sys, os SKILL_DIR = next((p for p in [ os.path.expanduser("~/.claude/skills/semanticscholar-skill"), os.path.expanduser("~/.openclaw/skills/semanticscholar-skill"), ] if os.path.isdir(p)), ".") sys.path.insert(0, SKILL_DIR) from s2 import * recs = recommend( positive_ids=["DOI:10.1038/nature14539", "ARXIV:2010.11929"], negative_ids=["ARXIV:1706.03762"], limit=20 ) print(format_results(recs, "Vision papers like Deep Learning & ViT, excluding NLP")) # Export BibTeX for top results bib_data = batch_papers([r["paperId"] for r in recs[:10]], fields="title,citationStyles") print(export_bibtex(bib_data)) ``` ## Phase 3: Summarize & Present - Use `format_results()` for consistent output (summary table + top-10 details) - If user's language is Chinese, present summaries in Chinese - Always note total results count and search strategy used - Highlight most relevant papers based on the user's specific question ## Phase 4: User Interaction Loop After presenting results, **always offer these options:** 1. **Translate** — titles/summaries to Chinese (or other language) 2. **Details** — full abstract for specific paper numbers 3. **Refine** — narrow or expand search with different terms/filters 4. **Similar** — find papers similar to a specific result (`find_similar()`) 5. **Citations** — who cited a specific paper (`get_citations()`) 6. **Export** — save results via `export_bibtex()`, `export_markdown()`, or `export_json()` 7. **Done** — end search session Loop until user says done. Each follow-up uses the same single-script pattern. --- ## API Quick Reference ### Helper Module (`s2.py`) ```python import sys, os SKILL_DIR = next((p for p in [ os.path.expanduser("~/.claude/skills/semanticscholar-skill"), os.path.expanduser("~/.openclaw/skills/semanticscholar-skill"), ] if os.path.isdir(p)), ".") sys.path.insert(0, SKILL_DIR) from s2 import * ``` ### Paper Search Functions | Function | Purpose | Max Results | |----------|---------|-------------| | `search_relevance(query, **filters)` | Simple broad search | 1,000 | | `search_bulk(query, sort=..., **filters)` | Boolean precise search | 10,000,000 | | `search_snippets(query, **filters)` | Full-text passage search | 1,000 | | `match_title(title)` | Exact title match | 1 | | `get_paper(paper_id)` | Single paper details | — | | `get_citations(paper_id, max_results)` | Who cited this | 10,000 | | `get_references(paper_id, max_results)` | What this cites | 10,000 | | `find_similar(paper_id, limit, pool)` | Single-seed recommendations | 500 | | `recommend(positive_ids, negative_ids, limit)` | Multi-seed recommendations | 500 | | `batch_papers(ids, fields)` | Batch lookup (≤500) | — | ### Author Functions | Function | Purpose | Max Results | |----------|---------|-------------| | `search_authors(query, max_results)` | Find researchers by name | 1,000 | | `get_author(author_id)` | Author profile (affiliations, h-index) | — | | `get_author_papers(author_id, max_results)` | Author's publications | 10,000 | | `get_paper_authors(paper_id, max_results)` | Paper's author list | 1,000 | | `batch_authors(ids, fields)` | Batch author lookup (≤1000) | — | ### Filter Parameters (kwargs) `year`, `publication_date`, `venue`, `fields_of_study`, `min_citations`, `pub_types`, `open_access` - `year`: `"2020-"`, `"-2019"`, `"2016-2020"` - `publication_date`: `"2024-01-01:2024-06-30"` (YYYY-MM-DD range, open-ended OK) - `pub_types`: `Review`, `JournalArticle`, `Conference`, `ClinicalTrial`, `MetaAnalysis`, `Dataset`, `Book`, `CaseReport`, `Editorial`, `LettersAndComments`, `News`, `Study`, `BookSection` ### Boolean Query Syntax (bulk search only) | Syntax | Example | Meaning | |--------|---------|---------| | `"..."` | `"deep learning"` | Exact phrase | | `+` | `+transformer` | Must include | | `-` | `-survey` | Exclude | | `\|` | `CNN \| RNN` | OR | | `*` | `neuro*` | Prefix wildcard | | `()` | `(CNN \| RNN) +attention` | Grouping | Use `build_bool_query(phrases, required, excluded, or_terms)` to construct safely. ### Output Functions | Function | Purpose | |----------|---------| | `format_table(papers, max_rows=30)` | Markdown summary table | | `format_details(papers, max_papers=10)` | Detailed entries with TLDR/abstract | | `format_results(papers, query_desc)` | Combined: summary + table + details | | `format_authors(authors, max_rows=20)` | Author table (name, affiliations, h-index) | | `export_bibtex(papers)` | BibTeX entries (requires `citationStyles` field) | | `export_markdown(papers, query_desc)` | Full markdown report saved to file | | `export_json(papers, path)` | JSON export saved to file | | `deduplicate(papers)` | Remove duplicates by paperId | ### Supported ID Formats `DOI:10.1038/...`, `ARXIV:2106.15928`, `PMID:19872477`, `PMCID:PMC2323569`, `CorpusId:215416146`, `ACL:2020.acl-main.447`, `DBLP:conf/acl/...`, `MAG:3015453090`, `URL:https://...` ### Paper Fields Default: `title,year,citationCount,authors,venue,externalIds,tldr` Additional: `abstract`, `references`, `citations`, `openAccessPdf`, `publicationDate`, `publicationVenue`, `fieldsOfStudy`, `s2FieldsOfStudy`, `journal`, `isOpenAccess`, `referenceCount`, `influentialCitationCount`, `citationStyles`, `embedding`, `textAvailability` Author fields: `name`, `affiliations`, `paperCount`, `citationCount`, `hIndex`, `homepage`, `externalIds`, `papers` ### Rate Limiting Handled automatically by `s2.py`: 1.1s gap between requests, exponential backoff (2s→4s→8s→16s→32s, max 60s) on 429/504 errors, up to 5 retries. ### Troubleshooting | Error | Cause | Fix | |-------|-------|-----| | `HTTPError 403` | Missing or invalid API key | Verify `S2_API_KEY` is set: `echo $S2_API_KEY` | | `HTTPError 429` after 5 retries | Sustained rate limit exceeded | Wait 60s, reduce `max_results`, or split into smaller batches | | `ModuleNotFoundError: s2` | Skill directory not on path | Verify skill is installed at `~/.claude/skills/` or `~/.openclaw/skills/` | | `ModuleNotFoundError: requests` | `requests` not installed | `pip install requests` or `uv pip install requests` | | 0 results returned | Query too specific or filters too narrow | Broaden query, remove filters, try `search_relevance()` instead of `search_bulk()` | | `KeyError: 'data'` | Endpoint returned error object | Check `r.get("message")` for API error details | | `tldr` field is empty | Not all papers have TLDR | Fall back to `abstract` field; bulk search never returns `tldr` |

semanticscholar-skill

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载 Zip 包

semanticscholar-skill