yula-web-search

# Yula Web Search Skill Custom web search skill by Yula — **NO API KEY REQUIRED**. Uses multiple **public anonymous search services** that don't require API keys. Works via direct curl requests from local network: 1. Parse Bing search to get top results (title + URL) 2. **Extract full content** from the top 2-3 most relevant URLs** 3. Summarize all information into a comprehensive answer 4. If one method fails, automatically try next fallback Just works, no configuration needed. ## When to Use ✅ **USE this skill when:** - Search for latest news, information, or products - Look up current prices, availability, or status - Find answers to questions that require up-to-date data - Extract content from a specific URL - Research topics that need current web information - Chinese language searches (optimized for China region) ❌ **DON'T use when:** - Weather forecast → use `weather` skill - Local file search → use file tools - Already have the information in context ## Search Workflow ### Complete Workflow (with content extraction) 1. **Search Bing** → get top 5-8 results with title + URL 2. **Filter results** → select top 2-3 most relevant URLs based on title matching query 3. **Extract content** from each selected URL using curl + html-to-text 4. **Combine all extracted content** 5. **Summarize** into a coherent answer for the user ## Search Methods (Tried in Order) ### Method 1: Direct Bing Search (cn.bing.com, Primary) Direct request to Chinese Bing, parse HTML to extract result titles and URLs: ```bash QUERY="your search query" QUERY_ENCODED=$(python3 -c "import urllib.parse; print(urllib.parse.quote('$QUERY'))" curl -s -m 20 -L -A "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" "https://cn.bing.com/search?q=$QUERY_ENCODED" | python3 -c " import re, sys from html.parser import HTMLParser class BingParser(HTMLParser): def __init__(self): super().__init__() self.results = [] self.in_h2 = False self.current_url = None self.current_title = [] def handle_starttag(self, tag, attrs): attrs_dict = dict(attrs) if tag == 'h2': self.in_h2 = True self.current_title = [] if tag == 'a' and self.in_h2 and 'href' in attrs_dict: url = attrs_dict['href'] if 'bing.com' not in url and url.startswith('http'): self.current_url = url def handle_endtag(self, tag): if tag == 'h2': if self.current_url: title = ''.join(self.current_title).strip() self.results.append((title, self.current_url)) self.current_url = None self.in_h2 = False def handle_data(self, data): if self.in_h2 and self.current_url: self.current_title.append(data) parser = BingParser() parser.feed(sys.stdin.read()) for i, (title, url) in enumerate(parser.results[:8]): print(f'{i+1}\\t{title}\\t{url}") " ``` ### Method 2: Direct Google Search (google.com, Fallback 1) If Bing fails, try Google: ```bash QUERY="your search query" QUERY_ENCODED=$(python3 -c "import urllib.parse; print(urllib.parse.quote('$QUERY'))" curl -s -m 20 -L -A "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" "https://www.google.com/search?q=$QUERY_ENCODED" | python3 -c " import re, sys results = [] pattern = r'<h3 class=\"zBAuLc\"><a href=\"([^\"]+)\"' matches = re.findall(pattern, html) for i, url in enumerate(matches[:8]): # Google enconder title is after... extract separately pass # Simplified extraction - get first 8 URLs " ``` ### Extract Content from URL (after search) After getting search results, extract text content from top relevant URLs: ```python def extract_url_content(url): # Use curl to get HTML # Use python to extract text content, remove scripts/styles/scripts, get main text # Return cleaned text content, limit to ~2000 chars per URL ``` **Example full workflow example: ```bash # After getting search results, select top 2-3 relevant URLs for (title, url) in selected_urls: curl -s -m 20 -L -A "USER_AGENT" "$url" | python3 -c " import sys from html.parser import HTMLParser class TextExtractor(HTMLParser): def __init__(self): super().__init__() self.text = [] self.in_script = False self.in_style = False def handle_starttag(self, tag, attrs): if tag == 'script' or tag == 'style' or tag == 'noscript': self.in_script = True if tag == 'style': self.in_style = True def handle_endtag(self, tag): if tag == 'script' or tag == 'style' or tag == 'noscript': self.in_script = False if tag == 'style': self.in_style = False def handle_data(self, data): if not self.in_script and not self.in_style: words = data.strip() if words: self.text.append(words) parser = TextExtractor() parser.feed(sys.stdin.read()) content = ' '.join(parser.text) # Clean up whitespace and limit length content = ' '.join(content.split())[:2000] print(content) " ``` ## User-Agent Always use a modern browser User-Agent to avoid being blocked immediately: ``` "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" ``` ## Complete Example Workflow > "Search 2026腾势Z9GT介绍" **Step 1: Search Bing → get top 8 results ```bash # Output gives title + url: 1. 2026款腾势Z9GT-腾势官网 URL: https://www.tengshiauto.com/product-detail/26-z9gt.html 2. 【腾势Z9GT】腾势_腾势Z9GT报价_腾势Z9GT图片_汽车之家 URL: https://www.autohome.com.cn/7659 ... ``` **Step 2: Select most relevant results** - Pick 2-3 results that best match the query 1. Official website 2. Autohome article **Step 3: Extract content from each URL - Get cleaned text from each page - Limit to 2000 characters per page **Step 4: Combine and summarize - Read all extracted content - Summarize into a coherent answer with key information: price, specs, release date ## Best Practices 1. **Content extraction** → Always extract top 2-3 results, not more (avoids too much text) 2. **Relevance filtering** → Select results whose title contains more query keywords preferentially 3. **Character limit** → Limit total extracted text to ~5000 chars total to avoid token overflow 4. **Timeout** → 20 seconds max per request to avoid hanging 5. **Fallback** → if one search engine fails, automatically try next 6. **Chinese optimized** → prefer Chinese keywords, Chinese websites ### How to select relevant results - Count how many query keywords appear in the title - Sort results by relevance - Pick top N (2-3) for content extraction - Official sites and major portal sites have higher priority ## Notes - **NO API KEY REQUIRED** — works out of the box - **Multiple fallback methods** → if one gets blocked, try the next - **Direct curl via local network** → uses your existing network/proxy - **Python HTML parsing** → extracts title/url reliably - **Automatic content extraction and summarization** → gives complete answer without user having to click links - **Modern browser User-Agent** → reduces chance of being blocked - **Free for non-commercial use** - **Rate limits**: be respectful, don't spam too many requests quickly - If all methods fail, fall back to general knowledge ## Full Workflow Summary ``` 1. Search Bing → get (title + url) ↓ 2. Filter by relevance → pick top 2-3 ↓ 3. Extract content from each URL ↓ 4. Combine all text ↓ 5. Summarize into final answer ↓ 6. Present to user with sources ``` ## Author Created by **Yula** GitHub: https://github.com/wjzhb/yula-web-search ## License Copyright (c) 2026 [Yula](https://github.com/wjzhb) Licensed under the [MIT License](LICENSE) If you find this skill useful, **please ⭐ star it on GitHub**!

yula-web-search

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载 Zip 包

yula-web-search

yula-web-search

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载 Zip 包

相关推荐

self-improvement

self-improvement

self-improvement

self-improvement