# ClickHouse 🏠
Real-time analytics on billions of rows. Sub-second queries. No traditional indexes required.
## Setup
On first use, read `setup.md` for connection configuration.
## When to Use
User needs OLAP analytics, log analysis, time-series data, or real-time dashboards. Agent handles schema design, query optimization, data ingestion, and cluster administration.
## Architecture
Memory lives in `~/clickhouse/`. See `memory-template.md` for structure.
```
~/clickhouse/
├── memory.md # Connection profiles + query patterns
├── schemas/ # Table definitions per database
└── queries/ # Saved analytical queries
```
## Quick Reference
| Topic | File |
|-------|------|
| Setup & connection | `setup.md` |
| Memory template | `memory-template.md` |
| Query patterns | `queries.md` |
| Performance tuning | `performance.md` |
| Data ingestion | `ingestion.md` |
## Core Rules
### 1. Always Specify Engine
Every table needs an explicit engine. Default to MergeTree family:
```sql
-- Time-series / logs
CREATE TABLE events (
    timestamp DateTime,
    event_type String,
    data String
) ENGINE = MergeTree()
ORDER BY (timestamp, event_type);

-- Aggregated metrics
CREATE TABLE daily_stats (
    date Date,
    metric String,
    value AggregateFunction(sum, UInt64)
) ENGINE = AggregatingMergeTree()
ORDER BY (date, metric);
```
### 2. ORDER BY is Your Index
ClickHouse has no traditional secondary indexes. A sparse primary index is derived from the `ORDER BY` clause, which determines the on-disk data layout:
- Match your most common WHERE patterns: lead with the columns you filter on most
- Prefer lower-cardinality columns earlier in the key for better compression and index pruning
- Put range columns (dates, timestamps) after equality-filtered columns
```sql
-- Good: filters by user_id, then date range
ORDER BY (user_id, date, event_type)
-- Bad: date first when you filter by user_id
ORDER BY (date, user_id, event_type)
```
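To check that a query actually prunes by the sorting key, `EXPLAIN indexes = 1` reports how many parts and granules each key condition selects. A minimal sketch against the `events` table above:
```sql
EXPLAIN indexes = 1
SELECT count()
FROM events
WHERE timestamp >= now() - INTERVAL 1 DAY AND event_type = 'click';
-- In the output, look for a PrimaryKey entry whose selected granules
-- are far below the total; near-total selection means a full scan.
```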
### 3. Use Appropriate Data Types
| Use Case | Type | Why |
|----------|------|-----|
| Timestamps | `DateTime` or `DateTime64` | Native time functions |
| Low-cardinality strings | `LowCardinality(String)` | Often ~10x better compression |
| Enums with few values | `Enum8` or `Enum16` | Smallest footprint |
| Optional values | `Nullable(T)` | Adds overhead; use only when needed |
| IPs | `IPv4` or `IPv6` | 4 bytes vs 16+ |
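A sketch pulling these together; the table and column names are illustrative:
```sql
CREATE TABLE access_log (
    ts DateTime64(3),
    status LowCardinality(String),   -- few distinct values compress well
    method Enum8('GET' = 1, 'POST' = 2, 'PUT' = 3, 'DELETE' = 4),
    client_ip IPv4,                  -- 4 bytes instead of a dotted string
    referrer Nullable(String)        -- Nullable only where absence is meaningful
) ENGINE = MergeTree()
ORDER BY (status, ts);
```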
### 4. Batch Inserts
Never insert row-by-row. ClickHouse is optimized for batch writes:
```bash
# Good: one batched insert
clickhouse-client --query="INSERT INTO events FORMAT JSONEachRow" < batch.json

# Bad: one client call per row (each insert creates a new part)
while read -r row; do
  clickhouse-client --query="INSERT INTO events FORMAT JSONEachRow" <<< "$row"
done < batch.json
```
Minimum batch: 1,000 rows. Optimal: 10,000-100,000 rows.
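When callers genuinely cannot batch client-side, recent ClickHouse versions can buffer small writes server-side via the `async_insert` setting; a hedged sketch:
```sql
-- Server-side batching: ClickHouse buffers small inserts and flushes them as one part
INSERT INTO events
SETTINGS async_insert = 1, wait_for_async_insert = 1
VALUES (now(), 'click', '{}');
```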
### 5. Use FINAL on Replacing/Collapsing Engines
Queries on ReplacingMergeTree/CollapsingMergeTree need `FINAL` for accuracy:
```sql
-- May return duplicates/old versions
SELECT * FROM users WHERE id = 123;
-- Guaranteed latest version
SELECT * FROM users FINAL WHERE id = 123;
```
`FINAL` has a real performance cost. For dashboards, consider materialized views instead.
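A common alternative that avoids `FINAL` is aggregating to the latest version per key; a sketch assuming hypothetical `name` and `updated_at` columns:
```sql
SELECT
    id,
    argMax(name, updated_at) AS name  -- value from the row with the latest updated_at
FROM users
WHERE id = 123
GROUP BY id;
```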
### 6. Materialized Views for Speed
Pre-aggregate expensive computations:
```sql
CREATE MATERIALIZED VIEW hourly_events
ENGINE = SummingMergeTree()
ORDER BY (hour, event_type)
AS SELECT
toStartOfHour(timestamp) AS hour,
event_type,
count() AS events
FROM events
GROUP BY hour, event_type;
```
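Because SummingMergeTree collapses rows only at merge time, re-aggregate when reading the view:
```sql
-- sum() folds together parts that background merges haven't combined yet
SELECT hour, event_type, sum(events) AS events
FROM hourly_events
GROUP BY hour, event_type
ORDER BY hour;
```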
### 7. Check System Tables First
Before debugging, check system tables:
```sql
-- Running queries
SELECT * FROM system.processes;
-- Recent query performance
SELECT query, elapsed, read_rows, memory_usage
FROM system.query_log
WHERE type = 'QueryFinish'
ORDER BY event_time DESC
LIMIT 10;
-- Table sizes
SELECT database, table, formatReadableSize(total_bytes) as size
FROM system.tables
ORDER BY total_bytes DESC;
```
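If `system.processes` surfaces a runaway query, terminate it by its `query_id` (the id below is a placeholder):
```sql
KILL QUERY WHERE query_id = 'abc-123';
```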
## Common Traps
- **String instead of LowCardinality** → 10x larger storage for status/type columns
- **Wrong ORDER BY** → Full table scans instead of index lookups
- **Row-by-row inserts** → Massive part fragmentation, slow writes
- **Missing TTL** → Unbounded table growth, disk full
- **`SELECT *`** → Reads all columns, kills the columnar advantage
- **Nullable everywhere** → Overhead + NULL handling complexity
- **Forgetting FINAL** → Stale/duplicate data in merge tables
## Performance Checklist
Before running expensive queries:
1. **Check EXPLAIN**: `EXPLAIN SELECT ...` shows execution plan
2. **Sample first**: `SELECT ... FROM table SAMPLE 0.01` for a 1% sample (requires a `SAMPLE BY` clause; see the sketch below)
3. **Limit columns**: Only SELECT what you need
4. **Use PREWHERE**: Filters before reading all columns
5. **Check parts**: `SELECT count() FROM system.parts WHERE table='X'`
```sql
-- PREWHERE optimization
SELECT user_id, event_type, data
FROM events
PREWHERE date = today()
WHERE event_type = 'click';
```
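Note that `SAMPLE` (item 2 in the checklist) only works on tables that declare a sampling key inside the primary key; a minimal sketch with an illustrative table:
```sql
CREATE TABLE events_sampled (
    timestamp DateTime,
    user_id UInt64,
    event_type String
) ENGINE = MergeTree()
ORDER BY (user_id, intHash32(user_id), timestamp)
SAMPLE BY intHash32(user_id);  -- the sampling expression must appear in ORDER BY
```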
## Cluster Administration
### Adding TTL for Data Retention
```sql
-- Delete old data
ALTER TABLE events
MODIFY TTL timestamp + INTERVAL 90 DAY;
-- Move to cold storage
ALTER TABLE events
MODIFY TTL timestamp + INTERVAL 30 DAY TO VOLUME 'cold';
```
### Monitoring Disk Usage
```sql
SELECT
database,
table,
formatReadableSize(sum(bytes_on_disk)) as disk_size,
sum(rows) as total_rows,
count() as parts
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY sum(bytes_on_disk) DESC;
```
## External Endpoints
| Endpoint | Data Sent | Purpose |
|----------|-----------|---------|
| localhost:8123 | SQL queries | HTTP interface |
| localhost:9000 | SQL queries | Native TCP interface |
No external services contacted. All queries run against user-specified ClickHouse instances.
## Security & Privacy
**Data saved locally (with user consent):**
- Connection profiles (host, port, database) in ~/clickhouse/memory.md
- Query patterns and schema documentation
- Authentication method preferences (password vs certificate)
**Important:** If you provide database passwords, they are stored in plain text in ~/clickhouse/. Consider using environment variables or connection profiles managed by clickhouse-client instead.
**This skill does NOT:**
- Connect to any ClickHouse without explicit user configuration
- Send data to external services
- Automatically collect or store credentials without asking
## Related Skills
Install with `clawhub install <slug>` if user confirms:
- `sql` — SQL query patterns
- `analytics` — data analysis workflows
- `data-analysis` — structured data exploration
## Feedback
- If useful: `clawhub star clickhouse`
- Stay updated: `clawhub sync`