# Split — Data Splitting Reference
Quick-reference skill for data splitting techniques, partitioning strategies, and practical patterns.
## When to Use
- Splitting strings by delimiters, patterns, or fixed widths
- Partitioning datasets for ML training/validation/test
- Dividing large files into manageable chunks
- Database sharding and horizontal partitioning
- Understanding split strategies for distributed systems
## Commands
### `intro`
```bash
scripts/script.sh intro
```
Overview of data splitting — concepts, common use cases, and terminology.
### `string`
```bash
scripts/script.sh string
```
String splitting techniques — delimiters, regex, fixed-width, tokenization.
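The four techniques named above can be sketched in Python as follows. This is an illustration only, not the output of `scripts/script.sh string`; the helper name `split_fixed_width` and the sample strings are hypothetical.

```python
import re


def split_fixed_width(text: str, width: int) -> list[str]:
    """Cut text into consecutive chunks of `width` characters.

    The last chunk may be shorter when len(text) is not a multiple of width.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    return [text[i:i + width] for i in range(0, len(text), width)]


# Delimiter: split on one literal separator.
by_delimiter = "alice,30,paris".split(",")

# Regex: split on any run of commas, semicolons, or whitespace.
by_regex = re.split(r"[;,\s]+", "a, b;c  d")

# Fixed width: a date packed as YYYYMMDD, cut every 4 characters.
by_width = split_fixed_width("20240131", 4)

# Tokenization: keep runs of word characters, drop punctuation.
tokens = re.findall(r"\w+", "Split me, please!")
```

Note that `str.split(",")` keeps empty fields (for example `"a,,b"` yields three items), whereas the regex above collapses adjacent separators into one.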
### `file`
```bash
scripts/script.sh file
```
File splitting methods — by size, lines, patterns, and round-robin.
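As a rough sketch of two of these methods, the Python below splits a text file by line count and distributes lines round-robin. The function names and the `.partNNNN` naming scheme are assumptions made for illustration, not something the skill's script is known to produce.

```python
from pathlib import Path


def split_by_lines(src: Path, out_dir: Path, lines_per_chunk: int) -> list[Path]:
    """Write src into numbered part files of at most lines_per_chunk lines each."""
    if lines_per_chunk <= 0:
        raise ValueError("lines_per_chunk must be positive")
    out_dir.mkdir(parents=True, exist_ok=True)
    parts: list[Path] = []
    chunk = None
    try:
        with src.open("r", encoding="utf-8") as f:
            for i, line in enumerate(f):
                # Start a new part file every lines_per_chunk lines.
                if i % lines_per_chunk == 0:
                    if chunk is not None:
                        chunk.close()
                    part = out_dir / f"{src.stem}.part{len(parts):04d}"
                    parts.append(part)
                    chunk = part.open("w", encoding="utf-8")
                chunk.write(line)
    finally:
        if chunk is not None:
            chunk.close()
    return parts


def split_round_robin(lines: list[str], n: int) -> list[list[str]]:
    """Deal lines into n buckets in turn: line i goes to bucket i % n."""
    if n <= 0:
        raise ValueError("n must be positive")
    buckets: list[list[str]] = [[] for _ in range(n)]
    for i, line in enumerate(lines):
        buckets[i % n].append(line)
    return buckets
```

Splitting by line count keeps records intact, while round-robin trades contiguous order for evenly sized outputs.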
### `dataset`
```bash
scripts/script.sh dataset
```
ML dataset splitting — train/val/test, stratified, time-series, k-fold.
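A minimal standard-library sketch of three of these splits is shown below. The function names, default fractions, and seed are hypothetical choices for illustration; in practice a library such as scikit-learn is commonly used, and stratified splitting is omitted here for brevity.

```python
import random


def train_val_test_split(items, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle with a fixed seed, then cut into train/val/test lists."""
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def time_series_split(items, test_frac=0.2):
    """Chronological split: no shuffling, the most recent items form the test set."""
    if not 0 < test_frac < 1:
        raise ValueError("test_frac must be between 0 and 1")
    cut = len(items) - int(len(items) * test_frac)
    return list(items[:cut]), list(items[cut:])


def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds over range(n)."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    # Spread the remainder over the first n % k folds.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size
```

The time-series variant deliberately avoids shuffling so that no future observation ends up in the training set.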
### `database`
```bash
scripts/script.sh database
```
Database partitioning — horizontal, vertical, hash, range, and list.
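To make the hash, range, and list schemes concrete, here is a small routing sketch in Python. It only decides which partition a row belongs to; the function names, boundary values, and region mapping are made up for illustration and do not reflect any particular database's syntax.

```python
import bisect
import zlib


def hash_partition(key: str, n_partitions: int) -> int:
    """Hash partitioning: a stable hash of the key, modulo the partition count."""
    # crc32 is used because Python's built-in hash() is salted per process.
    return zlib.crc32(key.encode("utf-8")) % n_partitions


def range_partition(value: int, upper_bounds: list[int]) -> int:
    """Range partitioning over sorted, exclusive upper bounds.

    Partition i holds values below upper_bounds[i]; the final partition
    (index len(upper_bounds)) catches everything at or above the last bound.
    """
    return bisect.bisect_right(upper_bounds, value)


def list_partition(value: str, mapping: dict[str, str], default: str) -> str:
    """List partitioning: an explicit value-to-partition table with a fallback."""
    return mapping.get(value, default)


# Hypothetical layout: ids below 100, ids 100..199, and everything else.
bounds = [100, 200]
regions = {"de": "eu", "fr": "eu", "us": "na"}
```

Hash partitioning spreads keys evenly but scatters range scans, while range partitioning keeps neighbouring values together at the risk of hot partitions.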
### `strategies`
```bash
scripts/script.sh strategies
```
Splitting strategies for distributed systems — consistent hashing, sharding keys.
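Consistent hashing can be sketched as a sorted ring of virtual nodes. The `HashRing` class below is a hypothetical, minimal version assuming SHA-256 for placement and 100 virtual nodes per physical node; production implementations typically add replication and weighting.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node) tuples
        for node in nodes:
            self.add(node)

    @staticmethod
    def _position(key: str) -> int:
        # First 8 bytes of SHA-256 as an integer position on the ring.
        return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._position(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def get(self, key: str) -> str:
        """Return the node owning the first ring position at or after the key."""
        if not self._ring:
            raise LookupError("ring is empty")
        idx = bisect.bisect_left(self._ring, (self._position(key),))
        return self._ring[idx % len(self._ring)][1]  # wrap around the ring


ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.get("user-42")
```

The property that matters for sharding is that removing a node reassigns only the keys that node owned; keys on the surviving nodes stay where they are.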
### `examples`
```bash
scripts/script.sh examples
```
Practical split examples across languages and tools.
### `pitfalls`
```bash
scripts/script.sh pitfalls
```
Common pitfalls and best practices when splitting data.
### `help`
```bash
scripts/script.sh help
```
Show usage help for the commands above.
### `version`
```bash
scripts/script.sh version
```
Print the skill version.
## Configuration
| Variable | Description |
|----------|-------------|
| `SPLIT_DIR` | Data directory (default: `~/.split/`) |
---
*Powered by BytesAgain | bytesagain.com | hello@bytesagain.com*