survival-analysis-km
# Survival Analysis (Kaplan-Meier)
Kaplan-Meier survival analysis tool for clinical and biological research. Generates publication-ready survival curves with statistical tests.
## Features
- **Kaplan-Meier Curve Generation**: Publication-quality survival plots with confidence intervals
- **Statistical Tests**: Log-rank test, Wilcoxon test, Peto-Peto test
- **Hazard Ratios**: Cox proportional hazards regression with 95% CI
- **Summary Statistics**: Median survival time, restricted mean survival time (RMST)
- **Multi-group Analysis**: Supports 2+ comparison groups
- **Risk Tables**: Optional at-risk table below curves
## Usage
### Python Script
```bash
python scripts/main.py --input data.csv --time time_col --event event_col --group group_col --output results/
```
### Arguments
| Argument | Description | Required |
|----------|-------------|----------|
| `--input` | Input CSV file path | Yes |
| `--time` | Column name for survival time | Yes |
| `--event` | Column name for event indicator (1=event, 0=censored) | Yes |
| `--group` | Column name for grouping variable | Optional |
| `--output` | Output directory for results | Yes |
| `--conf-level` | Confidence level (default: 0.95) | Optional |
| `--risk-table` | Include risk table in plot | Optional |
### Input Format
CSV with columns:
- **Time column**: Numeric, time to event or censoring
- **Event column**: Binary (1 = event occurred, 0 = censored/right-censored)
- **Group column**: Categorical variable for stratification
Example:
```csv
patient_id,time_months,death,treatment_group
P001,24.5,1,Drug_A
P002,36.2,0,Drug_A
P003,18.7,1,Placebo
```
### Output Files
- `km_curve.png`: Kaplan-Meier survival curve
- `km_curve.pdf`: Vector version for publications
- `survival_stats.csv`: Statistical summary (median survival, confidence intervals)
- `hazard_ratios.csv`: Cox regression results with HR and 95% CI
- `logrank_test.csv**: Pairwise comparison p-values
- `report.txt**: Human-readable summary report
## Technical Details
### Statistical Methods
1. **Kaplan-Meier Estimator**: Non-parametric maximum likelihood estimate of survival function
- Product-limit estimator: Ŝ(t) = Π(tᵢ≤t) (1 - dᵢ/nᵢ)
- Greenwood's formula for variance estimation
2. **Log-Rank Test**: Most widely used test for comparing survival curves
- Null hypothesis: No difference between groups
- Weighted by number at risk at each event time
3. **Cox Proportional Hazards**: Semi-parametric regression model
- h(t|X) = h₀(t) × exp(β₁X₁ + β₂X₂ + ...)
- Proportional hazards assumption checked via Schoenfeld residuals
### Dependencies
- `lifelines`: Core survival analysis library
- `matplotlib`, `seaborn`: Visualization
- `pandas`, `numpy`: Data handling
- `scipy`: Statistical tests
### Technical Difficulty: High ⚠️
This skill involves advanced statistical modeling. Results should be reviewed by a biostatistician, especially for:
- Proportional hazards assumption violations
- Small sample sizes (< 30 per group)
- Heavy censoring (> 50%)
- Time-varying covariates
## References
See `references/` folder for:
- Kaplan EL, Meier P (1958) original paper
- Cox DR (1972) regression models paper
- Sample datasets for testing
- Clinical reporting guidelines (ATN, CONSORT)
## Parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `--input` | str | Required | Input CSV file path |
| `--time` | str | Required | Column name for survival time |
| `--event` | str | Required | |
| `--group` | str | Required | |
| `--output` | str | Required | Output directory for results |
| `--conf-level` | float | 0.95 | |
| `--risk-table` | str | Required | Include risk table in plot |
| `--figsize` | str | '10 | |
| `--dpi` | int | 300 | |
## Example
```bash
# Basic survival curve
python scripts/main.py \
--input clinical_data.csv \
--time overall_survival_months \
--event death \
--group treatment_arm \
--output ./results/ \
--risk-table
```
Output includes:
- Survival curves with 95% confidence bands
- Median survival: Drug A = 28.4 months (95% CI: 24.1-32.7), Placebo = 18.2 months (95% CI: 15.3-21.1)
- Log-rank test p-value: 0.0023
- Hazard ratio: 0.62 (95% CI: 0.45-0.85), p = 0.003
## Risk Assessment
| Risk Indicator | Assessment | Level |
|----------------|------------|-------|
| Code Execution | Python/R scripts executed locally | Medium |
| Network Access | No external API calls | Low |
| File System Access | Read input files, write output files | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Output files saved to workspace | Low |
## Security Checklist
- [ ] No hardcoded credentials or API keys
- [ ] No unauthorized file system access (../)
- [ ] Output does not expose sensitive information
- [ ] Prompt injection protections in place
- [ ] Input file paths validated (no ../ traversal)
- [ ] Output directory restricted to workspace
- [ ] Script execution in sandboxed environment
- [ ] Error messages sanitized (no stack traces exposed)
- [ ] Dependencies audited
## Prerequisites
```bash
# Python dependencies
pip install -r requirements.txt
```
## Evaluation Criteria
### Success Metrics
- [ ] Successfully executes main functionality
- [ ] Output meets quality standards
- [ ] Handles edge cases gracefully
- [ ] Performance is acceptable
### Test Cases
1. **Basic Functionality**: Standard input → Expected output
2. **Edge Case**: Invalid input → Graceful error handling
3. **Performance**: Large dataset → Acceptable processing time
## Lifecycle Status
- **Current Stage**: Draft
- **Next Review Date**: 2026-03-06
- **Known Issues**: None
- **Planned Improvements**:
- Performance optimization
- Additional feature support
标签
skill
ai