ab-test-eval
Run A/B evaluation tests for any OpenClaw skill, script, hook, or cron job. Make sure to use this skill whenever the user mentions testing, benchmarking, comparing, or evaluating a skill, script, hook, or cron job, even if they don't explicitly ask for 'A/B testing'. Supports 10 eval modes: baseline comparison, regression testing, model-swap, prompt variants, trigger accuracy, adversarial robustness, script validation, hook dry-run, cron dry-run, and integration testing.