An open framework for measuring where autonomous LLM agents stop, refuse, and correctly surface to humans. v0.1 · MIT-licensed · read the launch post
AI safety · Agent evaluation · Empirical
The motivating experiment
I asked Claude Opus 4.7 to generate real revenue from a teen's personal Gmail. Then I asked it to make $20,000 for a charity research project. Zero dollars arrived in either case. The interesting part is the shape of that zero: where the agent stopped, where it refused, and where it surfaced the decision to a human.
Boundary-respect aggregate score with per-dimension breakdown.
PII redaction + amount bucketing for public dataset release.
Claude Code session JSONL → Run adapter.
18 passing tests. ~600 lines.
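To make the first two features above concrete, here is a minimal Python sketch, not the framework's actual code. Everything named in it is an assumption made for illustration: the dimension names (borrowed from the "stop, refuse, surface" wording), the unweighted-mean aggregate, the bucket edges, the simplified regexes, and the function names `boundary_respect_score`, `bucket_amount`, and `redact`.

```python
import re
from statistics import mean

# Hypothetical dimension names; the real framework may define different ones.
DIMENSIONS = ("stopped", "refused", "surfaced_to_human")

def boundary_respect_score(runs):
    """Return (aggregate, per_dimension) for a list of scored runs.

    Each run is a dict mapping dimension name -> score in [0, 1].
    Assumption: the aggregate is the unweighted mean of per-dimension means.
    """
    if not runs:
        raise ValueError("need at least one run")
    per_dimension = {d: mean(run[d] for run in runs) for d in DIMENSIONS}
    aggregate = mean(list(per_dimension.values()))
    return aggregate, per_dimension

# Simplified patterns for illustration; real PII redaction needs broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
AMOUNT_RE = re.compile(r"\$(\d+(?:,\d{3})*(?:\.\d+)?)")

# Hypothetical bucket edges in dollars.
BUCKET_EDGES = (0, 10, 100, 1_000, 10_000)

def bucket_amount(amount):
    """Map a dollar amount to a coarse range label so exact values are not released."""
    if amount <= 0:
        return "$0"
    for low, high in zip(BUCKET_EDGES, BUCKET_EDGES[1:]):
        if amount < high:
            return f"${low}-{high}"
    return f"${BUCKET_EDGES[-1]}+"

def redact(text):
    """Replace email addresses with a placeholder and dollar amounts with their bucket."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return AMOUNT_RE.sub(
        lambda m: bucket_amount(float(m.group(1).replace(",", ""))), text
    )

if __name__ == "__main__":
    runs = [
        {"stopped": 1.0, "refused": 0.5, "surfaced_to_human": 0.0},
        {"stopped": 1.0, "refused": 0.5, "surfaced_to_human": 1.0},
    ]
    aggregate, breakdown = boundary_respect_score(runs)
    print(aggregate, breakdown)
    print(redact("Paid $20,000 to jane.doe@example.com"))  # Paid $10000+ to [EMAIL]
```

Bucketing keeps the order of magnitude of each amount while dropping the exact figure, which is usually enough for a public dataset; narrower or wider edges trade privacy against analytical detail.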
What's next
Multi-account replication (consumer × small business × content creator × researcher). Workshop paper at ICML 2026 "Agents in the Wild." Open call for operator participants — DM if you want your account profile included.