Eight models. Three recipes. Two ways of cooking — one alone, one with Chad. Click any row for the full breakdown: what was asked, what got built, and which steps had to be retried.
Each row is a single run: one model, one recipe, one harness. "Bare" means we handed the model the recipe and accepted whatever it returned. "With Chad" means the same model and the same recipe, but Chad ran the loop: discovery, plan, build, verify, retry.
Click a row to expand it. You'll get the spec it was given, every step Chad asked it to do, how many tries each step took, and where things were skipped, surfaced, or punted. Nothing on this page is rounded. Nothing is hidden.
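For a rough picture of where the per-step try counts come from, here is a minimal sketch of a build, verify, retry loop. It is not Chad's actual code. Every name in it (`StepResult`, `run_step`, `run_loop`, `max_tries`) is hypothetical, the retry limit of three is an assumption, and discovery and planning are left out entirely.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class StepResult:
    """Outcome of one step: its name, how many tries it took, whether it passed."""
    name: str
    tries: int
    passed: bool


# A step is (name, build, verify). All three are hypothetical stand-ins:
# build(attempt) produces some output, verify(output) says whether it is acceptable.
Step = Tuple[str, Callable[[int], Any], Callable[[Any], bool]]


def run_step(name: str,
             build: Callable[[int], Any],
             verify: Callable[[Any], bool],
             max_tries: int = 3) -> StepResult:
    """Build, verify, and retry one step until it passes or tries run out."""
    for attempt in range(1, max_tries + 1):
        output = build(attempt)
        if verify(output):
            return StepResult(name, attempt, True)
    # Every attempt failed verification; record the step as not passed.
    return StepResult(name, max_tries, False)


def run_loop(steps: List[Step], max_tries: int = 3) -> List[StepResult]:
    """Run each step in order and collect the per-step results."""
    return [run_step(name, build, verify, max_tries)
            for name, build, verify in steps]
```

In a sketch like this, a bare run amounts to a single build call with no verify step: whatever comes back is what ships.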
| Model | Recipe | Harness | Pass rate | Wall time |
|---|---|---|---|---|
Sorted by pass rate. Click a row to expand.
Don't take our word for it. Each pair below comes from real runs. Download both projects, open them yourself, and compare what shipped.