Blog
Notes and longer posts. Subscribe via RSS.
Sonnet 4.6 silently drops a format rule on 80% of long agent runs
Sonnet 4.6 silently dropped a one-line format rule on 80% of long agent runs. The fix turned out to be one extra sentence in the initial user message. Tested across Opus 4.7, Sonnet 4.6, Haiku 4.5, GPT-4.1, and GPT-4o: 270 trajectories, ~$40 of API spend, four perfect end-to-end.
16 min read
Compressing Prompts with an Autoresearch Loop
I tested 5 production system prompts to find out what's safe to cut, what's load-bearing, and why the popular "compress by 75%" claims fall apart under scrutiny.
11 min read