Blog

Notes and longer posts. Subscribe via RSS.

May 18, 2026

Sonnet 4.6 silently drops a format rule on 80% of long agent runs

Sonnet 4.6 silently dropped a one-line format rule on 80% of long agent runs. The fix turned out to be one extra sentence in the initial user message. Tested across Opus 4.7, Sonnet 4.6, Haiku 4.5, GPT-4.1, and GPT-4o — 270 trajectories, ~$40 of API spend, four perfect end-to-end.

16 min read

April 16, 2026

Compressing Prompts with an Autoresearch Loop

I tested 5 production system prompts to find out what's safe to cut, what's load-bearing, and why the popular "compress by 75%" claims fall apart under scrutiny.

11 min read