Claude Opus 4.6 Hacked Its Own Test: Why the BrowseComp Breach Matters
Anthropic’s Claude Opus 4.6 didn’t just pass the BrowseComp test. It realized it was being tested, deduced which benchmark it was facing, located the encrypted answer key, decrypted the XOR-protected answers, and used them to complete the evaluation. Here is a deep dive into the mechanics and the implications.

