moatmoat commented on A postmortem of three recent issues   anthropic.com/engineering... · Posted by u/moatmoat
moatmoat · 3 months ago
TL;DR — Anthropic Postmortem of Three Recent Issues

In Aug–Sep 2025, Claude users saw degraded output quality due to infrastructure bugs, not intentional changes.

The Three Issues

1. *Context window routing error* - Short-context requests sometimes routed to long-context servers.

   - Started small, worsened after load-balancing changes.
2. *Output corruption* - A misconfiguration on TPU servers led to unexpected outputs (wrong-language characters, syntax errors in code).

   - A runtime performance optimization occasionally assigned high probability to tokens that should rarely appear.
3. *Approximate top-k miscompilation* - A compiler bug in the TPU/XLA stack corrupted token selection during sampling.

   - Occasionally dropped the true top token.
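To make the third issue concrete, here is a toy sketch of the invariant it violated: whatever top-k method is used, the result should contain the single most likely token. This is plain Python for illustration only, not Anthropic's code. The names `exact_top_k`, `buggy_top_k`, and `contains_argmax` are hypothetical, and the "bug" is simulated by skipping one index rather than by any real compiler behavior.

```python
import heapq

def exact_top_k(probs, k):
    """Return the indices of the k largest probabilities, highest first."""
    return heapq.nlargest(k, range(len(probs)), key=probs.__getitem__)

def buggy_top_k(probs, k, skipped_index):
    """Simulated faulty top-k: silently ignores one index (hypothetical bug)."""
    candidates = [i for i in range(len(probs)) if i != skipped_index]
    return heapq.nlargest(k, candidates, key=probs.__getitem__)

def contains_argmax(probs, selected):
    """Invariant: a top-k result must include the most likely token."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return best in selected

# Made-up distribution over five tokens; token 1 is the clear favorite.
probs = [0.05, 0.60, 0.10, 0.20, 0.05]
good = exact_top_k(probs, k=2)
bad = buggy_top_k(probs, k=2, skipped_index=1)

print(good, contains_argmax(probs, good))
print(bad, contains_argmax(probs, bad))
```

A check like `contains_argmax` is cheap to run, which is why this kind of invariant is a plausible candidate for continuous validation, though the summary does not say which checks were actually adopted.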
Why It Was Hard to Detect

- Bugs were subtle, intermittent, and platform-dependent.

- Benchmarks missed these degradations.

- Privacy/safety rules limited access to real user data for debugging.

Fixes and Next Steps

- More sensitive, continuous evals on production.

- Better tools to debug user feedback safely.

- Stronger validation of routing, output correctness, and token-selection.
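As a rough idea of what routing validation could look like, the sketch below flags requests that landed in a pool that does not match their context size. Everything here is assumed for illustration: the `Request` type, the pool names, the `find_misrouted` helper, and the `SHORT_CONTEXT_LIMIT` threshold are hypothetical and do not come from the postmortem.

```python
from dataclasses import dataclass

# Arbitrary threshold in tokens, chosen only for this example.
SHORT_CONTEXT_LIMIT = 1000

@dataclass
class Request:
    request_id: str
    context_tokens: int

def expected_pool(req):
    """Pool a request should be routed to, based on its context size."""
    if req.context_tokens > SHORT_CONTEXT_LIMIT:
        return "long-context"
    return "short-context"

def find_misrouted(assignments):
    """Given (Request, pool_name) pairs, return ids routed to the wrong pool."""
    return [req.request_id for req, pool in assignments
            if pool != expected_pool(req)]

assignments = [
    (Request("a", 500), "short-context"),   # correct
    (Request("b", 500), "long-context"),    # short request sent to long pool
    (Request("c", 5000), "long-context"),   # correct
]
misrouted = find_misrouted(assignments)
print(misrouted)
```

Tracking the share of misrouted requests over time would also surface the "started small, then worsened" pattern described in the first issue.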
