The Dwarkesh Reference
Mental model

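Unlike weights (distributable by pipelining) and compute (amortizable by batching), KV-cache memory resists both. It is stored separately for every sequence and grows linearly with context length, so batching multiplies it rather than amortizing it. That makes it the binding constraint on context length.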
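A quick back-of-the-envelope calculation shows why the KV cache behaves differently from weights. The sketch below assumes standard uncompressed attention, where every layer keeps one key and one value vector per KV head, per token, per sequence. All model dimensions (80 layers, 8 KV heads, head dimension 128, 70B parameters, 2-byte elements) are hypothetical, chosen only for illustration, and the function names are made up for this example.

```python
# Minimal sketch: fixed weight memory vs. KV-cache memory.
# Assumes uncompressed attention; all dimensions are hypothetical.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, batch, bytes_per_elem=2):
    # Factor 2 = one K and one V vector per layer, per KV head, per token, per sequence.
    return 2 * n_layers * n_kv_heads * head_dim * context_len * batch * bytes_per_elem

def weight_bytes(n_params, bytes_per_elem=2):
    # Weights are shared by every sequence in the batch: independent of batch and context.
    return n_params * bytes_per_elem

if __name__ == "__main__":
    GiB = 2**30
    print(f"weights: {weight_bytes(70_000_000_000) / GiB:.1f} GiB (fixed)")
    for batch in (1, 8):
        for context_len in (8_192, 131_072):
            kv = kv_cache_bytes(80, 8, 128, context_len, batch)
            print(f"batch={batch:>2} context={context_len:>7}: KV cache {kv / GiB:.1f} GiB")
```

With these made-up numbers the weights stay at about 130 GiB no matter the workload, while a single 128K-token sequence already needs 40 GiB of KV cache and a batch of eight needs 320 GiB. Batching spreads the cost of reading the weights over more tokens, but each extra sequence brings its own cache.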

Who: Reiner Pope
Topic: KV cache wall