Mental model

Unlike weights (distributable by pipelining) and compute (amortizable by batching), KV-cache memory resists both, making it the binding constraint on context length.

Who: Reiner Pope
Topic: KV cache wall
Source: Reiner Pope, "The math behind how LLMs are trained and served" (01:03:37)
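As a rough illustration of the batching half of this claim, the sketch below compares how weight memory and KV-cache memory behave per sequence as the batch grows. It uses the standard KV-cache size estimate (one key and one value tensor per layer). The model dimensions and the helper names `kv_cache_bytes` and `weight_bytes_per_sequence` are hypothetical, chosen only for illustration; they are not taken from the source.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len,
                   batch_size, bytes_per_value=2):
    """Estimate total KV-cache size in bytes for a batch of sequences.

    The leading factor of 2 counts one key tensor and one value tensor
    per layer. bytes_per_value=2 assumes 16-bit storage.
    """
    return (2 * n_layers * n_kv_heads * head_dim
            * context_len * batch_size * bytes_per_value)


def weight_bytes_per_sequence(n_params, batch_size, bytes_per_value=2):
    """Weight memory attributed to each sequence in a batch.

    Weights are shared by every sequence, so the per-sequence share
    shrinks as the batch grows.
    """
    return n_params * bytes_per_value / batch_size


# Hypothetical model dimensions (assumptions, not from the source).
N_LAYERS = 80
N_KV_HEADS = 8
HEAD_DIM = 128
N_PARAMS = 70e9
CONTEXT_LEN = 8192

if __name__ == "__main__":
    gib = 1024 ** 3
    for batch in (1, 8, 64):
        kv_total = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM,
                                  CONTEXT_LEN, batch)
        kv_per_seq = kv_total / batch
        w_per_seq = weight_bytes_per_sequence(N_PARAMS, batch)
        print(f"batch={batch:3d}  "
              f"weights/seq={w_per_seq / gib:8.2f} GiB  "
              f"kv/seq={kv_per_seq / gib:6.2f} GiB  "
              f"kv total={kv_total / gib:8.2f} GiB")
```

Under these assumptions the per-sequence share of the weights falls as the batch grows, while the per-sequence KV cache stays fixed and scales linearly with context length, so the total KV cache grows with both batch size and context.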