The Dwarkesh Reference

Mental model

Almost all LLM inference economics (latency floors, cost minimums, batch optima, context-length pricing) follow from two hardware numbers (memory bandwidth, peak FLOP/s) and two model numbers (total params, KV bytes per token).
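
As a rough illustration of how those four numbers combine, here is a minimal roofline sketch for one decode step. It is not taken from the source. The function names, the dense-model rule of thumb of about 2 FLOPs per parameter per generated token, and every numeric value are assumptions chosen as round figures, not measurements of any real chip or model.

```python
def decode_step_seconds(batch, context_len, mem_bw, flops, n_params,
                        bytes_per_param, kv_bytes_per_token):
    """Roofline estimate for one decode step: the slower of memory and compute."""
    # Bytes streamed from memory: all weights once, plus each sequence's KV cache.
    bytes_read = (n_params * bytes_per_param
                  + batch * context_len * kv_bytes_per_token)
    t_memory = bytes_read / mem_bw
    # Assumption: ~2 FLOPs per parameter per generated token (dense model).
    t_compute = 2 * n_params * batch / flops
    return max(t_memory, t_compute)


def critical_batch(mem_bw, flops, bytes_per_param):
    """Batch size where weight reads and compute take equal time (KV ignored)."""
    return flops * bytes_per_param / (2 * mem_bw)


# Hypothetical round numbers, not a real accelerator or model.
HW = dict(mem_bw=1e12, flops=1e15)            # bytes/s, FLOP/s
MODEL = dict(n_params=1e10, bytes_per_param=1, kv_bytes_per_token=1e5)

# Latency floor: a single sequence with empty context still reads every weight.
latency_floor = decode_step_seconds(batch=1, context_len=0, **HW, **MODEL)

# Batch optimum: below this batch size the step is bandwidth-bound.
b_crit = critical_batch(HW["mem_bw"], HW["flops"], MODEL["bytes_per_param"])

# Context-length cost: KV reads grow with batch * context and can dominate.
long_context = decode_step_seconds(batch=500, context_len=1000, **HW, **MODEL)

print(latency_floor, b_crit, long_context)
```

With these made-up inputs the latency floor is set entirely by reading the weights, the break-even batch depends only on the hardware ratio and the bytes per parameter, and at 1,000 tokens of context the KV cache reads outweigh the weight reads.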

Who: Reiner Pope
Topic: Roofline analysis
Source: Reiner Pope — The math behind how LLMs are trained and served (00:00:00)