The Dwarkesh Reference
Mental model

Almost all of LLM inference economics (latency floors, cost minimums, batch optima, context-length pricing) follows from two hardware numbers (memory bandwidth, FLOPs) and two model numbers (total params, KV bytes per token).
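To make the four numbers concrete, here is a minimal roofline sketch for a single decode step. It is an illustration under stated assumptions, not a method taken from the source: it assumes a dense model (every parameter is read and used once per step), roughly 2 FLOPs per parameter per generated token, 2 bytes per parameter, and perfect overlap of memory traffic and compute. The names `Hardware`, `Model`, `decode_step_time`, `latency_floor`, and `critical_batch` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hardware:
    mem_bandwidth: float  # bytes per second
    flops: float          # peak FLOPs per second


@dataclass
class Model:
    total_params: float
    kv_bytes_per_token: float
    bytes_per_param: float = 2.0  # assumption: 16-bit weights


def decode_step_time(hw: Hardware, model: Model, batch: int, context_len: int) -> float:
    """Roofline estimate (seconds) for one decode step over the whole batch."""
    # Weights are read once per step and shared by every sequence in the batch.
    weight_bytes = model.total_params * model.bytes_per_param
    # Each sequence reads its own KV cache, so this term grows with batch and context.
    kv_bytes = batch * context_len * model.kv_bytes_per_token
    memory_time = (weight_bytes + kv_bytes) / hw.mem_bandwidth
    # Assumption: about 2 FLOPs per parameter per generated token (dense model).
    compute_time = 2.0 * model.total_params * batch / hw.flops
    # Assumption: memory traffic and compute overlap, so the slower one sets the time.
    return max(memory_time, compute_time)


def latency_floor(hw: Hardware, model: Model) -> float:
    """Lower bound on per-token latency: time to stream the weights once."""
    return model.total_params * model.bytes_per_param / hw.mem_bandwidth


def critical_batch(hw: Hardware, model: Model) -> float:
    """Batch size where weight-read time equals compute time (KV traffic ignored)."""
    return model.bytes_per_param * hw.flops / (2.0 * hw.mem_bandwidth)


if __name__ == "__main__":
    # Hypothetical round numbers, not measurements of any real chip or model.
    hw = Hardware(mem_bandwidth=1e12, flops=1e15)
    model = Model(total_params=1e10, kv_bytes_per_token=1e5)
    print("latency floor (s):", latency_floor(hw, model))
    print("critical batch:", critical_batch(hw, model))
    print("step time at batch 64, 8k context (s):",
          decode_step_time(hw, model, batch=64, context_len=8192))
```

Under these assumptions, below the critical batch the step is memory-bound and adding sequences is nearly free; above it, step time grows linearly with batch. The KV term is the only place context length enters, which is why longer contexts cost more per token.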

Who
Reiner Pope
Topic
Roofline analysis