Mental model
Almost all LLM inference economics (latency floors, cost minimums, batch optima, context-length pricing) follows from two hardware numbers (memory bandwidth, FLOPs) and two model numbers (total params, KV bytes per token).
- Who: Reiner Pope
- Topic: Roofline analysis
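
A minimal sketch of how those four numbers could combine in a roofline estimate for a single decode step. Everything here is an illustrative assumption rather than the speaker's own formulation: the class and function names are hypothetical, the model is assumed dense (so every parameter is both read from memory and used in compute), compute is approximated as 2 FLOPs per parameter per token, and `bytes_per_param` is an extra knob added to convert params to bytes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hardware:
    mem_bandwidth: float  # bytes per second
    peak_flops: float     # FLOPs per second


@dataclass(frozen=True)
class Model:
    total_params: float        # parameter count
    bytes_per_param: float     # assumption: e.g. 1.0 for 8-bit weights
    kv_bytes_per_token: float  # KV cache bytes stored per token of context


def decode_step_time(hw: Hardware, model: Model, batch: int, context_len: int) -> float:
    """Roofline estimate: a step takes as long as the slower of memory and compute."""
    weight_bytes = model.total_params * model.bytes_per_param
    kv_bytes = batch * context_len * model.kv_bytes_per_token
    t_memory = (weight_bytes + kv_bytes) / hw.mem_bandwidth
    # Assumption: roughly 2 FLOPs (multiply + add) per parameter per token.
    t_compute = 2 * model.total_params * batch / hw.peak_flops
    return max(t_memory, t_compute)


def latency_floor(hw: Hardware, model: Model) -> float:
    """Time to stream the weights once; no batch size gets a step below this."""
    return model.total_params * model.bytes_per_param / hw.mem_bandwidth


def critical_batch(hw: Hardware, model: Model) -> float:
    """Batch size where weight-loading time equals compute time (KV cache ignored)."""
    return hw.peak_flops * model.bytes_per_param / (2 * hw.mem_bandwidth)


def time_per_token(hw: Hardware, model: Model, batch: int, context_len: int) -> float:
    """Step time amortized over the batch; a proxy for cost per generated token."""
    return decode_step_time(hw, model, batch, context_len) / batch


# Hypothetical round numbers, not any real chip or model.
hw = Hardware(mem_bandwidth=1e12, peak_flops=1e15)
model = Model(total_params=1e11, bytes_per_param=1.0, kv_bytes_per_token=1e5)
```

Under these assumptions, small batches are memory-bound and pay the full weight-streaming time per step, per-token time falls as the batch grows toward `critical_batch`, and longer contexts raise the memory term through the KV cache, which is one way to read context-length pricing.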