The Dwarkesh Reference
Prediction · Pending

Optimally trained models will serve roughly as many inference tokens as they saw in pre-training, implying current frontier models are ~100x over-trained relative to Chinchilla-optimal.
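The arithmetic behind "~100x over-trained" can be sketched with two common rules of thumb. The first is the Chinchilla compute-optimal ratio of roughly 20 training tokens per parameter. The second is the approximation of ~6·N·D FLOPs to train a dense transformer and ~2·N FLOPs per token to run inference. The 75B-parameter / 150T-token model below is hypothetical, chosen only so that the ratio works out to 100x. It is not a claim about any real model, and this sketch is not Reiner Pope's own derivation.

```python
# Rule-of-thumb constants (approximations, not exact values).
CHINCHILLA_TOKENS_PER_PARAM = 20  # approx. compute-optimal tokens per parameter
TRAIN_FLOPS_PER_PARAM_TOKEN = 6   # forward + backward pass
INFER_FLOPS_PER_PARAM_TOKEN = 2   # forward pass only


def overtraining_factor(params: float, train_tokens: float) -> float:
    """How many times the ~20:1 Chinchilla tokens-per-parameter ratio this model used."""
    return (train_tokens / params) / CHINCHILLA_TOKENS_PER_PARAM


def flops(params: float, train_tokens: float, inference_tokens: float) -> tuple[float, float]:
    """Approximate (training, inference) FLOPs for a dense transformer."""
    train = TRAIN_FLOPS_PER_PARAM_TOKEN * params * train_tokens
    infer = INFER_FLOPS_PER_PARAM_TOKEN * params * inference_tokens
    return train, infer


if __name__ == "__main__":
    # Hypothetical model: 75B parameters pre-trained on 150T tokens (illustrative only).
    n, d = 75e9, 150e12
    print(f"over-training factor: {overtraining_factor(n, d):.0f}x")  # 2000 tokens/param / 20 = 100x
    # Serve as many inference tokens as the model saw in pre-training.
    train, infer = flops(n, d, d)
    print(f"inference / training FLOPs: {infer / train:.2f}")  # 2/6, i.e. about 0.33
```

Under these approximations, serving as many tokens as were seen in pre-training costs about a third of the training compute, because each inference token needs only a forward pass.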

Who
Reiner Pope
Topic
Over-training
How it gets scored
Does a credible analysis confirm that a frontier model pre-trained on ~150T tokens serves at least 10T inference tokens before it is deprecated?
Resolves
2029-05-22