Prediction: Pending

Optimally trained models will serve roughly as many inference tokens as they saw in pre-training, implying current frontier models are ~100x over-trained relative to Chinchilla-optimal.

Who: Reiner Pope
Topic: Over-training
How it gets scored: Does a credible analysis confirm a ~150T-token frontier model serves at least 10T inference tokens before deprecation?
Resolves: 2029-05-22
Source: Reiner Pope — The math behind how LLMs are trained and served (01:18:59)
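
The arithmetic behind the claim and the scoring threshold can be sketched as follows. This is a rough illustration, not the speaker's own calculation. It assumes the common rule of thumb that Chinchilla-optimal training uses about 20 tokens per parameter, and it reads the ~150T figure as pre-training tokens. The parameter count `hypothetical_params` is invented for illustration; it is simply the value that makes the over-training factor come out near 100x.

```python
# Rule-of-thumb approximation of the Chinchilla-optimal ratio (assumption).
CHINCHILLA_TOKENS_PER_PARAM = 20.0


def overtraining_factor(train_tokens: float, params: float) -> float:
    """How many times more tokens per parameter than the Chinchilla rule of thumb."""
    return (train_tokens / params) / CHINCHILLA_TOKENS_PER_PARAM


def inference_to_training_ratio(inference_tokens: float, train_tokens: float) -> float:
    """Lifetime inference tokens served, as a fraction of pre-training tokens."""
    return inference_tokens / train_tokens


train_tokens = 150e12        # ~150T tokens, from the scoring criterion
hypothetical_params = 75e9   # hypothetical model size, chosen only for illustration
threshold_tokens = 10e12     # 10T inference tokens, from the scoring criterion

factor = overtraining_factor(train_tokens, hypothetical_params)
ratio = inference_to_training_ratio(threshold_tokens, train_tokens)

print(f"over-training factor: {factor:.0f}x")
print(f"scoring threshold as a share of training tokens: {ratio:.1%}")
```

Under these assumptions the scoring threshold is a small fraction of the training token count, so it is a weaker condition than the parity between inference and training tokens that the claim describes.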