The Dwarkesh Reference
Claim (≈ Approx)

Using ~100B active params and ~150T pre-training tokens, frontier models look ~100x over-trained versus Chinchilla-optimal.

Who
Reiner Pope
Topic
Over-training math
Verification note
The 150T figure was an uncited rumor offered by the host; the Chinchilla math is standard. The conclusion is internally consistent but depends on unverified inputs.
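
The arithmetic behind the claim can be sketched as follows. This is a rough check, not the speaker's own calculation: it assumes the commonly quoted Chinchilla rule of thumb of about 20 training tokens per parameter, which the card itself does not state, and it applies that rule to active parameters, which is itself an approximation. The function name and constant are illustrative.

```python
# Assumption (not stated in the source): Chinchilla-optimal training uses
# roughly 20 tokens per parameter.
CHINCHILLA_TOKENS_PER_PARAM = 20.0


def overtraining_factor(active_params: float,
                        train_tokens: float,
                        tokens_per_param: float = CHINCHILLA_TOKENS_PER_PARAM) -> float:
    """Return how many times the training tokens exceed the Chinchilla-optimal count."""
    optimal_tokens = active_params * tokens_per_param
    return train_tokens / optimal_tokens


# Inputs from the claim: ~100B active parameters, ~150T pre-training tokens.
# The 150T figure is the uncited rumor noted above.
factor = overtraining_factor(100e9, 150e12)
print(f"Chinchilla-optimal tokens: {100e9 * CHINCHILLA_TOKENS_PER_PARAM:.1e}")
print(f"Over-training factor: {factor:.0f}x")
```

Under these assumptions the Chinchilla-optimal budget is about 2T tokens and the factor comes out near 75x, the same order of magnitude as the ~100x in the claim.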