The Dwarkesh Reference

Prediction Tracker

Every forward-looking, falsifiable claim made on the show, with a resolution date. As each date passes, predictions get scored. This is the thing a chat-first archive structurally cannot do.
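
The scoring workflow can be sketched as a small data structure plus a due-date check. This is a minimal sketch, not the tracker's actual implementation: the `Prediction` fields, the `ANNUAL` marker and `due_for_scoring` are hypothetical names. Treating "resolves annually" claims as due on every run is also an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Union

# Hypothetical marker for claims that are re-scored every year.
ANNUAL = "annually"


@dataclass(frozen=True)
class Prediction:
    speaker: str
    topic: str
    status: str  # "Pending" or "Unscoreable", mirroring the card badges
    resolves: Optional[Union[date, str]]  # a date, ANNUAL, or None if unscoreable


def due_for_scoring(predictions: list[Prediction], today: date) -> list[Prediction]:
    """Return pending predictions that can be scored as of `today`."""
    due = []
    for p in predictions:
        if p.status != "Pending" or p.resolves is None:
            continue  # unscoreable claims never come due
        if p.resolves == ANNUAL:
            due.append(p)  # assumption: annual claims are scored on every run
        elif isinstance(p.resolves, date) and p.resolves <= today:
            due.append(p)
    return due


cards = [
    Prediction("Jensen Huang", "Nvidia roadmap", "Pending", date(2028, 12, 31)),
    Prediction("Jensen Huang", "Cost curve", "Pending", ANNUAL),
    Prediction("Jensen Huang", "AI security", "Unscoreable", None),
    Prediction("Reiner Pope", "Scale-up domain", "Pending", date(2027, 12, 31)),
]
due = due_for_scoring(cards, today=date(2028, 1, 1))
print([p.topic for p in due])  # ['Cost curve', 'Scale-up domain']
```

Keeping the resolution field as either a date or a marker lets fixed-date and recurring claims share one list, while unscoreable cards stay visible but never enter the scoring queue.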

Pending · Jensen Huang · Nvidia roadmap · resolves 2028-12-31

Vera Rubin ships this year, Vera Rubin Ultra next year, Feynman the year after, a new architecture every single year.

Scored on: Are Vera Rubin / Vera Rubin Ultra / Feynman generally available in 2026 / 2027 / 2028?

Permalink →Jensen Huang, 00:41:06

Pending · Jensen Huang · Supply chain · resolves 2029-04-15

No supply-chain bottleneck (CoWoS, logic, memory, EUV) lasts longer than two or three years.

Scored on: Is any single component still the binding constraint on Nvidia output in 2029?

Permalink →Jensen Huang, 00:00:00

Pending · Jensen Huang · Cost curve · resolves annually

Token cost decreases by roughly an order of magnitude every single year.

Scored on: Does Nvidia's cost per token fall ~10x year over year?

Permalink →Jensen Huang, 00:41:06
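
Scoring "~10x year over year" requires deciding how close counts. A minimal sketch, assuming one cost-per-token measurement per year and a tolerance of half an order of magnitude on a log scale. Both choices, and the sample figures, are hypothetical rather than the tracker's stated rule.

```python
import math


def yoy_decline_factors(costs: list[float]) -> list[float]:
    """Ratio of each year's cost per token to the next year's (10.0 means a 10x drop)."""
    return [prev / cur for prev, cur in zip(costs, costs[1:])]


def roughly_10x(factor: float, tolerance_decades: float = 0.5) -> bool:
    """True if `factor` is within `tolerance_decades` orders of magnitude of 10x.

    With the default tolerance this accepts yearly drops between about 3.2x and 31.6x.
    """
    return abs(math.log10(factor) - 1.0) <= tolerance_decades


# Hypothetical yearly cost-per-token figures in arbitrary units (not real Nvidia data).
costs = [10_000, 1_000, 200, 25]
factors = yoy_decline_factors(costs)
print(factors)                               # [10.0, 5.0, 8.0]
print(all(roughly_10x(f) for f in factors))  # True
print(math.prod(factors))                    # 400.0: cumulative drop over three years
```

Measuring on a log scale treats a 5x and a 20x year as equally far from the target, which matches how an "order of magnitude" claim is usually read.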

Pending · Jensen Huang · Competition · resolves 2027-04-15

TPU and Trainium growth is '100% Anthropic', a unique instance, not a trend.

Scored on: Does material non-Anthropic custom-ASIC demand emerge by 2027?

Permalink →Jensen Huang, 00:16:25

Pending · Jensen Huang · Manufacturing · resolves 2028-04-15

Logic and EUV capacity can be scaled 2x per year, and doing so is easy within two or three years once there is a demand signal.

Scored on: Does TSMC roughly double AI-logic output year over year through 2028?

Permalink →Jensen Huang, 00:00:00

Unscoreable · Jensen Huang · AI security

The agentic-security future (one capable AI agent surrounded by thousands of agents keeping it safe) is surely going to happen.

Scored on: No crisp criterion or date; flagged as low-falsifiability.

Permalink →Jensen Huang, 00:57:36

Pending · Reiner Pope · Sparse attention · resolves 2029-05-22

Sparse attention will become a more widely adopted architecture at frontier labs, with DeepSeek's published mechanism pointing the direction.

Scored on: Do at least two top-5 frontier providers ship a production model with sparse attention as the primary mechanism by 2029?

Permalink →Reiner Pope, 00:00:00

Pending · Reiner Pope · Scale-up domain · resolves 2027-12-31

Nvidia's Rubin generation will ship with a scale-up domain of ~500+ GPUs, roughly 4x Blackwell's 72, unlocking larger MoE models in one interconnect domain.

Scored on: Do Rubin NVL racks ship with a scale-up domain of at least 400 GPUs by end of 2027?

Permalink →Reiner Pope, 00:32:09

Pending · Reiner Pope · Context length · resolves 2028-05-22

Frontier context lengths will stay roughly in the 100-200K range because memory bandwidth is the hard wall and HBM is not improving fast enough.

Scored on: As of 2028, does every top-5 frontier model still cap its standard-pricing context window at 500K tokens or less?

Permalink →Reiner Pope, 01:33:02
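
The bandwidth argument can be made concrete: during decoding, each new token attends over the whole context, so the entire KV cache is read from memory once per step. A back-of-the-envelope sketch, assuming a hypothetical 70B-class model with grouped-query attention (80 layers, 8 KV heads, head dimension 128, 16-bit values) and about 3.35 TB/s of HBM bandwidth (H100 SXM-class). It ignores weight reads, batching and any sparse-attention savings.

```python
def kv_cache_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                             bytes_per_value: int = 2) -> int:
    # Factor of 2: one key and one value vector per KV head, per layer.
    return 2 * layers * kv_heads * head_dim * bytes_per_value


def kv_read_seconds(context_tokens: int, bytes_per_token: int,
                    hbm_bytes_per_s: float) -> float:
    # Dense attention reads the full cache once for every decoded token.
    return context_tokens * bytes_per_token / hbm_bytes_per_s


# Hypothetical model shape and bandwidth; see the assumptions above.
per_token = kv_cache_bytes_per_token(layers=80, kv_heads=8, head_dim=128)
HBM_BW = 3.35e12  # bytes per second

for ctx in (200_000, 1_000_000):
    gb = ctx * per_token / 1e9
    ms = kv_read_seconds(ctx, per_token, HBM_BW) * 1e3
    print(f"{ctx:>9,} tokens: {gb:6.1f} GB of KV cache, {ms:5.1f} ms per decode step")
```

Per-step read time grows linearly with context, so a 5x longer window costs 5x more memory traffic per generated token for each sequence. At 1M tokens the cache alone exceeds a single device's HBM and must be sharded; sharding divides the read time across devices but does not reduce the total bytes moved.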

Pending · Reiner Pope · Over-training · resolves 2029-05-22

Optimally trained models will serve roughly as many inference tokens as they saw in pre-training, implying current frontier models are ~100x over-trained relative to Chinchilla-optimal.

Scored on: Does a credible analysis confirm a ~150T-token frontier model serves at least 10T inference tokens before deprecation?

Permalink →Reiner Pope, 01:18:59
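
The "~100x" figure is a ratio against the Chinchilla rule of thumb of about 20 training tokens per parameter. A minimal sketch: the 75B parameter count is hypothetical, picked only to show how a ~150T-token run maps to ~100x. The ~6N training FLOPs and ~2N serving FLOPs per token are the standard dense-transformer approximations, not figures from the episode.

```python
# Chinchilla rule of thumb: compute-optimal training uses roughly 20 tokens
# per parameter (Hoffmann et al., 2022).
CHINCHILLA_TOKENS_PER_PARAM = 20


def overtraining_factor(params: float, train_tokens: float) -> float:
    """How many times more tokens than Chinchilla-optimal a model was trained on."""
    return train_tokens / (CHINCHILLA_TOKENS_PER_PARAM * params)


def train_vs_serve_flops(params: float, train_tokens: float,
                         served_tokens: float) -> tuple[float, float]:
    """Approximate FLOPs using ~6N per training token and ~2N per served token."""
    return 6 * params * train_tokens, 2 * params * served_tokens


# Hypothetical: a 75B-parameter model trained on the ~150T tokens named above.
N, D = 75e9, 150e12
print(overtraining_factor(N, D))  # 100.0
train, serve = train_vs_serve_flops(N, D, served_tokens=D)
print(f"training {train:.2e} FLOPs, serving {serve:.2e} FLOPs")
```

Under these approximations, serving as many tokens as the model was trained on costs about a third of the training compute, which is why inference volume can justify training a smaller model far past the Chinchilla point.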

Pending · Reiner Pope · KV cache storage · resolves 2028-05-22

The one-hour KV-cache pricing tier on frontier APIs likely corresponds to spinning disk; the drain-time math points to it.

Scored on: Does a frontier lab confirm or credibly leak that long-duration KV-cache persistence uses spinning disk by 2028?

Permalink →Reiner Pope, 01:33:02

Pending · Eric Jang · Forward search in LLMs · resolves 2027-12-31

Forward search and simulation to estimate value will make a comeback in LLMs, even if not in AlphaGo's exact MCTS form.

Scored on: Does a widely adopted LLM paradigm incorporate explicit multi-step forward tree search (beyond chain-of-thought) and get credited as a breakthrough by end of 2027?

Permalink →Eric Jang, 01:45:47
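
"Forward search and simulation to estimate value" means expanding possible moves with a model of the environment and backing up the best outcome, rather than trusting a single-step guess. A minimal deterministic depth-limited lookahead is sketched below. It is not MCTS and not any lab's method; the toy number-line environment and all function names are hypothetical.

```python
from typing import Callable, Hashable, Iterable

State = Hashable
Action = Hashable


def lookahead_value(state: State, depth: int,
                    actions: Callable[[State], Iterable[Action]],
                    step: Callable[[State, Action], tuple[State, float]],
                    value_fn: Callable[[State], float]) -> float:
    """Best total reward reachable within `depth` simulated steps, using
    `value_fn` at the search horizon or at terminal states."""
    legal = list(actions(state))
    if depth == 0 or not legal:
        return value_fn(state)
    best = float("-inf")
    for a in legal:
        nxt, reward = step(state, a)  # simulate instead of acting for real
        best = max(best, reward + lookahead_value(nxt, depth - 1, actions, step, value_fn))
    return best


def choose_action(state, depth, actions, step, value_fn):
    """Pick the action whose simulated subtree has the highest backed-up value."""
    def q(a):
        nxt, reward = step(state, a)
        return reward + lookahead_value(nxt, depth - 1, actions, step, value_fn)
    return max(actions(state), key=q)


# Toy environment: walk a number line; reaching GOAL pays 1.0 and ends the episode.
GOAL = 3


def toy_actions(s: int) -> list[int]:
    return [] if s == GOAL else [-1, +1]


def toy_step(s: int, a: int) -> tuple[int, float]:
    return s + a, (1.0 if s + a == GOAL else 0.0)


def no_prior(s: int) -> float:
    return 0.0  # a learned value network would stand in here


print(lookahead_value(0, 2, toy_actions, toy_step, no_prior))  # 0.0: goal out of reach
print(lookahead_value(0, 3, toy_actions, toy_step, no_prior))  # 1.0
print(choose_action(0, 3, toy_actions, toy_step, no_prior))    # 1
```

With `no_prior` the agent only sees reward that the search actually reaches; a better leaf value function lets a shallower search make the same choice, which is the trade-off AlphaGo-style methods exploit.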

Pending · Eric Jang · Automated research · resolves 2027-12-31

Today's models can't reliably pick the next experiment or do lateral 'return to first principles' thinking; successor models may close this gap.

Scored on: Does a benchmarked agent autonomously pivot away from a dead-end research track without human prompting, confirmed in a peer-reviewed eval by end of 2027?

Permalink →Eric Jang, 02:22:16

Pending · Eric Jang · Games as training loop · resolves 2028-12-31

A verifiable game like Go could be the outer-loop environment for training automated AI researchers, with skills transferring to harder domains.

Scored on: Does a published agent improve a measurable AI metric through self-directed experiment selection using a game as the verification loop by end of 2028?

Permalink →Eric Jang, 02:22:16

Pending · Eric Jang · Algorithmic multipliers · resolves 2027-12-31

Many of KataGo's algorithmic compute multipliers will become irrelevant as GPUs improve; any given multiplier's benefit is transitory.

Scored on: Does a replication show at least three of KataGo's tricks are redundant on Blackwell-class hardware?

Permalink →Eric Jang, 02:22:16

Pending · Dwarkesh Patel · RL efficiency · resolves 2027-12-31

As RL tasks get longer-horizon, samples-per-FLOP fall, making naive policy-gradient RL increasingly inefficient, a structural problem for agentic training.

Scored on: Do studies confirm declining bits-per-FLOP for policy-gradient RL as task horizon grows, by end of 2027?

Permalink →Eric Jang, 02:12:02
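
The structural argument can be written as a bound. A sketch, assuming the only learning signal per episode is a binary pass/fail reward (at most 1 bit of information) and that generating an episode of H tokens with an N-parameter model costs about 2N FLOPs per token. That counts the forward pass only; backward passes and any value model would push the figure lower still. The 100B parameter count is hypothetical.

```python
def max_bits_per_flop(params: float, horizon_tokens: float,
                      reward_bits: float = 1.0,
                      flops_per_token_per_param: float = 2.0) -> float:
    """Upper bound on reward information per FLOP of rollout compute.

    A binary outcome reward carries at most 1 bit per episode, while the
    episode costs roughly 2 * params * horizon_tokens FLOPs to generate.
    """
    flops_per_episode = flops_per_token_per_param * params * horizon_tokens
    return reward_bits / flops_per_episode


# Hypothetical 100B-parameter policy; horizons from a short answer to a long agentic task.
for horizon in (1_000, 100_000, 10_000_000):
    print(f"H = {horizon:>10,} tokens: <= {max_bits_per_flop(1e11, horizon):.1e} bits/FLOP")
```

The bound falls as 1/H: every 10x increase in horizon cuts the information per FLOP by 10x, which is the sense in which naive policy-gradient RL gets less efficient on longer tasks.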

Pending · Michael Nielsen · Deep principles · resolves 2075-01-01

We'll keep finding very fundamental new principles, analogous to Church-Turing or Noether's theorem, rather than exhausting the supply of deep ideas.

Scored on: If no principle of comparable depth is articulated within ~50 years, that weighs against it.

Permalink →Michael Nielsen, 01:15:26

Pending · Michael Nielsen · Quantum computing · resolves 2060-01-01

Quantum computers may handle a strictly larger class of interesting computations, and a quantum AGI would be qualitatively different from a classical one.

Scored on: A proof that BQP = BPP would falsify the first part; broad practical quantum advantage would support it.

Permalink →Michael Nielsen, 01:15:26

Pending · Michael Nielsen · AI for science · resolves 2035-01-01

AI will help with data-intensive, well-specified problems (like protein structure) but won't automatically resolve the deeper bottlenecks needing long hostile verification loops or paradigm shifts.

Scored on: If AI produces multiple Nobel-class genuine paradigm shifts (not pattern-matching) within 10 years, the claim weakens.

Permalink →Michael Nielsen, 00:29:52

Unscoreable · Michael Nielsen · Tech tree

The science-and-technology tree is so large that different civilizations explore radically different branches, implying large gains from trade between them indefinitely.

Scored on: Requires observing multiple advanced civilizations; not empirically testable now.

Permalink →Michael Nielsen, 00:50:54

Pending · David Reich · Selection signals · resolves annually

As ancient-DNA samples grow beyond the current ~16,000 individuals, many more positions under selection will be detected; today's findings are only what's visible at this scale.

Scored on: Do larger ancient-DNA datasets reveal substantially more selected positions in follow-up studies?

Permalink →David Reich, 00:00:00
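
Why sample size matters can be seen from a simple detection threshold. A rough sketch, not the method used in the ancient-DNA selection scans themselves: it compares allele frequencies between two equally sized groups of diploid individuals with a two-sided z-test at a genome-wide-style threshold of 5e-8 (an assumed convention), and returns the frequency shift that would just reach significance. The group sizes are hypothetical.

```python
import math
from statistics import NormalDist


def min_detectable_shift(p: float, n_per_group: int, alpha: float = 5e-8) -> float:
    """Allele-frequency difference that just reaches significance `alpha`
    in a two-sided, two-group z-test.

    Each group of n diploid individuals contributes 2n chromosomes, so each
    frequency estimate has variance p(1-p)/(2n); the difference of two
    independent estimates has twice that variance.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se_diff = math.sqrt(2 * p * (1 - p) / (2 * n_per_group))
    return z * se_diff


for n in (1_000, 4_000, 8_000):
    print(f"n = {n:>6,} per group: shift >= {min_detectable_shift(0.5, n):.3f}")
```

The threshold shrinks as 1/sqrt(n): quadrupling the sample halves the smallest detectable shift, so weaker selection signals that are invisible at today's scale cross the line as datasets grow.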

Pending · David Reich · Neanderthal model · resolves 2031-05-08

The model in which Neanderthals are genetically swamped modern humans sharing a ~300,000-year-old Middle Stone Age origin will prove more parsimonious than the current sister-lineage consensus.

Scored on: When modern-human population substructure is modeled together with archaic ancient-DNA data, do the inferred timings align and displace the sister-lineage model?

Permalink →David Reich, 01:17:13

Pending · David Reich · Regional comparison · resolves annually

Applying the same methodology beyond Europe and the Middle East will reveal comparable or stronger selection signals, making cross-region comparison a major near-term frontier.

Scored on: Do studies in other world regions surface selection signals of similar strength?

Permalink →David Reich, 01:54:10