Deep|Mid-Training Revolution: Why Every AI Lab is Making 3x Bets
AI Arms Race, Selfplay, RL, Storage, Data Infra
We previously discussed Mid-Training in our GOOG report and recent Pre-ER Playbook. Mid-Training represents THE most critical development in LLM technology evolution over the past three months, fundamentally reshaping Labs’ CAPEX attitudes, model evolution trajectories, and the entire storage industry supply chain.
Yet surprisingly few are seriously discussing Mid-Training. After extensive conversations with frontline researchers, we’ve compiled this brief to clarify what Mid-Training is and why it’s accelerating the current AI arms race.
This report is crucial. After reading, you’ll likely reconsider what’s really driving the current AI narrative.
What is Mid-Training?
We briefly introduced Mid-Training in our Pre-ER Playbook.
First, let’s define Pre-Training and Post-Training:
Pre-Training
High compute requirements, relatively lower data quality needs
10x data, 10x compute, 10x parameters
Post-Training
Extremely high data quality requirements, but incremental compute needs
Can generate data while training, take an iterative approach, and decompose the training flow into numerous experiments
Mid-Training
Falls between the two paradigms
Important note: Mid-Training has become standard nomenclature. All leading Labs now use the term to describe the new training paradigm, though Mid-Training groups may still sit within Reasoning, Pre-Training, or Post-Training teams
Incorporates Post-Training’s RL and synthetic data workflows; requires less data than Pre-Training and a lower quality threshold than current Post-Training
The boundary between Mid-Training and Post-Training is increasingly blurred. Narrowly defined, Mid-Training refers to using RL-generated data for continued training, but the term now typically encompasses RL itself
New Post-Training Definition
Higher quality data for more targeted model adjustments
Small-scale adjustments without compromising other model capabilities
E.g., adding more comments during coding tasks
E.g., eliminating pauses in responses to users so that progress stays visible
Under this new framework, Mid-Training is rapidly becoming the largest compute consumer, and the frontline researchers we spoke with are all discussing it enthusiastically.
Mid-Training and Data
Here’s how Mid-Training differs from Pre-Training in its approach to data:
Pre-Training data primarily comes from public sources at massive scale.
Mid-Training operates differently:
More human experts define problems, rules, and judge answer quality, creating more RL environments
More RL environments enable more PhDs to deploy more compute for RL
More RL generates more data, which leverages more compute for continued Mid-Training
Pre-Training runs on long cycles, often around two years. Despite extensive early-stage experimentation, it is difficult to estimate how much capability a given data and compute investment will buy until the final stages. Scaling Laws help with prediction, but forecasting becomes harder as models grow larger.
Mid-Training offers clear input-output relationships with visible marginal improvements from each short-term investment.
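From this perspective, Mid-Training’s compute demand grows roughly quadratically: the loop above compounds, since more RL compute yields more data, which in turn calls for more compute for continued training.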
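The quadratic claim is stated here without a derivation, so the following is only one possible reading, sketched as a toy model. It assumes that environments are added at a steady pace, that RL compute per round is proportional to the number of environments, that RL-generated data accumulates in a pool, and that each round of continued Mid-Training costs compute proportional to the size of that pool. Every name and constant below (`simulate_loop`, `envs_added_per_round`, and so on) is hypothetical and chosen for illustration, not taken from any Lab's practice.

```python
def simulate_loop(rounds, envs_start=10, envs_added_per_round=10,
                  rl_compute_per_env=1.0, data_per_rl_compute=1.0,
                  train_compute_per_data=1.0):
    """Toy model of the loop: environments -> RL -> data -> continued training.

    All parameters are illustrative assumptions, in arbitrary units.
    """
    envs = envs_start
    data_pool = 0.0  # cumulative RL-generated data
    history = []
    for r in range(1, rounds + 1):
        # Assumption: RL compute scales linearly with the number of environments.
        rl_compute = envs * rl_compute_per_env
        # Assumption: RL output is added to a data pool that is never discarded.
        data_pool += rl_compute * data_per_rl_compute
        # Assumption: each round of continued training consumes the whole pool.
        train_compute = data_pool * train_compute_per_data
        history.append({
            "round": r,
            "envs": envs,
            "rl_compute": rl_compute,
            "data_pool": data_pool,
            "train_compute": train_compute,
            "total_compute": rl_compute + train_compute,
        })
        # Human experts keep adding environments at a constant pace.
        envs += envs_added_per_round
    return history


if __name__ == "__main__":
    for row in simulate_loop(5):
        print(row["round"], row["envs"], row["train_compute"], row["total_compute"])
```

Under these assumptions the number of environments grows linearly while the continued-training compute per round grows with the running sum of that linear series, which is quadratic in the number of rounds. Relaxing any assumption (for example, discarding stale data each round) changes the shape of the curve, so the sketch illustrates the compounding argument rather than confirming it.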




