Deep|LLM 2026: From the Illusion of Model Development Stagnation to Large-Scale Real-World Agent Deployment
2026 Could Be Year One of AGI
By the end of 2025, market anxiety around an “AI bubble” had reached a peak. Model progress seemed to be hitting a wall, capex remained elevated, and skepticism grew louder: could demand realistically justify investment at this scale? Yet as 2026 began, a series of developments has quietly started to invalidate that narrative.
Claude Opus 4.5 launched in November 2025 and, together with the rapid adoption of Claude Code and co-work / agentic workflows, has materially increased compute demand for long-horizon reasoning and collaborative tasks. This shift is already showing up in both pricing and capital-market signals.
After a steady decline beginning in April 2025, H100 leasing price indices have turned upward in their first sustained rebound, while AI infrastructure equities such as CoreWeave, Nebius, and IREN have strengthened in tandem.
More notably, AWS quietly raised pricing for its machine-learning GPU capacity blocks by roughly 15% in early January 2026. The p5e.48xlarge instance (8× H200) increased from $34.61/hour to $39.80/hour—a move that challenges nearly two decades of consensus that cloud infrastructure pricing trends monotonically downward over time.
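As a quick check on the arithmetic, the sketch below recomputes the percentage increase and the implied per-GPU-hour price from the two hourly figures quoted above. The function names are illustrative, and the per-GPU figure simply divides the instance price by eight, which ignores the CPU, memory, and networking bundled into the instance.

```python
# Hourly prices and GPU count are taken from the paragraph above.
OLD_PRICE_PER_HOUR = 34.61  # USD, p5e.48xlarge before the change
NEW_PRICE_PER_HOUR = 39.80  # USD, p5e.48xlarge after the change
GPUS_PER_INSTANCE = 8       # 8x H200 per instance


def percent_increase(old: float, new: float) -> float:
    """Return the relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def per_gpu_hour(instance_price: float, gpus: int = GPUS_PER_INSTANCE) -> float:
    """Naive per-GPU price: instance price split evenly across its GPUs."""
    return instance_price / gpus


increase = percent_increase(OLD_PRICE_PER_HOUR, NEW_PRICE_PER_HOUR)
print(f"Instance price increase: {increase:.1f}%")
print(f"Per-GPU-hour before: {per_gpu_hour(OLD_PRICE_PER_HOUR):.3f} USD")
print(f"Per-GPU-hour after:  {per_gpu_hour(NEW_PRICE_PER_HOUR):.3f} USD")
```

The result lands at almost exactly 15%, consistent with the "roughly 15%" figure in the text.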
Taken together, these signals point to a deeper conclusion: the market was not witnessing a “bubble bursting,” but rather a transition from a model-centric speculative phase to an agent-driven phase of real, system-level demand. The AI narrative is not collapsing; it is being repriced around longer inference time horizons, higher usage intensity, and more structurally persistent infrastructure demand.
It is against this backdrop that we revisit a central question: has AI truly slowed down, or is the industry entering a new phase where value is realized not through isolated model breakthroughs, but through system-level execution? Our core conclusion is that 2025 was not stagnation—it was a pivotal paradigm shift. And 2026 may become the inflection year in which AI moves from “can think” to “can do,” and from model capability to scalable productivity. From the evolving division of labor between pretraining and mid-training + RL, to the emergence of long-horizon agents, multi-agent systems, and continuous learning, and to the systematic re-rating of compute, networking, and storage infrastructure, our goal is not to answer “which model is strongest,” but rather: which signals will determine whether 2026 becomes, in a meaningful sense, Year One of AGI—and how capital should be positioned ahead of that transition.
TL;DR
Why 2025 Feels Like Stagnation—but Sets Up the 2026 Inflection
The dominant misread of 2025 is that AI progress slowed. In reality, progress shifted from low-frequency, headline breakthroughs to high-frequency, system-level iteration. Individual model updates look incremental—often just 20–30%—but when compounded across agents, tools, memory, and feedback loops, they translate into 2–3× real capability gains over a year. Markets systematically underestimate compounding, yet industrial competition is built on exactly that dynamic.
The Third Inflection Point: From Thinking to Working
AI has entered its third inflection point: from chatting to reasoning to doing sustained work. Long-horizon agents can now execute multi-hour tasks, coordinate tools, self-verify outputs, and deliver reusable results. This is the first time AI resembles scalable digital labor rather than a smarter software tool—shifting the AGI debate from philosophical definitions to economic reality.
Bottlenecks Have Moved from FLOPS to Systems
As agents move into production, the constraint is no longer per-inference FLOPS. The real bottlenecks are system-level: long-context management, KV-cache persistence, concurrent sessions, tool state, reliability, and rollback. AI has entered a continuous-execution regime, where throughput, latency, cost, and state consistency determine economic viability.
Scaling Laws Still Hold—but the First Paradigm Is Exhausted
Pretraining still defines the upper bound of model capability, but its marginal returns are falling. Differentiation has moved decisively to mid-training and reinforcement learning, where compute is converted into verifiable, executable skills. By 2025, leading labs were allocating the majority of training compute to post-pretraining stages—signaling a structural shift in how intelligence is industrialized.
Lab Divergence Is About Orthogonal Bets, Not Intelligence Gaps
OpenAI, Google, and Anthropic are no longer converging on the same path. OpenAI emphasizes mid-training + RL and workflow integration; Google doubles down on pretraining system engineering and scale stability; Anthropic prioritizes coding and enterprise agents with strong efficiency and reliability. Leadership now rotates because know-how diffuses quickly and sustained gaps are hard to maintain.
Application Reality: 2026 Is When Models Start Overrunning Industries
The real divide is no longer “who uses AI,” but “who restructures around it.” Leading teams already achieve 5–10× effective output by orchestrating agents rather than executing tasks directly. Enterprises that formalize workflows, incentives, and verification loops will see structural productivity advantages—while laggards face compounding disadvantage.
Multimodality’s Shift: From Better Outputs to Usable Systems
Multimodal AI is evolving from single-shot generators into agentic systems with planning, execution, and self-correction loops. The competitive frontier is moving away from visual quality toward controllability, repeatability, and production readiness—unlocking real commercial adoption in video, advertising, gaming, and simulation.
The Next Paradigm: Long-Horizon, Multi-Agent, Continuous Learning
Long-horizon and multi-agent systems act as self-verification and self-play scaffolds, dramatically raising ceilings in verifiable tasks. The largest option value lies in continuous / online learning. If models can learn during deployment—even narrowly—AI shifts from episodic upgrades to perpetual improvement, fundamentally changing economics.
xAI and Meta: Compute as Strategy, Not Just Input
At the system level, xAI and Meta are not mere chasers. xAI treats compute as a permanently online intelligence substrate, tightly coupled with real-time data. Meta pairs massive infrastructure with elite talent to push utilization limits. The competitive axis has shifted from “best model” to “who can sustainably turn compute into compounding capability.”
China: Efficient Catch-Up, with Structural Advantages in Super Apps
Chinese models remain in catch-up mode on frontier capability, but excel in engineering efficiency and cost. Their real advantage lies in Super App ecosystems, where payments, fulfillment, and services are already integrated—allowing agents to move directly from conversation to execution. Large-scale agent monetization may emerge here first.
Investment Conclusion: Why Compute Is the Core—and $1.4T Isn’t Necessarily a Bubble
AI is transitioning from software to infrastructure. Even partial automation—~20% of IT services and knowledge work—supports multi-trillion-dollar value pools. Continuous learning turns compute from cyclical CapEx into always-on consumption. GPUs are only the entry point: networking, optics, DRAM, SSDs, and system software are being structurally re-rated. In this context, calling 2026 “Year One of AGI” is less hype than a statement about cash flows, capital intensity, and competitive moats.
Why 2025 “Feels Like Stagnation,” Yet Together with 2026 May Be the Most Pivotal Two Years in AI History
The most common misreading of 2025 is the perception that AI progress has slowed. The issue is not that models have stagnated, but that the mode of progress has fundamentally changed. In the past, the industry operated on a cadence of one major release every one to two years—each delivering a discontinuous leap that reset expectations. After 2025, however, model iteration shifted decisively toward higher-frequency, more engineering-driven incremental advances. Each update may appear to deliver only 20–30% improvement—sometimes even feeling like mere incremental polish—but when viewed over a full year, compounding effects can translate into a 2–3× capability jump.
This disconnect between perceived flatness and actual compounding capability gains is one of the core sources of the market’s misinterpretation of the so-called AI bubble. Humans are poor at intuitively grasping compounding, while industrial competition is built almost entirely on it.
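The compounding claim can be made concrete with a small sketch. The text gives the per-update gain (20–30%) and the annual result (2–3×) but not the release cadence, so the four updates per year used here is an assumption, and treating "capability" as a single multiplicative score is a simplification.

```python
def compounded_gain(per_update_gain: float, updates_per_year: int) -> float:
    """Total multiplicative gain after a number of equal-sized updates."""
    return (1.0 + per_update_gain) ** updates_per_year


# Assumption: roughly one meaningful update per quarter (not stated in the text).
UPDATES_PER_YEAR = 4

for gain in (0.20, 0.25, 0.30):
    total = compounded_gain(gain, UPDATES_PER_YEAR)
    print(f"{gain:.0%} per update x {UPDATES_PER_YEAR} updates -> {total:.2f}x over the year")
```

Under that assumed cadence, 20% per update compounds to a little over 2× and 30% to a little under 3×, which brackets the range cited above. A faster cadence would push the annual figure higher.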
Entering 2026, the new generation of Claude Code, built on Claude Opus 4.5, has delivered striking performance on long-horizon tasks. OpenAI’s API revenue has likewise accelerated rapidly. Sam Altman stated on X that OpenAI’s API ARR increased by USD 1 billion within a single month; based on our checks, Anthropic’s ARR growth rate is even higher. At this pace, Anthropic could exit 2026 with over $30B in ARR, well above expectations at the end of 2025. In the first month of 2026, we have already observed a further acceleration in revenue among leading AI model providers.
We believe AI has now entered its third major inflection point: from “can chat” (ChatGPT), to “can reason” (o1/o3-style reasoning models), and now to “can do work”—as exemplified by Claude Opus 4.5–enabled long-horizon agents and collaborative toolchains.
The first inflection point solved accessibility and distribution, enabling mass adoption of large models for the first time.
The second pushed the capability frontier toward stable multi-step reasoning, upgrading AI from a content generator to a problem-solving tool.
The essence of the third inflection point is that models are now entering real production systems in the form of agents—not merely answering a single prompt, but autonomously decomposing tasks, executing across tools, iterating over multiple cycles, and delivering executable, reusable outputs over hours-long time horizons. This marks the emergence of AI as a form of digital labor, rather than simply a more powerful chatbot.
At this stage, the key shift is a migration of system bottlenecks. As long-horizon coding agents and general-purpose collaborative agents (such as Claude Code and Claude CoWork) become productized, AI’s effective working radius expands from single-turn inference to extended engineering tasks and end-to-end business workflows. The primary constraint is no longer per-inference FLOPS, but the system-level capabilities required for sustained execution: concurrent session management, long-lived KV cache residency, context accumulation across multi-round reasoning, and the management of “external world state” introduced by tool invocation. In other words, the third inflection point is not a single capability breakthrough but a transition into a continuous-execution regime governed by systems economics, where throughput, cost, reliability, state consistency, and memory architecture become the core variables.
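To illustrate why KV cache residency and session concurrency become binding constraints, here is a minimal back-of-the-envelope sketch. The model shape in `ModelConfig` is hypothetical and does not describe any specific lab's model. The estimate assumes a standard transformer with grouped key/value heads and 16-bit cache entries, with no cache quantization, compression, or offloading.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical transformer shape, chosen only for illustration."""
    num_layers: int = 80
    num_kv_heads: int = 8     # grouped key/value heads (assumption)
    head_dim: int = 128
    bytes_per_value: int = 2  # 16-bit cache entries (assumption)


def kv_bytes_per_token(cfg: ModelConfig) -> int:
    """Bytes of KV cache stored per token: keys plus values across all layers."""
    return 2 * cfg.num_layers * cfg.num_kv_heads * cfg.head_dim * cfg.bytes_per_value


def kv_cache_gib(cfg: ModelConfig, context_tokens: int, concurrent_sessions: int) -> float:
    """Total resident KV cache in GiB for sessions that each hold a full context."""
    total_bytes = kv_bytes_per_token(cfg) * context_tokens * concurrent_sessions
    return total_bytes / 2**30


cfg = ModelConfig()
for sessions in (1, 16, 64):
    gib = kv_cache_gib(cfg, context_tokens=131_072, concurrent_sessions=sessions)
    print(f"{sessions:>3} sessions x 131,072 tokens -> {gib:,.0f} GiB of KV cache")
```

Under these assumptions a single long-context session already occupies tens of GiB, and memory scales linearly with both context length and concurrency. That is the sense in which long-running agents shift the bottleneck from raw FLOPS toward memory capacity, bandwidth, and cache management.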
In this context, the question “Has AGI arrived?” is better reframed as: Have we entered the era of scalable agent labor?
By strict definitions of general intelligence, models still fall short in cross-domain commonsense reasoning, long-term robustness, and goal alignment. However, from an economic perspective, a critical threshold has already been crossed. When models can operate continuously under explicit objectives and verifiable feedback, advance complex tasks from zero to usable outcomes, and be repeatedly deployed in organized, process-driven ways, their industrial impact already exhibits AGI-like characteristics. More importantly, long-horizon agents naturally generate high-value data assets during execution—interaction traces, error–correction pairs, verification signals, and tool-call logs—which will become core inputs for the next generation of training and alignment. From this perspective, agentization is not the endpoint; it is the bridge that shifts the data flywheel from static internet text to real-world workflows.
From a capability standpoint, 2025 has already revealed a clear boundary condition: in domains where problems are well-defined and outcomes are verifiable, current paradigms are approaching—or in some cases reaching—human ceilings. In competition mathematics, complex programming, deep research assistance, and certain forms of formal reasoning, model performance is now more constrained by training recipes and feedback signal quality than by raw “intelligence.” What truly limits models today are open-ended real-world tasks—those with ambiguous definitions, unstable reward signals, or inaccessible data. This is not a generic “lack of data” problem; rather, it implies that once a task can be converted into a verifiable environment, RL and self-verification paradigms will drive success rates upward with overwhelming force.
Based on our discussions with researchers and AI data companies, we believe leading labs have already identified scalable methods—via mid-training and reinforcement learning—to systematically distill human expertise from vertical industries into models. These labs are now advancing one vertical at a time, prioritized roughly by GDP contribution and occupational scale. Workforce displacement signals—such as Amazon’s ongoing layoffs—are therefore likely to persist throughout 2026.
From an investment perspective, while 2026 may not deliver another paradigm-level architectural breakthrough, the certainty of economically meaningful capability gains—especially those tied directly to value creation—is extremely high. This underpins our continued bullishness on model capability trajectories in 2026. In this sense, we broadly agree with Sequoia’s assessment: 2026 may well be the first year of AGI.
When translated into systematic opportunities in the public markets, the core conclusion of the third inflection point is that value is spilling over from “model capability” into the “inference system stack.”
On one hand, agentization materially extends inference time horizons, increases concurrency, and raises the frequency of verification and rollback, resulting in more deterministic and structurally durable demand for compute, storage, and networking.
On the other hand, “context economics” is becoming explicit: DRAM, SSDs, larger networking and compute clusters, higher-performance CPUs and virtual machines, and broader memory architectures are shifting from marginal cost items to primary determinants of throughput and per-token unit economics—particularly in long-horizon agents, multi-agent parallelism, and high-frequency tool-calling scenarios.
At the enterprise software layer, this shift may also trigger a new round of platform re-rating. Platforms that can securely embed agents into workflows—while providing permissioning, auditing, traceability, and rollback capabilities—are more likely to achieve higher stickiness and ARPU in the AI era.
Scaling Laws Have Not Failed—but the First-Generation Paradigm Has Reached Its Limits; Mid-Training and Reinforcement Learning Have Become the True Battleground
This leads to the most important repositioning of the research paradigm in 2025: scaling laws are not dead, and pretraining remains effective—but the first-generation dividend of scaling primarily through ever-larger pretraining is clearly approaching its endpoint. More precisely, pretraining still determines the upper bound of model capability and remains one of the most reliable and executable paths for improvement; however, it is increasingly insufficient for creating meaningful differentiation. That differentiation now comes from a second curve: mid-training and reinforcement learning (RL), which convert compute into verifiable capability gains.
There is a critical but often overlooked fact: by 2025, OpenAI’s compute allocation had already shifted materially. For most of the year, mid-training plus RL accounted for as much as 70–80% of total training compute. This is not a slogan-level “greater emphasis on RL,” but a pragmatic conclusion. Rather than extracting ever-thinner average returns from general pretraining, it is more effective to distill expert knowledge and synthetic data into executable capabilities within specific verticals, and then compose these capabilities horizontally to recover stronger generality. This reflects the core industrialized training methodology emerging after 2025: decomposing open-ended, uncontrollable intelligence problems into controllable, verifiable capability units, and then using RL and self-verification mechanisms to push each unit to depth and reliability.
This shift is not a strategic oscillation—it is a direct response to the true nature of high-value tasks.









