Six AI Revolutions in 2025: Andrej Karpathy's Guide to the Industry's Biggest Shifts
The artificial intelligence landscape underwent seismic changes throughout 2025, with transformations so sweeping that they reshaped how we think about machine learning, software development, and human-computer interaction. Andrej Karpathy, a prominent AI researcher and technologist, identified six major evolutionary shifts that have fundamentally altered the field. These aren’t incremental improvements; they represent breakthrough moments that challenged existing assumptions and opened entirely new possibilities.
The Emergence of Verifiable Reward Learning: Beyond Human Feedback
For years, the production training stack for large language models followed a predictable three-stage process: pre-training (exemplified by GPT-2 in 2019 and GPT-3 in 2020), supervised fine-tuning (InstructGPT in 2022), and reinforcement learning from human feedback (RLHF, also 2022). This recipe proved stable and mature, dominating the industry’s approach to building production-grade LLMs.
By 2025, a fundamental shift occurred. Reinforcement learning based on verifiable rewards (RLVR) became the core technology embraced by leading AI laboratories. The distinction is crucial: instead of relying on human judgment to score model outputs, RLVR leverages automatically verifiable environments—mathematical problem-solving, programming challenges, and similar domains where correctness can be objectively determined.
Models trained this way spontaneously develop what humans would recognize as “reasoning strategies.” They learn to break complex problems into intermediate computational steps and discover multiple solution pathways through iterative refinement. OpenAI’s o1 model (released in late 2024) offered the first glimpse of this capability, while the subsequent launch of o3 (early 2025) demonstrated the dramatic potential of this approach. The DeepSeek-R1 paper provided additional evidence of how these verifiable environments enable models to construct explicit reasoning chains.
What makes RLVR different from previous approaches is the computational intensity required. Unlike supervised fine-tuning and RLHF—which involve relatively brief, computationally modest phases—verifiable reward training demands extended optimization cycles against objective, deterministic reward functions. This means the computational resources originally allocated to pre-training are being redirected toward this new training paradigm. The key innovation: model capability can now be adjusted as a function of test-time computational cost by generating longer inference chains and providing more “thinking time.” This represents an entirely new dimension of scaling behavior.
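To make the contrast with RLHF concrete, here is a minimal sketch of what a verifiable reward can look like in practice, assuming a math-style task whose final answer can be checked exactly. The answer-line convention and function names are illustrative assumptions, not any lab’s actual training stack.

```python
import re

def verifiable_reward(model_output: str, expected_answer: str) -> float:
    """Score a rollout 1.0 or 0.0 by exact answer checking; no human judge."""
    # Assumed convention: the model ends its reasoning with a line "ANSWER: ...".
    match = re.search(r"ANSWER:\s*(.+)$", model_output.strip())
    if match is None:
        return 0.0  # malformed output earns no reward
    return 1.0 if match.group(1).strip() == expected_answer else 0.0

# During RLVR training, many sampled completions per prompt are scored this
# way, and the policy is pushed toward the higher-reward completions.
rollouts = ["...long reasoning...\nANSWER: 42", "...long reasoning...\nANSWER: 41"]
print([verifiable_reward(r, "42") for r in rollouts])  # [1.0, 0.0]
```

Because the reward is deterministic and cheap to evaluate, the same prompt can be replayed millions of times, which is exactly what makes the extended optimization cycles described above feasible.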
Understanding AI Intelligence: Ghostly Entities Rather Than Digital Creatures
In 2025, the industry gained fresh perspective on how artificial intelligence actually works. Andrej Karpathy articulated an insight that resonated throughout the field: we are not “breeding digital animals” but rather “summoning ghosts”—fundamentally different entities whose intelligence emerges from completely different optimization objectives than biological systems.
The distinction matters profoundly. Human neural networks evolved through natural selection in tribal survival scenarios. Large language models are optimized to replicate human text, achieve high scores on mathematical problems, and win approval in human evaluations. Given these entirely different evolutionary pressures, it should be unsurprising that the resulting intelligence manifests in radically different ways.
This leads to a striking observation: artificial intelligence displays a jagged, sawtooth pattern rather than smooth capability curves. Models may demonstrate encyclopedic expertise in one moment while struggling with elementary reasoning the next. They may show both brilliance and profound confusion, capable of generating remarkable solutions or leaking sensitive data under adversarial pressure.
This insight has profound implications for how we evaluate AI progress. Benchmarks, which are themselves verifiable environments, have become susceptible to RLVR optimization. AI teams increasingly construct training environments that closely mirror benchmark tasks, efficiently covering those specific capability zones. “Training on the test set” has effectively become standard industry practice. The result: models may sweep every available benchmark while remaining far from general artificial intelligence.
The Cursor Phenomenon: A New Application Layer Emerges
The rapid ascent of Cursor throughout 2025 revealed something unexpected about AI application architecture. What started as a specialized code editor evolved into a broader paradigm, sparking discussions about “Cursor for X domain” across multiple industries.
Cursor’s true breakthrough lies in demonstrating how to build a new layer of LLM applications. The fundamental principle: specialized applications orchestrate multiple LLM calls into increasingly sophisticated directed acyclic graphs, balancing performance against computational cost. These systems handle “context engineering”—identifying, retrieving, and prioritizing the most relevant information for each query. They provide domain-specific graphical interfaces that maintain humans in decision-making loops and offer adjustment mechanisms that let users dial model autonomy up or down based on task requirements.
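As a rough illustration of that principle, here is a sketch of a three-node call graph: retrieve context, draft with an expensive model, then verify with a cheap one. The call_llm helper and the model names are hypothetical stand-ins, not Cursor’s architecture or any provider’s actual API.

```python
def call_llm(model: str, prompt: str) -> str:
    """Stand-in for a real provider SDK call; returns a canned reply here."""
    return "yes (stub reply for illustration)"

def answer_query(query: str, codebase_index: dict[str, str]) -> str:
    # Node 1: context engineering -- rank files by naive keyword overlap with
    # the query (real systems use embeddings and rerankers) and keep the top 3.
    scored = sorted(codebase_index.items(),
                    key=lambda kv: sum(word in kv[1] for word in query.split()),
                    reverse=True)
    context = "\n\n".join(text for _, text in scored[:3])

    # Node 2: an expensive model drafts the response using the curated context.
    draft = call_llm("big-model", f"Context:\n{context}\n\nTask: {query}")

    # Node 3: a cheaper model reviews the draft, trading accuracy for cost.
    verdict = call_llm("small-model", f"Does this draft complete the task?\n{draft}")
    if "yes" in verdict.lower():
        return draft
    # Fallback edge: escalate back to the expensive model for one revision.
    return call_llm("big-model", f"Revise this draft:\n{draft}")
```

The cost-quality trade-off lives in the graph topology itself: cheap models guard the edges, expensive models do the heavy lifting, and the application decides when to escalate.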
Andrej Karpathy’s perspective on this layering suggests a future where large language model platforms evolve into “generalist graduate-level capabilities,” while specialized applications transform those generalists into “expert teams” by providing private data, environmental sensors, actuators, and continuous feedback loops for specific vertical markets.
Claude Code: Intelligent Agents Running on Your Computer
Anthropic’s Claude Code marked a watershed moment in how AI agents operate within human environments. It convincingly demonstrated how tool use and inference can cycle together iteratively, enabling complex, persistent problem-solving across extended interactions.
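A minimal sketch of that cycle, assuming a simple tool registry and a pluggable inference step, looks like the following. This is illustrative of the general agent-loop pattern, not Claude Code’s actual internals.

```python
import subprocess

TOOLS = {
    "run_shell": lambda cmd: subprocess.run(
        cmd, shell=True, capture_output=True, text=True).stdout,
    "read_file": lambda path: open(path).read(),
}

def agent_loop(task: str, llm_step, max_steps: int = 20) -> str:
    """Alternate between model inference and local tool execution."""
    history = [("user", task)]
    for _ in range(max_steps):
        # llm_step is a placeholder for the model call: given the transcript,
        # it returns either ("tool", name, args) or ("final", answer).
        action = llm_step(history)
        if action[0] == "final":
            return action[1]
        _, name, args = action
        result = TOOLS[name](args)                 # executes locally, on the user's machine
        history.append(("tool", name, args))       # record what was attempted
        history.append(("tool_result", result))    # feed the observation back to the model
    return "step budget exhausted"
```

Each pass through the loop is one inference-plus-tool-use cycle; the transcript accumulates observations, which is what lets the agent pursue a goal persistently across many steps.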
What distinguished Claude Code from competing approaches was its radical localization strategy. Rather than deploying agents in cloud-based containerized environments (OpenAI’s approach), Claude Code runs directly on the user’s personal computer. This local execution model deeply integrates the AI with the user’s private files, applications, development environment, and contextual knowledge—information that would be extraordinarily difficult to transmit to remote servers.
In a transitional period characterized by uneven capability development, this design choice reveals genuine strategic thinking. Deploying agents directly alongside developers in their working environments represents a more logical development path than constructing distributed cloud clusters. Claude Code distilled this insight into an elegant command-line interface, transforming AI from a website requiring deliberate visits into a tiny, intelligent presence embedded within the user’s digital workspace.
Vibe Coding: Programming Without Code
By mid-2025, AI had crossed a critical capability threshold: it could build sophisticated applications from natural language descriptions, with programmers never needing to understand the underlying implementation. The idea caught on so quickly that “Vibe Coding,” a term Andrej Karpathy coined casually in a social media post, grew into an industry-wide movement.
Vibe Coding democratizes programming fundamentally. Professional barriers dissolve when anyone can describe what they want in natural language and receive working code. Andrej Karpathy documented his own experience using Vibe Coding to rapidly develop a custom BPE tokenizer in Rust despite lacking deep expertise in the language, producing code that, in his words, “would never have been written otherwise” under traditional programming demands.
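For readers curious what such a tokenizer involves, the core of BPE training fits in a few lines. Karpathy’s version was in Rust; the Python sketch below only illustrates the algorithm and is not his code.

```python
from collections import Counter

def bpe_train(text: str, num_merges: int) -> list[tuple[str, str]]:
    """Learn BPE merges: repeatedly fuse the most frequent adjacent pair."""
    tokens = list(text)  # start from individual characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        merges.append((a, b))
        # Replace every occurrence of the winning pair with the merged token.
        out, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                out.append(a + b); i += 2
            else:
                out.append(tokens[i]); i += 1
        tokens = out
    return merges

print(bpe_train("low lower lowest", 3))  # e.g. [('l', 'o'), ('lo', 'w'), ...]
```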
The implications extend beyond accessibility. Professional developers gain newfound freedom to build exploratory prototypes, test architectural ideas at minimal cost, and write single-use applications for specific investigations. Code becomes ephemeral and disposable. The boundaries between users and creators blur. Software development transforms into a domain where ordinary people and professional developers alike can contribute meaningfully, reshaping career definitions and technical skill expectations.
Nano Banana and Beyond: Why AI Needs Visual Interfaces
Google’s Nano Banana image model and similar developments represent, in Andrej Karpathy’s assessment, one of 2025’s most transformative shifts. The broader insight: large language models represent the next computing paradigm, following the desktop and microcomputer eras of the 1970s and 1980s.
If this parallel holds, we should expect similar innovations emerging from similar technological foundations. Personal computing’s graphical user interface revolution didn’t arrive because text commands were impossible—they worked fine for experts—but because visual representations matched human cognitive preferences more closely.
Text, though computationally convenient, aligns poorly with human input preferences and information-consumption patterns. Humans process spatial and graphical information visually far more efficiently than they parse text. They naturally prefer receiving information through images, diagrams, slides, whiteboards, and multimedia rather than reading sentences.
Current LLM interfaces operate via dialogue—essentially command-line interactions with text, similar to computing in the 1980s. The question of who will build the graphical layer for artificial intelligence remains partially open, but products like Nano Banana point toward the answer. What distinguishes Nano Banana isn’t merely image generation capability, but rather the integrated synthesis of text generation, visual creation, and world knowledge woven throughout the model’s weight structure.
These six shifts—from verifiable reward optimization to visual interfaces, from human-dependent feedback to AI agents running locally, from specialized expertise to accessible programming—reveal an industry in radical transformation. The frameworks that guided AI development in the early 2020s have given way to fundamentally new approaches, each opening possibilities that seemed impossible just months before. As Andrej Karpathy’s observations underscore, 2025 will be remembered not for incremental progress but for the moment when artificial intelligence fundamentally reinvented itself.