Jensen Huang Shapes "Token Economics" NVIDIA Embraces the Era of Intelligent Agents
On the early morning of March 17, at the opening of Nvidia's GTC (dubbed the AI "Super Bowl" and the AI "Spring Festival"), Nvidia founder and CEO Jensen Huang, the "AI guru," once again took center stage, unleashing a technological storm that pushed at the limits of physics.
Nvidia announced that Vera Rubin, its latest chip architecture, now has seven new chip models fully committed to mass production. With the Vera Rubin platform, the company is entering the Agentic AI era and building the world's largest AI factory.
More specifically, these chips include the NVIDIA Vera CPU (Nvidia's move into server CPUs), the NVIDIA Rubin GPU (the flagship GPU product), NVIDIA NVLink 6 (the sixth-generation NVLink switch chip, with on-chip interconnect), the NVIDIA ConnectX-9 SuperNIC, the NVIDIA BlueField-4 DPU (a storage chip), NVIDIA Spectrum-6 (an Ethernet switch chip supporting CPO technology), and the newly integrated NVIDIA Groq 3 LPU (the first chip to emerge from Nvidia's Groq deal).
As the list shows, the chip family includes not only the familiar CPU and GPU products but also Groq's LPU chips, along with a full line of storage and switch chips. Together, these chips form five server racks that run in data centers.
“Vera Rubin is a generational leap—seven breakthrough chips, five server racks, and a giant supercomputer—providing power for every stage of AI,” Huang said. “With the launch of Vera Rubin, the inflection point for Agentic AI has arrived, and it will kick off the largest infrastructure buildout in history.”
During the keynote, Huang also predicted that revenue from Blackwell and Rubin AI chips will reach $1 trillion by the end of 2027, double the $500 billion sales forecast issued last October.
This launch event could fairly be called unprecedented. It was not just about GPUs, nor about any single technology upgrade. Huang once again emphasized "token" economics and fleshed out his AI "five-layer cake" theory.
One trend is clear: the big players keep consolidating capabilities, filling gaps, and extending up and down the value chain, building ever-stronger moats. The era of competing on standalone chips and raw performance is over; an all-around, system-level, fiercely intense race is underway.
Radical Innovation of Vera Rubin: From Single Chips to the System-Level Era
Nvidia has scheduled Rubin (R100), the generational successor to Blackwell, for mass production in the second half of 2026. At the base layer, the architecture transitions fully to TSMC's 3nm (N3P) process. Its signature Vera CPU (built on an 88-core in-house Olympus architecture) and Rubin GPU achieve true same-package physical integration through 1.8 TB/s NVLink-C2C technology.
This “de-PCIe” tightly coupled design means compute power is no longer constrained by traditional links. Under NVFP4 precision, single-GPU inference compute power increases to 50 PFlops, and training compute reaches 35 PFlops. Its scalable inference energy efficiency is up to 5x higher than Blackwell.
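NVFP4 is a 4-bit floating-point format. To make the idea concrete, here is a minimal sketch of block-scaled 4-bit quantization in Python; the E2M1-style value grid and the scaling scheme are a simplified illustration, not NVIDIA's exact NVFP4 specification.

```python
# Conceptual sketch of 4-bit block-scaled quantization, loosely in the
# spirit of formats like NVFP4. The grid and scaling are illustrative
# assumptions, not NVIDIA's actual specification.

# Representable magnitudes of an E2M1-style grid (2 exponent bits, 1 mantissa bit).
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(values):
    """Scale the block so its largest magnitude maps to the top of the grid,
    snap each value to the nearest grid point, then rescale back."""
    block_max = max(abs(v) for v in values) or 1.0
    scale = block_max / 6.0
    out = []
    for v in values:
        mag = abs(v) / scale
        nearest = min(FP4_GRID, key=lambda g: abs(g - mag))
        out.append(nearest * scale * (1 if v >= 0 else -1))
    return out

weights = [0.12, -0.9, 2.4, -0.01]
print(quantize_block(weights))
```

The point of the low-bit format is that each value needs only 4 bits plus one shared per-block scale, which is what makes the large PFlops figures at NVFP4 precision achievable.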
At the application level, Rubin is designed as the heart of the digital factory for "Agentic AI" and long-context inference. It introduces Transformer Engine 3.0 and the Inference Context Memory storage platform. By offloading storage-management pressure to the BlueField-4 DPU, AI agents can track context across tens of thousands of tokens and perform multi-step logical reasoning and real-time decision-making. The platform ships with a Spectrum-X Ethernet Photonics network supporting silicon photonics (CPO). Total internal interconnect bandwidth within a single NVL72 rack is 260 TB/s, several times the total cross-border bandwidth of the global internet.
Nvidia also released a Vera CPU rack, built on Nvidia's MGX as high-density liquid-cooled infrastructure. It integrates 256 Vera CPUs to provide scalable, energy-efficient capacity and world-class single-thread performance. Alongside GPU compute racks, these supply the CPU foundation for large-scale Agentic AI and reinforcement learning; Vera is twice as efficient as traditional CPUs and 50% faster.
Currently, customers working with Nvidia to deploy Vera CPUs include Alibaba, ByteDance, Meta, and Oracle Cloud Infrastructure, as well as CoreWeave, Lambda, Nebius, and Nscale. Vera is already in full production and will be delivered in the second half of this year.
Traditional data centers and AI infrastructure are facing new transformations. Huang said, “In the AI era, intelligent tokens are the new currency, and AI factories are the infrastructure that generates those tokens. Through the Vera Rubin DSX AI Factory reference design and the Omniverse DSX Blueprint (digital twin blueprint), we are providing the foundation to build the world’s highest-productivity AI factories—accelerating the time to first revenue and maximizing scale and energy efficiency.”
Groq LPU Inference Chips: Building a Hybrid Compute Empire with GPUs
Next, let’s look at the highly anticipated Groq chips.
At the end of 2025, Nvidia strategically licensed and deeply integrated the Groq LPU (Language Processing Unit) architecture through a $20 billion investment. Like a supersonic interceptor, it hunts down latency with precision, ushering in the era of real-time interaction.
Nvidia said that the newly launched Groq 3 LPX rack marks a milestone in accelerated computing. Each LPX rack packs 256 LPU processors, with 128 GB of on-chip SRAM and 640 TB/s of expanded bandwidth. When deployed alongside Vera Rubin NVL72, the Rubin GPU and the LPU jointly compute each layer of the AI model, speeding up decoding and the generation of every output token.
At the same time, the LPX uses a fully liquid-cooled design and is built on MGX infrastructure, enabling seamless integration into the next-generation Vera Rubin AI factory, with availability in the second half of this year.
Entering the inference era, Nvidia has folded a new architecture in alongside its GPUs to greatly improve efficiency.
Architecturally, the Groq LPU discards the "speculative" hardware found in traditional GPUs, such as complex cache management, branch prediction, and instruction reordering, in favor of a deterministic pipeline architecture. This design shifts hardware complexity entirely to the compiler layer: data flows through the chip like a precision conveyor belt, with no uncontrollable jitter.
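The contrast between speculative hardware and a compiler-scheduled deterministic pipeline can be shown with a toy timing model. All latencies and probabilities below are invented for illustration; the point is only that a fully static schedule yields one possible runtime, while dynamic hardware effects produce jitter.

```python
import random

# Toy timing model: compiler-scheduled deterministic execution vs. hardware
# with dynamic effects (cache misses, mispredictions). Numbers are made up.

OPS = ["load", "matmul", "add", "store"]
FIXED_LATENCY = {"load": 4, "matmul": 10, "add": 1, "store": 3}  # cycles, known at compile time

def deterministic_run(ops):
    # The compiler sums known latencies: total runtime is a compile-time constant.
    return sum(FIXED_LATENCY[op] for op in ops)

def speculative_run(ops, rng):
    # Dynamic hardware: an occasional miss or misprediction adds an
    # unpredictable stall, so total runtime varies between runs.
    total = 0
    for op in ops:
        total += FIXED_LATENCY[op]
        if rng.random() < 0.2:
            total += rng.randint(5, 50)
    return total

rng = random.Random(0)
det_times = {deterministic_run(OPS) for _ in range(1000)}
spec_times = {speculative_run(OPS, rng) for _ in range(1000)}
print(len(det_times), len(spec_times))  # deterministic: a single value; speculative: many
```

This is why the text can claim constant task-execution time: with every latency fixed at compile time, there is simply nothing left to fluctuate at runtime.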
In real-world scenarios, the Nvidia-powered LPX racks are becoming a lifeline for "Agentic AI" and real-time voice interaction. In driver-assistance systems or high-frequency trading bots, any millisecond-level compute fluctuation can cause a decision to fail; the LPU's deterministic compute keeps task execution time constant.
For complex agent chains requiring multi-step reasoning, even ones involving hundreds of model calls, the LPU can compress what used to take minutes of serial thinking into seconds, letting AI hold natural, fluent real-time conversations and collaborate like a human. To support this new computing paradigm, Nvidia embeds the LPU units into its massive CUDA ecosystem via NVFusion technology, and rapidly schedules trained weights from GPUs onto LPU inference arrays through a disaggregated architecture.
With this capability, Nvidia separates training from inference, building a hybrid compute empire: GPUs handle trillion-parameter model training and long-text preprocessing in the back end, while the LPU array holds the front line with an energy-efficiency ratio 10x that of competitors and ultra-fast responsiveness, dominating the trillion-scale real-time inference market and declaring the arrival of the "instant inference" era.
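The GPU/LPU division of labor described above amounts to a dispatch policy. The sketch below is purely hypothetical; the pool names, job kinds, and the token threshold are invented for illustration and do not describe Nvidia's actual scheduler.

```python
from dataclasses import dataclass

# Hypothetical sketch of training/inference disaggregation: throughput-heavy
# work goes to a GPU pool, latency-sensitive decoding to an LPU pool.
# All names and thresholds here are illustrative assumptions.

@dataclass
class Job:
    kind: str    # "train", "prefill", or "decode"
    tokens: int

def route(job: Job) -> str:
    # Training and very long prompt prefill favor throughput-optimized GPUs;
    # token-by-token decoding favors latency-optimized LPUs.
    if job.kind == "train" or (job.kind == "prefill" and job.tokens > 8192):
        return "gpu-pool"
    return "lpu-pool"

print(route(Job("train", 0)))      # gpu-pool
print(route(Job("decode", 128)))   # lpu-pool
```

The design choice mirrors the article's framing: the two device classes are not interchangeable, so the win comes from routing each phase of the workload to the hardware optimized for it.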
Nvidia’s “Lobster” Is Here: Embracing the Age of Intelligent Agents
Meanwhile, Nvidia released a series of important updates around AI agents, open models, and cross-industry applications. The most developer-focused release is the NemoClaw software stack, built for the OpenClaw community. The open-source project OpenClaw has recently gone viral among developers, and many in the industry see it as an early form of a "personal AI operating system."
Huang also spoke highly of OpenClaw. “OpenClaw opens the next frontier of AI to everyone and has become the fastest-growing open-source project in history,” Huang said. “Unlike traditional AI applications, OpenClaw aims to let AI agents run continuously like applications—able to autonomously plan tasks, call tools, and complete complex workflows.”
Under this framework, NemoClaw provides a complete set of core software capabilities. Developers can install NVIDIA Nemotron models and the newly released OpenShell runtime environment with a single command, and add security and privacy controls to AI agents. With the isolation sandbox environment provided by OpenShell, when AI agents access tools and data, they can follow established security policies and privacy rules—thereby improving efficiency while ensuring data security.
NemoClaw also supports hybrid invocation of local models and cloud models. Developers can run Nemotron models on user-dedicated devices, while accessing the cutting-edge cloud models via a privacy router—so they obtain stronger compute capability while ensuring data privacy. Nvidia said that NemoClaw can run on a variety of dedicated compute platforms, including PCs and laptops equipped with GeForce RTX GPUs, RTX PRO workstations, and DGX Station and DGX Spark systems—providing stable compute for AI agents to run around the clock.
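The hybrid local/cloud invocation described above boils down to a routing policy. The following sketch is purely illustrative: the function name, model labels, and decision rules are assumptions for the example, not NemoClaw's actual API.

```python
# Hypothetical sketch of a "privacy router" for hybrid model invocation:
# requests touching private data stay on a local model, while heavy
# non-sensitive requests escalate to a cloud model. Names are invented.

def route_request(prompt: str, touches_private_data: bool,
                  needs_frontier_model: bool) -> str:
    if touches_private_data:
        return "local:nemotron"   # private data never leaves the device
    if needs_frontier_model:
        return "cloud:frontier"   # escalate non-sensitive heavy work
    return "local:nemotron"       # default to local for cost and latency

print(route_request("summarize my email", True, True))   # local:nemotron
print(route_request("plan a research survey", False, True))  # cloud:frontier
```

Note the ordering: the privacy check runs first, so even a request that would benefit from a frontier model stays local when private data is involved.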
While accelerating its AI agent platform, Nvidia is also speeding up construction of an open model ecosystem. At the conference, it announced the Nemotron Coalition, an alliance bringing together leading AI labs and model-development organizations worldwide to jointly advance open frontier models.
Beyond ecosystem-level collaboration, Nvidia is also expanding multiple open model product lines to support the development of different fields such as AI agents, physical intelligence, and medical research. Among them, the NVIDIA Nemotron 3 series models further strengthen multimodal understanding capabilities, introducing multiple versions such as Ultra, Omni, and VoiceChat. These models can process language, vision, and speech information at the same time—so AI agents can not only engage in natural conversation, but also complete complex reasoning tasks and extract key information from diverse data sources such as video and documents.
Beyond AI agents in the digital world, Nvidia is also pushing artificial intelligence into the physical world. The newly released models include foundation models for robots and autonomous-driving systems. NVIDIA Isaac GR00T N1.7, for example, is a vision-language-action model for humanoid robots, enabling them to perceive, reason, and make action decisions in real environments.
NVIDIA Alpamayo 1.5 targets autonomous driving scenarios. It improves vehicle inference capability through navigation prompts, multi-camera support, and configurable camera parameters. Meanwhile, the soon-to-be-released NVIDIA Cosmos 3 is described as the first unified “world generation, physical reasoning, and action simulation” foundation model, and is expected to help robots and autonomous driving systems complete training and decision-making in complex environments.
From the AI agent platform to the open model ecosystem, and then to applications in robotics, autonomous driving, and life sciences, Nvidia is gradually building an AI technology system spanning both the digital and physical worlds. As more developers and enterprises join the open model and AI agent ecosystem, this system is also expected to further drive innovation and real-world deployment of artificial intelligence globally.