Bombshell! NVIDIA Unveils Seven Chips in One Go, and Jensen Huang's Speech Goes Viral

In the early hours of March 17, NVIDIA's annual GTC conference took place as scheduled. NVIDIA founder and CEO Jensen Huang, the "AI guru," announced several major technological breakthroughs in his keynote, including a new-generation AI accelerator chip architecture, along with an aggressive prediction that it will generate at least $1 trillion in revenue.

After Huang's speech, NVIDIA's stock surged, rising as much as 4.31% intraday and closing up 1.65%.

In the A-share market, nearly 60 NVIDIA-concept stocks are listed, with a combined market value exceeding 2.7 trillion yuan. Industrial Fulian (601138) leads the pack, followed by Shenghong Technology (300476) and InnoSilicon (002837), each with a market cap above 100 billion yuan, along with Huaqin Technology (603296), Inspur Information (000977), Tsinghua Unigroup (000938), and Magmet (002851).

Since the start of the year, NVIDIA-concept stocks have diverged, with about 60% rising. Litong Electronics (603629) led the gains, soaring roughly 130%, while Helin MicroNano and Roboteek (300757) climbed 86.82% and 62.42%, respectively. Companies such as Hongchang Electronics (603002), Shunwang Technology (300113), Magmet, and Hangjin Technology (000818) gained between 20% and 50%.

At GTC, NVIDIA announced that all seven chips of Vera Rubin, its latest chip architecture, are now in full production, and that the Vera Rubin platform is ushering in a new era of Agentic AI and building the world's largest AI factories.

Specifically, the seven chips are:

NVIDIA Vera CPU (yes, NVIDIA has entered the server CPU market)

NVIDIA Rubin GPU (the flagship GPU)

NVIDIA NVLink 6 (sixth-generation NVLink switch chip for in-rack interconnect)

NVIDIA ConnectX-9 SuperNIC (super network interface card)

NVIDIA BlueField-4 DPU (data processing unit that offloads storage and infrastructure tasks)

NVIDIA Spectrum-6 (Ethernet switch chip supporting co-packaged optics, CPO)

And the newly integrated NVIDIA Groq 3 LPU (the first chip since the Groq deal).

The chip family spans not only the familiar CPUs and GPUs but also Groq's LPU, a DPU, switch chips, and more. Together, these chips can be configured into five rack types for data-center operation.

The Vera Rubin platform consolidates all these chips into one powerful AI supercomputer. Large-scale pre-training, post-training, test-time scaling, real-time intelligent inference: this computing beast supports it all.

“Vera Rubin represents a generational leap—seven groundbreaking chips, five rack types, and one giant supercomputer—powering every stage of AI,” Huang said. “With the launch of Vera Rubin, the inflection point for Agentic AI has arrived, and it will trigger the largest infrastructure build in history.”

Huang also predicted that the revenue from Blackwell and Rubin AI chips will reach $1 trillion by the end of 2027, doubling the $500 billion sales forecast made last October.

Today’s announcement is unprecedented. It is not just about GPUs or a single technological upgrade. Huang emphasized the “Token” economy and the “five-layer cake” theory of AI.

As early as the February GTC 2026 preview, Huang stated: "We have prepared several unprecedented new chips. All technologies have reached their limits, so none of this is easy."

On one hand, the promise of "unprecedented new chips" excites the world. Over the past decade, NVIDIA has launched one high-performance chip after another, turning surprising performance gains into routine. The roadmap from Hopper and Blackwell through Rubin to Feynman is clear, with at least five years of compute capability assured.

On the other hand, the claim that technologies have reached their limits is not just hype. It raises a concern in overheated capital markets: the easy order-of-magnitude breakthroughs may be exhausted sooner than expected, and further progress could be hard-won.

NVIDIA also introduced system-level innovations at this launch. In AI's long development, if the past two years were an "arms race" in raw compute, 2026 marks the start of a new era of system-level evolution: competition has shifted from single chips to building out systematic AI infrastructure.

At GTC last year, Huang said NVIDIA aims to transform into an AI infrastructure company, and this year the company is already executing on that vision. NVIDIA is no longer merely the era's "shovel seller"; by building a complete stack from compute to application, it is becoming the foundational platform of the entire AI ecosystem, aiming to be as essential as water and electricity in the AI era.

Additionally, NVIDIA announced significant progress around AI agents, open models, and cross-industry applications. These include NVIDIA Nemo Claw; an open physical-AI data-factory blueprint to accelerate robotics, visual AI agents, and autonomous vehicles; and space-computing services that bring AI to orbital data centers (ODCs), geospatial intelligence, and autonomous space operations, with the NVIDIA Space-1 Vera Rubin module as the newest component.

A trend is emerging: industry giants are continuously consolidating capabilities, filling gaps, extending upstream and downstream, and forming stronger barriers. The era of competing solely on chips and performance is over; a comprehensive, system-level fierce competition is underway.

Vera Rubin’s Radical Transformation: From Single Chips to System-Level Era

As the successor to Blackwell, the Rubin (R100) architecture is slated for mass production in the second half of 2026. The core architecture moves fully to TSMC's 3nm (N3P) process and pairs the Vera CPU (88 cores, based on NVIDIA's in-house Olympus architecture) with the Rubin GPU, physically integrated via 1.8 TB/s NVLink-C2C.

This tightly coupled, "de-PCIe" design removes the limits imposed by traditional links. A single GPU reaches 50 PFLOPS of FP4 inference and 35 PFLOPS of training compute, and inference scaling efficiency is five times that of Blackwell.

The core rack product built from the new chips is the Vera Rubin NVL72, which connects 72 Rubin GPUs and 36 Vera CPUs via NVLink 6 and adds ConnectX-9 SuperNICs and BlueField-4 DPUs for further efficiency gains.

For training large mixture-of-experts models, Rubin reportedly needs only a quarter as many GPUs as Blackwell, while delivering up to 10 times the inference throughput per watt at one-tenth the cost per token. The system is designed for the world's largest AI factories: the NVL72 scales out seamlessly over NVIDIA Quantum-X800 InfiniBand and Spectrum-X Ethernet, sustaining high utilization across large GPU clusters while cutting training time and total cost of ownership.
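Taken at face value, the claimed ratios compound into a dramatic cost picture. A minimal sketch of the arithmetic, using an invented Blackwell baseline (the 4,096-GPU cluster, 5 tokens per joule, and $2 per million tokens are illustrative assumptions, not NVIDIA figures):

```python
# Back-of-envelope view of the headline Rubin-vs-Blackwell ratios.
# Baseline numbers below are illustrative assumptions, not NVIDIA data.

def rubin_vs_blackwell(bw_gpus, bw_tokens_per_joule, bw_cost_per_mtok):
    """Apply the article's claimed ratios: 1/4 the GPUs,
    10x inference throughput per watt, 1/10 the cost per token."""
    return {
        "gpus": bw_gpus / 4,
        "tokens_per_joule": bw_tokens_per_joule * 10,
        "cost_per_mtok": bw_cost_per_mtok / 10,
    }

# Hypothetical Blackwell baseline: 4,096 GPUs, 5 tokens/J, $2 per million tokens
r = rubin_vs_blackwell(4096, 5.0, 2.00)
print(r)  # {'gpus': 1024.0, 'tokens_per_joule': 50.0, 'cost_per_mtok': 0.2}
```

The same workload would need roughly 1,024 GPUs at 50 tokens per joule and $0.20 per million tokens, if the claimed ratios hold.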

In application scenarios, Rubin is tailored for Agentic AI and long-context reasoning. It introduces Transformer Engine 3.0 and Inference Context Memory, with the BlueField-4 DPU offloading storage management, so AI agents can hold tens of thousands of tokens of context, perform multi-step logical reasoning, and make decisions in real time. The platform supports silicon photonics (CPO) through Spectrum-X Ethernet Photonics, with 260 TB/s of internal bandwidth per cabinet, several times the world's cross-border internet bandwidth.

Besides the GPU racks, NVIDIA launched a Vera CPU rack built on NVIDIA MGX infrastructure, with 256 Vera CPUs delivering scalable, energy-efficient capacity and top-tier single-thread performance. Alongside the GPU racks, it provides the CPU backbone for large-scale Agentic AI and reinforcement learning: Vera offers twice the energy efficiency of traditional CPUs and a 50% performance uplift.

Current customers deploying Vera CPUs include Alibaba, ByteDance, Meta, Oracle Cloud Infrastructure, CoreWeave, Lambda, Nebius, and Nscale. Vera is now fully in production and will be shipped in the second half of this year.

While chip and rack performance accelerate, NVIDIA increasingly focuses on energy consumption and power issues. Energy is currently the biggest bottleneck in AI infrastructure development. NVIDIA is collaborating with energy providers to accelerate power access and strengthen grid stability; simultaneously, over 200 data center infrastructure partners have launched the DSX platform, applied to Vera Rubin.

The new DSX platform includes DSX Max-Q, which dynamically allocates power across an AI factory, fitting 30% more AI infrastructure into a data center with a fixed power budget. The new DSX Flex software turns AI factories into flexible grid assets, unlocking 100 gigawatts of idle grid capacity.
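One plausible reading of the 30% figure: if racks must be provisioned against worst-case power draw, dynamic power capping lets an operator provision against a managed average instead. A sketch with invented numbers (the 130 kW peak and 100 kW managed draw per rack are illustrative assumptions, not DSX specifications):

```python
# Illustrative arithmetic for "30% more AI infrastructure within a
# fixed-power data center": provisioning against a managed average
# draw instead of the worst-case peak admits more racks.
# All figures below are assumptions for illustration.

FACILITY_MW = 100.0   # fixed facility power budget
RACK_PEAK_KW = 130.0  # nameplate/peak rack draw (static provisioning)
RACK_AVG_KW = 100.0   # managed draw under dynamic power allocation

racks_static = FACILITY_MW * 1000 / RACK_PEAK_KW   # provision for peak
racks_dynamic = FACILITY_MW * 1000 / RACK_AVG_KW   # provision for managed average

print(f"extra capacity within the same power budget: "
      f"{racks_dynamic / racks_static - 1:.0%}")
# → extra capacity within the same power budget: 30%
```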

By the Vera Rubin generation, NVIDIA's AI infrastructure is no longer just a GPU; it is a "supercomputing unit" integrating compute, interconnect, storage, and liquid cooling, marking a new era in which the cost of producing trillions of tokens falls tenfold and energy efficiency improves eightfold.

As NVIDIA states, AI infrastructure is rapidly evolving from discrete chips and standalone servers to fully integrated rack-level systems, POD deployments, AI factories, and sovereign AI.

NVIDIA even introduced the Vera Rubin DSX AI factory reference design, guiding how to design, build, and operate the entire AI factory infrastructure stack, including compute, NVIDIA Spectrum-X Ethernet network, and storage, to achieve repeatable, scalable, and optimal cluster performance.

Traditional data centers and AI infrastructure are facing new transformations. Huang said: “In the AI era, intelligent tokens are the new currency, and AI factories are the infrastructure generating these tokens. Through the Vera Rubin DSX AI Factory reference design and Omniverse DSX Blueprint (digital twin blueprint), we are providing the foundation to build the world’s most productive AI factories, accelerating initial revenue and maximizing scale and energy efficiency.”

Furthermore, Huang previewed the next-generation Feynman system at this conference. It will feature new GPUs and LPUs, a new CPU called Rosa, BlueField-5, and the Kyber architecture, supporting both copper cabling and CPO scale-out. The Feynman system is expected in 2028.

Groq LPU Inference Chip: Building a Hybrid Compute Empire with GPUs

Now to the highly anticipated Groq chips.

By the end of 2025, NVIDIA's $20 billion strategic licensing deal and deep integration of Groq's LPU (Language Processing Unit) architecture had given it a "supersonic interceptor" purpose-built for latency-critical, real-time interaction. The partnership extends NVIDIA's AI campaign from training efficiency to inference efficiency, and brings in Groq founder Jonathan Ross (formerly head of Google's TPU effort) to champion a software-defined-silicon paradigm that breaks through the bottlenecks traditional GPUs hit in generative-AI inference.

NVIDIA says the newly launched Groq 3 LPX rack represents a milestone in accelerated computing. The LPX rack packs 256 LPU processors with 128 GB of on-chip SRAM and 640 TB/s of bandwidth. Deployed alongside a Vera Rubin NVL72, Rubin GPUs and LPUs work together to accelerate model decoding, computing each layer to produce every output token.

In other words, LPX is designed for low latency and large context needs in agentic systems. When combined with Vera Rubin, it merges the ultimate performance of both processors, achieving up to 35 times the inference throughput per watt and offering up to 10 times the revenue opportunity for trillion-parameter models.

The architecture is optimized for trillion-parameter models and hundreds of thousands of tokens of context, working in synergy with Vera Rubin to maximize power, memory, and compute efficiency. This enables higher throughput and token performance, opening a new inference tier—ultra-high-end, trillion-parameter, multi-million token reasoning—expanding revenue opportunities for all AI providers.

LPX uses full liquid cooling and is built on MGX infrastructure, seamlessly integrating into the next-generation Vera Rubin AI factory, and will be available later this year.

Entering the inference era, NVIDIA is pairing GPUs with new architectures to dramatically improve efficiency.

Architecturally, the Groq LPU abandons the traditional GPU's complex cache management, branch prediction, and instruction-reordering hardware in favor of a deterministic pipeline. This design pushes hardware complexity up into the compiler, so data flows through the chip like a precise conveyor belt, with no jitter.

To overcome the industry's long-standing "memory wall," the LPU replaces high-bandwidth but high-latency HBM with up to 230 MB of on-chip SRAM, lifting memory bandwidth to 80 TB/s, ten times that of top-tier Blackwell GPUs. With that bandwidth, the LPU achieves near-imperceptible time-to-first-token (TTFT) in batch-size-1 inference and sustains generation speeds above 1,600 tokens/sec, turning large-language-model responses from word-by-word "typing" into instantaneous writing.

Put simply, the "typewriter" effect in today's AI chat comes from insufficient generation speed; with LPU-class throughput, future AI conversations can deliver the entire reply at once.

In practical applications, NVIDIA-powered LPX racks are becoming the answer for Agentic AI and real-time voice interaction. In driver-assistance systems or high-frequency trading bots, even millisecond-level jitter can cause decision failures; the LPU's deterministic compute guarantees consistent task-execution times.

For complex agent chains involving multi-step reasoning and hundreds of model calls, LPU can reduce what used to take minutes of serial thinking to just seconds, enabling AI to engage in natural, fluent real-time dialogue and collaboration like humans. To support this new computing paradigm, NVIDIA integrates LPU units into its extensive CUDA ecosystem via NVFusion, enabling fast scheduling of trained weights from GPU to LPU inference arrays through a disaggregated architecture.

With this capability, NVIDIA separates training and inference, building a hybrid compute empire: GPUs focus on training trillion-parameter models and long-text preprocessing, while LPU arrays handle real-time inference at ten times the efficiency and response speed of competitors, dominating the trillion-level inference market and heralding the “instant inference” era.

NVIDIA’s “Lobster” Emerges: Embracing the Intelligent Agent Era

Meanwhile, NVIDIA has announced significant progress around AI agents, open models, and cross-industry applications. As AI evolves from a simple dialogue tool to autonomous task planners that invoke tools and execute complex workflows, software platforms, model capabilities, and ecosystems around agent systems are becoming new industry battlegrounds.

In this context, NVIDIA launched the Nemo Claw software stack for the OpenClaw ecosystem, established the Nemotron Coalition with global AI labs, and expanded multiple open model product lines to enhance its AI infrastructure and model ecosystem.

One of the most notable releases for developers is the Nemo Claw software stack for the OpenClaw community. Recently, the open-source project OpenClaw has gained rapid popularity among developers and is seen as a prototype of a “personal AI operating system.”

Huang highly praised OpenClaw: “OpenClaw opens the next frontier of AI for everyone and has become the fastest-growing open-source project in history,” he said. “Mac and Windows are operating systems for personal computers. OpenClaw is the operating system for personal AI. This is the moment everyone has been waiting for—the revival of the software era.” Unlike traditional AI applications, OpenClaw aims to make AI agents run continuously like applications, capable of autonomous task planning, tool invocation, and completing complex workflows.

Under this framework, Nemo Claw provides a comprehensive set of basic software capabilities, allowing developers to install NVIDIA Nemotron models and the newly released OpenShell runtime environment with a single command, adding security and privacy controls for AI agents. Using OpenShell’s sandbox environment, AI agents can follow security policies and privacy rules when accessing tools and data, ensuring data security while improving efficiency.

Nemo Claw also supports hybrid invocation of local and cloud models. Developers can run Nemotron models on dedicated user devices and access cutting-edge cloud models via privacy routing, balancing data privacy and computational power. NVIDIA states that Nemo Claw can operate on various dedicated computing platforms, including PCs and laptops with GeForce RTX graphics cards, RTX PRO workstations, DGX Station, and DGX Spark systems, providing stable compute power for continuous AI agent operation.
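The local/cloud split described above can be pictured as a simple routing policy. A hypothetical sketch (the `route` function, model names, and request flags are invented for illustration; the article does not document Nemo Claw's actual API):

```python
# Hypothetical "privacy routing" policy: requests touching private data
# stay on an on-device model; everything else may use a cloud model.
# All names here are invented for illustration, not Nemo Claw's API.

from dataclasses import dataclass

@dataclass
class Request:
    prompt: str
    touches_private_data: bool
    needs_frontier_quality: bool

def route(req: Request) -> str:
    if req.touches_private_data:
        return "local:nemotron"        # private data never leaves the device
    if req.needs_frontier_quality:
        return "cloud:frontier-model"  # routed through the privacy proxy
    return "local:nemotron"            # default to the cheaper local path

print(route(Request("summarize my inbox", True, True)))   # local:nemotron
print(route(Request("explain CPO optics", False, True)))  # cloud:frontier-model
```

The key design point is that privacy takes precedence over model quality: a request flagged as private is pinned to the local model even when frontier quality is requested.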

In parallel, NVIDIA is expanding its open model ecosystem. At this conference, NVIDIA announced the Nemotron Coalition, uniting leading AI labs and model developers worldwide to promote open frontier models. Founding members include Black Forest Labs, Cursor, LangChain, Mistral AI, Perplexity, Reflection AI, Sarvam, and Thinking Machines Lab.

The coalition's first project will be a foundation model jointly developed by Mistral AI and NVIDIA, with other members contributing data, evaluation systems, and domain knowledge. NVIDIA says this model will serve as a key foundation for the upcoming NVIDIA Nemotron 4 open model family.

Beyond ecosystem collaboration, NVIDIA is also expanding multiple open model lines to support AI agents, physical intelligence, and medical research. The NVIDIA Nemotron 3 series enhances multimodal understanding, with versions like Ultra, Omni, and VoiceChat. These models can process language, vision, and speech simultaneously, enabling AI agents to conduct natural conversations, perform complex reasoning, and extract key information from videos and documents.

In addition to digital AI agents, NVIDIA is pushing AI into the physical world. New models include NVIDIA Isaac GR00T N1.7, a vision-language-action model for humanoid robots capable of perceiving, reasoning, and acting in real environments; NVIDIA Alpamayo 1.5, designed for autonomous driving with navigation cues, multi-camera support, and configurable camera parameters; and the upcoming NVIDIA Cosmos 3, billed as the first foundation model to unify world generation, physical reasoning, and action simulation, aimed at helping robots and autonomous vehicles operate in complex environments.

From AI agent platforms to open model ecosystems, and applications in robotics, autonomous driving, and life sciences, NVIDIA is gradually building a comprehensive AI technology system covering both digital and physical worlds. As more developers and companies join the open model and AI agent ecosystems, this system is expected to further drive innovation and deployment of AI worldwide.
