NVIDIA GTC 2025: Blackwell Ultra, Vera Rubin, and the AI Factory Vision

Jensen Huang's AI Infrastructure Vision

On March 18, 2025, at GTC 2025, NVIDIA CEO Jensen Huang unveiled Blackwell Ultra and the next-generation Vera Rubin architecture—painting a roadmap for AI infrastructure through 2027. The keynote was a statement: NVIDIA isn't just a chip company, it's building the computing platform for the AI era.

Blackwell Ultra: The Immediate Future

Blackwell Ultra GPUs ship in the second half of 2025, bringing significant improvements over the original Blackwell architecture:

| Specification     | Blackwell Ultra (B300) | Blackwell (B200) | Hopper (H100) |
|-------------------|------------------------|------------------|---------------|
| FP4 Performance   | 15 PFLOPS              | 10 PFLOPS        | N/A           |
| FP8 Performance   | 7.5 PFLOPS             | 5 PFLOPS         | 2 PFLOPS      |
| HBM Memory        | 288 GB HBM3e           | 192 GB HBM3e     | 80 GB HBM3    |
| Memory Bandwidth  | 8 TB/s                 | 8 TB/s           | 3.35 TB/s     |
| NVLink Bandwidth  | 1.8 TB/s               | 1.8 TB/s         | 0.9 TB/s      |
| TDP               | 1400W                  | 1000W            | 700W          |

The doubling of FP4 performance is significant for inference workloads. Modern LLMs increasingly use FP4/FP8 quantization, making this the practical performance metric for AI deployment.
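The memory side of the story is just as important as the FLOPS: halving precision halves the weight footprint. A back-of-envelope sketch (weights only, ignoring KV cache and activations):

```python
# Back-of-envelope weight memory at different precisions (weights only;
# ignores KV cache, activations, and framework overhead).
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    return params_billions * 1e9 * bits_per_param / 8 / 1e9

for bits, label in [(16, "FP16"), (8, "FP8"), (4, "FP4")]:
    print(f"70B model at {label}: {weight_memory_gb(70, bits):.0f} GB")
# FP16 needs 140 GB, FP8 needs 70 GB, FP4 needs 35 GB --
# only FP4 fits a 70B model comfortably on a single GPU's HBM
```

Lower precision also halves the memory traffic per generated token, which is why FP4 throughput has become the headline inference number.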

Vera Rubin: The 2026-2027 Platform

Named after astronomer Vera Rubin, this next-generation platform includes:

  • Vera Rubin GPU: Successor to Blackwell, expected late 2026
  • Vera CPU: NVIDIA's own ARM-based server processor
  • NVLink 6: Next-generation interconnect for multi-GPU communication
  • CX9 SuperNIC: Advanced networking for AI clusters

```text
NVIDIA AI Platform Roadmap:
2023: Hopper (H100/H200)
2024: Blackwell (B200/GB200)
2025: Blackwell Ultra (B300/GB300)
2026: Vera Rubin (R100)
2027: Vera Rubin Ultra
2028: Feynman (announced)

Key: Each generation ~2x inference performance
```

DGX Spark and DGX Station: AI on Your Desk

Huang announced two "personal AI supercomputers":

DGX Spark ($3,999):

  • Grace Blackwell processor
  • 128 GB unified memory
  • Up to 1 PFLOPS FP4 AI performance
  • Runs on standard power outlet
  • Target: Researchers, developers, small teams

DGX Station:

  • Full GB300 Grace Blackwell
  • 784 GB memory
  • 20 PFLOPS FP4 performance
  • Liquid cooled desktop form factor
  • Target: AI labs, enterprises, serious research

```bash
# DGX Spark can run frontier models locally:
# - Llama 4 Scout (109B MoE): runs at ~30 tokens/sec
# - DeepSeek R1 (671B MoE): runs at ~10 tokens/sec
# - Mistral Large (123B): runs at ~25 tokens/sec
# - Any model whose quantized weights fit in 128 GB unified memory
```

This is transformative: for $4,000, researchers get hardware that was only available in datacenters a year ago.
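Those token rates are roughly what a memory-bandwidth-bound decode would predict. A sanity check, assuming DGX Spark's reported ~273 GB/s memory bandwidth (an assumed figure, not from this article) and one full read of the active parameters per token:

```python
# Rough decode-throughput ceiling for a memory-bandwidth-bound MoE model.
# Assumes ~273 GB/s bandwidth for DGX Spark (assumption) and that each
# decoded token reads every *active* parameter once.
def max_tokens_per_sec(active_params_b: float, bits: int, bw_gb_s: float = 273.0) -> float:
    bytes_per_token = active_params_b * 1e9 * bits / 8
    return bw_gb_s * 1e9 / bytes_per_token

# Llama 4 Scout: 109B total parameters, ~17B active per token (MoE)
print(f"~{max_tokens_per_sec(17, 4):.0f} tokens/sec at FP4")
```

With ~17B active parameters at FP4, the ceiling lands around 32 tokens/sec, in line with the ~30 tokens/sec figure above.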

NVIDIA Dynamo: AI Inference OS

NVIDIA announced Dynamo—an open-source inference serving framework:

```bash
# NVIDIA Dynamo simplifies large model deployment.
# Handles: tensor parallelism, pipeline parallelism,
# KV-cache management, request scheduling.

# Deploy a model with automatic optimization:
dynamo serve --model meta-llama/Llama-4-Scout \
    --tensor-parallel 2 \
    --max-batch-size 256 \
    --kv-cache-quantization fp8
```

Key features:

  • Disaggregated serving: Separate prefill and decode phases for efficiency
  • Smart routing: Direct requests to optimal GPU based on KV-cache state
  • Elastic scaling: Auto-scale GPU allocation based on demand
  • Multi-model: Serve multiple models on the same GPU cluster
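
The smart-routing idea can be sketched in a few lines: prefer the worker whose KV cache already holds the longest matching prefix of the incoming request, so its prefill work is reused. This is a hypothetical illustration of the concept, not Dynamo's actual API:

```python
# Hypothetical sketch of KV-cache-aware routing (illustrative only,
# not the real Dynamo API): route each request to the worker with the
# longest cached prefix so prefill computation is reused.
from dataclasses import dataclass, field

@dataclass
class Worker:
    name: str
    cached_prefixes: list = field(default_factory=list)  # token-ID prefixes held in KV cache

def shared_prefix_len(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def route(request_tokens, workers):
    # Score each worker by its best cached-prefix overlap with the request.
    def score(w):
        return max((shared_prefix_len(p, request_tokens) for p in w.cached_prefixes), default=0)
    return max(workers, key=score)

workers = [
    Worker("gpu-0", cached_prefixes=[[1, 2, 3, 4]]),
    Worker("gpu-1", cached_prefixes=[[1, 2, 9]]),
]
print(route([1, 2, 3, 4, 5], workers).name)  # gpu-0: reuses a 4-token cached prefix
```

A production router would also weigh current load and cache eviction, but prefix overlap is the core signal that makes disaggregated serving pay off.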

Isaac GR00T: Physical AI

Huang dedicated significant time to robotics with the Isaac GR00T platform:

  • Foundation model for humanoid robots
  • Simulation-to-real transfer via Omniverse
  • Partners: Figure AI, Agility Robotics, Boston Dynamics, and 10+ companies
  • Goal: Robots that learn from watching humans, not programming

The vision: every factory, warehouse, and eventually home has AI-powered robots, all running NVIDIA's software stack on NVIDIA's hardware.

NIM and NeMo: AI Microservices

NVIDIA's software ecosystem continues to expand:

| Service                              | Function                        |
|--------------------------------------|---------------------------------|
| NIM (NVIDIA Inference Microservices) | Pre-optimized model containers  |
| NeMo                                 | Framework for custom model training |
| Guardrails                           | Safety and compliance for AI apps |
| Riva                                 | Speech AI (ASR + TTS)           |
| BioNeMo                              | Drug discovery models           |
| cuOpt                                | Logistics optimization          |

Financial Context

NVIDIA's AI dominance is reflected in its financials:

  • Data center revenue: $35.6B (Q4 FY2025), up 93% YoY
  • Market cap: ~$3.5 trillion (world's most valuable company)
  • Training market share: ~95% of AI training runs on NVIDIA GPUs
  • Inference market share: ~80% and growing

The Competitive Landscape

| Competitor | Product           | Status                                    |
|------------|-------------------|-------------------------------------------|
| AMD        | MI325X / MI400    | Gaining share, especially in inference    |
| Intel      | Gaudi 3           | Struggling, limited traction              |
| Google     | TPU v6 (Trillium) | Strong for internal use, limited external |
| AWS        | Trainium 2/3      | Growing for AWS-native workloads          |
| Cerebras   | WSE-3             | Niche but innovative (wafer-scale)        |
| Groq       | LPU Inference     | Fastest inference, limited availability   |

Despite growing competition, NVIDIA's CUDA ecosystem, software stack (NIM, NeMo, TensorRT), and supply chain relationships create a formidable moat.

Impact on AI Development

GTC 2025's message is clear: AI infrastructure is scaling from datacenter to desktop. With DGX Spark at $4K, AI development is no longer gated by access to cloud GPUs. Combined with open-source models like Llama 4 and DeepSeek R1, the barrier to AI development continues to fall.

Sources: NVIDIA GTC 2025, NVIDIA Blog, NVIDIA Developer