
Jensen Huang's AI Infrastructure Vision
On March 18, 2025, at GTC 2025, NVIDIA CEO Jensen Huang unveiled Blackwell Ultra and the next-generation Vera Rubin architecture—painting a roadmap for AI infrastructure through 2027. The keynote was a statement: NVIDIA isn't just a chip company, it's building the computing platform for the AI era.
Blackwell Ultra: The Immediate Future
Blackwell Ultra GPUs ship in the second half of 2025, bringing significant improvements over the original Blackwell architecture:
| Specification | Blackwell Ultra (B300) | Blackwell (B200) | Hopper (H100) |
|---|---|---|---|
| FP4 Performance | 15 PFLOPS | 10 PFLOPS | N/A |
| FP8 Performance | 7.5 PFLOPS | 5 PFLOPS | 2 PFLOPS |
| HBM Memory | 288 GB HBM3e | 192 GB HBM3e | 80 GB HBM3 |
| Memory Bandwidth | 8 TB/s | 8 TB/s | 3.35 TB/s |
| NVLink Bandwidth | 1.8 TB/s | 1.8 TB/s | 0.9 TB/s |
| TDP | 1400W | 1000W | 700W |
The 1.5× jump in FP4 performance matters most for inference workloads. Modern LLMs increasingly use FP4/FP8 quantization, making low-precision throughput the practical performance metric for AI deployment.
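To see why low-precision formats dominate deployment decisions, a back-of-the-envelope calculation of weight memory at different precisions helps. This is an illustrative sketch only: it counts model weights alone and ignores KV cache, activations, and runtime overhead.

```python
# Rough memory-footprint estimate for LLM weights at different precisions.
# Illustrative only: ignores KV cache, activations, and runtime overhead.

def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

for bits, label in [(16, "FP16"), (8, "FP8"), (4, "FP4")]:
    print(f"70B model at {label}: {weight_memory_gb(70, bits):.0f} GB")
# A 70B model needs 140 GB at FP16 (two H100s), but only 35 GB at FP4,
# fitting comfortably on a single B300 with room left for KV cache.
```

This memory arithmetic, not raw FLOPS, is often what decides whether a model fits on one GPU or must be sharded.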
Vera Rubin: The 2026-2027 Platform
Named after astronomer Vera Rubin, this next-generation platform includes:
- Vera Rubin GPU: Successor to Blackwell, expected late 2026
- Vera CPU: NVIDIA's own ARM-based server processor
- NVLink 6: Next-generation interconnect for multi-GPU communication
- CX9 SuperNIC: Advanced networking for AI clusters
```
NVIDIA AI Platform Roadmap:
2023: Hopper (H100/H200)
2024: Blackwell (B200/GB200)
2025: Blackwell Ultra (B300/GB300)
2026: Vera Rubin (R100)
2027: Vera Rubin Ultra
2028: Feynman (announced)

Key: Each generation ~2x inference performance
```
DGX Spark and DGX Station: AI on Your Desk
Huang announced two "personal AI supercomputers":
DGX Spark ($3,999):
- Grace Blackwell processor
- 128 GB unified memory
- Up to 1 PFLOPS FP4 AI performance
- Runs on standard power outlet
- Target: Researchers, developers, small teams
DGX Station:
- Full GB300 Grace Blackwell
- 784 GB memory
- 20 PFLOPS FP4 performance
- Liquid cooled desktop form factor
- Target: AI labs, enterprises, serious research
```
# DGX Spark can run frontier models locally:
# - Llama 4 Scout (109B MoE): runs at ~30 tokens/sec
# - DeepSeek R1 (671B MoE): runs at ~10 tokens/sec
# - Mistral Large (123B): runs at ~25 tokens/sec
# - Any model under 128B parameters at full precision
```
This is transformative: for $4,000, researchers get hardware that was available only in datacenters a year ago.
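A quick feasibility check makes the 128 GB unified-memory budget concrete. The sketch below assumes 4-bit quantized weights plus a fixed headroom margin for KV cache and runtime; the overhead figure is an assumption for illustration, not an NVIDIA specification.

```python
# Sketch: will a model fit in DGX Spark's 128 GB unified memory?
# Assumes 4-bit weights; OVERHEAD_GB is an assumed headroom margin
# for KV cache, activations, and the OS, not an official figure.

SPARK_MEMORY_GB = 128
OVERHEAD_GB = 16

def fits_on_spark(params_billions: float, bits: int = 4) -> bool:
    weights_gb = params_billions * bits / 8  # 1B params at 8 bits = 1 GB
    return weights_gb + OVERHEAD_GB <= SPARK_MEMORY_GB

print(fits_on_spark(109))  # Llama 4 Scout: 54.5 GB weights + headroom -> fits
print(fits_on_spark(671))  # DeepSeek R1: 335.5 GB of weights alone -> would
                           # need further compression or offloading to run
```

For MoE models like DeepSeek R1, active-parameter counts and weight offloading change the picture, which is how such models can still run, albeit at reduced throughput.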
NVIDIA Dynamo: AI Inference OS
NVIDIA announced Dynamo—an open-source inference serving framework:
```
# NVIDIA Dynamo simplifies large model deployment
# Handles: tensor parallelism, pipeline parallelism,
# KV-cache management, request scheduling

# Deploy a model with automatic optimization:
dynamo serve --model meta-llama/Llama-4-Scout \
  --tensor-parallel 2 \
  --max-batch-size 256 \
  --kv-cache-quantization fp8
```
Key features:
- Disaggregated serving: Separate prefill and decode phases for efficiency
- Smart routing: Direct requests to optimal GPU based on KV-cache state
- Elastic scaling: Auto-scale GPU allocation based on demand
- Multi-model: Serve multiple models on the same GPU cluster
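The idea behind KV-cache-aware "smart routing" can be sketched in a few lines: send each request to the worker whose cached token prefix overlaps the new prompt most, so the least prefill work is redone. The worker names and scoring heuristic below are illustrative assumptions, not Dynamo's actual API.

```python
# Hypothetical sketch of KV-cache-aware routing (the concept behind
# Dynamo's smart routing, not its real interface): pick the worker
# that can reuse the longest cached prompt prefix.

def common_prefix_len(a: list[int], b: list[int]) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def route(prompt_tokens: list[int], worker_caches: dict[str, list[int]]) -> str:
    # Choose the worker with the longest reusable cached prefix.
    return max(
        worker_caches,
        key=lambda w: common_prefix_len(prompt_tokens, worker_caches[w]),
    )

caches = {"gpu-0": [1, 2, 3, 4], "gpu-1": [1, 2, 9]}
print(route([1, 2, 3, 5, 6], caches))  # gpu-0: reuses 3 cached tokens vs 2
```

In a real system the router would also weigh load and memory pressure, but prefix reuse is the core signal that makes cache-aware routing pay off for chat workloads with shared system prompts.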
Isaac GR00T: Physical AI
Huang dedicated significant time to robotics with the Isaac GR00T platform:
- Foundation model for humanoid robots
- Simulation-to-real transfer via Omniverse
- Partners: Figure AI, Agility Robotics, Boston Dynamics, and 10+ companies
- Goal: Robots that learn from watching humans, not programming
The vision: every factory, warehouse, and eventually home has AI-powered robots, all running NVIDIA's software stack on NVIDIA's hardware.
NIM and NEMO: AI Microservices
NVIDIA's software ecosystem continues to expand:
| Service | Function |
|---|---|
| NIM (NVIDIA Inference Microservices) | Pre-optimized model containers |
| NEMO | Framework for custom model training |
| Guardrails | Safety and compliance for AI apps |
| Riva | Speech AI (ASR + TTS) |
| BioNeMo | Drug discovery models |
| cuOpt | Logistics optimization |
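NIM containers expose an OpenAI-compatible HTTP API, so a locally deployed microservice can be queried with a plain POST. The endpoint URL and model name below are assumptions for illustration; the actual call is commented out since it requires a running NIM instance.

```python
# Hedged sketch: querying a locally deployed NIM via its
# OpenAI-compatible endpoint. URL and model name are illustrative.
import json
import urllib.request

payload = {
    "model": "meta/llama-3.1-8b-instruct",
    "messages": [{"role": "user", "content": "Summarize GTC 2025 in one line."}],
}
req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# Requires a running NIM container on localhost:8000:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the interface mirrors OpenAI's, existing client code can usually be pointed at a NIM endpoint by changing only the base URL.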
Financial Context
NVIDIA's AI dominance is reflected in its financials:
- Data center revenue: $35.6B (Q4 FY2025), up 93% YoY
- Market cap: ~$3.5 trillion (world's most valuable company)
- AI training market share: ~95% of AI training runs on NVIDIA GPUs
- Inference market share: ~80% and growing
The Competitive Landscape
| Competitor | Product | Status |
|---|---|---|
| AMD | MI325X / MI400 | Gaining share, especially in inference |
| Intel | Gaudi 3 | Struggling, limited traction |
| Google | TPU v6 (Trillium) | Strong for internal use, limited external availability |
| AWS | Trainium 2/3 | Growing for AWS-native workloads |
| Cerebras | WSE-3 | Niche but innovative (wafer-scale) |
| Groq | LPU Inference | Fastest inference, limited availability |
Despite growing competition, NVIDIA's CUDA ecosystem, software stack (NIM, NEMO, TensorRT), and supply chain relationships create a formidable moat.
Impact on AI Development
GTC 2025's message is clear: AI infrastructure is scaling from datacenter to desktop. With DGX Spark at $4K, AI development is no longer gated by access to cloud GPUs. Combined with open-source models like Llama 4 and DeepSeek R1, the barrier to AI development continues to fall.
Sources: NVIDIA GTC 2025, NVIDIA Blog, NVIDIA Developer


