VAST Data Unveils AI-Native Inference Platform for NVIDIA’s Next-Gen Agentic AI
The platform introduces a class of AI-native storage infrastructure designed for gigascale inference, built on NVIDIA BlueField-4 DPUs and Spectrum-X Ethernet networking.
VAST Data, the AI Operating System company, has announced a breakthrough inference architecture that powers the NVIDIA Inference Context Memory Storage Platform, marking a new era for long-lived, agentic AI. The platform accelerates access to AI-native key-value (KV) caches, enables high-speed context sharing across nodes, and significantly improves power efficiency.
As AI inference evolves from single-prompt tasks to persistent, multi-turn reasoning across agents, the assumption that context remains local is no longer valid. Performance now hinges on how efficiently inference history can be stored, restored, reused, extended, and shared under sustained load, rather than simply on raw GPU compute power.
To address this, VAST is running its AI Operating System (AI OS) software natively on NVIDIA BlueField-4 DPUs. This embeds critical data services directly into GPU servers where inference occurs, as well as in dedicated data nodes. The architecture eliminates classic client-server contention and unnecessary data copies, reducing time-to-first-token (TTFT) even under high concurrency. Combined with VAST’s parallel Disaggregated Shared-Everything (DASE) design, each host can access a shared, globally coherent context namespace without bottlenecks, enabling seamless access from GPU memory to persistent NVMe storage over RDMA fabrics.
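The benefit of persisting and sharing KV caches can be illustrated with a toy model. The sketch below is purely illustrative and assumes nothing about VAST's or NVIDIA's actual APIs; `KVCacheStore`, the session IDs, and the token counts are hypothetical, and prefill cost is modeled simply as the number of tokens the GPU must re-encode before the first new token is produced:

```python
# Toy illustration of why restoring a saved KV cache cuts time-to-first-token:
# a cache hit means only the new turn needs prefill, not the whole history.
# All names here (KVCacheStore, session IDs) are hypothetical examples.

class KVCacheStore:
    """A toy shared context store keyed by session ID."""
    def __init__(self):
        self._store = {}

    def save(self, session_id, cached_tokens):
        self._store[session_id] = cached_tokens

    def load(self, session_id):
        return self._store.get(session_id)

def next_turn_prefill(store, session_id, history_tokens, new_tokens):
    """Return prefill work (in token-units) for the next conversational turn."""
    if store.load(session_id) is not None:
        # Cache hit: only the new turn's tokens need prefill.
        cost = new_tokens
    else:
        # Cache miss: the full history must be recomputed first.
        cost = history_tokens + new_tokens
    store.save(session_id, history_tokens + new_tokens)
    return cost

store = KVCacheStore()
store.save("agent-1", 8000)  # 8k tokens of prior context already cached

with_cache = next_turn_prefill(store, "agent-1", 8000, 200)
without_cache = next_turn_prefill(KVCacheStore(), "agent-1", 8000, 200)
print(with_cache, without_cache)  # 200 vs 8200 token-units of prefill work
```

In this toy accounting, a restored cache turns an 8,200-token prefill into a 200-token one; at scale, making that cache a fast, shared resource across nodes is what the architecture described above targets.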
“Inference is becoming a memory system, not just a compute job. The winners won’t be the clusters with the most raw compute – they’ll be the ones that can move, share, and govern context at line rate. Continuity is the new performance frontier, and our AI OS on NVIDIA BlueField-4 turns context into fast, shared infrastructure that scales predictably for agentic AI.”
– John Mao, Vice President of Global Technology Alliances, VAST Data
Beyond performance gains, the VAST platform offers AI-native organizations and enterprises deploying NVIDIA AI factories a production-ready approach to inference coordination with security and efficiency. Teams can manage context with policies, isolation, auditability, and lifecycle controls, while keeping KV caches fast and usable as shared resources. This reduces GPU idle time, prevents costly infrastructure rebuilds, and ensures scalable performance as context sizes and session concurrency grow.
“Context is the fuel of thinking. Just like humans write things down to remember, AI agents need to save their work to build on it. Multi-turn and multi-user inferencing fundamentally changes how context memory is managed. VAST Data AI OS with NVIDIA BlueField-4 provides a coherent data plane for sustained throughput and predictable performance at scale.”
– Kevin Deierling, Senior Vice President of Networking, NVIDIA
VAST Data will showcase this AI-native approach at its inaugural user conference, VAST Forward, taking place February 24–26, 2026, in Salt Lake City, Utah. The event will feature technical sessions, hands-on labs, and certification programs for attendees to explore the future of AI and data infrastructure.

