
Chip manufacturer AMD and cloud service provider Oracle jointly announced at the Oracle AI World conference that they will expand their long-term, multi-generation collaboration to help customers scale their AI capabilities and deployment plans. Building on years of joint innovation, Oracle Cloud Infrastructure (OCI) will be the first partner to deliver a public AI supercluster powered by AMD Instinct MI450 series GPUs, with an initial deployment of 50,000 GPUs in Q3 2026 and continued expansion planned for 2027 and beyond.
AMD stated that this collaboration builds on the two companies' prior work to bring the AMD Instinct GPU platform to end customers on OCI, starting with the launch of compute shapes based on the AMD Instinct MI300X in 2024 and now extending to the general availability of OCI Compute with AMD Instinct MI355X GPUs. These services will be offered in zettascale OCI superclusters.
As new generations of AI models continue to outgrow existing AI clusters, market demand for large-scale AI compute capacity is accelerating. To train and run these workloads, customers need flexible, open compute solutions with extreme scale and efficiency. The new AI supercluster planned by OCI will be powered by AMD's "Helios" rack design, combining AMD Instinct MI450 series GPUs, next-generation AMD EPYC CPUs code-named "Venice", and next-generation AMD Pensando advanced networking technology code-named "Vulcano". This vertically optimized rack-scale architecture is designed to deliver maximum performance, scalability, and energy efficiency for large-scale AI training and inference.
Mahesh Thiagarajan, executive vice president of Oracle Cloud Infrastructure, said, “Our customers are building the world’s most forward-looking AI applications and require powerful, scalable, and high-performance infrastructure. By combining AMD’s latest processor innovations, OCI’s secure and flexible platform, and advanced networking technologies powered by Oracle Acceleron, customers can push the envelope with confidence. Through our decade-long collaboration with AMD, from EPYC to AMD Instinct accelerators, we will continue to work together to provide the most cost-effective, open, secure, and scalable cloud foundation for the AI era and meet customers’ needs for new-era AI.”
Forrest Norrod, AMD executive vice president and general manager of the Data Center Solutions Group, pointed out that AMD and Oracle continue to lead the innovation pace of cloud AI. With AMD Instinct GPUs, EPYC CPUs, and advanced AMD Pensando networking technology, Oracle customers gain powerful new capabilities for training, fine-tuning, and deploying next-generation AI applications. AMD and Oracle are working together to accelerate AI with an open, optimized and secure system built for large-scale AI data centers.
AMD emphasized that compute shapes based on Instinct MI450 series GPUs are designed to offer high-performance, flexible cloud deployment options with broad open-source support, giving customers an ideal foundation for running today's most advanced language models, generative AI, and high-performance computing (HPC) workloads.
With AMD Instinct MI450 series GPUs on OCI, customers will benefit from:
1. Breakthrough compute and memory technology: Increased memory capacity and bandwidth for AI training help customers deliver results faster, tackle more complex workloads, and reduce the need for model partitioning. Each AMD Instinct MI450 series GPU will provide up to 432 GB of HBM4 memory and 20 TB/s of memory bandwidth, allowing customers to train and run inference on models up to 50% larger than the previous generation entirely in memory.
2. AMD’s optimized “Helios” rack design: A high-density, liquid-cooled, 72-GPU rack design lets customers optimize performance density, cost, and energy efficiency in large-scale deployments. The AMD “Helios” rack integrates UALoE scale-up connectivity with Ethernet-based scale-out networking built on Ultra Ethernet Consortium (UEC) standards to minimize latency and maximize throughput within and across racks.
3. Powerful front-end nodes: The new-generation AMD EPYC CPU code-named “Venice” enhances job orchestration and data processing, helping customers maximize cluster utilization and simplify large-scale workflows. In addition, EPYC CPUs will provide confidential computing capabilities and built-in security features to protect sensitive AI workloads end-to-end.
4. DPU-accelerated converged networking: Improve large-scale AI and cloud infrastructure performance and strengthen security posture with wire-speed data ingestion. Based on fully programmable AMD Pensando DPU technology, DPU-accelerated converged networking delivers the security and performance data centers need to run next-generation AI training, inference and cloud workloads.
5. AI scale-out network: Allows customers to take advantage of ultra-high-speed distributed training and optimized collective communications with a future-proof open network architecture. Each GPU can be equipped with up to three 800 Gbps AMD Pensando “Vulcano” AI Network Cards (AI-NICs), providing customers with lossless, high-speed, programmable connectivity supporting advanced RoCE and UEC standards.
6. Innovative UALink and UALoE fabric: Helps customers scale workloads effectively, alleviate memory bottlenecks, and orchestrate large multi-trillion-parameter models. This scalable architecture minimizes hops and latency without routing traffic through CPUs, and enables direct, hardware-coherent networking and memory sharing between GPUs within a rack via the UALink protocol tunneled over the UALoE fabric. UALink is an open, high-speed interconnect standard designed specifically for AI accelerators and backed by a broad industry ecosystem. As a result, customers get the flexibility, scalability, and reliability they need to run the most demanding AI workloads on infrastructure built on open standards.
7. Open source AMD ROCm software stack: By providing customers with an open and flexible programming environment that covers mainstream frameworks, libraries, compilers, and execution environments, it enables rapid innovation, provides freedom of vendor choice, and simplifies the migration of existing AI and HPC workloads.
8. Advanced partitioning and virtualization: Through fine-grained GPU and cluster partitioning, SR-IOV virtualization technology, and powerful multi-tenant capabilities, customers can allocate GPUs according to workload needs and share cluster resources safely and efficiently.
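As a back-of-envelope illustration of the headline figures above (432 GB of HBM4 and 20 TB/s of memory bandwidth per GPU, and up to three 800 Gbps "Vulcano" AI-NICs per GPU), the sketch below estimates how large a model could fit in a single GPU's memory and the resulting aggregate scale-out bandwidth. The capacity, bandwidth, and NIC numbers come from this announcement; the FP8 weight precision and the 20% memory-overhead reserve are illustrative assumptions, not AMD or Oracle figures.

```python
# Back-of-envelope sizing from the figures in this announcement.
# Assumptions (NOT from the announcement): FP8 weights (1 byte/param)
# and ~20% of HBM reserved for activations, KV cache, and runtime overhead.

HBM_PER_GPU_GB = 432       # MI450-series HBM4 capacity (announced)
HBM_BW_TBPS = 20           # memory bandwidth in TB/s (announced)
NICS_PER_GPU = 3           # "Vulcano" AI-NICs per GPU (announced)
NIC_SPEED_GBPS = 800       # per-NIC line rate in Gbit/s (announced)

BYTES_PER_PARAM = 1        # FP8 weight precision (assumption)
OVERHEAD_FRACTION = 0.20   # HBM kept free for non-weight data (assumption)

# Memory left for model weights on one GPU, and the parameter count
# that fits at 1 byte per parameter (GB of weights == billions of params).
usable_hbm_gb = HBM_PER_GPU_GB * (1 - OVERHEAD_FRACTION)
max_params_billion = usable_hbm_gb / BYTES_PER_PARAM

# Bandwidth-bound decode ceiling: each generated token must stream the
# weights through the memory system once, so tokens/s <= BW / weight bytes.
bw_bound_tokens_s = (HBM_BW_TBPS * 1000) / usable_hbm_gb

# Aggregate scale-out bandwidth per GPU (Gbit/s -> GByte/s).
scale_out_gb_s = NICS_PER_GPU * NIC_SPEED_GBPS / 8

print(f"Usable HBM per GPU:        {usable_hbm_gb:.1f} GB")
print(f"Max FP8 params per GPU:    ~{max_params_billion:.0f} B parameters")
print(f"BW-bound decode ceiling:   ~{bw_bound_tokens_s:.0f} tokens/s")
print(f"Scale-out BW per GPU:      {scale_out_gb_s:.0f} GB/s")
```

Under these assumptions, a single MI450-series GPU could hold a model of roughly 345 billion FP8 parameters in memory, with about 300 GB/s of scale-out network bandwidth per GPU; real capacity and throughput will vary with precision, runtime overhead, and workload.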
AMD also noted that, to give customers more options for building, training, and running inference on AI at scale, OCI simultaneously announced the general availability of OCI Compute with AMD Instinct MI355X GPUs. These services will be available in zettascale OCI superclusters that scale up to 131,072 GPUs. The AMD Instinct MI355X compute shapes offer strong price-performance, cloud deployment flexibility, and open-source compatibility.