Imagine harnessing the power of 72 cutting-edge NVIDIA Blackwell GPUs in a single system for the next wave of AI innovation, unlocking 360 petaflops of dense 8-bit floating point (FP8) compute and 1.4 exaflops of sparse 4-bit floating point (FP4) compute. Today, that's exactly what Amazon SageMaker HyperPod delivers with the launch of support for P6e-GB200 UltraServers. Accelerated by NVIDIA GB200 NVL72, P6e-GB200 UltraServers offer industry-leading GPU performance, network throughput, and memory for developing and deploying trillion-parameter AI models at scale. By seamlessly integrating these UltraServers with the distributed training environment of SageMaker HyperPod, organizations can rapidly scale model development, reduce downtime, and simplify the transition from training to large-scale deployment. With the automated, resilient, and highly scalable machine learning infrastructure of SageMaker HyperPod, organizations can seamlessly distribute massive AI workloads across thousands of accelerators and manage model development end-to-end with unprecedented efficiency. Using SageMaker HyperPod with P6e-GB200 UltraServers marks a pivotal shift toward faster, more resilient, and cost-effective training and deployment of state-of-the-art generative AI models.
In this post, we review the technical specifications of P6e-GB200 UltraServers, discuss their performance benefits, and highlight key use cases. We then walk through how to purchase UltraServer capacity through flexible training plans and get started using UltraServers with SageMaker HyperPod.
Inside the UltraServer
P6e-GB200 UltraServers are accelerated by NVIDIA GB200 NVL72, connecting 36 NVIDIA Grace™ CPUs and 72 Blackwell GPUs in the same NVIDIA NVLink™ domain. Each ml.p6e-gb200.36xlarge compute node within an UltraServer includes two NVIDIA GB200 Grace Blackwell Superchips, each connecting two high-performance NVIDIA Blackwell GPUs and an Arm-based NVIDIA Grace CPU with the NVIDIA NVLink chip-to-chip (C2C) interconnect. SageMaker HyperPod is launching P6e-GB200 UltraServers in two sizes. The ml.u-p6e-gb200x36 UltraServer includes a rack of 9 compute nodes fully connected with NVSwitch (NVS), providing a total of 36 Blackwell GPUs in the same NVLink domain, and the ml.u-p6e-gb200x72 UltraServer includes a rack-pair of 18 compute nodes with a total of 72 Blackwell GPUs in the same NVLink domain. The following diagram illustrates this configuration.
Performance benefits of UltraServers
In this section, we discuss some of the performance benefits of UltraServers.
GPU and compute power
With P6e-GB200 UltraServers, you can access up to 72 NVIDIA Blackwell GPUs within a single NVLink domain, with a total of 360 petaflops of FP8 compute (without sparsity), 1.4 exaflops of FP4 compute (with sparsity), and 13.4 TB of high-bandwidth memory (HBM3e). Each Grace Blackwell Superchip pairs two Blackwell GPUs with one Grace CPU through the NVLink-C2C interconnect, delivering 10 petaflops of dense FP8 compute, 40 petaflops of sparse FP4 compute, up to 372 GB of HBM3e, and 850 GB of cache-coherent fast memory per module. This co-location boosts bandwidth between GPU and CPU by an order of magnitude compared to previous-generation instances. Each NVIDIA Blackwell GPU features a second-generation Transformer Engine and supports the latest AI precision microscaling (MX) data formats such as MXFP6 and MXFP4, as well as NVIDIA NVFP4. When combined with frameworks like NVIDIA Dynamo, NVIDIA TensorRT-LLM, and NVIDIA NeMo, these Transformer Engines significantly accelerate inference and training for large language models (LLMs) and Mixture-of-Experts (MoE) models, supporting greater efficiency and performance for modern AI workloads.
High-performance networking
P6e-GB200 UltraServers deliver up to 130 TBps of low-latency NVLink bandwidth between GPUs for efficient large-scale AI workload communication. At double the bandwidth of its predecessor, fifth-generation NVIDIA NVLink provides up to 1.8 TBps of bidirectional, direct GPU-to-GPU interconnect, greatly enhancing intra-server communication. Each compute node within an UltraServer can be configured with up to 17 physical network interface cards (NICs), each supporting up to 400 Gbps of bandwidth. P6e-GB200 UltraServers provide up to 28.8 Tbps of total Elastic Fabric Adapter (EFA) v4 networking, using the Scalable Reliable Datagram (SRD) protocol to intelligently route network traffic across multiple paths, providing smooth operation even during congestion or hardware failures. For more information, refer to EFA configuration for P6e-GB200 instances.
Storage and data throughput
P6e-GB200 UltraServers support up to 405 TB of local NVMe SSD storage, ideal for large-scale datasets and fast checkpointing during AI model training. For high-performance shared storage, Amazon FSx for Lustre file systems can be accessed over EFA with GPUDirect Storage (GDS), providing direct data transfer between the file system and GPU memory with TBps of throughput and millions of input/output operations per second (IOPS) for demanding AI training and inference workloads.
Topology-aware scheduling
Amazon Elastic Compute Cloud (Amazon EC2) provides topology information that describes the physical and network relationships between instances in your cluster. For UltraServer compute nodes, Amazon EC2 exposes which instances belong to the same UltraServer, so your training and inference algorithms can understand NVLink connectivity patterns. This topology information helps optimize distributed training by allowing frameworks like the NVIDIA Collective Communications Library (NCCL) to make intelligent decisions about communication patterns and data placement. For more information, see How Amazon EC2 instance topology works.
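As an illustration, you can retrieve this topology information with the EC2 DescribeInstanceTopology API. The following is a minimal boto3 sketch (the instance IDs and Region are placeholders); instances that report the same network nodes sit closer together in the network fabric:

```python
import boto3

# Minimal sketch: query the network topology of two hypothetical
# UltraServer compute nodes (replace the instance IDs with your own).
ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.describe_instance_topology(
    InstanceIds=["i-0123456789abcdef0", "i-0fedcba9876543210"]
)

for instance in response["Instances"]:
    # NetworkNodes lists the network layers above each instance, from the
    # highest layer down; instances sharing deeper nodes are closer together.
    print(instance["InstanceId"], instance["NetworkNodes"])
```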
With Amazon Elastic Kubernetes Service (Amazon EKS) orchestration, SageMaker HyperPod automatically labels UltraServer compute nodes with their respective AWS Region, Availability Zone, network node layers (1–4), and UltraServer ID. These topology labels can be used with node affinities and pod topology spread constraints to assign Pods to cluster nodes for optimal performance.
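For example, the following minimal sketch uses the official Kubernetes Python client to print the topology-related labels on each node; the substring filter is an illustrative assumption, so verify the exact label keys on your own cluster (for example, with kubectl get nodes --show-labels):

```python
from kubernetes import client, config

# Minimal sketch: inspect the topology labels that SageMaker HyperPod
# applies to UltraServer compute nodes in an EKS-orchestrated cluster.
config.load_kube_config()
v1 = client.CoreV1Api()

for node in v1.list_node().items:
    labels = node.metadata.labels or {}
    # Keep only topology-related labels; this substring filter is an
    # illustrative assumption -- check the actual keys on your cluster.
    topology_labels = {
        key: value
        for key, value in labels.items()
        if "topology" in key or "ultraserver" in key.lower()
    }
    print(node.metadata.name, topology_labels)
```

You can then reference these labels in node affinity rules or pod topology spread constraints so that tightly coupled Pods land on nodes within the same UltraServer.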
With Slurm orchestration, SageMaker HyperPod automatically enables the topology plugin and creates a topology.conf file with the BlockName, Nodes, and BlockSizes values that match your UltraServer capacity, as sketched below. This way, you can group and segment your compute nodes to optimize job performance.
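To make this concrete, the generated file for a single ml.u-p6e-gb200x72 UltraServer might look something like the following sketch; the block name and hostnames are hypothetical, so inspect the topology.conf that SageMaker HyperPod writes on your controller node for the actual values:

```
# Hypothetical topology.conf for one ml.u-p6e-gb200x72 UltraServer
# (18 ml.p6e-gb200.36xlarge compute nodes in one NVLink domain)
BlockName=ultraserver1 Nodes=ip-10-1-0-[1-18]
BlockSizes=18
```

Slurm then prefers to place a job's nodes within a single block, keeping its GPUs inside one NVLink domain.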
Use cases for UltraServers
P6e-GB200 UltraServers can efficiently train models with over a trillion parameters thanks to their unified NVLink domain, ultrafast memory, and high cross-node bandwidth, making them ideal for state-of-the-art AI development. The substantial interconnect bandwidth means even extremely large models can be partitioned and trained in a highly parallel and efficient manner without the performance setbacks seen in disjointed multi-node systems. This results in faster iteration cycles and higher-quality AI models, helping organizations push the boundaries of state-of-the-art AI research and innovation.
For real-time trillion-parameter model inference, P6e-GB200 UltraServers enable 30 times faster inference on frontier trillion-parameter LLMs compared to prior platforms, achieving real-time performance for complex models used in generative AI, natural language understanding, and conversational agents. When paired with NVIDIA Dynamo, P6e-GB200 UltraServers deliver significant performance gains, especially for long context lengths. NVIDIA Dynamo disaggregates the compute-heavy prefill phase and the memory-heavy decode phase onto different GPUs, supporting independent optimization and resource allocation within the large 72-GPU NVLink domain. This enables more efficient management of large context windows and high-concurrency applications.
P6e-GB200 UltraServers offer substantial benefits to startup, research, and enterprise customers with multiple teams that need to run diverse distributed training and inference workloads on shared infrastructure. When used together with SageMaker HyperPod task governance, UltraServers provide exceptional scalability and resource pooling, so different teams can launch simultaneous jobs without bottlenecks. Enterprises can maximize infrastructure utilization, reduce overall costs, and accelerate project timelines, all while supporting the complex needs of teams developing and serving advanced AI models, including massive LLMs for high-concurrency real-time inference, on a single, resilient platform.
Flexible training plans for UltraServer capacity
SageMaker AI currently offers P6e-GB200 UltraServer capacity through flexible training plans in the Dallas AWS Local Zone (us-east-1-dfw-2a). UltraServers can be used for both SageMaker HyperPod and SageMaker training jobs.
To get started, navigate to the SageMaker AI training plans console, which includes a new UltraServer compute type, from which you can select your UltraServer type: ml.u-p6e-gb200x36 (containing 9 ml.p6e-gb200.36xlarge compute nodes) or ml.u-p6e-gb200x72 (containing 18 ml.p6e-gb200.36xlarge compute nodes).
After finding the training plan that fits your needs, we recommend that you configure at least one spare ml.p6e-gb200.36xlarge compute node so that faulty instances can be quickly replaced with minimal disruption.
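If you prefer to work programmatically, you can search for and reserve the same capacity with the AWS SDK. The following is a minimal boto3 sketch under stated assumptions: the duration, instance count, and plan name are placeholders, and UltraServer-specific parameters may differ from what is shown, so consult the SageMaker AI API reference before relying on it:

```python
import boto3

# Minimal sketch: search flexible training plan offerings for UltraServer
# compute nodes, then reserve one. All values below are illustrative.
sm = boto3.client("sagemaker", region_name="us-east-1")

offerings = sm.search_training_plan_offerings(
    InstanceType="ml.p6e-gb200.36xlarge",  # UltraServer compute node type
    InstanceCount=9,                       # e.g., one ml.u-p6e-gb200x36
    DurationHours=720,                     # a 30-day plan, as an example
    TargetResources=["hyperpod-cluster"],
)

# Reserve the first matching offering (review pricing and dates first).
offering_id = offerings["TrainingPlanOfferings"][0]["TrainingPlanOfferingId"]
sm.create_training_plan(
    TrainingPlanName="my-ultraserver-plan",  # hypothetical name
    TrainingPlanOfferingId=offering_id,
)
```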
Create an UltraServer cluster with SageMaker HyperPod
After purchasing an UltraServer training plan, you can add the capacity to an ml.p6e-gb200.36xlarge type instance group within your SageMaker HyperPod cluster and specify the number of instances that you want to provision, up to the number available within the training plan. For example, if you purchased a training plan for one ml.u-p6e-gb200x36 UltraServer, you could provision up to 9 compute nodes, whereas if you purchased a training plan for one ml.u-p6e-gb200x72 UltraServer, you could provision up to 18 compute nodes.
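As a sketch of what this looks like with the AWS SDK, the following boto3 call adds a hypothetical UltraServer instance group to an existing cluster; the cluster name, ARNs, S3 path, and group name are placeholders, and UpdateCluster expects your full set of instance groups, so merge this entry into your existing configuration:

```python
import boto3

# Minimal sketch: attach UltraServer training plan capacity to a HyperPod
# cluster as a new instance group. All names and ARNs are placeholders.
sm = boto3.client("sagemaker", region_name="us-east-1")

sm.update_cluster(
    ClusterName="my-hyperpod-cluster",
    InstanceGroups=[
        {
            "InstanceGroupName": "ultraserver-workers",
            "InstanceType": "ml.p6e-gb200.36xlarge",
            "InstanceCount": 9,  # up to the capacity in your training plan
            "LifeCycleConfig": {
                "SourceS3Uri": "s3://my-bucket/lifecycle-scripts/",
                "OnCreate": "on_create.sh",
            },
            "ExecutionRole": "arn:aws:iam::111122223333:role/HyperPodExecutionRole",
            "TrainingPlanArn": "arn:aws:sagemaker:us-east-1:111122223333:training-plan/my-ultraserver-plan",
        }
    ],
)
```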
By default, SageMaker optimizes the placement of instance group nodes within the same UltraServer so that GPUs across nodes are interconnected within the same NVLink domain, achieving the best data transfer performance for your jobs. For example, if you purchase two ml.u-p6e-gb200x72 UltraServers with 17 compute nodes available each (assuming you configured two spares) and then create an instance group with 24 nodes, the first 17 compute nodes will be placed on UltraServer A, and the other 7 compute nodes will be placed on UltraServer B.
Conclusion
P6e-GB200 UltraServers help organizations train, fine-tune, and serve the world's most ambitious AI models at scale. By combining extraordinary GPU resources, ultrafast networking, and industry-leading memory with the automation and scalability of SageMaker HyperPod, enterprises can accelerate the different stages of the AI lifecycle, from experimentation and distributed training through seamless inference and deployment. This powerful solution breaks new ground in performance and flexibility while reducing operational complexity and costs, so that innovators can unlock new possibilities and lead the next era of AI advancement.
About the authors
Nathan Arnold is a Senior AI/ML Specialist Solutions Architect at AWS based out of Austin, Texas. He helps AWS customers, from small startups to large enterprises, train and deploy foundation models efficiently on AWS. When he's not working with customers, he enjoys hiking, trail running, and playing with his dogs.