
NVIDIA Unveils Next-Generation H100 GPUs, Its Most Powerful Chips Yet

March 28, 2022


To power the next wave of AI data centers, NVIDIA today announced its next-generation accelerated computing platform, built on the NVIDIA Hopper architecture, which delivers an order-of-magnitude performance leap over its predecessor.

The new architecture is named after Grace Hopper, a pioneering U.S. computer scientist, and succeeds the NVIDIA Ampere architecture, which was introduced two years ago.

The company also announced its first Hopper-based GPU, the NVIDIA H100, which features 80 billion transistors. The world’s largest and most powerful accelerator, the H100 is packed with groundbreaking features such as a Transformer Engine and a highly scalable NVIDIA NVLink® interconnect for processing gigantic AI language models, deep recommender systems, genomics, and complex digital twins.

“Data centers are becoming AI factories, processing and refining mountains of data to produce intelligence,” said Jensen Huang, founder and CEO of NVIDIA. “NVIDIA H100 is the world’s AI infrastructure engine that companies use to accelerate their AI-powered businesses.”

H100 Technological Innovations

The NVIDIA H100 GPU sets a new standard for accelerating AI and HPC at scale, with six breakthrough innovations:

The world’s most advanced chip: H100 is built with 80 billion transistors on a cutting-edge TSMC 4N process tailored to NVIDIA’s accelerated computing needs. It offers major advances in AI, HPC, memory bandwidth, interconnect, and communication, including 5 terabytes per second of external connectivity. H100 is the first GPU to support PCIe Gen5 and the first to use HBM3, enabling 3TB/s of memory bandwidth. Twenty H100 GPUs can sustain the equivalent of the entire world’s internet traffic, enabling customers to deliver advanced recommender systems and large language models that run inference on real-time data.
New Transformer Engine: The Transformer is one of the most important deep learning models ever invented and is now the standard model for natural language processing. The H100’s Transformer Engine is built to speed up these networks by up to 6 times compared to the previous generation, without losing accuracy.
2nd-Generation Secure Multi-Instance GPU: MIG technology allows a single GPU to be partitioned into seven smaller, fully isolated instances to handle different types of jobs. The Hopper architecture extends MIG capabilities by up to 7 times compared to the previous generation, offering secure multi-tenant configurations in cloud environments across each GPU instance.
Confidential Computing: H100 is the world’s first accelerator with confidential computing capabilities to protect AI models and customer data as they are processed. Customers can also apply confidential computing to federated learning for industries where privacy is important, such as healthcare and financial services, as well as shared cloud infrastructures.
4th-Generation NVIDIA NVLink: To accelerate the largest AI models, NVLink is combined with a new external NVLink Switch to extend NVLink as a scale-up network beyond the server. This connects up to 256 H100 GPUs at 9 times the bandwidth of the previous generation, which used NVIDIA HDR Quantum InfiniBand.
DPX Instructions: New DPX instructions speed up dynamic programming, which is used in a wide range of algorithms including route optimization and genomics, by up to 40 times compared to CPUs and up to 7 times compared to previous-generation GPUs. Examples include the Floyd-Warshall algorithm, which finds optimal routes for fleets of autonomous robots in dynamic warehouse environments, and the Smith-Waterman algorithm, which is used in sequence alignment for DNA and protein classification and folding.
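As an illustration of the dynamic programming mentioned under DPX Instructions, here is a minimal sketch of the Floyd-Warshall algorithm in plain Python. It does not use DPX or any GPU feature; it only shows the repeated "add, then take the minimum" update at the heart of this class of algorithm. The `warehouse` graph and its travel costs are hypothetical values made up for the example.

```python
import math

INF = math.inf  # marks "no direct route" between two locations


def floyd_warshall(weights):
    """Return all-pairs shortest path costs for a square cost matrix.

    weights[i][j] is the direct cost from node i to node j (INF if none).
    """
    n = len(weights)
    dist = [row[:] for row in weights]  # copy so the input is not modified
    for k in range(n):          # allow node k as an intermediate stop
        for i in range(n):
            for j in range(n):
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:   # the add-then-min DP update
                    dist[i][j] = via_k
    return dist


# Hypothetical 4-location warehouse with one-way travel costs.
warehouse = [
    [0,   4,   INF, 10],
    [INF, 0,   2,   INF],
    [INF, INF, 0,   3],
    [1,   INF, INF, 0],
]

shortest = floyd_warshall(warehouse)
print(shortest[0][3])  # cheapest cost from location 0 to location 3
```

The triple loop performs the same small update for every pair of nodes, which is why this kind of workload is a natural target for hardware acceleration.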

The H100’s combined technology innovations extend NVIDIA’s leadership in AI training and inference to enable real-time, immersive applications using giant-scale AI models. The H100 will enable chatbots to use the world’s most powerful monolithic transformer language model, Megatron 530B, with up to 30 times higher throughput than the previous generation, while meeting the subsecond latency required for real-time conversational AI. H100 also lets researchers and developers train massive models such as Mixture of Experts, with 395 billion parameters, up to 9 times faster, reducing training time from weeks to days.

Broad Adoption of NVIDIA H100

NVIDIA H100 can be deployed in every type of data center, including on-premises, cloud, hybrid-cloud, and edge. It is expected to be available worldwide later this year from leading cloud service providers and computer makers, as well as directly from NVIDIA.

DGX H100, NVIDIA’s fourth-generation DGX system, features eight H100 GPUs to deliver 32 petaflops of AI performance at the new FP8 precision, providing the scale to meet the massive compute requirements of large language models, recommender systems, healthcare research, and climate science.

Each GPU in DGX H100 systems is connected by fourth-generation NVLink, providing 900 GB/s of connectivity, 1.5 times more than the previous generation. NVSwitch allows all eight H100 GPUs to connect over NVLink. A new external NVLink Switch can network up to 32 DGX H100 nodes in next-generation NVIDIA DGX SuperPOD supercomputers.
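The figures quoted for DGX H100 can be cross-checked with simple arithmetic. The short sketch below uses only numbers stated in this article; the derived values are plain divisions and multiplications for illustration, not official per-GPU specifications, and the variable names are chosen for this example.

```python
# Figures as stated in the article.
gpus_per_node = 8          # H100 GPUs in one DGX H100
node_fp8_petaflops = 32    # AI performance of one DGX H100 at FP8
nvlink_gb_s = 900          # fourth-generation NVLink connectivity per GPU
nvlink_gain = 1.5          # stated improvement over the previous generation
max_nodes = 32             # DGX H100 nodes joined by the external NVLink Switch

# Derived values (illustrative arithmetic only).
per_gpu_fp8_petaflops = node_fp8_petaflops / gpus_per_node
previous_nvlink_gb_s = nvlink_gb_s / nvlink_gain
max_connected_gpus = max_nodes * gpus_per_node

print(per_gpu_fp8_petaflops)   # implied FP8 petaflops per GPU
print(previous_nvlink_gb_s)    # implied previous-generation NVLink GB/s
print(max_connected_gpus)      # GPUs reachable through the NVLink Switch
```

The last value works out to 256, which matches the "up to 256 H100 GPUs" figure given for fourth-generation NVLink.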

Hopper has broad industry support from leading cloud service providers Alibaba Cloud, Amazon Web Services, Baidu AI Cloud, Google Cloud, Microsoft Azure, Oracle Cloud, and Tencent Cloud, which plan to offer H100-based instances.

A wide range of H100-based servers is expected from the world’s leading system manufacturers, including Atos, BOXX Technologies, Cisco, Dell Technologies, Fujitsu, GIGABYTE, H3C, Hewlett Packard Enterprise, Inspur, Lenovo, Nettrix, and Supermicro.

NVIDIA H100 at Every Scale

H100 is offered in SXM and PCIe form factors to meet a wide range of server design requirements. A converged accelerator will also be available, combining an H100 GPU with an NVIDIA ConnectX®-7 400Gb/s InfiniBand and Ethernet SmartNIC.

NVIDIA H100 SXM is available on HGX H100 server boards in four-way and eight-way configurations for enterprises with applications that scale to multiple GPUs in one server and across multiple servers. HGX H100-based servers deliver the highest application performance for AI training and inference, along with data analytics and HPC applications.

The H100 PCIe, with NVLink to connect two GPUs, provides over 7 times the bandwidth of PCIe 5.0, delivering exceptional performance for applications running on mainstream enterprise servers. Its form factor makes it easy to integrate into existing data center infrastructure.

H100 CNX, a new converged accelerator, combines an H100 with a ConnectX-7 SmartNIC to provide breakthrough performance for I/O-intensive applications such as multi-node AI training in enterprise data centers and 5G signal processing at the edge.

The H100 can also be paired with the NVIDIA Grace CPU over an ultra-fast NVLink®-C2C interconnect for 7 times faster communication between CPU and GPU compared to PCIe 5.0. This combination, the Grace Hopper Superchip, is an integrated module designed to serve large-scale HPC and AI applications.

NVIDIA Software Support

The NVIDIA H100 GPU is supported by powerful software tools that enable developers and enterprises to build and accelerate applications from AI to HPC. These include major updates to the NVIDIA AI software suite for workloads such as speech, recommender systems, and hyperscale inference.

NVIDIA also released more than 60 updates to its CUDA-X collection of libraries, tools, and technologies for developing accelerated applications in quantum computing, 6G research, cybersecurity, genomics, and drug discovery.

Availability

NVIDIA H100 will be available starting in the third quarter.