
NVIDIA H100 Tensor Core GPU

Exceptional performance, scalability, and security for every data center.

An order-of-magnitude leap in accelerated computing.

Achieve exceptional performance, scalability, and security for every workload with the NVIDIA H100 Tensor Core GPU. With the NVIDIA® NVLink® Switch System, up to 256 H100 GPUs can be connected to accelerate exascale workloads, while a dedicated Transformer Engine handles trillion-parameter language models. H100's combined technology innovations can speed up large language models by 30X over the previous generation, delivering industry-leading conversational AI.

Are you ready for enterprise AI?

Enterprise AI adoption is going mainstream, and organizations need end-to-end, AI-ready infrastructure to accelerate their move into this new era.

H100 for mainstream servers comes with a five-year subscription to the NVIDIA AI Enterprise software suite, including enterprise support, and simplifies AI adoption while delivering strong performance. It ensures organizations have access to the AI frameworks and tools they need to build H100-accelerated AI workflows such as AI chatbots, recommendation engines, and vision AI.

Securely accelerate workloads from enterprise scale to exascale.

Up to 9X faster AI training on large models

Mixture-of-experts model (395 billion parameters)

Transformational AI training.

H100 features fourth-generation Tensor Cores and a Transformer Engine with FP8 precision that together provide up to 9X faster training over the previous generation for mixture-of-experts (MoE) models. The combination of fourth-generation NVLink, which offers 900 GB/s of GPU-to-GPU interconnect; the NVLink Switch System, which accelerates communication by every GPU across nodes; PCIe 5.0; and NVIDIA Magnum IO™ software delivers efficient scalability from small enterprise systems to massive, unified GPU clusters.
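To make the FP8 training path concrete, here is a minimal sketch using NVIDIA's Transformer Engine library for PyTorch. It assumes the transformer_engine package is installed and a Hopper GPU is available; the layer sizes and scaling recipe are illustrative, not a prescribed configuration.

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Delayed-scaling FP8 recipe: HYBRID uses E4M3 for activations/weights
# and E5M2 for gradients (illustrative default settings).
fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)

layer = te.Linear(4096, 4096, bias=True).cuda()  # drop-in replacement for nn.Linear
x = torch.randn(8, 4096, device="cuda")

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)          # the matmul executes on FP8 Tensor Cores
y.sum().backward()        # backward pass runs outside the autocast context
```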

Deployed at data center scale, H100 GPUs deliver outstanding performance and put the next generation of exascale high-performance computing (HPC) and trillion-parameter AI within easy reach of every researcher.

Real-time deep learning inference.

AI solves a wide range of business challenges using an equally wide range of neural networks. A great AI inference accelerator must deliver not only exceptional performance but also the versatility to accelerate these diverse networks.

H100 extends NVIDIA's market-leading position in inference with several advances that speed up inference by up to 30X and deliver ultra-low latency. Fourth-generation Tensor Cores accelerate all precisions, including FP64, TF32, FP32, FP16, and INT8, and the Transformer Engine combines FP8 and FP16 precision to reduce memory usage and increase performance while still maintaining accuracy for large language models.
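As a small illustration of reduced-precision inference (FP16 here, via PyTorch's autocast; the FP8 path goes through the Transformer Engine shown earlier), a sketch with a hypothetical toy model:

```python
import torch

# Hypothetical small model standing in for a real network.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
).cuda().eval()

x = torch.randn(32, 1024, device="cuda")
with torch.inference_mode(), torch.autocast(device_type="cuda", dtype=torch.float16):
    y = model(x)   # matmuls dispatch to half-precision Tensor Cores
print(y.dtype)     # torch.float16
```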

Up to 30X higher AI inference performance on the largest models.

Megatron chatbot inference (530 billion parameters)


Up to 7X higher performance for HPC applications.

AI-fused HPC applications

Exascale high-performance computing.

The NVIDIA data center platform consistently delivers performance gains that go beyond Moore's law. H100's breakthrough AI capabilities further amplify the power of HPC+AI, accelerating time to discovery for scientists and researchers working on the world's most important challenges.

H100 triples the floating-point operations per second (FLOPS) of its double-precision Tensor Cores, delivering 60 teraFLOPS of FP64 computing for HPC. AI-fused HPC applications can also leverage H100's TF32 precision to achieve one petaFLOP of throughput for single-precision matrix-multiply operations, with no code changes.
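The "no code changes" claim refers to TF32 being applied inside the math libraries rather than in user code. A minimal sketch, assuming PyTorch (where the matmul flag defaults to off in recent versions):

```python
import torch

# Opt FP32 matmuls into TF32 execution on Tensor Cores.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

a = torch.randn(4096, 4096, device="cuda")  # plain FP32 tensors, unchanged code
b = torch.randn(4096, 4096, device="cuda")
c = a @ b   # runs as TF32 on Tensor Cores; inputs and outputs remain FP32
```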

H100 also features DPX instructions, delivering 7X higher performance than the NVIDIA A100 Tensor Core GPU and a 40X speedup over servers with traditional dual-socket CPUs on dynamic programming algorithms such as Smith-Waterman, used for DNA sequence alignment.
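For reference, Smith-Waterman is an ordinary dynamic program; the plain-Python version below (scoring constants are illustrative) shows the fused max/add recurrence pattern that DPX instructions accelerate in hardware:

```python
import numpy as np

def smith_waterman(a: str, b: str, match: int = 2,
                   mismatch: int = -1, gap: int = -2) -> int:
    """Best local-alignment score via the Smith-Waterman dynamic program."""
    H = np.zeros((len(a) + 1, len(b) + 1), dtype=np.int32)
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = H[i - 1, j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Each cell is a max over add results: the DPX-style inner loop.
            H[i, j] = max(0, diag, H[i - 1, j] + gap, H[i, j - 1] + gap)
            best = max(best, H[i, j])
    return best

print(smith_waterman("GATTACA", "GCATGCU"))  # small demo sequences
```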

Accelerated data analytics.

Data analytics often consumes the majority of time in AI application development, because large datasets are scattered across multiple servers and scale-out solutions built from commodity CPU-only servers lack the scalable computing performance to process them.

Accelerated servers with H100 deliver the needed compute power, along with 3 TB/s of memory bandwidth per GPU and scalability through NVLink and NVSwitch, to tackle data analytics with high performance and scale out across massive datasets. Combined with NVIDIA Quantum-2 InfiniBand, Magnum IO software, GPU-accelerated Spark 3.0, and NVIDIA RAPIDS, the NVIDIA data center platform accelerates these huge workloads with exceptional performance and efficiency.
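A minimal sketch of what GPU-accelerated analytics looks like with RAPIDS cuDF, which mirrors the pandas API on the GPU. The file name and column names are hypothetical placeholders:

```python
import cudf

df = cudf.read_parquet("transactions.parquet")       # load directly into GPU memory
by_cust = df.groupby("customer_id")["amount"].sum()  # GPU-accelerated group-by
top = by_cust.sort_values(ascending=False).head(10)  # top spenders
print(top.to_pandas())                               # copy the small result to host
```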

Accelerated servers with H100
NVIDIA Multi-Instance GPU

Increase resource utilization across the enterprise.

IT managers seek to maximize the utilization, both peak and average, of compute resources in the data center. They often apply dynamic reconfiguration of compute to right-size resources for the workloads being run.

Second-generation Multi-Instance GPU (MIG) technology in H100 maximizes the utilization of each GPU by securely partitioning it into as many as seven separate instances. And with confidential computing support, H100 enables secure end-to-end, multi-tenant usage, making it ideal for cloud service provider (CSP) environments.

With MIG-enabled H100, infrastructure managers can standardize their GPU-accelerated infrastructure while retaining the flexibility to provision GPU resources at finer granularity, so that GPU capacity is fully utilized.
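As one way to see MIG from software, here is a sketch using the pynvml (nvidia-ml-py) bindings to enumerate the MIG instances on GPU 0, for example so a scheduler could assign each instance to a tenant. This assumes pynvml is installed and MIG mode has already been enabled by an administrator:

```python
import pynvml

pynvml.nvmlInit()
gpu = pynvml.nvmlDeviceGetHandleByIndex(0)
current, pending = pynvml.nvmlDeviceGetMigMode(gpu)   # MIG enabled/disabled state
print("MIG mode:", current)

count = pynvml.nvmlDeviceGetMaxMigDeviceCount(gpu)    # up to 7 instances on H100
for i in range(count):
    try:
        mig = pynvml.nvmlDeviceGetMigDeviceHandleByIndex(gpu, i)
    except pynvml.NVMLError:
        continue                                      # slot not populated
    print(pynvml.nvmlDeviceGetUUID(mig))              # usable in CUDA_VISIBLE_DEVICES
pynvml.nvmlShutdown()
```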

Built-in confidential computing.

Today's confidential computing solutions are CPU-based, which is far from sufficient for compute-intensive workloads such as AI and HPC. NVIDIA Confidential Computing is a built-in security feature of the NVIDIA Hopper™ architecture that makes H100 NVIDIA's first accelerator with confidential computing capabilities. Users can protect the confidentiality and integrity of their data and applications in use while still benefiting from H100's exceptional acceleration. It creates a hardware-based trusted execution environment (TEE) that secures and isolates the entire workload running on a single H100 GPU, multiple H100 GPUs within a node, or individual MIG instances. GPU-accelerated applications can run unchanged within the TEE and do not have to be partitioned. Users can combine the power of NVIDIA software for AI and HPC with the security of the hardware root of trust provided by NVIDIA Confidential Computing.

NVIDIA Confidential Computing Solutions

Exceptional performance for large-scale AI and HPC.

The Hopper Tensor Core GPU will power the NVIDIA Grace Hopper CPU+GPU architecture, purpose-built for terabyte-scale accelerated computing and delivering 10X higher performance on large-model AI and HPC. The NVIDIA Grace CPU leverages the flexibility of the Arm® architecture to create a CPU and server architecture designed from the ground up for accelerated computing. The Hopper GPU is paired with the Grace CPU via NVIDIA's ultra-fast chip-to-chip interconnect, delivering 900 GB/s of bandwidth, 7X faster than PCIe 5.0. Compared with today's fastest servers, this innovative design increases aggregate system memory bandwidth to the GPU by up to 30X and boosts performance by up to 10X for applications running terabytes of data.

Product specifications

Specification | H100 SXM | H100 PCIe
FP64 | 34 teraFLOPS | 26 teraFLOPS
FP64 Tensor Core | 67 teraFLOPS | 51 teraFLOPS
FP32 | 67 teraFLOPS | 51 teraFLOPS
TF32 Tensor Core | 989 teraFLOPS* | 756 teraFLOPS*
BFLOAT16 Tensor Core | 1,979 teraFLOPS* | 1,513 teraFLOPS*
FP16 Tensor Core | 1,979 teraFLOPS* | 1,513 teraFLOPS*
FP8 Tensor Core | 3,958 teraFLOPS* | 3,026 teraFLOPS*
INT8 Tensor Core | 3,958 TOPS* | 3,026 TOPS*
GPU memory | 80GB | 80GB
GPU memory bandwidth | 3.35TB/s | 2TB/s
Decoders | 7 NVDEC, 7 JPEG | 7 NVDEC, 7 JPEG
Max thermal design power (TDP) | 700W (configurable) | 300-350W (configurable)
Multi-Instance GPUs | Up to 7 MIG instances @ 10GB each | Up to 7 MIG instances @ 10GB each
Form factor | SXM | PCIe, dual-slot air-cooled
Interconnect | NVLink: 900GB/s, PCIe 5.0: 128GB/s | NVLink: 600GB/s, PCIe 5.0: 128GB/s
Server options | NVIDIA HGX H100 partner and NVIDIA-Certified Systems with 4 or 8 GPUs; NVIDIA DGX H100 with 8 GPUs | Partner and NVIDIA-Certified Systems with 1-8 GPUs
NVIDIA AI Enterprise | Add-on | Included

* With sparsity.