How have NVIDIA data center GPUs iterated since the release of ChatGPT?

Fri, 15 May 2026 19:58:51 +0800

Let’s first establish the date. ChatGPT’s public research preview version was released on November 30, 2022, not 2023. [1]

After this point, NVIDIA’s data center GPU roadmap is quite clear: Ampere winds down and Hopper takes over; the Hopper refresh focuses on expanding VRAM capacity and bandwidth; and Blackwell shifts the focus from “single-card dense compute power” toward “inference throughput, power consumption, and system-level interconnect.” The China-specific versions are a different story: A800, H800, and H20 are fundamentally compliance versions created under US export control constraints, and therefore cannot be judged by the same metrics as the global flagship line.

This article covers only two product lines:

Global Data Center Training/Inference Mainline: A100 as the control baseline, H100, H200, B200, B300.
China Dedicated Line: A800, H800, H20.

I didn’t include the L4, L40, L40S, and L2 in the main text either. It’s not that they aren’t important; they belong more to the video inference, general inference, graphics, and virtualization line, and mixing them with the large model training main line of A100/H100/H200/B200 would confuse the pricing and performance metrics.

First, the Main Line

The conclusion first: looking only at the release cadence after November 30, 2022, the H100 was the true starting point of the early generative AI explosion; the H200 was a refresh card designed to “fill the memory gap”; the B200 is the real platform-level generational replacement; and the B300 pushes Blackwell one step further into the inference and reasoning era.

Model	Release Date	Architecture	VRAM	VRAM Bandwidth	Interconnect	Official Performance Metrics
A100 80GB	Nov 2020, as comparative baseline	Ampere	80GB HBM2e	2.039 TB/s	NVLink 600 GB/s	BF16/FP16 Tensor Core 312 TFLOPS, INT8 624 TOPS [2]
H100 SXM	2022-03-22	Hopper	80GB HBM3	3.35 TB/s	NVLink 900 GB/s	BF16/FP16 1,979 TFLOPS, FP8 3,958 TFLOPS; DGX H100 single node 32 PFLOPS FP8, a 6x increase over DGX A100 [3][4]
H200 SXM	2023-11-13	Hopper Refresh	141GB HBM3e	4.8 TB/s	NVLink 900 GB/s	NVIDIA’s headline claim is not doubled core compute power but faster inference, 1.9x for Llama 2 70B and 1.6x for GPT-3 175B relative to H100, enabled by larger and faster memory [5][6]
B200 SXM	202

What is easy to misunderstand here is that the H200 is not a “brute-force double compute card”; it is closer to a make-up lesson for the Hopper era, filling in what the H100 lacked. Once large model training and inference involve ultra-long contexts, massive KV caches, MoE (Mixture of Experts), and larger batch sizes, the bottleneck is no longer set by the BF16 peak number alone, but also by VRAM capacity and VRAM bandwidth. The H200 addresses this shortcoming.
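The capacity and bandwidth argument can be made concrete with a rough estimate. The sketch below assumes decoding is purely memory-bound, so each generated token requires reading all model weights once. The 70 GB weight footprint (a 70B-parameter model at 1 byte per parameter) and the function names are illustrative assumptions, not NVIDIA figures; only the capacity and bandwidth values come from the table above.

```python
def max_decode_tokens_per_s(bandwidth_tb_s: float, weight_gb: float) -> float:
    """Upper bound on single-stream decode speed if every token reads all weights once."""
    return bandwidth_tb_s * 1000 / weight_gb


def memory_left_for_kv_gb(capacity_gb: float, weight_gb: float) -> float:
    """VRAM remaining for KV cache and activations after the weights are loaded."""
    return capacity_gb - weight_gb


# Assumption for illustration: 70B parameters stored at 1 byte each (e.g. FP8).
WEIGHT_GB = 70.0

# (VRAM in GB, memory bandwidth in TB/s), taken from the table above.
gpus = {
    "H100 SXM": (80.0, 3.35),
    "H200 SXM": (141.0, 4.8),
}

for name, (capacity_gb, bandwidth_tb_s) in gpus.items():
    tokens = max_decode_tokens_per_s(bandwidth_tb_s, WEIGHT_GB)
    spare = memory_left_for_kv_gb(capacity_gb, WEIGHT_GB)
    print(f"{name}: <= {tokens:.1f} tokens/s per stream, {spare:.0f} GB left for KV cache")
```

Under these assumptions the peak TFLOPS number never enters the calculation: the ceiling on decode speed and the room left for KV cache both move only when memory bandwidth and capacity move.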

The true generational leap happens with Blackwell. Blackwell is no longer just selling a single card; it is selling an entire suite of platform capabilities: new precision formats, interconnect, system-level bandwidth, inference cost, power efficiency, and rack-scale architecture. This is why, in many discussions of the B200, the single-card metrics are not as easy to grasp at a glance as those for the H100: NVIDIA’s narrative focus has shifted from “how many TFLOPS does this card have” to “what size model can this entire system run and what will the cost be.”

Now the China-Specific Line

The China-specific line must be viewed separately, because its goal is not to beat the global flagship cards, but to stay below the export control red line while remaining commercially usable.

The one sentence worth remembering here is: A800 and H800 mainly cut interconnect bandwidth, while H20 goes further and restricts compute capability as well.

Therefore, if someone only looks at the memory specification and draws the conclusion that “H20 is newer than H800, so it must be stronger,” this judgment is flawed. While H20’s 96GB HBM3 and 4.0 TB/s bandwidth look impressive, its very existence is predicated on meeting stricter export limitations. Its commercial goal is first to be sellable, and only secondly to maximize usability.

What are the upgrades compared to the previous generation?

First, the calculation method:

\[ \text{Upgrade Rate}=\frac{\text{New Indicator}-\text{Previous Indicator}}{\text{Previous Indicator}} \]

However, this formula is only suitable for metrics with consistent definitions. VRAM, memory bandwidth, and NVLink bandwidth can be calculated directly, but platform-level inference cost and whole-machine throughput cannot be forced back into the single-card TFLOPS framework.

Global Mainline
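As a minimal sketch, the formula can be applied to the directly comparable metrics like this. The function name `upgrade_rate` and the list layout are illustrative choices; the input values are the ones quoted in this article.

```python
def upgrade_rate(new: float, previous: float) -> float:
    """Relative change from the previous value to the new one, as a fraction."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return (new - previous) / previous


# (label, previous value, new value); figures as quoted in this article.
comparisons = [
    ("A100 80GB -> H100 SXM memory bandwidth (TB/s)", 2.039, 3.35),
    ("A100 80GB -> H100 SXM NVLink (GB/s)", 600, 900),
    ("H100 SXM -> H200 SXM memory bandwidth (TB/s)", 3.35, 4.8),
    ("H200 SXM -> B200 SXM memory bandwidth (TB/s)", 4.8, 8.0),
]

for label, previous, new in comparisons:
    # "+.1%" prints the fraction as a signed percentage with one decimal place.
    print(f"{label}: {upgrade_rate(new, previous):+.1%}")
```

The function only makes sense when both inputs are measured the same way, which is why only capacity, bandwidth, and interconnect figures are fed into it here.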

Generation	Biggest Change	Quantifiable Upgrade Magnitude
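A100 80GB -> H100 SXM	Tensor Core and memory bandwidth rise together	Memory capacity 0%; memory bandwidth from 2.039 to 3.35 TB/s, approx +64.3%; NVLink from 600 to 900 GB/s, approx +50%; BF16/FP16 from 312 to 1,979 TFLOPS, approx +534.3% on the datasheet figures as quoted (note that 1,979 is NVIDIA’s with-sparsity figure while 312 is dense, so the like-for-like gain is smaller) [2][3]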
H100 SXM -> H200 SXM	Focus shifted to “larger and faster memory”	Memory from 80GB to 141GB, approx +76.3%; memory bandwidth from 3.35 to 4.8 TB/s, approx +43.3%; NVLink remains basically the same; BF16/FP8 peak metrics remain basically unchanged [3][6]
H200 SXM -> B200 SXM	Platform leap from Hopper to Blackwell	Memory from 141GB to 180GB, approx +27.7%; memory bandwidth from 4.8 to up to 8 TB/s, approx +66.7%; but the biggest changes are FP4, 1.8 TB/s NVLink, and system-level and rack-scale inference efficiency [8][9]
B200 SXM -> B300 SXM	Blackwell Ultra pushes large memory and reasoning further forward	Memory from 180GB to 288GB, approx +60.0%; publicly reported memory bandwidth remains up to 8 TB/s; DGX B300 dense FP4 is 1.5x that of DGX B200, with attention performance improving by 2x [10][11]

Upon reading through, a pattern emerges:

H100 is the generation that aggressively boosted single-card tensor computing power.
H200 is the generation focused on supplementing VRAM capacity and bandwidth.
B200 is the generation that transformed “training cards” into “AI factory infrastructure.”
B300 is the generation that more clearly pushes Blackwell towards reasoning and large-scale inference.

China-Specific Line

Generation	What looks like an upgrade at first glance	My judgment
A800 -> H800	Looking only at local HBM bandwidth, the move from the A100 generation to the H100 generation can roughly be read as a +64% generational advance.	But the core constraint is still interconnect, not single-card local memory.
H800 -> H20	Memory increased from 80GB to 96GB, approx. +20%; if based on common public parameters, bandwidth increased from 3.35 to 4.0 TB/s, approx. +19.4%.	This is not a pure upgrade. H20 is a compromise product due to greater compliance pressure and cannot simply be treated as an “H800 Plus.”

This also explains why the China-specific line should not be framed as “how much each generation improves across the board.” This product line inherently carries compliance constraints; its design objective is not technical optimization, but commercial feasibility within regulatory limits.

How Much Has the Price Actually Increased?

This is the section where it is easiest to get things wrong or to make numbers up. NVIDIA rarely discloses a single-card MSRP for data center GPUs, so what is usually available in the public domain is:

DGX complete system prices, or third-party list prices for the entire system.
Channel quotes for China-specific cards.
Reports from media, brokerages, or the supply chain.

Therefore, I only provide “publicly traceable price samples” here, rather than fabricating an official price list that looks complete but mixes inconsistent definitions.

Object	Public Price Sample	Interpretation compared to previous generation
DGX H100	Official starting price of $199,000 at release on 2022-03-22 [4]	This is the cleanest official anchor point.
DGX H100	Market listing

So, concerning “how much the overall selling price increased,” I offer two conclusions:

First, the global flagship line has indeed risen significantly. Based on publicly comparable samples, the DGX B200 is roughly 40% to 50% more expensive compared to the DGX H100 listed during the same period. [19]

Second, the China-specific line is not characterized by continuous price increases; instead, a later card can actually be cheaper. The publicly quoted price for the H20 eight-card server is approximately 30% lower than for the H800 eight-card server. The reason is not generosity, but that performance has been cut back further. [17]

Final Wrap-up

If I had to summarize the shifts in NVIDIA data center GPUs following the release of ChatGPT into a single sentence, my take is that:

The H100 was the catalyst for the generative AI boom; the H200 is a memory-oriented refresh; the B200 is the true generational leap for the AI factory era; and the B300 clearly paves the way for the reasoning era. The China-specific lineup operates on an entirely different logic. It is not chasing flagships, but rather focusing on maintaining usability within regulatory gaps.

Do not conflate these two lines. If you mix them, it is easy to draw conclusions such as “the new card has larger VRAM, therefore its generation is stronger,” or “the price is lower, therefore the cost-performance ratio is higher.” Such conclusions seem plausible on the surface but rest on incorrect premises.

References

Author’s Notes

Original Prompt

Compile a summary of NVIDIA GPU models and their corresponding performance parameters released since the launch of ChatGPT, detailing how much each generation upgraded from the previous one and the overall price increase. I specifically need data center GPUs, including special versions for China.

Summary of Writing Ideas

Pin the actual release date of ChatGPT to November 30, 2022, to avoid misaligning the timeline from the start.
Separate NVIDIA data center GPUs into a “Global Flagship Mainline” and a “China-Specific Line,” and do not force these two lines into a single generational upgrade history.
Percentage upgrades should only be calculated for directly comparable metrics, primarily VRAM, memory bandwidth, and interconnects.
For pricing, do not fabricate MSRP for individual cards; instead, only adopt official starting prices, complete system list prices, and Reuters channel quotes.
L4, L40S, and L2 are not expanded upon because including them would mix the training mainline with the general inference/graphics line.

Brainstorming Expansion

Area	Inclusion Status in Main Text	Rationale
A100 as baseline	Included	The user asks how much each generation upgraded from the previous one, and without A100, it’s impossible to calculate the upgrade magnitude of H100.
L4, L40, L40S, L2	Rejected	Belongs to data center products, but is more oriented towards video/graphics and general inference, inconsistent with the main training price scope.
GB200, GB300 full system architecture	Partially Included	Used to explain why Blackwell starts emphasizing platform-level performance rather

China Exclusive Edition on Uncle Xiang's Notebook