19 Comments

Thanks, nice essay. Just a nit: your scenarios don't seem to reflect the full impact of electricity. An H100 takes nearly a kW of electricity, so because we're assuming full utilization, the $0.01/0.03/0.10 kWh charges can approximately be subtracted from the rental rate. In that case, looking at the $1/hr scenarios, I'd expect the 3-year revenue projections to differ by a few percent, not a small fraction of a percent.
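The arithmetic being described can be sketched as follows (a minimal sketch assuming ~1 kW draw per H100 at full utilization; the exact figures live in the article's spreadsheet):

```python
# Net hourly revenue per GPU after subtracting electricity cost,
# assuming ~1 kW of draw at full utilization (illustrative numbers only).
def net_hourly_rate(rental_rate, power_kw=1.0, electricity_per_kwh=0.03):
    return rental_rate - power_kw * electricity_per_kwh

# The commenter's point: at a $1/hr rental, the three electricity
# scenarios shift net revenue by several percent, not a fraction of one.
for price in (0.01, 0.03, 0.10):
    print(f"${price}/kWh -> net ${net_hourly_rate(1.00, 1.0, price):.2f}/hr on a $1/hr rental")
```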


Author here: it is subtracted from the rental rates - you can run through the math in the full spreadsheet here (linked at the end of the article too):

https://docs.google.com/spreadsheets/d/1kZosZmvaecG6P4-yCPzMN7Ha3ubMcTmF9AeJNDKeo98/edit?usp=sharing

Didn't go too deeply into the spreadsheet itself, 'cause the article was slowly entering the "too much information" category.


Thanks! This is correct for the $4.50 rental rate, but needs to be fixed for the other ones. For example, your cell E26 should subtract E$10 (electricity fee), not E25 (IRR from previous scenario).


Ah, you're right, that's a mix-up on my side. Surprisingly, the change isn't as large as I panicked it would be (given how big of a mistake it is).

There has also been some feedback about comparing against a lower IRR (10% is high), so I will adjust and update.


Hi Eugene, thanks for the article! I was wondering how you are thinking about the colocation cost? I usually think of it as another opex item, along with electricity cost.


The capex items are listed here:

https://docs.google.com/spreadsheets/d/1Ft3RbeZ-w43kYSiLfYc1vxO41mK5lmJpcPC9GOYHAWc/edit?usp=sharing

I added in a naive $50k per node for facility/colocation as a capex item, leaving only electricity as opex, which uses a 1 (GPU) × 1.2 (system overhead) × 1.2 (facility overhead) multiple.
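The power budget those multipliers imply can be sketched like this (the 0.7 kW base figure is my assumption for an H100 SXM TDP; the spreadsheet may use a different base):

```python
# Per-GPU facility power budget, following the comment's multipliers:
# base GPU draw x 1.2 (system overhead) x 1.2 (facility/cooling overhead).
# The 0.7 kW default is an assumed H100 SXM TDP, not a figure from the article.
def facility_power_kw(gpu_tdp_kw=0.7, system_overhead=1.2, facility_overhead=1.2):
    return gpu_tdp_kw * system_overhead * facility_overhead

print(f"{facility_power_kw():.3f} kW per GPU")  # ~1.008 kW
```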

I was honestly split between capex vs opex, and for nearly all other server scenarios I would have left it as opex.

But I decided to roll it into capex, given how many of the larger clusters are being deployed into purpose-built datacenters, or retrofitted datacenters/server rooms with upgraded cooling and power, with substantial upfront costs.

I also have low faith that the GPUs will last longer than 6-8 years, considering the failure rates.

You folks probably have way more experience than me in projecting the facility cost.


Thanks for sharing the capex items!

For facility cost, we typically think of it as an opex item, since the capex for datacenter equipment and fit-out is done by the colocation operator, and we always think of it as being split into different layers. In most cases there are indeed separate parties owning the GPUs vs owning the datacenter, but some are vertically integrated and own both. In that case we still like to consider it opex - i.e. transfer pricing.

Anyway - if I use $150/kW/month over a 5-year expected lifetime, I get $89k of costs over the 5 years. I also typically add some support engineers and misc direct costs of about $4k a year. But I think it's best to convert this to opex to get a more accurate IRR: if you put it as a capex item, it unfairly lowers your IRR, because in reality you will be paying these costs over the 5 years and not upfront.
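The capex-vs-opex effect on IRR can be illustrated with a toy cash-flow model (all figures here are invented for illustration, not taken from either spreadsheet):

```python
# Toy illustration of why booking facility cost as upfront capex
# understates IRR versus paying it as opex over the lifetime.
# All numbers are invented placeholders.
def irr(cashflows, lo=-0.99, hi=10.0, tol=1e-9):
    """Internal rate of return via bisection on NPV.

    Assumes one upfront outflow followed by inflows (single sign change)."""
    def npv(r):
        return sum(cf / (1 + r) ** t for t, cf in enumerate(cashflows))
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if npv(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

node_capex, facility_total, annual_revenue, years = 250_000, 89_000, 100_000, 5

# Same total facility spend, booked two ways:
as_capex = [-(node_capex + facility_total)] + [annual_revenue] * years
as_opex = [-node_capex] + [annual_revenue - facility_total / years] * years

print(f"facility as capex: IRR {irr(as_capex):.1%}")
print(f"facility as opex:  IRR {irr(as_opex):.1%}")
```

With these placeholder numbers, the opex treatment yields a noticeably higher IRR, because the facility spend is deferred rather than paid at time zero.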

H100s can be deployed into datacenters without too high a rack power density - you just put one H100 node per rack - so with H100s you may not always need a retrofit or a purpose-built datacenter.


Very strong work. I have numerous confirming datapoints about GPU supply.


If you can elaborate on those datapoints, I would love to hear more.


Excellent deep dive, well done Eugene. Really crisp insights.


Another case: last year, there was a 15% profit margin on getting H100s into China. This year, only a 1% profit margin.


Excellent article! Maybe a silly question, but looking at the H100 instances on AWS, Azure, and Google Cloud, the prices for H100 are still above $4 per GPU on 3-year commitments and over $10 on-demand. Is there any trade-off in using these resellers instead of the three big cloud providers, or is this IaaS commoditized? I’m thinking about this more to assess the ROI of cloud providers—should their prices converge to around $2 as well (given that the free market is clearing at this price), or are their services/features so different from the resellers that it is a different market?


Hi Eugene. Great essay, thanks! I'm new to your substack, so pardon my question if it's been answered already in another post. Any thoughts on inference economics, especially in light of chain-of-thought models like o1 and future open-source models that will do the same?


Wildly interesting. Thanks for sharing.


GPUs will become much harder to accumulate from now on.

I wonder how that will affect AI development?


Eugene, my question is: if demand for GPUs is supposedly falling because only a very small number of companies need them, and that number continues to decline, and all the demand is going to inferencing, then why is demand for Blackwell even higher? And why is Hopper demand still so strong in the current quarter?


Time lag is a major factor. Before this article, I was aware of new H100 clusters still being pitched, fundraised, and built - these datacenter buildouts run on 6-month+ cycles.

For Blackwell superclusters, it's very likely a handful of companies with billion-dollar+ funding purchasing the bulk of the orders. Microsoft, OpenAI, Facebook, and X.ai have all confirmed publicly that they purchased Blackwell chips.

Due to the unique requirements of training, I have little doubt that as long as those billions get raised, the purchase cycle for Blackwell will remain strong.
