Josh Albrecht, CTO of Imbue, and Jon Frankle, Chief AI Scientist of Databricks, dish on what it takes to train the largest models on the largest clusters... including fighting InfiniBand Porch Pirates
Databricks' general-purpose large model DBRX was trained on 3,072 H100 GPUs, while GPT-5 is estimated to require about 50,000 H100s. Meta has said that by the end of 2024 it expects to have compute equivalent to roughly 600,000 H100 GPUs, and Llama-3 was trained on 49,152 H100s.
Demand for compute in foundation-model training is immense, and the amount of compute available directly shapes how capable the resulting models are.
On the compute-supply side, stability and abundant capacity are what allow a provider to serve a larger number of customers.
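To make the gap between these cluster sizes concrete, here is a minimal back-of-envelope sketch in Python. It simply multiplies the GPU counts above by an assumed per-GPU peak of roughly 989 dense BF16 TFLOPS for an H100 (a spec-sheet figure, not sustained throughput), so the outputs are rough peak-throughput numbers rather than actual training compute.

```python
# Back-of-envelope scale comparison of the cluster sizes mentioned above.
# ASSUMPTION: ~989 dense BF16 TFLOPS peak per H100 SXM (spec-sheet figure);
# real training utilization (MFU) is far lower than peak.

PEAK_BF16_TFLOPS_PER_H100 = 989  # assumed peak, dense (no sparsity)

clusters = {
    "DBRX": 3_072,
    "GPT-5 (Musk estimate)": 50_000,
    "Llama-3": 49_152,
    "Meta end-of-2024 (H100 equivalents)": 600_000,
}

for name, gpus in clusters.items():
    # 1 EFLOPS = 1e6 TFLOPS
    exaflops = gpus * PEAK_BF16_TFLOPS_PER_H100 / 1e6
    print(f"{name:<38} {gpus:>8,} GPUs ≈ {exaflops:6.1f} EFLOPS peak BF16")
```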
Is Jonathan’s hair still blue? What’s the status?
see video
things developed since
Sad
> while GPT-5 is estimated to require about 50,000 H100s
ooh, source?
That figure comes from an estimate by Elon Musk.