6 Comments

Is Jonathon’s hair still blue? What’s the status?

author

See the video; things have developed since.

The general-purpose large model DBRX was trained on 3,072 H100 GPUs, while GPT-5 required about 50,000 H100s. Meta has stated that by the end of 2024 it expects to have computing power equivalent to 600,000 H100 GPUs. Training Llama-3 involved 49,152 H100 GPUs.

The current demand for computing power in foundation-model training is immense, and the amount of compute available directly influences the level of intelligence a model can reach.

In the compute supply market, stability and abundant resources are crucial for serving a larger number of customers.

author

> while GPT-5 required about 50,000 H100s

ooh, source?

This is an estimate from Elon Musk.
