The Four Wars of the AI Stack (Dec 2023 Recap)
The Data Wars, The War of the GPU Rich/Poor, The Multimodality War, The RAG/Ops War. Also: our usual highest-signal recap of top items for the AI Engineer from Dec 2023!
We’re putting the “news” back in “newsletter”: This is the fifth in our new monthly recaps for supporters - Nov 2023 is archived. See the full list here.
ALSO: we have recorded an audio podcast discussion of this, available now! Thanks also to NLW of the AI Breakdown for featuring us, and then having us back on the AI Breakdown.
As we turn the 2023rd page of human history1, the main theaters of conflict among AI startups and stakeholders have become apparent:
We saw all elements of these in December 2023:
The Data War - with OpenAI announcing a partnership with Axel Springer (see also its deal with the AP and its Data Partnerships program), the NYT bringing a lawsuit against OpenAI demanding shutdown of all GPTs, and Apple now offering $50m for data contracts with publishers. Meanwhile there is an undeniable rise in interest in synthetic data both at NeurIPS and at DeepMind.
The GPU/Inference War - with the price per million Mixtral output tokens starting at ~$2 and racing down to $0.27 within a week (details below), and fresh benchmark drama between Anyscale and other inference providers. Meanwhile, research into new model architectures (Mamba, RWKV) and efforts to move compute off Nvidia (Modular, tinycorp, Apple MLX) aim to make more out of existing GPU resources.
The Multimodality War - with Midjourney soft-launching v6 and a web UI, and now reportedly making >$200m/yr, Assembly AI raising a $50m Series C for “building the Stripe for AI models”, Replicate (historically Stable Diffusion-centric) raising a $40m Series B to serve AI Engineers, and Suno AI coming out of stealth and returning to monkey - all steady point solution improvements while OpenAI and Google continue work on God Models that compete with all of them at once.
The RAG/Ops War - the debate on whether you need a Vector DB, vs power users adopting new vector DBs like turbopuffer; the debate between LangChain (now at v0.1, with TED talk and State of AI survey) vs LlamaIndex (now with Step-Wise Agent Execution); and continuing LLMOps developments (HumanLoop’s new .prompt file, Openlayer) vs framework-driven tooling like LangSmith and new approaches like Martian (who announced their $9m seed).
The above “wars” are selected for being essential components in the “AI stack” where major money is being made and deployed2. As it should be, the final battle is always for the end user, and the industry marked many milestones there too, with Glean (former guest!) raising at a $2b valuation for enterprise search, Perplexity (backed by Jeff Bezos!) raising at a $520m valuation for consumer search, and Harvey raising at a $715m valuation for legal assistance. The funding market for each of these market-leading vertical AI startups appears to be around 70x ARR, a sharp premium over "insane" SaaS rounds at 20x valuation. As the industry matures, the multiple compression will be brutal, but not as brutal as the infra market is right now…
Mixtral sparks a GPU/Inference Race to the Bottom
The GPU/Inference War had been brewing all year, with Together, Fireworks, Perplexity, and others raising hundreds of millions to build up infrastructure businesses - and the release of Mixtral was the spark that set everything aflame. We’ve seen all of the major inference players slashing prices, undercutting Mistral’s own la plateforme pricing by as much as 70% overnight.
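For a sense of scale, here is a rough back-of-the-envelope sketch using the Mixtral output-token prices quoted above (~$2 falling to $0.27 per million tokens). The helper names and the 50M-tokens-per-month workload are hypothetical, chosen only for illustration, and the ~$2 starting price is approximate.

```python
def price_drop_pct(start: float, end: float) -> float:
    """Percentage decrease going from `start` to `end`."""
    return (start - end) / start * 100


def cost_usd(tokens: int, price_per_million: float) -> float:
    """Cost in USD of generating `tokens` output tokens at a per-million price."""
    return tokens / 1_000_000 * price_per_million


# Prices per million Mixtral output tokens, as quoted in this recap.
START_PRICE = 2.00  # approximate ("~$2")
END_PRICE = 0.27

# Hypothetical workload, assumed purely for illustration.
MONTHLY_OUTPUT_TOKENS = 50_000_000

if __name__ == "__main__":
    drop = price_drop_pct(START_PRICE, END_PRICE)
    before = cost_usd(MONTHLY_OUTPUT_TOKENS, START_PRICE)
    after = cost_usd(MONTHLY_OUTPUT_TOKENS, END_PRICE)
    print(f"Price drop over the week: {drop:.1f}%")
    print(f"Monthly bill: ${before:.2f} -> ${after:.2f}")
```

On these numbers the week-long slide is roughly an 86.5% cut, so the same hypothetical workload costs about a seventh of what it did at launch pricing.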