Here’s What DeepSeek’s Stunning Innovation Means for the AI Ecosystem
For just a moment, Silicon Valley’s digital titans must have felt unassailable as they enjoyed their status as guests of honour at the inauguration of the new US administration. The next day, leading lights of the tech sector, including OpenAI’s Sam Altman and Oracle co-founder Larry Ellison, shared a stage with The Donald, alongside SoftBank and MGX. They announced The Stargate Project, a massive initiative to secure Silicon Valley’s place at the epicentre of the AI world, accompanied by an extraordinary price tag: $US500bn.
Their moment in the sun lasted barely six days.
Even as they were sharing canapés with the paparazzi at the Inauguration Ball, the thunderclouds were fully formed, just waiting for that first burst of lightning to presage the storm. Weeks before, on Boxing Day, a little-known Chinese hedge-fund-owned AI business had released DeepSeek-V3, an open-source Mixture-of-Experts (MoE) language model.
There was no fanfare and little comment.
But what ultimately set the hares running, and the market tanking, began coincidentally on Inauguration Day, with the January 20 release of DeepSeek’s R1 reasoning model, which it made freely available via the development platform Hugging Face.
The chatter increased throughout that week and really took off over the weekend at about the time Marc Andreessen, General Partner of Andreessen Horowitz — and tech sector royalty — called DeepSeek’s R1 a gift to the world and described its release as AI's Sputnik moment.
When markets opened on Monday morning last week, all hell broke loose as investors absorbed the news that DeepSeek’s models had beaten those of OpenAI, Anthropic and Meta in independent third-party benchmarks. And it did so despite having vastly underspent its rivals, outlaying dimes to the Americans’ dollars.
The exact level of DeepSeek’s investment in R&D is contested by its US rivals, who appeared stunned into disbelief. But even Anthropic’s CEO Dario Amodei, in a column trying to sow doubt about DeepSeek’s actual infrastructure investment, acknowledged: “All of this is not to say that DeepSeek V3 is not a unique breakthrough or something that fundamentally changes the economics of LLMs.”
So how did DeepSeek do it, and what does it mean?
Constrained by US trade sanctions, DeepSeek built a reasoning model on older Nvidia GPUs and used far fewer of them. Whereas its US rivals might use over 15,000 GPUs off the top shelf, DeepSeek needed barely 2,000 from the bargain basement.
But it also did something much cleverer. It used reinforcement learning, which in simple terms means the model learns by interacting with an environment and getting feedback on its actions. Traditionally, AI models have been trained using supervised learning, where they learn from large datasets containing examples of correct answers.
DeepSeek’s approach allowed R1 to develop sophisticated problem-solving abilities independently.
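To make the distinction concrete, here is a deliberately tiny Python sketch contrasting the two training signals. It is illustrative only, not DeepSeek’s actual method (R1 applies reinforcement learning at vast scale to a neural network); the question, the candidate answers, the reward function and the update rule are all invented for the example.

```python
import random

# A toy contrast between the two training signals described above.
# NOT DeepSeek's actual method: R1 applies reinforcement learning at huge
# scale to a neural network. The question, candidate answers, reward
# function and update rule here are invented purely for illustration.

QUESTION = "2 + 2"
CANDIDATES = ["3", "4", "5"]

# --- Supervised learning: the dataset hands the model the correct answer ---
labelled_example = {"question": QUESTION, "answer": "4"}
supervised_model = {labelled_example["question"]: labelled_example["answer"]}

# --- Reinforcement learning: the model tries answers and learns from reward ---
def reward(answer: str) -> float:
    """Feedback from the 'environment': 1.0 if the answer works, else 0.0."""
    return 1.0 if answer == "4" else 0.0

preferences = {a: 0.0 for a in CANDIDATES}   # the 'policy' being learned
for _ in range(200):
    guess = random.choices(CANDIDATES,
                           weights=[1.0 + preferences[a] for a in CANDIDATES])[0]
    preferences[guess] += reward(guess)       # reinforce behaviour that earns reward

best_guess = max(preferences, key=preferences.get)
print(f"Supervised answer: {supervised_model[QUESTION]}")
print(f"Answer learned through trial and reward: {best_guess}")
```

The point of the contrast: in the supervised case the correct answer is supplied up front, while in the reinforcement case the system discovers it by being rewarded for what works, which is the kind of feedback loop DeepSeek leaned on to let R1 teach itself to reason.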
It did all this using open-source software. And then it shared a paper with the AI community describing how it did it, so they could do it too.
Implications
Looking beyond the fury of the stockmarket's immediate reaction, what are the likely long-term implications?
For Nvidia, which was on the receiving end of what Bloomberg called the greatest rout in market history, the outlook may not be nearly as bad as the sea of red ink that Monday suggested.
While DeepSeek’s R1 model demonstrates that advanced AI capabilities can be achieved using fewer and less powerful GPUs, former Intel chief Pat Gelsinger suggested that making computing more efficient and cost-effective could expand the market for AI applications, potentially benefiting companies like Nvidia in the long run (noting that Intel is virtually out of this race…).
Likewise, there’s a case to be made that the efficiency demonstrated by DeepSeek's R1 could lower barriers to entry for AI development, leading to broader adoption across various industries. That could increase overall demand for GPUs, albeit possibly favouring mid-range over high-end models.
Some of those assessments also hold true in the data centre world, and certainly the world’s major data centre owners are suddenly watching closely. However, DeepSeek’s efficiency gains might prompt a rapid recalibration of the vast investment levels driven by the hyperscalers, and a decrease in the need for the expansive data centre infrastructure traditionally required for AI model training. Or, at the least, a re-evaluation of data centre design priorities, which are currently focused on managing the challenge of dispersing the extraordinary heat generated by clusters of very high-end Nvidia GPUs.
The jury is out on whether DeepSeek’s success challenges existing projections of power demand growth driven by AI workloads. On the one hand, some analysts suggest the anticipated surge in energy consumption for data centres may need to be reassessed. On the other, an article just released by MIT Technology Review says that "New figures show that if the model’s energy-intensive ‘chain of thought’ reasoning gets added to everything, the promise of efficiency gets murky."
Then of course there’s the even murkier issue of AI ethics and safety, with Forbes recently reporting that cybersecurity firm KELA “was able to jailbreak DeepSeek's model, enabling it to produce malicious outputs, including ransomware development, instructions for creating toxins and the fabrication of sensitive content”.
And the pick-and-shovel stocks more broadly? Firms associated with data centre construction and operation might now come under closer scrutiny as well.
A Blowtorch on the AI Wolfpack
AI’s first-generation giants like OpenAI, Anthropic and Perplexity are also likely to face a tougher reception when they go cap in hand to the market to raise funds.
Despite claims to the contrary, they were clearly blindsided by DeepSeek’s success and punished by investors who are likely questioning their business practices, their bloat, and their ability to compete with Chinese innovation and talent.
And the story got worse for them as the week went on, although not many people noticed.
Lost in the noise of Monday’s market reaction (US time) was the fact that, on the same day prices were tanking in response to R1, DeepSeek dropped another open-source rival to the US AI hegemony. This time it was a competitor to OpenAI’s DALL-E, and once again DeepSeek’s platform performed strongly in independent third-party benchmarks, besting the incumbent.
Then on Tuesday last week, Alibaba revealed its Qwen2.5-Max model, which it claimed not only beat GPT-4o and Claude 3.5 Sonnet but also outperformed DeepSeek-V3!
What Moat?
What DeepSeek’s approach suggested is that the competitive moat companies like OpenAI and Anthropic believed they had – the huge investment required to build and train LLMs – might actually be more of a scratch in the sand than an uncrossable ocean.
Its achievement also revealed that decades’ worth of Chinese investment in education is paying dividends, and that the country has technical capability equal to, and perhaps in some areas beyond, that of the West.
And it confirmed once again the power of the open-source community to challenge and transform markets.
On that front, Mark Zuckerberg must be enjoying a moment. Unlike his rivals at OpenAI and Anthropic, he bet early and big on open source.
Dario Amodei can try to sow doubts about DeepSeek’s engineering costs all he likes, and Sam Altman can complain that DeepSeek stole OpenAI's IP (and you would need a reasoning engine worth billions of dollars to process that level of chutzpah). But as Georg Zoeller, Meta’s former Business Engineering Director and co-founder of the Centre for AI Leadership, pointed out on LinkedIn: “Every model will now backport what DeepSeek did and anyone going forward will do it on that base.”
The game may just have changed forever.