Last week, the emergence of a previously unknown Chinese AI start-up undermined the assumptions of the US artificial intelligence (AI) community and sent shock waves through the stock market.
On 20 January, DeepSeek, an AI research lab spun out of a Chinese hedge fund, released its new R1 model. The accompanying research paper showed the model performing on par with OpenAI's GPT-4, despite costing just $5.6mn to train; GPT-4, by comparison, cost more than $100mn. In response, Nvidia's (US:NVDA) share price dropped almost a fifth and the S&P 500 was dragged down 2 per cent. Although some tech stocks have recovered, Nvidia remains down 17 per cent from a week ago.
DeepSeek sent shockwaves through the market because the US AI sector had reached a consensus view: the best way to develop AI was simply to increase the amount of data models were trained on. That meant more computing, more parameters and more energy.
This theory is called the ‘scaling law’. The hypothesis was famously articulated by popular technology blogger Gwern Branwen in 2020, when he noticed OpenAI’s GPT-3 model was an order of magnitude larger than GPT-2 but did not “run into diminishing or negative returns, as many expected”.
In fact, rather than getting worse with size, Gwern hypothesised that AI models would “become more powerful, more generalizable, more human-like when simply made very large and trained on very large datasets”. Almost two years after this post, OpenAI released ChatGPT and the AI race accelerated. The same year as Gwern’s observation, OpenAI’s chief executive, Sam Altman, wrote about the “power of scale”. He said: “The central learning of my career so far has been that you can and should scale up things that are working.”
It is therefore not surprising that the largest US technology companies jumped on the theory that bigger was better with their AI plans. For the past decade, the most successful internet companies have profited from ‘network effects’, when the value of a service increases as more people use it. Google, Facebook and Amazon (US:AMZN) all adopted this philosophy when building their core products and services.
Big tech companies saw the popularity of ChatGPT and invested heavily and quickly in the new technology. The ‘scaling law’ was particularly appealing to Microsoft (US:MSFT), Amazon, Meta (US:META) and Alphabet (US:GOOGL) because their cash flow and reserves allowed them to spend big and early to march ahead of rivals.
As the chart shows, since 2022 they have ramped up capex, while keeping research and development costs relatively flat, with the majority being spent on Nvidia’s graphics processing units (GPUs). Nvidia’s earnings growth has been remarkable: in the past two years, operating profit has increased more than 14 times to $84.6bn. It’s not just Nvidia that has profited. Arista Networks (US:ANET), which sells the networking equipment that connects chips within data centres, has increased its operating profit by 113 per cent over the same period.
However, with the consensus view spreading that all companies needed was more computing, valuations started to stretch. Just before the recent sell-off, Nvidia’s forward price/earnings (PE) ratio hit 33, while Arista was up to 50 and Broadcom (US:AVGO) rose to 36. Even outside of AI and computing, companies are trying to profit from this narrative. In a recent earnings report, solar panel manufacturer First Solar (US:FSLR) quoted a Boston Consulting Group report saying data-centre-driven energy demand is expected to increase by 15-20 per cent annually through to 2030, driving a 3 per cent annual increase in total US energy consumption.
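To see what that growth rate implies if it compounds, here is a quick sketch. The 15-20 per cent figures come from the quoted BCG estimate; the six-year horizon (2024 through 2030) is an assumption for illustration only.

```python
# Illustrative compounding of data-centre energy demand.
# Growth rates are from the quoted BCG estimate; the six-year
# horizon is an assumed reading of "through to 2030".
def compound(growth_rate, years):
    """Total multiple after compounding a constant annual growth rate."""
    return (1 + growth_rate) ** years

years = 6  # 2024 through 2030, assumed
low = compound(0.15, years)   # roughly 2.3x at 15 per cent a year
high = compound(0.20, years)  # roughly 3.0x at 20 per cent a year
print(f"Demand multiple by 2030: {low:.1f}x to {high:.1f}x")
```

In other words, if the estimate holds, data-centre energy demand would double to triple by the end of the decade.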
This belief in the power of scale has even reached the top of politics. Within a day of his inauguration, President Donald Trump proudly announced the Stargate joint venture, created by OpenAI, SoftBank, Oracle (US:ORCL) and an Abu Dhabi fund to invest $500bn in AI infrastructure in the US by 2029. Unknown to Trump as he made this announcement, DeepSeek had released its R1 model and accompanying research paper less than 24 hours earlier.
The reaction from the technology and economics worlds has been extravagant. Venture capitalist Marc Andreessen called it “AI’s Sputnik moment”, while MIT economist Olivier Blanchard said it was “probably the largest positive total factor productivity shock in the history of the world”.
Since the announcement, as well as Nvidia’s drop, Broadcom has fallen 11 per cent and Arista Networks has sunk 15 per cent. Meanwhile, utilities companies Vistra (US:VST) and NRG Energy (US:NRG) are down 9 per cent and 6 per cent, respectively.
The logic of the market is that, now DeepSeek has shown large improvements can be made without increasing computing power, big tech companies will cut back on capex. Jefferies analyst Edison Lee believes the DeepSeek release could force US companies to “refocus on efficiency and return on investment, meaning lower demand for computing power as of 2026”.
Experts are debating whether the market’s response has been appropriate. TS Lombard analysts note that “cost advantages lead to more investment, not less”, citing air travel, renewable energy and consumer goods as examples, where lower costs have “consistently driven demand and capacity expansion”.
On Monday, Microsoft chief executive Satya Nadella waded into the discussion, posting on X that the “Jevons paradox strikes again”. The paradox is named after the 19th-century British economist William Stanley Jevons, who observed that as coal-burning engines became more efficient, total coal consumption actually rose.
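The mechanism behind the paradox can be sketched with a toy model (the numbers and the demand elasticity below are purely illustrative assumptions, not from Jevons or Nadella): efficiency gains cut the effective price of useful work, and when demand for that work is price-elastic enough, total fuel consumption rises rather than falls.

```python
# Toy model of the Jevons paradox with illustrative numbers.
# Demand for "work" has constant price elasticity; efficiency
# gains lower the effective price of a unit of work.
def fuel_used(efficiency, base_demand=100.0, elasticity=1.5):
    """Fuel consumed when demand responds to the effective price of work."""
    effective_price = 1.0 / efficiency            # cost per unit of work
    work_demanded = base_demand * effective_price ** (-elasticity)
    return work_demanded / efficiency             # fuel = work / efficiency

before = fuel_used(efficiency=1.0)
after = fuel_used(efficiency=2.0)  # machines become twice as efficient
print(f"fuel before: {before:.0f}, after doubling efficiency: {after:.0f}")
```

With an elasticity above 1, doubling efficiency increases total fuel use; with an elasticity below 1, it would fall. The argument over AI capex is essentially an argument over which side of that line demand for AI computing sits on.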
On Microsoft’s earnings call on Wednesday, Nadella expanded on this point, saying DeepSeek “has had some real innovations” and that the big beneficiaries would be customers, as “there’ll be more apps”. As might be expected, he dismissed any suggestion that this is a problem for Microsoft, predicting that if smaller AI models can run on personal computers, demand for its products will increase.
Last quarter, Microsoft spent another $15.8bn on capex, up 62 per cent year on year, and Nadella is showing little sign of wanting to slow down. “As AI becomes more efficient and accessible, we will see exponentially more demand,” he explained to analysts.
There are also doubts about whether DeepSeek is being honest about how it created its model. OpenAI claims the Chinese company used GPT-4’s outputs to train its own model, a technique known as “distillation”, in which one AI model learns by asking questions of another and training on its answers. If true, it means DeepSeek piggybacked on the upfront computing expense of training GPT-4.
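A minimal sketch of what distillation looks like mechanically, with a toy linear “teacher” and “student” standing in for real language models (everything here is an illustrative assumption, not OpenAI’s or DeepSeek’s actual setup): the student never sees ground-truth labels, only the teacher’s probability outputs, and is trained to reproduce them.

```python
import numpy as np

# Toy distillation: a "student" model is trained on the probability
# outputs of a "teacher" rather than on ground-truth labels.
rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Teacher: a fixed linear classifier standing in for the large model.
X = rng.normal(size=(200, 5))
W_teacher = rng.normal(size=(5, 3))
teacher_probs = softmax(X @ W_teacher)  # the "answers" the teacher gives

# Student: trained to imitate the teacher's soft outputs by gradient
# descent on the cross-entropy against those outputs.
W_student = np.zeros((5, 3))
for _ in range(500):
    student_probs = softmax(X @ W_student)
    grad = X.T @ (student_probs - teacher_probs) / len(X)
    W_student -= 0.5 * grad

agreement = np.mean(
    student_probs.argmax(axis=1) == teacher_probs.argmax(axis=1)
)
print(f"student agrees with teacher on {agreement:.0%} of inputs")
```

The point Smith makes later in the piece falls out of this sketch: the student needs only query access to the teacher, not its weights or training data, which is why distillation is so hard to defend against.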
Scale AI chief executive Alexandr Wang also believes that, because of US export restrictions, DeepSeek is lying about how many Nvidia GPUs it has. Speaking to CNBC, Wang said he believes DeepSeek has “about 50,000 H100s [Nvidia GPUs] which they can’t talk about”, rather than the 2,000 GPUs it claims to have used.
Regardless of how many GPUs are in China, DeepSeek has moved the AI industry forward. On an earnings call last week, Meta chief executive Mark Zuckerberg confirmed DeepSeek is a significant step and has done a “number of novel things” that Meta will implement in its own systems.
Compared with Nadella, Zuckerberg was a little more cautious about forecasting what this meant for Meta’s spending plans, saying it is “too early to have a strong opinion on what this means for the trajectory around infrastructure and capex”. He does still believe investing “very heavily” is going to be an advantage over time but admits Meta could change its plans.
Economics writer Noah Smith points out that it is irrelevant how DeepSeek got here: “they made it, and it works”. According to Smith, this calls into question how deep the “moats” around big tech’s AI businesses are. He argues that distillation makes it almost impossible to protect an algorithm, so even if the incumbents spend a lot to get ahead, it won’t take long for competitors to catch up.
The future of AI is still up for grabs, but the sell-off shows the market is not confident in near-record-high valuations. It had reached a consensus that scale was essential, and in that world the billions spent would form an almost uncrossable moat. Suddenly, that doesn’t seem so certain.
TS Lombard is bullish on Nvidia’s prospects, but admits “investors may remain skittish” in the short term. When the market becomes convinced of an idea, valuations become stretched. But it doesn’t take much to put some cracks in that belief. The AI race has a lot further to go and no winners have been chosen.