The Initial Shock
Just months ago, China seemed far behind the frontier AI advances being made in the United States. Two new models from DeepSeek have shattered that perception: Its V3 model matches GPT-4’s performance while reportedly using just a fraction of the training compute. Its R1 reasoning model—akin to OpenAI’s o1 introduced last September—appears to match OpenAI’s o1 at a fraction of the cost per token.
Some have suggested that DeepSeek’s achievements diminish the importance of computational resources (compute). That narrative may be compelling, but it is misleading. If anything, these efficiency gains have made access to vast computing power more crucial than ever—both for advancing AI capabilities and deploying them at scale.
What DeepSeek’s emergence truly changes is the landscape of model access: Their models are freely downloadable by anyone. If Chinese companies continue to develop the leading open models, the democratic world could face a critical security challenge: These widely accessible models might harbor censorship controls or deliberately planted vulnerabilities that could affect global AI infrastructure.
A Close Look at DeepSeek’s Costs
One number that shocked analysts and the stock market was that DeepSeek spent only $5.6 million to train their V3 large language model (LLM), matching GPT-4 on performance benchmarks. While this appears dramatically lower than reported estimates for GPT-4’s training costs, two important caveats apply. First, the comparison is not apples-to-apples: U.S. companies have never publicly disclosed their actual training costs. When CEOs refer to staggering costs in the hundreds of millions of dollars, they likely include a more exhaustive view—hardware acquisition, staffing costs, and research expenses. In contrast, DeepSeek only reported the cost of the final training run, excluding crucial expenses like preliminary experiments, staffing, and the massive initial investment in hardware.
Second, V3’s efficiency improvement is not surprising. Algorithmic advances alone typically cut training costs in half every eight months, with hardware improvements driving additional efficiency gains. Using current cloud compute prices and accounting for these predictable advances, a final training run for a GPT-4-level model should cost around $3 million today. That means DeepSeek’s efficiency gains are not a great leap, but align with industry trends.
The story of DeepSeek’s R1 model might be different. This reasoning model—which thinks through problems step by step before answering—matches the capabilities of OpenAI’s o1 released last December. Since early 2024, DeepSeek has made significant strides in reasoning, particularly excelling at mathematical problem-solving. Its public release provides the first look into the details of how these reasoning models work. What is notable is that DeepSeek offers R1 at roughly four percent the cost of o1. While such improvements are expected in AI, this could mean DeepSeek is leading on reasoning efficiency, although comparisons remain difficult because companies like Google have not released pricing for their reasoning models.
Given all this context, DeepSeek’s achievements on both V3 and R1 do not represent revolutionary breakthroughs, but rather continuations of computing’s long history of exponential efficiency gains—Moore’s Law being a prime example. To be sure, direct comparisons are hard to make because while some Chinese companies openly share their advances, leading U.S. companies keep their capabilities private. Still, for those closely watching the field, DeepSeek’s improvements follow expected patterns.
Why Compute Actually Still Matters
Counterintuitively, DeepSeeks advances make compute more important, not less.
Here is why. Recreating existing capabilities requires less compute, but the same compute now enables building far more powerful models with the same compute resources (this is called a performance effect). When OpenAI, Google, or Anthropic apply these efficiency gains to their vast compute clusters (each with tens of thousands of advanced AI chips), they can push capabilities far beyond current limits. Indeed, if DeepSeek had had access to even more AI chips, it could have trained a more powerful AI model, made certain discoveries earlier, and served a larger user base with its existing models—which in turn would increase its revenue.
Second, new models like DeepSeek’s R1 and OpenAI’s o1 reveal another crucial role for compute: These “reasoning” models get predictably better the more time they spend thinking. As AI systems take on worker-like roles, compute capacity could directly determine both how many AI workers can be deployed and how skilled each one is.
Leading in Open Models, A Potential Security Concern
DeepSeek does highlight a new strategic challenge: What happens if China becomes the leader in providing publicly available AI models that are freely downloadable? That would not directly generate revenue for DeepSeek, but it creates soft power. More importantly, it raises serious national security concerns.
DeepSeek’s downloadable model shows fewer signs of built-in censorship in contrast to its hosted models, which appear to filter politically sensitive topics like Tiananmen Square. Most current censoring happens through additional filtering tools after the model generates its output. However, the downloadable model still exhibits some censorship, and other Chinese models like Qwen already exhibit stronger systematic censorship built into the model. As these models gain widespread adoption, the ability to subtly shape or restrict information through model design becomes a critical concern. What if such models become the foundation of educational systems worldwide?
Furthermore, DeepSeek presents at least two types of potential “backdoor” risks. The first is traditional security vulnerabilities, like remote code execution (as demonstrated in PyTorch incidents). The second, and more subtle, risk involves behaviors embedded within the model itself—what researchers call “sleeper agents.” Research from U.S. company Anthropic shows that a model could be designed to write secure code most of the time but insert subtle vulnerabilities when used by specific organizations or in specific contexts. Once a backdoor is present in a model, it becomes extremely difficult to detect or remove—even with extensive safety testing. Traditional red-teaming often fails to catch these vulnerabilities, and attempts to train away problematic behaviors can paradoxically make models better at hiding their backdoors.
The Path Forward
These developments force the United States to confront two distinct challenges. First, when efficiency improvements are rapidly diffusing the ability to train and access powerful models, can the United States prevent China from achieving truly transformative AI capabilities? Second, how can the United States manage the security risks if Chinese companies become the primary suppliers of open models? Policymakers should consider three priorities in response to DeepSeek:
First, strengthen rather than abandon export controls. While DeepSeek shows that determined actors can achieve impressive results with limited compute, they could go much further if they had access to the same resources of leading U.S. companies.
Second, restrict the integration of Chinese open models into critical U.S. systems and infrastructure. Just as the government tries to manage supply chain risks in tech hardware, it will need frameworks for AI models that could harbor hidden vulnerabilities. The U.S. Framework for Artificial Intelligence Diffusion already requires validated end users to cut ties with intelligence and military actors from untrusted countries. Under some interpretations, this requirement could extend to prohibiting the hosting of these models.
Finally, there is a critical gap in AI security research. Without better tools to detect backdoors and verify model safety, the United States is flying blind in evaluating which systems to trust. This security challenge becomes particularly acute as advanced AI emerges from regions with limited transparency, and as AI systems play an increasing role in developing the next generation of models—potentially cascading security vulnerabilities across future AI generations.