🐋 How DeepSeek slayed the AI giants
Upending the narrative for all the major players in AI
Beyond Big Budgets: The Quiet Disruption of Capital-Efficient AI
Since 2024, the AI conversation has been dominated by a single narrative: only Big Tech, with billion-dollar budgets and GPU arsenals (hence Nvidia's historic climb), could compete in foundational AI. This belief, fueled by OpenAI's billion-dollar fundraising spectacles and Meta's $65B pledges, has convinced many investors that you either compete at planetary scale or you don't compete at all.
Yet as we enter 2025, DeepSeek's trajectory, achieving frontier performance at a fraction of the cost, echoes several of my earlier warnings:
The next decade of AI won't be won through brute-force computation alone; success will come from teams that combine technical ingenuity with capital efficiency.
In this piece, I'll go through the how, and my predictions moving forward.
DeepSeek comparison table with leading models
How DeepSeek does it: Efficiency through specialization
DeepSeek's rise to the forefront did not happen overnight. Its success emerged from a year of deliberate, incremental specialization across two critical domains:
Specialization in mixture-of-experts architecture
GPU compute optimization (forged under the constraints of hardware sanctions)
This strategic focus mirrors a broader trend among Chinese tech players, from Tencent and Alibaba to smaller startups (like StepFun), all compelled to innovate efficiency-first solutions in response to limited access to H100-class GPUs.
Unable to match the 100k GPU clusters of Meta and other Western tech giants, they have turned necessity into ingenuity, narrowing the gap with Big Tech, as DeepSeek has now demonstrated.
1) Mixture-of-experts: Scaling smart, not just scaling up
Traditional transformers, like Llama, activate 100% of their parameters for every token generated. Mixture-of-experts architectures flip this script: instead of one large dense model, they route each token through a network of smaller specialized experts, of which only a fraction is activated at any point in time, reducing compute costs for both training and inference.
MoE mirrors a principle of human cognition: we typically use only a fraction, say 10%, of the brain's capacity for a single task at any point in time, shifting resources as needed depending on the task (or, in the AI case, the output token).
At Recursal, we validated this efficiency firsthand with Flock of Finches: our RWKV-6 MoE matched the performance of our dense model while reducing active compute to approximately 20%. This demonstrated how MoE can amplify capability without proportionally increasing inference cost.
For DeepSeek V3, a 671-billion-parameter model, this translates to roughly 5% of the compute cost, as only 9 of its 256 experts are active per token. Despite its larger size, this makes it more efficient to train and run than even Llama 405B, which has a smaller total parameter count but makes use of 100% of its parameters.
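To make the routing idea concrete, here is a minimal, illustrative sketch of top-k mixture-of-experts routing in PyTorch. This is not DeepSeek's implementation; the expert count, top-k value, and dimensions are arbitrary placeholders chosen only to show how a small subset of experts runs per token.

```python
# Minimal top-k MoE routing sketch (illustrative only, not DeepSeek's code).
# Expert count, top_k, and dimensions are placeholder values.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=16, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # scores every expert for each token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                                      # x: (tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)             # routing probabilities
        weights, idx = scores.topk(self.top_k, dim=-1)         # keep only the top-k experts
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize kept weights
        out = torch.zeros_like(x)
        # Only top_k of n_experts feed-forward blocks actually run for each token,
        # which is where the compute savings come from.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

layer = ToyMoELayer()
tokens = torch.randn(8, 64)
print(layer(tokens).shape)  # torch.Size([8, 64]); ~2 of 16 experts used per token
```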
Yet MoE's advantages come with challenges: training these models correctly is significantly harder than training a dense transformer, a trial-and-error process that only grows more complex with each added expert.
This is part of the reason for DeepSeek's iterative process: scaling from 64 experts in its 16B model (January 2024), to 160 experts in V2, to 256 in V3 today, with improved efficiency at each step. The result is a working MoE system with an expert count more than an order of magnitude higher than most Western labs use (typically at most ~8 experts).
This reduction is what allowed them to bring training costs down from the $50-100M typical of dense transformer training to under $10M with MoE.
2) Optimizing GPU efficiency: Squeezing compute from sanctioned GPU lemons
The other major advancement goes beyond the model's high-level design and focuses on extracting maximum value from constrained resources through better GPU utilization.
In short: it’s getting 2x more work done with the same hardware.
While we could go into deep technical detail on each of them, it's less about a single innovation and more about compounded improvements that together turn sanctioned H800 clusters into workhorses roughly 2x more efficient than unrestricted H100 systems running conventional AI frameworks.
Some of these techniques were developed and battle-tested in earlier versions, where they were already cited as contributing factors. When layered atop MoE's 5-10x cost reductions, the total efficiency multiplier reaches 10-20x. We intentionally provide a range to avoid debating the accuracy of exact figures; what matters is the order of magnitude of the change.
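As a rough back-of-the-envelope check on how those multipliers compound, using the article's own illustrative ranges rather than measured benchmarks:

```python
# Back-of-the-envelope compounding of the efficiency claims above.
# These are the article's illustrative ranges, not measured benchmarks.
moe_cost_reduction = (5, 10)   # MoE: roughly 5-10x cheaper per token trained/served
gpu_efficiency_gain = 2        # low-level GPU optimizations: ~2x more work per GPU

for moe in moe_cost_reduction:
    total = moe * gpu_efficiency_gain
    print(f"{moe}x (MoE) * {gpu_efficiency_gain}x (GPU utilization) = {total}x overall")
# 5x (MoE) * 2x (GPU utilization) = 10x overall
# 10x (MoE) * 2x (GPU utilization) = 20x overall
```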
Less a sudden revelation, more an incremental build-up
Over the past year, China's two leading AI teams have pursued a disciplined strategy of continuous improvement, refining their own respective techniques across models both big and small, with gradual, measurable gains at every step, all while keeping training costs to a modest budget. The end result was a steady cadence of incremental upgrades, roughly one every two months.
So it came as a surprise to no one who had been watching, but as a shock to everyone ignoring the progress happening outside of the big labs.
Prediction 1: Big labs lose the moat of scale
The key takeaway from DeepSeek's success is clear evidence that we do not need hundreds of millions in compute budget to train O1-level models. While there may be some discrepancies in the reported capital and resources spent, DeepSeek's story demonstrates that the effective cost of building and training models can be significantly lower than what the major players (Meta, Microsoft, and Google) would have us believe.
These tech giants consistently push the narrative that model development is an overwhelmingly capital-intensive endeavor, creating a perceived barrier to entry for smaller players, given the resource-intensive nature of large transformer models.
The market adopted this narrative, assuming that the resource-intensive development pathway is the only way to lead in the AI race.
Meanwhile, members of these companies actively advise investors against backing smaller alternative startups.
This entrenched belief breeds strong resistance to ideas that challenge it, stifling innovation that would have come from smaller, nimbler teams (a belief which fools not only themselves, but their investors).
At the same time, this benefits the larger players: AI investment concentrates around them, pushing their valuations upward into the trillions of dollars and toward an oligopoly.
An oligopoly not so different from the electricity providers (Nvidia) and the incandescent light bulb makers (the large AI companies), constantly lobbying against newer, better alternatives (like LEDs) for their own benefit.
However, smaller companies like DeepSeek, and open-source alternatives to transformers like RWKV (which I work on), challenge the notion that AI development demands insurmountable resources, by focusing on efficiency and lowering the barriers to entry.
Open model builders are rapidly positioning themselves as attractive investment vehicles for institutional investors. By offering a compelling alternative to the centralized dominance of big labs, they enable investors to diversify their bets and reduce over-reliance on a handful of AI giants.
This shift not only democratizes innovation but also creates a more competitive and resilient AI ecosystem, with alternative investment opportunities.
Prediction 2: NVIDIA’s hardware monopoly is cracking, and that’s great for everyone
The industry stands at an inflection point. What began as NVIDIA’s unchallenged dominance in training infrastructure is rapidly evolving into a more nuanced landscape where inference capabilities take center stage. This shift isn’t merely technical; it represents a fundamental restructuring of the AI economy. While training remains critical, new players such as DeepSeek show that smaller clusters can train models competitively, eroding the necessity of hyperscale infrastructure. NVIDIA’s historic premise that bigger clusters equal better results is showing its cracks.
As enterprises deploy AI across their organizations and applications, demand for dedicated inference platforms is surging. Challengers like AMD, Groq and Cerebras are capitalizing here, offering tailored solutions that perform as well as, or better than, the incumbent.
This shift signals a structural change: the value chain is fragmenting. Training, once the exclusive domain of NVIDIA-powered hyperscalers, is eroding as a moat, while inference, the next growth driver, is no longer exclusive to NVIDIA.
For developers and investors, this spells opportunity: the market is no longer bottlenecked by scarcity, but is becoming a more dynamic and accessible economy where innovation can thrive in abundance.
Prediction 3: The rise of open models
Open models will drive faster iteration and innovation than any single corporate lab by unlocking community collaboration. Meta’s Llama models demonstrated this potential, but DeepSeek’s MIT-licensed R1 goes further, open-sourcing both its model weights and its technical innovations, democratizing AI access for more companies and communities.
And it’s not just Llama, DeepSeek, or Qwen; a whole new wave of open models will be competing with one another, with commercial incentives tied to adoption, whether through an API platform (Qwen), consumer applications (Llama), or both (DeepSeek).
To accelerate the growth of open-model development, resources and capital must be made accessible to as many contributors as possible. However, one of the biggest pain points for any AI developer building their own application today is: “How can I run and test the latest open models without the pain and cost of setting up my own infrastructure?”
Here is where platforms that handle the complex infrastructure (GPU management, model deployment, access to a wide range of models) become a necessity in the development cycle.
These platforms are doing for AI what cloud computing did for web development: simplifying the technical overhead and reducing costs by 10-100x for any developer. This is exactly what platforms like featherless.ai, which we built at Recursal, set out to do.
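As a concrete illustration of the developer experience, here is a minimal sketch of querying an open model through an OpenAI-compatible inference endpoint. The base URL, model name, and environment variable below are assumptions for illustration; check the platform's documentation for the exact values.

```python
# Minimal sketch: calling an open model via a hosted, OpenAI-compatible
# inference platform instead of self-managing GPUs.
# The base_url, model id, and API-key variable are illustrative assumptions;
# consult the platform's docs for the real values.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.featherless.ai/v1",   # assumed OpenAI-compatible endpoint
    api_key=os.environ["FEATHERLESS_API_KEY"],  # hypothetical env var holding your key
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",            # example open-weights model id
    messages=[{"role": "user",
               "content": "Summarize mixture-of-experts in one sentence."}],
)
print(response.choices[0].message.content)
```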
As AI continues to reshape industries, the platforms that make open-source power more accessible represent a strategic opportunity for forward-thinking investors to capitalize on the expanding ecosystem of open models.
Closing words
Investors betting on AI’s next decade must ask: are we funding another GPU arms race led by the big labs, or are we backing open ecosystems where efficiency and ingenuity coexist? The next decade of AI won’t be won through brute-force computation alone; success will come from teams that combine technical ingenuity with capital efficiency.
While capital will always play a role, DeepSeek’s achievements signal a fundamental shift in the opportunity landscape. The most promising investments may not be in replicating existing approaches at scale, but in identifying teams that can leverage efficiency innovations to create outsized impact with limited resources.
We’re entering an era where innovation and efficiency aren’t just nice-to-have features; they’re essential competitive advantages that will determine which projects succeed in bringing AI’s benefits to the broader market.
If you’re an investor interested in exploring these opportunities, or a team building with these principles, I’d welcome connecting to discuss how we can collaborate in shaping this more efficient, accessible future. Reach out to me on LinkedIn or Twitter (@picocreator) to start the conversation.
🖖🚀
Eugene
CEO @ recursal.AI
Acknowledgment & References
Several parts of this article, including the research, were contributed by the content team at Featherless.
Data points are sourced from:
- https://arxiv.org/pdf/2412.19437v1
- https://openai.com/index/gpt-4-research/
- https://wired.com/story/openai-ceo-sam-altman-the-age-of-giant-aimodels-is-already-over/
- https://api-docs.deepseek.com/quick_start/pricing
- https://help.openai.com/en/articles/7127956-how-much-does-gpt-4-cost
- https://arxiv.org/pdf/2407.21783
- https://apxml.com/posts/training-cost-deepseek-v3-vs-llama-3
- https://together.ai/pricing