Author: Ada, Shenchao TechFlow
On site at GTC, San Jose Convention Center.
NVIDIA Chief Scientist Bill Dally sits on stage opposite Google's Jeff Dean. Halfway through their conversation, Dally drops a number: “Previously, migrating a standard cell library of about 2,500 to 3,000 cells took a team of 8 engineers about 10 months.”
He pauses for a moment.
“Now it only takes a single GPU, running overnight.”
There were no gasps from the audience; those who understood the statement knew what it meant. Ten months of work by eight engineers, compressed into a single night on one of the company's own GPUs. And, Dally added, the results matched or even exceeded human designs in area, power consumption, and latency.
The next day, news outlets framed it as "NVIDIA designs GPUs using AI."
But the truth of the matter is much more thought-provoking than the headline.
What is NVIDIA running internally?
NVIDIA is not running a black box internally; it is using a set of toolchains honed over several years.
NVCell is a reinforcement-learning program focused on the arduous task of standard cell library migration. PrefixRL attacks a long-standing research problem: the design of parallel prefix circuits, the carry-lookahead structures at the heart of adders. Dally said the circuits this system generates are “something a human could never conceive,” improving key metrics by about 20% to 30% over human designs.
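The tradeoff PrefixRL navigates can be made concrete with a toy comparison of two classical prefix structures. This is an illustrative sketch only, not NVIDIA's code; the function names and cost model are assumptions, using operator count as a stand-in for area and tree depth as a stand-in for delay.

```python
import math

# Illustrative sketch -- not NVIDIA's PrefixRL. It shows the design space
# such a system searches: parallel prefix graphs (the structure behind
# carry-lookahead adders), trading circuit size (operator count, a proxy
# for area) against depth (critical path length, a proxy for delay).

def ripple(n):
    """Serial prefix chain: minimum size, maximum depth."""
    return n - 1, n - 1            # (size, depth): one op per stage, fully serial

def sklansky(n):
    """Sklansky tree (n a power of two): minimum depth, larger size."""
    levels = int(math.log2(n))
    return (n // 2) * levels, levels   # n/2 operators on each of log2(n) levels

for n in (8, 16, 32):
    print(f"n={n}: ripple={ripple(n)}, sklansky={sklansky(n)}")
```

An RL agent's job is to discover graphs between and beyond these two textbook extremes that dominate on the real area/power/delay Pareto front once physical effects are accounted for.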
There are also two internal LLMs, ChipNeMo and BugNeMo. NVIDIA has fed the RTL code, architecture documents, and design specifications of every GPU in its history into these models. In Dally's telling, this distills NVIDIA's muscle memory from the G80 to Blackwell into an internal model, letting newcomers tap directly into twenty years of accumulated expertise from seasoned engineers.
So, has “AI been able to design GPUs”?
Quite the opposite. Dally's original words were: “I really wish that one day I could just say ‘design me a new GPU,’ but we are still far from that step.”
NVIDIA has not used AI to design GPUs. But it has accomplished another feat that will make it impossible for the entire industry to operate without it in the future.
$2 Billion Stake in EDA Stronghold
On December 1, 2025, NVIDIA invested $2 billion in Synopsys, one of the three giants of EDA. The two parties signed a joint development agreement, embedding NVIDIA's accelerated computing stack into the entire EDA workflow of Synopsys, with Blackwell and the next-generation Rubin GPU to be deeply integrated with Synopsys.ai.
Synopsys's position needs some explanation. Virtually every advanced-process chip in the world, including Apple's M series, AMD's MI series, and Google's TPU, is designed with Synopsys or Cadence toolchains. These two, together with Siemens EDA, monopolize the foundational tools of chip design. You can avoid Qualcomm's chips or TSMC's production lines, but you cannot escape the software of these three.
Three months after taking a stake in Synopsys, NVIDIA also brought in Cadence, Siemens, and Dassault, announcing that they were all developing AI-driven chip design tools based on NVIDIA GPUs.
The benchmark numbers NVIDIA released are startling: Synopsys PrimeSim runs 30 times faster on Blackwell, Proteus 20 times faster, and Sentaurus on the B200 accelerates 12-fold over CPUs. MediaTek ran Cadence Spectre six times faster on the H100, and Astera Labs sped up chip verification 3.5-fold with Synopsys plus NVIDIA.
One detail worth highlighting: Cadence's Millennium M2000 platform is marked as “exclusively designed for the EDA market, solely based on NVIDIA Blackwell.”
The word “exclusive” is particularly noteworthy. This means that EDA tools that previously ran on CPUs, where both Intel and AMD could play, will now require NVIDIA's cards for the fastest EDA performance.
The True Shape of the Flywheel
The version of NVIDIA's flywheel most people understand is as follows: sell GPUs to AI companies, AI companies train large models, large models prove GPUs are irreplaceable, leading to more purchases of GPUs.
This flywheel is already frightening. But there is another layer beneath it.
NVIDIA uses its own tools to design its next generation of GPUs, opening a generational gap in design efficiency, while tying the entire industry's EDA toolchain to its own hardware. Competitors who want to catch up must rent even the tools for the chase from NVIDIA's ecosystem.
This layer is exactly the anxiety hidden behind AMD's earnings report and the sharp stock drop that followed. Even though NVIDIA and Synopsys publicly state that “the investment does not entail any obligation to purchase NVIDIA hardware,” the market understands: accelerated EDA features land on NVIDIA hardware first, leaving AMD and Intel dependent on a toolchain optimized for their largest competitor's platform.
Imagine AMD engineers setting out to design a chip to rival Blackwell. They open Synopsys's tools, which run fastest on NVIDIA GPUs. Either they endure a design cycle twice as slow, or they buy a stack of NVIDIA cards to design the chip meant to beat NVIDIA.
The shovel is still being sold. But the method of selling it has changed.
The True Situation of Domestic GPUs
At this point, it is necessary to present a group of sobering numbers.
In the same fiscal year 2025 in which NVIDIA's net profit surpassed $70 billion, the domestic GPU "Four Little Dragons" were queuing up at the IPO window: Moore Threads, Muxi (MetaX), Biren, and Enflame (Suiyuan).
Moore Threads's prospectus shows a cumulative net loss of 5 billion yuan from 2022 to 2024, plus another 271 million yuan lost in the first half of 2025, with an accumulated uncovered loss of 1.478 billion yuan as of June 30. Management expects consolidated profitability in 2027 at the earliest. Muxi fared slightly better, with cumulative losses exceeding 3 billion yuan over three years. Worst off is Biren, which has lost over 6.3 billion yuan in three and a half years, with revenue of only 58.9 million yuan in the first half of 2025, nowhere near Moore Threads's 702 million yuan in the same period.
Looking at R&D intensity, Moore Threads's R&D expenses were 2,422.51% of revenue in 2022 and still 309.88% in 2024: in a single year it spends more than three times its revenue on R&D. This is not business management; it is life support, sustained by continuous funding from the primary market and the recently opened Sci-Tech Innovation Board (STAR Market) IPO window.
At the tool level, the bottleneck is even more severe. The 2022 IPO prospectus of Empyrean (Huada Jiutian) shows its tools only partially support the 5nm advanced process. Primarius (Galun Electronics) can cover the 7nm/5nm/3nm nodes, but it makes only point tools, not a full-flow solution.
Huada Jiutian's founder, Liu Weiping, put it candidly: “Domestic EDA's support for advanced processes still lags significantly, especially at today's 7nm, 5nm, and 3nm. Domestic EDA can currently reach the 14nm level; although 7nm process technology has been mastered, it still needs the whole industry chain working together on deep integration with real applications.”
This means full-flow domestic EDA for advanced processes is essentially unusable; domestic GPU companies still design their chips with Synopsys and Cadence. In 2025, Trump announced temporary export controls on key design software; although the measure was later withdrawn, EDA tools for advanced processes at 7nm and below remain under strict control. When a license gets revoked is decided in someone else's hands.
The capital market's reaction is surreal. On its listing day, Muxi's stock closed at 829.9 yuan, up 692.95% in a single session. Moore Threads briefly became the third-highest-priced stock in the A-share market, behind only Kweichow Moutai and Cambricon, with some media putting its total market value at roughly 359.5 billion yuan at that price.
The business reality behind the numbers: a group of companies still burning cash, still losing money, and still relying on export-controlled foreign toolchains to design their chips is being priced by the secondary market as the successors to a “domestic NVIDIA.”
And the tools these companies use to design chips are becoming part of NVIDIA's ecosystem. The $2 billion tie between NVIDIA and Synopsys and Cadence's label “exclusively based on NVIDIA Blackwell” turns catching up into a paradox.
A Complete Chain from Design to Manufacturing
Returning to that discussion at GTC.
Dally was humble throughout the session. “AI is still far from being able to design chips on its own” is a statement NVIDIA has been making for four or five years, but the framing shifts every year. Four years ago it was "AI can assist in design," three years ago "AI can automate certain processes," and this year it is "completing in one night what takes 8 people 10 months." Each year the line moves a step forward while leaving behind the caveat "we are still far from the ultimate goal." Look back three years and the previous "still far" has already been achieved, while the new "still far" is planted somewhere no competitor can yet reach.
What NVIDIA has done in the past twelve months is essentially one thing: use AI in the most valuable and moat-deep segments of the chip industry supply chain, and then sell these tools layer by layer to the entire industry.
The front end of chip design is covered by internal LLMs like ChipNeMo; standard cell library migration, layout, and circuit optimization in the middle are handled by NVCell and PrefixRL; the entire EDA toolchain is tied to its own GPUs through the $2 billion Synopsys investment and Cadence's “exclusively based on Blackwell”; lithography computation on the manufacturing side is handled by cuLitho, which TSMC is already using.
From design to manufacturing, NVIDIA has re-done each segment using AI. Each segment ultimately leads to the same endpoint: if you want to use the fastest tools, you have to buy NVIDIA’s cards.
For all competitors wanting to create a chip that can defeat Blackwell, the most awkward thing has already happened. The EDA tools needed to design this chip run the fastest version on NVIDIA’s GPUs; the lithography computations required to manufacture this chip have the fastest algorithm libraries provided by NVIDIA; the computing power for training design AI is also powered by NVIDIA’s cards.
The person you need to defeat is renting you all the tools you need to defeat it. The rent is paid annually, and the contract increases in price every year.