Compiled by: Techub News
Recently, NVIDIA CEO Jensen Huang returned to his alma mater, Stanford University, for an in-depth session of nearly an hour in a course on cutting-edge computer systems. In the conversation, Huang reviewed NVIDIA's journey in leading the AI computing revolution. He also laid out his views on how AI is fundamentally reshaping the entire computing stack, why open models are crucial, and how to address the energy bottleneck. Huang is a key figure in defining the current wave of AI, and his insights provide a clear framework for understanding where technology is heading.
Fundamental Changes in Computing Paradigms: From “Pre-recorded” to “Real-time Generation”
Huang opened by noting that we are at an unprecedentedly exciting moment in the history of computer science: computing is undergoing its most profound reshaping in over 60 years. Since the IBM System/360, the basic architecture of computers, their programming models, and even their business models have remained relatively stable. AI, particularly deep learning and generative AI, has changed all of this.
He drew an essential distinction between traditional and AI-driven computing. In the past, computing processed "pre-recorded" content: pre-stored images, videos, and software code. Now, the core of computing is "real-time generation." This generation is not just content creation. It produces intelligent outputs that are contextually consistent, relevant, and responsive to user intent. As a result, every layer is undergoing fundamental change:
- how software is developed
- how companies are organized
- how software executes (neural networks vs. compiled binaries)
- the underlying computer systems, networks, and storage
- at the top of the stack, cloud services and applications
He used autonomous driving as an example: fully autonomous driving was considered unattainable before the advent of deep learning. AI unlocked this possibility, and today he believes "everything that moves will be automated." This fundamental unlocking compels us to rethink every aspect. What is a software engineer in the AI era? What is a computer in the AI era? How should we design its architecture? What can we ultimately do with it? Where should we deploy it?
Huang traced the beginning of this transformation to about 15 years ago. When generative AI such as GPT later emerged, many focused on its ability to generate images or text, but he saw a deeper significance: AI can generate "thoughts." Generating tokens for internal consumption is thinking, while generating tokens for external use is tool use. The advent of GPT clearly heralded the arrival of AI that "thinks." Today we have entered the era of "agent systems," in which computers no longer merely respond on demand but run continuously. This will trigger a new round of rethinking of every system, from cloud services to personal computers.
The Power of Collaborative Design: Million-fold Performance Improvement and Ecosystem Activation
When asked why "collaborative design" is so important, Huang pointed back to a Stanford tradition. He cited Professor John Hennessy's elegant work on RISC architecture, which co-designed compilers and microprocessor architectures so that the whole performed better than if each had been optimized independently. NVIDIA has taken this concept to its extreme in the AI era.
He noted that in the era of general-purpose computing, people tended to use general-purpose tools for every problem. Some domains, however, are extremely computationally intensive, such as computer graphics in the past, molecular dynamics, quantum chemistry, and deep learning today. For these, general-purpose computers are not the optimal choice. NVIDIA's core philosophy is to deeply understand the algorithms, computer systems, compilers, frameworks, and chip architectures, and to optimize across all of these layers simultaneously. NVIDIA may be the first computer systems company to practice this "extreme collaborative design," spanning CPUs, GPUs, networking, switches, and storage.
The results of this approach are astounding. Over the past decade, Dennard scaling ended and Moore's Law neared its limits. Relying solely on microprocessor scaling might have delivered only about a 10-fold performance improvement in that time. Through collaborative design, however, NVIDIA achieved a million-fold improvement over the same decade. A leap of this magnitude in computational scale and speed let AI researchers ask, "Why not just use the entire internet's data?" When computation becomes this fast, every assumption about computing changes.
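The two figures can be put on a common footing by converting each ten-year total into an implied compound annual rate. This is a back-of-the-envelope sketch, and it assumes steady geometric growth over exactly ten years. The talk itself makes no such assumption.

$$
\begin{aligned}
r &= G^{1/n}, \qquad G = \text{total gain},\; n = 10 \text{ years} \\
r_{\text{scaling}} &= 10^{1/10} \approx 1.26 \quad (\approx 26\% \text{ per year}) \\
r_{\text{co-design}} &= \left(10^{6}\right)^{1/10} = 10^{0.6} \approx 3.98 \quad (\approx 4\times \text{ per year})
\end{aligned}
$$

Under this assumption, a million-fold decade means roughly quadrupling performance every year, compared with gaining about a quarter per year from scaling alone.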
Huang emphasized that the accelerated computing enabled by collaborative design creates "infinite, abundant opportunities," allowing everyone to reimagine the future. NVIDIA has also activated entire downstream industries by building foundation models: BioNeMo for biology, Alpamayo for autonomous driving, GR00T for humanoid robotics, and models for climate science. Without NVIDIA's pioneering investment in these foundation models, scientists in those fields might lack the scale and technology needed to get started. In effect, this expands and democratizes AI capabilities.
He explained why NVIDIA invests in open language models such as Nemotron. There are two reasons.
- Many of the world's languages are not large enough markets for major commercial players to prioritize, yet the intelligence embodied in every language deserves to be valued.
- Integrating language models with domain-specific models is crucial, because human prior knowledge greatly improves efficiency. For instance, Alpamayo, the autonomous driving model, combines a language model with a world model so that it can reason like a human. This significantly reduces the training data required and demonstrates its efficiency and safety.

On model safety, Huang offered a striking view: for AI to be safe and reliable, it must be open. You cannot defend a black box, nor can you guarantee its safety. Transparent systems let everyone scrutinize them and let researchers build on them. Future cybersecurity challenges call for a different strategy. The smartest approach is not to race attackers with ever more powerful models, version by version. Instead, it is to deploy millions or billions of low-cost AIs (such as Nemotron Nano) as a "swarm" that defends systematically.
Metrics, Bottlenecks, and Future Architectures: Beyond “MFU,” Focusing on True Performance
The discussion then turned to resource utilization and metrics. Huang criticized the industry's focus on "MFU" (Model FLOPs Utilization) and argued that pursuing high MFU for its own sake may be a mistake. A large data center depends on several resources: floating-point throughput, memory bandwidth, memory capacity, and network capacity. One of them will always be the bottleneck. The ideal approach is to over-provision all of them so that no single resource caps performance, the limit described by Amdahl's Law. As a result, many floating-point units will sit idle at times and MFU will look low. But when peak load arrives, they can deliver 100% of their performance.
He compared the fixation on MFU to asking how much horsepower a car has: people used to ask, but almost nobody does anymore. What matters now is real "performance." For AI computing, a more relevant metric might be the number of tokens generated per watt ("tokens per watt"). During the decoding phase of a large language model, the critical factor for generating tokens is the aggregate memory bandwidth provided by NVLink 72, while MFU at that point may be very low. Still, not every token has the same value, so ultimately any evaluation must come back to what real "success" means.
Huang admitted that designing a platform architecture for many different customers, each with its own evaluation criteria, is extremely challenging. Over-optimizing for one specific problem might yield astonishing performance in that domain. But if that market isn't large enough, it can't support the large R&D investment required. Conversely, if you try to do everything, you risk becoming mediocre. Striking this balance is an "art" that requires vision, strategy, trial and error, and personal judgment.
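The trade-off described above can be illustrated with a simplified, roofline-style model. This is a sketch under our own assumptions, not NVIDIA's methodology. Each phase of a job is assumed to overlap compute, memory, and network traffic perfectly, so the slowest resource sets its duration. The class and function names (`Hardware`, `Phase`, `phase_time`, `mfu`) and all hardware numbers are hypothetical. The point is that adding floating-point capacity can make a job faster while lowering its MFU.

```python
from dataclasses import dataclass


@dataclass
class Hardware:
    """Hypothetical peak capacities of one accelerator (illustrative numbers)."""
    peak_flops: float  # floating-point operations per second
    mem_bw: float      # memory bandwidth, bytes per second
    net_bw: float      # network bandwidth, bytes per second


@dataclass
class Phase:
    """Work performed in one phase of a job."""
    flops: float
    mem_bytes: float
    net_bytes: float


def phase_time(hw: Hardware, ph: Phase) -> float:
    # Assumes compute, memory, and network traffic overlap perfectly,
    # so the slowest resource (the bottleneck) sets the phase's duration.
    return max(ph.flops / hw.peak_flops,
               ph.mem_bytes / hw.mem_bw,
               ph.net_bytes / hw.net_bw)


def total_time(hw: Hardware, phases: list[Phase]) -> float:
    return sum(phase_time(hw, p) for p in phases)


def mfu(hw: Hardware, phases: list[Phase]) -> float:
    # MFU = FLOPs actually performed / (peak FLOP/s x wall-clock seconds).
    return sum(p.flops for p in phases) / (hw.peak_flops * total_time(hw, phases))


# A compute-heavy phase followed by a memory-bound one (hypothetical numbers).
prefill = Phase(flops=1e15, mem_bytes=1e12, net_bytes=1e10)
decode = Phase(flops=1e13, mem_bytes=1e13, net_bytes=1e10)
job = [prefill, decode]

base = Hardware(peak_flops=1e15, mem_bw=1e13, net_bw=1e12)
more_flops = Hardware(peak_flops=2e15, mem_bw=1e13, net_bw=1e12)

for name, hw in [("base", base), ("2x FLOPs", more_flops)]:
    print(f"{name}: {total_time(hw, job):.2f} s, MFU {mfu(hw, job):.3f}")
```

In this toy model, doubling peak FLOP/s shortens the job from 2.0 s to 1.5 s, yet MFU falls from about 0.505 to about 0.337. The extra units speed up the compute-bound phase but sit idle during the memory-bound one. That is why a low MFU alone does not signal a poorly designed system.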
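A rough upper-bound model shows why decoding tends to be bandwidth-bound and how "tokens per watt" can be computed. The assumptions here are ours, not the talk's:
- The model is dense.
- Its weights are streamed from memory once per decode step, and that read is shared across the batch.
- Each token costs about 2 × parameters FLOPs.
- KV-cache traffic is ignored.

All function names and hardware figures are hypothetical. Read as a rate, tokens per watt is (tokens/s) ÷ (joules/s), i.e., tokens per joule.

```python
def decode_tokens_per_second(params: float, bytes_per_param: float, batch: int,
                             mem_bw: float, peak_flops: float) -> float:
    """Rough upper bound on decode throughput (tokens/s) for a dense model.

    Simplifying assumptions: all weights are read from memory once per decode
    step and that read is shared by every sequence in the batch; each token
    costs about 2 * params FLOPs; KV-cache reads are ignored.
    """
    weight_bytes = params * bytes_per_param
    bandwidth_limit = batch * mem_bw / weight_bytes  # tokens/s if memory-bound
    compute_limit = peak_flops / (2 * params)        # tokens/s if compute-bound
    return min(bandwidth_limit, compute_limit)


def decode_mfu(tokens_per_s: float, params: float, peak_flops: float) -> float:
    # Fraction of peak FLOP/s actually used while decoding.
    return tokens_per_s * 2 * params / peak_flops


def tokens_per_joule(tokens_per_s: float, watts: float) -> float:
    # "Tokens per watt" as a rate: (tokens/s) / (J/s) = tokens per joule.
    return tokens_per_s / watts


# Hypothetical system: 70B-parameter model at 1 byte/param, batch of 8,
# 8 TB/s aggregate memory bandwidth, 1 PFLOP/s peak, 1 kW power draw.
tps = decode_tokens_per_second(70e9, 1.0, 8, 8e12, 1e15)
print(f"{tps:.0f} tokens/s, MFU {decode_mfu(tps, 70e9, 1e15):.3f}, "
      f"{tokens_per_joule(tps, 1000.0):.3f} tokens/J")

# Doubling aggregate memory bandwidth doubles throughput; doubling FLOPs does not.
print(decode_tokens_per_second(70e9, 1.0, 8, 16e12, 1e15) / tps)
print(decode_tokens_per_second(70e9, 1.0, 8, 8e12, 2e15) / tps)
```

With these made-up numbers, the bound is roughly 914 tokens/s at an MFU of about 0.13: the chips spend their time moving weights, not multiplying. In this model, pooling more GPUs into one high-bandwidth domain raises the aggregate `mem_bw` term, and that term sets decode throughput. This matches the role the talk attributes to NVLink 72.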
He then outlined the evolution of NVIDIA's chip architecture and future outlook:
- Hopper: Designed specifically for the then-emerging "pre-training" problem. NVIDIA dared to conceive and build a system worth billions of dollars even though, at first, there seemed to be no customers for it.
- Grace Blackwell: Built on the recognition that the ultimate goal of AI is "reasoning." NVIDIA created the NVLink 72 architecture, which aggregates 72 chips to provide extraordinarily high memory bandwidth for token generation, achieving a 50-fold performance improvement in two years.
- Vera Rubin: Designed for "agents." Agents need long-term memory and working memory, so storage must communicate directly with the GPU, and the CPU needs extremely low latency to respond to the AI's tool-use calls.
- Feynman: Looking to the future, where AI might be fully software-based, comprising a swarm of “agent systems and sub-agent systems.” The Feynman architecture will be designed for such swarm computing models.
On the energy bottleneck, Huang pointed out that future computing will be "generative" and "continuous," fundamentally different from today's "on-demand retrieval" model. He estimates that future computing may require more than 1,000 times today's energy. He proposed three responses:
- Raise energy efficiency (as captured by the tokens-per-watt metric), which NVIDIA has already improved 50-fold.
- Prepare the entire ecosystem through education.
- Invest in sustainable energy, which is key. Now is the best time to do so, because market forces are strong enough that such investment no longer needs government subsidies. Upgrading the grid and adding diverse sustainable energy sources are essential.
Education, Strategy, and Responsibility: The CEO's Perspective
On how education should adapt to industry change, Huang believes AI must be integrated into the curriculum: not only learning about AI, but also using AI to assist learning. Traditional textbooks cannot keep pace with the knowledge AI generates in real time. He shared his own approach: after reading a paper, he has AI read the related papers, turning it into a "dedicated super researcher," and then engages with it in depth.
On career choices, he offered pragmatic advice: pursuing your passion is good, but many people do not know what they are passionate about. He considers "only choosing work that brings joy" an unrealistically high bar. He admitted that as CEO he loves only about 10% of his job; the other 90% is hard and must be "endured." But this endurance forges resilience and tenacity. You will need both when the world, your family, your company, or your colleagues need you to be strong.
Speaking of early strategic mistakes, Huang cited NVIDIA's foray into the mobile device market. Although the company built a billion-dollar business there, it was completely squeezed out during the transition from 3G to 4G. He reflected that, had he thought a few steps further ahead, he should have foreseen how transient that business would be and concentrated resources elsewhere. Still, the ultra-low-power, high-efficiency technology NVIDIA accumulated in mobile carried over into "robotics" applications, a market that did not yet exist at the time. Today's Thor chip continues that technological lineage.
On strategy and forecasting, Huang's method is to observe, reason from first principles, build mental models of the future, and then work backward to the company's positioning. He acknowledges that predictions will not always be accurate. The key is to lower opportunity costs, increase optionality, and strive to create value along the way.
Finally, Huang spoke with strong personal conviction about technology exports, competition, and the industry's future.
- He rejected comparisons between NVIDIA GPUs and nuclear weapons. GPUs, he emphasized, are general-purpose computing tools used in countless beneficial areas, including gaming and medical imaging.
- He firmly believes the U.S. tech industry is a national treasure and should not give up two-thirds of the global market through policy; otherwise, graduates will enter a shrinking industry.
- He dismissed as irresponsible and untrue the science-fiction fear that an AI "singularity" will suddenly arrive, prove unmanageable, and take over the world.
- He called for building a future grounded in technological optimism and firmly believes that AI, unlike nuclear weapons, should be accessible to everyone.

On how universities can obtain computing resources, he said candidly that the problem lies in the system itself: research departments operate independently, and grants are too small to fund large shared computing facilities. Universities with substantial endowments, such as Stanford, should proactively change how they budget and allocate computing resources. He suggested they either build campus-level supercomputers or buy equivalent cloud services, so that every student and researcher has access to AI supercomputing. This, he said, takes planning and determination rather than simply blaming suppliers.