Rocky | Feb 18, 2025 14:43
Some thoughts after watching today's Grok3 launch event, which I can sum up in one phrase: mixed feelings, and a sense that the field has fallen into a confirmation bias (as shown in Figure 4). I'm not a professional AI investor, but I'd count myself among the better amateur players. Let me explain, drawing on my testing this afternoon.

First, the facts: Grok3's scores genuinely surpass most mainstream models, especially on math, science, and programming benchmarks, where it is far ahead; Grok3 Reasoning Beta in particular looks dominant. Musk's claim that it is currently the strongest AI model on the market holds up on paper.

Still, I came away somewhat disappointed. Why? Look at the actual numbers, especially the LMSYS arena data 📊: Grok3 scored about 1400, GPT-4o about 1380, and DeepSeek R1 about 1360. The gap between them is not large, roughly 1%-3%. That matches my subjective impression from an afternoon of testing with Big Punch and CLUE, covering classic Tower of Hanoi problems, multimodal recognition, and Python-based sorting algorithms: the results were comparable to GPT-4o and DeepSeek, and on the Python tasks Grok3 was slightly slower and less accurate than DeepSeek.

The problem is what it cost to get there. Grok3 reportedly used 200,000 GPUs (said to be H100s) and roughly 200 million cumulative GPU-hours of training. I can only say the cost is far too high and the energy badly wasted: GPT-4o reached 1380 at perhaps a tenth of that cost, and DeepSeek, on older-generation H800 GPUs, reached 1360 at perhaps a hundredth. All I can say is that the fundraising ability is impressive; the landlord's family really is wealthy.
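For reference, here is the kind of ground-truth solver one can use to grade a model's answers on the classic Tower of Hanoi prompts mentioned above. This is my own minimal sketch (the function name and move format are assumptions, not anything from the launch event or the benchmarks):

```python
# Reference Tower of Hanoi solver, usable as ground truth when checking
# a model's step-by-step answer. A correct n-disk answer must list
# exactly 2**n - 1 moves in this order.

def hanoi(n, src="A", aux="B", dst="C"):
    """Return the list of (from_peg, to_peg) moves for n disks."""
    if n == 0:
        return []
    return (hanoi(n - 1, src, dst, aux)   # park n-1 disks on the spare peg
            + [(src, dst)]                # move the largest disk
            + hanoi(n - 1, aux, src, dst))  # bring the n-1 disks on top

moves = hanoi(3)
print(len(moves))  # a correct 3-disk answer needs 2**3 - 1 = 7 moves
```

Comparing a model's emitted move list against this output makes the grading objective rather than impressionistic.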
It feels like building a giant cannon and firing it at a mosquito. Why do I say much of the AI industry has fallen into a "confirmation bias"? Because it has hardened into a faith in "scale effects": the belief that you can keep growing a circle's area simply by growing its radius. If AI development degenerates into everyone racing on scores and scale, it becomes reckless and irresponsible toward the human environment, because once scale passes a certain point, marginal returns diminish. So AI development today faces a key confirmation bias: over-reliance on economies of scale while neglecting genuine gains in efficiency and in the intelligent core.

With the rise of large models like GPT-3, GPT-4, and the newly released Grok3, the industry has broadly assumed that more parameters must mean better performance. But as model size keeps expanding, marginal returns shrink and performance does not grow proportionally. GPT-3 has 175 billion parameters; GPT-4 is reported to have about 1.8 trillion, an increase of more than tenfold, yet its performance gain is nowhere near tenfold. This trend shows that simply adding scale and compute cannot sustainably deliver qualitative breakthroughs.

The same goes for data: decades of internet data have essentially been exhausted, yet AI performance has not improved tens or hundreds of times over. Now consider the human brain. Early biologists were puzzled: they could not explain why humans are smarter than other animals, since the human brain weighs far less than an elephant's and has fewer neurons than a blue whale's. In raw "compute", the human brain has no significant advantage at all.
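The diminishing-returns point can be made concrete with a toy power-law loss curve. This is purely illustrative: the constants E, A, and alpha below are made-up values I chose for the sketch, not fitted scaling-law coefficients, and the parameter counts are the reported figures from the text above.

```python
# Illustrative only: a power-law loss curve L(N) = E + A / N**alpha
# shows why a >10x jump in parameters does not buy a 10x quality jump.
# E, A, and alpha are invented for illustration, not fitted constants.

def loss(n_params, E=1.7, A=400.0, alpha=0.34):
    """Toy loss as a function of parameter count N."""
    return E + A / n_params ** alpha

l_small = loss(175e9)    # ~175B parameters (GPT-3 scale)
l_large = loss(1.8e12)   # ~1.8T parameters (reported GPT-4 scale)
print(f"{l_small:.3f} -> {l_large:.3f}")
# the loss drop is a few percent, despite >10x more parameters
```

Under any curve of this shape, each additional order of magnitude of scale buys a smaller absolute improvement than the last, which is exactly the marginal-effect argument above.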
At the data level it's even more humbling: the visual information most birds process dwarfs what humans take in. The core human advantage is the algorithm. Recently, Fei-Fei Li's team reportedly spent about $50 to reproduce the core result of DeepSeek's paper, which again suggests that what makes an AI model powerful is its ability to think and reason, not its scale. Humans cannot compete with large-scale AI models on compute or data volume; the core of human intelligence lies in the efficient operation of its algorithms and ways of thinking. The brain does not rely on massive data; it adapts flexibly to complex tasks through highly optimized algorithms and reasoning. The human brain runs on 20-30 watts, while models like GPT-4 demand enormous electricity and compute: ChatGPT reportedly consumes over 500,000 kWh per day, equivalent to the usage of about 20,000 American households. Grok3, released today, could well be 10x that on top of ChatGPT, bringing extremely high energy costs and environmental problems.

This is why innovative models like DeepSeek are gradually gaining attention. DeepSeek's core advantage is its Mixture-of-Experts (MoE) architecture, which resembles how the human brain learns. Combined with reinforcement learning, it can improve intelligence at much lower computational cost: a reward-and-punishment mechanism adjusts the model's weights and guides it toward more efficient reasoning and decision-making, rather than learning purely through memorization and rules. This approach avoids the resource waste common in training giant models and improves learning efficiency.
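To show why MoE keeps compute low even as total parameters grow, here is a minimal routing sketch. These are assumed mechanics for illustration, not DeepSeek's actual implementation: a gate scores every expert per input, but only the top-k experts actually run, so per-token compute stays roughly constant no matter how many experts exist.

```python
# Minimal Mixture-of-Experts routing sketch (illustrative, not DeepSeek's
# real code). Only the top-k gated experts are evaluated per input, so
# adding experts grows capacity without growing per-token compute.

NUM_EXPERTS, TOP_K = 8, 2

# Stand-in "experts": expert i just scales its input by (i + 1).
experts = [lambda x, i=i: x * (i + 1) for i in range(NUM_EXPERTS)]

def route(x, gate_scores):
    """Combine only the top-k experts, weighted by normalized gate scores."""
    top = sorted(range(NUM_EXPERTS),
                 key=lambda i: gate_scores[i], reverse=True)[:TOP_K]
    total = sum(gate_scores[i] for i in top)
    # Only TOP_K of NUM_EXPERTS experts are ever called here.
    return sum(gate_scores[i] / total * experts[i](x) for i in top)

scores = [0.1, 0.0, 0.2, 0.0, 0.1, 0.0, 0.9, 0.7]
print(route(1.0, scores))  # only experts 6 and 7 are evaluated
```

The design choice is the point: a dense model would evaluate all eight experts for every input, while the gate here activates just two, which is the "lower computational cost" the paragraph above describes.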
By contrast, while sheer scale and complexity can lift a large model's performance in the short term, the gains flatten out as scale expands and eventually show diminishing marginal returns. Take Grok3: its GPU count has surged, but the resulting performance improvement may not justify the investment. If AI development continues to rely solely on compute and data resources, it will face resource waste and costs that are hard to control. Future AI development should therefore prioritize efficiency over scale: optimize algorithms, raise intelligence density, and use reinforcement learning to strengthen genuine reasoning and judgment at lower compute and data cost. That is the key to AI evolving from large-scale brute force to efficient intelligence, and it may lead the whole industry into a new stage of development. Worth pondering deeply 🧐