OpenAI Launches GPT-5.4 Just Days After Last Version as 'QuitGPT' Exodus Gains Steam

Decrypt
1 hour ago


OpenAI began rolling out GPT-5.4, its most capable model to date, on Thursday as the company scrambles to contain a PR crisis that has seen an estimated 2.5 million users take action against the company, either by canceling their subscriptions or by sharing the boycott on social media.


The so-called QuitGPT movement exploded after OpenAI revealed a deal with the U.S. Department of Defense hours after Anthropic publicly walked away from the same contract—earning the Claude maker the public scorn of President Trump and other government officials.


Anthropic's sticking point: The DoD refused to include language explicitly prohibiting the deployment of autonomous weapons and mass surveillance of U.S. citizens.





OpenAI took the deal anyway. CEO Sam Altman, who has been fielding questions about the apparent gap between his company's stated safety red lines and the contract's actual language, needs those users back.


Enter GPT-5.4… just two days after GPT-5.3 was introduced.



The new model consolidates reasoning, coding, and agentic capabilities into a single release. It also has a context window of one million tokens, which gives users more freedom to handle large amounts of information in a single session.


On paper, the numbers look promising. On GDPval—a benchmark testing knowledge work across 44 occupations—GPT-5.4 matches or beats industry professionals in 83.0% of comparisons, up from 70.9% for GPT-5.2. Computer use is the biggest leap: On OSWorld-Verified, which measures a model's ability to operate a desktop through screenshots and keyboard/mouse actions, GPT-5.4 hits a 75.0% success rate versus GPT-5.2's 47.3%—and clears the human baseline of 72.4%.
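To put those jumps in perspective, here is a minimal sketch that separates the percentage-point gain from the relative gain for the two benchmarks quoted above. The scores come from this article; the `gains` helper and the dictionary layout are illustrative assumptions, not anything OpenAI publishes.

```python
# Benchmark scores quoted in the article, in percent: (GPT-5.2, GPT-5.4).
scores = {
    "GDPval": (70.9, 83.0),
    "OSWorld-Verified": (47.3, 75.0),
}


def gains(old: float, new: float) -> tuple[float, float]:
    """Return (percentage-point gain, relative gain in percent), rounded to 0.1."""
    point_gain = round(new - old, 1)
    relative_gain = round((new - old) / old * 100, 1)
    return point_gain, relative_gain


for name, (old, new) in scores.items():
    points, relative = gains(old, new)
    print(f"{name}: +{points} points, +{relative}% relative")
```

A gain of 12.1 points on GDPval works out to roughly a 17% relative improvement, while the 27.7-point jump on OSWorld-Verified is closer to 59% relative, which is why computer use stands out as the biggest leap.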


On BrowseComp, a test of deep web research, it jumps 17 percentage points over GPT-5.2. The 1 million token context window and a mid-response steering feature—letting users redirect the model while it's still thinking—round out the headline features.


Mid-response steering saves time and computation because the model does not have to discard all previously generated tokens when an error is detected.



Who will benefit from GPT-5.4?


It’s important to note that the benchmarks mostly compare GPT-5.4 to GPT-5.2, skipping over GPT-5.3 entirely. In most of those runs, reasoning was also set to extra-high effort, which free and Plus users don’t get to enjoy.


For users already on GPT-5.3, several gains may feel more incremental than the charts suggest.




Coders have the most reason to temper expectations: On SWE-Bench Pro, the improvement from GPT-5.3-Codex (56.8%) to GPT-5.4 (57.7%) is 0.9 percentage points, barely a rounding error. OpenAI also claims the model needs significantly fewer tokens to complete tasks than GPT-5.2.


“GPT-5.4 is our most token-efficient reasoning model yet, using significantly fewer tokens to solve problems when compared to GPT-5.2,” OpenAI said.


That said, any improvement in token efficiency is a positive for developers who use OpenAI models via the API and are charged per token. A model with an efficient chain of thought may deliver the same results at a fraction of the cost of a model that tends to overthink things to make sure it reaches the proper conclusion.
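The per-token billing argument can be sketched in a few lines. Every number below is a hypothetical placeholder chosen for illustration: the price, the token counts, and the `api_cost` helper are assumptions, not OpenAI's actual rates or measurements.

```python
def api_cost(tokens: int, usd_per_million_tokens: float) -> float:
    """Cost in USD for a request billed per token."""
    return tokens / 1_000_000 * usd_per_million_tokens


# Hypothetical values for illustration only; not real pricing or benchmarks.
PRICE_PER_MILLION = 10.00  # assumed USD per 1M tokens
VERBOSE_TOKENS = 8_000     # assumed usage by a model that "overthinks"
EFFICIENT_TOKENS = 3_000   # assumed usage by a more token-efficient model

verbose_cost = api_cost(VERBOSE_TOKENS, PRICE_PER_MILLION)
efficient_cost = api_cost(EFFICIENT_TOKENS, PRICE_PER_MILLION)
savings = 1 - efficient_cost / verbose_cost  # fraction of the bill saved

print(f"verbose: ${verbose_cost:.4f}")
print(f"efficient: ${efficient_cost:.4f}")
print(f"savings: {savings:.1%}")
```

Because cost scales linearly with tokens at a fixed price, any percentage cut in tokens used translates into the same percentage cut in the bill for that request, provided the answer quality holds.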


There's another wrinkle for anyone hoping to use the new model right now: OpenAI says GPT-5.4 will be released today, but it wasn't yet available as of this writing, so it is likely being rolled out gradually. For most users, the best available model is GPT-5.3, and it can be used only for instant replies, meaning answers that don't require much reasoning effort.


Users who rely on thinking—OpenAI's terminology for extended chain-of-thought reasoning on complex tasks—are still on GPT-5.2. In other words, the users most likely to push the model's limits are the last ones to get it.




The clearest beneficiaries are enterprise users doing document-heavy work. On an internal spreadsheet modeling benchmark, GPT-5.4 scored 87.3% against GPT-5.2's 68.4%. Legal research firm Harvey said GPT-5.4 scored 91% on its BigLaw Bench eval. Mainstay, which runs agents across 30,000 property tax portals, reported a 95% first-attempt success rate and sessions running "~3x faster while using ~70% fewer tokens."


That's the kind of efficiency argument that might matter to enterprise procurement teams—but it's a harder sell to the individual user reconsidering whether to delete their account.


