Anthropic quantifies biological risk: who will regulate high-capability research AI?

10 hours ago

Anthropic recently laid out the biosafety assessment of Claude Mythos 5 in its public system card. In a red team exercise focused on plant pathology, six PhD biologists, working with the model, designed an end-to-end defense strategy for a hypothetical engineered agricultural pathogen. External experts estimated that producing a comparable defense strategy and implementation protocol without AI typically takes 40 to 95 working days, averaging about 72.5. Anthropic states plainly in the system card that this is one of the strongest pieces of evidence that Mythos 5 is close to its internally defined CB-2 biosafety risk threshold: in tasks such as completing experimental designs, optimizing steps, and identifying gaps, the model has shown it can lift a generalist researcher to a knowledge level close to that of a world-class expert. The CUSP scientific prediction benchmark tempers this picture: when asked to judge whether a research path will ultimately succeed, current large models still perform close to random guessing, which leaves a clear gap between them and a true “autonomous scientist.” The capability profile is therefore uneven. At one end, specific technical work is sharply accelerated and critical paths are shortened; at the other, overall judgment and original planning are still missing. This gray area, in which assisted PhDs approach top experts while the model is still not an independent research entity, is forcing regulators and compliance departments to answer a new question: once system cards begin to quantify the risk thresholds of powerful research AI, who has the authority and the capability to draw auditable, accountable boundaries around its use?

Mythos 5 Approaches the CB-2 Risk Red Line

In Anthropic’s internal safety framework, CB-2 marks a “quantified high-risk tier”: the question is not simply whether the model can answer questions about experimental steps, but whether it is capable enough to systematically change the real-world risk profile in biology. The CUSP scientific prediction benchmark provides a reference point: current large models are strong at completing individual research steps, but perform at close to random guessing when judging whether a research path will ultimately succeed. This suggests that the real danger is not that the model suddenly becomes an all-capable scientist, but that it amplifies mid- to high-level researchers in specific experimental scenarios. Once this amplification exceeds the CB-2 threshold defined by Anthropic, the model would no longer be seen as merely a “smart assistant,” but as a technology node carrying structural biological risk.

The plant pathology red team exercise is the key example Anthropic uses to illustrate how close Mythos 5 is to this red line. In the exercise, six PhD biologists, assisted by Mythos 5, designed an end-to-end defense strategy for a hypothetical engineered agricultural pathogen. Expert reviewers supplied the baseline: without such a model, producing a comparable defense strategy and implementation protocol typically takes 40 to 95 working days, averaging about 72.5. The time saved is not only in information gathering. The model consolidates the scattered knowledge of top experts into a compressed process, letting generalist researchers complete in a short time systematic work that would normally require long accumulated experience. This is why Anthropic lists the exercise in the system card as one of the strongest individual pieces of evidence that Mythos 5 is close to the CB-2 risk threshold. Anthropic’s decision to write this internal tier, the red team design, and the expert assessment into a public system card also resembles a platform compliance pathway, much as a cryptocurrency trading platform discloses its KYC, transaction monitoring, and sanctions screening mechanisms to show regulators that “we know where the risks are and actively identify them.” Leading model developers are shaping system cards and model cards into documentation interfaces that external regulation and audits can connect to; the next question for regulators is how to turn these self-reported risk tiers into genuinely enforceable external rules.

Assisted PhDs Save About 70 Days, Yet the Model Still Cannot Originate Research

In the plant pathology red team exercise, six PhD biologists, assisted by Mythos 5, designed an end-to-end defense plan for a hypothetical engineered agricultural pathogen. The subsequent biosafety expert review supplied the baseline: without such a model, producing a comparable defense strategy and implementation protocol typically takes 40 to 95 working days, averaging about 72.5. In other words, Mythos 5 compresses what would otherwise be roughly three months on average of information gathering, plan writing, and detail filling into one concentrated collaborative process, effectively raising generalist researchers to a knowledge level “close to world-class experts.” In the system card, Anthropic deliberately describes the model as “empowering generalists” and does not claim that it can choose research topics or plan routes on its own. This matches the findings of the CUSP scientific prediction benchmark: current large models are good at completing individual experimental steps and summarizing literature, but when asked to judge whether a complete research path will ultimately succeed, their performance approaches random guessing, which leaves a clear gap between them and an “autonomous scientist.”

For this reason, the regulatory and safety community has begun to sort research AI into risk levels along the line between “process assistance” and “autonomous decision-making.” Tools like Mythos 5, which mainly accelerate, complete, and translate existing human ideas, in effect pool resources that were previously scattered across a laboratory’s postdocs, research leads, and literature databases, so the focus of risk stays with the user. Once a model can autonomously propose, screen, and assess entirely new experimental routes in high-risk biological fields, the focus of responsibility shifts from “who used it” to “who gave it such capabilities.” If countries follow the familiar progression from soft guidelines to record-keeping obligations, risk assessments, and eventually licensing requirements, a plausible regulatory boundary comes into view: research AI limited to process assistance would fall under a relatively light filing and audit framework, while models capable of autonomous route planning would face strict rules similar to those for dual-use research and export controls. This capability boundary may well become the first substantive red line that countries draw in regulating high-risk research AI.

From Biosafety to Model Licensing

When Anthropic writes in the system card that Mythos 5 is close to the CB-2 biosafety risk threshold, it is not just making an “academic disclosure.” It is using a set of quantifiable capability tiers to carry the traditional laboratory biosafety framework over to the model layer. In the past, work on highly pathogenic agents and dual-use research required laboratory licensing, ethical review, and special permissions. Going forward, regulators in various countries could readily treat thresholds like CB-2 as trigger points for “model licensing”: once a model is assessed as having high-risk research assistance capabilities, it would no longer count as ordinary software, but would be placed on licensing lists alongside dual-use technologies and sensitive export items.

If this approach is adopted, the first parties drawn in will not be individual researchers but cloud service providers, model API platforms, and research institutions. Cloud and API providers have already experimented with access restrictions based on intended use and user category in highly sensitive scenarios. Adding “high-risk biosafety capabilities” to the sensitive list, and layering on practices already mature in traditional finance (KYC, behavior monitoring, sanctions list screening), would yield a complete set of model-oriented compliance tools: high-risk research capabilities would be available only to institutions that pass qualification review, usage would have to be reported in advance, logs of critical interactions would have to be retained, and external audits could be required when necessary. DeSci and BioDAO organizations, which support research through token governance and on-chain funding, may in future also need to wrap a “compliance shell” around their on-chain governance when working with such advanced models, providing auditable disclosures about projects, participants, and funding flows. Anthropic’s system card practice supplies a ready-made documentation template for all of this: developers first self-report their capabilities and internal controls, then regulators decide which capabilities must be licensed. This path is very likely to become the mainstream approach to formally regulating powerful research AI.
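The compliance package described above (qualification review, advance reporting of usage, retained logs) can be sketched as a simple access gate. This is only an illustration under stated assumptions: `Institution`, `AccessGate`, and every field name are hypothetical and do not correspond to any real provider's API or to anything in Anthropic's system card.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Institution:
    """Hypothetical record of an institution requesting access."""
    name: str
    passed_qualification_review: bool  # assumed KYC-style review outcome
    on_sanctions_list: bool            # assumed sanctions screening outcome


@dataclass
class AccessGate:
    """Hypothetical gate in front of a high-risk research capability."""
    # (institution name, purpose) pairs reported in advance
    declared_uses: Set[Tuple[str, str]] = field(default_factory=set)
    # every request is retained here so an external audit could replay it
    audit_log: List[dict] = field(default_factory=list)

    def declare_use(self, inst: Institution, purpose: str) -> None:
        """Record an advance usage report."""
        self.declared_uses.add((inst.name, purpose))

    def request(self, inst: Institution, purpose: str) -> bool:
        """Allow access only if all three conditions from the text hold."""
        allowed = (
            inst.passed_qualification_review
            and not inst.on_sanctions_list
            and (inst.name, purpose) in self.declared_uses
        )
        # Denied requests are logged too, not only successful ones.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "institution": inst.name,
            "purpose": purpose,
            "allowed": allowed,
        })
        return allowed
```

Logging denied requests alongside allowed ones is a deliberate choice in this sketch: an auditor usually needs to see what was attempted, not only what was granted.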

Risk Control Obligations of the Crypto Research Community

Translating the Mythos 5 plant pathology red team scenario into the on-chain world sharpens the question immediately. If six PhD biologists can, with the model’s help, run the full process from proposal design to implementation protocol for a hypothetical pathogen, then research organizations such as DeSci and BioDAO, which rely on token governance, open proposals, and on-chain funding allocation, can also embed similar capabilities in their projects. Once a DAO funds work in sensitive disciplines such as biology and medicine and uses high-capability models to design experimental steps and optimize defense strategies, it is no longer just a “technical toy” built from anonymous voting and funding contracts. It stands at the frontier of national rules on biosafety, export controls, and hazardous material research, and the ethical review and safety responsibilities traditionally assigned to offline laboratories are likely to be traced all the way down to “who clicked the approval button on-chain.”

This is why the crypto research community will eventually face the same exam question: does the platform bear reasonable control responsibilities? Regulators have already written an answer for digital asset service providers: they require KYC, transaction monitoring, and sanctions list screening, on the presumption that platforms cannot fully disclaim responsibility for high-risk activities. For DeSci and BioDAO, one possible self-help path is to write biosafety and AI usage guidelines into on-chain governance itself. For example, proposals involving biology could be required to disclose the capability level of the model they plan to use and cite the system card’s risk tier, and to be screened by a review committee or external compliance advisor with relevant expertise before any funding disbursement is triggered. Interfaces to high-capability models could be gated so that they are open only to researchers who pass identity verification and either belong to a compliant laboratory or appear on a whitelist, with at least a basic on-chain record of the stated research purpose. Before unified regulation arrives, even if only a few projects adopt constraints such as advisors, committees, and whitelists, this will still draw a new dividing line through the DeSci ecosystem: on one side, “compliance testbeds” willing to leave auditable footprints for high-risk projects; on the other, projects that insist on complete openness and may become a focus of enforcement attention at any time.
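The pre-disbursement screening described above can be sketched as a small off-chain check that a governance process might run before releasing funds. It is a minimal illustration only: the `Proposal` fields, the blocker messages, and the function names are all hypothetical, and a real DAO would have to decide how such a check is enforced on-chain.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Proposal:
    """Hypothetical funding proposal submitted to a research DAO."""
    title: str
    involves_biology: bool
    model_capability_tier: Optional[str] = None   # tier the proposer discloses
    system_card_reference: Optional[str] = None   # citation of the system card risk tier
    reviewer_approved: bool = False               # committee or external advisor sign-off


def funding_blockers(p: Proposal) -> List[str]:
    """Return the reasons funding cannot be released yet (empty if none)."""
    if not p.involves_biology:
        return []  # the extra requirements apply only to biology proposals
    blockers = []
    if not p.model_capability_tier:
        blockers.append("missing model capability disclosure")
    if not p.system_card_reference:
        blockers.append("missing system card risk tier reference")
    if not p.reviewer_approved:
        blockers.append("awaiting committee or advisor screening")
    return blockers


def can_disburse(p: Proposal) -> bool:
    """Funding is triggered only when no blocker remains."""
    return not funding_blockers(p)
```

Returning the list of blockers, rather than a bare yes or no, keeps the reason for each rejection available as an auditable record.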

Compliance Front Lines for DeSci and BioDAO

The Mythos 5 case draws a warning line for DeSci and BioDAO that is visible but still shifting. These advanced models can already give generalist researchers support comparable to that of world-class experts on specific biological tasks, as Anthropic describes in the system card and corroborates through the plant pathology red team exercise: defense strategies that would otherwise take dozens of working days can be significantly compressed with the model’s help. Yet the model remains clearly short of truly “conducting research independently.” Its near-random performance on the CUSP benchmark when predicting whether research paths succeed or fail supports classifying large models, in the short term, as “high-capability assistive tools rather than autonomous labs.” Following this reasoning, future rules for DeSci may well draw responsibility boundaries along two lines. One is a “capability threshold” like Anthropic’s internal CB-2: if a model invoked by a DAO is deemed to cross a specific biosafety level, that triggers additional record-keeping obligations, risk assessments, or even access permits. The other is the “usage scenario”: the same Mythos 5 could deliberately be assigned different compliance burdens depending on whether it is used for general literature review or for defense designs relating to sensitive pathogens.

For projects and users, the things to watch over the next few years are concrete: whether disclosure standards such as system cards and model cards are incorporated into regulatory documents; whether internal tiers like CB-2 are adopted directly by policymakers as regulatory thresholds; and whether DeSci and BioDAO are willing to embed biosafety reviews, model usage whitelists, and audit logs in their on-chain governance to place themselves on the auditable side. These choices will determine who has a voice at the negotiating table when high-capability research AI enters the next round of rule-making.
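The two boundary lines discussed in this section, capability threshold and usage scenario, can be read as two independent inputs to a single obligations lookup. The sketch below is purely illustrative: the scenario names, the obligation labels, and the specific mapping between them are assumptions made for the example, not rules proposed by any regulator or by Anthropic.

```python
from enum import Enum
from typing import List


class Scenario(Enum):
    """Hypothetical usage scenarios for the same model."""
    LITERATURE_REVIEW = "literature_review"
    SENSITIVE_PATHOGEN_DESIGN = "sensitive_pathogen_design"


def compliance_obligations(crosses_capability_threshold: bool,
                           scenario: Scenario) -> List[str]:
    """Combine the capability axis and the scenario axis into obligations.

    The mapping is an assumption for illustration only.
    """
    obligations: List[str] = []
    sensitive = scenario is Scenario.SENSITIVE_PATHOGEN_DESIGN
    if crosses_capability_threshold:
        # Capability axis: triggered by the model's assessed tier.
        obligations += ["record_keeping", "risk_assessment"]
    if sensitive:
        # Scenario axis: triggered by how the model is being used.
        obligations.append("biosafety_review")
    if crosses_capability_threshold and sensitive:
        # Both axes together: the strictest burden in this sketch.
        obligations.append("access_permission")
    return obligations
```

Keeping the two axes as separate inputs reflects the point in the text that the same model can carry different burdens depending on what it is used for.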


