Understand the new model of Token economics in one article.

A middle layer for token distribution connecting large model vendors and developers is emerging, with real profits hidden in inference acceleration, enterprise integration, and scene implementation.

Written by: Zhao Ying

Source: Wall Street Insights

The commercialization of AI applications is evolving from selling software and memberships to selling token call capabilities. Here, tokens are the smallest units of information that large models use to process data, also serving as the basis for API billing, settlement, and consumption. As the call volume expands, tokens themselves begin to be treated like "inventory" that is purchased, routed, split, and resold.

Chen Liangdong, an analyst at Huayuan Securities, outlined the core changes in a recent report on the media industry: "Token operations are forming a new intermediate market, exploring token distribution models that connect upstream large model vendors with downstream developers, enterprises, and individuals. Essentially, it is a liquidity infrastructure for global token wholesale to retail networks."

The backdrop for this business emergence is straightforward: on one side, token call volumes in China are rapidly expanding, with an average daily call volume of 100 billion at the beginning of 2024, rising to 100 trillion by the end of 2025, and exceeding 140 trillion by March 2026; on the other side, the capabilities of domestic large models have stepped up, already entering the global top tier in certain rankings and call volumes. As demand increases and models proliferate, the real bottlenecks in transactions have shifted to payment, network, interfaces, compliance, channels, and scene implementation.

However, token distribution should not simply be understood as "reselling API quotas." The thinnest profit layer comes from resale price differences, while thicker portions arise from inference acceleration, unified interfaces, enterprise-level prompt engineering, agent orchestration, model selection, and business system integration. Because the entry barriers are not high, the risks in this market are also direct: intensified competition, capital solicitation and bad debts, and policy changes by upstream model vendors can compress intermediate layer profits.

Tokens now have "wholesalers" and "retailers"

The basic chain of token distribution includes three types of roles.

Upstream are model providers, including ByteDance's Seedance series, Alibaba's Qwen series, Zhipu's GLM series, the Kimi series from the Dark Side of the Moon, the DeepSeek series, etc. They are the source suppliers of tokens.

The intermediate layer consists of agency platforms that are responsible for acquiring upstream model resources and redistributing them to end-users. Their work is not only to resell quotas but also to convert different model interface protocols into a unified API format, enabling downstream users to call multiple models with just one API key.

Downstream are the actual consumers of tokens, including individual users, developers, corporate customers, and possibly lower-level distribution practitioners.

The value of this intermediate layer is concentrated in several areas: direct connections in China reduce network entry barriers; one set of code adapts to multiple models; support for individual and corporate payments; potential for lower costs through bulk purchases; a platform aggregating different models like GPT, Claude, DeepSeek, Kimi, etc., reducing the repetitive integration costs for developers.

Therefore, token distribution appears to be a light asset business that does not require training large models or maintaining large-scale server clusters. The core assets have become API transfer and scheduling systems, upstream model resources, channel customers, and service capabilities.

Surging call volumes are the most direct fuel for this business

For the token operation model to establish itself, there must first be a sufficiently large consumption volume.

The average daily token call volume in China soared from 100 billion to over 140 trillion within two years, growing more than a thousandfold. The expansion in call volume comes from the implementation of various vertical agents and from enterprises embedding generative AI into more business processes.

IDC's data presents a more aggressive path: the number of active intelligent agents among Chinese enterprises is expected to surpass 350 million by 2031, with a compound annual growth rate exceeding 135%; as the density and complexity of tasks for intelligent agents increase, the annual growth rate of token consumption is expected to exceed 30 times.

Execution-oriented intelligent agents are already showing this change. The weekly token consumption of OpenClaw on the OpenRouter platform rose from 0.81T between February 2 and March 16, 2026, to 4.97T, with the proportion increasing from 8.31% to 24.36%.

Once tokens become mass consumables, procurement, pricing, routing, and settlement surrounding them will naturally stratify. Model providers may not directly serve every single customer, and end customers may not be willing to individually access models, thus creating space for this intermediate layer.

The cost-effectiveness of domestic models opens the door for token exports

The enhancement of domestic large model capabilities is a key variable for token distribution to transition from domestic to cross-border.

SuperCLUE data indicates that the composite scores of domestic models, such as ByteDance's Doubao and DeepSeek series, have already surpassed 70 points, narrowing the gap with top overseas models like GPT-5.4 and Gemini; models like Tongyi Qianwen, Kimi, and Zhipu GLM have also formed a relatively clear hierarchy.

According to OpenRouter data, as of the week ending May 10, 2026, Tencent's Hy3 preview (free) ranked first in call volume; among the top five, ten, and twenty models, there are 2, 6, and 9 domestic large models, respectively.

A more symbolic change occurred in the first quarter of 2026. From February 9 to 15, the call volume of Chinese models on OpenRouter reached 41.2 trillion tokens, surpassing the 29.4 trillion tokens of U.S. models for the first time during the same period. From February 16 to 22, the weekly call volume of Chinese models further rose to 51.6 trillion tokens; among the top five models in platform call volume, four were from Chinese vendors, specifically MiniMax M2.5, Kimi K2.5, Zhipu GLM-5, and DeepSeek V3.2, collectively contributing 85.7% of the total call volume of the top five.

Price advantages are also significant. The input price for MiniMax M2.5 and GLM 5 is $0.30 per million tokens, while Claude Opus 4.6 costs $5; for output prices, MiniMax M2.5 is $1.1, GLM 5 is $2.55, and Claude Opus 4.6 is $25. In high token consumption scenarios like AI agents and code development, the cost-effectiveness differences of domestic models will continue to be magnified.

Global AI resource imbalances make routing platforms "transshipment stations"

Token distribution does not merely solve pricing issues, but also addresses resource mismatches.

Top overseas large models are affected by geographic access restrictions, compliance rules, and payment barriers, preventing them from directly reaching some users, including developers in mainland China. The high-quality domestic large models going overseas also face challenges of localization adaptation, channel establishment, and user acquisition.

This imbalance has created demand for cross-border circulation, aggregation routing, and stratified distribution.

OpenRouter is already a typical example. Its platform processed token volumes ranging from 50 trillion to 70 trillion weekly in 2025, increasing to over 200 trillion weekly by April 2026; during the year of 2026, its annualized revenue exceeded $50 million, growing about fivefold from the over $10 million annual revenue disclosed in October 2025.

There are also similar platforms domestically. Silicon Flow is a one-stop large model cloud service platform that utilizes its own research and development inference engine for efficient inference acceleration while providing enterprise-level large model services. As of December 2025, the platform had over 9 million registered users, more than 10,000 corporate users, and over 150 models available.

Even politically-related capital in the U.S. has entered this sector. On May 5, 2026, WLFI, a cryptocurrency company closely linked to Trump and his family, partnered with WorldClaw to launch WorldRouter, integrating over 300 models including Claude, GPT, and Gemini, priced at USD1, about 30% lower than official published rates.

Real profits are not necessarily found in "arbitrage"

There are three ways to profit from token distribution.

The first is resale price differences. Platforms purchase API quotas in bulk from upstream model vendors and sell them to downstream customers at a markup. OpenRouter's approximately 5.5% premium on supplier costs is a representative of this model.

The second is technical premium. Platforms reduce the operating costs of tokens per unit through self-developed inference acceleration engines, allowing them to capture gross margins when selling at prices close to or even lower than official rates, relying on computational efficiency. Silicon Flow’s technologies, SiliconLLM and OneDiff, have enhanced the inference speed of language models by ten times and the efficiency of text-to-image tasks by three times, bringing the API calling cost down to one-tenth of the industry standard.

The third is enterprise value-added services. The cost for businesses to deploy AI is not only in token prices but also includes prompt engineering, multi-model selection, business system integration, workflow orchestration, operation scheduling, and employee AI capability building. As basic token prices drop, these hidden costs may more easily become points of payment.

Silicon Flow's enterprise-level MaaS platform is an example of this direction: it provides three levels of capabilities for enterprise users, including model training optimization, deployment inference, and application development support, covering data processing, model fine-tuning, prompt engineering, and RAG, ultimately delivering these as standardized APIs to sectors such as energy, finance, and government.

Marketing, short dramas, gaming, and e-commerce are easier scenarios for token consumption

For token distribution to be profitable, it ultimately must translate into real-world scenarios.

Generative AI applications are entering industries like healthcare, broad transportation, and industrial manufacturing, also starting to participate in core processes such as corporate decision support and strategic management. However, many enterprises have a weak foundation for intelligent transformation, insufficient data asset accumulation, and limited computational investments, making direct deployment of AI capabilities challenging.

In contrast, marketing and advertising companies already have clients and scenarios, engaging in short dramas, comic dramas, gaming, e-commerce, and other areas where token consumption needs are more direct and sustained. For these companies, the opportunity lies not just in reselling model capabilities but in embedding tokens into their clients' content generation, delivery, material production, and video processes.

Investment clues are also developing along two main lines:

One type consists of companies with strong model capabilities, including Alibaba, Tencent Holdings, Kuaishou, Kunlun Wanwei, Zhipu, MiniMax, and others.

The other type includes companies with strong token scenarios and high-quality client sources, especially those with overseas client resources and marketing scenarios that are willing to actively layout in AI marketing and AI video production, such as Yidian Tianxia, BlueFocus, and others.

Risks are also substantial: low entry barriers, capital investment required, upstream control

The business model of token distribution is light, but the moat is not inherently deep.

Peer competition is the first layer of risk. The technological barriers for distribution businesses are relatively low, and once leading distributors enter the market leveraging their capital, clients, and channel advantages, they can quickly replicate models, compressing profit margins.

Capital investment and bad debts represent the second layer of risk. Distributors often deal with downstream clients on a monthly or quarterly settlement basis but need to invest upfront when purchasing API quotas from upstream. The larger the scale of token consumption, the greater the pressure of capital investment; once customers default, the risk of bad debts will also increase.

The third layer of risk involves changes in policies from upstream model vendors. Large model vendors control API prices and access rules, and they may adjust prices or tighten third-party access policies. For the intermediate layer, this is the hardest aspect to control.

免责声明：本文章仅代表作者个人观点，不代表本平台的立场和观点。本文章仅供信息分享，不构成对任何人的任何投资建议。用户与作者之间的任何争议，与本平台无关。如网页中刊载的文章或图片涉及侵权，请提供相关的权利证明和身份证明发送邮件到support@aicoin.com，本平台相关工作人员将会进行核查。