From "Token" to "Symbol": The Underlying Cognitive Debate in AI Behind Token's Chinese Name

Odaily星球日报
4 hours ago

Recently, the China National Committee for Terminology in Science and Technology issued a notice recommending that the term "Token" in the field of artificial intelligence be translated as "Word Element," and opened the recommendation for public trial use. Subsequently, People's Daily published an article titled "Experts Interpret Why the Chinese Name for Token Is Defined as 'Word Element'," systematically explaining the naming from a professional perspective.

The article notes that the term "token" originates from the Old English tācen, meaning "symbol" or "mark." In language models, a token is the smallest discrete unit obtained after text segmentation or byte-level encoding; it can take different forms such as words, subwords, affixes, or characters. The model exhibits a certain level of intelligence by modeling sequences of tokens.
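To make "segmentation into discrete units" concrete, the sketch below greedily splits text into subword tokens using a small hand-made vocabulary. The vocabulary and the greedy longest-match rule are simplifications for illustration only, not any production tokenizer's actual algorithm:

```python
# Toy illustration (not any real tokenizer): greedy longest-match
# segmentation of text into subword tokens from a hand-made vocabulary.

VOCAB = ["token", "tok", "en", "iz", "ation", "a", "t", "i", "o", "n", "z"]

def tokenize(text, vocab):
    """Split text into the longest matching vocabulary pieces, left to right."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate piece first.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            # Unknown character: emit it as its own token.
            tokens.append(text[i])
            i += 1
    return tokens

print(tokenize("tokenization", VOCAB))  # ['token', 'iz', 'ation']
```

The same word can thus come out as one token, several subwords, or individual characters, depending entirely on the vocabulary — which already hints that the unit is defined by the encoding scheme, not by linguistics.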

This translation is considered to conform to the principles of univocality, scientific validity, conciseness, and coordination in the expert verification system, and it has a certain usage base in the current Chinese context. However, after reading the relevant interpretations, I have developed a different understanding of this naming approach.

From a standardization perspective, this naming scheme has comprehensibility and dissemination advantages in the short term. However, when examined from dimensions such as computational ontology, information structure, multimodal evolution, and consistency in back-translation, its long-term adaptability still needs further verification. In this context, an alternative path that deserves attention—“Symbol Element”—is gradually revealing stronger structural consistency and stability across different contexts.

1. Misalignment of Definitions: "Origin" Cannot Replace "Essence"

Article viewpoint (Chen Xilin, researcher at the Institute of Computing Technology, Chinese Academy of Sciences): The initial role of Token in artificial intelligence is "the basic semantic unit of language", so "Word Element" can more closely reflect its essence.

This judgment is reasonable in a historical context, but in the current era of significant paradigm shifts, this way of thinking clings to a fixed historical reference point while the technology itself has long since moved on.

On the logical level of terminology definition, it is essential to sharply distinguish "initial application scenario" from "structural essential attributes".

Tokens indeed originated from natural language processing (NLP), but in the evolution path of AGI, they have already transcended the boundaries of language models, evolving into basic units that unify the processing of text, images, speech, and even physical signals. In modern computational systems, the true structural ontology of a Token is "discrete symbol unit," rather than a singular modal language unit.
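The claim that tokens now unify text, images, speech, and other signals can be sketched as a single shared ID space. The sizes, offsets, and function names below are invented for illustration; real multimodal systems differ in detail but share the principle that every modality reduces to integer symbols:

```python
# Hypothetical sketch: one flat ID space in which text subwords, quantized
# image patches, and quantized audio frames are all just discrete symbols.
# All ranges here are invented for illustration.

TEXT_VOCAB_SIZE = 50_000      # IDs 0 .. 49_999: text subwords
IMAGE_CODEBOOK_SIZE = 8_192   # IDs 50_000 .. 58_191: image patch codes
AUDIO_CODEBOOK_SIZE = 1_024   # IDs 58_192 .. 59_215: audio frame codes

def image_patch_to_id(code):
    """Map an image codebook index into the shared symbol ID space."""
    assert 0 <= code < IMAGE_CODEBOOK_SIZE
    return TEXT_VOCAB_SIZE + code

def audio_frame_to_id(code):
    """Map an audio codebook index into the shared symbol ID space."""
    assert 0 <= code < AUDIO_CODEBOOK_SIZE
    return TEXT_VOCAB_SIZE + IMAGE_CODEBOOK_SIZE + code

# Downstream, the model sees only a flat sequence of integers:
sequence = [17, 42, image_patch_to_id(7), audio_frame_to_id(3)]
print(sequence)  # [17, 42, 50007, 58195]
```

At this layer nothing marks ID 50007 as "an image" rather than "a word" — which is precisely the sense in which the unit is a symbol, not a word.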

If named according to "initial role," a computer should still be called "electronic calculator" (originating from its initial function of replacing human calculators); the Internet should be called "Cold War military network." The fatal flaw in this naming logic is that it only sees the "temporary job" of technology at a specific historical moment while ignoring its "physical ontology" that spans eras.

Historical paths cannot be equated with essential attributes. Similarly, we cannot permanently lock Token within the narrow context of "word" simply because it was initially used for processing text.

Using "initial application scenario" to define basic concepts essentially replaces the truth of structural ontology with historical path dependency. This definition might offer convenience in understanding during the early stages of technology, but in the expansion phase of multimodal outbreaks, it quickly becomes ineffective and turns into a shackle that hinders cognition. In contrast, “Symbol Element” directly aligns with the ontological symbols of cross-modal computing, defining not the “past” of Token, but rather the “truth” of Token.

2. Boundary of Analogy: Explanation Becomes Definition and Begins to Deviate

Article viewpoint (Dong Yuxiao, associate professor at Tsinghua University's Department of Computer Science): Through analogies like "word cloud" and "bag of words," the discrete units in multimodal contexts can be understood as "broadly defined words."

Professor Dong Yuxiao's analogy aids in understanding, but should not replace definitions. This line of thought is somewhat enlightening at the explanatory level, but if it escalates to a basis for naming, it may cause categorical misalignment at the conceptual level.

From a methodological perspective, the role of analogy is to lower the understanding threshold while the duty of definition is to delineate semantic boundaries. When the term “word” is extended to cover image patches, speech segments, embedding vectors, and even broader perceptual signals, its original linguistic attributes have been increasingly diluted, and semantic boundaries become vague. This “analogy-driven” expansion path can maintain consistency of explanation in the short term but is prone to semantic drift in long-term evolution.

In terms of cross-modal expansion capabilities, care must be taken to guard against the slippage from “analogy” to “definition.” In the context of terminology standardization, it is essential to differentiate the boundaries of “interpretative metaphor” and “ontological definition” to avoid the former substituting for the latter.

A more intuitive analogy: in a popular science context, we can liken a lightbulb to a "man-made sun" to make it easier to grasp; but in a scientific naming system, one cannot, on the basis of that analogy, rename the ampere (the unit of electric current) "light element." The former is descriptive expression, while the latter involves a rigorous measurement system and standardized definition, and the two cannot be conflated.

Similarly, terms like "word cloud" and "bag of words" are essentially descriptive or statistical metaphors that help readers understand data structures or distribution patterns; Token, by contrast, as a basic metric unit in large models, is deeply embedded in computational billing, model training, and academic measurement systems. When its daily usage reaches billions or even trillions of calls, its name carries not just an explanatory function but a foundational concept of engineering and standards significance. At this level, terminology needs to align with its ontological attributes rather than rely on analogical extension.

If this analogy logic is further pushed to the naming level, it implicitly carries a dangerous premise: since people have become accustomed to understanding Tokens as “words,” why not continue to use this analogy? But this is essentially a continuation of path dependency—using the convenience of existing cognition to replace adjustments to the ontological concepts. In this sense, this naming is closer to a form of “linguistic romanticism” rather than strict alignment with computational ontology.

We cannot insist that since “horsepower” contains “horse,” we should discuss “electronic horses” in motors. Analogies can inspire understanding, but cannot define standards.

In contrast, “symbol” as a more neutral concept has a natural capacity for cross-modal adaptation and covers various information forms such as text, images, and speech without additional explanations. Therefore, a naming path centered on “symbol unit” is closer to the structural essence of Token at the definitional level. In this logic, “Symbol Element” as the corresponding translation embodies higher conceptual consistency and long-term adaptability.

3. The Cost of Cognition: When Semantic Anchor Points Create Systematic Misunderstandings

Article viewpoint (compiled expert opinions): “Word Element” is concise in expression, aligns with Chinese habits, and is easy to disseminate.

This judgment holds certain reasonableness at the dissemination level, but its implicit premise is: the public can accept the cross-modal analogy of “word.” However, analogy is fundamentally an expert thinking tool, not a natural cognitive method for the general public. For ordinary users, “word” has a strong semantic anchoring effect—once they hear “word,” their intuition will inevitably point to the linguistic system, rather than to images, sounds, or actions within other modalities. This cognitive path is not a technical issue but a stable structure at the level of cognitive psychology.

On this basis, when “word” is extended to the so-called “broadly defined word,” it has effectively created a deviation in user cognition. Users initially form an intuitive understanding of “word = linguistic unit,” rather than the abstract concept of “cross-modal symbol unit.” Once this misunderstanding is established, all subsequent explanations will turn into corrections of existing cognition, rather than extensions of natural understanding.

For instance, when the media reports that “the model was trained with 100 trillion word elements,” the public easily understands it as having “read a vast amount of text,” overlooking the significant amounts of image, audio, and other modal data that are included. This misunderstanding is not an isolated case, but a systemic effect induced by the semantic anchoring of the term itself.

In practical engineering contexts, this naming might also cause friction in cross-disciplinary communication. When discrete units in visual models or speech models are called “words,” it not only easily triggers semantic misunderstandings but also creates unnecessary linguistic conflicts between different fields. Multimodal systems need unification at the “symbolic layer,” rather than an expansion of linguistic categories.

In contrast, “symbol” as a more abstract concept, while slightly higher in initial understanding threshold, has more neutral semantic pointing and does not pre-lock cognition at the linguistic layer. Over long-term use, it is more conducive to establishing a stable and unified cognitive framework, thereby reducing overall explanation costs and providing a more stable cognitive foundation for multimodal unification.

The cost of naming occurs not at the time of definition, but at the time of correction; once early naming establishes semantic anchoring, the cost of subsequent cognitive repair will rise exponentially.

Experts can extend the boundaries of “word” through analogy, but the general public will not understand concepts through analogy. Naming serves not the experts, but bears responsibility for the cognitive system of the entire era.

4. The Illusion of Univocality: When a Word Tries to Carry Two Systems

Article viewpoint (principle of terminology approval): “Word Element” conforms to the principle of univocality and helps solve the problem of translation chaos.

In terms of the univocality of terminology, particular attention should be paid to the systemic risks that “one word with two meanings” may cause. In the approval of scientific terminology, “univocality” is one of the foundational principles. If a term requires contextual or additional explanation to distinguish its meaning, then it has already lost its value as a standard component.

However, from the perspective of the existing academic system, this judgment leaves room for further discussion. The term "Word Element" has long been occupied in linguistics and natural language processing (NLP); in classical linguistics, its established English counterpart is Lemma, the normative base form of a word (for example, the lemma of is/am/are is be). This usage is a stable consensus in foundational textbooks and academic papers in both linguistics and NLP.

In this context, if Token is also translated as "Word Element," semantic conflicts can easily arise in specific expressions, with genuinely confusing results.

For instance, when describing the NLP operation of "lemmatizing a token," the Chinese expression would collapse into "performing word-element restoration on a word element." This not only raises comprehension costs but also introduces ambiguity into academic writing and information retrieval: readers cannot tell whether "Word Element" refers to the segmented discrete unit or to the normative base form of the word.
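The collision can be seen by placing the two operations side by side. The lemma table below is hand-made to match the article's own is/am/are example; real systems use full morphological lexicons:

```python
# Toy contrast between tokenization (segmentation of surface text) and
# lemmatization (restoration to a base form). The lemma table is a
# hand-made illustration, not a real morphological lexicon.

LEMMAS = {"is": "be", "am": "be", "are": "be", "running": "run"}

def tokenize(sentence):
    """Tokenization: split surface text into discrete units."""
    return sentence.lower().split()

def lemmatize(token):
    """Lemmatization: map a surface token back to its normative base form."""
    return LEMMAS.get(token, token)

tokens = tokenize("He is running")
lemmas = [lemmatize(t) for t in tokens]
print(tokens)  # ['he', 'is', 'running']
print(lemmas)  # ['he', 'be', 'run']
```

The two functions do different things in opposite directions — one cuts text apart, the other restores a canonical form — which is exactly why giving both concepts the same Chinese name invites ambiguity.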

From the functional concept perspective, the two also have clear distinctions: Lemma emphasizes the “restoration” on the linguistic level, corresponding to normative expressions after morphological changes; while Token emphasizes the “segmentation” in the computational process, corresponding to the smallest discrete unit when the model processes information. This difference between “restoration” and “segmentation” corresponds to different dimensions of semantic and symbolic layers.

Therefore, when a term requires “generalization” to simultaneously cover multiple existing concepts, its univocality has actually transformed into “unification at the explanatory level”, rather than “stability at the semantic level.”

When a term needs to rely on explanation to maintain unity, its stability as a standard term has often begun to shake.

In contrast, “Symbol Element” does not present any semantic conflicts within the existing terminology system. On one hand, it retains the ontological attributes of Token as a discrete symbol; on the other hand, it avoids overlapping with existing translations of Lemma, thereby exhibiting higher stability in semantic clarity and system consistency.

5. The Return to Ontology: Token is Essentially “Symbol”, Not “Word”

Article viewpoint (general explanation): Token is the smallest unit used to process text in language models.

This statement holds at the functional level, but it remains a description of "how Token is used" and does not touch its ontological attributes within computational theory. From the perspectives of information theory and the theory of computation, the fundamental objects processed by computational systems are not "words" but symbols.

This can be further understood from two levels:

On one hand, from the perspective of information theory, the essence of information lies in eliminating uncertainty; its unit of measurement is the bit, and its carriers are discrete symbols. Symbols do not concern semantic content, only probability distributions and encoding structures;

On the other hand, at the level of computational implementation, large models do not “read” as humans do; rather, their processing objects are discrete index representations (ID). Whether this ID corresponds to a Chinese character, an image patch, or an audio sampling point, it participates in computations in a unified symbolic form.
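Both levels can be compressed into a few lines. The vocabulary size and the IDs below are invented for illustration; the point is only that the model receives bare integers, and that information theory measures them in bits regardless of what they "mean":

```python
import math

# Sketch: at the computational layer, a Token participates only as an
# integer index. Whether ID 1042 came from a Chinese character, an image
# patch, or an audio sample is invisible here. Numbers are invented.

ids = [1042, 7, 30555]   # what the model actually receives
VOCAB_SIZE = 65_536      # hypothetical symbol inventory

# Information-theoretic view: under a uniform distribution over a
# vocabulary of V symbols, each symbol carries log2(V) bits,
# independent of any semantics attached to it.
bits_per_symbol = math.log2(VOCAB_SIZE)
print(bits_per_symbol)  # 16.0
```

Nothing in this computation refers to meaning — the quantity of information depends only on the distribution over symbols, which is the article's point that Token lives on the symbol layer rather than the semantic layer.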

In this framework, Token's essence lies on the "symbol layer," not the "semantic layer": the symbols themselves carry no semantics but serve as the basic carriers of encoding and computation.

Naming Token as “Word Element” introduces an implicit pointing towards the layer of language semantics to some extent, bringing this concept originally situated in the symbol layer back to a language-centric understanding path. This naming approach may offer intuitiveness at the explanatory level but can blur the boundaries between “symbolic computation” and “semantic understanding” at the theoretical level.

In contrast, “Symbol Element” conceptually remains within the symbol layer. On one hand, it accurately reflects the computing properties of Token as a discrete symbol; on the other hand, it avoids bringing semantic traits into the definition of ontology, aligning more closely with the fundamental framework of information theory and computational theory.

From a broader perspective, as artificial intelligence systems continue to evolve towards multimodal and general intelligence, naming foundational concepts that can directly align with their mathematical and computational ontology will be more conducive to constructing stable and scalable cognitive systems. In this sense, a naming path centered on “symbol unit” not only addresses language selection issues but represents a consistent expression of the essence of computation, with “Symbol Element” being the natural counterpart within this framework.

Defining concepts from the symbol layer aligns with the essence of computation; naming concepts from the semantic layer is closer to explanation than definition.

6. Break in Language: Mapping Failures in Back-Translation Mechanism

Article viewpoint (comprehensive interpretation): “Word Element” has gradually formed a usage base in the Chinese academic community and has certain dissemination advantages.

In a cross-linguistic context, attention must be paid to the systemic impacts brought by the “break in back-translation” of terms. Whether a scientific and technological term possesses long-term viability depends not only on its expressive ability within the Chinese context but also on whether it can achieve stable mapping in the international academic system. An ideal term should possess “reversibility,” that is, achieve semantic consistency in back and forth between different languages.

The above judgment reflects the acceptability of “Word Element” in the local context but still leaves room for further discussion from a cross-linguistic perspective. If a term is only valid in a single language system and cannot form a stable corresponding relationship in the international context, it may introduce additional understanding costs in academic exchanges.

Specifically, "Word Element" lacks a clear and unique path in back-translation. When rendered back into English, it scatters across several near-miss concepts: "word unit" lacks a strict academic definition, "morpheme" already names the linguistic morpheme, and "lexeme" already names the lexeme. None of these accurately covers the meaning of Token in the computational context, producing category shifts.

In contrast, “Symbol Element” can more naturally correspond to “symbolic unit”. This concept has a clear theoretical basis and stable use in fields such as information theory, discrete mathematics, and multimodal representations, allowing it to maintain consistent semantic pointing across different contexts. Therefore, a one-to-one mapping relationship is more easily formed between Chinese and English.

From a practical perspective, once a term enters academic papers, technical documents, and international communication scenarios, its back-translation capability directly impacts expression efficiency and understanding accuracy. If a term requires additional explanation to complete cross-linguistic conversion, its long-term usage costs will continually accumulate.

Thus, in cross-linguistic systems, the main issue faced by “Word Element” lies in the instability of mapping paths, while “Symbol Element” demonstrates greater certainty in semantic correspondence and conceptual consistency. In the context of increasingly globalized artificial intelligence, choosing terms with good back-translation traits will be more conducive to constructing open and interoperable academic and technical systems.

The international reversibility of terminology is essentially a key metric of its long-term academic viability.

7. The Fallacy of Uniformity: Formal Consistency Does Not Equal Structural Consistency

Article viewpoint (compiled expert opinions): “Word Element” maintains consistency in expression style with terms like “embedding” and “attention,” being concise, abstract, and aligning with the Chinese technical context.

To state the conclusion up front: the unity of a terminology system should rest on "conceptual isomorphism," not "linguistic isomorphism."

A common argument for "Word Element" is that its expression style is consistent with terms like "embedding" and "attention": concise, abstract, and aligned with the Chinese technical context. This argument captures the genuine need for unity in the terminology system, but the problem is that if unity remains only at the linguistic level rather than the structural level, it slides from "order" into "illusion."

“Embedding” and “attention” have become stable terms because they correspond to clear computational structures: the former is vector mapping, and the latter is weight mechanisms, with their naming directly pointing to the essence of computation. In contrast, “Word Element” belongs to interpretative naming, its rationality depends on the framework of a “broadly defined word.” Once detached from explanation, this naming itself lacks self-consistent structural direction.

This difference brings up a key issue: Formally consistent, semantically deviated.

Linguistic isomorphism reduces expression costs; conceptual isomorphism ensures cognitive stability. If linguistic isomorphism is prioritized, complexity does not disappear but shifts into a long-term cognitive burden; only naming grounded in conceptual isomorphism can remain stable through cross-context and multimodal evolution.

When “embedding,” “attention,” and “Word Element” appear simultaneously, it is easy to form the illusion of “conceptual co-level.” But in reality, the first two are mechanisms, while the latter is an object; the former two have strict definitions, while the latter relies on contextual interpretation. This structural misalignment will embed latent fractures within the cognitive system.

More importantly, when the naming of a foundational concept relies on analogy rather than structural definition, its impact does not stay within a single term but spreads through the entire terminology system. Subsequent concepts built around such a naming must rely on continuous explanation to maintain consistency, embedding latent structural misalignment.

In this sense, “Symbol Element” provides a pathway that is closer to the underlying structure. It directly points to the fundamental objects in computational systems—symbols—without relying on analogical explanations and can maintain consistency across various contexts.

Terminology is not just a label, but an entry point to cognition. Good terminology gradually eliminates the need for explanation, while poor terminology adds to the burden of annotations. When foundational concepts deviate from structure, the terminology system can only rely on explanations for maintenance, whereas it cannot sustain itself through definitions.

Conclusion

Essentially, the choice of terminology is not merely a linguistic issue, but an early shaping of the cognitive structure within a field. Once naming deviates from its structural ontology in the initial stages, the subsequent system can only maintain operation through continuous explanations, while struggling to form a self-consistent network of concepts.

As artificial intelligence moves towards generalization and multimodal integration, a term that can align with computational ontology and possess stability across contexts is more likely to become a long-term effective cognitive cornerstone. In this sense, the naming path centered on “symbol unit” demonstrates a more balanced adaptability in addressing both the essence of technology and cognitive clarity.
