Source: Cointelegraph
Original: “Decentralized OORT AI Dataset Ranks High on Google Kaggle”
The AI training image dataset developed by decentralized AI solution provider OORT has achieved significant success on Google’s Kaggle platform.
OORT's "Diverse Tools Kaggle" dataset listing was released in early April and has since climbed to the homepage in several categories. Kaggle is a Google-owned online platform for data science and machine learning competitions, learning, and collaboration.
Ramkumar Subramaniam, a core contributor to the crypto AI project OpenLedger, told Cointelegraph, "The homepage ranking on Kaggle is a strong social signal indicating that the dataset is attracting active participation from key communities such as data scientists, machine learning engineers, and practitioners."
Max Li, founder and CEO of OORT, told Cointelegraph that the company "has observed encouraging participation metrics, validating that the training data collected through decentralized models indeed has early market demand and relevance." He added:
"Spontaneous interest from the community, including active usage and contributions, clearly demonstrates how decentralized, community-driven data pipelines like OORT can achieve rapid distribution and widespread participation without relying on centralized intermediaries."
Li also stated that OORT plans to release multiple datasets in the coming months. These include an in-car voice command dataset, a smart home voice command dataset, and a deepfake video dataset aimed at enhancing the media authenticity verification capabilities of AI.
Cointelegraph independently verified that the dataset in question reached the homepage in Kaggle's general AI, retail and shopping, manufacturing, and engineering categories earlier this month. As of publication, the dataset no longer held those positions, following a possibly unrelated dataset update on May 6 and another update on May 14.
While acknowledging this achievement, Subramaniam told Cointelegraph, "This is not a definitive indicator of practical application or enterprise-grade quality." He pointed out that the uniqueness of the OORT dataset "lies not only in its ranking but also in the source channels and incentive mechanisms behind the dataset." He further explained:
"Unlike centralized providers that may rely on opaque processes, a transparent, token-incentivized system can offer traceability, community co-management, and continuous optimization, provided that the appropriate governance structure is established."
Lex Sokolin, a partner at AI venture capital firm Generative Ventures, stated that while he believes these results are not difficult to replicate, "it does demonstrate that crypto projects can leverage decentralized incentive mechanisms to organize economically valuable activities."
Data from AI research organization Epoch AI indicates that human-generated text for AI training is expected to be exhausted by 2028. The pressure is now great enough that investors are brokering deals that give AI companies rights to use copyrighted materials.
Research reports on the growing scarcity of AI training data, and on how it may constrain the field's development, have circulated for years. While synthetic (AI-generated) data is increasingly used with some success, human-generated data is still widely regarded as the better choice, because higher-quality data yields better-performing AI models.
For AI training images, the situation is becoming more complicated, as artists are deliberately sabotaging training efforts. The Nightshade tool lets creators "poison" their images to protect their work from unauthorized use in AI training, severely degrading the performance of models trained on them.
Subramaniam noted, "We are entering an era where high-quality image data is becoming increasingly scarce." He also emphasized that the widespread use of image poisoning techniques makes this challenge even more severe:
"With the rise of image obfuscation techniques and adversarial watermarking as AI training poisoning methods, open-source datasets are facing dual challenges of quantity and credibility."
Given this situation, Subramaniam said that verifiable, community-sourced, incentivized datasets "are more valuable than ever." He believes that such projects "not only serve as alternatives but will also become important pillars for AI alignment and data provenance in the data economy."