National Data Administration: More than 100,000 high-quality datasets had been built nationwide by the end of 2025, with a total scale roughly 310 times the digital resources of the National Library of China

The Daily Economic News reporter | Zhou Yifei    The Daily Economic News editor | Bi Luming

On March 24, the State Council Information Office held a press conference on the 9th Digital China Development Summit.

At the press conference, Liu Liehong, Director of the National Data Administration, said that as of this March, China’s daily token call volume had exceeded 140 trillion. That is more than 1,000 times the 100 billion recorded at the beginning of 2024, and more than 40% above the 100 trillion recorded at the end of 2025, a gain made in just three months. The sharp rise in daily token call volume shows that China’s artificial intelligence development has entered a phase of rapid growth.

Image source: Photo taken on site by The Daily Economic News reporter Zhou Yifei

As of this March, China’s daily token call volume has grown more than 1,000-fold from 100 billion at the beginning of 2024

Everyday intelligent assistants, industry-side intelligent analysis, and similar applications all depend on massive, high-quality data. What has the National Data Administration done to empower artificial intelligence innovation and development with high-quality datasets, and what does it plan to do next?

Liu Liehong said that the National Data Administration attaches great importance to empowering artificial intelligence innovation and development with data factors. To address the “small and scattered” nature of high-quality dataset construction, it worked with 26 departments to select 72 lead units for building high-quality datasets, 140 units for pilot work and early trials, and 104 typical cases. The result is an ecosystem for constructing high-quality datasets in which lead units drive others, multiple parties participate, key challenges are tackled jointly, and resources are co-built and shared for mutual benefit, continuously advancing the construction of high-quality datasets.

To promote the development of the data labeling industry, the National Data Administration has designated seven cities (Chengdu, Shenyang, Hefei, Changsha, Haikou, Baoding, and Datong) to take on pilot and early-trial tasks in building data labeling bases. It issued the “Implementation Opinions on Promoting the High-Quality Development of the Data Labeling Industry,” selected 47 excellent data labeling cases, and guided the hosting of seven supply-and-demand matchmaking meetings for data labeling. Next, the National Data Administration will focus on regions that are strong in scientific and technological innovation, have solid development foundations, and feature distinct industrial characteristics. Concentrating on two directions, “knowledge-intensive” and “technology-driven,” it will establish, in phases and tiers, a batch of data labeling industry innovation pilot zones that are technologically advanced, distinctive, and highly efficient, to support innovative experimentation in labeling high-quality datasets.

Liu Liehong further pointed out that the National Data Administration also continues to foster a market consensus of “paying for high-quality data” and supports the listing and trading of high-quality industry datasets on data exchanges. It supports institutions such as data circulation service platforms and data merchants in providing circulation and trading services. It encourages data circulation service institutions of all kinds to explore diversified models for circulating and using high-quality datasets, to promote orderly matching of supply and demand, and to help high-quality datasets circulate in industry.

China’s work on building high-quality datasets has achieved phased results. As of the end of 2025, more than 100,000 high-quality datasets had been built nationwide, with a total scale exceeding 890 PB (petabytes), roughly 310 times the total digital resources of the National Library of China. As of this March, China’s daily token call volume had exceeded 140 trillion. That is more than 1,000 times the 100 billion recorded at the beginning of 2024, and more than 40% above the 100 trillion recorded at the end of 2025, a gain made in just three months.
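The growth figures quoted above can be checked with simple arithmetic. The sketch below is only an illustration: the variable and function names are hypothetical, the “exceeded” values are treated as lower bounds, and the implied size of the National Library of China’s digital resources is a rough back-calculation from the quoted multiple, not a figure given at the press conference.

```python
# Figures as quoted in the article. "Exceeded" values are lower bounds.
TOKENS_EARLY_2024 = 100 * 10**9    # 100 billion daily token calls
TOKENS_END_2025 = 100 * 10**12     # 100 trillion daily token calls
TOKENS_THIS_MARCH = 140 * 10**12   # more than 140 trillion daily token calls
DATASET_SCALE_PB = 890             # total dataset scale, more than 890 PB
LIBRARY_MULTIPLE = 310             # "roughly 310 times" the library's resources


def fold_increase(old: int, new: int) -> float:
    """Return how many times larger `new` is than `old`."""
    return new / old


def percent_growth(old: int, new: int) -> float:
    """Return the growth from `old` to `new` as a percentage of `old`."""
    return (new - old) * 100 / old


fold_since_2024 = fold_increase(TOKENS_EARLY_2024, TOKENS_THIS_MARCH)
growth_since_2025 = percent_growth(TOKENS_END_2025, TOKENS_THIS_MARCH)
# Back-calculated assumption, not a number stated in the article.
implied_library_pb = DATASET_SCALE_PB / LIBRARY_MULTIPLE

print(f"Growth since early 2024: at least {fold_since_2024:,.0f}x")
print(f"Growth since end of 2025: at least {growth_since_2025:.0f}%")
print(f"Implied library digital resources: about {implied_library_pb:.2f} PB")
```

Under these assumptions, the lower-bound figures work out to about 1,400 times the early-2024 level and 40% above the end-2025 level, consistent with the “more than 1,000 times” and “more than 40%” statements.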

“Such a substantial increase in daily token call volume shows that China’s artificial intelligence development has entered a phase of rapid growth. Application scenarios keep deepening, moving from being able to converse to being able to make decisions and execute them. The competitiveness of China’s AI industry has also improved significantly, and the current heated discussion about tokens going overseas is one sign of this. From the data perspective, it also shows that dataset supply is growing in large volumes and the value of data factors is being continuously released. Data-factor empowerment of AI innovation and development has entered a stage of healthy interaction,” Liu Liehong said.

Liu Liehong emphasized that the National Data Administration will continue to push forward data-factor empowerment of artificial intelligence innovation and development. It will coordinate with all parties to implement a new round of the high-quality dataset construction action plan, which comprises six special initiatives: strengthening the foundation and expanding capacity, tackling annotation challenges, improving quality and efficiency, empowering applications, providing management and services, and releasing value. Guided by scenario demand, it will accelerate pilot projects and early trials and build AI-Ready (AI readiness level) high-quality datasets that are technically feasible, convenient to use, and backed by quality assurance, improving both the quantity and the quality of high-quality dataset supply.

Promoting the introduction of policy documents on data-factor empowerment for new industrialization

A reporter from The Daily Economic News also noted that the Ministry of Industry and Information Technology recently issued a notice launching the Industrial Data Foundation-Building Action, which carries out pilot work and early trials on building high-quality industry datasets to empower artificial intelligence. How will this work be advanced going forward?

Wang Yanqing, Director of the Information Technology Development Department of the Ministry of Industry and Information Technology, said that to carry out the pilot projects and early trials well, the ministry needs to keep doing three things. The first is to strengthen support and guarantees. It will work with local bureaus of industry and information technology and data authorities to provide resource support and guidance for the joint pilot bodies, follow up promptly to resolve the issues they encounter, pool experience, and speed up the formation of outcomes that can be promoted more widely.

The second is to strengthen policy guidance. It will promote the introduction of policy documents on data-factor empowerment for new industrialization, issue reference guidelines for applying data factors in industrial scenarios, and strengthen guidance on developing and promoting models.

The third is to cultivate a good ecosystem. It will accelerate the development of industrial data standards, grow and strengthen data service enterprises in areas such as data consulting, data governance, and data labeling, and support the hosting of a number of technical seminars and supply-and-demand matchmaking meetings. At the same time, it will strengthen and improve open-source artificial intelligence communities, creating a hub where high-quality open-source data resources converge. In particular, at this year’s upcoming summit, the Ministry of Industry and Information Technology will host a special meeting on data-factor empowerment for new industrialization and invite representatives of pilot and early-trial units to share their experience. It will also launch a 2026 competition on data-factor empowerment for new industrialization.
