Robots "can never be fed enough": real-world data remains a key battleground for embodied intelligence

Data issues are one of the core challenges facing the intelligent upgrade of the robotics industry today.

On March 28, the 2026 China Science Fiction Conference, a parallel sub-forum of the Zhongguancun Forum, was held in Beijing. At the event in Shijingshan District, the Beijing Shijingshan Embodied Intelligence Tactile and Multimodal Perception Data Training Innovation Center was officially unveiled.

According to the reporter, the center was jointly built by Beijing Shijingshan Technology Innovation Group Co., Ltd. and Hisan Technology to meet the development needs of the embodied intelligence industry. It has set three main technical directions: tactile data collection, heterogeneous data collection, and autonomous unmanned data collection. It will also build a full-process technology transformation platform that integrates multimodal data acquisition, algorithm training, and scenario deployment.

During the “Billion-Yuan Embodied Intelligence Dialogue” session at the Zhongguancun Forum, Xi Yue, co-founder of Xingtong Yiyuan, said that data remains the biggest difficulty in the development of embodied intelligence.

Looking at specific deployment scenarios, Xi Yue said that collecting data in real settings is difficult because it requires the site owner to grant access, and that large-scale collection is both costly and slow. Existing alternatives also have limits. The common industry approach of building training facilities that replicate real scenarios 1:1 depends on engineers taking part end to end in data collection, training, deployment, and issue diagnosis, so overall efficiency is low and costs are high.

In Xi Yue’s view, the industry has two options. First, it can build a closed-loop “data collection–model iteration” data flywheel, so that robots autonomously handle all kinds of extreme situations in real environments and system efficiency keeps improving. Second, it can promote a combined collection mode of “human demonstrations + real-machine data acquisition,” although the differences between the two sources in body configuration, motion, and sensing still need to be resolved.

Tang Wenbin, founder of Yuanli Lingji, acknowledged that data is one of the bottlenecks for embodied intelligence today, but said it is not the whole problem. In his view, data acquisition is essentially a question of money and time. By investing capital to buy robots, build training facilities, hire teleoperators, and outsource labeling, a company can quickly accumulate millions of hours of data and sample sets at the billion scale. “Whether or not you have data” is therefore not an industry barrier. What truly creates competitive advantage is whether a company can automatically stream data back from real scenarios and build an efficient closed-loop data flywheel.

Zhi Pingfang currently obtains data in multiple ways, but co-founder Zhang Peng said that, in practice, data from real scenarios is irreplaceable, and that this is where the industry must focus right now. The data that streams back from products deployed on the front line and accumulates over time is the most valuable data asset. Provided security is ensured, Zhi Pingfang will also share this data with customers.

According to the reporter, the fourth-phase project of the Beijing Shijingshan humanoid robot data-acquisition and training center is currently working mainly with companies including Leju, Hisan, Reilman, and Lingchu, in an attempt to ease the data shortage and quality bottlenecks in the robotics industry.

On the current supply and demand for robot data, an industry insider told the reporter that the embodied intelligence sector is rebuilding its data systems. With the rise of data technologies that do not depend on a robot body, such as egocentric (EGO) first-person-view data and Universal Manipulation Interface (UMI) solutions, the teleoperation data-acquisition “factories” that previously relied on heavy asset investment may face development challenges.

In terms of data value, the insider said that real-scenario data is still the “pinnacle” data needed for robot model training, but the industry faces two core problems. First, data quality and data pipelines lack standardized design. Second, data processing capabilities vary widely across the industry. Not all vendors have the technical strength to build efficient data processing systems, and the industry lacks both a unified mechanism for sharing data technology know-how and a baseline evaluation system, so the efficiency of data use is uneven.

The insider said that if non-body data technologies such as EGO and UMI are widely adopted, scenario resources will become an even scarcer core asset. Companies may also be able to end their reliance on traditional data-acquisition factories and collect data directly in real scenarios. The accessibility and diversity of scenarios will then become key variables in data competitiveness.

Judging from the trend of technological upgrades and iterations, embodied intelligence needs training data on the scale of hundreds of millions to billions of hours, and the current total is still seriously insufficient. At the same time, some core assets that are misaligned with mainstream technical routes may lose value in the future. For example, heavy-asset centers that rely on robot bodies and fixed facilities may see capacity utilization decline and unit costs soar.

In the long term, the core logic of competition over data will shift fundamentally: from hardware-based rivalry over whether a company has large-scale training centers, to the ability to obtain data from real scenarios and the efficiency of closed-loop iteration between scenarios and data.
