Mobilizing the Masses to Join In, JD.com Aims to "Refine Elixir" with Embodied Data

On March 16, JD.com announced that it is building what it calls the world’s largest and most comprehensive embodied intelligence data collection center, reasserting its focus on the robotics track after a period of silence in which it was overshadowed by lobster-related news.

In a sense, this is a large-scale data production movement with a strong industrial Internet flavor.

The mobilization involves more than 100,000 internal employees, up to 500,000 people from across the industry, and more than 100,000 citizens in Suqian alone. It is an unprecedented mass effort that aims to use the brute-force aesthetics of sheer scale to break through the current Achilles’ heel of embodied intelligence: data scarcity.

Today, as model architectures gradually converge and computational thresholds become more transparent, high-quality physical interaction data has become the decisive factor for whether robots can truly penetrate various industries.

Behind this “largest data collection operation in human history” lies an industry consensus: as the “cerebellum” of embodied intelligence, the part responsible for motion control, grows more capable, the core battle for the industry’s future is how to feed higher-quality data into a “brain” that truly understands the physical world.

From JD’s grand narrative to industry micro-reality, whether the data generated by hundreds of thousands of people is a gold mine or just gravel remains uncertain.

The Involved Workforce

JD.com has both the nerve and the need to launch this massive data collection campaign because of its vast and highly complex self-operated supply chain.

Unlike pure software internet companies, JD itself is a large physical-world interaction platform, and the maturity of embodied intelligence directly impacts its fulfillment costs and operational efficiency over the next decade.

This layout is deeply coupled with the robotics industry ecosystem in Beijing Yizhuang.

The Yizhuang Economic and Technological Development Zone has already gathered more than 300 robotics-related companies, with an industry chain worth more than 10 billion yuan and more than 40 real application scenarios, making it a core hub for the domestic humanoid robot industry. As a local “chain leader,” JD previously announced an acceleration plan for the robotics industry.

JD’s significant investment in soft infrastructure, exemplified by the data collection center, is essentially filling a critical gap in the industry chain. Yizhuang provides “bodies” and testing grounds, while JD attempts to inject understanding of the real world into robots through vast scenario data.

This integration of hardware and software aims to create a closed-loop business cycle from data flywheel to hardware iteration.

Coordinating hundreds of thousands of people is no easy feat.

According to plans, scenarios include logistics, industrial, and retail sectors. In practice, this likely relies on JD’s existing digital management network—for example, equipping frontline couriers and warehouse workers with wearable devices featuring visual and even force sensors for daily operations.

From the perspective of frontline employees and mobilized citizens in Suqian, this effort is highly complex.

Employees inadvertently become data teachers for robots, which aim to replace high-intensity manual labor in the future. Designing reasonable compensation and benefit-sharing mechanisms to prevent employee resistance is a challenge JD must address.

However, specific implementation details have not yet been communicated to employees.

A JD employee in Beijing told Wall Street Journal that he has not heard about this initiative. In his view, if there is appropriate compensation, it should be considered a market activity, and participation depends on individual choice. Another JD employee in Suqian also said he has not received any notice.

Although official statements mention that “all data collection will be conducted strictly in accordance with laws and regulations,” the reality is often more complex.

For example, in the logistics scenario, warehouse workflows are standardized, but deliveries to thousands of households and retail settings capture large amounts of consumer facial data and other private information.

As data compliance rules grow stricter, the cost of anonymizing and cleaning unstructured data collected from tens of thousands of people could be astronomical.

Moravec’s Paradox Revisited

In 1988, robotics pioneer Hans Moravec wrote:

“It is comparatively easy to make computers exhibit adult level performance on intelligence tests or playing checkers, and difficult or impossible to give them the skills of a one-year-old when it comes to perception and mobility.”

Today, Moravec’s paradox shows up in embodied intelligence mainly as the industry’s data vacuum.

The success of large models is built on directly consuming trillions of tokens of high-quality text accumulated over thirty years of internet activity. But the physical world does not have a ready-made internet. To scale embodied intelligence in the real world, a huge data barrier must be overcome.

JD’s large-scale effort targets this core issue and the difficulties behind data collection.

First, the limitations of simulation need to be addressed.

Currently, the industry’s main data acquisition methods have become highly differentiated and are struggling within their respective bottlenecks.

Most startups rely heavily on simulation environments, such as NVIDIA’s Isaac Sim or MuJoCo, where robots perform millions of reinforcement learning iterations in virtual worlds. This approach is low-cost, fast, and avoids hardware damage from trial-and-error.

However, experienced practitioners increasingly recognize the limitations of “Sim-to-Real.”

The complexity of the physical world involves not only visual changes like lighting and shadows but also subtle physical contact feedback—such as cable flexibility, non-rigid deformation of clothing, tiny friction variations when screwing in a bolt, or electromagnetic noise in sensors.

Current physics engines lack the computational power to perfectly simulate these high-dimensional, nonlinear micro-physical laws. This results in models that perform flawlessly in simulation but suffer from serious “brain freezes” or motion distortions when deployed on real hardware.

Since simulation has its limits, the fallback is the real world.

From the viral success of Stanford’s Mobile ALOHA to leading companies like Figure AI, Unitree (Yushu), and AgiBot (Zhiyuan), many are now using remote operation, or teleoperation: humans wear motion capture suits or VR devices and control robots like avatars, recording first-person visual, joint angle, and force data.

This is currently recognized as the highest-quality data collection method. However, it runs into the second major commercial challenge: a very poor return on investment.

Industry estimates suggest that a full-sized humanoid robot costs hundreds of thousands to over a million dollars in hardware alone, and collecting effective data via remote operation not only incurs high hardware depreciation but also requires paying skilled operators.

Wall Street Journal learned that high-quality complex interaction data can cost hundreds of dollars per sample to collect and clean, with a very high failure rate.

This workshop-style, manual data collection model cannot support embodied intelligence moving toward generality at the scale of hundreds of billions or trillions of parameters.

To lower the barrier, giants like Google have launched open-source datasets such as Open X-Embodiment, aiming to centralize data from major labs worldwide for industry-wide use. Domestic companies have also released open-source datasets with hundreds of thousands of real-machine data points.

But another major challenge is hidden here: the extreme fragmentation of robot hardware itself. Quadruped robot dogs, wheeled robots, bipedal humanoids, and even humanoids from different manufacturers differ in joint degrees of freedom, motor torque, sensor layout, and center of gravity, so their data is fundamentally incompatible.

A high-quality grasping dataset collected on a UR5 robotic arm cannot be directly transferred to a Tesla Optimus or JD’s logistics robot.

This “cross-embodiment mapping” difficulty leaves most open-source datasets as scattered islands, unable to generate scale effects.
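The incompatibility can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Embodiment` class, the `can_replay` helper, and the seven-joint "hypothetical humanoid arm" are invented for this example and do not describe any real robot's specification. The six UR5 joint names follow the arm's conventional naming.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Embodiment:
    """A robot body, reduced to the ordered list of joints an action vector drives."""
    name: str
    joint_names: List[str]


def can_replay(action: List[float], source: Embodiment, target: Embodiment) -> bool:
    """A joint-space action replays directly only if both bodies share one joint layout."""
    return (
        len(action) == len(source.joint_names)
        and source.joint_names == target.joint_names
    )


# The UR5 is a six-joint arm.
ur5 = Embodiment(
    "UR5 arm",
    ["shoulder_pan", "shoulder_lift", "elbow", "wrist_1", "wrist_2", "wrist_3"],
)

# Hypothetical seven-joint humanoid arm (an assumption, not a real product spec).
humanoid_arm = Embodiment(
    "hypothetical humanoid arm",
    ["shoulder_pitch", "shoulder_roll", "shoulder_yaw", "elbow",
     "wrist_yaw", "wrist_pitch", "wrist_roll"],
)

# One recorded grasp step on the UR5: six joint targets in radians (made-up values).
grasp = [0.0, -1.2, 1.4, -1.8, -1.57, 0.3]

print(can_replay(grasp, ur5, ur5))           # same body: replayable
print(can_replay(grasp, ur5, humanoid_arm))  # different layout: needs retargeting
```

Even when two robots happen to have the same number of joints, the entries of the action vector mean different things, which is why a recorded dataset needs an explicit retargeting step before another body can use it.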

Under these three dilemmas, the commercial logic of the embodied intelligence track may have fundamentally changed: whoever owns real-world deployment scenarios will have a sustainable moat for acquiring cheap, high-quality closed-loop data.

This explains why Tesla and JD have chosen routes vastly different from other hardware startups.

Tesla leverages its massive gigafactories, allowing Optimus to continuously learn and improve directly on real battery sorting lines; JD, on the other hand, aims to build a semi-automated data pipeline through its nationwide logistics network, hundreds of thousands of industrial workers, and extensive physical retail systems.

This approach transforms supply chain barriers into AI-era data barriers.

In contrast, many robot startups without scenarios of their own are forced to find workarounds: they either sell hardware cheaply to universities and research institutions in exchange for shared data, or spend heavily to lease factory space or hire emerging embodied intelligence data service providers like JianZhi for custom data.

In essence, JD’s entry has torn away the veil of algorithms in the embodied intelligence industry, plunging it into a period of heavy asset competition—fighting over capital, scenarios, and human resources.

In the face of data scarcity, the algorithm moat is thinning, while giants controlling real physical interaction entry points are quietly consolidating their path toward AGI.

The Need for Higher-Quality Data

Faced with JD’s plan to accumulate over 10 million hours of real scenario data within two years, the industry’s reaction has been cautious rather than uniformly enthusiastic.

In the context of embodied intelligence, data quality and modality are far more important than mere duration.

The core pain point raised by algorithm teams is that what is lacking is not first-person human video but “state-action pairs” with precise physical feedback.
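To make the distinction concrete, here is a minimal Python sketch of what a video-only sample and a “state-action pair” might each contain. Every field name (`joint_angles`, `wrist_force`, `action`, and so on) is an illustrative assumption, not a schema published by JD or by any dataset mentioned in this article.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VideoFrame:
    """What a wearable camera alone yields: pixels and a timestamp."""
    timestamp_s: float
    image_path: str  # hypothetical path to the stored frame


@dataclass
class StateActionPair:
    """One control-ready sample: what the robot sensed, and what it did next."""
    timestamp_s: float
    image_path: str
    joint_angles: List[float]  # state: joint positions in radians
    wrist_force: List[float]   # state: measured force (x, y, z) in newtons
    action: List[float]        # commanded joint targets for the next step


def can_supervise_control(sample: object) -> bool:
    """Only samples that pair a measured state with an action can train a control policy."""
    return (
        isinstance(sample, StateActionPair)
        and len(sample.action) == len(sample.joint_angles)
    )


frame = VideoFrame(0.0, "frame_0001.jpg")
pair = StateActionPair(
    timestamp_s=0.0,
    image_path="frame_0001.jpg",
    joint_angles=[0.10, 0.20],
    wrist_force=[0.0, 0.0, 1.5],
    action=[0.12, 0.21],
)

print(can_supervise_control(frame))
print(can_supervise_control(pair))
```

The video frame tells a model what the scene looks like. Only the second record says how hard the gripper was pressing and what command came next, which is the part a control policy learns from.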

For example, citizens in Suqian wearing cameras shopping or couriers recording delivery processes generate vast amounts of generalized visual data at internet scale.

While valuable for training world models—helping robots understand what a door or an apple is—such visual data is almost useless for training control strategies, like how much force to use to grip an apple without crushing it.

A robotics industry insider told Wall Street Journal that what robots lack is valuable data, especially real-machine data. In his view, JD’s operation looks more like business process outsourcing (BPO), supplying personnel and venues.

When humans perform physical grasping, they rely on extremely complex tactile, force, and spatial coordinate adjustments—high-dimensional implicit knowledge that ordinary wearable devices cannot capture. If JD’s tens of thousands of workers only contribute videos, the loss when converting that into robot-executable actions will be enormous.

Another top domestic robotics executive once said that the industry’s primary problem is the “lack of unified data set standards.”

For example, each company’s robots have different joint degrees of freedom, sensor placements, and drive types. How can the massive human motion data collected by JD be mapped onto different robot configurations?

Without a unified underlying standard, these 10 million-plus hours of data may ultimately serve only JD’s proprietary robots rather than becoming foundational infrastructure for industry-wide progress.

This may explain why JD’s initial plan emphasizes “1 million hours of robot body data collection.” The industry’s likely path forward combines generalized human video for pretraining, high-quality robot body data for skill learning, and reinforcement learning for self-evolution.
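A quick back-of-the-envelope calculation puts these figures in proportion. It uses only numbers quoted in this article and rests on two simplifying assumptions that the article does not confirm: that the 1 million robot body hours are counted inside the 10 million hour target, and that every mobilized participant contributes an equal share.

```python
# Figures quoted in the article.
TOTAL_HOURS = 10_000_000      # "over 10 million hours" within two years
ROBOT_BODY_HOURS = 1_000_000  # "1 million hours of robot body data collection"
YEARS = 2

# Headcounts quoted in the article: internal employees, industry personnel, Suqian citizens.
# Assumption: all of them take part and contribute equally.
PARTICIPANTS = 100_000 + 500_000 + 100_000

robot_share = ROBOT_BODY_HOURS / TOTAL_HOURS
hours_per_person = TOTAL_HOURS / PARTICIPANTS
hours_per_person_per_year = hours_per_person / YEARS

print(f"Robot body data share of target: {robot_share:.0%}")
print(f"Hours per participant over two years: {hours_per_person:.1f}")
print(f"Hours per participant per year: {hours_per_person_per_year:.1f}")
```

Under these assumptions, robot body data would make up about a tenth of the target, and the average participant would need to contribute roughly 14 hours over two years. The headline number looks reachable on volume alone, which shifts the open question back to data quality.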

JD’s announcement of building an embodied intelligence data collection center marks the beginning of a scaled, engineering-driven approach to address the industry’s data shortage.

Combining real scenarios with large-scale human involvement indeed offers a new path for data accumulation.

But to truly realize robotic “intelligent emergence,” merely increasing data volume is not enough.

How to ensure high-dimensionality and high quality in massive data collection, how to establish unified data standards, and how to properly handle privacy and compliance issues during large-scale collection—these will be critical questions for companies and the entire industry on the path to commercialization.

Risk Warning and Disclaimer

Market risks exist; investments should be cautious. This article does not constitute personal investment advice and does not consider individual users’ specific investment goals, financial situations, or needs. Users should consider whether any opinions, viewpoints, or conclusions herein are suitable for their circumstances. Invest at your own risk.
