◢ Whitepaper · Embodied-AI Data

Why Data Scale Isn't the Moat

A million hours of unverified, unstructured, untraceable footage is not a moat — it's a liability someone downstream has to clean. The moat is verifiability, provenance, and a semantic taxonomy of real work. CNet is built on those three, end to end.

01 The thesis

The race in embodied-AI data is framed as a race for scale. That frame is wrong.

Scale answers "does it exist?" — not "can I use it?" When a clip's camera calibration has drifted, the IMU is tens of milliseconds out of sync, or one frame's annotation is from a previous schema, the model doesn't throw an error. It quietly learns the wrong thing — and by the time your loss won't drop, you can't trace which frame caused it.
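The three silent failure modes above can be caught before training rather than after. The following is a minimal sketch, not CNet's actual pipeline: the `ClipMeta` fields, the `silent_failures` helper, and both thresholds are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ClipMeta:
    # All fields are hypothetical; a real sidecar would define its own schema.
    clip_id: str
    reprojection_error_px: float  # proxy for camera calibration drift
    imu_offset_ms: float          # IMU clock offset relative to video
    schema_versions: set[str]     # annotation schema versions seen across frames


def silent_failures(meta: ClipMeta, expected_schema: str,
                    max_reproj_px: float = 1.0,
                    max_imu_ms: float = 5.0) -> list[str]:
    """Return the names of checks the clip fails (empty list means clean).

    Thresholds are illustrative assumptions, not published CNet values.
    """
    problems = []
    if meta.reprojection_error_px > max_reproj_px:
        problems.append("calibration_drift")
    if abs(meta.imu_offset_ms) > max_imu_ms:
        problems.append("imu_desync")
    if meta.schema_versions != {expected_schema}:
        problems.append("mixed_schema")
    return problems


bad = ClipMeta("c1", 0.4, 30.0, {"v1", "v2"})
good = ClipMeta("c2", 0.4, 1.0, {"v2"})
bad_report = silent_failures(bad, "v2")
good_report = silent_failures(good, "v2")
```

The point of the sketch is that each check names its failure, so a rejected clip is traceable to a cause instead of surfacing later as a loss curve that will not drop.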

Positioning map: scale vs verifiability — Fig 1 · The moat isn't the lower-right "scale" — it's the upper-left "verifiable · structured."

02 Positioning

Dimension	Build · Egocentric-1M	Mecka · EgoVerse	CyberOrigin · CNet
Orientation	scale-first	consortium-first	verifiable-asset-first
Verification	none (self-audit)	consortium review	11 quality gates + trust gate + daily audit
Taxonomy	one class	no taxonomy	occupation × scene × process × action
Provenance	frame metadata	experiment metadata	frame-level sidecar + signed certificate
Model	open / PBC	data service	data asset + end-to-end platform

The first two approaches in the table, Build and Mecka, are valid paths: scale and consortium. CNet takes a third path: not the biggest, not endorsed by committee, but every clip verifiable on its own.

03 Five differentiators

1 · Verified, not vast

Every clip we ship has passed 11 quality gates plus a detection-spec trust gate, and is re-audited daily. We price by usable time: a clip that fails QC is neither billed nor shipped.

Verify at capture vs clean after — Fig 2 · Where verification happens decides what enters the store.
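Pricing by usable time can be illustrated with a short sketch. The 11 gates are not specified in this document, so the two gates below, the clip fields, and the `usable_seconds` helper are hypothetical stand-ins; only the rule "a clip that fails any gate contributes nothing to the bill" comes from the text.

```python
from typing import Callable, Iterable

Clip = dict                      # hypothetical shape: {"id", "duration_s", ...}
Gate = Callable[[Clip], bool]    # a gate returns True when the clip passes


def passes_all(clip: Clip, gates: Iterable[Gate]) -> bool:
    """A clip is shippable only if every gate accepts it."""
    return all(gate(clip) for gate in gates)


def usable_seconds(clips: Iterable[Clip], gates: list[Gate]) -> float:
    """Sum the duration of clips that pass every gate; failed clips bill zero."""
    return sum(c["duration_s"] for c in clips if passes_all(c, gates))


# Two illustrative gates (assumed thresholds, not CNet's real ones).
gates: list[Gate] = [
    lambda c: abs(c["imu_offset_ms"]) <= 5.0,
    lambda c: c["schema"] == "v2",
]

clips = [
    {"id": "a", "duration_s": 10.0, "imu_offset_ms": 1.0, "schema": "v2"},
    {"id": "b", "duration_s": 20.0, "imu_offset_ms": 30.0, "schema": "v2"},
]

billable = usable_seconds(clips, gates)  # only clip "a" counts
```

Under these assumptions, 30 seconds were captured but only 10 are billed, which is the difference between raw hours and usable time.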

2 · A taxonomy of real work

CNet maps the physical world as occupation × scene × process (O*NET-style), down to second-level actions. Buyers query by work semantics — "bimanual assembly, automotive welding, 5s action window" — not "video_batch_07." The taxonomy also surfaces what's missing, so coverage grows on purpose.

Occupation x scene x process taxonomy — Fig 4 · Query reality by work semantics, not folder names.
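As a rough illustration of querying by work semantics and of surfacing missing coverage, consider the sketch below. The `Label` record, the `query` and `coverage_gaps` helpers, and the sample values are all hypothetical; in particular, how the example phrases in the text map onto taxonomy fields is a guess made for the demo.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Label:
    # Hypothetical record: one labeled action window inside a clip.
    occupation: str
    scene: str
    process: str
    action: str
    window_s: float


def query(labels: list[Label], **criteria) -> list[Label]:
    """Return labels whose fields equal every given criterion."""
    return [l for l in labels
            if all(getattr(l, k) == v for k, v in criteria.items())]


def coverage_gaps(labels: list[Label], occupations: list[str],
                  scenes: list[str], processes: list[str]) -> list[tuple]:
    """List occupation x scene x process cells that have no data yet."""
    covered = {(l.occupation, l.scene, l.process) for l in labels}
    return [cell for cell in product(occupations, scenes, processes)
            if cell not in covered]


labels = [
    Label("assembler", "automotive plant", "welding", "bimanual assembly", 5.0),
]

hits = query(labels, process="welding", action="bimanual assembly", window_s=5.0)
gaps = coverage_gaps(labels, ["assembler"], ["automotive plant"],
                     ["welding", "painting"])
```

The same index serves both purposes: buyers filter on it, and the empty cells in `gaps` tell collection teams what to capture next.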

3 · Provenance to the frame

Each clip carries a birth certificate — device, operator, time, QC result — as a frame-level sidecar with a Process/Action timeline, signed. You don't trust the vendor; you verify the certificate.

Frame-level birth certificate — Fig 3 · Every clip carries its birth certificate.
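To make "verify the certificate" concrete, here is a minimal sketch of signing and checking a sidecar. It is an assumption-laden stand-in: the sidecar fields are invented for the demo, and HMAC-SHA256 from the Python standard library is used only to keep the example self-contained. A certificate that buyers verify without trusting the vendor would more plausibly use an asymmetric signature, so that verification needs only a public key; the document does not say which scheme CNet uses.

```python
import hashlib
import hmac
import json


def canonical(sidecar: dict) -> bytes:
    """Serialize deterministically so the same content always signs the same."""
    return json.dumps(sidecar, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(sidecar: dict, key: bytes) -> str:
    """Stand-in signature: HMAC-SHA256 over the canonical sidecar bytes."""
    return hmac.new(key, canonical(sidecar), hashlib.sha256).hexdigest()


def verify(sidecar: dict, signature: str, key: bytes) -> bool:
    """Constant-time comparison of the recomputed and supplied signatures."""
    return hmac.compare_digest(sign(sidecar, key), signature)


key = b"demo-key"  # hypothetical key, for illustration only
sidecar = {        # hypothetical fields mirroring the text: device, operator, time, QC
    "clip_id": "c1",
    "device": "rig-07",
    "operator": "op-112",
    "captured_at": "2025-01-01T08:00:00Z",
    "qc_result": "pass",
}

signature = sign(sidecar, key)
tampered = dict(sidecar, qc_result="fail")

ok = verify(sidecar, signature, key)
ok_after_tamper = verify(tampered, signature, key)
```

Any change to the sidecar, including a flipped QC result, invalidates the signature, which is what lets a buyer check a clip's history independently of the vendor's word.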

4 · End-to-end owned

Capture hardware (first-person rig + data glove), the collection-operations platform, the processing/QC pipeline, the perception stack, and the developer tooling are one chain. No black boxes between silicon and tensor.

The owned chain CAPTURE to BUILD — Fig 5 · End to end, owned. Break one link and intelligence starves.

5 · China supplier network

A transparent, managed supplier network across China's industrial scenes — coverage that's hard to assemble from outside, with a cost structure and onboarding speed others can't match.

04 The bottom line

Not a dataset. A data asset.

Datasets are sold by the hour. Assets have provenance, verification, and an audit trail. When everyone is comparing whose number is bigger, ask the more useful question: of that million hours, how much can you use as-is? We've made our answer verifiable — per clip, signed. You don't have to trust us. Just verify.

Request data access →