EgoBridge | Domain Adaptation for Generalizable Imitation from Egocentric Human Data

Georgia Institute of Technology

* equal advising

EgoBridge will be featured as an Oral presentation at the CoRL 2025 H2R Workshop. Come see us there!
EgoBridge Teaser

Abstract

Egocentric human experience data presents a vast resource for scaling up end-to-end imitation learning for robotic manipulation. However, significant domain gaps in visual appearance, sensor modalities, and kinematics between human and robot impede knowledge transfer. This paper presents EgoBridge, a unified co-training framework that explicitly aligns the policy latent spaces between human and robot data using domain adaptation. Through a measure of discrepancy on the joint policy latent features and actions based on Optimal Transport (OT), we learn observation representations that not only align between the human and robot domains but also preserve the action-relevant information critical for policy learning. EgoBridge achieves a significant 44% absolute improvement in policy success rate over human-augmented cross-embodiment baselines across three real-world single-arm and bimanual manipulation tasks. EgoBridge also generalizes to new objects, scenes, and tasks seen only in human data, where baselines fail entirely.

Latent Alignment with Joint Optimal Transport



Latent alignment with OT. Various domain gaps limit transfer between humans and robots. Our key insight is to leverage the inherent similarity between human and robot motions to supervise latent alignment. We formalize this as an Optimal Transport (OT) problem that probabilistically maps human data to robot data.
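Below is a minimal sketch of the entropic OT (Sinkhorn) computation such an alignment could use, matching a batch of human latents to a batch of robot latents under uniform marginals. All names here (sinkhorn_coupling, z_human, z_robot, epsilon) are illustrative assumptions, not the paper's released code.

import torch

def sinkhorn_coupling(z_human, z_robot, epsilon=0.05, n_iters=100):
    """Soft transport plan P of shape (n_h, n_r) between two latent batches.

    z_human: (n_h, d) latent features of human samples
    z_robot: (n_r, d) latent features of robot samples
    """
    # Pairwise squared-Euclidean cost between latents, normalized for stability.
    cost = torch.cdist(z_human, z_robot, p=2) ** 2
    cost = cost / (cost.max() + 1e-8)
    n_h, n_r = cost.shape
    a = torch.full((n_h,), 1.0 / n_h)   # uniform marginal over the human batch
    b = torch.full((n_r,), 1.0 / n_r)   # uniform marginal over the robot batch
    K = torch.exp(-cost / epsilon)      # Gibbs kernel
    u = torch.ones_like(a)
    for _ in range(n_iters):            # Sinkhorn fixed-point iterations
        v = b / (K.t() @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]     # transport plan
    ot_loss = (P * cost).sum()          # transport cost, usable as an auxiliary loss
    return P, ot_loss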

Dynamic Time Warping (DTW)



Joint Latent-Action Cost with OT. To integrate action information into the OT objective, we leverage Dynamic Time Warping (DTW). Within a batch, DTW identifies the human samples most behaviorally similar to each robot sample. These "pseudo-pairs" are used to discount the latent cost in OT, aligning the joint latent–action distribution.
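As a hedged illustration of this step, the sketch below computes DTW distances between action trajectories and uses them to discount an OT latent-cost matrix. The exponential discount rule and the beta parameter are assumptions made for illustration; the paper defines its own combination of the two costs.

import numpy as np

def dtw_distance(traj_a, traj_b):
    """Classic dynamic-programming DTW between two action trajectories
    of shapes (T_a, d) and (T_b, d)."""
    T_a, T_b = len(traj_a), len(traj_b)
    D = np.full((T_a + 1, T_b + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, T_a + 1):
        for j in range(1, T_b + 1):
            step = np.linalg.norm(traj_a[i - 1] - traj_b[j - 1])
            D[i, j] = step + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[T_a, T_b]

def joint_cost(latent_cost, human_actions, robot_actions, beta=1.0):
    """Discount the (n_h, n_r) latent cost matrix by behavioral similarity:
    pairs with low DTW action distance ("pseudo-pairs") have their latent
    cost down-weighted, so OT prefers behaviorally similar matches."""
    n_h, n_r = latent_cost.shape
    dtw = np.array([[dtw_distance(human_actions[i], robot_actions[j])
                     for j in range(n_r)] for i in range(n_h)])
    discount = np.exp(-beta * dtw)                # near 1 for similar behaviors
    return latent_cost * (1.0 - 0.5 * discount)   # hypothetical discount rule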

Architecture



Unified Policy. Our method consists of a simple unified policy that integrates directly with human-robot co-training. (a) We embed human and robot samples into a shared latent space with an encoder fφ. (b) We apply OT as an auxiliary loss that encourages the encoder fφ to align human and robot latents. (c) We co-train the policy on human and robot data by jointly predicting actions and computing a combined behavior cloning (BC) loss.
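A minimal sketch of how steps (a)–(c) could combine into one training objective, assuming single-step continuous actions and an MSE behavior cloning loss. The names encoder, policy_head, and lambda_ot are hypothetical, and sinkhorn_coupling refers to the earlier Sinkhorn sketch; this is not the paper's exact implementation.

import torch
import torch.nn.functional as F

def co_training_loss(encoder, policy_head, human_batch, robot_batch, lambda_ot=0.1):
    # (a) Embed both domains into the shared latent space with f_phi.
    z_h = encoder(human_batch["obs"])
    z_r = encoder(robot_batch["obs"])
    # (c) Combined behavior-cloning loss: predict actions in both domains.
    bc_h = F.mse_loss(policy_head(z_h), human_batch["actions"])
    bc_r = F.mse_loss(policy_head(z_r), robot_batch["actions"])
    # (b) OT auxiliary loss pushes the encoder to align human and robot latents
    # (sinkhorn_coupling as defined in the earlier sketch).
    _, ot_loss = sinkhorn_coupling(z_h, z_r)
    return bc_h + bc_r + lambda_ot * ot_loss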

Results


Experiments: In-domain Performance


In-distribution performance figure
EgoBridge outperforms Robot-BC and other human-augmented baselines by 44% in absolute success rate across three real-world tasks.


Experiments: Scene Generalization


Scene generalization results figure
In the Scoop Coffee task, EgoBridge generalizes to an entirely new scene and object combination unseen in robot data, where human-augmented baselines fail entirely.


Experiments: Behavior Generalization


In the Drawer task, EgoBridge allows the robot to generalize to a completely out-of-distribution set of trajectories, in which the gripper moving to the top-right corner is never observed in the robot data.


Experiments: Simulation PushT


PushT figure
We simulate a human-robot domain gap in a reproducible planar pushing task. EgoBridge outperforms standard domain adaptation baselines, demonstrating cross-embodiment transfer across new backgrounds and new motions.

Latent Visualization



Visualization of aligned latents. We visualize a t-SNE projection of the latent embeddings produced by the encoder and find that EgoBridge not only aligns the human and robot distributions but also structures them into shared semantic clusters.

Conclusion

We presented EgoBridge, a novel co-training framework designed to enable robots to learn effectively from egocentric human data by explicitly addressing domain gaps. By leveraging Optimal Transport on joint policy latent feature–action distributions, guided by Dynamic Time Warping cost on action trajectories, EgoBridge successfully aligns human and robot representations while preserving critical action-relevant information. Our experiments demonstrated significant improvements in real-world task success rates (up to 44% absolute gain) and, importantly, showed robust generalization to novel objects, scenes, and even tasks observed only in human demonstrations, where baselines often failed.

BibTeX

@misc{punamiya2025egobridgedomainadaptationgeneralizable,
  title        = {EgoBridge: Domain Adaptation for Generalizable Imitation from Egocentric Human Data},
  author       = {Ryan Punamiya and Dhruv Patel and Patcharapong Aphiwetsa and 
                  Pranav Kuppili and Lawrence Y. Zhu and Simar Kareer and 
                  Judy Hoffman and Danfei Xu},
  year         = {2025},
  eprint       = {2509.19626},
  archivePrefix= {arXiv},
  primaryClass = {cs.RO},
  url          = {https://arxiv.org/abs/2509.19626}
}