Cross-Dataset Universal Embedding Space
Autonomous driving systems often struggle to generalize across diverse datasets due to varying sensor configurations, annotation quality, and environmental conditions. We propose a novel framework that enables universal traffic scene understanding by leveraging two key ingredients: (i) a temporal graph representing the traffic scene over time which we term Traffic Scene Graph (TSG) (ii) a knowledge-transfer mechanism that projects these graph features into a common embedding space via an Unsupervised Scene Translation (UST) network. By performing inference in this common space — the Shared Graph Scene Embedding Space (SGSES) — we can train a single network for downstream tasks that generalizes across datasets and scene types. To illustrate this approach, we focus on a risk-assessment task, transferring knowledge from a high-quality dataset (e.g., NuPlan) to a lower-quality dataset (e.g., Learn to Drive (L2D)). Our pipeline first extracts comprehensive features — including 3D relative velocities, depth estimates, and visual embeddings — using a multi-stage process combining object detection, depth estimation, and learned image representations. These features are used to build the TSG that are processed by dataset-specific encoders. A projection head trained with contrastive learning aligns the resulting embeddings in SGSES. On the risk-prediction task, our framework significantly improves L2D’s performance via knowledge transfer from NuPlan, yielding a substantial reduction in risk-assessment error compared to single-dataset baselines. Our results demonstrate a scalable solution to lever- age multiple autonomous-driving datasets, lowering annotation requirements and enhancing generalization across diverse driving environments.
Project Overview
Autonomous driving systems often struggle to generalize across diverse datasets due to varying sensor configurations, annotation quality, and environmental conditions. We propose a novel framework that enables universal traffic scene understanding by leveraging two key ingredients: (i) a temporal graph representing the traffic scene over time which we term Traffic Scene Graph (TSG) (ii) a knowledge-transfer mechanism that projects these graph features into a common embedding space via an Unsupervised Scene Translation (UST) network. By performing inference in this common space — the Shared Graph Scene Embedding Space (SGSES) — we can train a single network for downstream tasks that generalizes across datasets and scene types. To illustrate this approach, we focus on a risk-assessment task, transferring knowledge from a high-quality dataset (e.g., NuPlan) to a lower-quality dataset (e.g., Learn to Drive (L2D)). Our pipeline first extracts comprehensive features — including 3D relative velocities, depth estimates, and visual embeddings — using a multi-stage process combining object detection, depth estimation, and learned image representations. These features are used to build the TSG that are processed by dataset-specific encoders. A projection head trained with contrastive learning aligns the resulting embeddings in SGSES. On the risk-prediction task, our framework significantly improves L2D’s performance via knowledge transfer from NuPlan, yielding a substantial reduction in risk-assessment error compared to single-dataset baselines. Our results demonstrate a scalable solution to lever- age multiple autonomous-driving datasets, lowering annotation requirements and enhancing generalization across diverse driving environments.
The Challenge
Autonomous driving models struggle to generalize across datasets due to differences in sensors, annotations, and environments. Models trained on one dataset perform poorly when applied to another with different data quality or modalities. This limits reuse of data and increases dependence on high-quality annotations.
Our Solution
Framework was develop that represents traffic scenes as temporal graphs and aligns them into a shared embedding space across datasets. A knowledge transfer mechanism enables learning from high-quality datasets and applying it to lower-quality ones without paired data. This allows a single model to generalize across different datasets and sensor configurations.
Technology Stack
Results That Matter
Measurable impact delivered through innovative AI solutions
Goal 1
Goal 2
Ready to achieve similar results for your business?
Project Gallery
Visual highlights from the implementation
Ready to Transform Your Business?
Let's discuss how we can create a custom AI solution that delivers measurable results for your organization.

