CREStE: Scalable Mapless Navigation with Internet Scale Priors and Counterfactual Guidance

University of Texas at Austin
CREStE learns representations and rewards for mapless navigation by distilling priors from visual foundation models trained on internet scale data and learning from counterfactual demonstrations.

Abstract

CREStE (Counterfactuals for Reward Enhancement with Structured Embeddings) is the first approach to learn representations that address the full mapless navigation problem. CREStE learns generalizable bird's eye view (BEV) scene representations for urban environments by distilling priors from visual foundation models trained on internet-scale data. Using this representation, we predict BEV reward maps for navigation that are aligned with expert and counterfactual demonstrations. CREStE outperforms all state-of-the-art approaches in mapless urban navigation, traversing a 2 kilometer mission with just 1 intervention, demonstrating generalization to unseen semantic entities and terrains, robustness in challenging scenarios with little room for error, and alignment with fine-grained human preferences.

Our approach achieves this without an exhaustive list of semantic classes, large-scale robot datasets, or carefully designed reward functions. We achieve this with the following contributions: 1) a novel model architecture and learning objective that leverages visual foundation models to learn geometrically grounded semantic, geometric, and instance-aware representations; and 2) a counterfactual-based inverse reinforcement learning objective and framework for learning reward functions that attend to the most important features for navigation.

Learning Priors from Visual Foundation Models

CREStE proposes a novel architecture and distillation objective that combines semantic and instance priors from DINOv2 and Segment Anything 2 (SAM 2), resulting in a lightweight perceptual encoder that predicts a generalizable BEV representation from a single RGB-D image.
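The core of feature distillation can be illustrated with a per-cell alignment loss that pushes a lightweight student encoder's features toward a frozen foundation-model teacher's features. The sketch below is a hypothetical illustration of this idea in NumPy, not CREStE's actual objective or architecture:

```python
import numpy as np

def cosine_distillation_loss(student_feats, teacher_feats, eps=1e-8):
    """Mean per-cell cosine distance between student and teacher feature maps.

    Both inputs have shape (H, W, C): a C-dim feature per spatial cell.
    Returns 0 when the student's features point in the same direction as
    the teacher's at every cell, up to 2 when they are opposed.
    """
    s = student_feats / (np.linalg.norm(student_feats, axis=-1, keepdims=True) + eps)
    t = teacher_feats / (np.linalg.norm(teacher_feats, axis=-1, keepdims=True) + eps)
    cos_sim = np.sum(s * t, axis=-1)       # (H, W), each entry in [-1, 1]
    return float(np.mean(1.0 - cos_sim))   # average misalignment

# A perfectly aligned student incurs zero loss.
feats = np.random.default_rng(0).normal(size=(4, 4, 8))
print(cosine_distillation_loss(feats, feats))  # ~0.0
```

In practice the teacher features would come from a frozen model such as DINOv2, and only the student encoder would receive gradients; the cosine form makes the loss invariant to feature magnitude.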

Learning Rewards from Counterfactuals

CREStE introduces a principled counterfactual-based inverse reinforcement learning objective and active learning framework that queries humans for counterfactual annotations to align rewards with human preferences.
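One common way to use counterfactual demonstrations in inverse reinforcement learning is a ranking constraint: the learned reward should score the expert's path above any human-annotated counterfactual path by some margin. The snippet below is a minimal hinge-loss sketch of that idea (function names and the margin formulation are ours, not taken from the paper):

```python
import numpy as np

def path_return(reward_map, path):
    """Sum of rewards along a path of (row, col) cells in a BEV reward map."""
    return float(sum(reward_map[r, c] for r, c in path))

def counterfactual_ranking_loss(reward_map, expert_path, counterfactual_paths, margin=1.0):
    """Hinge loss: expert return must exceed each counterfactual's return by `margin`.

    Zero when every counterfactual is already out-ranked; positive otherwise,
    producing gradient pressure to lower rewards along undesirable paths.
    """
    r_expert = path_return(reward_map, expert_path)
    losses = [max(0.0, margin - (r_expert - path_return(reward_map, cf)))
              for cf in counterfactual_paths]
    return sum(losses) / len(losses)

# Toy example: reward is high along the top row, zero elsewhere.
rmap = np.zeros((3, 3))
rmap[0, :] = 1.0
expert = [(0, 0), (0, 1), (0, 2)]          # return 3.0
counterfactual = [(2, 0), (2, 1), (2, 2)]  # return 0.0
print(counterfactual_ranking_loss(rmap, expert, [counterfactual]))  # 0.0
```

During active learning, each human-flagged counterfactual adds one such ranking term, steering the reward toward features that distinguish good paths from subtly bad ones.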

Kilometer Scale Mapless Navigation Deployment

We deploy CREStE on a 2 kilometer unseen urban loop to evaluate it on the task of long-horizon mapless navigation. Trained with only 2.5 hours of real-world demonstrations, CREStE is able to complete the entire mission with just a single intervention, demonstrating its robustness and generalizability to diverse urban environments. We include short clips from this deployment below, including the sole failure, and link the full uncut video externally for viewing.


Additional Quantitative Studies

We evaluate CREStE in 5 different urban environments across Austin, Texas with a variety of challenging terrains, dynamic obstacles, and diverse semantic entities. We mark unseen environments in red and seen environments in green. We compare CREStE against SOTA mapless navigation approaches, measuring the average time to reach a subgoal (AST), the percentage of subgoals reached per mission (%S), and the number of interventions required per 100 meters (NIR).
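The normalized metrics reduce to simple ratios; the helper functions below are our own sketch of those definitions (names are illustrative, not from the paper):

```python
def interventions_per_100m(num_interventions, distance_m):
    """NIR: interventions normalized per 100 meters traveled."""
    return 100.0 * num_interventions / distance_m

def subgoal_success_rate(reached, total):
    """%S: percentage of subgoals reached during a mission."""
    return 100.0 * reached / total

# The 2 km deployment with a single intervention corresponds to:
print(interventions_per_100m(1, 2000))  # 0.05
```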

Below, we compare CREStE against two SOTA baselines: one that performs geometric obstacle avoidance only, and one that additionally follows terrain preferences. While these approaches consider important factors for navigation, they are unable to generalize to diverse urban scenes with uneven elevation, unseen semantic classes and terrains, and novel lighting and viewpoint conditions. See our paper for full details on the quantitative experiments.

Geometric Only

Terrain + Geometric (PACER+G)

CREStE (Ours)

Acknowledgements

This work has taken place in the Autonomous Mobile Robotics Laboratory (AMRL) and Machine Decision-making through Interaction Laboratory (MIDI) at UT Austin. AMRL research is supported in part by NSF (CAREER-2046955, PARTNER-2402650) and ARO (W911NF-24-2-0025). MIDI research is supported in part by NSF (CAREER-2340651, PARTNER-2402650), DARPA (HR00112490431), and ARO (W911NF-24-1-0193). Any opinions, findings, and conclusions expressed in this material are those of the authors and do not necessarily reflect the views of the sponsors.

BibTeX

@article{zhang2025creste,
  author  = {Zhang, Arthur and Sikchi, Harsh and Zhang, Amy and Biswas, Joydeep},
  title   = {CREStE: Scalable Mapless Navigation with Internet Scale Priors and Counterfactual Guidance},
  journal = {arXiv},
  year    = {2025},
}