The UT Campus Object Dataset is the largest multiclass, multimodal urban robotics dataset to date, with 1.3 million 3D bounding box annotations for 53 object classes, 204 million annotated points for 24 terrain classes, and globally consistent pseudo-ground truth poses.

3D bounding box annotations from point clouds in CODa visualized on RGB images.

Abstract

We introduce the UT Campus Object Dataset (CODa), an egocentric robot perception dataset collected on the University of Texas at Austin campus. Our dataset contains 8.5 hours of multimodal sensor data: hardware-synchronized high-resolution 3D point clouds and stereo RGB camera images, RGB-D videos, and 9-DOF IMU data. We provide 58 minutes of ground-truth annotations containing 1.3 million 3D bounding boxes with instance IDs for 53 semantic classes, 5000 frames of 3D semantic annotations for urban terrain, and pseudo-ground truth localization. We repeatedly traverse the same geographic locations across a wide range of indoor and outdoor areas, weather conditions, and times of day. Using CODa, we empirically demonstrate that: 1) 3D object detection performance in urban settings is significantly higher when trained using CODa compared to existing datasets, even when employing state-of-the-art domain adaptation approaches, 2) sensor-specific fine-tuning improves 3D object detection accuracy, and 3) pretraining on CODa improves cross-dataset 3D object detection performance in urban settings compared to pretraining on AV datasets.

3D terrain annotations from point clouds in CODa visualized on RGB images.

Summary Video

Rights

CODa is available for non-commercial use under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (“CC BY-NC-SA 4.0”). The CC BY-NC-SA 4.0 may be accessed at https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode. When you download or use the Datasets from this website or elsewhere, you are agreeing to comply with the terms of CC BY-NC-SA 4.0 as applicable, and also agreeing to the Dataset Terms. Where these Dataset Terms conflict with the terms of CC BY-NC-SA 4.0, these Dataset Terms shall prevail.

Citation

If you find our dataset and paper useful, please cite both:

Dataset

@misc{coda2023tdr,
  author = {Zhang, Arthur and Eranki, Chaitanya and Zhang, Christina and Hong, Raymond and Kalyani, Pranav and Kalyanaraman, Lochana and Gamare, Arsh and Esteva, Maria and Biswas, Joydeep},
  publisher = {Texas Data Repository},
  title = {{UT Campus Object Dataset (CODa)}},
  year = {2023},
  version = {DRAFT VERSION},
  doi = {10.18738/T8/BBOQMV},
  url = {https://doi.org/10.18738/T8/BBOQMV}
}

Paper

@misc{zhang2023robust,
  title={Towards Robust Robot 3D Perception in Urban Environments: The UT Campus Object Dataset},
  author={Arthur Zhang and Chaitanya Eranki and Christina Zhang and Ji-Hwan Park and Raymond Hong and Pranav Kalyani and Lochana Kalyanaraman and Arsh Gamare and Arnav Bagad and Maria Esteva and Joydeep Biswas},
  year={2023},
  eprint={2309.13549},
  archivePrefix={arXiv},
  primaryClass={cs.RO}
}

Announcements

December 2023 - Dataset Version 2 release
- Reduced stereo rectification calibration error
- Added 3D point cloud to rectified image projection matrices
- Added disparity to 3D world coordinate projection matrices
- Fixed dense 6-DOF robot poses using Lie algebra interpolation
- Introduced globally consistent 6-DOF robot poses using the AIST interactive_slam package
- Added pseudo-ground truth stereo depth images for the ZED RGB-D camera (Cam3/4)
- Migrated September 2023 dataset release to CODa_v1
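
The projection matrices added in Version 2 can be applied roughly as in the minimal Python sketch below. It assumes a 3x4 matrix that maps homogeneous 3D points directly to rectified image pixels, and a 4x4 matrix that maps (u, v, disparity, 1) to homogeneous 3D coordinates. The function names, the example matrix values, and the nested-list representation are hypothetical and are not taken from the CODa files, so the actual matrix layout and coordinate frames should be checked against the dataset's calibration documentation.

```python
def project_points(P, points_xyz):
    """Project 3D points to pixel coordinates with a 3x4 projection matrix.

    P is given as nested lists (assumed layout). Points with non-positive
    projective depth lie behind the camera and are skipped.
    """
    pixels = []
    for x, y, z in points_xyz:
        hom = (x, y, z, 1.0)
        u, v, w = (sum(row[i] * hom[i] for i in range(4)) for row in P)
        if w <= 0:
            continue
        pixels.append((u / w, v / w))
    return pixels


def disparity_to_point(Q, u, v, disparity):
    """Reproject a pixel with known disparity to 3D with a 4x4 matrix Q.

    Returns None when the homogeneous scale is zero (e.g. zero disparity).
    """
    hom = (u, v, disparity, 1.0)
    X, Y, Z, W = (sum(row[i] * hom[i] for i in range(4)) for row in Q)
    if W == 0:
        return None
    return (X / W, Y / W, Z / W)


# Hypothetical calibration values, for illustration only:
# focal length 500 px, principal point (320, 240), stereo baseline 0.1 m.
EXAMPLE_P = [
    [500.0, 0.0, 320.0, 0.0],
    [0.0, 500.0, 240.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
EXAMPLE_Q = [
    [1.0, 0.0, 0.0, -320.0],
    [0.0, 1.0, 0.0, -240.0],
    [0.0, 0.0, 0.0, 500.0],
    [0.0, 0.0, 1.0 / 0.1, 0.0],
]

if __name__ == "__main__":
    print(project_points(EXAMPLE_P, [(1.0, 0.5, 5.0)]))
    print(disparity_to_point(EXAMPLE_Q, 420.0, 290.0, 10.0))
```

With these made-up values the two functions are inverses of each other: the point projected to a pixel by `EXAMPLE_P` is recovered by `EXAMPLE_Q` when given the matching disparity. Sign conventions for the disparity row differ between stereo toolchains, so the fourth row of the matrix should be verified before use.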

September 2023 - Initial Dataset Release with:
- Egocompensated 3D point clouds
- Synchronized, rectified stereo RGB images
- Dense, locally consistent 6-DOF poses (using linear interpolation)
- 1.3 million 3D bounding box annotations for 53 object classes, 204 million annotated points for 24 terrain classes
- Per sequence timestamps for each synchronized frame
- Sensor intrinsic calibrations for all modalities
- Sensor extrinsic calibrations for 3D LiDAR, stereo RGB cameras, ZED RGB-D camera (Cam3/4), and 9-DOF IMU
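
The announcements mention two ways of densifying poses: linear interpolation in the initial release and Lie algebra interpolation in Version 2. The Python sketch below illustrates one common form of the latter for the rotation part, R(t) = R0 exp(t log(R0^-1 R1)), which for unit quaternions coincides with spherical linear interpolation (slerp). It is an illustration under stated assumptions, not the CODa implementation: the pose representation (an xyz position plus a w, x, y, z unit quaternion), the function names, and the choice to interpolate the translation linearly rather than jointly on SE(3) are all hypothetical.

```python
import math


def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:
        # q and -q encode the same rotation; flip to follow the shorter arc.
        q1 = tuple(-c for c in q1)
        dot = -dot
    dot = min(1.0, dot)
    theta = math.acos(dot)  # angle between the quaternions on the unit sphere
    if theta < 1e-8:
        # The rotations are (nearly) identical; avoid dividing by sin(0).
        return tuple(q0)
    s = math.sin(theta)
    w0 = math.sin((1.0 - t) * theta) / s
    w1 = math.sin(t * theta) / s
    return tuple(w0 * a + w1 * b for a, b in zip(q0, q1))


def interpolate_pose(pose0, pose1, t):
    """Interpolate between two poses for 0 <= t <= 1.

    Each pose is (position, quaternion): position is an (x, y, z) tuple and
    quaternion is a unit (w, x, y, z) tuple. Position is interpolated
    linearly and rotation along the geodesic (a simplifying assumption).
    """
    p0, q0 = pose0
    p1, q1 = pose1
    position = tuple((1.0 - t) * a + t * b for a, b in zip(p0, p1))
    return position, slerp(q0, q1, t)


if __name__ == "__main__":
    start = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0))
    # 90 degree rotation about the z axis, 2 m forward.
    end = ((2.0, 0.0, 0.0), (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4)))
    print(interpolate_pose(start, end, 0.5))
```

Given two timestamped poses, the interpolation parameter for a query time would typically be t = (ts - ts0) / (ts1 - ts0). Interpolating rotation along the geodesic keeps the result a valid rotation and avoids the wrap-around artifacts that component-wise linear interpolation of angles can produce.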