Welcome to the JRDB website, home of the JackRabbot Dataset and Benchmark! JRDB is a novel dataset collected from our social mobile manipulator JackRabbot. The dataset includes 64 minutes of multimodal sensor data: stereo cylindrical 360° RGB video at 15 fps, 3D point clouds from two Velodyne 16 Lidars, line 3D point clouds from two Sick Lidars, an audio signal, RGBD video at 30 fps, a 360° spherical image from a fisheye camera, and encoder values from the robot’s wheels. Our dataset covers traditionally underrepresented scenes such as indoor environments and pedestrian areas, captured from both a stationary and a navigating robot platform. The data has been annotated with over 2.3 million bounding boxes spread over 5 individual cameras and 1.8 million associated 3D cuboids around all people in the scenes, totalling over 3500 time-consistent trajectories. Together with the JRDB dataset and annotations, we have launched a benchmark and metrics for 2D and 3D person detection and tracking. The goal of JRDB is to provide a new source of data and a test-bench for research in autonomous robot navigation and all perceptual tasks related to social robotics in human environments.
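
For orientation, the sketch below shows the standard 2D intersection-over-union (IoU) computation that detection benchmarks of this kind typically build their matching on. It is an illustrative example only, not the official JRDB evaluation code, and the (x_min, y_min, x_max, y_max) box format is an assumption.

```python
# Illustrative only: a standard 2D IoU computation such as detection
# benchmarks commonly use to match predicted and annotated boxes.
# The (x_min, y_min, x_max, y_max) box format is an assumption, not
# the official JRDB annotation layout.

def iou_2d(box_a, box_b):
    """Return intersection-over-union of two axis-aligned 2D boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Overlap rectangle (zero width/height if the boxes do not intersect).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # A detection is commonly counted as a true positive when IoU >= 0.5.
    print(iou_2d((10, 10, 60, 110), (20, 15, 70, 115)))  # ~0.61
```
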

News and Announcements

  • Sept. 12, 2019: 2D and 3D detection benchmarks open for submission! Submit here.
  • Sept. 4, 2019: 2D and 3D tracking benchmarks open for submission! Submit here.
  • Aug. 8, 2019: Dataset released! Download train and test splits.
Key Dataset Features and Novelties

Our dataset and the corresponding challenges enable improved evaluation of detection and tracking algorithms by including the following:

  • Crowded sequences: Some sequences include up to 260 annotated persons in a given environment.
  • Novel environments: The dataset includes both indoor and outdoor environments, and it is the first large dataset with annotated indoor scenes. In outdoor scenes, the data is acquired from a pedestrian perspective (e.g. from curbs) rather than from a vehicle perspective.
  • Novel perspective: The data is acquired from a robot platform of human-comparable size, so the viewpoint is similar to the egocentric view of a person. This creates scenarios with multiple occlusions and poses a significant challenge for detection, tracking and other perception tasks.
  • Stationary and dynamic sensor perspectives: The sequences present a combination of sensor signals acquired from stationary and moving robot perspectives.
  • Multi-modal sensor streams: Apart from multiple RGB and Lidar signals, our dataset includes RGBD images, GPS signals, IMU measurements, odometry values, and audio recordings (a hypothetical per-frame grouping of these streams is sketched below).
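
As referenced in the last item above, here is a minimal sketch of how the per-frame multimodal streams could be grouped in code. Every field, type, and class name is a hypothetical choice for illustration, not the released file layout or loader API.

```python
# Hypothetical grouping of one synchronized JRDB frame; all field names
# and shapes are illustrative assumptions, not the released file layout.
from dataclasses import dataclass, field
from typing import List, Tuple

import numpy as np


@dataclass
class FrameSample:
    timestamp: float                      # seconds since sequence start
    stereo_rgb: np.ndarray                # stitched 360° cylindrical image, HxWx3
    velodyne_points: np.ndarray           # Nx3 merged cloud from both Velodyne Lidars
    sick_points: np.ndarray               # Mx3 line scans from the two Sick Lidars
    rgbd_depth: np.ndarray                # HxW depth image from the RGBD camera
    fisheye: np.ndarray                   # 360° spherical image
    wheel_encoders: Tuple[float, float]   # left/right wheel encoder values
    boxes_2d: List[Tuple[int, float, float, float, float]] = field(default_factory=list)
    cuboids_3d: List[Tuple[int, np.ndarray]] = field(default_factory=list)  # (track_id, cuboid params)
```
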
Organizers

    Hamid Rezatofighi
    Roberto Martín-Martín
    Claudia D'Arpino
    Ian Reid
    Silvio Savarese

Team

    Alan Federman
    Patrick Goebel
    JunYoung Gwak
    Mihir Patel
    Mahsa Ehsanpour