Submission details of JRMOT

Name JRMOT
Paper Link https://arxiv.org/abs/2002.08397
Code Link https://sites.google.com/view/jrmot
MOTA 0.201515
MOTP 0.424617
IDs 4207
False Positives 19711
False Negatives 765907
Input N/A
Runtime 0.06 s
Environment 1 GPU (Titan X)
Abstract A robotic agent navigating autonomously needs to perceive and track the motion of objects and other agents in its surroundings to be able to plan and execute robust and safe trajectories. To be useful for planning and execution, the motion should be perceived in 3D Cartesian space. However, most recent multi-object tracking (MOT) research has focused on tracking people and moving objects in 2D RGB video sequences. This is because searching in 3D is costly, there is a lack of annotated datasets with 3D labels of moving agents, and there is a relative scarcity of data with 3D sensor modalities. In this work we present JRMOT, a novel 3D MOT system that integrates information from RGB images and 3D point clouds to achieve real-time, state-of-the-art tracking performance. Our system incorporates recent advancements in neural-network-based re-identification as well as 2D and 3D detectors and descriptors. We integrate them into a joint probabilistic data-association framework within a multi-modal recursive Kalman architecture to achieve online, real-time 3D MOT. As part of our work, we release the JRDB dataset, a novel large-scale 2D+3D dataset and benchmark annotated with over 2 million boxes and 3500 time-consistent 2D+3D trajectories across 54 indoor and outdoor scenes. This dataset, which we use to train and test our model, contains over 60 minutes of data, including 360-degree cylindrical RGB video and 3D point clouds, in social settings. The presented 3D MOT system demonstrates state-of-the-art performance against competing methods on the popular 2D tracking KITTI benchmark and serves as a competitive 3D tracking baseline for our dataset and benchmark. Moreover, our tests on our social robot JackRabbot indicate that the system is capable of tracking multiple pedestrians fast and reliably.

Detailed results

Per-sequence results

Sequence name                           MOTA (%)  MOTP (%)  IDs   False Positives  False Negatives
cubberly-auditorium-2019-04-22_1        29.94     41.63     86    273              9012
discovery-walk-2019-02-28_0             24.21     40.46     59    164              9954
discovery-walk-2019-02-28_1             29.92     40.05     65    417              9741
food-trucks-2019-02-12_0                30.45     45.12     283   530              44031
gates-ai-lab-2019-04-17_0               38.77     41.37     196   1478             14193
gates-basement-elevators-2019-01-17_0   50.34     37.13     79    157              5701
gates-foyer-2019-01-17_0                53.83     43.93     62    619              4177
gates-to-clark-2019-02-28_0             45.06     39.09     17    94               1290
hewlett-class-2019-01-23_0              52.32     43.43     140   373              7409
hewlett-class-2019-01-23_1              79.56     39.46     33    125              905
huang-2-2019-01-25_1                    33.51     38.55     33    256              4119
huang-intersection-2019-01-22_0         12.78     42.17     127   3492             37161
indoor-coupa-cafe-2019-02-06_0          15.27     43.45     274   1472             57955
lomita-serra-intersection-2019-01-30_0  24.41     42.09     40    113              15337
meyer-green-2019-03-16_1                14.00     39.20     58    216              19655
nvidia-aud-2019-01-25_0                 21.56     45.41     148   785              26093
nvidia-aud-2019-04-18_1                 39.64     44.62     38    275              4500
nvidia-aud-2019-04-18_2                 36.32     40.10     50    187              6316
outdoor-coupa-cafe-2019-02-06_0         8.30      46.95     179   1328             49127
quarry-road-2019-02-28_0                16.20     35.75     16    467              4909
serra-street-2019-01-30_0               11.83     40.96     110   941              37821
stlc-111-2019-04-19_1                   56.37     40.24     63    274              2405
stlc-111-2019-04-19_2                   56.88     38.85     35    81               1451
tressider-2019-03-16_2                  20.86     38.69     59    101              20636
tressider-2019-04-26_0                  9.57      44.51     619   2361             114202
tressider-2019-04-26_1                  19.25     40.39     708   534              157478
tressider-2019-04-26_3                  15.27     46.26     630   2598             100329
total                                   20.15     42.46     4207  19711            765907
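
The summary reports MOTA as a fraction (0.201515), while the per-sequence results list it as a percentage (20.15). The following is a minimal sketch of how the error counts relate to MOTA, assuming the benchmark follows the standard CLEAR MOT definition, MOTA = 1 - (FN + FP + IDs) / GT, where GT is the number of ground-truth boxes. The page does not report GT, so the sketch backs it out from the published totals; the function names are hypothetical and are not taken from the benchmark's evaluation code.

```python
# Sketch only: assumes the standard CLEAR MOT definition of MOTA.
# GT (the ground-truth box count) is not published on this page; it is
# inferred here from the reported totals and is therefore approximate.

def mota(false_positives, false_negatives, id_switches, gt_count):
    """MOTA = 1 - (FN + FP + IDs) / GT, returned as a fraction."""
    errors = false_positives + false_negatives + id_switches
    return 1.0 - errors / gt_count


def implied_gt_count(mota_fraction, false_positives, false_negatives, id_switches):
    """Invert the MOTA formula to estimate the ground-truth box count."""
    errors = false_positives + false_negatives + id_switches
    return errors / (1.0 - mota_fraction)


# Totals from the "total" row of the per-sequence results.
FP, FN, IDS = 19711, 765907, 4207
REPORTED_MOTA = 0.201515   # summary value, as a fraction
REPORTED_MOTP = 0.424617   # summary value, as a fraction

total_errors = FP + FN + IDS
gt_estimate = implied_gt_count(REPORTED_MOTA, FP, FN, IDS)

# The summary fractions and the table percentages are the same numbers.
mota_percent = round(REPORTED_MOTA * 100, 2)
mota_roundtrip = mota(FP, FN, IDS, gt_estimate)

print(f"total errors:        {total_errors}")
print(f"implied GT boxes:    {gt_estimate:.0f}")
print(f"MOTA as percentage:  {mota_percent}")
```

Under this assumption, false negatives make up most of the error total, so missed detections affect the reported MOTA far more than false positives or identity switches.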