Submission details of JRMOT2D

Name: JRMOT2D
Paper Link: https://arxiv.org/pdf/2002.08397.pdf
Code Link: https://sites.google.com/view/jrmot
MOTA: 0.225414 (22.54%)
MOTP: 0.236158 (23.62%)
IDs (ID switches): 7719
False Positives: 65550
False Negatives: 667783
Input: N/A
Runtime: 0.06 s
Environment: 1 GPU (Titan X)
Abstract: A robotic agent navigating autonomously needs to perceive and track the motion of objects and other agents in its surroundings to be able to plan and execute robust and safe trajectories. To be useful for planning and execution, the motion should be perceived in 3D Cartesian space. However, most recent multi-object tracking (MOT) research has focused on tracking people and moving objects in 2D RGB video sequences. This is because searching in 3D is costly, annotated datasets with 3D labels of moving agents are lacking, and data from 3D sensor modalities are relatively scarce. In this work, we present JRMOT, a novel 3D MOT system that integrates information from RGB images and 3D point clouds to achieve real-time, state-of-the-art tracking performance. Our system incorporates recent advancements in neural-network-based re-identification as well as 2D and 3D detectors and descriptors. We integrate them into a joint probabilistic data-association framework within a multi-modal recursive Kalman architecture to achieve online, real-time 3D MOT. As part of our work, we release the JRDB dataset, a novel large-scale 2D+3D dataset and benchmark annotated with over 2 million boxes and 3500 time-consistent 2D+3D trajectories across 54 indoor and outdoor scenes. This dataset, which we use to train and test our model, contains over 60 minutes of data, including 360-degree cylindrical RGB video and 3D point clouds, in social settings. The presented 3D MOT system demonstrates state-of-the-art performance against competing methods on the popular KITTI 2D tracking benchmark and serves as a competitive 3D tracking baseline for our dataset and benchmark. Moreover, our tests on our social robot JackRabbot indicate that the system is capable of tracking multiple pedestrians quickly and reliably.
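The abstract mentions a recursive Kalman architecture. As a rough illustration of the predict/update recursion such trackers build on, here is a minimal sketch of a generic 1D constant-velocity Kalman filter. The state layout, the noise parameters q and r, and the function names kf_predict and kf_update are hypothetical choices for this example, not taken from the JRMOT paper or code; the joint probabilistic data association step and the multi-modal 2D/3D fusion are omitted entirely.

```python
# Generic 1D constant-velocity Kalman filter: a textbook illustration of the
# recursive predict/update cycle, NOT JRMOT's actual state or noise model.
# State x = [position, velocity]; only position is measured (H = [1, 0]).

def kf_predict(x, P, dt, q):
    """Predict with F = [[1, dt], [0, 1]] and process noise Q = q * I."""
    pos, vel = x
    x_pred = [pos + dt * vel, vel]
    p00, p01 = P[0]
    p10, p11 = P[1]
    # P' = F P F^T + Q, expanded by hand for the 2x2 case.
    P_pred = [
        [p00 + dt * (p01 + p10) + dt * dt * p11 + q, p01 + dt * p11],
        [p10 + dt * p11, p11 + q],
    ]
    return x_pred, P_pred


def kf_update(x, P, z, r):
    """Fuse a scalar position measurement z with noise variance r."""
    p00, p01 = P[0]
    p10, p11 = P[1]
    s = p00 + r                    # innovation variance S = H P H^T + R
    k0, k1 = p00 / s, p10 / s      # Kalman gain K = P H^T / S
    y = z - x[0]                   # innovation (measurement residual)
    x_new = [x[0] + k0 * y, x[1] + k1 * y]
    # P_new = (I - K H) P
    P_new = [
        [(1 - k0) * p00, (1 - k0) * p01],
        [p10 - k1 * p00, p11 - k1 * p01],
    ]
    return x_new, P_new


# Usage: a target moving at 1 unit per step, observed without noise.
x = [0.0, 0.0]                     # initial guess: at the origin, at rest
P = [[10.0, 0.0], [0.0, 10.0]]     # large initial uncertainty
for t in range(1, 51):
    x, P = kf_predict(x, P, dt=1.0, q=1e-3)
    x, P = kf_update(x, P, z=float(t), r=0.1)

print(f"position ~ {x[0]:.2f}, velocity ~ {x[1]:.2f}")
```

Even though velocity is never measured directly, the filter recovers it from successive position measurements, which is the property a tracker relies on to predict where each target will be in the next frame before associating detections.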

Detailed results

Per-sequence results

Sequence                                MOTA (%)  MOTP (%)    IDs  False Positives  False Negatives
cubberly-auditorium-2019-04-22_1           29.79     24.87    140             1326             7824
discovery-walk-2019-02-28_0                21.73     23.58     95             1072             9257
discovery-walk-2019-02-28_1                32.73     21.89    115              971             8660
food-trucks-2019-02-12_0                   36.59     22.33    324              979            39342
gates-ai-lab-2019-04-17_0                  15.54     18.58    287             8811            12712
gates-basement-elevators-2019-01-17_0      48.13     20.24    101              337             5996
gates-foyer-2019-01-17_0                   35.08     19.77     98             3592             2924
gates-to-clark-2019-02-28_0                38.81     17.15     35              387             1139
hewlett-class-2019-01-23_0                 54.49     19.50    159              727             6639
hewlett-class-2019-01-23_1                 82.42     18.46     37              175              691
huang-2-2019-01-25_1                       29.44     22.58     56             1021             3601
huang-intersection-2019-01-22_0            13.18     27.01    160             2424            38126
indoor-coupa-cafe-2019-02-06_0             21.75     24.24    648             4194            46486
lomita-serra-intersection-2019-01-30_0     17.10     25.67     58             2191            14824
meyer-green-2019-03-16_1                   12.64     24.50    114             1544            18793
nvidia-aud-2019-01-25_0                    19.06     26.16    375             4463            21976
nvidia-aud-2019-04-18_1                    35.08     19.25     36             1136             4011
nvidia-aud-2019-04-18_2                    23.89     27.53     64             2898             4788
outdoor-coupa-cafe-2019-02-06_0            11.68     25.78    272             3249            40781
quarry-road-2019-02-28_0                    3.33     25.65     31             1765             3915
serra-street-2019-01-30_0                   8.78     26.08    190             3348            35705
stlc-111-2019-04-19_1                      58.01     18.08     76              453             2092
stlc-111-2019-04-19_2                      52.40     18.66     35              437             1256
tressider-2019-03-16_2                     16.67     24.54    116             1580            20515
tressider-2019-04-26_0                     15.12     25.05   1365             5631            94626
tressider-2019-04-26_1                     26.62     23.75   1145             3286           136862
tressider-2019-04-26_3                     19.94     26.09   1587             7553            84242
Total                                      22.54     23.62   7719            65550           667783
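The total row is not simply the average of the per-sequence rows. The following sketch checks how they relate, assuming the leaderboard uses the standard CLEAR MOT definition MOTA = 1 - (FN + FP + IDSW) / GT, pooled over all frames. Ground-truth (GT) box counts are not published on this page, so the sketch backs them out from the rounded MOTA values; the helper name implied_gt is hypothetical and the GT figures are estimates, not published data.

```python
# Sketch under an assumption: MOTA = 1 - (FN + FP + IDSW) / GT, pooled over
# all frames. GT counts are not published here; implied_gt() backs them out
# from the rounded MOTA values, so everything derived from GT is approximate.

# (sequence, MOTA %, IDs, FP, FN), copied from the per-sequence table.
ROWS = [
    ("cubberly-auditorium-2019-04-22_1", 29.79, 140, 1326, 7824),
    ("discovery-walk-2019-02-28_0", 21.73, 95, 1072, 9257),
    ("discovery-walk-2019-02-28_1", 32.73, 115, 971, 8660),
    ("food-trucks-2019-02-12_0", 36.59, 324, 979, 39342),
    ("gates-ai-lab-2019-04-17_0", 15.54, 287, 8811, 12712),
    ("gates-basement-elevators-2019-01-17_0", 48.13, 101, 337, 5996),
    ("gates-foyer-2019-01-17_0", 35.08, 98, 3592, 2924),
    ("gates-to-clark-2019-02-28_0", 38.81, 35, 387, 1139),
    ("hewlett-class-2019-01-23_0", 54.49, 159, 727, 6639),
    ("hewlett-class-2019-01-23_1", 82.42, 37, 175, 691),
    ("huang-2-2019-01-25_1", 29.44, 56, 1021, 3601),
    ("huang-intersection-2019-01-22_0", 13.18, 160, 2424, 38126),
    ("indoor-coupa-cafe-2019-02-06_0", 21.75, 648, 4194, 46486),
    ("lomita-serra-intersection-2019-01-30_0", 17.10, 58, 2191, 14824),
    ("meyer-green-2019-03-16_1", 12.64, 114, 1544, 18793),
    ("nvidia-aud-2019-01-25_0", 19.06, 375, 4463, 21976),
    ("nvidia-aud-2019-04-18_1", 35.08, 36, 1136, 4011),
    ("nvidia-aud-2019-04-18_2", 23.89, 64, 2898, 4788),
    ("outdoor-coupa-cafe-2019-02-06_0", 11.68, 272, 3249, 40781),
    ("quarry-road-2019-02-28_0", 3.33, 31, 1765, 3915),
    ("serra-street-2019-01-30_0", 8.78, 190, 3348, 35705),
    ("stlc-111-2019-04-19_1", 58.01, 76, 453, 2092),
    ("stlc-111-2019-04-19_2", 52.40, 35, 437, 1256),
    ("tressider-2019-03-16_2", 16.67, 116, 1580, 20515),
    ("tressider-2019-04-26_0", 15.12, 1365, 5631, 94626),
    ("tressider-2019-04-26_1", 26.62, 1145, 3286, 136862),
    ("tressider-2019-04-26_3", 19.94, 1587, 7553, 84242),
]


def implied_gt(mota_pct: float, ids: int, fp: int, fn: int) -> float:
    """Invert MOTA = 1 - (FN + FP + IDSW) / GT to estimate GT."""
    return (ids + fp + fn) / (1.0 - mota_pct / 100.0)


total_ids = sum(r[2] for r in ROWS)
total_fp = sum(r[3] for r in ROWS)
total_fn = sum(r[4] for r in ROWS)
total_errors = total_ids + total_fp + total_fn

# Unweighted mean: every sequence counts equally, whatever its length.
mean_mota = sum(r[1] for r in ROWS) / len(ROWS)

# Pooled MOTA: total errors over total (estimated) GT boxes, so sequences
# with more GT boxes carry proportionally more weight.
total_gt = sum(implied_gt(*r[1:]) for r in ROWS)
pooled_mota = 100.0 * (1.0 - total_errors / total_gt)

# GT implied by the summary line (MOTA 0.225414), for comparison.
summary_gt = implied_gt(22.5414, total_ids, total_fp, total_fn)

print(f"IDs={total_ids}  FP={total_fp}  FN={total_fn}")
print(f"mean of per-sequence MOTA: {mean_mota:.2f}%")
print(f"pooled MOTA (estimated):   {pooled_mota:.2f}%")
print(f"implied GT: per-sequence sum {total_gt:,.0f}, summary {summary_gt:,.0f}")
```

With the published values, the error counts reproduce the total row exactly and the pooled estimate lands near 22.54%, whereas the plain average of the per-sequence MOTA values is about 28.89%. Under the pooling assumption, the gap is consistent with long, below-average sequences such as tressider-2019-04-26_0, _1, and _3 dominating the total.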