Submission details of Person-MinkUNet

Name Person-MinkUNet
Paper Link https://www.vision.rwth-aachen.de/publication/00211/
Code Link N/A
AP 0.764179
Input N/A
Runtime 0.059 s
Environment 1 TITAN RTX
Abstract We take point clouds from both the upper and lower LiDARs and voxelize them with 0.05 x 0.05 x 0.1 m voxels. The voxelized points are then passed into a Minkowski U-Net backbone (implementation from https://github.com/mit-han-lab/spvnas/). We use a one-stage detector paradigm in which each active voxel directly generates a box, without an RPN or pooling. Non-maximum suppression is then applied to all boxes to obtain the final detections. The network is trained for 40 epochs with batch size 36 on a TITAN RTX. We use the Adam optimizer with a learning rate of 1e-3, decayed exponentially to 1e-6 starting at the 15th epoch. For data augmentation, we use random scaling and random rotation about the vertical axis.
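As a rough illustration of two steps described in the abstract, the sketch below groups points into 0.05 x 0.05 x 0.1 m voxels and computes a learning rate that stays at 1e-3 and then decays exponentially to 1e-6. It is not the submission's code. The function names `voxelize` and `learning_rate` are hypothetical, and several details are assumptions rather than facts from the abstract: voxel indices are obtained by flooring, epochs are counted from 0, the decay is applied once per epoch, and 1e-6 is reached at epoch 40.

```python
import math

# Voxel edge lengths in metres (x, y, z), taken from the abstract.
VOXEL_SIZE = (0.05, 0.05, 0.1)


def voxelize(points, voxel_size=VOXEL_SIZE):
    """Group (x, y, z) points by integer voxel index.

    Assumption: indices are floor(coordinate / voxel edge length).
    Returns a dict mapping each occupied ("active") voxel to its points.
    """
    voxels = {}
    for point in points:
        key = tuple(math.floor(c / s) for c, s in zip(point, voxel_size))
        voxels.setdefault(key, []).append(point)
    return voxels


def learning_rate(epoch, total_epochs=40, decay_start=15,
                  lr_start=1e-3, lr_end=1e-6):
    """Constant lr_start before decay_start, then exponential decay.

    Assumption: the decay is interpolated geometrically so that
    lr_end is reached exactly at epoch == total_epochs.
    """
    if epoch < decay_start:
        return lr_start
    progress = (epoch - decay_start) / (total_epochs - decay_start)
    return lr_start * (lr_end / lr_start) ** progress


if __name__ == "__main__":
    cloud = [(0.01, 0.01, 0.01), (0.04, 0.04, 0.09), (0.06, 0.0, 0.0)]
    print(len(voxelize(cloud)), "active voxels")
    for epoch in (0, 15, 27, 40):
        print(epoch, learning_rate(epoch))
```

In the pipeline described above, each active voxel would then be passed through the sparse backbone and emit one box candidate; this sketch stops at the grouping step and does not model the network or the non-maximum suppression.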

Detailed results

Overall precision/recall curve

Per-sequence results

sequence name AP
cubberly-auditorium-2019-04-22_1 0.829508
discovery-walk-2019-02-28_0 0.9459
discovery-walk-2019-02-28_1 0.877201
food-trucks-2019-02-12_0 0.861628
gates-ai-lab-2019-04-17_0 0.758734
gates-basement-elevators-2019-01-17_0 0.866201
gates-foyer-2019-01-17_0 0.853691
gates-to-clark-2019-02-28_0 0.934993
hewlett-class-2019-01-23_0 0.852305
hewlett-class-2019-01-23_1 0.944934
huang-2-2019-01-25_1 0.79817
huang-intersection-2019-01-22_0 0.899338
indoor-coupa-cafe-2019-02-06_0 0.634596
lomita-serra-intersection-2019-01-30_0 0.927623
meyer-green-2019-03-16_1 0.671656
nvidia-aud-2019-01-25_0 0.667807
nvidia-aud-2019-04-18_1 0.742776
nvidia-aud-2019-04-18_2 0.846623
outdoor-coupa-cafe-2019-02-06_0 0.669936
quarry-road-2019-02-28_0 0.805768
serra-street-2019-01-30_0 0.907425
stlc-111-2019-04-19_1 0.939276
stlc-111-2019-04-19_2 0.885658
tressider-2019-03-16_2 0.848943
tressider-2019-04-26_0 0.70935
tressider-2019-04-26_1 0.781765
tressider-2019-04-26_3 0.716176
total 0.764179