Multimodal Virtual Point 3D Detection

3D lidar sensors provide accurate depth measurements for autonomous driving, enabling 3D detection of surrounding objects. However, they are costly, often return only one or two measurements for small or distant objects, and are degraded by adverse conditions such as rainfall.

A recent paper proposes a simple and effective framework for fusing measurements from 3D lidar and high-resolution RGB sensors.

Autonomous vehicle with LiDAR sensors on top. Image credit: Ford

RGB pixels are lifted into 3D virtual points by mapping them into the scene using nearby depth measurements from the lidar sensor, generating a high-resolution 3D point cloud near target objects. A center-based 3D detector then identifies all objects in the scene. Because 2D object detectors are well optimized and highly accurate, the approach performs well even on small objects.
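The lifting step can be sketched as follows. This is a minimal illustration, not the paper's implementation: the pixel sampling, nearest-neighbour depth lookup, and the camera intrinsics (`fx`, `fy`, `cx`, `cy`) are all assumptions made for the example.

```python
import numpy as np

def lift_virtual_points(pixels, lidar_uv, lidar_depth, n_samples=50, rng=None):
    """Assign each sampled 2D pixel the depth of its nearest projected
    lidar point, producing "virtual" 3D points in camera coordinates.

    pixels      : (P, 2) pixel coordinates inside a 2D detection mask
    lidar_uv    : (L, 2) lidar points projected into the image plane
    lidar_depth : (L,)   depth of each projected lidar point
    """
    rng = rng or np.random.default_rng(0)
    # Sample a fixed number of pixels per detection.
    idx = rng.choice(len(pixels), size=min(n_samples, len(pixels)), replace=False)
    sampled = pixels[idx].astype(float)
    # Nearest-neighbour depth lookup in the image plane.
    d2 = ((sampled[:, None, :] - lidar_uv[None, :, :]) ** 2).sum(-1)
    depth = lidar_depth[d2.argmin(axis=1)]
    # Back-project (u, v, depth) with a pinhole model (illustrative intrinsics).
    fx = fy = 1000.0
    cx, cy = 800.0, 450.0
    x = (sampled[:, 0] - cx) * depth / fx
    y = (sampled[:, 1] - cy) * depth / fy
    return np.stack([x, y, depth], axis=1)  # (n, 3) virtual points
```

In practice the sampled pixels would come from instance masks produced by a 2D detector, and the depth association would be done per detection rather than over the whole image.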

Furthermore, virtual points reduce the density imbalance between close and faraway objects, making the point-cloud measurements of these objects more consistent.
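Because a fixed budget of virtual points is generated per 2D detection, a distant object with only a couple of lidar returns receives the same augmentation as a nearby one. A minimal sketch of combining the two sources (the `merge_points` helper and the indicator channel are assumptions for illustration):

```python
import numpy as np

def merge_points(real_pts, virtual_pts):
    """Stack raw lidar points and virtual points into one cloud, adding a
    binary indicator channel so the detector can tell the sources apart
    (0 = real lidar return, 1 = virtual point)."""
    real = np.hstack([real_pts, np.zeros((len(real_pts), 1))])
    virt = np.hstack([virtual_pts, np.ones((len(virtual_pts), 1))])
    return np.vstack([real, virt])
```

A distant pedestrian with two lidar returns plus 50 virtual points yields a 52-point cloud, far closer in density to a nearby object than the raw measurements alone.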

From the paper's abstract: Lidar-based sensing drives current autonomous vehicles. Despite rapid progress, current Lidar sensors still lag two decades behind traditional color cameras in terms of resolution and cost. For autonomous driving, this means that large objects close to the sensors are easily visible, but far-away or small objects comprise only one or two measurements. This is an issue, especially when these objects turn out to be driving hazards. On the other hand, these same objects are clearly visible in onboard RGB sensors. In this work, we present an approach to seamlessly fuse RGB sensors into Lidar-based 3D recognition. Our approach takes a set of 2D detections to generate dense 3D virtual points to augment an otherwise sparse 3D point cloud. These virtual points naturally integrate into any standard Lidar-based 3D detectors along with regular Lidar measurements. The resulting multi-modal detector is simple and effective. Experimental results on the large-scale nuScenes dataset show that our framework improves a strong CenterPoint baseline by a significant 6.6 mAP, and outperforms competing fusion approaches. Code and more visualizations are available at this https URL.

Research paper: Yin, T., Zhou, X., and Krähenbühl, P., “Multimodal Virtual Point 3D Detection”, 2021. Link: