Learning Common and Transferable Feature Representations for Multi-Modal Data
Conference Paper
LiDAR sensors are crucial in automotive perception for accurate object detection. However, LiDAR data is hard for humans to interpret and consequently time-consuming to label, whereas camera data is easily interpretable and thus comparably simpler to label. In this work we present a transductive transfer learning approach to transfer knowledge for the object detection task from images to point cloud data. We propose a multi-modal adversarial autoencoder architecture which disentangles uni-modal features into two groups: common (transferable) features and complementary (modality-specific) features. This disentanglement is based on the hypothesis that a set of common features exists. An important point of our framework is that the disentanglement is learned in an unsupervised manner. Furthermore, the results show that only a small amount of multi-modal data is needed to learn the disentanglement, and thus to transfer the knowledge between modalities. Our experiments show that when training with 75% less data from the KITTI object data set, the classification accuracy achieved is 71.75%, only 3.12% less than when using the full data set. These findings can have great impact on perception pipelines based on LiDAR data.
Book title: 2020 IEEE Intelligent Vehicles Symposium (IV)
Pages / Article number:
Notes: Due to the Coronavirus (COVID-19), the conference was conducted virtually.
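The core idea in the abstract can be illustrated compactly: each modality is encoded into a latent code split into a common part and a modality-specific part, and the common codes of paired samples are pulled together. The sketch below is not the authors' implementation; the use of PyTorch, all layer sizes, the dummy feature dimensions, and the simple MSE alignment of the common codes (the paper instead trains adversarially) are assumptions made purely for illustration.

```python
# Minimal sketch (assumed details, not the paper's architecture) of a
# two-branch autoencoder whose latent code is split into "common"
# (transferable) and "complementary" (modality-specific) parts.
import torch
import torch.nn as nn

class ModalityAE(nn.Module):
    def __init__(self, in_dim, common_dim=32, specific_dim=32):
        super().__init__()
        # Encoder maps modality features to a concatenated latent code.
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, common_dim + specific_dim))
        # Decoder reconstructs the input from the full latent code.
        self.decoder = nn.Sequential(
            nn.Linear(common_dim + specific_dim, 128), nn.ReLU(),
            nn.Linear(128, in_dim))
        self.common_dim = common_dim

    def forward(self, x):
        z = self.encoder(x)
        # Split the latent code into common and modality-specific parts.
        z_common, z_specific = z[:, :self.common_dim], z[:, self.common_dim:]
        return self.decoder(z), z_common, z_specific

# One autoencoder per modality; input dimensions are placeholders for
# pre-extracted image and point-cloud feature vectors.
image_ae = ModalityAE(in_dim=512)
lidar_ae = ModalityAE(in_dim=256)

img_feat = torch.randn(8, 512)  # dummy paired image features
pc_feat = torch.randn(8, 256)   # dummy paired point-cloud features

img_rec, img_common, _ = image_ae(img_feat)
pc_rec, pc_common, _ = lidar_ae(pc_feat)

# Reconstruction preserves all information; aligning the common codes of
# paired samples encourages them to carry the transferable features.
# (An MSE alignment term stands in here for the paper's adversarial loss.)
mse = nn.MSELoss()
loss = mse(img_rec, img_feat) + mse(pc_rec, pc_feat) + mse(img_common, pc_common)
loss.backward()
print(float(loss))
```

With a disentanglement of this shape, a classifier trained on the common code of the labeled modality (images) can, in principle, be applied to the common code of the unlabeled one (point clouds), which matches the transfer setting described in the abstract.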