Classification of Road Users and Roadside Infrastructure using Multi-Modal Sensor Input

Open access
Author
Date: 2021
Type: Doctoral Thesis
ETH Bibliography: yes
Abstract
Reliable environmental perception is crucial for autonomous vehicles to ensure safe maneuver planning and trajectory execution. The classification of road users and roadside infrastructure is an integral part of automotive perception, particularly in cluttered urban environments, where vulnerable road users such as pedestrians and cyclists are present and can quickly change their direction of movement. Accurate road user classification is therefore necessary to account for their actions in maneuver planning and thus ensure anticipatory, safe driving behavior.
A prerequisite for reliable object classification is adequate sensor modalities to perceive the environment. Hence, autonomous cars are equipped with a multi-modal sensor setup comprising, for example, Light Detection and Ranging (LiDAR), Radio Detection and Ranging (RaDAR), and RGB camera sensors. These modalities rely on different measurement principles, which are robust against different environmental influences. With this multi-modal setup, an autonomous vehicle captures complementary information and thus continues to sense the environment even if one modality is disturbed by environmental influences. However, capturing complementary features is only a prerequisite for reliable perception; in addition, a sensor fusion module is mandatory to overcome gradual sensor failures and total sensor losses.
Additionally, automotive perception requires object recognition algorithms to detect and classify objects in raw sensor data. Neural Networks (NNs) are suitable algorithms for this task because, first, they achieve state-of-the-art performance in object recognition and, second, they steadily improve with larger training data sets. However, NNs are also known for their limitations on data that does not follow the training data distribution and, even worse, for assigning high confidences to such out-of-distribution (OOD) samples. As a result, perception systems and fusion modules cannot distinguish correct from erroneous classifications, which poses risks for a safety-critical application like autonomous driving. Since it is infeasible to represent every environmental influence on sensors, and the resulting noise patterns in the data, within the training set, classifiers must detect these OOD samples and assign low confidences to them. Only then can a fusion module identify erroneous classifications caused by OOD data and ensure reliable perception.
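The thesis's own OOD detector relies on auxiliary training techniques and post hoc statistics; as a minimal, generic illustration of the underlying idea of confidence-based rejection, the sketch below thresholds the maximum softmax probability of a classifier's logits. The function names and the threshold value are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def classify_with_ood_rejection(logits, threshold=0.8):
    """Return (predicted class, confidence) per sample, replacing the
    prediction with -1 when the maximum softmax probability falls below
    the rejection threshold. The threshold of 0.8 is illustrative."""
    probs = softmax(np.asarray(logits, dtype=float))
    conf = probs.max(axis=-1)
    pred = probs.argmax(axis=-1)
    return np.where(conf >= threshold, pred, -1), conf
```

A downstream fusion module can then treat rejected samples (prediction -1, low confidence) as unreliable instead of trusting an overconfident but wrong class label.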
It is a common notion that a NN's performance improves with training data quantity. However, for LiDAR and RaDAR data, only a few and relatively small data sets exist compared to camera data, because point clouds are time-consuming, and thus expensive, to label. Therefore, strategies must be developed to ease the labeling effort and support the generation of labeled point cloud data sets, which in turn enables the development of NN-based object recognition algorithms. Within this thesis, a multi-modal object classification utilizing NNs is presented. This classification relies on encoders that are trained unsupervised with multi-modal data and additional regularization techniques to learn robust features. Uni-modal classifiers then use these features as input, in combination with a late fusion technique, to become robust against novel noise patterns. The late fusion module exploits complementary measurement principles to overcome gradual sensor failures and total sensor losses. Because the classification algorithms are supervised, this approach relies on labeled data for each input modality. Therefore, a transductive transfer learning approach is proposed to transfer knowledge from images to point cloud data, supporting supervised point cloud classifier development by providing additional training data. This method relies on unsupervised training procedures and thus transfers labels from images to point clouds without manual labeling effort. Furthermore, an approach for detecting OOD samples within classification modules is introduced, enabling the application of NNs by assigning low confidences to OOD data. The proposed method relies on auxiliary training techniques and post hoc statistics, which require neither additional data sets during training nor extra run-time costs during inference.
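To make the late-fusion idea concrete, the sketch below averages the class-probability distributions of the available uni-modal classifiers and simply skips a modality that has failed entirely, so the fused prediction degrades gracefully under total sensor loss. This is a generic averaging sketch under assumed inputs, not the thesis's exact fusion rule.

```python
import numpy as np

def late_fusion(modality_probs):
    """Fuse per-modality class-probability vectors by averaging the
    distributions of the available modalities. A failed modality
    (total sensor loss) is passed as None and skipped, so any subset
    of working sensors still yields a prediction."""
    available = [np.asarray(p, dtype=float) for p in modality_probs if p is not None]
    if not available:
        raise ValueError("no modality available")
    fused = np.mean(available, axis=0)
    return fused / fused.sum()  # renormalize to a valid distribution

# Hypothetical example: camera dropped out, LiDAR and RaDAR remain.
lidar = [0.7, 0.2, 0.1]
camera = None
radar = [0.5, 0.4, 0.1]
fused = late_fusion([lidar, camera, radar])
```

In the thesis, per-modality confidences (including the low confidences assigned to OOD inputs) would additionally weight such a combination, so a disturbed but not fully failed sensor contributes less.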
In summary, the main contribution of this thesis is the support of NN-based algorithms for reliable object classification in safety-critical applications like autonomous driving. Further, the presented unsupervised label transfer supports point cloud data set generation, which saves millions in labeling costs, since automotive qualification requires labeled data in the order of hundreds of thousands of kilometers for perception algorithms.
Permanent link
https://doi.org/10.3929/ethz-b-000493879
Publication status: published
External links
Search print copy at ETH Library
Contributors
Examiner: Siegwart, Roland Y.
Examiner: Frazzoli, Emilio
Examiner: Kochenderfer, Mykel J.
Publisher: ETH Zurich
Organisational unit
03737 - Siegwart, Roland Y. / Siegwart, Roland Y.