Hermann Blum



Last Name

Blum

First Name

Hermann

Search Results

Publications 1 - 10 of 41
  • Ji, Guangda; Weder, Silvan; Engelmann, Francis; et al. (2025)
    2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
    Neural network performance scales with both model size and data volume, as shown in both language and image processing. This requires scaling-friendly architectures and large datasets. While transformers have been adapted for 3D vision, a 'GPT moment' remains elusive due to limited training data. We introduce ARKit LabelMaker, a large-scale real-world 3D dataset with dense semantic annotation that is more than three times larger than the previously largest dataset. Specifically, we extend ARKitScenes with automatically generated dense 3D labels using an extended LabelMaker pipeline tailored for large-scale pre-training. Training on our dataset improves accuracy across architectures, achieving state-of-the-art 3D semantic segmentation scores on ScanNet and ScanNet200, with notable gains on tail classes. Our code is available at https://labelmaker.org and our dataset at https://huggingface.co/datasets/labelmaker/arkit_labelmaker.
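    For reference, one possible way to fetch the dataset above is a plain Hub snapshot download. The repository id comes from the abstract; the local directory and the assumption that the data is hosted as a standard dataset repo are illustrative.

      # Sketch: download the ARKit LabelMaker data as a Hub snapshot. Assumes the dataset
      # is hosted as a regular dataset repo; check the dataset card for the actual file
      # layout before writing any loading code. The local_dir is an arbitrary choice.
      from huggingface_hub import snapshot_download

      local_path = snapshot_download(
          repo_id="labelmaker/arkit_labelmaker",
          repo_type="dataset",                      # dataset repo, not a model repo
          local_dir="data/arkit_labelmaker",        # illustrative target directory
      )
      print("Dataset downloaded to:", local_path)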
  • Blum, Hermann; Gawel, Abel Roman; Siegwart, Roland; et al. (2018)
    2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
    Sensor fusion is a fundamental process in robotic systems as it extends the perceptual range and increases robustness in real-world operations. Current multi-sensor deep learning based semantic segmentation approaches either do not provide robustness to under-performing classes in one modality, or require a specific architecture with access to the full aligned multi-sensor training data. In this work, we analyze statistical fusion approaches for semantic segmentation that overcome these drawbacks while keeping competitive performance. The studied approaches are modular by construction, allowing different training sets per modality, with only a much smaller subset needed to calibrate the statistical models. We evaluate a range of statistical fusion approaches and report their performance against state-of-the-art baselines on both real-world and simulated data. In our experiments, the fused approach improves IoU over the best single-modality segmentation results by up to 5%. We make all implementations and configurations publicly available.
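    As a rough illustration of what a modular statistical fusion rule can look like, the sketch below combines calibrated per-modality class posteriors in a naive-Bayes fashion. The paper evaluates several such approaches; the function names and the two-modality setup here are assumptions.

      # Sketch: naive-Bayes-style fusion of per-pixel class posteriors from two modalities.
      # probs_rgb and probs_depth are (H, W, C) softmax outputs; prior is a length-C class prior.
      # This is one illustrative statistical fusion rule, not necessarily the paper's exact model.
      import numpy as np

      def fuse_posteriors(probs_rgb, probs_depth, prior):
          eps = 1e-8
          # In log space: log p(c | x1, x2) is proportional to log p(c | x1) + log p(c | x2) - log p(c)
          log_fused = (np.log(probs_rgb + eps)
                       + np.log(probs_depth + eps)
                       - np.log(prior[None, None, :] + eps))
          log_fused -= log_fused.max(axis=-1, keepdims=True)   # numerical stability
          fused = np.exp(log_fused)
          fused /= fused.sum(axis=-1, keepdims=True)           # renormalize per pixel
          return fused                                         # (H, W, C)

      # Usage: labels = fuse_posteriors(probs_rgb, probs_depth, prior).argmax(axis=-1)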
  • Blum, Hermann; Milano, Francesco; Zurbrügg, René; et al. (2021)
    5th Annual Conference on Robot Learning (CoRL 2021)
    We propose a novel robotic system that can improve its semantic perception during deployment. Contrary to the established approach of learning semantics from large datasets and deploying fixed models, we propose a framework in which semantic models are continuously updated on the robot to adapt to the deployment environments. Our system therefore tightly couples multi-sensor perception and localisation to continuously learn from self-supervised pseudo labels. We study this system in the context of a construction robot registering LiDAR scans of cluttered environments against building models. Our experiments show how the robot's semantic perception improves during deployment and how this translates into improved 3D localisation by filtering the clutter out of the LiDAR scan, even across drastically different environments. We further study the risk of catastrophic forgetting that such a continuous learning setting poses. We find memory replay an effective measure to reduce forgetting and show how the robotic system can improve even when switching between different environments. On average, our system improves by 60% in segmentation and 10% in localisation compared to deployment of a fixed model, and it maintains this improvement while adapting to further environments.
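    The memory-replay idea mentioned above can be sketched with a small reservoir-sampled buffer that mixes stored samples from earlier environments into every fine-tuning batch; buffer size, batch composition, and the (scan, pseudo-label) sample format are illustrative assumptions rather than the paper's exact setup.

      # Sketch: a reservoir-sampled replay buffer for continual fine-tuning. New pseudo-labelled
      # samples from the current environment are mixed with stored samples from earlier
      # environments in every batch; the sample format and ratios are illustrative.
      import random

      class ReplayBuffer:
          def __init__(self, capacity=1000):
              self.capacity = capacity
              self.samples = []        # e.g. (scan, pseudo_label) pairs
              self.seen = 0

          def add(self, sample):
              # Reservoir sampling keeps a uniform subset of everything seen so far.
              self.seen += 1
              if len(self.samples) < self.capacity:
                  self.samples.append(sample)
              else:
                  j = random.randrange(self.seen)
                  if j < self.capacity:
                      self.samples[j] = sample

          def sample(self, k):
              return random.sample(self.samples, min(k, len(self.samples)))

      def make_batch(buffer, new_samples, batch_size=8, replay_fraction=0.5):
          n_replay = int(batch_size * replay_fraction)
          n_new = batch_size - n_replay
          return (random.sample(new_samples, min(n_new, len(new_samples)))
                  + buffer.sample(n_replay))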
  • Müller, Marcus G.; Durner, Maximilian; Boerdijk, Wout; et al. (2023)
    2023 IEEE Aerospace Conference
    Terrain segmentation information is a crucial input for current and future planetary robotic missions. Labeling training data for terrain segmentation is a difficult task and can often cause semantic ambiguity. As a result, a large portion of an image usually remains unlabeled, which makes it difficult to evaluate network performance on such regions. Worse is the problem of using such a network for inference, since the quality of predictions cannot be guaranteed if it is trained as a standard semantic segmentation network. This can be very dangerous for real autonomous robotic missions, since the network could predict any of the classes in a particular region and the robot does not know how much of the prediction to trust. To overcome this issue, we investigate the benefits of uncertainty estimation for terrain segmentation. Knowing how certain the network is about its prediction is an important element of robust autonomous navigation. In this paper, we present neural networks that provide not only a terrain segmentation prediction but also an uncertainty estimate. We compare the different methods on the publicly released real-world Mars data from the MSL mission.
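    One common way to obtain the kind of per-pixel uncertainty discussed above is Monte-Carlo dropout. The sketch below is a generic version of that baseline, assuming a PyTorch segmentation model with dropout layers; it is not claimed to be one of the networks compared in the paper.

      # Sketch: Monte-Carlo dropout, a common way to attach per-pixel uncertainty to a
      # segmentation network. Assumes a PyTorch model containing nn.Dropout layers.
      import torch

      @torch.no_grad()
      def mc_dropout_uncertainty(model, image, n_passes=10):
          model.eval()
          for m in model.modules():                 # keep only the dropout layers stochastic
              if isinstance(m, torch.nn.Dropout):
                  m.train()
          probs = torch.stack([torch.softmax(model(image), dim=1) for _ in range(n_passes)])
          mean_probs = probs.mean(dim=0)            # (B, C, H, W)
          entropy = -(mean_probs * (mean_probs + 1e-8).log()).sum(dim=1)   # (B, H, W)
          return mean_probs.argmax(dim=1), entropy  # prediction and per-pixel uncertainty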
  • Di Biase, Giancarlo; Blum, Hermann; Siegwart, Roland; et al. (2021)
    2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
    The inability of state-of-the-art semantic segmentation methods to detect anomaly instances hinders them from being deployed in safety-critical and complex applications, such as autonomous driving. Recent approaches have focused on either leveraging segmentation uncertainty to identify anomalous areas or re-synthesizing the image from the semantic label map to find dissimilarities with the input image. In this work, we demonstrate that these two methodologies contain complementary information and can be combined to produce robust predictions for anomaly segmentation. We present a pixel-wise anomaly detection framework that uses uncertainty maps to improve over existing re-synthesis methods in finding dissimilarities between the input and generated images. Our approach works as a general framework around already trained segmentation networks, which ensures anomaly detection without compromising segmentation accuracy, while significantly outperforming all similar methods. Top-2 performance across a range of different anomaly datasets shows the robustness of our approach in handling different anomaly instances.
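    A minimal sketch of fusing the two signals discussed above, a segmentation uncertainty map and an input-vs-resynthesis dissimilarity map, into one anomaly score. The paper learns this combination; the fixed weighting below is only an illustrative stand-in.

      # Sketch: combine a per-pixel uncertainty map with an input-vs-resynthesis dissimilarity
      # map into a single anomaly score. The fixed weighting is an illustrative stand-in for
      # the learned fusion described in the abstract.
      import numpy as np

      def anomaly_score(uncertainty, dissimilarity, weight=0.5):
          def normalize(x):
              x = x.astype(np.float64)
              return (x - x.min()) / (x.max() - x.min() + 1e-8)
          u = normalize(uncertainty)    # e.g. softmax entropy of the segmentation network
          d = normalize(dissimilarity)  # e.g. perceptual distance to the resynthesized image
          return weight * u + (1.0 - weight) * d    # higher = more likely anomalous

      # Usage: anomaly_mask = anomaly_score(entropy_map, perceptual_diff) > 0.7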
  • Behrens, Tjark; Zurbrügg, René; Pollefeys, Marc; et al. (2025)
    IEEE Robotics and Automation Letters
    Recent approaches have successfully focused on the segmentation of static reconstructions, thereby equipping downstream applications with semantic 3D understanding. However, the world in which we live is dynamic, characterized by numerous interactions between the environment and humans or robotic agents. Static semantic maps are unable to capture this information, and the naive solution of rescanning the environment after every change is both costly and ineffective at tracking, e.g., objects being stored away in drawers. With Lost & Found we present an approach that addresses this limitation. Based solely on egocentric recordings with corresponding hand position and camera pose estimates, we are able to track the 6DoF poses of the moving object within the detected interaction interval. These changes are applied online to a transformable scene graph that captures object-level relations. Compared to state-of-the-art object pose trackers, our approach is more reliable in handling the challenging egocentric viewpoint and the lack of depth information. It outperforms the second-best approach by 34% and 56% for translational and orientational error, respectively, and produces visibly smoother 6DoF object trajectories. In addition, we illustrate how the acquired interaction information in the dynamic scene graph can be employed in robotic applications that would otherwise be infeasible: we show how our method can be used to command a mobile manipulator through teach & repeat, and how information about prior interactions allows a mobile manipulator to retrieve an object hidden in a drawer.
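    To make the transformable scene-graph idea concrete, the sketch below shows a minimal object-level graph whose nodes carry 6DoF poses and relations that are updated when an interaction is detected; the class names and fields are illustrative assumptions rather than the paper's data structure.

      # Sketch: a minimal object-level scene graph with 6DoF poses and relations that are
      # updated when an interaction is detected. Class names and fields are illustrative.
      from dataclasses import dataclass, field
      import numpy as np

      @dataclass
      class ObjectNode:
          name: str
          pose: np.ndarray                               # 4x4 transform in the world frame
          relations: dict = field(default_factory=dict)  # e.g. {"inside": "drawer_2"}

      class SceneGraph:
          def __init__(self):
              self.nodes = {}

          def add(self, node):
              self.nodes[node.name] = node

          def apply_interaction(self, name, new_pose, relation=None):
              node = self.nodes[name]
              node.pose = new_pose        # pose from the tracked interaction interval
              if relation is not None:
                  node.relations.update(relation)

      # Usage: graph.apply_interaction("mug", pose_after_pickup, {"inside": "drawer_2"})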
  • Sun, Boyang; Xing, Jiaxu; Blum, Hermann; et al. (2022)
    2022 International Conference on Robotics and Automation (ICRA)
    Autonomous robots deal with unexpected scenarios in real environments. Given input images, various visual perception tasks can be performed, e.g., semantic segmentation, depth estimation and normal estimation. These different tasks provide rich information for the whole robotic perception system. All tasks have their own characteristics while sharing some latent correlations. However, some of the task predictions may become unreliable when dealing with complex scenes and anomalies. We propose an attention-based failure detection approach that exploits the correlations among multiple tasks. The proposed framework infers task failures by evaluating the individual predictions across multiple visual perception tasks for different regions in an image. The formulation of the evaluations is based on an attention network supervised by multi-task uncertainty estimation and the corresponding prediction errors. Our proposed framework generates more accurate estimations of the prediction error for the different tasks' predictions.
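    As an illustration of the attention-based formulation, the toy module below re-weights per-task evidence maps with per-pixel attention and regresses a failure score. Layer sizes and the exact inputs are assumptions; the real framework is supervised with multi-task uncertainties and prediction errors.

      # Sketch: a toy attention head that re-weights per-task evidence maps (e.g. per-task
      # uncertainty maps) per pixel and regresses a failure score. Layer sizes are illustrative.
      import torch
      import torch.nn as nn

      class TaskAttentionFailureHead(nn.Module):
          def __init__(self, n_tasks, hidden=16):
              super().__init__()
              self.attn = nn.Conv2d(n_tasks, n_tasks, kernel_size=1)   # attention logits over tasks
              self.score = nn.Sequential(
                  nn.Conv2d(n_tasks, hidden, kernel_size=3, padding=1),
                  nn.ReLU(),
                  nn.Conv2d(hidden, 1, kernel_size=1),
              )

          def forward(self, task_maps):                  # (B, n_tasks, H, W)
              weights = torch.softmax(self.attn(task_maps), dim=1)
              attended = weights * task_maps             # per-pixel re-weighting of task evidence
              return torch.sigmoid(self.score(attended)) # (B, 1, H, W) failure probability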
  • Global Localization in Meshes
    Item type: Conference Paper
    Dreher, Marc; Blum, Hermann; Siegwart, Roland; et al. (2021)
    Proceedings of the 38th International Symposium on Automation and Robotics in Construction (ISARC)
    Safely waking up a robot at an unknown location and subsequent autonomous operation are key requirements for on-site construction robots. In this regard, single-shot global localization in a known map is a challenging problem due to incomplete observations of the environment and sensor obstructions by unmapped clutter. In this work, we address global localization of sparse multi-beam LiDAR measurements in a 3D mesh building model, a typical setup for construction robots. Our solution extracts and summarizes planes from the LiDAR scan and matches them to the building mesh. We compare different options for the registration problem and evaluate the system on simulated and real-world datasets. The best-performing system uses a combination of the Randomized Hough Transform (RHT) and a modified version of the Plane Registration based on a Unit Sphere (PRRUS) algorithm. For sparse and noisy robotic sensors, our system outperforms contemporary systems like Go-ICP by a large margin.
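    A rough sketch of the plane-extraction-and-matching idea: dominant planes are segmented from the scan and paired with mesh planes by normal direction. The paper uses a Randomized Hough Transform and a modified PRRUS registration; plain RANSAC via Open3D stands in here purely for illustration.

      # Sketch: extract dominant planes from a LiDAR scan and pair them with mesh planes by
      # normal direction. Plain RANSAC via Open3D stands in for the RHT used in the paper.
      import numpy as np
      import open3d as o3d

      def extract_plane_normals(pcd, n_planes=4, dist=0.05):
          normals, rest = [], pcd
          for _ in range(n_planes):
              if len(rest.points) < 100:
                  break
              model, inliers = rest.segment_plane(distance_threshold=dist,
                                                  ransac_n=3, num_iterations=1000)
              normals.append(np.asarray(model[:3]) / np.linalg.norm(model[:3]))
              rest = rest.select_by_index(inliers, invert=True)   # remove inliers, repeat
          return normals

      def match_planes(scan_normals, mesh_normals):
          # Greedy match: each scan plane is paired with the mesh plane of closest normal.
          return [int(np.argmax([abs(np.dot(s, m)) for m in mesh_normals]))
                  for s in scan_normals]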
  • Blum, Hermann; Sarlin, Paul-Edouard; Nieto, Juan; et al. (2021)
    International Journal of Computer Vision
    Deep learning has enabled impressive progress in the accuracy of semantic segmentation. Yet, the ability to estimate uncertainty and detect failure is key for safety-critical applications like autonomous driving. Existing uncertainty estimates have mostly been evaluated on simple tasks, and it is unclear whether these methods generalize to more complex scenarios. We present Fishyscapes, the first public benchmark for anomaly detection in a real-world task of semantic segmentation for urban driving. It evaluates pixel-wise uncertainty estimates towards the detection of anomalous objects. We adapt state-of-the-art methods to recent semantic segmentation models and compare uncertainty estimation approaches based on softmax confidence, Bayesian learning, density estimation, image resynthesis, as well as supervised anomaly detection methods. Our results show that anomaly detection is far from solved even for ordinary situations, while our benchmark allows measuring advancements beyond the state-of-the-art. Results, data and submission information can be found at https://fishyscapes.com/.
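    For context, the simplest of the compared baselines, softmax confidence, can be written in a few lines: the anomaly score of a pixel is one minus its maximum softmax probability. The sketch below is this generic baseline, not the benchmark's reference implementation.

      # Sketch: the generic softmax-confidence baseline, where the anomaly score of a pixel
      # is one minus its maximum softmax probability.
      import numpy as np

      def max_softmax_anomaly(logits):
          # logits: (H, W, C) raw network outputs for one image
          logits = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
          probs = np.exp(logits)
          probs /= probs.sum(axis=-1, keepdims=True)
          return 1.0 - probs.max(axis=-1)                        # (H, W): higher = more anomalous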
  • Marchal, Nicolas; Moraldo, Charlotte; Siegwart, Roland; et al. (2019)
    arXiv
    Deep learning has enabled remarkable advances in scene understanding, particularly in semantic segmentation tasks. Yet, current state-of-the-art approaches are limited to a closed set of classes and fail when facing novel elements, also known as out-of-distribution (OoD) data. This is a problem as autonomous agents will inevitably come across a wide range of objects, all of which cannot be included during training. We propose a novel method to distinguish any object (foreground) from empty building structure (background) in indoor environments. We use a normalizing flow to estimate the probability distribution of high-dimensional background descriptors. Foreground objects are therefore detected as areas in an image for which the descriptors are unlikely given the background distribution. As our method does not explicitly learn the representation of individual objects, its performance generalizes well outside of the training examples. Our model results in an innovative solution to reliably segment foreground from background in indoor scenes, which opens the way to a safer deployment of robots in human environments.
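    The density-based detection described above can be sketched as follows; a multivariate Gaussian stands in for the normalizing flow so the example stays short, but the scoring and thresholding logic is the same in spirit. Function names and the feature-map shape are assumptions.

      # Sketch: score pixel descriptors by how unlikely they are under a background density
      # model. A multivariate Gaussian stands in for the normalizing flow used in the paper.
      import numpy as np

      def fit_background_density(descriptors):            # (N, D) background descriptors
          mean = descriptors.mean(axis=0)
          cov = np.cov(descriptors, rowvar=False) + 1e-6 * np.eye(descriptors.shape[1])
          return mean, np.linalg.inv(cov)

      def foreground_score(feature_map, mean, cov_inv):   # feature_map: (H, W, D)
          diff = feature_map - mean
          # Squared Mahalanobis distance per pixel; large = unlikely background = foreground.
          return np.einsum("hwd,de,hwe->hw", diff, cov_inv, diff)

      # Usage: mask = foreground_score(features, *fit_background_density(bg_descs)) > threshold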