Michael Hersche
Publications 1 - 10 of 30
- Scalable Evaluation and Neural Models for Compositional Generalization
Item type: Conference Paper
Camposampiero, Giacomo; Barbiero, Pietro; Hersche, Michael; et al. (2025)
Compositional generalization—a key open challenge in modern machine learning—requires models to predict unknown combinations of known concepts. However, assessing compositional generalization remains a fundamental challenge due to the lack of standardized evaluation protocols and the limitations of current benchmarks, which often favor efficiency over rigor. At the same time, general-purpose vision architectures lack the necessary inductive biases, and existing approaches to endow them with such biases compromise scalability. As a remedy, this paper introduces: 1) a rigorous evaluation framework that unifies and extends previous approaches while reducing computational requirements from combinatorial to constant; 2) an extensive and modern evaluation of the status of compositional generalization in supervised vision backbones, training more than 5000 models; 3) Attribute Invariant Networks, a class of models establishing a new Pareto frontier in compositional generalization, achieving a 23.43% accuracy improvement over baselines while reducing parameter overhead from 600% to 16% compared to fully disentangled counterparts.
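The compositional-generalization setting described in this abstract (predicting unseen combinations of known concepts) can be illustrated with a toy attribute split. This is a minimal sketch, not the paper's evaluation framework; the attribute names are purely illustrative:

```python
# Sketch of a compositional-generalization split: every individual
# attribute value is seen in training, but certain *combinations*
# are held out for testing. Names are illustrative, not from the paper.
from itertools import product

shapes = ["circle", "square", "triangle"]
colors = ["red", "green", "blue"]

combos = set(product(shapes, colors))
# Hold out the "diagonal" pairings; each value still appears in training.
held_out = {(s, c) for s, c in zip(shapes, colors)}
train = combos - held_out

assert {s for s, _ in train} == set(shapes)  # all shapes seen in training
assert {c for _, c in train} == set(colors)  # all colors seen in training
assert not (held_out & train)                # held-out combinations unseen
```

A model that truly composes "triangle" with "blue" at test time generalizes compositionally; one that memorized pairings cannot.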
- Integrating Event-based Dynamic Vision Sensors with Sparse Hyperdimensional Computing: A Low-power Accelerator with Online Capability
Item type: Conference Paper
ISLPED '20: Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design
Hersche, Michael; Mello Rella, Edoardo; Di Mauro, Alfio; et al. (2020)
We propose to embed features extracted from event-driven dynamic vision sensors into binary sparse representations in hyperdimensional (HD) space for regression. This embedding compresses events generated across 346x260 differential pixels into a sparse 8160-bit vector by applying random activation functions. The sparse representation not only simplifies inference but also enables online learning with the same memory footprint. Specifically, it allows efficient updates by retaining binary vector components throughout online learning, which cannot be achieved with dense representations that demand multibit vector components. We demonstrate the online learning capability: using estimates and confidences of an initial model trained with only 25% of the data, our method continuously updates the model for the remaining 75% of the data, closely matching the accuracy obtained with an oracle model trained on ground-truth labels. When mapped onto an 8-core accelerator, our method also achieves lower error, latency, and energy than other sparse/dense alternatives. Furthermore, it is 9.84x more energy-efficient and 6.25x faster than an optimized 9-layer perceptron with comparable accuracy.

- Energy Efficient In-Memory Hyperdimensional Encoding for Spatio-Temporal Signal Processing
Item type: Journal Article
IEEE Transactions on Circuits and Systems II: Express Briefs
Karunaratne, Geethan; Le Gallo, Manuel; Hersche, Michael; et al. (2021)
The emerging brain-inspired computing paradigm known as hyperdimensional computing (HDC) has been proven to provide a lightweight learning framework for various cognitive tasks compared to the widely used deep learning-based approaches. Spatio-temporal (ST) signal processing, which encompasses biosignals such as electromyography (EMG) and electroencephalography (EEG), is one family of applications that could benefit from an HDC-based learning framework. At the core of HDC lie manipulations and comparisons of large bit patterns, which are inherently ill-suited to conventional computing platforms based on the von Neumann architecture. In this work, we propose an architecture for ST signal processing within the HDC framework using predominantly in-memory compute arrays. In particular, we introduce a methodology for the in-memory hyperdimensional encoding of ST data to be used together with an in-memory associative search module. We show that the in-memory HDC encoder for ST signals offers at least 1.80× energy efficiency gains, 3.36× area gains, as well as 9.74× throughput gains compared with a dedicated digital hardware implementation. At the same time, it achieves a peak classification accuracy within 0.04% of that of the baseline HDC framework.

- Can Large Reasoning Models do Analogical Reasoning under Perceptual Uncertainty?
Item type: Conference Paper
Proceedings of Machine Learning Research: Proceedings of The 19th International Conference on Neurosymbolic Learning and Reasoning
Camposampiero, Giacomo; Hersche, Michael; Wattenhofer, Roger; et al. (2025)
This work presents a first evaluation of two state-of-the-art Large Reasoning Models (LRMs), OpenAI's o3-mini and DeepSeek R1, on analogical reasoning, focusing on well-established nonverbal human IQ tests based on Raven's progressive matrices. We benchmark with the I-RAVEN dataset and its extension, I-RAVEN-X, which tests the ability to generalize to longer reasoning rules and wider ranges of attribute values. To assess the influence of visual uncertainty on these symbolic analogical reasoning tests, we extend the I-RAVEN-X dataset, which otherwise assumes an oracle perception. We adopt a two-fold strategy to simulate this imperfect visual perception: 1) we introduce confounding attributes which, being sampled at random, do not contribute to the prediction of the correct answer of the puzzles, and 2) we smooth the distributions of the input attributes' values. We observe a sharp decline in OpenAI's o3-mini task accuracy, dropping from 86.6% on the original I-RAVEN to just 17.0%—approaching random chance—on the more challenging I-RAVEN-X, which increases input length and range and emulates perceptual uncertainty. This drop occurs despite the model spending 3.4x more reasoning tokens. A similar trend is observed for DeepSeek R1: from 80.6% to 23.2%. In contrast, ARLC, a neuro-symbolic probabilistic abductive model that achieves state-of-the-art performance on I-RAVEN, reasons robustly under all of these out-of-distribution tests, with only a modest accuracy reduction from 98.6% to 88.0%. Our code is available at https://github.com/IBM/raven-large-language-models.

- 12 mJ per Class On-Device Online Few-Shot Class-Incremental Learning
Item type: Conference Paper
2024 Design, Automation & Test in Europe Conference & Exhibition (DATE)
Wibowo, Yoga Esa; Cioflan, Cristian; Ingolfsson, Thorir Mar; et al. (2024)
Few-Shot Class-Incremental Learning (FSCIL) enables machine learning systems to expand their inference capabilities to new classes using only a few labeled examples, without forgetting the previously learned classes. Classical backpropagation-based learning and its variants are often unsuitable for battery-powered, memory-constrained systems at the extreme edge. In this work, we introduce Online Few-Shot Class-Incremental Learning (O-FSCIL), based on a lightweight model consisting of a pretrained and metalearned feature extractor and an expandable explicit memory storing the class prototypes. The architecture is pretrained with a novel feature-orthogonality regularization and metalearned with a multi-margin loss. To learn a new class, our approach extends the explicit memory with novel class prototypes while the rest of the architecture is kept frozen. This allows learning previously unseen classes from only a few examples in a single pass (hence online). O-FSCIL obtains an average accuracy of 68.62% on the FSCIL CIFAR100 benchmark, achieving state-of-the-art results. Tailored for ultra-low-power platforms, we implement O-FSCIL on the 60 mW GAP9 microcontroller, demonstrating online learning within just 12 mJ per new class.

- Mixed-Precision Quantization and Parallel Implementation of Multispectral Riemannian Classification for Brain-Machine Interfaces
Item type: Conference Paper
2021 IEEE International Symposium on Circuits and Systems (ISCAS)
Wang, Xiaying; Schneider, Tibor; Hersche, Michael; et al. (2021)
With Motor-Imagery (MI) Brain-Machine Interfaces (BMIs), we may control machines by merely thinking of performing a motor action. Practical use cases require a wearable solution where the classification of the brain signals is done locally near the sensor using machine learning models embedded on energy-efficient microcontroller units (MCUs), for assured privacy, user comfort, and long-term usage. In this work, we provide practical insights on the accuracy-cost tradeoff for embedded BMI solutions. Our proposed Multispectral Riemannian Classifier reaches 75.1% accuracy on a 4-class MI task. We further scale down the model by quantizing it to mixed-precision representations with a minimal accuracy loss of 1%, which is still 3.2% more accurate than the state-of-the-art embedded convolutional neural network. We implement the model on a low-power MCU with parallel processing units, taking only 33.39 ms and consuming 1.304 mJ per classification.

- Physically-Constrained Adversarial Attacks on Brain-Machine Interfaces
Item type: Conference Paper
Workshop on Trustworthy and Socially Responsible Machine Learning (TSRML 2022) - NeurIPS 2022
Wang, Xiaying; Siller, Octavio R.Q.; Hersche, Michael; et al. (2022)
Deep learning (DL) has been widely employed in brain-machine interfaces (BMIs) to decode subjects' intentions from recorded brain activities, enabling direct interaction with machines. BMI systems play a crucial role in medical applications and have recently gained increasing interest as consumer-grade products. Failures in such systems might cause medical misdiagnoses, physical harm, and financial loss. Especially with the current market boost of such devices, it is of utmost importance to analyze and understand potential malicious attacks in depth, in order to develop countermeasures and avoid future damage. This work presents the first study that analyzes and models adversarial attacks based on physical-domain constraints in DL-based BMIs. Specifically, we assess the robustness of EEGNet, the current state-of-the-art network embedded in a real-world, wearable BMI. We propose new methods that incorporate domain-specific insights and constraints to design natural and imperceptible attacks and to realistically model signal propagation over the human scalp. Our results show that EEGNet is significantly vulnerable to adversarial attacks, with an attack success rate of more than 50%.

- In-memory factorization of holographic perceptual representations
Item type: Journal Article
Nature Nanotechnology
Langenegger, Jovin; Karunaratne, Geethan; Hersche, Michael; et al. (2023)
Disentangling the attributes of a sensory signal is central to sensory perception and cognition and hence is a critical task for future artificial intelligence systems. Here we present a compute engine capable of efficiently factorizing high-dimensional holographic representations of combinations of such attributes, by exploiting the computation-in-superposition capability of brain-inspired hyperdimensional computing and the intrinsic stochasticity associated with analogue in-memory computing based on nanoscale memristive devices. Such an iterative in-memory factorizer is shown to solve problems at least five orders of magnitude larger than otherwise possible, while substantially lowering the computational time and space complexity. We present a large-scale experimental demonstration of the factorizer by employing two in-memory compute chips based on phase-change memristive devices. The dominant matrix-vector multiplication operations take constant time, irrespective of the size of the matrix, thus reducing the computational time complexity to merely the number of iterations. Moreover, we experimentally demonstrate the ability to reliably and efficiently factorize visual perceptual representations.

- In-memory Realization of In-situ Few-shot Continual Learning with a Dynamically Evolving Explicit Memory
Item type: Conference Paper
ESSCIRC 2022 - IEEE 48th European Solid State Circuits Conference (ESSCIRC)
Karunaratne, Geethan; Hersche, Michael; Langenegger, Jovin; et al. (2022)
Continually learning new classes from few training examples without forgetting previously learned old classes demands a flexible architecture with an inevitably growing portion of storage, in which new examples and classes can be incrementally stored and efficiently retrieved. One viable architectural solution is to tightly couple a stationary deep neural network to a dynamically evolving explicit memory (EM). As the centerpiece of this architecture, we propose an EM unit that leverages energy-efficient in-memory compute (IMC) cores during the course of continual learning operations. We demonstrate for the first time how the EM unit can physically superpose multiple training examples, expand to accommodate unseen classes, and perform similarity search during inference, using operations on an IMC core based on phase-change memory (PCM). Specifically, the physical superposition of a few encoded training examples is realized via in-situ progressive crystallization of PCM devices. The classification accuracy achieved on the IMC core remains within 1.28%-2.5% of that of the state-of-the-art full-precision baseline software model on both the CIFAR-100 and miniImageNet datasets when continually learning 40 novel classes (from only five examples per class) on top of 60 old classes.

- Constrained Few-shot Class-incremental Learning
Item type: Conference Paper
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Hersche, Michael; Karunaratne, Geethan; Cherubini, Giovanni; et al. (2022)
Continually learning new classes from fresh data without forgetting previous knowledge of old classes is a very challenging research problem. Moreover, such learning must respect certain memory and computational constraints: (i) training samples are limited to only a few per class, (ii) the computational cost of learning a novel class remains constant, and (iii) the memory footprint of the model grows at most linearly with the number of classes observed. To meet the above constraints, we propose C-FSCIL, which is architecturally composed of a frozen meta-learned feature extractor, a trainable fixed-size fully connected layer, and a rewritable dynamically growing memory that stores as many vectors as the number of encountered classes. C-FSCIL provides three update modes that offer a trade-off between accuracy and the compute-memory cost of learning novel classes. C-FSCIL exploits hyperdimensional embedding, which allows continually expressing many more classes than the fixed dimensions in the vector space, with minimal interference. The quality of the class vector representations is further improved by aligning them quasi-orthogonally to each other by means of novel loss functions. Experiments on the CIFAR100, miniImageNet, and Omniglot datasets show that C-FSCIL outperforms the baselines with remarkable accuracy and compression. It also scales up to the largest problem size ever tried in this few-shot setting by learning 423 novel classes on top of 1200 base classes with less than a 1.6% accuracy drop. Our code is available at https://github.com/IBM/constrained-FSCIL
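The explicit-memory idea recurring in this abstract (and in the ESSCIRC and O-FSCIL entries above) can be sketched minimally: class prototypes stored as rows of a growing memory, classification by cosine similarity, and a new class learned by appending one averaged prototype. This is an illustrative simplification, not the C-FSCIL implementation; the random-direction "feature extractor" below is a stand-in for the frozen meta-learned network:

```python
# Minimal sketch of an explicit-memory few-shot classifier. In high
# dimensions, random vectors are quasi-orthogonal, so many class
# prototypes coexist with little interference. All names illustrative.
import numpy as np

rng = np.random.default_rng(0)
D = 512          # embedding dimensionality (assumed)
memory = []      # one unit-norm prototype per learned class


def learn_class(support_embeddings):
    """Add a class: average its few support embeddings into a prototype."""
    proto = np.mean(support_embeddings, axis=0)
    memory.append(proto / np.linalg.norm(proto))


def classify(query):
    """Return the index of the most similar stored prototype."""
    q = query / np.linalg.norm(query)
    sims = np.stack(memory) @ q  # cosine similarity to every prototype
    return int(np.argmax(sims))


# Stand-in for a frozen feature extractor: each class maps to a random
# (hence quasi-orthogonal) direction plus small per-sample noise.
class_dirs = rng.standard_normal((5, D))
for k in range(5):
    shots = class_dirs[k] + 0.1 * rng.standard_normal((3, D))  # 3-shot
    learn_class(shots)  # one pass per class; nothing else is retrained

query = class_dirs[2] + 0.1 * rng.standard_normal(D)
assert classify(query) == 2  # noisy sample still retrieves its class
```

Learning a new class touches only the appended row, which is why the per-class cost stays constant and old classes are never overwritten.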