Jakob Heiss
Publications 1–10 of 11
- Inductive Bias of Neural Networks and Selected Applications
  Item type: Doctoral Thesis | Heiss, Jakob (2024)
  This thesis is concerned with the inductive bias of (deep) neural networks (NNs) NN_θ : X → Y. For commonly used loss functionals L, a large set of functions f : X → Y minimizes L equally well. It is therefore up to the learning method (e.g., an NN architecture with certain hyperparameters, including a certain training algorithm) to choose one function among all functions with sufficiently small loss. The preference underlying this choice is an inductive bias. Our main theorem derives a regularization functional P on function space that exactly mimics the preferences of ℓ2-regularized deep ReLU NNs with sufficiently many neurons per layer. Interestingly, we can prove that this preference structure over functions cannot be mimicked by any shallow Gaussian process (GP). This result contrasts with prominent results stating that other large-width limits of NNs are equivalent to GPs [Jacot et al., 2018; Neal, 1996]. We analyze the differences between "deep inductive biases", such as those we have characterized for deep ℓ2-regularized ReLU NNs (with various hyperparameters), and "shallow inductive biases", such as those of GPs (including certain large-width limits of deep NNs). Both in the context of multi-task learning and in the context of uncertainty quantification, we can pinpoint these differences very precisely. From our main theory, we derived a lossless compression algorithm that provably reduces the number of neurons of an ℓ2-regularized ReLU NN without changing the function represented by the NN. In our numerical experiments, we reduce the size of trained NNs by a factor of up to 100 almost without changing their predictions. We also extend the theory of an NN-based learning method (PD-NJ-ODE), which can forecast irregularly, incompletely, and noisily observed time series, and discuss its inductive bias. Further, we developed multiple NN-based improvements for combinatorial auctions. To this end, we introduced an NN architecture with an inductive bias tailored to monotonic value functions. Additionally, we introduced an uncertainty quantification method for NNs and used it for exploration in the spirit of Bayesian optimization. Moreover, we developed a new training algorithm that can handle loss functionals enforcing global linear inequalities.
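One concrete mechanism behind the lossless compression mentioned in the thesis entry above is that two ReLU neurons whose incoming weights are positively proportional compute proportional outputs (ReLU(c·z) = c·ReLU(z) for c > 0) and can therefore be folded into one without changing the represented function. The following is a minimal NumPy sketch of that merging step; the variable names and the exact proportionality test are my own illustrative choices, not the thesis's full algorithm.

```python
import numpy as np

def merge_proportional_neurons(W, b, a, tol=1e-9):
    """Merge hidden ReLU neurons whose incoming weights (rows of W,
    entries of b) are positively proportional. Uses the identity
    ReLU(c*z) = c*ReLU(z) for c > 0, so the represented function is
    unchanged. A simplified sketch, not the thesis's full algorithm."""
    keep_W, keep_b, keep_a = [], [], []
    for i in range(W.shape[0]):
        vi = np.append(W[i], b[i])
        merged = False
        for j in range(len(keep_W)):
            vj = np.append(keep_W[j], keep_b[j])
            # positively proportional incoming weights?
            c = np.linalg.norm(vi) / (np.linalg.norm(vj) + tol)
            if c > 0 and np.allclose(vi, c * vj, atol=tol):
                keep_a[j] += c * a[i]  # fold neuron i into neuron j
                merged = True
                break
        if not merged:
            keep_W.append(W[i]); keep_b.append(b[i]); keep_a.append(a[i])
    return np.array(keep_W), np.array(keep_b), np.array(keep_a)

# two proportional neurons collapse into one, predictions unchanged
W = np.array([[1.0, 2.0], [2.0, 4.0], [0.5, -1.0]])
b = np.array([0.5, 1.0, 0.0])
a = np.array([1.0, 3.0, -2.0])
x = np.random.randn(5, 2)
f = lambda W, b, a: np.maximum(x @ W.T + b, 0) @ a
W2, b2, a2 = merge_proportional_neurons(W, b, a)
assert np.allclose(f(W, b, a), f(W2, b2, a2))
```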
- Infinite width (finite depth) neural networks benefit from multi-task learning unlike shallow Gaussian Processes - an exact quantitative macroscopic characterization
  Item type: Working Paper | Heiss, Jakob; Teichmann, Josef; Wutte, Hanna (2021)
  We prove in this paper that optimizing wide ReLU neural networks (NNs) with at least one hidden layer using ℓ2-regularization on the parameters enforces multi-task learning due to representation learning, even in the limit of infinite width. This contrasts with multiple other results in the literature, in which idealized settings are assumed and wide (ReLU-)NNs lose their ability to benefit from multi-task learning in the infinite-width limit. We deduce the ability of multi-task learning by proving an exact quantitative macroscopic characterization of the learned NN in an appropriate function space.
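To make the setting of the paper above concrete, here is a minimal sketch of the kind of network it studies: one shared ReLU hidden layer feeding one output head per task, trained with ℓ2-regularization on all parameters (weight decay). All sizes, data, and training details are illustrative assumptions; the paper's contribution is the theory of the infinite-width limit, not this training loop.

```python
import torch
import torch.nn as nn

# A minimal multi-task ReLU network: one shared hidden layer (the
# learned representation) and one output head per task, trained with
# l2-regularization (weight decay) on all parameters.
class MultiTaskReLU(nn.Module):
    def __init__(self, d_in=1, width=512, n_tasks=2):
        super().__init__()
        self.hidden = nn.Linear(d_in, width)  # shared representation
        self.heads = nn.ModuleList(
            [nn.Linear(width, 1) for _ in range(n_tasks)])

    def forward(self, x):
        h = torch.relu(self.hidden(x))
        return [head(h) for head in self.heads]

x = torch.linspace(-1, 1, 32).unsqueeze(1)
y = [torch.sin(3 * x), torch.sin(3 * x) + 0.1 * x]  # two related tasks
model = MultiTaskReLU()
opt = torch.optim.Adam(model.parameters(), lr=1e-2, weight_decay=1e-4)
for _ in range(500):
    opt.zero_grad()
    preds = model(x)
    loss = sum(((p - t) ** 2).mean() for p, t in zip(preds, y))
    loss.backward()
    opt.step()
```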
- How Implicit Regularization of ReLU Neural Networks Characterizes the Learned Function
  Item type: Working Paper | arXiv | Heiss, Jakob; Teichmann, Josef; Wutte, Hanna (2020)
  Today, various forms of neural networks are trained to perform approximation tasks in many fields. However, the estimates obtained are not fully understood on function space. Empirical results suggest that typical training algorithms favor regularized solutions. These observations motivate us to analyze the properties of neural networks found by gradient descent initialized close to zero, a procedure frequently employed to perform the training task. As a starting point, we consider one-dimensional (shallow) ReLU neural networks in which the weights are chosen randomly and only the terminal layer is trained. First, we rigorously show that for such networks, ridge-regularized regression corresponds in function space to regularizing the estimate's second derivative, for fairly general loss functionals. For least-squares regression, we show that the trained network converges to the smooth spline interpolation of the training data as the number of hidden nodes tends to infinity. Moreover, we derive a correspondence between early-stopped gradient descent and smoothing spline regression. Our analysis may give valuable insight into the properties of the solutions obtained using gradient descent methods in general settings.
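The random-feature setting of the paper above is easy to reproduce: hidden weights and biases are drawn at random and only the terminal layer is fitted by ridge regression. Below is a sketch under assumed sizes and a hypothetical ridge parameter; per the paper's result, in function space this penalty acts like penalizing the estimate's second derivative, so the fit becomes spline-like as the width grows.

```python
import numpy as np

# Random-feature shallow ReLU network: first-layer weights are fixed
# at random, only the terminal layer is trained by ridge regression.
rng = np.random.default_rng(0)
n_hidden, lam = 2000, 1e-4
x_train = np.linspace(-1, 1, 20)
y_train = np.sin(2 * np.pi * x_train)

w = rng.standard_normal(n_hidden)                  # random hidden weights
b = rng.uniform(-1, 1, n_hidden)                   # random biases
Phi = np.maximum(np.outer(x_train, w) + b, 0.0)    # ReLU feature matrix

# ridge-regularized least squares on the terminal layer only
a = np.linalg.solve(Phi.T @ Phi + lam * np.eye(n_hidden), Phi.T @ y_train)

x_test = np.linspace(-1, 1, 200)
y_hat = np.maximum(np.outer(x_test, w) + b, 0.0) @ a  # smooth, spline-like fit
```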
- Extending Path-Dependent NJ-ODEs to Noisy Observations and a Dependent Observation Framework
  Item type: Journal Article | Transactions on Machine Learning Research | Andersson, William; Heiss, Jakob; Krach, Florian; et al. (2024)
  The Path-Dependent Neural Jump Ordinary Differential Equation (PD-NJ-ODE) is a model for predicting continuous-time stochastic processes with irregular and incomplete observations. In particular, the method learns optimal forecasts given irregularly sampled time series of incomplete past observations. So far, the process itself and the coordinate-wise observation times were assumed to be independent, and observations were assumed to be noiseless. In this work, we discuss two extensions that lift these restrictions and provide theoretical guarantees as well as empirical examples for them. In particular, we can lift the independence assumption by extending the theory to the much more realistic setting of conditional independence, without any need to change the algorithm. Moreover, we introduce a new loss function that allows us to deal with noisy observations, and we explain why the previously used loss function did not lead to a consistent estimator.
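For orientation, here is a heavily simplified skeleton of the neural-jump-ODE idea referenced above: a latent state evolves through a neural ODE between observation times (crude Euler steps here) and is updated by a jump network when an observation arrives. The actual PD-NJ-ODE, its readout, and its (new) loss function are considerably more involved; all names and sizes below are assumptions.

```python
import torch
import torch.nn as nn

# Schematic skeleton only: latent state driven by a neural ODE between
# observations, with a jump update at each arrival. Not the actual
# PD-NJ-ODE model or training objective.
class JumpODESketch(nn.Module):
    def __init__(self, d_obs=1, d_latent=16):
        super().__init__()
        self.d_latent = d_latent
        self.drift = nn.Sequential(nn.Linear(d_latent, 32), nn.Tanh(),
                                   nn.Linear(32, d_latent))
        self.jump = nn.Linear(d_latent + d_obs, d_latent)
        self.readout = nn.Linear(d_latent, d_obs)

    def forward(self, times, obs, n_euler=10):
        h, t_prev, preds = torch.zeros(self.d_latent), 0.0, []
        for t, x in zip(times, obs):
            dt = (t - t_prev) / n_euler
            for _ in range(n_euler):          # Euler steps between observations
                h = h + dt * self.drift(h)
            preds.append(self.readout(h))     # forecast just before the jump
            h = self.jump(torch.cat([h, x]))  # jump at the new observation
            t_prev = t
        return torch.stack(preds)

model = JumpODESketch()
times = [0.3, 0.7, 1.4]                       # irregular observation times
obs = [torch.tensor([0.1]), torch.tensor([0.4]), torch.tensor([0.2])]
forecasts = model(times, obs)
```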
- Monotone-Value Neural Networks: Exploiting Preference Monotonicity in Combinatorial Assignment
  Item type: Conference Paper | Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence (IJCAI-22) | Weissteiner, Jakob; Heiss, Jakob; Siems, Julien; et al. (2022)
  Many important resource allocation problems involve the combinatorial assignment of items, e.g., auctions or course allocation. Because the bundle space grows exponentially in the number of items, preference elicitation is a key challenge in these domains. Recently, researchers have proposed ML-based mechanisms that outperform traditional mechanisms while reducing preference elicitation costs for agents. However, one major shortcoming of the ML algorithms that were used is their disregard of important prior knowledge about agents' preferences. To address this, we introduce monotone-value neural networks (MVNNs), which are designed to capture combinatorial valuations, while enforcing monotonicity and normality. On a technical level, we prove that our MVNNs are universal in the class of monotone and normalized value functions, and we provide a mixed-integer linear program (MILP) formulation to make solving MVNN-based winner determination problems (WDPs) practically feasible. We evaluate our MVNNs experimentally in spectrum auction domains. Our results show that MVNNs improve the prediction performance, they yield state-of-the-art allocative efficiency in the auction, and they also reduce the run-time of the WDPs. Our code is available on GitHub: https://github.com/marketdesignresearch/MVNN.
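The architectural trick MVNNs exploit can be illustrated in a few lines: non-negative linear weights composed with increasing activations yield a network that is monotone in its inputs, so adding items to a bundle can never decrease its predicted value. The sketch below is only in the spirit of MVNNs; the actual model uses a specific bounded activation and a normalization (see the linked repository), which are omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Monotonicity by construction: clamping linear weights to be
# non-negative and using an increasing activation (ReLU) makes the
# network nondecreasing in every input coordinate.
class MonotoneNet(nn.Module):
    def __init__(self, n_items=5, width=32):
        super().__init__()
        self.l1 = nn.Linear(n_items, width)
        self.l2 = nn.Linear(width, 1)

    def forward(self, bundle):
        h = torch.relu(F.linear(bundle, self.l1.weight.clamp(min=0),
                                self.l1.bias))
        return F.linear(h, self.l2.weight.clamp(min=0), self.l2.bias)

net = MonotoneNet()
empty, full = torch.zeros(5), torch.ones(5)
assert net(full) >= net(empty)  # adding items never decreases the value
```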
- Nonparametric filtering, estimation and classification using neural jump ODEs
  Item type: Journal Article | Statistics & Risk Modeling | Heiss, Jakob; Krach, Florian; Schmidt, Thorsten; et al. (2025)
  Neural Jump ODEs model the conditional expectation between observations by neural ODEs and jump at the arrival of new observations. They have demonstrated effectiveness for fully data-driven online forecasting in settings with irregular and partial observations, operating under weak regularity assumptions. This work extends the framework to input-output systems, enabling direct applications in online filtering and classification. We establish theoretical convergence guarantees for this approach, providing a robust solution to L²-optimal filtering. Empirical experiments highlight the model's superior performance over classical parametric methods, particularly in scenarios with complex underlying distributions. These results emphasize the approach's potential in time-sensitive domains such as finance and health monitoring, where real-time accuracy is crucial.
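For readers unfamiliar with the terminology, the "L²-optimal filtering" targeted above rests on the classical fact that the conditional expectation is the mean-squared-error-optimal estimate. In notation of my own choosing (assuming amsmath), with the sigma-algebra generated by the observations up to time t written as 𝒜ₜ:

```latex
% L^2-optimality of the conditional expectation (standard result,
% stated here for reference):
\[
  \widehat{X}_t \;=\; \mathbb{E}\left[\, X_t \,\middle|\, \mathcal{A}_t \right]
  \;=\; \operatorname*{arg\,min}_{Y \in L^2(\mathcal{A}_t)}
        \mathbb{E}\left[ \lVert X_t - Y \rVert^2 \right]
\]
% i.e., the neural jump ODE is trained to approximate the best
% mean-squared-error estimate given the information available at time t.
```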
- NOMU: Neural Optimization-based Model Uncertainty
  Item type: Conference Paper | Proceedings of Machine Learning Research ~ Proceedings of the 39th International Conference on Machine Learning | Heiss, Jakob; Weissteiner, Jakob; Wutte, Hanna; et al. (2022)
  We study methods for estimating model uncertainty for neural networks (NNs) in regression. To isolate the effect of model uncertainty, we focus on a noiseless setting with scarce training data. We introduce five important desiderata regarding model uncertainty that any method should satisfy. However, we find that established benchmarks often fail to reliably capture some of these desiderata, even those that are required by Bayesian theory. To address this, we introduce a new approach for capturing model uncertainty for NNs, which we call Neural Optimization-based Model Uncertainty (NOMU). The main idea of NOMU is to design a network architecture consisting of two connected sub-NNs, one for model prediction and one for model uncertainty, and to train it using a carefully designed loss function. Importantly, our design enforces that NOMU satisfies our five desiderata. Due to its modular architecture, NOMU can provide model uncertainty for any given (previously trained) NN if given access to its training data. We evaluate NOMU in various regression tasks and noiseless Bayesian optimization (BO) with costly evaluations. In regression, NOMU performs at least as well as state-of-the-art methods. In BO, NOMU even outperforms all considered benchmarks.
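Here is a schematic of the two-sub-network design described above, with a prediction network f and an uncertainty network r combined into bounds. The sketch shows the two heads as parallel (in NOMU they are connected) and deliberately omits the carefully designed NOMU loss; the widths and the scaling constant c are assumptions of mine.

```python
import torch
import torch.nn as nn

# Two-headed architecture in the spirit of NOMU: f_net produces the
# model prediction, r_net a non-negative raw uncertainty (via Softplus),
# combined into lower/upper bounds. The NOMU training loss is omitted.
class NOMUSketch(nn.Module):
    def __init__(self, d_in=1, width=64):
        super().__init__()
        self.f_net = nn.Sequential(nn.Linear(d_in, width), nn.ReLU(),
                                   nn.Linear(width, 1))
        self.r_net = nn.Sequential(nn.Linear(d_in, width), nn.ReLU(),
                                   nn.Linear(width, 1), nn.Softplus())

    def forward(self, x, c=1.0):
        f, r = self.f_net(x), self.r_net(x)
        return f, f - c * r, f + c * r  # prediction, lower, upper bound

pred, lower, upper = NOMUSketch()(torch.randn(8, 1))
```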
- Bayesian Optimization-Based Combinatorial Assignment
  Item type: Conference Paper | Proceedings of the 37th AAAI Conference on Artificial Intelligence | Weissteiner, Jakob; Heiss, Jakob; Siems, Julien; et al. (2023)
  We study the combinatorial assignment domain, which includes combinatorial auctions and course allocation. The main challenge in this domain is that the bundle space grows exponentially in the number of items. To address this, several papers have recently proposed machine learning-based preference elicitation algorithms that aim to elicit only the most important information from agents. However, the main shortcoming of this prior work is that it does not model a mechanism's uncertainty over values for not yet elicited bundles. In this paper, we address this shortcoming by presenting a Bayesian optimization-based combinatorial assignment (BOCA) mechanism. Our key technical contribution is to integrate a method for capturing model uncertainty into an iterative combinatorial auction mechanism. Concretely, we design a new method for estimating an upper uncertainty bound that can be used to define an acquisition function to determine the next query to the agents. This enables the mechanism to properly explore (and not just exploit) the bundle space during its preference elicitation phase. We run computational experiments in several spectrum auction domains to evaluate BOCA's performance. Our results show that BOCA achieves higher allocative efficiency than state-of-the-art approaches.
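The acquisition step described above can be sketched UCB-style: score each candidate bundle by predicted value plus a scaled upper uncertainty bound and query the maximizer, so the mechanism explores where the model is unsure. The models below are hypothetical stand-ins, not BOCA's trained networks or its actual uncertainty-bound estimator.

```python
import numpy as np

# UCB-style acquisition: next query maximizes predicted value plus
# scaled model uncertainty. mean_model and uncertainty_model stand in
# for the trained ML model and an upper-uncertainty-bound estimator.
def next_query(bundles, mean_model, uncertainty_model, beta=1.0):
    scores = [mean_model(b) + beta * uncertainty_model(b) for b in bundles]
    return bundles[int(np.argmax(scores))]

# toy stand-ins on 3-item bundles encoded as 0/1 vectors
bundles = [np.array(b) for b in
           [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 1)]]
mean_model = lambda b: b.sum()                     # hypothetical predictor
uncertainty_model = lambda b: 1.0 / (1 + b.sum())  # hypothetical uncertainty
print(next_query(bundles, mean_model, uncertainty_model))
```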
- Machine Learning-Powered Combinatorial Clock Auction
  Item type: Conference Paper | Proceedings of the AAAI Conference on Artificial Intelligence ~ AAAI-24 Technical Tracks 9 | Soumalias, Ermis Nikiforos; Weissteiner, Jakob; Heiss, Jakob; et al. (2024)
  We study the design of iterative combinatorial auctions (ICAs). The main challenge in this domain is that the bundle space grows exponentially in the number of items. To address this, several papers have recently proposed machine learning (ML)-based preference elicitation algorithms that aim to elicit only the most important information from bidders. However, from a practical point of view, the main shortcoming of this prior work is that those designs elicit bidders' preferences via value queries (i.e., "What is your value for the bundle {A, B}?"). In most real-world ICA domains, value queries are considered impractical, since they impose an unrealistically high cognitive burden on bidders, which is why they are not used in practice. In this paper, we address this shortcoming by designing an ML-powered combinatorial clock auction that elicits information from the bidders only via demand queries (i.e., "At prices p, what is your most preferred bundle of items?"). We make two key technical contributions: First, we present a novel method for training an ML model on demand queries. Second, based on those trained ML models, we introduce an efficient method for determining the demand query with the highest clearing potential, for which we also provide a theoretical foundation. We experimentally evaluate our ML-based demand query mechanism in several spectrum auction domains and compare it against the most established real-world ICA: the combinatorial clock auction (CCA). Our mechanism significantly outperforms the CCA in terms of efficiency in all domains, it achieves higher efficiency in a significantly reduced number of rounds, and, using linear prices, it exhibits vastly higher clearing potential. Thus, with this paper we bridge the gap between research and practice and propose the first practical ML-powered ICA.
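To make the notion of a demand query concrete: at prices p, a bidder with quasilinear utility answers with the bundle maximizing v(b) − p·b. The brute-force enumeration below only works for small item counts (the paper's method avoids it), and the value function is a hypothetical stand-in for a bidder's valuation or a trained ML model.

```python
import itertools
import numpy as np

# Answering a demand query by exhaustive search: at prices p, return
# the bundle b in {0,1}^n maximizing quasilinear utility v(b) - p.b.
def demand_query(value_fn, prices):
    best, best_u = None, -np.inf
    for b in itertools.product([0, 1], repeat=len(prices)):
        b = np.array(b)
        u = value_fn(b) - prices @ b  # quasilinear utility
        if u > best_u:
            best, best_u = b, u
    return best

# hypothetical valuation with a complementarity between items 0 and 1
value_fn = lambda b: 2.0 * b.sum() + b[0] * b[1]
print(demand_query(value_fn, prices=np.array([1.5, 1.5, 2.5])))
```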
- Revealing the temporal dynamics of antibiotic anomalies in the infant gut microbiome with neural jump ODEs
  Item type: Working Paper | arXiv | Adamov, Anja; Chardonnet, Markus; Krach, Florian; et al. (2025)
  Detecting anomalies in irregularly sampled multivariate time series is challenging, especially in data-scarce settings. Here we introduce an anomaly detection framework for irregularly sampled time series that leverages neural jump ordinary differential equations (NJODEs). The method infers conditional mean and variance trajectories in a fully path-dependent way and computes anomaly scores. On synthetic data containing jump, drift, diffusion, and noise anomalies, the framework accurately identifies diverse deviations. Applied to infant gut microbiome trajectories, it delineates the magnitude and persistence of antibiotic-induced disruptions, revealing prolonged anomalies after second antibiotic courses, extended-duration treatments, and exposures during the second year of life. We further demonstrate that the inferred anomaly scores accurately predict antibiotic events and outperform diversity-based baselines. Our approach accommodates unevenly spaced longitudinal observations, adjusts for static and dynamic covariates, and provides a foundation for inferring microbial anomalies induced by perturbations, offering a translational opportunity to optimize intervention regimens by minimizing microbial disruptions.
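The anomaly score the abstract describes reduces, in its simplest form, to a standardized deviation from the inferred conditional law. In the sketch below, the conditional means and variances are hypothetical inputs; in the paper they come from the path-dependent NJODE.

```python
import numpy as np

# Standardized-deviation anomaly score: given model-inferred conditional
# means mu and variances var at the observation times, score each
# observation by how many (estimated) standard deviations it lies away.
def anomaly_scores(observations, mu, var, eps=1e-8):
    return np.abs(observations - mu) / np.sqrt(var + eps)

obs = np.array([0.1, 0.2, 2.5, 0.3])      # one irregularly sampled channel
mu = np.array([0.0, 0.2, 0.2, 0.25])      # inferred conditional means
var = np.array([0.04, 0.05, 0.05, 0.04])  # inferred conditional variances
print(anomaly_scores(obs, mu, var))       # the third point stands out
```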