Journal: Statistics and Computing

Abbreviation

Stat. Comput.

Publisher

Springer

ISSN

0960-3174 (print)
1573-1375 (electronic)

Search Results

Publications 1 - 10 of 16
  • Twin Boosting
    Item type: Journal Article
    Bühlmann, Peter; Hothorn, Torsten (2010)
    Statistics and Computing
  • Rügamer, David; Baumann, Philipp F. M.; Kneib, Thomas; et al. (2023)
    Statistics and Computing
    Probabilistic forecasting of time series is an important matter in many applications and research fields. In order to draw conclusions from a probabilistic forecast, we must ensure that the model class used to approximate the true forecasting distribution is expressive enough. Yet characteristics of the model itself, such as its uncertainty or its feature-outcome relationship, are no less important. This paper proposes Autoregressive Transformation Models (ATMs), a model class inspired by various research directions that unites expressive distributional forecasts, based on a semi-parametric distribution assumption, with an interpretable model specification. We demonstrate the properties of ATMs both theoretically and through empirical evaluation on several simulated and real-world forecasting datasets. (A minimal transformation-model sketch appears after the results list.)
  • Sparse conformal predictors
    Item type: Journal Article
    Hebiri, Mohamed (2010)
    Statistics and Computing
  • Molkenthin, Christian; Donner, Christian; Reich, Sebastian; et al. (2022)
    Statistics and Computing
    The spatio-temporal epidemic type aftershock sequence (ETAS) model is widely used to describe the self-exciting nature of earthquake occurrences. While traditional inference methods provide only point estimates of the model parameters, we pursue a fully Bayesian treatment of model inference, which naturally incorporates prior knowledge and provides uncertainty quantification for the resulting estimates. To this end, we introduce a highly flexible, non-parametric representation for the spatially varying ETAS background intensity through a Gaussian process (GP) prior. Combined with classical triggering functions, this results in a new model formulation, the GP-ETAS model. We enable tractable and efficient Gibbs sampling by deriving an augmented form of the GP-ETAS inference problem. This novel sampling approach allows us to assess the posterior model variables conditioned on observed earthquake catalogues, i.e., the spatial background intensity and the parameters of the triggering function. Empirical results on two synthetic data sets indicate that GP-ETAS outperforms standard models, demonstrating its predictive power for observed earthquake catalogues, including uncertainty quantification for the estimated parameters. Finally, a case study of the L'Aquila region, Italy, with the devastating event of 6 April 2009, is presented. (An illustrative ETAS-intensity sketch appears after the results list.)
  • Lang, Stefan; Umlauf, Nikolaus; Wechselberger, Peter; et al. (2014)
    Statistics and Computing
  • Pfahler, Simon; Georg, Peter; Schill, Rudolf; et al. (2024)
    Statistics and Computing
    The Kullback-Leibler (KL) divergence is frequently used in data science. For discrete distributions on large state spaces, approximations of probability vectors may result in a few small negative entries, rendering the KL divergence undefined. We address this problem by introducing a parameterized family of substitute divergence measures, the shifted KL (sKL) divergences. Our approach is generic and does not increase the computational overhead. We show that the sKL divergence shares important theoretical properties with the KL divergence and discuss how its shift parameters should be chosen. If Gaussian noise is added to a probability vector, we prove that the average sKL divergence converges to the KL divergence for small enough noise. We also show that our method solves the problem of negative entries in an application from computational oncology: the optimization of Mutual Hazard Networks for cancer progression using tensor-train approximations. (A hedged sketch of the shift idea appears after the results list.)
  • Bodenham, Dean A.; Adams, Niall M. (2016)
    Statistics and Computing
    In many applications, the cumulative distribution function (cdf) $F_{Q_N}$ of $Q_N$, a positively weighted sum of $N$ i.i.d. chi-squared random variables, is required. Although there is no known closed-form expression for $F_{Q_N}$, there are many good approximations. When computational efficiency is not an issue, Imhof's method provides a good solution. However, when both the accuracy of the approximation and the speed of its computation are a concern, there is no clear preferred choice. Previous comparisons between approximate methods have been insufficient. Furthermore, in streaming data applications, where the computation needs to be both sequential and efficient, only a few of the available methods may be suitable. Streaming data problems are becoming ubiquitous and provide the motivation for this paper. We develop a framework enabling a much more extensive comparison between approximate methods for computing the cdf of weighted sums of an arbitrary random variable. Utilising this framework, a new and comprehensive analysis of four efficient approximate methods for computing $F_{Q_N}$ is performed. This analysis procedure is much more thorough and statistically valid than previous approaches described in the literature. A surprising result of this analysis is that the accuracy of these approximate methods increases with $N$. (A moment-matching sketch appears after the results list.)
  • Schölkopf, Bernhard; Muandet, Krikamol; Fukumizu, Kenji; et al. (2015)
    Statistics and Computing
    We describe a method for performing functional operations on probability distributions of random variables. The method uses reproducing kernel Hilbert space representations of probability distributions and is applicable to all operations that can be applied to points drawn from the respective distributions. We refer to our approach as kernel probabilistic programming. We illustrate it on synthetic data and show how it can be used for nonparametric structural equation models, with an application to causal inference. (A kernel-mean sketch appears after the results list.)
  • Meyer, Daniel W. (2018)
    Statistics and Computing
    The estimation of probability densities based on available data is a central task in many statistical applications. Especially in the case of large ensembles with many samples or high-dimensional sample spaces, computationally efficient methods are needed. We propose a new method based on a decomposition of the unknown distribution in terms of so-called distribution elements (DEs). These elements enable an adaptive and hierarchical discretization of the sample space, with small elements in regions of highly variable density and large elements where the density is smooth. The novel refinement strategy that we propose is based on statistical goodness-of-fit and pairwise independence tests (the latter as an approximation to mutual independence) that evaluate the local approximation of the distribution in terms of DEs. The capabilities of our new method are examined on several examples of different dimensionality and compared successfully with other state-of-the-art density estimators. (An adaptive-partitioning sketch appears after the results list.)
  • Missing values
    Item type: Journal Article
    Städler, Nicolas; Bühlmann, Peter (2012)
    Statistics and Computing
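
Illustrative Code Sketches

The sketches below are editorial illustrations of ideas mentioned in the abstracts above. They are hedged simplifications, not the authors' implementations; all function names, parameters, and constants are our own.

For the Autoregressive Transformation Models entry (Rügamer et al., 2023): a minimal transformation-model sketch, assuming only the core change-of-variables idea p(y) = phi(h(y)) * h'(y) with an affine monotone map h. Real ATMs use flexible (e.g. Bernstein-polynomial) transformations with autoregressive conditioning.

```python
# Minimal transformation-model sketch (NOT the authors' ATM implementation):
# model the density of y through a learned monotone map h into a standard
# normal, p(y) = phi(h(y)) * h'(y).  Here h(y) = a*y + b with a > 0, fitted
# by maximum likelihood.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_log_lik(params, y):
    log_a, b = params                       # parameterize a = exp(log_a) > 0
    z = np.exp(log_a) * y + b               # z = h(y)
    return -np.sum(norm.logpdf(z) + log_a)  # log p(y) = log phi(h(y)) + log h'(y)

rng = np.random.default_rng(0)
y = rng.normal(loc=3.0, scale=2.0, size=500)
res = minimize(neg_log_lik, x0=np.zeros(2), args=(y,))
a, b = np.exp(res.x[0]), res.x[1]
print(f"recovered mean ~ {-b / a:.2f}, sd ~ {1 / a:.2f}")  # roughly 3.0 and 2.0
```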
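
For the GP-ETAS entry (Molkenthin et al., 2022): an illustrative purely temporal ETAS conditional intensity with a constant background rate mu and a modified-Omori triggering kernel. The paper's model is spatio-temporal and replaces the constant mu with a Gaussian-process-distributed background; all constants here are arbitrary.

```python
# Illustrative temporal ETAS conditional intensity: constant background mu
# plus a modified-Omori triggering kernel summed over past events.
import numpy as np

def etas_intensity(t, events, mu=0.2, K=0.5, c=0.01, p=1.2):
    """lambda(t) = mu + sum over past events t_i < t of K / (t - t_i + c)**p."""
    past = events[events < t]
    return mu + np.sum(K / (t - past + c) ** p)

events = np.array([1.0, 1.5, 1.6, 4.0])  # toy catalogue of event times
for t in (2.0, 5.0):
    print(f"lambda({t}) = {etas_intensity(t, events):.3f}")
```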
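
For the shifted-KL entry (Pfahler et al., 2024): a hedged sketch of the shift idea, assuming the repair amounts to adding small positive shifts s_i to both probability vectors so that slightly negative entries become admissible. The exact parameterization in the paper may differ.

```python
# Hedged sketch of the shifted-KL idea: evaluate KL on shifted vectors so
# that tiny negative entries (numerical-approximation artifacts) are allowed.
import numpy as np

def kl(p, q):
    """Standard KL divergence; undefined if any q_i <= 0 (or p_i < 0)."""
    return np.sum(p * np.log(p / q))

def skl(p, q, s):
    """Shifted variant: KL evaluated on p + s and q + s."""
    ps, qs = p + s, q + s               # choose s so that ps > 0 and qs > 0
    return np.sum(ps * np.log(ps / qs))

p = np.array([0.6, 0.3999, 1e-4])
q = np.array([0.6, 0.4002, -1e-4])      # tiny negative entry: kl(p, q) undefined
s = np.full_like(p, 1e-3)
print(skl(p, q, s))                     # finite, close to KL for small shifts
```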
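
For the weighted chi-squared cdf entry (Bodenham and Adams, 2016): a sketch of one classical fast approximation, Satterthwaite-Welch moment matching, which fits a scaled chi-squared to the first two moments of $Q_N$. It is one of the kinds of efficient approximations the paper compares, not the paper's comparison framework itself.

```python
# Satterthwaite-Welch moment matching for the cdf of
# Q_N = sum_i w_i * X_i**2 with X_i ~ N(0, 1): match the first two moments
# of Q_N to a scaled chi-squared a * chi2_d.
import numpy as np
from scipy.stats import chi2

def sw_cdf(x, w):
    w = np.asarray(w, dtype=float)
    mean, var = w.sum(), 2.0 * (w ** 2).sum()  # E[Q_N] and Var[Q_N]
    a = var / (2.0 * mean)                     # scale of the fitted chi-squared
    d = 2.0 * mean ** 2 / var                  # its degrees of freedom
    return chi2.cdf(x / a, df=d)

w = [0.7, 0.2, 0.1]
rng = np.random.default_rng(1)
q = (np.asarray(w) * rng.standard_normal((100_000, 3)) ** 2).sum(axis=1)
print(sw_cdf(1.5, w), (q <= 1.5).mean())       # approximation vs Monte Carlo
```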
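
For the kernel probabilistic programming entry (Schölkopf et al., 2015): a sketch of the underlying kernel-mean idea: represent a distribution by its empirical mean embedding and apply an operation f to a distribution by applying it to samples. The squared MMD below checks that two embeddings agree; the Gaussian-kernel bandwidth and all names are our choices.

```python
# Kernel-mean sketch: a distribution P is represented by the empirical
# embedding mu_P = (1/n) sum_i k(x_i, .); operations act on the samples.
import numpy as np

def gram(x, y, bw=1.0):
    """Gaussian kernel matrix k(x_i, y_j) for 1-D samples."""
    return np.exp(-(x[:, None] - y[None, :]) ** 2 / (2.0 * bw ** 2))

def mmd2(x, y, bw=1.0):
    """Biased estimate of ||mu_X - mu_Y||^2 in the RKHS."""
    return gram(x, x, bw).mean() + gram(y, y, bw).mean() - 2.0 * gram(x, y, bw).mean()

rng = np.random.default_rng(2)
x = rng.normal(size=1000)
fx = x ** 2                             # push samples through f(x) = x^2
z = rng.chisquare(df=1, size=1000)      # f(X) should match chi-squared(1)
print(mmd2(fx, z))                      # near zero: the embeddings agree
```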
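
For the distribution elements entry (Meyer, 2018): a simplified one-dimensional sketch of the adaptive-partitioning idea, splitting a cell while a chi-square goodness-of-fit test rejects uniformity inside it. The paper's distribution-element trees are multivariate and also use independence tests; everything below is an editorial simplification.

```python
# Simplified 1-D adaptive partitioning (not the authors' DET algorithm):
# refine a cell while the data inside it look non-uniform, giving small cells
# where the density varies strongly and large cells where it is flat.
import numpy as np
from scipy.stats import chisquare

def adaptive_cells(x, lo, hi, alpha=0.05, min_n=40):
    inside = x[(x >= lo) & (x < hi)]
    if len(inside) >= min_n:
        counts, _ = np.histogram(inside, bins=4, range=(lo, hi))
        if chisquare(counts).pvalue < alpha:   # reject uniformity -> split
            mid = 0.5 * (lo + hi)
            return (adaptive_cells(x, lo, mid, alpha, min_n)
                    + adaptive_cells(x, mid, hi, alpha, min_n))
    density = len(inside) / (len(x) * (hi - lo))
    return [(lo, hi, density)]

rng = np.random.default_rng(3)
x = rng.beta(2, 8, size=5000)                  # sharply varying density
for lo, hi, dens in adaptive_cells(x, 0.0, 1.0):
    print(f"[{lo:.3f}, {hi:.3f})  density ~ {dens:.2f}")
```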