Journal: Electronic Journal of Statistics

Abbreviation

Electron. J. Statist.

Publisher

Institute of Mathematical Statistics and Bernoulli Society

ISSN

1935-7524

Search Results

Publications 1 - 10 of 25
  • Ernest, Jan; Bühlmann, Peter (2015)
    Electronic Journal of Statistics
  • van de Geer, Sara (2014)
    Electronic Journal of Statistics
  • Chen, Yuxin; Hassani, S. Hamed; Krause, Andreas (2017)
    Electronic Journal of Statistics
  • Mitchell, Charles; van de Geer, Sara (2009)
    Electronic Journal of Statistics
    Model selection is often performed by empirical risk minimization. The quality of selection in a given situation can be assessed by risk bounds, which require assumptions both on the margin and on the tails of the losses used. Starting with examples from the three basic estimation problems, regression, classification and density estimation, we formulate risk bounds for empirical risk minimization and prove them at a very general level, for general margin and power tail behavior of the excess losses. We then apply these bounds to typical examples.
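As an illustration of empirical risk minimization over a finite model class, here is a minimal Python sketch; the data, the squared loss, and the grid of candidate predictors are invented for illustration and are not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y = 2x + noise.
x = rng.uniform(-1.0, 1.0, size=200)
y = 2.0 * x + rng.normal(scale=0.1, size=200)

# A small finite class of candidate predictors f_a(t) = a * t.
grid = np.linspace(-3.0, 3.0, 13)
candidates = [lambda t, a=a: a * t for a in grid]

def empirical_risk(f):
    """Average squared loss of predictor f on the sample."""
    return np.mean((y - f(x)) ** 2)

# Empirical risk minimization: select the candidate with smallest empirical risk.
risks = [empirical_risk(f) for f in candidates]
best = candidates[int(np.argmin(risks))]
```

With this toy data the minimizer is the candidate with slope 2; the paper's bounds quantify how far the selected model's risk can be from the best in the class.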
  • Rütimann, Philipp; Bühlmann, Peter (2009)
    Electronic Journal of Statistics
  • Emmenegger, Corinne; Bühlmann, Peter (2021)
    Electronic Journal of Statistics
    The linear coefficient in a partially linear model with confounding variables can be estimated using double machine learning (DML). However, this DML estimator has a two-stage least squares (TSLS) interpretation and may produce overly wide confidence intervals. To address this issue, we propose a regularization and selection scheme, regsDML, which leads to narrower confidence intervals. It selects either the TSLS DML estimator or a regularization-only estimator, depending on whose estimated variance is smaller; the regularization-only estimator is tailored to have a low mean squared error. The regsDML estimator is fully data driven, converges at the parametric rate, is asymptotically Gaussian distributed, and is asymptotically equivalent to the TSLS DML estimator, but it exhibits substantially better finite-sample properties. The regsDML estimator uses the idea of k-class estimators, and we show how DML and k-class estimation can be combined to estimate the linear coefficient in a partially linear endogenous model. Empirical examples demonstrate our methodological and theoretical developments. Software code for our regsDML method is available in the R package dmlalg.
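The two-stage DML idea behind this abstract can be sketched in a few lines. The sketch below is plain DML on a toy partially linear model, not the regsDML selection scheme itself; it uses two-fold cross-fitting with a simple linear regression as a stand-in nuisance learner:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
theta = 1.5  # true linear coefficient of interest

# Partially linear model with confounding: d depends on x, and x also enters y.
x = rng.normal(size=n)
d = 0.8 * x + rng.normal(size=n)
y = theta * d + 2.0 * x + rng.normal(size=n)  # here g(x) = 2x for simplicity

def fit_predict(x_tr, t_tr, x_te):
    """Least-squares fit of t on (1, x), evaluated at x_te (stand-in nuisance learner)."""
    A = np.column_stack([np.ones_like(x_tr), x_tr])
    coef, *_ = np.linalg.lstsq(A, t_tr, rcond=None)
    return coef[0] + coef[1] * x_te

# Two-fold cross-fitting: nuisances are fit on one half and
# residuals are formed on the other half.
idx = rng.permutation(n)
fold_a, fold_b = idx[: n // 2], idx[n // 2 :]
ry = np.empty(n)
rd = np.empty(n)
for tr, te in [(fold_a, fold_b), (fold_b, fold_a)]:
    ry[te] = y[te] - fit_predict(x[tr], y[tr], x[te])
    rd[te] = d[te] - fit_predict(x[tr], d[tr], x[te])

# Final stage: regress the y-residuals on the d-residuals.
theta_hat = float(np.sum(rd * ry) / np.sum(rd * rd))
```

The final residual-on-residual regression is the TSLS-flavored step the abstract refers to; regsDML additionally introduces a regularized (k-class-style) alternative and selects between the two by estimated variance.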
  • van de Geer, Sara; Bühlmann, Peter; Zhou, Shuheng (2011)
    Electronic Journal of Statistics
    We revisit the adaptive Lasso as well as the thresholded Lasso with refitting, in a high-dimensional linear model, and study prediction error, ℓq-error (q∈{1,2}), and the number of false positive selections. Our theoretical results for the two methods are, at a rather fine scale, comparable. The differences only show up in terms of the (minimal) restricted and sparse eigenvalues, favoring thresholding over the adaptive Lasso. As regards prediction and estimation, the difference is virtually negligible, but our bound for the number of false positives is larger for the adaptive Lasso than for thresholding. We also study the adaptive Lasso under beta-min conditions, which are conditions on the size of the coefficients. We show that for exact variable selection, the adaptive Lasso generally needs more severe beta-min conditions than thresholding. Both two-stage methods add value to the one-stage Lasso in the sense that, under appropriate restricted and sparse eigenvalue conditions, they have similar prediction and estimation error to the one-stage Lasso but substantially fewer false positives. Regarding the latter, we provide a lower bound for the Lasso with respect to false positive selections.
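A minimal sketch of the thresholded Lasso with refitting on simulated data; the ISTA solver, the regularization level, and the threshold value below are illustrative choices, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 20

# Sparse linear model: only the first three coefficients are nonzero.
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 1.5]
y = X @ beta + rng.normal(scale=0.5, size=n)

def soft(z, t):
    """Soft-thresholding operator."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=3000):
    """Lasso via proximal gradient (ISTA) on (1/2n)||y - Xb||^2 + lam*||b||_1."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n  # Lipschitz constant of the gradient
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        b = soft(b - grad / L, lam / L)
    return b

# Stage one: Lasso.  Stage two: hard-threshold small coefficients and
# refit ordinary least squares on the selected support.
b_lasso = lasso_ista(X, y, lam=0.1)
support = np.flatnonzero(np.abs(b_lasso) > 0.25)
b_refit = np.zeros(p)
coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
b_refit[support] = coef
```

The refitting step removes the shrinkage bias of the Lasso on the selected support, which is why the two-stage procedure can match the one-stage prediction error while keeping far fewer false positives.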
  • Separating populations with wide data
    Item type: Journal Article
    Blum, Avrim; Coja-Oghlan, Amin; Frieze, Alan; et al. (2009)
    Electronic Journal of Statistics
    In this paper, we consider the problem of partitioning a small data sample drawn from a mixture of k product distributions. We are interested in the case where individual features are of low average quality γ, and we want to use as few of them as possible to correctly partition the sample. We analyze a spectral technique that is able to approximately optimize the total data size (the product of the number of data points n and the number of features K) needed to correctly perform this partitioning, as a function of 1/γ, for K > n. Our goal is motivated by an application in clustering individuals according to their population of origin using markers, when the divergence between any two of the populations is small.
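The spectral approach referred to in this abstract can be illustrated schematically: take the top singular vector of the centered n × K data matrix and split the sample by its sign. The Gaussian mixture, the separation γ, and the sizes below are toy choices, not the regime analyzed in the paper:

```python
import numpy as np

rng = np.random.default_rng(3)
n, K = 40, 200  # few samples, many features (K > n)
gamma = 0.5     # per-feature mean separation

# Mixture of two product (here Gaussian) distributions whose
# means differ by 2*gamma in every coordinate.
labels = np.repeat([0, 1], n // 2)
means = np.where(labels[:, None] == 0, -gamma, gamma)
data = means + rng.normal(size=(n, K))

# Spectral partition: split by the sign of the top left singular
# vector of the column-centered data matrix.
centered = data - data.mean(axis=0)
u, _, _ = np.linalg.svd(centered, full_matrices=False)
partition = (u[:, 0] > 0).astype(int)

# The sign of a singular vector is arbitrary, so align before scoring.
accuracy = max(np.mean(partition == labels), np.mean(partition != labels))
```

Even though each single feature barely separates the two populations, pooling many weak features through the leading singular vector recovers the partition; the paper quantifies how small n*K can be as a function of 1/γ.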
  • Lecué, Guillaume; Mitchell, Charles (2012)
    Electronic Journal of Statistics
    We prove oracle inequalities for three different types of adaptation procedures inspired by cross-validation and aggregation. These procedures are then applied to the construction of Lasso estimators and aggregation with exponential weights with data-driven regularization and temperature parameters, respectively. We also prove oracle inequalities for the cross-validation procedure itself under some convexity assumptions.
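A minimal sketch of aggregation with exponential weights over a small dictionary of predictors; here the temperature is fixed by hand, whereas the paper studies data-driven choices of it:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(-1.0, 1.0, size=300)
y = 2.0 * x + rng.normal(scale=0.1, size=300)

# A small dictionary of base predictors f_a(t) = a * t.
slopes = [0.0, 1.0, 2.0, 3.0]
experts = [lambda t, a=a: a * t for a in slopes]

# Exponential weights: each expert is weighted by exp(-empirical risk /
# temperature), so low-risk experts dominate the aggregate.
temperature = 0.05
risks = np.array([np.mean((y - f(x)) ** 2) for f in experts])
w = np.exp(-risks / temperature)
w /= w.sum()

def aggregate(t):
    """Convex combination of the experts with exponential weights."""
    return sum(wi * f(t) for wi, f in zip(w, experts))
```

With this data the weight concentrates on the slope-2 expert; oracle inequalities of the kind proved in the paper bound the aggregate's risk by the best expert's risk plus a remainder driven by the temperature.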
  • Balabdaoui, Fadoua (2017)
    Electronic Journal of Statistics