Journal: Electronic Journal of Statistics
Abbreviation: Electron. J. Statist.
Publisher: Cornell University
Search Results
Publications 1 - 10 of 25

- Marginal integration for nonparametric causal inference
  Journal Article. Ernest, Jan; Bühlmann, Peter (2015)

- On the uniform convergence of empirical norms and inner products, with application to causal inference
  Journal Article. van de Geer, Sara (2014)

- Near-optimal Bayesian active learning with correlated and noisy tests
  Journal Article. Chen, Yuxin; Hassani, S. Hamed; Krause, Andreas (2017)

- General oracle inequalities for model selection
  Journal Article. Mitchell, Charles; van de Geer, Sara (2009)
  Model selection is often performed by empirical risk minimization. The quality of selection in a given situation can be assessed by risk bounds, which require assumptions both on the margin and on the tails of the losses used. Starting with examples from the three basic estimation problems (regression, classification, and density estimation), we formulate risk bounds for empirical risk minimization and prove them at a very general level, for general margin and power-tail behavior of the excess losses. We then apply these bounds to typical examples.
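
  Schematically, risk bounds of this kind take an oracle-inequality form. The following display is a generic template in illustrative notation, not the paper's exact statement: with probability at least $1-\delta$, the empirical risk minimizer $\hat f$ over a class $\mathcal{F}$ satisfies

  $$ R(\hat f) - R(f^*) \le C \left( \inf_{f \in \mathcal{F}} \big( R(f) - R(f^*) \big) + \left( \frac{\mathrm{comp}(\mathcal{F}) + \log(1/\delta)}{n} \right)^{\kappa} \right), $$

  where $R$ denotes the risk, $f^*$ its minimizer, $\mathrm{comp}(\mathcal{F})$ a complexity measure of the class, and the exponent $\kappa \in [1/2, 1]$ is governed by the margin and tail behavior of the excess losses.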

- High dimensional sparse covariance estimation via directed acyclic graphs
  Journal Article. Rütimann, Philipp; Bühlmann, Peter (2009)

- Regularizing double machine learning in partially linear endogenous models
  Journal Article. Emmenegger, Corinne; Bühlmann, Peter (2021)
  The linear coefficient in a partially linear model with confounding variables can be estimated using double machine learning (DML). However, this DML estimator has a two-stage least squares (TSLS) interpretation and may produce overly wide confidence intervals. To address this issue, we propose a regularization and selection scheme, regsDML, which leads to narrower confidence intervals. It selects either the TSLS DML estimator or a regularization-only estimator, depending on whose estimated variance is smaller; the regularization-only estimator is tailored to have a low mean squared error. The fully data-driven regsDML estimator converges at the parametric rate, is asymptotically Gaussian distributed, and is asymptotically equivalent to the TSLS DML estimator, but it exhibits substantially better finite-sample properties. regsDML builds on the idea of k-class estimators, and we show how DML and k-class estimation can be combined to estimate the linear coefficient in a partially linear endogenous model. Empirical examples demonstrate our methodological and theoretical developments. Software code for our regsDML method is available in the R package dmlalg.
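
  To make the DML step concrete, here is a minimal cross-fitted partialling-out sketch for the exogenous partially linear model Y = theta*D + g(X) + eps. It illustrates only the generic DML estimator, not the regsDML procedure (which additionally handles endogeneity via instruments and k-class regularization); function and parameter names are illustrative.

  ```python
  # Minimal DML sketch (illustrative, NOT the regsDML implementation):
  # cross-fitted partialling-out for Y = theta * D + g(X) + eps.
  import numpy as np
  from sklearn.ensemble import RandomForestRegressor
  from sklearn.model_selection import KFold

  def dml_plm(Y, D, X, n_splits=2, seed=0):
      Y_res = np.zeros_like(Y, dtype=float)
      D_res = np.zeros_like(D, dtype=float)
      for train, test in KFold(n_splits, shuffle=True, random_state=seed).split(X):
          # Fit the nuisance regressions E[Y|X] and E[D|X] on the other folds.
          mY = RandomForestRegressor(random_state=seed).fit(X[train], Y[train])
          mD = RandomForestRegressor(random_state=seed).fit(X[train], D[train])
          Y_res[test] = Y[test] - mY.predict(X[test])
          D_res[test] = D[test] - mD.predict(X[test])
      n = len(Y)
      theta = (D_res @ Y_res) / (D_res @ D_res)   # residual-on-residual OLS
      psi = (Y_res - theta * D_res) * D_res       # orthogonal score
      se = np.sqrt(np.mean(psi**2) / n) / np.mean(D_res**2)
      return theta, se                            # estimate and plug-in s.e.
  ```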

- The adaptive and the thresholded Lasso for potentially misspecified models (and a lower bound for the Lasso)
  Journal Article. van de Geer, Sara; Bühlmann, Peter; Zhou, Shuheng (2011)
  We revisit the adaptive Lasso as well as the thresholded Lasso with refitting in a high-dimensional linear model, and study prediction error, ℓq-error (q ∈ {1, 2}), and the number of false positive selections. Our theoretical results for the two methods are comparable at a rather fine scale. The differences only show up in terms of the (minimal) restricted and sparse eigenvalues, favoring thresholding over the adaptive Lasso. As regards prediction and estimation, the difference is virtually negligible, but our bound for the number of false positives is larger for the adaptive Lasso than for thresholding. We also study the adaptive Lasso under beta-min conditions, which are conditions on the size of the coefficients. We show that for exact variable selection, the adaptive Lasso generally needs more severe beta-min conditions than thresholding. Both two-stage methods add value to the one-stage Lasso in the sense that, under appropriate restricted and sparse eigenvalue conditions, they have prediction and estimation error similar to the one-stage Lasso but substantially fewer false positives. Regarding the latter, we provide a lower bound for the Lasso with respect to false positive selections.
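
  As a concrete reference point, the thresholded Lasso with refitting can be sketched as follows: run the Lasso, keep the coefficients exceeding a threshold, then refit ordinary least squares on the selected support. The tuning values alpha and tau below are illustrative; the paper calibrates the threshold via restricted and sparse eigenvalue quantities.

  ```python
  # Minimal sketch of the thresholded Lasso with refitting (illustrative).
  import numpy as np
  from sklearn.linear_model import Lasso, LinearRegression

  def thresholded_lasso(X, y, alpha=0.1, tau=0.05):
      lasso = Lasso(alpha=alpha).fit(X, y)
      support = np.flatnonzero(np.abs(lasso.coef_) > tau)   # hard threshold
      beta = np.zeros(X.shape[1])
      if support.size:
          # Refit OLS on the selected variables only.
          beta[support] = LinearRegression().fit(X[:, support], y).coef_
      return beta, support
  ```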

- Separating populations with wide data
  Journal Article. Blum, Avrim; Coja-Oghlan, Amin; Frieze, Alan; et al. (2009)
  In this paper, we consider the problem of partitioning a small data sample drawn from a mixture of k product distributions. We are interested in the case that individual features are of low average quality γ, and we want to use as few of them as possible to correctly partition the sample. We analyze a spectral technique that is able to approximately optimize the total data size (the product of the number of data points n and the number of features K) needed to correctly perform this partitioning, as a function of 1/γ for K > n. Our goal is motivated by an application in clustering individuals according to their population of origin using markers, when the divergence between any two of the populations is small.
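
  The flavor of such a spectral technique can be sketched generically: project the n points onto the top singular directions of the centered n x K data matrix and cluster them there. This is a standard spectral-partitioning sketch under illustrative parameter names, not the authors' exact algorithm.

  ```python
  # Generic spectral-partitioning sketch (illustrative, not the paper's algorithm).
  import numpy as np
  from sklearn.cluster import KMeans

  def spectral_partition(A, k=2, seed=0):
      A = A - A.mean(axis=0)                     # center out the common mean
      U, s, Vt = np.linalg.svd(A, full_matrices=False)
      embedding = U[:, :k] * s[:k]               # top-k left singular coordinates
      return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(embedding)
  ```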

- Oracle inequalities for cross-validation type procedures
  Journal Article. Lecué, Guillaume; Mitchell, Charles (2012)
  We prove oracle inequalities for three different types of adaptation procedures inspired by cross-validation and aggregation. These procedures are then applied to the construction of Lasso estimators and of aggregation with exponential weights, with data-driven regularization and temperature parameters, respectively. We also prove oracle inequalities for the cross-validation procedure itself under some convexity assumptions.
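
  A standard instance of such a procedure is V-fold cross-validation used to pick the Lasso regularization parameter in a data-driven way. The sketch below is a plain V-fold scheme with an illustrative grid, in the spirit of the class of procedures analyzed, not the paper's exact estimator.

  ```python
  # V-fold cross-validation for a data-driven Lasso penalty (illustrative).
  import numpy as np
  from sklearn.linear_model import Lasso
  from sklearn.model_selection import cross_val_score

  def cv_lasso(X, y, alphas=np.logspace(-3, 1, 30), folds=5):
      # Pick the alpha minimizing the V-fold estimate of prediction risk.
      risks = [-cross_val_score(Lasso(alpha=a), X, y,
                                scoring="neg_mean_squared_error",
                                cv=folds).mean() for a in alphas]
      best = alphas[int(np.argmin(risks))]
      return Lasso(alpha=best).fit(X, y), best
  ```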

- Revisiting the Hodges-Lehmann estimator in a location mixture model: Is asymptotic normality good enough?
  Journal Article. Balabdaoui, Fadoua (2017)