Antoine Maillard


Publications 1–10 of 20
  • Bandeira, Afonso S.; Maillard, Antoine; Zhivotovskiy, Nikita (2022)
    arXiv
    In this expository note, we discuss an early partial coloring result of B. Kashin [C. R. Acad. Bulgare Sci., 1985]. Although this result only implies Spencer's six standard deviations [Trans. Amer. Math. Soc., 1985] up to a $\log\log n$ factor, Kashin's argument gives a simple proof of the existence of a constant discrepancy partial coloring in the setup of the Komlós conjecture.
  • Bandeira, Afonso S.; Maillard, Antoine (2025)
    Electronic Journal of Probability
    We consider the problem (P) of exactly fitting an ellipsoid (centered at 0) to n standard Gaussian random vectors in ℝ^d, as n, d→∞ with n∕d²→α>0. This problem is conjectured to undergo a sharp transition: with high probability, (P) has a solution if α<1∕4, while (P) has no solutions if α>1∕4. So far, only a trivial bound α>1∕2 is known to imply the absence of solutions, while the sharpest results on the positive side assume α≤η (for η>0 a small constant) to prove that (P) is solvable. In this work we show a universality property for the minimal fitting error achievable by ellipsoids: we show that, to leading order, it coincides with the minimal error in a so-called “Gaussian equivalent” problem, for which the satisfiability transition can be rigorously analyzed. Our main results follow from this finding, and they are twofold. On the positive side, we prove that if α<1∕4, there exists an ellipsoid fitting all the points up to a small error, and that the lengths of its principal axes are bounded above and below. On the other hand, for α>1∕4, we show that achieving small fitting error is not possible if the length of the ellipsoid’s shortest axis does not approach 0 as d→∞ (and in particular there does not exist any ellipsoid fit whose shortest axis length is bounded away from 0 as d→∞). To the best of our knowledge, our work is the first rigorous result characterizing the expected phase transition in ellipsoid fitting at α=1∕4. In a companion non-rigorous work, the second author and D. Kunisky give a general analysis of ellipsoid fitting using the replica method of statistical physics, which inspired the present work. (A small numerical sketch of this fitting problem is given after the publication list.)
  • Maillard, Antoine; Bandeira, Afonso S. (2023)
    arXiv
  • Maillard, Antoine; Krzakala, Florent; Mezard, Marc; et al. (2022)
    Journal of Statistical Mechanics: Theory and Experiment
    Factorization of matrices where the rank of the two factors diverges linearly with their sizes has many applications in diverse areas such as unsupervised representation learning, dictionary learning or sparse coding. We consider a setting where the two factors are generated from known component-wise independent prior distributions, and the statistician observes a (possibly noisy) component-wise function of their matrix product. In the limit where the dimensions of the matrices tend to infinity, but their ratios remain fixed, we expect to be able to derive closed-form expressions for the optimal mean squared error on the estimation of the two factors. However, this remains a very involved mathematical and algorithmic problem. A related, but simpler, problem is extensive-rank matrix denoising, where one aims to reconstruct a matrix with extensive but usually small rank from noisy measurements. In this paper, we approach both these problems using high-temperature expansions at fixed order parameters. This allows us to clarify how previous attempts at solving these problems failed to find an asymptotically exact solution. We provide a systematic way to derive the corrections to these existing approximations, taking into account the structure of correlations particular to the problem. Finally, we illustrate our approach in detail in the case of extensive-rank matrix denoising. We compare our results with known optimal rotationally-invariant estimators, and show how exact asymptotic calculations of the minimal error can be performed using extensive-rank matrix integrals.
  • Erba, Vittorio; Troiani, Emanuele; Biggio, Luca; et al. (2024)
    arXiv
    Current progress in artificial intelligence is centered around so-called large language models that consist of neural networks processing long sequences of high-dimensional vectors called tokens. Statistical physics provides powerful tools to study the functioning of learning with neural networks and has played a recognized role in the development of modern machine learning. The statistical physics approach relies on simplified and analytically tractable models of data. However, simple tractable models for long sequences of high-dimensional tokens are largely underexplored. Inspired by the crucial role models such as the single-layer teacher-student perceptron (aka generalized linear regression) played in the theory of fully connected neural networks, in this paper, we introduce and study the bilinear sequence regression (BSR) as one of the most basic models for sequences of tokens. We note that modern architectures naturally subsume the BSR model due to the skip connections. Building on recent methodological progress, we compute the Bayes-optimal generalization error for the model in the limit of long sequences of high-dimensional tokens, and provide a message-passing algorithm that matches this performance. We quantify the improvement that optimal learning brings with respect to vectorizing the sequence of tokens and learning via simple linear regression. We also unveil surprising properties of the gradient descent algorithms in the BSR model.
  • Bandeira, Afonso S.; Maillard, Antoine; Zhivotovskiy, Nikita (2022)
    Portugaliae Mathematica
    In this expository note, we discuss an early partial coloring result of B. Kashin (1985). Although this result only implies Spencer’s six standard deviations (1985) up to a $\log\log n$ factor, Kashin’s argument gives a simple proof of the existence of a constant discrepancy partial coloring in the setup of the Komlós conjecture.
  • Troiani, Emanuele; Erba, Vittorio; Krzakala, Florent; et al. (2022)
    arXiv
    In this manuscript we consider denoising of large rectangular matrices: given a noisy observation of a signal matrix, what is the best way of recovering the signal matrix itself? For Gaussian noise and rotationally-invariant signal priors, we completely characterize the optimal denoiser and its performance in the high-dimensional limit, in which the size of the signal matrix goes to infinity with a fixed aspect ratio, and in the Bayes-optimal setting, that is, when the statistician knows how the signal and the observations were generated. Our results generalise previous works that considered only symmetric matrices to the more general case of non-symmetric and rectangular ones. We explore analytically and numerically a particular choice of factorized signal prior that models cross-covariance matrices and the matrix factorization problem. As a byproduct of our analysis, we provide an explicit asymptotic evaluation of the rectangular Harish-Chandra-Itzykson-Zuber integral in a special case.
  • Maillard, Antoine; Bandeira, Afonso S.; Belius, David; et al. (2023)
    arXiv
  • Maillard, Antoine; Troiani, Emanuele; Martin, Simon; et al. (2024)
    arXiv
    We consider the problem of learning a target function corresponding to a single hidden layer neural network, with a quadratic activation function after the first layer, and random weights. We consider the asymptotic limit where the input dimension and the network width are proportionally large. Recent work [Cui & al '23] established that linear regression provides Bayes-optimal test error to learn such a function when the number of available samples is only linear in the dimension. That work stressed the open challenge of theoretically analyzing the optimal test error in the more interesting regime where the number of samples is quadratic in the dimension. In this paper, we solve this challenge for quadratic activations and derive a closed-form expression for the Bayes-optimal test error. We also provide an algorithm, which we call GAMP-RIE, that combines approximate message passing with rotationally invariant matrix denoising and asymptotically achieves the optimal performance. Technically, our result is enabled by establishing a link with recent works on optimal denoising of extensive-rank matrices and on the ellipsoid fitting problem. We further show empirically that, in the absence of noise, randomly-initialized gradient descent seems to sample the space of weights, leading to zero training loss, and averaging over initialization leads to a test error equal to the Bayes-optimal one.
  • Maillard, Antoine (2024)
    arXiv
    Given a sequence of $d \times d$ symmetric matrices $\{\mathbf{W}_i\}_{i=1}^n$ and a margin $\Delta > 0$, we investigate whether it is possible to find signs $(\varepsilon_1, \dots, \varepsilon_n) \in \{\pm 1\}^n$ such that the operator norm of the signed sum satisfies $\|\sum_{i=1}^n \varepsilon_i \mathbf{W}_i\|_\mathrm{op} \leq \Delta$. Kunisky and Zhang (2023) recently introduced a random version of this problem, where the matrices $\{\mathbf{W}_i\}_{i=1}^n$ are drawn from the Gaussian orthogonal ensemble. This model can be seen as a random variant of the celebrated Matrix Spencer conjecture and as a matrix-valued analog of the symmetric binary perceptron in statistical physics. In this work, we establish a satisfiability transition in this problem as $n, d \to \infty$ with $n / d^2 \to \tau > 0$. Our contributions are twofold. First, we prove that the expected number of solutions with margin $\Delta = \kappa \sqrt{n}$ has a sharp threshold at a critical $\tau_1(\kappa)$: for $\tau < \tau_1(\kappa)$ the problem is typically unsatisfiable, while for $\tau > \tau_1(\kappa)$ the average number of solutions becomes exponentially large. Second, combining a second-moment method with recent results from Altschuler (2023) on margin concentration in perceptron-type problems, we identify a second threshold $\tau_2(\kappa)$ such that for $\tau > \tau_2(\kappa)$ the problem admits solutions with high probability. In particular, we establish for the first time that a system of $n = \Theta(d^2)$ Gaussian random matrices can be balanced so that the spectrum of the resulting matrix macroscopically shrinks compared to the typical semicircle law. Our proofs rely on deriving concentration inequalities and large deviation properties for the law of correlated Gaussian matrices under spectral norm constraints, extending results in the literature and offering insights of independent interest. (A brute-force toy simulation of this balancing problem is sketched at the end of this page.)
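
The ellipsoid fitting problem described in the Bandeira and Maillard (2025) abstract above is easy to probe numerically. The sketch below is only an illustration under our own choices, not the paper's proof technique: it fits a centered quadric {x : x^T S x = 1} to n = αd² Gaussian points by unconstrained least squares, which is linear in the entries of the symmetric matrix S, and reports the smallest eigenvalue of S as a rough finite-size indicator of whether the fitted quadric is a valid ellipsoid (S must be positive definite). The function name ellipsoid_fit and all parameter values are ours.

import numpy as np

def ellipsoid_fit(d, alpha, rng):
    # Fit a centered quadric {x : x^T S x = 1} to n = alpha * d^2
    # standard Gaussian vectors by unconstrained least squares.
    n = int(alpha * d ** 2)
    X = rng.standard_normal((n, d))
    iu = np.triu_indices(d)
    # Row i of A holds the upper-triangular entries of x_i x_i^T,
    # with off-diagonal terms doubled, so that (A @ s)_i = x_i^T S x_i.
    A = np.einsum("ni,nj->nij", X, X)[:, iu[0], iu[1]]
    A[:, iu[0] != iu[1]] *= 2.0
    s, *_ = np.linalg.lstsq(A, np.ones(n), rcond=None)
    # Rebuild the symmetric matrix S from its upper triangle.
    S = np.zeros((d, d))
    S[iu] = s
    S = S + S.T - np.diag(np.diag(S))
    rms = np.sqrt(np.mean((A @ s - 1.0) ** 2))
    return rms, np.linalg.eigvalsh(S).min()

rng = np.random.default_rng(0)
for alpha in (0.15, 0.35):
    rms, lam = ellipsoid_fit(d=40, alpha=alpha, rng=rng)
    print(f"alpha={alpha}: rms residual={rms:.2e}, min eigenvalue of S={lam:+.4f}")

Since n < d(d+1)/2 in both runs, the least-squares problem is underdetermined and interpolates the constraints exactly; the finite-size signal of interest is the sign of the smallest eigenvalue of S, which for this heuristic construction tends to turn negative (no valid ellipsoid from this construction) as α crosses the conjectured threshold 1∕4.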
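
Similarly, the matrix balancing problem of Maillard (2024), the last item above, can be explored at toy scale by brute force over all 2^n sign patterns. This is a hedged illustration under our own GOE normalization (semicircle law on [-2, 2]); it is not the paper's second-moment argument, and the helper names goe and best_signed_norm are ours.

import numpy as np
from itertools import product

def goe(d, rng):
    # GOE matrix normalized so its spectrum follows the semicircle
    # law on [-2, 2] for large d.
    A = rng.standard_normal((d, d))
    return (A + A.T) / np.sqrt(2.0 * d)

def best_signed_norm(Ws):
    # Exhaustive search over sign patterns for the minimum of
    # ||sum_i eps_i W_i||_op. Exponential in n: toy sizes only.
    return min(
        np.linalg.norm(sum(e * W for e, W in zip(eps, Ws)), 2)
        for eps in product((-1.0, 1.0), repeat=len(Ws))
    )

rng = np.random.default_rng(0)
d, n = 8, 12  # toy instance of the n = Theta(d^2) regime (tau = n/d^2 ~ 0.19)
Ws = [goe(d, rng) for _ in range(n)]

typical = np.linalg.norm(sum(Ws), 2)  # a typical (all +1) signing
print(f"typical signing: ||sum||_op / sqrt(n) = {typical / np.sqrt(n):.3f}")
print(f"best signing   : ||sum||_op / sqrt(n) = {best_signed_norm(Ws) / np.sqrt(n):.3f}")

For a typical signing the rescaled sum is itself approximately GOE, so the normalized operator norm sits near the semicircle edge 2; the optimized signing shrinks it noticeably even at d = 8, in the spirit of the macroscopic spectral shrinkage established in the paper.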