Search Results
Momentum Provably Improves Error Feedback!
(2024) Advances in Neural Information Processing Systems 36. Conference Paper.
Due to the high communication overhead when training machine learning models in a distributed environment, modern algorithms invariably rely on lossy communication compression. However, when untreated, the errors caused by compression propagate, and can lead to severely unstable behavior, including exponential divergence. Almost a decade ago, Seide et al. (2014) proposed an error feedback (EF) mechanism, which we refer to as EF14, as an ...
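As context for the EF14 mechanism named above: a worker keeps the part of the update lost to compression and adds it back before the next compression step. A minimal sketch in Python, assuming a top-k compressor and a plain gradient step (the names and the compressor choice are illustrative, and this is not the paper's momentum variant):

    import numpy as np

    def topk_compress(v, k):
        """Keep the k largest-magnitude entries of v; zero out the rest."""
        out = np.zeros_like(v)
        idx = np.argsort(np.abs(v))[-k:]
        out[idx] = v[idx]
        return out

    def ef14_step(w, grad, error, lr=0.1, k=10):
        """One EF14-style step: compress the gradient plus the carried
        error, apply the compressed message, and store the residual."""
        corrected = grad + error                # feed back past compression error
        message = topk_compress(corrected, k)   # lossy message sent over the network
        error = corrected - message             # untransmitted residual, kept locally
        w = w - lr * message                    # update with the compressed message
        return w, error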
The Drunkard’s Odometry: Estimating Camera Motion in Deforming Scenes
(2024) Advances in Neural Information Processing Systems 36. Conference Paper.
Estimating camera motion in deformable scenes poses a complex and open research challenge. Most existing non-rigid structure-from-motion techniques assume that static scene parts are observed alongside the deforming parts in order to establish an anchoring reference. However, this assumption does not hold in certain relevant application cases such as endoscopies. Deformable odometry and SLAM pipelines, which tackle the most challenging ...
Students Parrot Their Teachers: Membership Inference on Model Distillation
(2024) Advances in Neural Information Processing Systems 36. Conference Paper.
Model distillation is frequently proposed as a technique to reduce the privacy leakage of machine learning. These empirical privacy defenses rely on the intuition that distilled "student" models protect the privacy of training data, as they only interact with this data indirectly through a "teacher" model. In this work, we design membership inference attacks to systematically study the privacy provided by knowledge distillation to ...
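The excerpt does not detail the attacks themselves; as a baseline for what a membership inference test looks like, a simple loss-threshold attack (in the spirit of Yeom et al., 2018, not necessarily the attack designed in this paper; model_loss is a hypothetical per-example loss oracle) can be sketched as:

    import numpy as np

    def loss_threshold_mia(model_loss, examples, labels, threshold):
        """Predict 'training member' when the model's loss on an example
        falls below a calibrated threshold; members tend to have lower loss."""
        losses = np.array([model_loss(x, y) for x, y in zip(examples, labels)])
        return losses < threshold  # boolean array: True = predicted member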
Learning Layer-wise Equivariances Automatically using Gradients
(2024) Advances in Neural Information Processing Systems 36. Conference Paper.
Convolutions encode equivariance symmetries into neural networks, leading to better generalisation performance. However, symmetries provide fixed hard constraints on the functions a network can represent, need to be specified in advance, and cannot be adapted. Our goal is to allow flexible symmetry constraints that can automatically be learned from data using gradients. Learning symmetry and associated weight connectivity structures from ...
Learning DAGs from Data with Few Root Causes
(2024) Advances in Neural Information Processing Systems 36. Conference Paper.
We present a novel perspective and algorithm for learning directed acyclic graphs (DAGs) from data generated by a linear structural equation model (SEM). First, we show that a linear SEM can be viewed as a linear transform that, in prior work, computes the data from a dense input vector of random valued root causes (as we will call them) associated with the nodes. Instead, we consider the case of (approximately) few root causes and also ...
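To make the "linear transform" view concrete: in a linear SEM each node is a weighted sum of its parents plus an exogenous input, so a sample can be generated by pushing a root-cause vector c through (I - A^T)^{-1} for a weighted DAG adjacency matrix A. A minimal sketch with only a few nonzero root causes, as in the setting above (the function and parameter names are illustrative):

    import numpy as np

    def sample_sem(A, num_root_causes, noise=0.01):
        """Draw one sample x = (I - A^T)^{-1} c from a linear SEM whose
        weighted adjacency A has A[i, j] != 0 for an edge i -> j, using a
        root-cause vector c that is large in only a few coordinates."""
        d = A.shape[0]
        c = noise * np.random.randn(d)                    # approximately zero elsewhere
        idx = np.random.choice(d, size=num_root_causes, replace=False)
        c[idx] = np.random.randn(num_root_causes)         # the few root causes
        x = np.linalg.solve(np.eye(d) - A.T, c)           # propagate through the DAG
        return x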
Empowering Convolutional Neural Networks with MetaSin Activation
(2024) Advances in Neural Information Processing Systems 36. Conference Paper.
ReLU networks have remained the default choice for models in the area of image prediction despite their well-established spectral bias towards learning low frequencies faster, and consequently their difficulty in reproducing high-frequency visual details. As an alternative, sin networks showed promising results in learning implicit representations of visual data. However, training these networks in practically relevant settings proved to ...
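The excerpt does not give the MetaSin form; for reference, the sin networks it builds on use layers of the SIREN type, a linear map followed by a scaled sine, which sidesteps the low-frequency spectral bias of ReLU. A one-layer sketch, assuming the customary omega0 frequency scale (an assumption here, not the paper's activation):

    import numpy as np

    def sin_layer(x, W, b, omega0=30.0):
        """SIREN-style layer: linear map followed by a scaled sine,
        which lets the network fit high-frequency detail."""
        return np.sin(omega0 * (x @ W + b))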
Two Sides of One Coin: the Limits of Untuned SGD and the Power of Adaptive Methods
(2024) Advances in Neural Information Processing Systems 36. Conference Paper.
The classical analysis of Stochastic Gradient Descent (SGD) with polynomially decaying stepsize $\eta_t = \eta/\sqrt{t}$ relies on a well-tuned $\eta$ depending on problem parameters such as the Lipschitz smoothness constant, which is often unknown in practice. In this work, we prove that SGD with arbitrary $\eta > 0$, referred to as untuned SGD, still attains an order-optimal convergence rate $O(T^{-1/4})$ in terms of gradient norm for minimizing ...
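The schedule at issue is fully explicit, so untuned SGD is easy to write down; a minimal sketch with stepsize $\eta_t = \eta/\sqrt{t}$ for an arbitrary $\eta > 0$ (grad_fn is a hypothetical stochastic gradient oracle):

    import numpy as np

    def untuned_sgd(grad_fn, w0, eta=1.0, T=1000):
        """SGD with polynomially decaying stepsize eta / sqrt(t).
        Per the abstract, eta need not be tuned to problem constants
        to reach an O(T^{-1/4}) rate in gradient norm."""
        w = np.array(w0, dtype=float)
        for t in range(1, T + 1):
            w = w - (eta / np.sqrt(t)) * grad_fn(w)
        return w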
Training Fully Connected Neural Networks is $\exists\mathbb{R}$-Complete
(2024) Advances in Neural Information Processing Systems 36. Conference Paper.
We consider the algorithmic problem of finding the optimal weights and biases for a two-layer fully connected neural network to fit a given set of data points. This problem is known as empirical risk minimization in the machine learning community. We show that the problem is $\exists\mathbb{R}$-complete. This complexity class can be defined as the set of algorithmic problems that are polynomial-time equivalent to finding real roots of a ...
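For concreteness, the empirical risk minimization problem in question can be written as follows, where the squared loss and the specific activation $\sigma$ are assumptions on our part, since the excerpt is cut off:

    \min_{W_1, b_1, W_2, b_2} \; \sum_{i=1}^{n} \bigl\| W_2\,\sigma(W_1 x_i + b_1) + b_2 - y_i \bigr\|^2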
Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
(2024) Advances in Neural Information Processing Systems 36. Conference Paper.
Autoregressive Transformers adopted in Large Language Models (LLMs) are hard to scale to long sequences. Despite several works trying to reduce their computational cost, most LLMs still adopt attention layers between all pairs of tokens in the sequence, thus incurring a quadratic cost. In this study, we present a novel approach that dynamically prunes contextual information while preserving the model's expressiveness, resulting in ...
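Mechanically, pruning the context means discarding low-relevance tokens from the cached keys and values as generation proceeds, so later attention steps run over a shorter sequence. A generic illustration (the scoring rule and keep_ratio are stand-ins, not the learned mechanism this paper proposes):

    import numpy as np

    def prune_context(keys, values, scores, keep_ratio=0.5):
        """Keep only the highest-scoring fraction of cached key/value
        pairs, shrinking the quadratic attention cost on long sequences."""
        k = max(1, int(keep_ratio * len(scores)))
        keep = np.argsort(scores)[-k:]
        keep.sort()                      # preserve original token order
        return keys[keep], values[keep]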
Robust Knowledge Transfer in Tiered Reinforcement Learning
(2024) Advances in Neural Information Processing Systems 36. Conference Paper.
In this paper, we study the Tiered Reinforcement Learning setting, a parallel transfer learning framework, where the goal is to transfer knowledge from the low-tier (source) task to the high-tier (target) task to reduce the exploration risk of the latter while solving the two tasks in parallel. Unlike previous work, we do not assume the low-tier and high-tier tasks share the same dynamics or reward functions, and focus on robust knowledge ...