
NeurIPS 2020 Highlights


The Conference on Neural Information Processing Systems (NeurIPS) is the world's leading machine learning conference; in 2020 it is being held online due to the COVID-19 pandemic.

Here is a summary of the papers presented at the conference, along with some background on how this summary was produced.

This summary was compiled by the Paper Digest Team using Paper Digest, their platform.
Paper Digest is a scientific and technical knowledge graph and text analysis platform for tracking, summarizing, and searching the scientific literature.
The Paper Digest Team is a New York-based research group working on text analysis.
We are publishing their great work in our article with permission to reprint it.
The original article is also accompanied by code, which you can browse here if you're interested.




A graph similarity for deep learning

We adopt kernel distance and propose transform-sum-cat as an alternative to aggregate-transform to reflect the continuous similarity between the node neighborhoods in the neighborhood aggregation.


An Unsupervised Information-Theoretic Perceptual Quality Metric

We combine recent advances in information-theoretic objective functions with a computational architecture informed by the physiology of the human visual system and unsupervised training on pairs of video frames, yielding our Perceptual Information Metric (PIM).


Self-Supervised MultiModal Versatile Networks

In this work, we learn representations using self-supervision by leveraging three modalities naturally present in videos: visual, audio and language streams.


Benchmarking Deep Inverse Models over time, and the Neural-Adjoint method

We consider the task of solving generic inverse problems, where one wishes to determine the hidden parameters of a natural system that will give rise to a particular set of measurements.


Off-Policy Evaluation and Learning for External Validity under a Covariate Shift

In this paper, we derive the efficiency bound of OPE under a covariate shift.


Neural Methods for Point-wise Dependency Estimation

In this work, instead of estimating the expected dependency, we focus on estimating point-wise dependency (PD), which quantitatively measures how likely two outcomes co-occur.


Fast and Flexible Temporal Point Processes with Triangular Maps

By exploiting the recent developments in the field of normalizing flows, we design TriTPP – a new class of non-recurrent TPP models, where both sampling and likelihood computation can be done in parallel.


Backpropagating Linearly Improves Transferability of Adversarial Examples

In this paper, we study the transferability of such examples, which lays the foundation of many black-box attacks on DNNs.


PyGlove: Symbolic Programming for Automated Machine Learning

In this paper, we introduce a new way of programming AutoML based on symbolic programming.


Fourier Sparse Leverage Scores and Approximate Kernel Learning

We prove new explicit upper bounds on the leverage scores of Fourier sparse functions under both the Gaussian and Laplace measures.


Improved Algorithms for Online Submodular Maximization via First-order Regret Bounds

In this work, we give a general approach for improving regret bounds in online submodular maximization by exploiting “first-order” regret bounds for online linear optimization.


Synbols: Probing Learning Algorithms with Synthetic Datasets

In this sense, we introduce Synbols — Synthetic Symbols — a tool for rapidly generating new datasets with a rich composition of latent features rendered in low resolution images.


Adversarially Robust Streaming Algorithms via Differential Privacy

We establish a connection between adversarial robustness of streaming algorithms and the notion of differential privacy.


Trading Personalization for Accuracy: Data Debugging in Collaborative Filtering

In this paper, we propose a data debugging framework to identify overly personalized ratings whose existence degrades the performance of a given collaborative filtering model.


Cascaded Text Generation with Markov Transformers

This work proposes an autoregressive model with sub-linear parallel time generation.


Improving Local Identifiability in Probabilistic Box Embeddings

In this work we model the box parameters with min and max Gumbel distributions, which were chosen such that the space is still closed under the operation of intersection.


Permute-and-Flip: A new mechanism for differentially private selection

In this work, we propose a new mechanism for this task based on a careful analysis of the privacy constraints.


Deep reconstruction of strange attractors from time series

Inspired by classical analysis techniques for partial observations of chaotic attractors, we introduce a general embedding technique for univariate and multivariate time series, consisting of an autoencoder trained with a novel latent-space loss function.


Reciprocal Adversarial Learning via Characteristic Functions

We generalise this by comparing the distributions rather than their moments via a powerful tool, i.e., the characteristic function (CF), which uniquely and universally comprises all the information about a distribution.


Statistical Guarantees of Distributed Nearest Neighbor Classification

Through majority voting, the distributed nearest neighbor classifier achieves the same rate of convergence as its oracle version in terms of the regret, up to a multiplicative constant that depends solely on the data dimension.
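The voting scheme is easy to sketch: shard the data across machines, run a plain nearest-neighbor classifier on each shard, and combine the local predictions by majority vote. A minimal NumPy illustration (the function names and toy data are ours, not the paper's):

```python
import numpy as np

def nn_predict(X_train, y_train, x):
    # Plain 1-nearest-neighbor prediction on one local shard.
    d = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argmin(d)]

def distributed_nn_predict(X, y, x, n_machines=5, seed=0):
    # Shard the data uniformly at random, classify locally on each
    # machine, then aggregate the local labels by majority vote.
    rng = np.random.default_rng(seed)
    shards = np.array_split(rng.permutation(len(X)), n_machines)
    votes = np.array([nn_predict(X[s], y[s], x) for s in shards])
    return np.bincount(votes).argmax()

# Toy 2-class data: class 0 near the origin, class 1 near (5, 5).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
pred = distributed_nn_predict(X, y, np.array([4.5, 5.2]))
print(pred)  # the majority vote agrees with the oracle 1-NN answer: class 1
```

The paper's result says this aggregate matches the oracle (single-machine) classifier's rate of convergence up to a dimension-dependent constant.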


Stein Self-Repulsive Dynamics: Benefits From Past Samples

We propose a new Stein self-repulsive dynamics for obtaining diversified samples from intractable un-normalized distributions.


The Statistical Complexity of Early-Stopped Mirror Descent

In this paper, we study the statistical guarantees on the excess risk achieved by early-stopped unconstrained mirror descent algorithms applied to the unregularized empirical risk with the squared loss for linear models and kernel methods.


Algorithmic recourse under imperfect causal knowledge: a probabilistic approach

To address this limitation, we propose two probabilistic approaches to select optimal actions that achieve recourse with high probability given limited causal knowledge (e.g., only the causal graph).


Quantitative Propagation of Chaos for SGD in Wide Neural Networks

In this paper, we investigate the limiting behavior of a continuous-time counterpart of the Stochastic Gradient Descent (SGD) algorithm applied to two-layer overparameterized neural networks, as the number of neurons (i.e., the size of the hidden layer) $N \to +\infty$.


A Causal View on Robustness of Neural Networks

We present a causal view on the robustness of neural networks against input manipulations, which applies not only to traditional classification tasks but also to general measurement data.


Minimax Classification with 0-1 Loss and Performance Guarantees

This paper presents minimax risk classifiers (MRCs) that do not rely on a choice of surrogate loss and family of rules.


How to Learn a Useful Critic? Model-based Action-Gradient-Estimator Policy Optimization

In this paper, we propose MAGE, a model-based actor-critic algorithm, grounded in the theory of policy gradients, which explicitly learns the action-value gradient.


Coresets for Regressions with Panel Data

This paper introduces the problem of coresets for regression problems to panel data settings.


Learning Composable Energy Surrogates for PDE Order Reduction

To address this, we leverage parametric modular structure to learn component-level surrogates, enabling cheaper high-fidelity simulation.


Efficient Contextual Bandits with Continuous Actions

We create a computationally tractable learning algorithm for contextual bandits with continuous actions having unknown structure.


Achieving Equalized Odds by Resampling Sensitive Attributes

We present a flexible framework for learning predictive models that approximately satisfy the equalized odds notion of fairness.


Multi-Robot Collision Avoidance under Uncertainty with Probabilistic Safety Barrier Certificates

This paper aims to propose a collision avoidance method that accounts for both measurement uncertainty and motion uncertainty.


Hard Shape-Constrained Kernel Machines

In this paper, we prove that hard affine shape constraints on function derivatives can be encoded in kernel machines which represent one of the most flexible and powerful tools in machine learning and statistics.


A Closer Look at the Training Strategy for Modern Meta-Learning

The support/query (S/Q) episodic training strategy has been widely used in modern meta-learning algorithms and is believed to improve their generalization ability to test environments. This paper conducts a theoretical investigation of this training strategy on generalization.
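For readers unfamiliar with the setup: in S/Q episodic training, each episode samples a few classes, a small labeled support set per class, and a disjoint query set the learner is evaluated on. A minimal sampling sketch (function names and toy data are ours):

```python
import numpy as np

def sample_episode(X, y, n_way=2, k_shot=2, n_query=2, seed=0):
    # Draw one S/Q episode: n_way classes, then k_shot support and
    # n_query query examples per class, kept disjoint.
    rng = np.random.default_rng(seed)
    classes = rng.choice(np.unique(y), size=n_way, replace=False)
    support, query = [], []
    for c in classes:
        idx = rng.permutation(np.flatnonzero(y == c))
        support += list(idx[:k_shot])
        query += list(idx[k_shot:k_shot + n_query])
    return X[support], y[support], X[query], y[query]

X = np.arange(40).reshape(20, 2)
y = np.repeat(np.arange(4), 5)      # 4 classes, 5 examples each
Xs, ys, Xq, yq = sample_episode(X, y)
print(Xs.shape, Xq.shape)  # (4, 2) (4, 2): 2 classes x 2 examples each
```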


On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law

We provide short- and long-term solutions to avoid these pitfalls and realize the benefits of OOD evaluation.


Generalised Bayesian Filtering via Sequential Monte Carlo

We introduce a framework for inference in general state-space hidden Markov models (HMMs) under likelihood misspecification.


Deterministic Approximation for Submodular Maximization over a Matroid in Nearly Linear Time

We study the problem of maximizing a non-monotone, non-negative submodular function subject to a matroid constraint.


Flows for simultaneous manifold learning and density estimation

We introduce manifold-learning flows (ℳ-flows), a new class of generative models that simultaneously learn the data manifold as well as a tractable probability density on that manifold.


Simultaneous Preference and Metric Learning from Paired Comparisons

In this paper, we consider the problem of learning an ideal point representation of a user’s preferences when the distance metric is an unknown Mahalanobis metric.


Efficient Variational Inference for Sparse Deep Learning with Theoretical Guarantee

In this paper, we train sparse deep neural networks with a fully Bayesian treatment under spike-and-slab priors, and develop a set of computationally efficient variational inferences via continuous relaxation of Bernoulli distribution.


Learning Manifold Implicitly via Explicit Heat-Kernel Learning

In this paper, we propose the concept of implicit manifold learning, where manifold information is implicitly obtained by learning the associated heat kernel.


Deep Relational Topic Modeling via Graph Poisson Gamma Belief Network

To better utilize the document network, we first propose graph Poisson factor analysis (GPFA) that constructs a probabilistic model for interconnected documents and also provides closed-form Gibbs sampling update equations, moving beyond sophisticated approximate assumptions of existing RTMs.


One-bit Supervision for Image Classification

This paper presents one-bit supervision, a novel setting of learning from incomplete annotations, in the scenario of image classification.


What is being transferred in transfer learning?

In this paper, we provide new tools and analysis to address these fundamental questions.


Submodular Maximization Through Barrier Functions

In this paper, we introduce a novel technique for constrained submodular maximization, inspired by barrier functions in continuous optimization.


Neural Networks with Recurrent Generative Feedback

The proposed framework, termed Convolutional Neural Networks with Feedback (CNN-F), introduces a generative feedback with latent variables to existing CNN architectures, where consistent predictions are made through alternating MAP inference under a Bayesian framework.


Learning to Extrapolate Knowledge: Transductive Few-shot Out-of-Graph Link Prediction

Motivated by this challenge, we introduce a realistic problem of few-shot out-of-graph link prediction, where we not only predict the links between the seen and unseen nodes as in a conventional out-of-knowledge link prediction task but also between the unseen nodes, with only few edges per node.


Exploiting weakly supervised visual patterns to learn from partial annotations

Instead, in this paper, we exploit relationships among images and labels to derive more supervisory signal from the un-annotated labels.


Improving Inference for Neural Image Compression

We consider the problem of lossy image compression with deep latent variable models.


Neuron Merging: Compensating for Pruned Neurons

In this work, we propose a novel concept of neuron merging applicable to both fully connected layers and convolution layers, which compensates for the information loss due to the pruned neurons/filters.


FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence

In this paper we propose FixMatch, an algorithm that is a significant simplification of existing SSL methods.
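The core of FixMatch fits in a few lines: pseudo-label an unlabeled image from its weakly-augmented prediction only when the model is confident, then train the strongly-augmented view against that pseudo-label. A minimal NumPy sketch of the masking step (the threshold and probabilities below are illustrative):

```python
import numpy as np

def fixmatch_pseudo_labels(weak_probs, threshold=0.95):
    # Keep only confident predictions on weakly-augmented views as
    # pseudo-label targets; unconfident examples are masked out.
    conf = weak_probs.max(axis=1)
    mask = conf >= threshold
    return weak_probs.argmax(axis=1)[mask], mask

# Softmax outputs of the model on three weakly-augmented unlabeled images.
weak_probs = np.array([[0.98, 0.01, 0.01],   # confident -> pseudo-label 0
                       [0.50, 0.30, 0.20],   # unconfident -> ignored
                       [0.02, 0.02, 0.96]])  # confident -> pseudo-label 2
targets, mask = fixmatch_pseudo_labels(weak_probs)
print(targets, mask)  # [0 2] [ True False  True]
```

The unlabeled-data loss is then the cross-entropy between the model's predictions on the strongly-augmented views and these retained targets.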


Reinforcement Learning with Combinatorial Actions: An Application to Vehicle Routing

We develop a framework for value-function-based deep reinforcement learning with a combinatorial action space, in which the action selection problem is explicitly formulated as a mixed-integer optimization problem.


Towards Playing Full MOBA Games with Deep Reinforcement Learning

In this paper, we propose a MOBA AI learning paradigm that methodologically enables playing full MOBA games with deep reinforcement learning.


Rankmax: An Adaptive Projection Alternative to the Softmax Function

In this work, we propose a method that adapts this parameter to individual training examples.


Online Agnostic Boosting via Regret Minimization

In this work we provide the first agnostic online boosting algorithm; that is, given a weak learner with only marginally-better-than-trivial regret guarantees, our algorithm boosts it to a strong learner with sublinear regret.


Causal Intervention for Weakly-Supervised Semantic Segmentation

We present a causal inference framework to improve Weakly-Supervised Semantic Segmentation (WSSS).


Belief Propagation Neural Networks

To bridge this gap, we introduce belief propagation neural networks (BPNNs), a class of parameterized operators that operate on factor graphs and generalize Belief Propagation (BP).


Over-parameterized Adversarial Training: An Analysis Overcoming the Curse of Dimensionality

Our work proves convergence to low robust training loss for \emph{polynomial} width instead of exponential, under natural assumptions and with ReLU activations.


Post-training Iterative Hierarchical Data Augmentation for Deep Networks

In this paper, we propose a new iterative hierarchical data augmentation (IHDA) method to fine-tune trained deep neural networks to improve their generalization performance.


Debugging Tests for Model Explanations

We investigate whether post-hoc model explanations are effective for diagnosing model errors, i.e., for model debugging.


Robust compressed sensing using generative models

In this paper we propose an algorithm inspired by the Median-of-Means (MOM).


Fairness without Demographics through Adversarially Reweighted Learning

In this work we address this problem by proposing Adversarially Reweighted Learning (ARL).


Stochastic Latent Actor-Critic: Deep Reinforcement Learning with a Latent Variable Model

In this work, we tackle these two problems separately, by explicitly learning latent representations that can accelerate reinforcement learning from images.


Ridge Rider: Finding Diverse Solutions by Following Eigenvectors of the Hessian

In this paper, we present a different approach. Rather than following the gradient, which corresponds to a locally greedy direction, we instead follow the eigenvectors of the Hessian.
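As a toy illustration, at a saddle point the gradient vanishes, yet the Hessian's eigenvectors still single out distinct directions to follow. A finite-difference sketch (the paper itself works with exact Hessian-vector products; this is only illustrative):

```python
import numpy as np

def hessian(f, x, eps=1e-4):
    # Finite-difference Hessian of a scalar function (for illustration).
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = np.eye(n)[i], np.eye(n)[j]
            H[i, j] = (f(x + eps * ei + eps * ej) - f(x + eps * ei)
                       - f(x + eps * ej) + f(x)) / eps ** 2
    return H

# A saddle at the origin: gradient descent stalls there, but the
# Hessian's eigenvectors give a descending and an ascending "ridge".
f = lambda x: x[0] ** 2 - x[1] ** 2
eigvals, eigvecs = np.linalg.eigh(hessian(f, np.zeros(2)))
print(np.round(eigvals, 3))  # one negative, one positive curvature direction
```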


The route to chaos in routing games: When is price of anarchy too optimistic?

We study MWU using the actual game costs without applying cost normalization to $[0,1]$.


Online Algorithm for Unsupervised Sequential Selection with Contextual Information

In this paper, we study Contextual Unsupervised Sequential Selection (USS), a new variant of the stochastic contextual bandits problem where the loss of an arm cannot be inferred from the observed feedback.


Adapting Neural Architectures Between Domains

This paper aims to improve the generalization of neural architectures via domain adaptation.


What went wrong and when? Instance-wise feature importance for time-series black-box models

We propose FIT, a framework that evaluates the importance of observations for a multivariate time-series black-box model by quantifying the shift in the predictive distribution over time.


Towards Better Generalization of Adaptive Gradient Methods

To close this gap, we propose \textit{\textbf{S}table \textbf{A}daptive \textbf{G}radient \textbf{D}escent} (\textsc{SAGD}) for nonconvex optimization which leverages differential privacy to boost the generalization performance of adaptive gradient methods.


Learning Guidance Rewards with Trajectory-space Smoothing

This paper is in the same vein — starting with a surrogate RL objective that involves smoothing in the trajectory-space, we arrive at a new algorithm for learning guidance rewards.


Variance Reduction via Accelerated Dual Averaging for Finite-Sum Optimization

In this paper, we introduce a simplified and unified method for finite-sum convex optimization, named \emph{Variance Reduction via Accelerated Dual Averaging (VRADA)}.


Tree! I am no Tree! I am a low dimensional Hyperbolic Embedding

In this paper, we explore a new method for learning hyperbolic representations by taking a metric-first approach.


Deep Structural Causal Models for Tractable Counterfactual Inference

We formulate a general framework for building structural causal models (SCMs) with deep learning components.


Convolutional Generation of Textured 3D Meshes

A key contribution of our work is the encoding of the mesh and texture as 2D representations, which are semantically aligned and can be easily modeled by a 2D convolutional GAN.


A Statistical Framework for Low-bitwidth Training of Deep Neural Networks

In this paper, we address this problem by presenting a statistical framework for analyzing FQT algorithms.


Better Set Representations For Relational Reasoning

To resolve this limitation, we propose a simple and general network module called Set Refiner Network (SRN).


AutoSync: Learning to Synchronize for Data-Parallel Distributed Deep Learning

In this paper, we develop a model- and resource-dependent representation for synchronization, which unifies multiple synchronization aspects ranging from architecture, message partitioning, placement scheme, to communication topology.


A Combinatorial Perspective on Transfer Learning

In this work we study how the learning of modular solutions can allow for effective generalization to both unseen and potentially differently distributed data.


Hardness of Learning Neural Networks with Natural Weights

We prove negative results in this regard, and show that for depth-$2$ networks and many “natural” weight distributions, such as the normal and the uniform distribution, most networks are hard to learn.


Higher-Order Spectral Clustering of Directed Graphs

Based on the Hermitian matrix representation of digraphs, we present a nearly-linear time algorithm for digraph clustering, and further show that our proposed algorithm can be implemented in sublinear time under reasonable assumptions.


Primal-Dual Mesh Convolutional Neural Networks

We propose a method that combines the advantages of both types of approaches, while addressing their limitations: we extend a primal-dual framework drawn from the graph-neural-network literature to triangle meshes, and define convolutions on two types of graphs constructed from an input mesh.


The Advantage of Conditional Meta-Learning for Biased Regularization and Fine Tuning

We address this limitation by conditional meta-learning, inferring a conditioning function that maps a task’s side information into a meta-parameter vector appropriate for the task at hand.


Watch out! Motion is Blurring the Vision of Your Deep Neural Networks

We propose a novel adversarial attack method that can generate visually natural motion-blurred adversarial examples, named motion-based adversarial blur attack (ABBA).


Sinkhorn Barycenter via Functional Gradient Descent

In this paper, we consider the problem of computing the barycenter of a set of probability distributions under the Sinkhorn divergence.


Coresets for Near-Convex Functions

We suggest a generic framework for computing sensitivities (and thus coresets) for wide family of loss functions which we call near-convex functions.


Bayesian Deep Ensembles via the Neural Tangent Kernel

We introduce a simple modification to standard deep ensembles training, through addition of a computationally-tractable, randomised and untrainable function to each ensemble member, that enables a posterior interpretation in the infinite width limit.


Improved Schemes for Episodic Memory-based Lifelong Learning

In this paper, we provide the first unified view of episodic memory based approaches from an optimization perspective.


Adaptive Sampling for Stochastic Risk-Averse Learning

We propose an adaptive sampling algorithm for stochastically optimizing the {\em Conditional Value-at-Risk (CVaR)} of a loss distribution, which measures its performance on the $\alpha$ fraction of most difficult examples.
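Empirically, the CVaR at level $\alpha$ is just the average loss over the $\alpha$-fraction of hardest examples; a short sketch (the loss values are a made-up example):

```python
import numpy as np

def cvar(losses, alpha):
    # Mean loss over the alpha-fraction of largest (hardest) losses.
    k = max(1, int(np.ceil(alpha * len(losses))))
    return np.sort(losses)[-k:].mean()

losses = np.array([0.1, 0.2, 0.3, 0.4, 5.0, 6.0, 0.2, 0.1, 0.3, 0.2])
avg, tail = losses.mean(), cvar(losses, alpha=0.2)
print(avg, tail)  # the tail average 5.5 dwarfs the overall mean
```

Optimizing this tail average rather than the mean is what makes the resulting model risk-averse.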


Deep Wiener Deconvolution: Wiener Meets Deep Learning for Image Deblurring

We present a simple and effective approach for non-blind image deblurring, combining classical techniques and deep learning.
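The classical ingredient here is the Wiener filter, which deconvolves in the frequency domain while damping frequencies the blur almost destroyed. A 1-D NumPy sketch (the signal, box kernel, and SNR value are our toy choices, not the paper's):

```python
import numpy as np

def wiener_deconvolve(blurred, kernel, snr=1e3):
    # Frequency-domain Wiener filter: conj(K) / (|K|^2 + 1/snr).
    # The 1/snr term prevents blow-up where |K| is near zero.
    n = len(blurred)
    K = np.fft.fft(kernel, n)
    W = np.conj(K) / (np.abs(K) ** 2 + 1.0 / snr)
    return np.real(np.fft.ifft(W * np.fft.fft(blurred)))

# Blur a two-spike signal with a box kernel, then restore it.
signal = np.zeros(64)
signal[[10, 30]] = 1.0
kernel = np.ones(5) / 5
blurred = np.real(np.fft.ifft(np.fft.fft(signal) * np.fft.fft(kernel, 64)))
restored = wiener_deconvolve(blurred, kernel)
peaks = sorted(np.argsort(restored)[-2:].tolist())
print(peaks)  # the two spike locations are recovered: [10, 30]
```

The paper's contribution is feeding such a classical deconvolution into a deep network that refines the result in feature space.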


Discovering Reinforcement Learning Algorithms

This paper introduces a new meta-learning approach that discovers an entire update rule which includes both ‘what to predict’ (e.g. value functions) and ‘how to learn from it’ (e.g. bootstrapping) by interacting with a set of environments.


Taming Discrete Integration via the Boon of Dimensionality

The key contribution of this work addresses this scalability challenge via an efficient reduction of discrete integration to model counting.


Blind Video Temporal Consistency via Deep Video Prior

To address this issue, we present a novel and general approach for blind video temporal consistency.


Simplify and Robustify Negative Sampling for Implicit Collaborative Filtering

In this paper, we first provide a novel understanding of negative instances by empirically observing that only a few instances are potentially important for model learning, and false negatives tend to have stable predictions over many training iterations.


Model Selection for Production System via Automated Online Experiments

We propose an automated online experimentation mechanism that can efficiently perform model selection from a large pool of models with a small number of online experiments.


On the Almost Sure Convergence of Stochastic Gradient Descent in Non-Convex Problems

In this paper, we analyze the trajectories of stochastic gradient descent (SGD) with the aim of understanding their convergence properties in non-convex problems.


Automatic Perturbation Analysis for Scalable Certified Robustness and Beyond

In this paper, we develop an automatic framework to enable perturbation analysis on any neural network structures, by generalizing existing LiRPA algorithms such as CROWN to operate on general computational graphs.


Adaptation Properties Allow Identification of Optimized Neural Codes

Here we solve an inverse problem: characterizing the objective and constraint functions that efficient codes appear to be optimal for, on the basis of how they adapt to different stimulus distributions.


Global Convergence and Variance Reduction for a Class of Nonconvex-Nonconcave Minimax Problems

In this work, we show that for a subclass of nonconvex-nonconcave objectives satisfying a so-called two-sided Polyak-{\L}ojasiewicz inequality, the alternating gradient descent ascent (AGDA) algorithm converges globally at a linear rate and the stochastic AGDA achieves a sublinear rate.


Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity

In this paper, we aim to address the fundamental open question about the sample complexity of model-based MARL.


Conservative Q-Learning for Offline Reinforcement Learning

In this paper, we propose conservative Q-learning (CQL), which aims to address these limitations by learning a conservative Q-function such that the expected value of a policy under this Q-function lower-bounds its true value.


Online Influence Maximization under Linear Threshold Model

In this paper, we address OIM in the linear threshold (LT) model.


Ensembling geophysical models with Bayesian Neural Networks

We develop a novel data-driven ensembling strategy for combining geophysical models using Bayesian Neural Networks, which infers spatiotemporally varying model weights and bias while accounting for heteroscedastic uncertainties in the observations.


Delving into the Cyclic Mechanism in Semi-supervised Video Object Segmentation

In this paper, we attempt to incorporate a cyclic mechanism into the task of semi-supervised video object segmentation.


Asymmetric Shapley values: incorporating causal knowledge into model-agnostic explainability

We introduce a less restrictive framework, Asymmetric Shapley values (ASVs), which are rigorously founded on a set of axioms, applicable to any AI system, and can flexibly incorporate any causal structure known to be respected by the data.


Understanding Deep Architecture with Reasoning Layer

In this paper, we take an initial step toward an understanding of such hybrid deep architectures by showing that properties of the algorithm layers, such as convergence, stability and sensitivity, are intimately related to the approximation and generalization abilities of the end-to-end model.


Planning in Markov Decision Processes with Gap-Dependent Sample Complexity

We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algorithm for planning in a Markov Decision Process in which transitions have a finite support.


Provably Good Batch Off-Policy Reinforcement Learning Without Great Exploration

We show that using \emph{pessimistic value estimates} in the low-data regions in Bellman optimality and evaluation back-up can yield more adaptive and stronger guarantees when the concentrability assumption does not hold.


Detection as Regression: Certified Object Detection with Median Smoothing

This work is motivated by recent progress on certified classification by randomized smoothing. We start by presenting a reduction from object detection to a regression problem.
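Median smoothing itself is simple: evaluate the base regressor on many Gaussian-perturbed copies of the input and take the per-coordinate median, which is robust because a few wildly wrong predictions cannot move the median far. A toy sketch (the "detector" below is a stand-in of ours, not an actual object detector):

```python
import numpy as np

def median_smooth(regressor, x, sigma=0.1, n=1000, seed=0):
    # Run the base regressor under Gaussian input noise and return
    # the per-coordinate median of its predictions.
    rng = np.random.default_rng(seed)
    preds = np.array([regressor(x + sigma * rng.standard_normal(x.shape))
                      for _ in range(n)])
    return np.median(preds, axis=0)

# A stand-in regressor for one box coordinate that is wildly wrong on a
# thin slice of inputs (a crude model of adversarial brittleness).
def brittle_regressor(x):
    if abs(x[0] - 0.5) < 1e-3:
        return np.array([100.0])
    return 2.0 * x[:1]

smoothed = median_smooth(brittle_regressor, np.array([0.5]))
print(smoothed)  # close to the "clean" answer 1.0 despite the outliers
```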


Contextual Reserve Price Optimization in Auctions via Mixed Integer Programming

We study the problem of learning a linear model to set the reserve price in an auction, given contextual information, in order to maximize expected revenue from the seller side.


ExpandNets: Linear Over-parameterization to Train Compact Convolutional Networks

We introduce an approach to training a given compact network.


FleXOR: Trainable Fractional Quantization

In this paper, we propose an encryption algorithm/architecture to compress quantized weights so as to achieve fractional numbers of bits per weight.


The Implications of Local Correlation on Learning Some Deep Functions

We introduce a property of distributions, denoted “local correlation”, which requires that small patches of the input image and of intermediate layers of the target function are correlated to the target label.


Learning to search efficiently for causally near-optimal treatments

We formalize this problem as learning a policy for finding a near-optimal treatment in a minimum number of trials using a causal inference framework.


A Game Theoretic Analysis of Additive Adversarial Attacks and Defenses

In this paper, we propose a game-theoretic framework for studying attacks and defenses which exist in equilibrium.


Posterior Network: Uncertainty Estimation without OOD Samples via Density-Based Pseudo-Counts

In this work we propose the Posterior Network (PostNet), which uses Normalizing Flows to predict an individual closed-form posterior distribution over predicted probabilities for any input sample.


Recurrent Quantum Neural Networks

In this work we construct the first quantum recurrent neural network (QRNN) with demonstrable performance on non-trivial tasks such as sequence learning and integer digit classification.


No-Regret Learning and Mixed Nash Equilibria: They Do Not Mix

In this paper, we study the dynamics of follow the regularized leader (FTRL), arguably the most well-studied class of no-regret dynamics, and we establish a sweeping negative result showing that the notion of mixed Nash equilibrium is antithetical to no-regret learning.


A Unifying View of Optimism in Episodic Reinforcement Learning

In this paper we provide a general framework for designing, analyzing and implementing such algorithms in the episodic reinforcement learning problem.


Continuous Submodular Maximization: Beyond DR-Submodularity

In this paper, we propose the first continuous optimization algorithms that achieve a constant factor approximation guarantee for the problem of monotone continuous submodular maximization subject to a linear constraint.


An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits

In this paper, we follow recent approaches of deriving asymptotically optimal algorithms from problem-dependent regret lower bounds and we introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.


Assessing SATNet's Ability to Solve the Symbol Grounding Problem

In this paper, we clarify SATNet’s capabilities by showing that in the absence of intermediate labels that identify individual Sudoku digit images with their logical representations, SATNet completely fails at visual Sudoku (0% test accuracy).


A Bayesian Nonparametrics View into Deep Representations

We investigate neural network representations from a probabilistic perspective.


On the Similarity between the Laplace and Neural Tangent Kernels

Here we show that NTK for fully connected networks with ReLU activation is closely related to the standard Laplace kernel.


A causal view of compositional zero-shot recognition

Here we describe an approach for compositional generalization that builds on causal ideas.


HiPPO: Recurrent Memory with Optimal Polynomial Projections

We introduce a general framework (HiPPO) for the online compression of continuous signals and discrete time series by projection onto polynomial bases.
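The underlying operation is classical: approximate the history seen so far by its first N coefficients in an orthogonal polynomial basis. A static (non-online) NumPy sketch with a Legendre basis; HiPPO's contribution is performing this projection online and optimally, which this toy omits:

```python
import numpy as np
from numpy.polynomial import legendre

# A signal observed on [-1, 1], compressed to N = 8 Legendre coefficients.
t = np.linspace(-1, 1, 200)
signal = np.sin(3 * t) + 0.5 * t ** 2

coeffs = legendre.legfit(t, signal, deg=7)   # least-squares projection
recon = legendre.legval(t, coeffs)           # reconstruct from 8 numbers
err = float(np.abs(signal - recon).max())
print(err < 0.01)  # a smooth signal is captured to high accuracy
```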


Auto Learning Attention

In this paper, we devise an Auto Learning Attention (AutoLA) method, which is the first attempt on automatic attention design.


CASTLE: Regularization via Auxiliary Causal Graph Discovery

We introduce Causal Structure Learning (CASTLE) regularization and propose to regularize a neural network by jointly learning the causal relationships between variables.


Long-Tailed Classification by Keeping the Good and Removing the Bad Momentum Causal Effect

In this paper, we establish a causal inference framework, which not only unravels the whys of previous methods, but also derives a new principled solution.


Explainable Voting

We prove, however, that outcomes of the important Borda rule can be explained using O(m^2) steps, where m is the number of alternatives.
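As a reminder, the Borda rule gives an alternative m-1 points for each first-place vote, m-2 for each second place, and so on; the winner maximizes the total. A small Python example with hypothetical ballots:

```python
def borda_winner(rankings):
    # With m alternatives, position p in a ballot earns m - 1 - p points.
    m = len(rankings[0])
    scores = {}
    for ranking in rankings:
        for pos, alt in enumerate(ranking):
            scores[alt] = scores.get(alt, 0) + (m - 1 - pos)
    return max(scores, key=scores.get), scores

# Three voters ranking alternatives a, b, c from best to worst.
winner, scores = borda_winner([["a", "b", "c"],
                               ["a", "c", "b"],
                               ["b", "a", "c"]])
print(winner, scores)  # a {'a': 5, 'b': 3, 'c': 1}
```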


Deep Archimedean Copulas

In this paper, we introduce ACNet, a novel differentiable neural network architecture that enforces structural properties and enables one to learn an important class of copulas: Archimedean copulas.


Re-Examining Linear Embeddings for High-Dimensional Bayesian Optimization

In this paper, we identify several crucial issues and misconceptions about the use of linear embeddings for BO.


UnModNet: Learning to Unwrap a Modulo Image for High Dynamic Range Imaging

In this paper, we reformulate the modulo image unwrapping problem into a series of binary labeling problems and propose a modulo edge-aware model, named UnModNet, to iteratively estimate the binary rollover masks of the modulo image for unwrapping.


Thunder: a Fast Coordinate Selection Solver for Sparse Learning

In this paper, we propose a novel active incremental approach to further improve the efficiency of the solvers.


Neural Networks Fail to Learn Periodic Functions and How to Fix It

To fix this problem, we propose a new activation, namely $x + \sin^2(x)$, which achieves the desired periodic inductive bias to learn a periodic function while maintaining the favorable optimization properties of ReLU-based activations.
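The proposed activation is a one-line change; a minimal NumPy sketch (the paper's analysis of its optimization properties is not reproduced here):

```python
import numpy as np

def snake(x):
    """x + sin^2(x): a linear trend plus a periodic component."""
    return x + np.sin(x) ** 2

# Since sin^2(x) = (1 - cos(2x)) / 2, the residual snake(x) - x
# is periodic with period pi, which is the periodic inductive bias.
x = np.linspace(-3, 3, 7)
activations = snake(x)
```

The linear term keeps gradients from vanishing far from the origin (as in ReLU), while the bounded periodic term supplies the oscillatory structure a plain ReLU network cannot extrapolate.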


Distribution Matching for Crowd Counting

In this paper, we show that imposing Gaussians to annotations hurts generalization performance.


Correspondence learning via linearly-invariant embedding

In this paper, we propose a fully differentiable pipeline for estimating accurate dense correspondences between 3D point clouds.


Learning to Dispatch for Job Shop Scheduling via Deep Reinforcement Learning

In this paper, we propose to automatically learn PDRs via an end-to-end deep reinforcement learning agent.


On Adaptive Attacks to Adversarial Example Defenses

While prior evaluation papers focused mainly on the end result—showing that a defense was ineffective—this paper focuses on laying out the methodology and the approach necessary to perform an adaptive attack.


Sinkhorn Natural Gradient for Generative Models

In this regard, we propose a novel Sinkhorn Natural Gradient (SiNG) algorithm which acts as a steepest descent method on the probability space endowed with the Sinkhorn divergence.


Online Sinkhorn: Optimal Transport distances from sample streams

This paper introduces a new online estimator of entropy-regularized OT distances between two such arbitrary distributions.


Ultrahyperbolic Representation Learning

In this paper, we propose a representation living on a pseudo-Riemannian manifold of constant nonzero curvature.


Locally-Adaptive Nonparametric Online Learning

We fill this gap by introducing efficient online algorithms (based on a single versatile master algorithm) each adapting to one of the following regularities: (i) local Lipschitzness of the competitor function, (ii) local metric dimension of the instance sequence, (iii) local performance of the predictor across different regions of the instance space.


Compositional Generalization via Neural-Symbolic Stack Machines

To tackle this issue, we propose the Neural-Symbolic Stack Machine (NeSS).


Graphon Neural Networks and the Transferability of Graph Neural Networks

In this paper we introduce graphon NNs as limit objects of GNNs and prove a bound on the difference between the output of a GNN and its limit graphon-NN.


Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandit with Many Arms

We study the structure of regret-minimizing policies in the many-armed Bayesian multi-armed bandit problem: in particular, with $k$ the number of arms and $T$ the time horizon, we consider the case where $k \geq \sqrt{T}$.
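The greedy policy under study is the simplest possible bandit algorithm: try each arm once, then always play the empirical best. A minimal sketch with assumed Bernoulli arms:

```python
import numpy as np

def greedy_bandit(means, T, rng):
    """Greedy policy: one initial pull per arm, then always play the
    arm with the highest empirical mean. means: true Bernoulli
    success probabilities (an illustrative reward model)."""
    k = len(means)
    pulls = np.ones(k)
    sums = rng.binomial(1, means).astype(float)  # one pull per arm
    for _ in range(T - k):
        arm = int(np.argmax(sums / pulls))
        sums[arm] += rng.binomial(1, means[arm])
        pulls[arm] += 1
    return pulls

rng = np.random.default_rng(0)
pulls = greedy_bandit(np.array([0.2, 0.5, 0.9]), T=2000, rng=rng)
```

With few arms, greedy can get stuck on a suboptimal arm whose first pull happened to succeed; the paper's point is that with many arms ($k \geq \sqrt{T}$) this naive policy is surprisingly competitive.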


Gamma-Models: Generative Temporal Difference Learning for Infinite-Horizon Prediction

We introduce the gamma-model, a predictive model of environment dynamics with an infinite, probabilistic horizon.


Deep Transformers with Latent Depth

We present a probabilistic framework to automatically learn which layer(s) to use by learning the posterior distributions of layer selection.


Neural Mesh Flow: 3D Manifold Mesh Generation via Diffeomorphic Flows

In this work, we propose NeuralMeshFlow (NMF) to generate two-manifold meshes for genus-0 shapes.


Statistical control for spatio-temporal MEG/EEG source imaging with desparsified multi-task Lasso

To deal with this, we adapt the desparsified Lasso estimator, an estimator tailored for high-dimensional linear models that asymptotically follows a Gaussian distribution under sparsity and moderate feature correlation assumptions, to temporal data corrupted with autocorrelated noise.


A Scalable MIP-based Method for Learning Optimal Multivariate Decision Trees

In this paper, we propose a novel MIP formulation, based on 1-norm support vector machine model, to train a binary oblique ODT for classification problems.


Efficient Exact Verification of Binarized Neural Networks

We present a new system, EEV, for efficient and exact verification of BNNs.


Ultra-Low Precision 4-bit Training of Deep Neural Networks

In this paper, we propose a number of novel techniques and numerical representation formats that enable, for the very first time, the precision of training systems to be aggressively scaled from 8-bits to 4-bits.


Bridging the Gap between Sample-based and One-shot Neural Architecture Search with BONAS

In this work, we propose BONAS (Bayesian Optimized Neural Architecture Search), a sample-based NAS framework which is accelerated using weight-sharing to evaluate multiple related architectures simultaneously.


On Numerosity of Deep Neural Networks

Recently, a provocative claim was published that number sense spontaneously emerges in a deep neural network trained merely for visual object recognition. If true, this has far-reaching significance for the fields of machine learning and cognitive science alike. In this paper, we prove the above claim to be unfortunately incorrect.


Outlier Robust Mean Estimation with Subgaussian Rates via Stability

We study the problem of outlier robust high-dimensional mean estimation under a bounded covariance assumption, and more broadly under bounded low-degree moment assumptions.


Self-Supervised Relationship Probing

In this work, we introduce a self-supervised method that implicitly learns the visual relationships without relying on any ground-truth visual relationship annotations.


Information Theoretic Counterfactual Learning from Missing-Not-At-Random Feedback

To circumvent the use of RCTs, we build an information theoretic counterfactual variational information bottleneck (CVIB), as an alternative for debiasing learning without RCTs.


Prophet Attention: Predicting Attention with Future Attention

In this paper, we propose Prophet Attention, a mechanism similar in form to self-supervision.


Language Models are Few-Shot Learners

Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting.


Margins are Insufficient for Explaining Gradient Boosting

In this work, we first demonstrate that the k’th margin bound is inadequate in explaining the performance of state-of-the-art gradient boosters. We then explain the shortcomings of the k’th margin bound and prove a stronger and more refined margin-based generalization bound that indeed succeeds in explaining the performance of modern gradient boosters.


Fourier-transform-based attribution priors improve the interpretability and stability of deep learning models for genomics

To address these shortcomings, we propose a novel attribution prior, where the Fourier transform of input-level attribution scores is computed at training time, and high-frequency components of the Fourier spectrum are penalized.
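The prior can be sketched generically: take the Fourier transform of a 1D attribution track and penalize the spectral mass above a cutoff frequency. The cutoff fraction and the plain magnitude sum below are illustrative choices, not the paper's exact penalty:

```python
import numpy as np

def high_freq_penalty(attributions, cutoff_frac=0.2):
    """Sum of Fourier magnitudes above a cutoff frequency.
    attributions: 1D array of input-level attribution scores.
    cutoff_frac is a hypothetical hyperparameter."""
    spectrum = np.abs(np.fft.rfft(attributions))
    cutoff = int(cutoff_frac * len(spectrum))
    return spectrum[cutoff:].sum()

t = np.linspace(0, 1, 256, endpoint=False)
smooth = np.sin(2 * np.pi * 2 * t)                  # low-frequency attribution profile
noisy = smooth + 0.5 * np.sin(2 * np.pi * 60 * t)   # same profile with a high-frequency component
# the noisy profile incurs a larger penalty, so training is pushed toward smooth attributions
```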


MomentumRNN: Integrating Momentum into Recurrent Neural Networks

We theoretically prove and numerically demonstrate that MomentumRNNs alleviate the vanishing gradient issue in training RNNs.


Marginal Utility for Planning in Continuous or Large Discrete Action Spaces

In this paper we explore explicitly learning a candidate action generator by optimizing a novel objective, marginal utility.


Projected Stein Variational Gradient Descent

In this work, we propose a projected Stein variational gradient descent (pSVGD) method to overcome this challenge by exploiting the fundamental property of intrinsic low dimensionality of the data informed subspace stemming from ill-posedness of such problems.


Minimax Lower Bounds for Transfer Learning with Linear and One-hidden Layer Neural Networks

In this paper we develop a statistical minimax framework to characterize the fundamental limits of transfer learning in the context of regression with linear and one-hidden layer neural network models.


SE(3)-Transformers: 3D Roto-Translation Equivariant Attention Networks

We introduce the SE(3)-Transformer, a variant of the self-attention module for 3D point-clouds, which is equivariant under continuous 3D roto-translations.


On the equivalence of molecular graph convolution and molecular wave function with poor basis set

In this study, we demonstrate that the linear combination of atomic orbitals (LCAO), an approximation introduced by Pauling and Lennard-Jones in the 1920s, corresponds to graph convolutional networks (GCNs) for molecules.


The Power of Predictions in Online Control

We study the impact of predictions in online Linear Quadratic Regulator control with both stochastic and adversarial disturbances in the dynamics.


Learning Affordance Landscapes for Interaction Exploration in 3D Environments

We introduce a reinforcement learning approach for exploration for interaction, whereby an embodied agent autonomously discovers the affordance landscape of a new unmapped 3D environment (such as an unfamiliar kitchen).


Cooperative Multi-player Bandit Optimization

We design a distributed learning algorithm that overcomes the informational bias players have towards maximizing the rewards of nearby players, about whom they have more information.


Tight First- and Second-Order Regret Bounds for Adversarial Linear Bandits

We propose novel algorithms with first- and second-order regret bounds for adversarial linear bandits.


Just Pick a Sign: Optimizing Deep Multitask Models with Gradient Sign Dropout

We present Gradient Sign Dropout (GradDrop), a probabilistic masking procedure which samples gradients at an activation layer based on their level of consistency.


A Loss Function for Generative Neural Networks Based on Watson's Perceptual Model

We propose such a loss function based on Watson’s perceptual model, which computes a weighted distance in frequency space and accounts for luminance and contrast masking.


Dynamic Fusion of Eye Movement Data and Verbal Narrations in Knowledge-rich Domains

We propose to jointly analyze experts’ eye movements and verbal narrations to discover important and interpretable knowledge patterns to better understand their decision-making processes.


Scalable Multi-Agent Reinforcement Learning for Networked Systems with Average Reward

In this paper, we identify a rich class of networked MARL problems where the model exhibits a local dependence structure that allows it to be solved in a scalable manner.


Optimizing Neural Networks via Koopman Operator Theory

Koopman operator theory, a powerful framework for discovering the underlying dynamics of nonlinear dynamical systems, was recently shown to be intimately connected with neural network training. In this work, we take the first steps in making use of this connection.


SVGD as a kernelized Wasserstein gradient flow of the chi-squared divergence

We introduce a new perspective on SVGD that instead views SVGD as the kernelized gradient flow of the chi-squared divergence.
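For context, the standard SVGD update whose gradient-flow structure is being reinterpreted, for a kernel $k$ and target density $p$ approximated by particles $x_1,\dots,x_n$:

```latex
x_i \leftarrow x_i + \epsilon\,\hat{\phi}^{*}(x_i),
\qquad
\hat{\phi}^{*}(x) \;=\; \frac{1}{n}\sum_{j=1}^{n}
\Big[\, k(x_j, x)\,\nabla_{x_j}\log p(x_j) \;+\; \nabla_{x_j} k(x_j, x) \,\Big]
```

The first term pulls particles toward high-density regions; the second acts as a repulsive force that keeps them spread out.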


Adversarial Robustness of Supervised Sparse Coding

In this work, we strike a better balance by considering a model that involves learning a representation while at the same time giving a precise generalization bound and a robustness certificate.


Differentiable Meta-Learning of Bandit Policies

In this work, we learn such policies for an unknown distribution P using samples from P.


Biologically Inspired Mechanisms for Adversarial Robustness

In this work, we investigate the role of two biologically plausible mechanisms in adversarial robustness.


Statistical-Query Lower Bounds via Functional Gradients

For the specific problem of ReLU regression (equivalently, agnostically learning a ReLU), we show that any statistical-query algorithm with tolerance $n^{-(1/\epsilon)^b}$ must use at least $2^{n^c} \epsilon$ queries for some constants $b, c > 0$, where $n$ is the dimension and $\epsilon$ is the accuracy parameter.


Near-Optimal Reinforcement Learning with Self-Play

This paper closes this gap for the first time: we propose an optimistic variant of the Nash Q-learning algorithm with sample complexity $\tilde{O}(SAB)$, and a new Nash V-learning algorithm with sample complexity $\tilde{O}(S(A+B))$.


Network Diffusions via Neural Mean-Field Dynamics

We propose a novel learning framework based on neural mean-field dynamics for inference and estimation problems of diffusion on networks.


Self-Distillation as Instance-Specific Label Smoothing

With this in mind, we offer a new interpretation for teacher-student training as amortized MAP estimation, such that teacher predictions enable instance-specific regularization.


Towards Problem-dependent Optimal Learning Rates

In this paper we propose a new framework based on a "uniform localized convergence" principle.


Cross-lingual Retrieval for Iterative Self-Supervised Training

In this work, we found that the cross-lingual alignment can be further improved by training seq2seq models on sentence pairs mined using their own encoder outputs.


Rethinking pooling in graph neural networks

In this paper, we build upon representative GNNs and introduce variants that challenge the need for locality-preserving representations, either using randomization or clustering on the complement graph.


Pointer Graph Networks

Here we introduce Pointer Graph Networks (PGNs) which augment sets or graphs with additional inferred edges for improved model generalisation ability.


Gradient Regularized V-Learning for Dynamic Treatment Regimes

In this paper, we introduce Gradient Regularized V-learning (GRV), a novel method for estimating the value function of a DTR.


Faster Wasserstein Distance Estimation with the Sinkhorn Divergence

In this work, we propose instead to estimate it with the Sinkhorn divergence, which is also built on entropic regularization but includes debiasing terms.
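The debiasing terms referred to here are the standard ones: the Sinkhorn divergence corrects the entropic optimal transport cost $\mathrm{OT}_\varepsilon$ by subtracting the self-transport costs,

```latex
S_\varepsilon(\alpha,\beta) \;=\; \mathrm{OT}_\varepsilon(\alpha,\beta)
\;-\; \tfrac{1}{2}\,\mathrm{OT}_\varepsilon(\alpha,\alpha)
\;-\; \tfrac{1}{2}\,\mathrm{OT}_\varepsilon(\beta,\beta)
```

which makes $S_\varepsilon(\alpha,\alpha) = 0$ and removes the entropic bias that $\mathrm{OT}_\varepsilon$ alone carries.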


Forethought and Hindsight in Credit Assignment

We address the problem of credit assignment in reinforcement learning and explore fundamental questions regarding the way in which an agent can best use additional computation to propagate new information, by planning with internal models of the world to improve its predictions.


Robust Recursive Partitioning for Heterogeneous Treatment Effects with Uncertainty Quantification

This paper develops a new method for subgroup analysis, R2P, that addresses all these weaknesses.


Rescuing neural spike train models from bad MLE

To alleviate this, we propose to directly minimize the divergence between neural recorded and model generated spike trains using spike train kernels.


Lower Bounds and Optimal Algorithms for Personalized Federated Learning

In this work, we consider the optimization formulation of personalized federated learning recently introduced by Hanzely & Richtarik (2020) which was shown to give an alternative explanation to the workings of local SGD methods.


Black-Box Certification with Randomized Smoothing: A Functional Optimization Based Framework

We propose a general framework of adversarial certification with non-Gaussian noise and for more general types of attacks, from a unified functional optimization perspective.


Deep Imitation Learning for Bimanual Robotic Manipulation

We present a deep imitation learning framework for robotic bimanual manipulation in a continuous state-action space.


Stationary Activations for Uncertainty Calibration in Deep Learning

We introduce a new family of non-linear neural network activation functions that mimic the properties induced by the widely-used Matérn family of kernels in Gaussian process (GP) models.
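For reference, the Matérn family being mimicked is the standard GP covariance with smoothness $\nu$, lengthscale $\ell$, variance $\sigma^2$, and modified Bessel function of the second kind $K_\nu$, as a function of the distance $r$ between inputs:

```latex
k_\nu(r) \;=\; \sigma^2\,\frac{2^{1-\nu}}{\Gamma(\nu)}
\left(\frac{\sqrt{2\nu}\,r}{\ell}\right)^{\!\nu}
K_\nu\!\left(\frac{\sqrt{2\nu}\,r}{\ell}\right)
```

Small $\nu$ gives rough sample paths (e.g. $\nu = 1/2$ recovers the exponential kernel), while $\nu \to \infty$ recovers the smooth RBF kernel.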


Ensemble Distillation for Robust Model Fusion in Federated Learning

In this work we investigate more powerful and more flexible aggregation schemes for FL.


Falcon: Fast Spectral Inference on Encrypted Data

In this paper, we propose a fast, frequency-domain deep neural network called Falcon, for fast inferences on encrypted data.


On Power Laws in Deep Ensembles

In this work, we focus on a classification problem and investigate the behavior of both non-calibrated and calibrated negative log-likelihood (CNLL) of a deep ensemble as a function of the ensemble size and the member network size.


Practical Quasi-Newton Methods for Training Deep Neural Networks

We consider the development of practical stochastic quasi-Newton, and in particular Kronecker-factored block diagonal BFGS and L-BFGS methods, for training deep neural networks (DNNs).


Approximation Based Variance Reduction for Reparameterization Gradients

In this work we present a control variate that is applicable for any reparameterizable distribution with known mean and covariance, e.g. Gaussians with any covariance structure.


Inference Stage Optimization for Cross-scenario 3D Human Pose Estimation

In this work, we propose a novel framework, Inference Stage Optimization (ISO), for improving the generalizability of 3D pose models when source and target data come from different pose distributions.


Consistent feature selection for analytic deep neural networks

In this work, we investigate the problem of feature selection for analytic deep networks.


Glance and Focus: a Dynamic Approach to Reducing Spatial Redundancy in Image Classification

Inspired by the fact that not all regions in an image are task-relevant, we propose a novel framework that performs efficient image classification by processing a sequence of relatively small inputs, which are strategically selected from the original image with reinforcement learning.


Information Maximization for Few-Shot Learning

We introduce Transductive Information Maximization (TIM) for few-shot learning.


Inverse Reinforcement Learning from a Gradient-based Learner

In this paper, we propose a new algorithm for this setting, in which the goal is to recover the reward function being optimized by an agent, given a sequence of policies produced during learning.


Bayesian Multi-type Mean Field Multi-agent Imitation Learning

In this paper, we propose Bayesian multi-type mean field multi-agent imitation learning (BM3IL).


Bayesian Robust Optimization for Imitation Learning

To provide a bridge between these two extremes, we propose Bayesian Robust Optimization for Imitation Learning (BROIL).


Multiview Neural Surface Reconstruction by Disentangling Geometry and Appearance

In this work we address the challenging problem of multiview 3D surface reconstruction.


Riemannian Continuous Normalizing Flows

To overcome this problem, we introduce Riemannian continuous normalizing flows, a model which admits the parametrization of flexible probability measures on smooth manifolds by defining flows as the solution to ordinary differential equations.


Attention-Gated Brain Propagation: How the brain can implement reward-based error backpropagation

We demonstrate a biologically plausible reinforcement learning scheme for deep networks with an arbitrary number of layers.


Asymptotic Guarantees for Generative Modeling Based on the Smooth Wasserstein Distance

In this work, we conduct a thorough statistical study of the minimum smooth Wasserstein estimators (MSWEs), first proving the estimator’s measurability and asymptotic consistency.


Online Robust Regression via SGD on the l1 loss

In contrast, we show in this work that stochastic gradient descent on the l1 loss converges to the true parameter vector at a $\tilde{O}(1 / ((1 - \eta)^2 n))$ rate which is independent of the values of the contaminated measurements.
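The estimator itself is a one-line update: take a subgradient step on the absolute residual. A minimal sketch on synthetic data with grossly corrupted responses; the step-size schedule, corruption model, and tail averaging are illustrative choices, not the paper's exact setup:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5000, 2
w_true = np.array([2.0, -1.0])
X = rng.standard_normal((n, d))
y = X @ w_true
bad = rng.choice(n, n // 10, replace=False)
y[bad] += 100.0 * rng.standard_normal(bad.size)   # 10% grossly corrupted responses

w = np.zeros(d)
tail = []
for t in range(n):
    # subgradient of |x.w - y| is sign(x.w - y) * x
    g = np.sign(X[t] @ w - y[t]) * X[t]
    w -= 0.05 / np.sqrt(t + 1) * g
    if t >= n - 1000:
        tail.append(w.copy())
w_hat = np.mean(tail, axis=0)   # tail-averaged iterate
```

The key point mirrored here is that the update depends on the residual only through its sign, so arbitrarily large corrupted measurements move the iterate by no more than a bounded step.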


PRANK: motion Prediction based on RANKing

In this paper, we introduce the PRANK method, which satisfies these requirements.


Fighting Copycat Agents in Behavioral Cloning from Observation Histories

To combat this "copycat problem", we propose an adversarial approach to learn a feature representation that removes excess information about the previous expert action nuisance correlate, while retaining the information necessary to predict the next action.


Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model

We analyze the convergence of single-pass, fixed step-size stochastic gradient descent on the least-square risk under this model.


Structured Prediction for Conditional Meta-Learning

In this work, we propose a new perspective on conditional meta-learning via structured prediction.


Optimal Lottery Tickets via Subset Sum: Logarithmic Over-Parameterization is Sufficient

In this work, we close the gap and offer an exponential improvement to the over-parameterization requirement for the existence of lottery tickets.


The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes

This work proposes a new challenge set for multimodal classification, focusing on detecting hate speech in multimodal memes.


Stochasticity of Deterministic Gradient Descent: Large Learning Rate for Multiscale Objective Function

This article suggests that deterministic Gradient Descent, which does not use any stochastic gradient approximation, can still exhibit stochastic behaviors.


Identifying Learning Rules From Neural Network Observables

It is an open question as to what specific experimental measurements would need to be made to determine whether any given learning rule is operative in a real biological system. In this work, we take a "virtual experimental" approach to this problem.


Optimal Approximation – Smoothness Tradeoffs for Soft-Max Functions

Our goal is to identify the optimal approximation-smoothness tradeoffs for different measures of approximation and smoothness.


Weakly-Supervised Reinforcement Learning for Controllable Behavior

In this work, we introduce a framework for using weak supervision to automatically disentangle this semantically meaningful subspace of tasks from the enormous space of nonsensical "chaff" tasks.


Improving Policy-Constrained Kidney Exchange via Pre-Screening

We propose both a greedy heuristic and a Monte Carlo tree search, which outperform previous approaches, using experiments on both synthetic data and real kidney exchange data from the United Network for Organ Sharing.


Learning abstract structure for drawing by efficient motor program induction

We show that people spontaneously learn abstract drawing procedures that support generalization, and propose a model of how learners can discover these reusable drawing procedures.


Why Do Deep Residual Networks Generalize Better than Deep Feedforward Networks? — A Neural Tangent Kernel Perspective

This paper studies this fundamental problem in deep learning from a so-called "neural tangent kernel" perspective.


Dual Instrumental Variable Regression

We present a novel algorithm for non-linear instrumental variable (IV) regression, DualIV, which simplifies traditional two-stage methods via a dual formulation.


Stochastic Gradient Descent in Correlated Settings: A Study on Gaussian Processes

In this paper, we focus on the Gaussian process (GP) and take a step forward towards breaking the barrier by proving minibatch SGD converges to a critical point of the full loss function, and recovers model hyperparameters with rate $O(\frac{1}{K})$ up to a statistical error term depending on the minibatch size.


Interventional Few-Shot Learning

Based on this, we propose a novel FSL paradigm: Interventional Few-Shot Learning (IFSL).


Minimax Value Interval for Off-Policy Evaluation and Policy Optimization

We study minimax methods for off-policy evaluation (OPE) using value functions and marginalized importance weights.


Biased Stochastic First-Order Methods for Conditional Stochastic Optimization and Applications in Meta Learning

For this special setting, we propose an accelerated algorithm called biased SpiderBoost (BSpiderBoost) that matches the lower bound complexity.


ShiftAddNet: A Hardware-Inspired Deep Network

This paper presents ShiftAddNet, whose main inspiration is drawn from a common practice in energy-efficient hardware implementation: multiplication can instead be performed with additions and logical bit-shifts.


Network-to-Network Translation with Conditional Invertible Neural Networks

Therefore, we seek a model that can relate between different existing representations and propose to solve this task with a conditionally invertible network.


Intra-Processing Methods for Debiasing Neural Networks

In this work, we initiate the study of a new paradigm in debiasing research, intra-processing, which sits between in-processing and post-processing methods.


Finding Second-Order Stationary Points Efficiently in Smooth Nonconvex Linearly Constrained Optimization Problems

This paper proposes two efficient algorithms for computing approximate second-order stationary points (SOSPs) of problems with generic smooth non-convex objective functions and generic linear constraints.


Model-based Policy Optimization with Unsupervised Model Adaptation

In this paper, we investigate how to bridge the gap between real and simulated data due to inaccurate model estimation for better policy optimization.


Implicit Regularization and Convergence for Weight Normalization

Here, we study the weight normalization (WN) method \cite{salimans2016weight} and a variant called reparametrized projected gradient descent (rPGD) for overparametrized least squares regression and some more general loss functions.


Geometric All-way Boolean Tensor Decomposition

In this work, we present a computationally efficient BTD algorithm, namely Geometric Expansion for all-order Tensor Factorization (GETF), that sequentially identifies the rank-1 basis components for a tensor from a geometric perspective.


Modular Meta-Learning with Shrinkage

Here, we propose a meta-learning approach that obviates the need for this often sub-optimal hand-selection.


A/B Testing in Dense Large-Scale Networks: Design and Inference

In this paper, we present a novel strategy for accurately estimating the causal effects of a class of treatments in a dense large-scale network.


What Neural Networks Memorize and Why: Discovering the Long Tail via Influence Estimation

In this work we design experiments to test the key ideas in this theory.


Partially View-aligned Clustering

In this paper, we study one challenging issue in multi-view data clustering.


Partial Optimal Transport with applications on Positive-Unlabeled Learning

In this paper, we address the partial Wasserstein and Gromov-Wasserstein problems and propose exact algorithms to solve them.


Toward the Fundamental Limits of Imitation Learning

In this paper, we focus on understanding the minimax statistical limits of IL in episodic Markov Decision Processes (MDPs).


Logarithmic Pruning is All You Need

In this work, we remove the most limiting assumptions of this previous work while providing significantly tighter bounds: the overparameterized network only needs a logarithmic factor (in all variables but depth) number of neurons per weight of the target subnetwork.


Hold me tight! Influence of discriminative features on deep network boundaries

In this work, we borrow tools from the field of adversarial robustness, and propose a new perspective that relates dataset features to the distance of samples to the decision boundary.


Learning from Mixtures of Private and Public Populations

Inspired by the above example, we consider a model in which the population $\mathcal{D}$ is a mixture of two possibly distinct sub-populations: a private sub-population $\mathcal{D}_{\mathrm{priv}}$ of private and sensitive data, and a public sub-population $\mathcal{D}_{\mathrm{pub}}$ of data with no privacy concerns.


Adversarial Weight Perturbation Helps Robust Generalization

In this paper, we investigate the weight loss landscape from a new perspective, and identify a clear correlation between the flatness of weight loss landscape and robust generalization gap.


Stateful Posted Pricing with Vanishing Regret via Dynamic Deterministic Markov Decision Processes

In this paper, a rather general online problem called dynamic resource allocation with capacity constraints (DRACC) is introduced and studied in the realm of posted price mechanisms.


Adversarial Self-Supervised Contrastive Learning

In this paper, we propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples.


Normalizing Kalman Filters for Multivariate Time Series Analysis

To this end, we present a novel approach reconciling classical state space models with deep learning methods.


Learning to summarize with human feedback

In this work, we show that it is possible to significantly improve summary quality by training a model to optimize for human preferences.


Fourier Spectrum Discrepancies in Deep Network Generated Images

In this paper, we present an analysis of the high-frequency Fourier modes of real and deep network generated images and show that deep network generated images share an observable, systematic shortcoming in replicating the attributes of these high-frequency modes.


Lamina-specific neuronal properties promote robust, stable signal propagation in feedforward networks

Specifically, we found that signal transformations, made by each layer of neurons on an input-driven spike signal, demodulate signal distortions introduced by preceding layers.


Learning Dynamic Belief Graphs to Generalize on Text-Based Games

In this work, we investigate how an agent can plan and generalize in text-based games using graph-structured representations learned end-to-end from raw text.


Triple descent and the two kinds of overfitting: where & why do they appear?

In this paper, we show that despite their apparent similarity, these two scenarios are inherently different.


Multimodal Graph Networks for Compositional Generalization in Visual Question Answering

In this paper, we propose to tackle this challenge by employing neural factor graphs to induce a tighter coupling between concepts in different modalities (e.g. images and text).


Learning Graph Structure With A Finite-State Automaton Layer

In this work, we study the problem of learning to derive abstract relations from the intrinsic graph structure.


A Universal Approximation Theorem of Deep Neural Networks for Expressing Probability Distributions

This paper studies the universal approximation property of deep neural networks for representing probability distributions.


Unsupervised object-centric video generation and decomposition in 3D

We instead propose to model a video as the view seen while moving through a scene with multiple 3D objects and a 3D background.


Domain Generalization for Medical Imaging Classification with Linear-Dependency Regularization

In this paper, we introduce a simple but effective approach to improve the generalization capability of deep neural networks in the field of medical imaging classification.


Multi-label classification: do Hamming loss and subset accuracy really conflict with each other?

This paper attempts to fill this gap by analyzing the learning guarantees of the corresponding learning algorithms on both SA and HL measures.


A Novel Automated Curriculum Strategy to Solve Hard Sokoban Planning Instances

We present a novel automated curriculum approach that dynamically selects from a pool of unlabeled training instances of varying task complexity, guided by our difficulty quantum momentum strategy.


Causal analysis of Covid-19 Spread in Germany

In this work, we study the causal relations among German regions in terms of the spread of Covid-19 since the beginning of the pandemic, taking into account the restriction policies that were applied by the different federal states.


Locally private non-asymptotic testing of discrete distributions is faster using interactive mechanisms

We find separation rates for testing multinomial or more general discrete distributions under the constraint of alpha-local differential privacy.


Adaptive Gradient Quantization for Data-Parallel SGD

We empirically observe that the statistics of gradients of deep models change during training. Motivated by this observation, we introduce two adaptive quantization schemes, ALQ and AMQ.


Finite Continuum-Armed Bandits

Focusing on a nonparametric setting, where the mean reward is an unknown function of a one-dimensional covariate, we propose an optimal strategy for this problem.


Removing Bias in Multi-modal Classifiers: Regularization by Maximizing Functional Entropies

To alleviate this shortcoming, we propose a novel regularization term based on the functional entropy.


Compact task representations as a normative model for higher-order brain activity

More specifically, we focus on MDPs whose state is based on action-observation histories, and we show how to compress the state space such that unnecessary redundancy is eliminated, while task-relevant information is preserved.


Robust-Adaptive Control of Linear Systems: beyond Quadratic Costs

We consider the problem of robust and adaptive model predictive control (MPC) of a linear system, with unknown parameters that are learned along the way (adaptive), in a critical setting where failures must be prevented (robust).


Co-exposure Maximization in Online Social Networks

In this paper, we study the problem of allocating seed users to opposing campaigns: by drawing on the equal-time rule of political campaigning on traditional media, our goal is to allocate seed users to campaigners with the aim to maximize the expected number of users who are co-exposed to both campaigns.


UCLID-Net: Single View Reconstruction in Object Space

In this paper, we show that building a geometry preserving 3-dimensional latent space helps the network concurrently learn global shape regularities and local reasoning in the object coordinate space and, as a result, boosts performance.


Reinforcement Learning for Control with Multiple Frequencies

In this paper, we formalize the problem of multiple control frequencies in RL and provide an efficient solution method.


Complex Dynamics in Simple Neural Networks: Understanding Gradient Flow in Phase Retrieval

Here we focus on gradient flow dynamics for phase retrieval from random measurements.


Neural Message Passing for Multi-Relational Ordered and Recursive Hypergraphs

In this work, we first unify existing MPNNs on different structures into the G-MPNN (Generalised MPNN) framework.


A Unified View of Label Shift Estimation

In this paper, we present a unified view of the two methods and the first theoretical characterization of MLLS.


Optimal Private Median Estimation under Minimal Distributional Assumptions

We study the fundamental task of estimating the median of an underlying distribution from a finite number of samples, under pure differential privacy constraints.


Breaking the Communication-Privacy-Accuracy Trilemma

In this paper, we develop novel encoding and decoding mechanisms that simultaneously achieve optimal privacy and communication efficiency in various canonical settings.


Audeo: Audio Generation for a Silent Performance Video

Our main aim in this work is to explore the plausibility of such a transformation and to identify cues and components able to carry the association of sounds with visual events.


Ode to an ODE

We present a new paradigm for Neural ODE algorithms, called ODEtoODE, where time-dependent parameters of the main flow evolve according to a matrix flow on the orthogonal group O(d).


Self-Distillation Amplifies Regularization in Hilbert Space

This work provides the first theoretical analysis of self-distillation.


Coupling-based Invertible Neural Networks Are Universal Diffeomorphism Approximators

Without universality, there could be a well-behaved invertible transformation that the CF-INN can never approximate, which would render the model class unreliable. We answer this question by showing a convenient criterion: a CF-INN is universal if its layers contain affine coupling and invertible linear functions as special cases.


Community detection using fast low-cardinality semidefinite programming?

In this paper, we propose a new class of low-cardinality algorithms that generalize the local update to maximize a semidefinite relaxation derived from max-k-cut.


Modeling Noisy Annotations for Crowd Counting

In this paper, we first model the annotation noise using a random variable with Gaussian distribution, and derive the pdf of the crowd density value for each spatial location in the image. We then approximate the joint distribution of the density values (i.e., the distribution of density maps) with a full covariance multivariate Gaussian density, and derive a low-rank approximation for tractable implementation.


An operator view of policy gradient methods

We cast policy gradient methods as the repeated application of two operators: a policy improvement operator $\mathcal{I}$, which maps any policy $\pi$ to a better one $\mathcal{I}\pi$, and a projection operator $\mathcal{P}$, which finds the best approximation of $\mathcal{I}\pi$ in the set of realizable policies.


Demystifying Contrastive Self-Supervised Learning: Invariances, Augmentations and Dataset Biases

Somewhat mysteriously, the recent gains in performance come from training instance classification models, treating each image and its augmented versions as samples of a single class. In this work, we first present quantitative experiments to demystify these gains.


Online MAP Inference of Determinantal Point Processes

In this paper, we provide an efficient approximation algorithm for finding the most likely (MAP) configuration of size $k$ for Determinantal Point Processes (DPP) in the online setting, where the data points arrive in an arbitrary order and the algorithm cannot discard the selected elements from its local memory.


Video Object Segmentation with Adaptive Feature Bank and Uncertain-Region Refinement

This paper presents a new matching-based framework for semi-supervised video object segmentation (VOS).


Inferring learning rules from animal decision-making

Whereas reinforcement learning often focuses on the design of algorithms that enable artificial agents to efficiently learn new tasks, here we develop a modeling framework to directly infer the empirical learning rules that animals use to acquire new behaviors.


Input-Aware Dynamic Backdoor Attack

In this work, we propose a novel backdoor attack technique in which the triggers vary from input to input.


How hard is to distinguish graphs with graph neural networks?

This study derives hardness results for the classification variant of graph isomorphism in the message-passing model (MPNN).


Minimax Regret of Switching-Constrained Online Convex Optimization: No Phase Transition

In this paper, we show that $ T $-round switching-constrained OCO with fewer than $ K $ switches has a minimax regret of $ \Theta(\frac{T}{\sqrt{K}}) $.


Dual Manifold Adversarial Robustness: Defense against Lp and non-Lp Adversarial Attacks

To partially answer this question, we consider the scenario when the manifold information of the underlying data is available.


Cross-Scale Internal Graph Neural Network for Image Super-Resolution

In this paper, we explore the cross-scale patch recurrence property of a natural image, i.e., similar patches tend to recur many times across different scales.


Unsupervised Representation Learning by Invariance Propagation

In this paper, we propose Invariance Propagation to focus on learning representations invariant to category-level variations, which are provided by different instances from the same category.


Restoring Negative Information in Few-Shot Object Detection

In this paper, we restore the negative information in few-shot object detection by introducing a new negative- and positive-representative based metric learning framework and a new inference scheme with negative and positive representatives.


Do Adversarially Robust ImageNet Models Transfer Better?

In this work, we identify another such aspect: we find that adversarially robust models, while less accurate, often perform better than their standard-trained counterparts when used for transfer learning.


Robust Correction of Sampling Bias using Cumulative Distribution Functions

We present a new method for handling covariate shift using the empirical cumulative distribution function estimates of the target distribution by a rigorous generalization of a recent idea proposed by Vapnik and Izmailov.


Personalized Federated Learning with Theoretical Guarantees: A Model-Agnostic Meta-Learning Approach

In this paper, we study a personalized variant of federated learning in which our goal is to find an initial shared model that current or new users can easily adapt to their local dataset by performing one or a few steps of gradient descent with respect to their own data.


Pixel-Level Cycle Association: A New Perspective for Domain Adaptive Semantic Segmentation

In this paper, we propose to build the pixel-level cycle association between source and target pixel pairs and contrastively strengthen their connections to diminish the domain gap and make the features more discriminative.


Classification with Valid and Adaptive Coverage

In this paper, we develop specialized versions of these techniques for categorical and unordered response labels that, in addition to providing marginal coverage, are also fully adaptive to complex data distributions, in the sense that they perform favorably in terms of approximate conditional coverage compared to alternative methods.


Learning Global Transparent Models consistent with Local Contrastive Explanations

In this work, we explore the question: Can we produce a transparent global model that is simultaneously accurate and consistent with the local (contrastive) explanations of the black-box model?


Learning to Approximate a Bregman Divergence

In this paper, we focus on the problem of approximating an arbitrary Bregman divergence from supervision, and we provide a well-principled approach to analyzing such approximations.


Diverse Image Captioning with Context-Object Split Latent Spaces

To this end, we introduce a novel factorization of the latent space, termed context-object split, to model diversity in contextual descriptions across images and texts within the dataset.


Learning Disentangled Representations of Videos with Missing Data

We present Disentangled Imputed Video autoEncoder (DIVE), a deep generative model that imputes and predicts future video frames in the presence of missing data.


Natural Graph Networks

Here we show that instead of equivariance, the more general concept of naturality is sufficient for a graph network to be well-defined, opening up a larger class of graph networks.


Continual Learning with Node-Importance based Adaptive Group Sparse Regularization

We propose a novel regularization-based continual learning method, dubbed as Adaptive Group Sparsity based Continual Learning (AGS-CL), using two group sparsity-based penalties.


Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts

In this work, we propose Learning@home: a novel neural network training paradigm designed to handle large amounts of poorly connected participants.


Bidirectional Convolutional Poisson Gamma Dynamical Systems

Incorporating the natural document-sentence-word structure into hierarchical Bayesian modeling, we propose convolutional Poisson gamma dynamical systems (PGDS) that introduce not only word-level probabilistic convolutions, but also sentence-level stochastic temporal transitions.


Deep Reinforcement and InfoMax Learning

To test that hypothesis, we introduce an objective based on Deep InfoMax (DIM) which trains the agent to predict the future by maximizing the mutual information between its internal representation of successive timesteps.


On ranking via sorting by estimated expected utility

We provide an answer to this question in the form of a structural characterization of ranking losses for which a suitable regression is consistent.


Distribution-free binary classification: prediction sets, confidence intervals and calibration

We study three notions of uncertainty quantification—calibration, confidence intervals and prediction sets—for binary classification in the distribution-free setting, that is without making any distributional assumptions on the data.


Closing the Dequantization Gap: PixelCNN as a Single-Layer Flow

In this paper, we introduce subset flows, a class of flows that can tractably transform finite volumes and thus allow exact computation of likelihoods for discrete data.


Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals

In this work, we focus on one-to-many sequence transduction problems, such as extracting multiple sequential sources from a mixture sequence.


Variance reduction for Random Coordinate Descent-Langevin Monte Carlo

We show by a counterexample that blindly applying RCD does not achieve the goal in the most general setting.


Language as a Cognitive Tool to Imagine Goals in Curiosity Driven Exploration

We introduce IMAGINE, an intrinsically motivated deep reinforcement learning architecture that models this ability.


All Word Embeddings from One Embedding

In this study, to reduce the total number of parameters, the embeddings for all words are represented by transforming a shared embedding.
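One way to picture this scheme is a shared vector that is filtered by a fixed word-specific mask and then passed through a shared transform. The mask-and-transform details below are simplified stand-ins for the paper's construction, not its actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

D, VOCAB = 8, 5
shared = rng.normal(size=D)                  # the single shared embedding
masks = rng.integers(0, 2, size=(VOCAB, D))  # fixed, word-specific binary masks
W = rng.normal(size=(D, D))                  # shared transform (stand-in for a small FFN)

def embed(word_id):
    # A word's embedding is the shared vector, filtered by the word's fixed
    # mask and pushed through the shared transform: only the masks are
    # word-specific, so the trainable parameter count is independent of
    # vocabulary size.
    return np.tanh(W @ (shared * masks[word_id]))

e0, e1 = embed(0), embed(1)
```

The point of the sketch is the parameter accounting: `shared` and `W` are the only learnable tensors, while the per-word masks are fixed and cheap to store.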


Primal Dual Interpretation of the Proximal Stochastic Gradient Langevin Algorithm

We consider the task of sampling with respect to a log concave probability distribution.


How to Characterize The Landscape of Overparameterized Convolutional Neural Networks

Specifically, we consider the loss landscape of an overparameterized convolutional neural network (CNN) in the continuous limit, where the numbers of channels/hidden nodes in the hidden layers go to infinity.


On the Tightness of Semidefinite Relaxations for Certifying Robustness to Adversarial Examples

In this paper, we describe a geometric technique that determines whether this SDP certificate is exact, meaning whether it provides both a lower-bound on the size of the smallest adversarial perturbation, as well as a globally optimal perturbation that attains the lower-bound.


Submodular Meta-Learning

In this paper, we introduce a discrete variant of the Meta-learning framework.


Rethinking Pre-training and Self-training

Our study reveals the generality and flexibility of self-training with three additional insights: 1) stronger data augmentation and more labeled data further diminish the value of pre-training, 2) unlike pre-training, self-training is always helpful when using stronger data augmentation, in both low-data and high-data regimes, and 3) in the case that pre-training is helpful, self-training improves upon pre-training.


Unsupervised Sound Separation Using Mixture Invariant Training

In this paper, we propose a completely unsupervised method, mixture invariant training (MixIT), that requires only single-channel acoustic mixtures.


Adaptive Discretization for Model-Based Reinforcement Learning

We introduce the technique of adaptive discretization to design an efficient model-based episodic reinforcement learning algorithm in large (potentially continuous) state-action spaces.


CodeCMR: Cross-Modal Retrieval For Function-Level Binary Source Code Matching

This paper proposes an end-to-end cross-modal retrieval network for binary source code matching, which achieves higher accuracy and requires less expert experience.


On Warm-Starting Neural Network Training

In this work, we take a closer look at this empirical phenomenon and try to understand when and how it occurs.


DAGs with No Fears: A Closer Look at Continuous Optimization for Learning Bayesian Networks

Informed by the KKT conditions, a local search post-processing algorithm is proposed and shown to substantially and universally improve the structural Hamming distance of all tested algorithms, typically by a factor of 2 or more.


OOD-MAML: Meta-Learning for Few-Shot Out-of-Distribution Detection and Classification

We propose a few-shot learning method for detecting out-of-distribution (OOD) samples from classes that are unseen during training while classifying samples from seen classes using only a few labeled examples.


An Imitation from Observation Approach to Transfer Learning with Dynamics Mismatch

In this paper, we show that one existing solution to this transfer problem, grounded action transformation, is closely related to the problem of imitation from observation (IfO): learning behaviors that mimic the observations of behavior demonstrations.


Learning About Objects by Learning to Interact with Them

Taking inspiration from infants learning from their environment through play and interaction, we present a computational framework to discover objects and learn their physical properties along this paradigm of Learning from Interaction.


Learning discrete distributions with infinite support

We present a novel approach to estimating discrete distributions with (potentially) infinite support in the total variation metric.


Dissecting Neural ODEs

In this work we “open the box”, further developing the continuous-depth formulation with the aim of clarifying the influence of several design choices on the underlying dynamics.


Teaching a GAN What Not to Learn

In this paper, we approach the supervised GAN problem from a different perspective, one that is motivated by the philosophy of the famous Persian poet Rumi who said, "The art of knowing is knowing what to ignore."


Counterfactual Data Augmentation using Locally Factored Dynamics

We propose an approach to inferring these structures given an object-oriented state representation, as well as a novel algorithm for Counterfactual Data Augmentation (CoDA).


Rethinking Learnable Tree Filter for Generic Feature Transform

To relax the geometric constraint, we give the analysis by reformulating it as a Markov Random Field and introduce a learnable unary term.


Self-Supervised Relational Reasoning for Representation Learning

In this work, we propose a novel self-supervised formulation of relational reasoning that allows a learner to bootstrap a signal from information implicit in unlabeled data.


Sufficient dimension reduction for classification using principal optimal transport direction

To address this issue, we propose a novel estimation method of sufficient dimension reduction subspace (SDR subspace) using optimal transport.


Fast Epigraphical Projection-based Incremental Algorithms for Wasserstein Distributionally Robust Support Vector Machine

In this paper, we focus on a family of Wasserstein distributionally robust support vector machine (DRSVM) problems and propose two novel epigraphical projection-based incremental algorithms to solve them.


Differentially Private Clustering: Tight Approximation Ratios

For several basic clustering problems, including Euclidean DensestBall, 1-Cluster, k-means, and k-median, we give efficient differentially private algorithms that achieve essentially the same approximation ratios as those that can be obtained by any non-private algorithm, while incurring only small additive errors.


On the Power of Louvain in the Stochastic Block Model

We provide valuable tools not only for the analysis of Louvain, but also for many other combinatorial algorithms.


Fairness with Overlapping Groups; a Probabilistic Perspective

In algorithmically fair prediction problems, a standard goal is to ensure the equality of fairness metrics across multiple overlapping groups simultaneously. We reconsider this standard fair classification problem using a probabilistic population analysis, which, in turn, reveals the Bayes-optimal classifier.


AttendLight: Universal Attention-Based Reinforcement Learning Model for Traffic Signal Control

We propose AttendLight, an end-to-end Reinforcement Learning (RL) algorithm for the problem of traffic signal control.


Searching for Low-Bit Weights in Quantized Neural Networks

Thus, we propose to regard the discrete weights in an arbitrary quantized neural network as searchable variables, and utilize a differentiable method to search them accurately.


Adaptive Reduced Rank Regression

To complement the upper bound, we introduce new techniques for establishing lower bounds on the performance of any algorithm for this problem.


From Predictions to Decisions: Using Lookahead Regularization

For this, we introduce look-ahead regularization which, by anticipating user actions, encourages predictive models to also induce actions that improve outcomes.


Sequential Bayesian Experimental Design with Variable Cost Structure

We propose and demonstrate an algorithm that accounts for these variable costs in the refinement decision.


Predictive inference is free with the jackknife+-after-bootstrap

In this paper, we propose the jackknife+-after-bootstrap (J+aB), a procedure for constructing a predictive interval, which uses only the available bootstrapped samples and their corresponding fitted models, and is therefore "free" in terms of the cost of model fitting.


Counterfactual Predictions under Runtime Confounding

We propose a doubly-robust procedure for learning counterfactual prediction models in this setting.


Learning Loss for Test-Time Augmentation

This paper proposes a novel instance-level test-time augmentation that efficiently selects suitable transformations for a test input.


Balanced Meta-Softmax for Long-Tailed Visual Recognition

In this paper, we show that the Softmax function, though used in most classification tasks, gives a biased gradient estimation under the long-tailed setup.
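The core correction can be sketched in a few lines: shift each logit by the log class frequency so the loss compensates for the long-tailed label distribution. The class counts and logits below are illustrative, and the meta-learning component of the paper is omitted:

```python
import numpy as np

def balanced_softmax(logits, class_counts):
    """Shift each logit by the log class frequency before normalizing,
    so head classes no longer dominate the gradient under a long tail."""
    shifted = logits + np.log(class_counts)
    shifted -= shifted.max()   # subtract the max for numerical stability
    e = np.exp(shifted)
    return e / e.sum()

# Two classes with equal logits but a 9:1 frequency imbalance: the balanced
# probabilities reflect the priors, which is what corrects the training loss.
probs = balanced_softmax(np.array([2.0, 2.0]), np.array([900, 100]))
```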


Efficient Exploration of Reward Functions in Inverse Reinforcement Learning via Bayesian Optimization

This paper presents an IRL framework called Bayesian optimization-IRL (BO-IRL) which identifies multiple solutions that are consistent with the expert demonstrations by efficiently exploring the reward function space.


MDP Homomorphic Networks: Group Symmetries in Reinforcement Learning

This paper introduces MDP homomorphic networks for deep reinforcement learning.


How Can I Explain This to You? An Empirical Study of Deep Neural Network Explanation Methods

We performed a cross-analysis Amazon Mechanical Turk study comparing the popular state-of-the-art explanation methods to empirically determine which are better in explaining model decisions.


On the Error Resistance of Hinge-Loss Minimization

In this work, we identify a set of conditions on the data under which such surrogate loss minimization algorithms provably learn the correct classifier.


Munchausen Reinforcement Learning

Our core contribution stands in a very simple idea: adding the scaled log-policy to the immediate reward.
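That idea fits in one line of reward shaping. Here is a minimal sketch; the scaling factor `alpha` and the clipping floor `l0` are illustrative values, not the paper's tuned hyperparameters:

```python
import numpy as np

def munchausen_reward(reward, log_pi_a, alpha=0.9, l0=-1.0):
    """Augment the immediate reward with the scaled log-probability the
    current policy assigns to the taken action; clipping the log-policy
    at a floor keeps the bonus bounded for very unlikely actions."""
    return reward + alpha * np.clip(log_pi_a, l0, 0.0)

# A confident action (log-prob near 0) keeps most of its reward, while an
# unlikely action is penalized down to the clipping floor.
r_confident = munchausen_reward(1.0, np.log(0.9))
r_unlikely = munchausen_reward(1.0, np.log(0.05))
```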


Object Goal Navigation using Goal-Oriented Semantic Exploration

We propose a modular system called 'Goal-Oriented Semantic Exploration', which builds an episodic semantic map and uses it to explore the environment efficiently based on the goal object category.


Efficient semidefinite-programming-based inference for binary and multi-class MRFs

In this paper, we propose an efficient method for computing the partition function or MAP estimate in a pairwise MRF by instead exploiting a recently proposed coordinate-descent-based fast semidefinite solver.


Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing

With this intuition, we propose Funnel-Transformer which gradually compresses the sequence of hidden states to a shorter one and hence reduces the computation cost.


Semantic Visual Navigation by Watching YouTube Videos

This paper learns and leverages such semantic cues for navigating to objects of interest in novel environments, by simply watching YouTube videos.


Heavy-tailed Representations, Text Polarity Classification & Data Augmentation

In this paper, we develop a novel method to learn a heavy-tailed embedding with desirable regularity properties regarding the distributional tails, which allows us to analyze the points far away from the distribution bulk using the framework of multivariate extreme value theory.


SuperLoss: A Generic Loss for Robust Curriculum Learning

We propose instead a simple and generic method that can be applied to a variety of losses and tasks without any change in the learning procedure.


CogMol: Target-Specific and Selective Drug Design for COVID-19 Using Deep Generative Models

In this study, we propose an end-to-end framework, named CogMol (Controlled Generation of Molecules), for designing new drug-like small molecules targeting novel viral proteins with high affinity and off-target selectivity.


Memory Based Trajectory-conditioned Policies for Learning from Sparse Rewards

In this work, instead of focusing on good experiences with limited diversity, we propose to learn a trajectory-conditioned policy to follow and expand diverse past trajectories from a memory buffer.


Liberty or Depth: Deep Bayesian Neural Nets Do Not Need Complex Weight Posterior Approximations

We challenge the longstanding assumption that the mean-field approximation for variational inference in Bayesian neural networks is severely restrictive, and show this is not the case in deep networks.


Improving Sample Complexity Bounds for (Natural) Actor-Critic Algorithms

In contrast, this paper characterizes the convergence rate and sample complexity of AC and NAC under Markovian sampling, with mini-batch data for each iteration, and with actor having general policy class approximation.


Learning Differential Equations that are Easy to Solve

We propose a remedy that encourages learned dynamics to be easier to solve.


Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses

Specifically, we provide sharp upper and lower bounds for several forms of SGD and full-batch GD on arbitrary Lipschitz nonsmooth convex losses.


Influence-Augmented Online Planning for Complex Environments

In this work, we propose influence-augmented online planning, a principled method to transform a factored simulator of the entire environment into a local simulator that samples only the state variables that are most relevant to the observation and reward of the planning agent and captures the incoming influence from the rest of the environment using machine learning methods.


PAC-Bayes Learning Bounds for Sample-Dependent Priors

We present a series of new PAC-Bayes learning guarantees for randomized algorithms with sample-dependent priors.


Reward-rational (implicit) choice: A unifying formalism for reward learning

Our key observation is that different types of behavior can be interpreted in a single unifying formalism – as a reward-rational choice that the human is making, often implicitly.


Probabilistic Time Series Forecasting with Shape and Temporal Diversity

In this paper, we address this problem for non-stationary time series, which is very challenging yet crucially important.


Low Distortion Block-Resampling with Spatially Stochastic Networks

We formalize and attack the problem of generating new images from old ones that are as diverse as possible, only allowing them to change without restrictions in certain parts of the image while remaining globally consistent.


Continual Deep Learning by Functional Regularisation of Memorable Past

In this paper, we fix this issue by using a new functional-regularisation approach that utilises a few memorable past examples crucial to avoid forgetting.


Distance Encoding: Design Provably More Powerful Neural Networks for Graph Representation Learning

Here we propose and mathematically analyze a general class of structure-related features, termed Distance Encoding (DE).


Fast Fourier Convolution

In this work, we propose a novel convolutional operator dubbed as fast Fourier convolution (FFC), which has the main hallmarks of non-local receptive fields and cross-scale fusion within the convolutional unit.


Unsupervised Learning of Dense Visual Representations

In this paper, we propose View-Agnostic Dense Representation (VADeR) for unsupervised learning of dense representations.


Higher-Order Certification For Randomized Smoothing

In this work, we propose a framework to improve the certified safety region for these smoothed classifiers without changing the underlying smoothing scheme.


Learning Structured Distributions From Untrusted Batches: Faster and Simpler

In this paper, we find an appealing way to synthesize the techniques of [JO19] and [CLM19] to give the best of both worlds: an algorithm which runs in polynomial time and can exploit structure in the underlying distribution to achieve sublinear sample complexity.


Hierarchical Quantized Autoencoders

This leads us to introduce a novel objective for training hierarchical VQ-VAEs.


Diversity can be Transferred: Output Diversification for White- and Black-box Attacks

To improve the efficiency of these attacks, we propose Output Diversified Sampling (ODS), a novel sampling strategy that attempts to maximize diversity in the target model’s outputs among the generated samples.
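A toy sketch of that sampling strategy on a linear model, where the gradient of a randomly weighted output is analytic (in practice one would use autograd on the target network; the model and dimensions here are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy differentiable "model": logits = W x.
W = rng.normal(size=(3, 5))
def logits(x):
    return W @ x

def ods_direction(x):
    # Draw a random weighting w over the output classes and move the input
    # in the direction that maximizes w^T logits(x); varying w spreads the
    # generated samples across diverse model outputs.
    w = rng.uniform(-1, 1, size=3)
    g = W.T @ w                    # gradient of w^T (W x) w.r.t. x
    return g / np.linalg.norm(g)   # normalized perturbation direction

x = rng.normal(size=5)
d = ods_direction(x)
```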


POLY-HOOT: Monte-Carlo Planning in Continuous Space MDPs with Non-Asymptotic Analysis

In this paper, we consider Monte-Carlo planning in an environment with continuous state-action spaces, a much less understood problem with important applications in control and robotics.


AvE: Assistance via Empowerment

We propose a new paradigm for assistance by instead increasing the human’s ability to control their environment, and formalize this approach by augmenting reinforcement learning with human empowerment.


Variational Policy Gradient Method for Reinforcement Learning with General Utilities

In this paper, we consider policy optimization in Markov Decision Problems, where the objective is a general utility function of the state-action occupancy measure, which subsumes several of the aforementioned examples as special cases.


Reverse-engineering recurrent neural network solutions to a hierarchical inference task for mice

We study how recurrent neural networks (RNNs) solve a hierarchical inference task involving two latent variables and disparate timescales separated by 1-2 orders of magnitude.


Temporal Positive-unlabeled Learning for Biomedical Hypothesis Generation via Risk Estimation

We propose a variational inference model to estimate the positive prior, and incorporate it in the learning of node pair embeddings, which are then used for link prediction.


Efficient Low Rank Gaussian Variational Inference for Neural Networks

By using a new form of the reparametrization trick, we derive a computationally efficient algorithm for performing VI with a Gaussian family with a low-rank plus diagonal covariance structure.


Privacy Amplification via Random Check-Ins

In this paper, we focus on conducting iterative methods like DP-SGD in the setting of federated learning (FL) wherein the data is distributed among many devices (clients).


Probabilistic Circuits for Variational Inference in Discrete Graphical Models

In this paper, we propose a new approach that leverages the tractability of probabilistic circuit models, such as Sum Product Networks (SPN), to compute ELBO gradients exactly (without sampling) for a certain class of densities.


Your Classifier can Secretly Suffice Multi-Source Domain Adaptation

In this work, we present a different perspective to MSDA wherein deep models are observed to implicitly align the domains under label supervision.


Labelling unlabelled videos from scratch with multi-modal self-supervision

In this work, we a) show that unsupervised labelling of a video dataset does not come for free from strong feature encoders and b) propose a novel clustering method that allows pseudo-labelling of a video dataset without any human annotations, by leveraging the natural correspondence between audio and visual modalities.


A Non-Asymptotic Analysis for Stein Variational Gradient Descent

In this paper, we provide a novel finite time analysis for the SVGD algorithm.


Robust Meta-learning for Mixed Linear Regression with Small Batches

We introduce a spectral approach that is simultaneously robust under both scenarios.


Bayesian Deep Learning and a Probabilistic Perspective of Generalization

We show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization, and propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction, without significant overhead.


Unsupervised Learning of Object Landmarks via Self-Training Correspondence

This paper addresses the problem of unsupervised discovery of object landmarks.


Randomized tests for high-dimensional regression: A more efficient and powerful solution

In this paper, we answer this question in the affirmative by leveraging the random projection techniques, and propose a testing procedure that blends the classical $F$-test with a random projection step.


Learning Representations from Audio-Visual Spatial Alignment

We introduce a novel self-supervised pretext task for learning representations from audio-visual content.


Generative View Synthesis: From Single-view Semantics to Novel-view Images

We propose to push the envelope further, and introduce Generative View Synthesis (GVS) that can synthesize multiple photorealistic views of a scene given a single semantic map.


Towards More Practical Adversarial Attacks on Graph Neural Networks

Therefore, we propose a greedy procedure to correct the importance score that takes into account the diminishing-return pattern.


Multi-Task Reinforcement Learning with Soft Modularization

Thus, instead of naively sharing parameters across tasks, we introduce an explicit modularization technique on policy representation to alleviate this optimization issue.


Causal Shapley Values: Exploiting Causal Knowledge to Explain Individual Predictions of Complex Models

In this paper, we propose a novel framework for computing Shapley values that generalizes recent work that aims to circumvent the independence assumption.
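For background, the standard Shapley-value attribution under the feature-independence ("marginal") assumption that this work generalizes can be computed exactly for a tiny model; the model, inputs, and baseline below are hypothetical:

```python
import numpy as np
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley attributions, with absent features set to the baseline."""
    n = len(x)

    def v(S):                          # value of a coalition S of "present" features
        z = baseline.copy()
        z[list(S)] = x[list(S)]
        return f(z)

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            w = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(others, size):
                phi[i] += w * (v(S + (i,)) - v(S))
    return phi

w_lin = np.array([1.0, -2.0, 0.5])
f = lambda z: float(w_lin @ z)         # hypothetical linear model
x = np.array([1.0, 1.0, 2.0])
b = np.zeros(3)
phi = shapley_values(f, x, b)          # linear case: phi_i = w_i * (x_i - b_i)
```

For a linear model this reduces to phi_i = w_i * (x_i - b_i), a handy sanity check; the causal variant replaces the marginal coalition values v(S) with interventional ones informed by a causal ordering.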


On the training dynamics of deep networks with $L_2$ regularization

We study the role of $L_2$ regularization in deep learning, and uncover simple relations between the performance of the model, the $L_2$ coefficient, the learning rate, and the number of training steps.


Improved Algorithms for Convex-Concave Minimax Optimization

This paper studies minimax optimization problems $\min_x \max_y f(x, y)$, where $f(x, y)$ is $m_x$-strongly convex with respect to $x$, $m_y$-strongly concave with respect to $y$, and $(L_x, L_{xy}, L_y)$-smooth.


Deep Variational Instance Segmentation

In this paper, we propose a novel algorithm that directly utilizes a fully convolutional network (FCN) to predict instance labels.


Learning Implicit Functions for Topology-Varying Dense 3D Shape Correspondence

The goal of this paper is to learn dense 3D shape correspondence for topology-varying objects in an unsupervised manner.


Deep Multimodal Fusion by Channel Exchanging

To this end, this paper proposes Channel-Exchanging-Network (CEN), a parameter-free multimodal fusion framework that dynamically exchanges channels between sub-networks of different modalities.
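The exchanging step can be sketched as follows, assuming (as the summary suggests) that channels whose batch-norm-style scaling factors are near zero carry little information; the array shapes, threshold, and toy features are illustrative:

```python
import numpy as np

def channel_exchange(feat_a, feat_b, gamma_a, gamma_b, thresh=0.02):
    """Sketch of channel exchanging between two modality sub-networks:
    a channel whose (BN-style) scaling factor is near zero is assumed
    uninformative and is replaced by the other modality's channel."""
    out_a, out_b = feat_a.copy(), feat_b.copy()
    swap_a = np.abs(gamma_a) < thresh      # A's near-dead channels
    swap_b = np.abs(gamma_b) < thresh      # B's near-dead channels
    out_a[:, swap_a] = feat_b[:, swap_a]
    out_b[:, swap_b] = feat_a[:, swap_b]
    return out_a, out_b

# Toy (batch, channels) features for two modalities
fa = np.arange(6.0).reshape(2, 3)
fb = -np.arange(6.0).reshape(2, 3)
ga = np.array([1.0, 0.001, 0.5])           # A's channel 1 is near-dead
gb = np.array([0.001, 1.0, 1.0])           # B's channel 0 is near-dead
oa, ob = channel_exchange(fa, fb, ga, gb)
```

The exchange itself introduces no new parameters, which is what makes the fusion "parameter-free".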


Hierarchically Organized Latent Modules for Exploratory Search in Morphogenetic Systems

In this paper, we motivate the need for what we call Meta-diversity search, arguing that there is no unique ground-truth notion of interesting diversity, as it strongly depends on the final observer and its motives.


AI Feynman 2.0: Pareto-optimal symbolic regression exploiting graph modularity

We present an improved method for symbolic regression that seeks to fit data to formulas that are Pareto-optimal, in the sense of having the best accuracy for a given complexity.


Delay and Cooperation in Nonstochastic Linear Bandits

This paper offers a nearly optimal algorithm for online linear optimization with delayed bandit feedback.


Probabilistic Orientation Estimation with Matrix Fisher Distributions

This paper focuses on estimating probability distributions over the set of 3D rotations (SO(3)) using deep neural networks.


Minimax Dynamics of Optimally Balanced Spiking Networks of Excitatory and Inhibitory Neurons

Overall, we present a novel normative modeling approach for spiking E-I networks, going beyond the widely-used energy-minimizing networks that violate Dale’s law.


Telescoping Density-Ratio Estimation

To resolve this limitation, we introduce a new framework, telescoping density-ratio estimation (TRE), that enables the estimation of ratios between highly dissimilar densities in high-dimensional spaces.
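The identity TRE exploits is that one hard ratio factorizes into a product of easier ones over a chain of intermediate densities, log p_0/p_M = sum_k log p_k/p_{k+1}; a toy check with Gaussian bridges (all distributions here are illustrative):

```python
import numpy as np

def normal_logpdf(x, mu, sigma):
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

# Two highly dissimilar densities, N(0,1) and N(8,1), bridged by
# intermediate densities N(mu_k, 1) along mu_0 = 0, ..., mu_4 = 8.
mus = np.linspace(0.0, 8.0, 5)
x = 3.0

direct = normal_logpdf(x, mus[0], 1.0) - normal_logpdf(x, mus[-1], 1.0)
telescoped = sum(normal_logpdf(x, mus[k], 1.0) - normal_logpdf(x, mus[k + 1], 1.0)
                 for k in range(len(mus) - 1))
```

In TRE each intermediate ratio is estimated by its own classifier, which only has to discriminate between adjacent, much more similar densities.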


Towards Deeper Graph Neural Networks with Differentiable Group Normalization

To bridge the gap, we introduce two over-smoothing metrics and a novel technique, i.e., differentiable group normalization (DGN).


Stochastic Optimization for Performative Prediction

We initiate the study of stochastic optimization for performative prediction.


Learning Differentiable Programs with Admissible Neural Heuristics

We study the problem of learning differentiable functions expressed as programs in a domain-specific language.


Improved guarantees and a multiple-descent curve for Column Subset Selection and the Nyström method

We develop techniques which exploit spectral properties of the data matrix to obtain improved approximation guarantees which go beyond the standard worst-case analysis.


Domain Adaptation as a Problem of Inference on Graphical Models

To develop an automated way of domain adaptation with multiple source domains, we propose to use a graphical model as a compact way to encode the change property of the joint distribution, which can be learned from data, and then view domain adaptation as a problem of Bayesian inference on the graphical models.


Network size and size of the weights in memorization with two-layers neural networks

In contrast, we propose a new training procedure for ReLU networks, based on {\em complex} (as opposed to {\em real}) recombination of the neurons, for which we show approximate memorization with $O\left(\frac{n}{d} \cdot \frac{\log(1/\epsilon)}{\epsilon}\right)$ neurons as well as nearly optimal size of the weights.


Certifying Strategyproof Auction Networks

We propose ways to explicitly verify strategyproofness under a particular valuation profile using techniques from the neural network verification literature.


Continual Learning of Control Primitives: Skill Discovery via Reset-Games

In this work, we show how a single method can allow an agent to acquire skills with minimal supervision while removing the need for resets.


HOI Analysis: Integrating and Decomposing Human-Object Interaction

In analogy to Harmonic Analysis, whose goal is to study how to represent signals as superpositions of basic waves, we propose HOI Analysis.


Strongly local p-norm-cut algorithms for semi-supervised learning and local graph clustering

In this paper, we propose a generalization of the objective function behind these methods involving p-norms.


Deep Direct Likelihood Knockoffs

We develop Deep Direct Likelihood Knockoffs (DDLK), which directly minimizes the KL divergence implied by the knockoff swap property.


Meta-Neighborhoods
In this work, we step forward in this direction and propose a semi-parametric method, Meta-Neighborhoods, where predictions are made adaptively to the neighborhood of the input.


Neural Dynamic Policies for End-to-End Sensorimotor Learning

In this work, we begin to close this gap and embed dynamics structure into deep neural network-based policies by reparameterizing action spaces with differential equations.


A new inference approach for training shallow and deep generalized linear models of noisy interacting neurons

Here, we develop a two-step inference strategy that allows us to train robust generalised linear models of interacting neurons, by explicitly separating the effects of correlations in the stimulus from network interactions in each training step.


Decision-Making with Auto-Encoding Variational Bayes

Motivated by these theoretical results, we propose learning several approximate proposals for the best model and combining them using multiple importance sampling for decision-making.
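A generic way to combine several proposals is the balance heuristic of multiple importance sampling; the sketch below estimates a known expectation under a toy target with two hypothetical proposals (this illustrates MIS itself, not the paper's decision procedure):

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-(x - mu)**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

rng = np.random.default_rng(0)
target = lambda x: normal_pdf(x, 0.0, 1.0)   # density we want expectations under
f = lambda x: x**2                           # E_target[f] = 1 exactly

proposals = [(-1.0, 1.0), (1.0, 1.5)]        # two hypothetical approximate proposals
n = 20_000                                   # samples drawn from each proposal
est = 0.0
for mu, sigma in proposals:
    xs = rng.normal(mu, sigma, size=n)
    # Balance heuristic: each draw is weighted by the pooled mixture density.
    mix = sum(n * normal_pdf(xs, m, s) for m, s in proposals)
    est += np.sum(f(xs) * target(xs) / mix)
```

The balance heuristic keeps the estimator stable even when any single proposal covers the target poorly, which is the point of pooling several approximate proposals.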


Attribution Preservation in Network Compression for Reliable Network Interpretation

In this paper, we show that these seemingly unrelated techniques conflict with each other as network compression deforms the produced attributions, which could lead to dire consequences for mission-critical applications.


Feature Importance Ranking for Deep Learning

In this paper, we propose a novel dual-net architecture consisting of an operator and a selector that simultaneously discovers an optimal feature subset of a fixed size and ranks the importance of the features within that subset.


Causal Estimation with Functional Confounders

We study causal inference when the true confounder value can be expressed as a function of the observed data; we call this setting estimation with functional confounders (EFC).


Model Inversion Networks for Model-Based Optimization

We propose to address such problems with model inversion networks (MINs), which learn an inverse mapping from scores to inputs.


Hausdorff Dimension, Heavy Tails, and Generalization in Neural Networks

Aiming to bridge this gap, in this paper, we prove generalization bounds for SGD under the assumption that its trajectories can be well-approximated by a \emph{Feller process}, which defines a rich class of Markov processes that include several recent SDE representations (both Brownian or heavy-tailed) as its special case.


Exact expressions for double descent and implicit regularization via surrogate random design

We provide the first exact non-asymptotic expressions for double descent of the minimum norm linear estimator.


Certifying Confidence via Randomized Smoothing

In this work, we propose a method to generate certified radii for the prediction confidence of the smoothed classifier.
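A minimal Monte-Carlo sketch of a randomized-smoothing certificate (the standard label certificate this line of work builds on; the classifier, noise level, and sample count are toy assumptions):

```python
import numpy as np
from statistics import NormalDist

def smoothed_certificate(base_classifier, x, sigma=0.25, n=1000, seed=0):
    """Monte-Carlo randomized smoothing: majority class under Gaussian
    input noise, plus a certified L2 radius sigma * Phi^{-1}(p_top)."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(scale=sigma, size=(n,) + x.shape)
    preds = np.array([base_classifier(x + e) for e in noise])
    classes, counts = np.unique(preds, return_counts=True)
    top = np.argmax(counts)
    p_top = min(counts[top] / n, 1 - 1 / n)   # keep Phi^{-1} finite
    radius = sigma * NormalDist().inv_cdf(p_top) if p_top > 0.5 else 0.0
    return classes[top], radius

# Hypothetical base classifier: sign of the first coordinate.
clf = lambda z: int(z[0] > 0)
label, radius = smoothed_certificate(clf, np.array([1.0, 0.0]))
```

A rigorous certificate would lower-bound p_top with a confidence interval rather than use the raw Monte-Carlo estimate; this sketch only shows the mechanics.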


Learning Physical Constraints with Neural Projections

We propose a new family of neural networks to predict the behaviors of physical systems by learning their underpinning constraints.


Robust Optimization for Fairness with Noisy Protected Groups

First, we study the consequences of naively relying on noisy protected group labels: we provide an upper bound on the fairness violations on the true groups $G$ when the fairness criteria are satisfied on the noisy groups $\hat{G}$.


Noise-Contrastive Estimation for Multivariate Point Processes

We show how to instead apply a version of noise-contrastive estimation, a general parameter estimation method with a less expensive stochastic objective.
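The NCE objective itself reduces parameter estimation to logistic classification of data against samples from a known noise distribution; a one-parameter toy sketch (the Gaussian model and noise here are illustrative, not the paper's point-process setting):

```python
import numpy as np

def logpdf(x, mu, sigma):
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

rng = np.random.default_rng(0)
n = 50_000
data = rng.normal(0.0, 1.0, size=n)      # samples from the true model N(0, 1)
noise = rng.normal(0.0, 2.0, size=n)     # samples from a known noise density N(0, 2)

def nce_objective(mu):
    """Average log-likelihood of a logistic classifier separating data from
    noise, where the logit is log p_model(x; mu) - log p_noise(x)."""
    logit = lambda x: logpdf(x, mu, 1.0) - logpdf(x, 0.0, 2.0)
    # log sigma(z) = -log(1 + e^{-z});  log(1 - sigma(z)) = -log(1 + e^{z})
    return (-np.mean(np.log1p(np.exp(-logit(data))))
            - np.mean(np.log1p(np.exp(logit(noise)))))

j_true, j_wrong = nce_objective(0.0), nce_objective(2.0)
```

The objective is maximized (in expectation) at the true parameters, so the correct mu scores higher than a mismatched one; unlike maximum likelihood, no normalizing constant of the model is needed.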


A Game-Theoretic Analysis of the Empirical Revenue Maximization Algorithm with Endogenous Sampling

We generalize the incentive-awareness measure proposed by Lavi et al. (2019) to quantify the reduction of the price output by ERM due to a change of $m \geq 1$ out of $N$ input samples, and provide specific convergence rates of this measure to zero as $N$ goes to infinity for different types of input distributions.


Neural Path Features and Neural Path Kernel: Understanding the role of gates in deep learning

In this paper, we analytically characterise the role of gates and active sub-networks in deep learning.


Multiscale Deep Equilibrium Models