ECCV 2020 Accepted Papers List

The 2020 European Conference on Computer Vision (ECCV 2020), which took place August 24-27, 2020, is a conference in the field of image analysis.

 

 

Quaternion Equivariant Capsule Networks for 3D Point Clouds

[pdf] 

[supplementary material] 

 

DeepFit: 3D Surface Fitting via Neural Network Weighted Least Squares

[pdf] 

[supplementary material] 

 

NSGANetV2: Evolutionary Multi-Objective Surrogate-Assisted Neural Architecture Search

[pdf] 

[supplementary material] 

 

Describing Textures using Natural Language

[pdf] 

[supplementary material] 

 

Empowering Relational Network by Self-Attention Augmented Conditional Random Fields for Group Activity Recognition

[pdf] 

[supplementary material] 

 

AiR: Attention with Reasoning Capability

[pdf] 

[supplementary material] 

 

Self6D: Self-Supervised Monocular 6D Object Pose Estimation

[pdf] 

[supplementary material] 

 

Invertible Image Rescaling

[pdf] 

[supplementary material] 

 

Synthesize then Compare: Detecting Failures and Anomalies for Semantic Segmentation

[pdf] 

[supplementary material] 

 

House-GAN: Relational Generative Adversarial Networks for Graph-constrained House Layout Generation

[pdf] 

[supplementary material] 

 

Crowdsampling the Plenoptic Function

[pdf] 

[supplementary material] 

 

VoxelPose: Towards Multi-Camera 3D Human Pose Estimation in Wild Environment

[pdf] 

 

 

End-to-End Object Detection with Transformers

[pdf] 

[supplementary material] 

 

DeepSFM: Structure From Motion Via Deep Bundle Adjustment

[pdf] 

[supplementary material] 

 

Ladybird: Quasi-Monte Carlo Sampling for Deep Implicit Field Based 3D Reconstruction with Symmetry

[pdf] 

[supplementary material] 

 

Segment as Points for Efficient Online Multi-Object Tracking and Segmentation

[pdf] 

[supplementary material] 

 

Conditional Convolutions for Instance Segmentation

[pdf] 

[supplementary material] 

 

MutualNet: Adaptive ConvNet via Mutual Learning from Network Width and Resolution

[pdf] 

[supplementary material] 

 

Fashionpedia: Ontology, Segmentation, and an Attribute Localization Dataset

[pdf] 

[supplementary material] 

 

Privacy Preserving Structure-from-Motion

[pdf] 

[supplementary material] 

 

Rewriting a Deep Generative Model

[pdf] 

[supplementary material] 

 

Compare and Reweight: Distinctive Image Captioning Using Similar Images Sets

[pdf] 

[supplementary material] 

 

Long-term Human Motion Prediction with Scene Context

[pdf] 

[supplementary material] 

 

NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis

[pdf] 

[supplementary material] 

 

ReferIt3D: Neural Listeners for Fine-Grained 3D Object Identification in Real-World Scenes

[pdf] 

[supplementary material] 

 

MatryODShka: Real-time 6DoF Video View Synthesis using Multi-Sphere Images

[pdf] 

[supplementary material] 

 

Learning and Aggregating Deep Local Descriptors for Instance-level Recognition

[pdf] 

 

 

A Consistently Fast and Globally Optimal Solution to the Perspective-n-Point Problem

[pdf] 

[supplementary material] 

 

Learn to Recover Visible Color for Video Surveillance in a Day

[pdf] 

[supplementary material] 

 

Deep Fashion3D: A Dataset and Benchmark for 3D Garment Reconstruction from Single Images

[pdf] 

[supplementary material] 

 

Spatially Adaptive Inference with Stochastic Feature Sampling and Interpolation

[pdf] 

[supplementary material] 

 

BorderDet: Border Feature for Dense Object Detection

[pdf] 

[supplementary material] 

 

Regularization with Latent Space Virtual Adversarial Training

[pdf] 

[supplementary material] 

 

Du²Net: Learning Depth Estimation from Dual-Cameras and Dual-Pixels

[pdf] 

[supplementary material] 

 

Model-Agnostic Boundary-Adversarial Sampling for Test-Time Generalization in Few-Shot learning

[pdf] 

 

 

Targeted Attack for Deep Hashing based Retrieval

[pdf] 

[supplementary material] 

 

Gradient Centralization: A New Optimization Technique for Deep Neural Networks

[pdf] 

[supplementary material] 

 

Content-Aware Unsupervised Deep Homography Estimation

[pdf] 

[supplementary material] 

 

Multi-View Optimization of Local Feature Geometry

[pdf] 

[supplementary material] 

 

The Phong Surface: Efficient 3D Model Fitting using Lifted Optimization

[pdf] 

[supplementary material] 

 

Forecasting Human-Object Interaction: Joint Prediction of Motor Attention and Actions in First Person Video

[pdf] 

[supplementary material] 

 

Learning Stereo from Single Images

[pdf] 

[supplementary material] 

 

Prototype Rectification for Few-Shot Learning

[pdf] 

[supplementary material] 

 

Learning Feature Descriptors using Camera Pose Supervision

[pdf] 

[supplementary material] 

 

Semantic Flow for Fast and Accurate Scene Parsing

[pdf] 

[supplementary material] 

 

Appearance Consensus Driven Self-Supervised Human Mesh Recovery

[pdf] 

[supplementary material] 

 

Diffraction Line Imaging

[pdf] 

[supplementary material] 

 

Aligning and Projecting Images to Class-conditional Generative Networks

[pdf] 

[supplementary material] 

 

Suppress and Balance: A Simple Gated Network for Salient Object Detection

[pdf] 

[supplementary material] 

 

Visual Memorability for Robotic Interestingness via Unsupervised Online Learning

[pdf] 

[supplementary material] 

 

Post-Training Piecewise Linear Quantization for Deep Neural Networks

[pdf] 

[supplementary material] 

 

Joint Disentangling and Adaptation for Cross-Domain Person Re-Identification

[pdf] 

[supplementary material] 

 

In-Home Daily-Life Captioning Using Radio Signals

[pdf] 

[supplementary material] 

 

Self-Challenging Improves Cross-Domain Generalization

[pdf] 

[supplementary material] 

 

A Competence-aware Curriculum for Visual Concepts Learning via Question Answering

[pdf] 

[supplementary material] 

 

Multitask Learning Strengthens Adversarial Robustness

[pdf] 

[supplementary material] 

 

S2DNAS: Transforming Static CNN Model for Dynamic Inference via Neural Architecture Search

[pdf] 

[supplementary material] 

 

Improving Deep Video Compression by Resolution-adaptive Flow Coding

[pdf] 

[supplementary material] 

 

Motion Capture from Internet Videos

[pdf] 

[supplementary material] 

 

Appearance-Preserving 3D Convolution for Video-based Person Re-identification

[pdf] 

 

 

Solving the Blind Perspective-n-Point Problem End-To-End With Robust Differentiable Geometric Optimization

[pdf] 

[supplementary material] 

 

Exploiting Deep Generative Prior for Versatile Image Restoration and Manipulation

[pdf] 

[supplementary material] 

 

Deep Spatial-angular Regularization for Compressive Light Field Reconstruction over Coded Apertures

[pdf] 

[supplementary material] 

 

Video-based Remote Physiological Measurement via Cross-verified Feature Disentangling

[pdf] 

[supplementary material] 

 

Combining Implicit Function Learning and Parametric Models for 3D Human Reconstruction

[pdf] 

[supplementary material] 

 

Orientation-aware Vehicle Re-identification with Semantics-guided Part Attention Network

[pdf] 

[supplementary material] 

 

Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation

[pdf] 

[supplementary material] 

 

CoReNet: Coherent 3D Scene Reconstruction from a Single RGB Image

[pdf] 

[supplementary material] 

 

Layer-wise Conditioning Analysis in Exploring the Learning Dynamics of DNNs

[pdf] 

[supplementary material] 

 

RAFT: Recurrent All-Pairs Field Transforms for Optical Flow

[pdf] 

[supplementary material] 

 

Domain-invariant Stereo Matching Networks

[pdf] 

[supplementary material] 

 

DeepHandMesh: A Weakly-supervised Deep Encoder-Decoder Framework for High-fidelity Hand Mesh Modeling

[pdf] 

[supplementary material] 

 

Content Adaptive and Error Propagation Aware Deep Video Compression

[pdf] 

[supplementary material] 

 

Towards Streaming Perception

[pdf] 

[supplementary material] 

 

Towards Automated Testing and Robustification by Semantic Adversarial Data Generation

[pdf] 

[supplementary material] 

 

Adversarial Generative Grammars for Human Activity Prediction

[pdf] 

[supplementary material] 

 

GDumb: A Simple Approach that Questions Our Progress in Continual Learning

[pdf] 

[supplementary material] 

 

Learning Lane Graph Representations for Motion Forecasting

[pdf] 

[supplementary material] 

 

What Matters in Unsupervised Optical Flow

[pdf] 

[supplementary material] 

 

Synthesis and Completion of Facades from Satellite Imagery

[pdf] 

[supplementary material] 

 

Mapillary Planet-Scale Depth Dataset

[pdf] 

 

 

V2VNet: Vehicle-to-Vehicle Communication for Joint Perception and Prediction

[pdf] 

[supplementary material] 

 

Training Interpretable Convolutional Neural Networks by Differentiating Class-specific Filters

[pdf] 

[supplementary material] 

 

EagleEye: Fast Sub-net Evaluation for Efficient Neural Network Pruning

[pdf] 

[supplementary material] 

 

Intrinsic Point Cloud Interpolation via Dual Latent Space Navigation

[pdf] 

[supplementary material] 

 

Cross-Domain Cascaded Deep Translation

[pdf] 

[supplementary material] 

 

“Look Ma, no landmarks!” – Unsupervised, Model-based Dense Face Alignment

[pdf] 

[supplementary material] 

 

Online Invariance Selection for Local Feature Descriptors

[pdf] 

[supplementary material] 

 

Rethinking Image Inpainting via a Mutual Encoder-Decoder with Feature Equalizations

[pdf] 

[supplementary material] 

 

TextCaps: a Dataset for Image Captioning with Reading Comprehension

[pdf] 

[supplementary material] 

 

It is not the Journey but the Destination: Endpoint Conditioned Trajectory Prediction

[pdf] 

[supplementary material] 

 

Learning What to Learn for Video Object Segmentation

[pdf] 

[supplementary material] 

 

SIZER: A Dataset and Model for Parsing 3D Clothing and Learning Size Sensitive 3D Clothing

[pdf] 

[supplementary material] 

 

LIMP: Learning Latent Shape Representations with Metric Preservation Priors

[pdf] 

[supplementary material] 

 

Unsupervised Sketch to Photo Synthesis

[pdf] 

[supplementary material] 

 

A Simple Way to Make Neural Networks Robust Against Diverse Image Corruptions

[pdf] 

[supplementary material] 

 

SoftPoolNet: Shape Descriptor for Point Cloud Completion and Classification

[pdf] 

[supplementary material] 

 

Hierarchical Face Aging through Disentangled Latent Characteristics

[pdf] 

[supplementary material] 

 

Hybrid Models for Open Set Recognition

[pdf] 

 

 

TopoGAN: A Topology-Aware Generative Adversarial Network

[pdf] 

[supplementary material] 

 

Learning to Localize Actions from Moments

[pdf] 

[supplementary material] 

 

ForkGAN: Seeing into the Rainy Night

[pdf] 

[supplementary material] 

 

TCGM: An Information-Theoretic Framework for Semi-Supervised Multi-Modality Learning

[pdf] 

[supplementary material] 

 

ExchNet: A Unified Hashing Network for Large-Scale Fine-Grained Image Retrieval

[pdf] 

[supplementary material] 

 

TSIT: A Simple and Versatile Framework for Image-to-Image Translation

[pdf] 

[supplementary material] 

 

ProxyBNN: Learning Binarized Neural Networks via Proxy Matrices

[pdf] 

[supplementary material] 

 

HMOR: Hierarchical Multi-Person Ordinal Relations for Monocular Multi-Person 3D Pose Estimation

[pdf] 

[supplementary material] 

 

Mask2CAD: 3D Shape Prediction by Learning to Segment and Retrieve

[pdf] 

[supplementary material] 

 

A Unified Framework of Surrogate Loss by Refactoring and Interpolation

[pdf] 

[supplementary material] 

 

Deep Reflectance Volumes: Relightable Reconstructions from Multi-View Photometric Images

[pdf] 

[supplementary material] 

 

Memory-augmented Dense Predictive Coding for Video Representation Learning

[pdf] 

[supplementary material] 

 

PointMixup: Augmentation for Point Clouds

[pdf] 

[supplementary material] 

 

Identity-Guided Human Semantic Parsing for Person Re-Identification

[pdf] 

 

 

Learning Gradient Fields for Shape Generation

[pdf] 

[supplementary material] 

 

COCO-FUNIT: Few-Shot Unsupervised Image Translation with a Content Conditioned Style Encoder

[pdf] 

[supplementary material] 

 

Corner Proposal Network for Anchor-free, Two-stage Object Detection

[pdf] 

 

 

PhraseClick: Toward Achieving Flexible Interactive Segmentation by Phrase and Click

[pdf] 

 

 

Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing

[pdf] 

[supplementary material] 

 

Learning Delicate Local Representations for Multi-Person Pose Estimation

[pdf] 

 

 

Learning to Plan with Uncertain Topological Maps

[pdf] 

[supplementary material] 

 

Neural Design Network: Graphic Layout Generation with Constraints

[pdf] 

[supplementary material] 

 

Learning Open Set Network with Discriminative Reciprocal Points

[pdf] 

[supplementary material] 

 

Convolutional Occupancy Networks

[pdf] 

[supplementary material] 

 

Multi-person 3D Pose Estimation in Crowded Scenes Based on Multi-View Geometry

[pdf] 

[supplementary material] 

 

TIDE: A General Toolbox for Identifying Object Detection Errors

[pdf] 

[supplementary material] 

 

PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding

[pdf] 

[supplementary material] 

 

DSA: More Efficient Budgeted Pruning via Differentiable Sparsity Allocation

[pdf] 

[supplementary material] 

 

Circumventing Outliers of AutoAugment with Knowledge Distillation

[pdf] 

 

 

S2DNet: Learning Image Features for Accurate Sparse-to-Dense Matching

[pdf] 

[supplementary material] 

 

RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving

[pdf] 

[supplementary material] 

 

Video Object Segmentation with Episodic Graph Memory Networks

[pdf] 

[supplementary material] 

 

Rethinking Bottleneck Structure for Efficient Mobile Network Design

[pdf] 

[supplementary material] 

 

Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks

[pdf] 

[supplementary material] 

 

Towards Part-aware Monocular 3D Human Pose Estimation: An Architecture Search Approach

[pdf] 

 

 

REVISE: A Tool for Measuring and Mitigating Bias in Visual Datasets

[pdf] 

[supplementary material] 

 

Contrastive Learning for Weakly Supervised Phrase Grounding

[pdf] 

[supplementary material] 

 

Collaborative Learning of Gesture Recognition and 3D Hand Pose Estimation with Multi-Order Feature Analysis

[pdf] 

[supplementary material] 

 

Making an Invisibility Cloak: Real World Adversarial Attacks on Object Detectors

[pdf] 

[supplementary material] 

 

TuiGAN: Learning Versatile Image-to-Image Translation with Two Unpaired Images

[pdf] 

[supplementary material] 

 

Semi-Siamese Training for Shallow Face Learning

[pdf] 

[supplementary material] 

 

GAN Slimming: All-in-One GAN Compression by A Unified Optimization Framework

[pdf] 

[supplementary material] 

 

Human Interaction Learning on 3D Skeleton Point Clouds for Video Violence Recognition

[pdf] 

 

 

Binarized Neural Network for Single Image Super Resolution

[pdf] 

 

 

Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation

[pdf] 

[supplementary material] 

 

Adaptive Computationally Efficient Network for Monocular 3D Hand Pose Estimation

[pdf] 

[supplementary material] 

 

Chained-Tracker: Chaining Paired Attentive Regression Results for End-to-End Joint Multiple-Object Detection and Tracking

[pdf] 

[supplementary material] 

 

Distribution-Balanced Loss for Multi-Label Classification in Long-Tailed Datasets

[pdf] 

[supplementary material] 

 

Hamiltonian Dynamics for Real-World Shape Interpolation

[pdf] 

[supplementary material] 

 

Learning to Scale Multilingual Representations for Vision-Language Tasks

[pdf] 

[supplementary material] 

 

Multi-modal Transformer for Video Retrieval

[pdf] 

[supplementary material] 

 

Feature Representation Matters: End-to-End Learning for Reference-based Image Super-resolution

[pdf] 

 

 

RobustFusion: Human Volumetric Capture with Data-driven Visual Cues using a RGBD Camera

[pdf] 

[supplementary material] 

 

Surface Normal Estimation of Tilted Images via Spatial Rectifier

[pdf] 

[supplementary material] 

 

Multimodal Shape Completion via Conditional Generative Adversarial Networks

[pdf] 

[supplementary material] 

 

Generative Sparse Detection Networks for 3D Single-shot Object Detection

[pdf] 

[supplementary material] 

 

Grounded Situation Recognition

[pdf] 

[supplementary material] 

 

Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos

[pdf] 

[supplementary material] 

 

Unpaired Learning of Deep Image Denoising

[pdf] 

[supplementary material] 

 

Self-supervising Fine-grained Region Similarities for Large-scale Image Localization

[pdf] 

[supplementary material] 

 

Rotationally-Temporally Consistent Novel View Synthesis of Human Performance Video

[pdf] 

[supplementary material] 

 

Side-Aware Boundary Localization for More Precise Object Detection

[pdf] 

[supplementary material] 

 

SF-Net: Single-Frame Supervision for Temporal Action Localization

[pdf] 

[supplementary material] 

 

Negative Margin Matters: Understanding Margin in Few-shot Classification

[pdf] 

[supplementary material] 

 

Particularity beyond Commonality: Unpaired Identity Transfer with Multiple References

[pdf] 

[supplementary material] 

 

Tracking Objects as Points

[pdf] 

[supplementary material] 

 

CPGAN: Content-Parsing Generative Adversarial Networks for Text-to-Image Synthesis

[pdf] 

[supplementary material] 

 

Transporting Labels via Hierarchical Optimal Transport for Semi-Supervised Learning

[pdf] 

 

 

MTI-Net: Multi-Scale Task Interaction Networks for Multi-Task Learning

[pdf] 

[supplementary material] 

 

Learning to Factorize and Relight a City

[pdf] 

[supplementary material] 

 

Region Graph Embedding Network for Zero-Shot Learning

[pdf] 

[supplementary material] 

 

GRAB: A Dataset of Whole-Body Human Grasping of Objects

[pdf] 

[supplementary material] 

 

DEMEA: Deep Mesh Autoencoders for Non-Rigidly Deforming Objects

[pdf] 

[supplementary material] 

 

RANSAC-Flow: Generic Two-stage Image Alignment

[pdf] 

[supplementary material] 

 

Semantic Object Prediction and Spatial Sound Super-Resolution with Binaural Sounds

[pdf] 

[supplementary material] 

 

Neural Object Learning for 6D Pose Estimation Using a Few Cluttered Images

[pdf] 

[supplementary material] 

 

Dense Hybrid Recurrent Multi-view Stereo Net with Dynamic Consistency Checking

[pdf] 

[supplementary material] 

 

Pixel-Pair Occlusion Relationship Map (P2ORM): Formulation, Inference & Application

[pdf] 

[supplementary material] 

 

MovieNet: A Holistic Dataset for Movie Understanding

[pdf] 

[supplementary material] 

 

Short-Term and Long-Term Context Aggregation Network for Video Inpainting

[pdf] 

[supplementary material] 

 

DH3D: Deep Hierarchical 3D Descriptors for Robust Large-Scale 6DoF Relocalization

[pdf] 

[supplementary material] 

 

Face Super-Resolution Guided by 3D Facial Priors

[pdf] 

[supplementary material] 

 

Label Propagation with Augmented Anchors: A Simple Semi-Supervised Learning baseline for Unsupervised Domain Adaptation

[pdf] 

[supplementary material] 

 

Are Labels Necessary for Neural Architecture Search?

[pdf] 

[supplementary material] 

 

BLSM: A Bone-Level Skinned Model of the Human Mesh

[pdf] 

[supplementary material] 

 

Associative Alignment for Few-shot Image Classification

[pdf] 

[supplementary material] 

 

Cyclic Functional Mapping: Self-supervised Correspondence between Non-isometric Deformable Shapes

[pdf] 

 

 

View-Invariant Probabilistic Embedding for Human Pose

[pdf] 

[supplementary material] 

 

Contact and Human Dynamics from Monocular Video

[pdf] 

[supplementary material] 

 

PointPWC-Net: Cost Volume on Point Clouds for (Self-)Supervised Scene Flow Estimation

[pdf] 

[supplementary material] 

 

Points2Surf: Learning Implicit Surfaces from Point Clouds

[pdf] 

[supplementary material] 

 

Few-Shot Scene-Adaptive Anomaly Detection

[pdf] 

[supplementary material] 

 

Personalized Face Modeling for Improved Face Reconstruction and Motion Retargeting

[pdf] 

[supplementary material] 

 

Entropy Minimisation Framework for Event-based Vision Model Estimation

[pdf] 

[supplementary material] 

 

Reconstructing NBA Players

[pdf] 

[supplementary material] 

 

PIoU Loss: Towards Accurate Oriented Object Detection in Complex Environments

[pdf] 

 

 

TENet: Triple Excitation Network for Video Salient Object Detection

[pdf] 

 

 

Deep Feedback Inverse Problem Solver

[pdf] 

[supplementary material] 

 

Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification

[pdf] 

 

 

Hallucinating Visual Instances in Total Absentia

[pdf] 

[supplementary material] 

 

Weakly-supervised 3D Shape Completion in the Wild

[pdf] 

[supplementary material] 

 

DTVNet: Dynamic Time-lapse Video Generation via Single Still Image

[pdf] 

[supplementary material] 

 

CLIFFNet for Monocular Depth Estimation with Hierarchical Embedding Loss

[pdf] 

[supplementary material] 

 

Collaborative Video Object Segmentation by Foreground-Background Integration

[pdf] 

[supplementary material] 

 

Adaptive Margin Diversity Regularizer for handling Data Imbalance in Zero-Shot SBIR

[pdf] 

 

 

ETH-XGaze: A Large Scale Dataset for Gaze Estimation under Extreme Head Pose and Gaze Variation

[pdf] 

[supplementary material] 

 

Calibration-free Structure-from-Motion with Calibrated Radial Trifocal Tensors

[pdf] 

[supplementary material] 

 

Occupancy Anticipation for Efficient Exploration and Navigation

[pdf] 

[supplementary material] 

 

Unified Image and Video Saliency Modeling

[pdf] 

[supplementary material] 

 

TAO: A Large-Scale Benchmark for Tracking Any Object

[pdf] 

[supplementary material] 

 

A Generalization of Otsu’s Method and Minimum Error Thresholding

[pdf] 

[supplementary material] 

 

A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks

[pdf] 

[supplementary material] 

 

Big Transfer (BiT): General Visual Representation Learning

[pdf] 

[supplementary material] 

 

VisualCOMET: Reasoning about the Dynamic Context of a Still Image

[pdf] 

[supplementary material] 

 

Few-shot Action Recognition with Permutation-invariant Attention

[pdf] 

[supplementary material] 

 

Character Grounding and Re-Identification in Story of Videos and Text Descriptions

[pdf] 

 

 

AABO: Adaptive Anchor Box Optimization for Object Detection via Bayesian Sub-sampling

[pdf] 

[supplementary material] 

 

Learning Visual Context by Comparison

[pdf] 

[supplementary material] 

 

Large Scale Holistic Video Understanding

[pdf] 

[supplementary material] 

 

Indirect Local Attacks for Context-aware Semantic Segmentation Networks

[pdf] 

[supplementary material] 

 

Predicting Visual Overlap of Images Through Interpretable Non-Metric Box Embeddings

[pdf] 

[supplementary material] 

 

Connecting Vision and Language with Localized Narratives

[pdf] 

[supplementary material] 

 

Adversarial T-shirt! Evading Person Detectors in A Physical World

[pdf] 

[supplementary material] 

 

Bounding-box Channels for Visual Relationship Detection

[pdf] 

 

 

Minimal Rolling Shutter Absolute Pose with Unknown Focal Length and Radial Distortion

[pdf] 

[supplementary material] 

 

SRFlow: Learning the Super-Resolution Space with Normalizing Flow

[pdf] 

[supplementary material] 

 

DeepGMR: Learning Latent Gaussian Mixture Models for Registration

[pdf] 

[supplementary material] 

 

Active Perception using Light Curtains for Autonomous Driving

[pdf] 

[supplementary material] 

 

Invertible Neural BRDF for Object Inverse Rendering

[pdf] 

 

 

Semi-supervised Semantic Segmentation via Strong-weak Dual-branch Network

[pdf] 

[supplementary material] 

 

Practical Deep Raw Image Denoising on Mobile Devices

[pdf] 

[supplementary material] 

 

SoundSpaces: Audio-Visual Navigation in 3D Environments

[pdf] 

[supplementary material] 

 

Two-Stream Consensus Network for Weakly-Supervised Temporal Action Localization

[pdf] 

[supplementary material] 

 

Erasing Appearance Preservation in Optimization-based Smoothing

[pdf] 

[supplementary material] 

 

Counterfactual Vision-and-Language Navigation via Adversarial Path Sampler

[pdf] 

[supplementary material] 

 

Guided Deep Decoder: Unsupervised Image Pair Fusion

[pdf] 

[supplementary material] 

 

Filter Style Transfer between Photos

[pdf] 

[supplementary material] 

 

JGR-P2O: Joint Graph Reasoning based Pixel-to-Offset Prediction Network for 3D Hand Pose Estimation from a Single Depth Image

[pdf] 

[supplementary material] 

 

Dynamic Group Convolution for Accelerating Convolutional Neural Networks

[pdf] 

[supplementary material] 

 

RD-GAN: Few/Zero-Shot Chinese Character Style Transfer via Radical Decomposition and Rendering

[pdf] 

 

 

Object-Contextual Representations for Semantic Segmentation

[pdf] 

[supplementary material] 

 

Efficient Spatio-Temporal Recurrent Neural Network for Video Deblurring

[pdf] 

[supplementary material] 

 

Joint Semantic Instance Segmentation on Graphs with the Semantic Mutex Watershed

[pdf] 

[supplementary material] 

 

Photon-Efficient 3D Imaging with A Non-Local Neural Network

[pdf] 

[supplementary material] 

 

GeLaTO: Generative Latent Textured Objects

[pdf] 

[supplementary material] 

 

Improving Vision-and-Language Navigation with Image-Text Pairs from the Web

[pdf] 

[supplementary material] 

 

Directional Temporal Modeling for Action Recognition

[pdf] 

[supplementary material] 

 

Shonan Rotation Averaging: Global Optimality by Surfing SO(p)^n

[pdf] 

[supplementary material] 

 

Semantic Curiosity for Active Visual Learning

[pdf] 

[supplementary material] 

 

Multi-Temporal Recurrent Neural Networks For Progressive Non-Uniform Single Image Deblurring With Incremental Temporal Training

[pdf] 

[supplementary material] 

 

ProgressFace: Scale-Aware Progressive Learning for Face Detection

[pdf] 

[supplementary material] 

 

Learning Multi-layer Latent Variable Model via Variational Optimization of Short Run MCMC for Approximate Inference

[pdf] 

[supplementary material] 

 

CoTeRe-Net: Discovering Collaborative Ternary Relations in Videos

[pdf] 

 

 

Modeling the Effects of Windshield Refraction for Camera Calibration

[pdf] 

[supplementary material] 

 

Unsupervised Domain Adaptation for Semantic Segmentation of NIR Images through Generative Latent Search

[pdf] 

[supplementary material] 

 

PROFIT: A Novel Training Method for sub-4-bit MobileNet Models

[pdf] 

[supplementary material] 

 

Visual Relation Grounding in Videos

[pdf] 

[supplementary material] 

 

Weakly Supervised 3D Human Pose and Shape Reconstruction with Normalizing Flows

[pdf] 

[supplementary material] 

 

Controlling Style and Semantics in Weakly-Supervised Image Generation

[pdf] 

[supplementary material] 

 

Jointly learning visual motion and confidence from local patches in event cameras

[pdf] 

[supplementary material] 

 

SODA: Story Oriented Dense Video Captioning Evaluation Framework

[pdf] 

[supplementary material] 

 

Sketch-Guided Object Localization in Natural Images

[pdf] 

[supplementary material] 

 

A unifying mutual information view of metric learning: cross-entropy vs. pairwise losses

[pdf] 

[supplementary material] 

 

Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models

[pdf] 

[supplementary material] 

 

The Hessian Penalty: A Weak Prior for Unsupervised Disentanglement

[pdf] 

[supplementary material] 

 

STAR: Sparse Trained Articulated Human Body Regressor

[pdf] 

[supplementary material] 

 

Optical Flow Distillation: Towards Efficient and Stable Video Style Transfer

[pdf] 

[supplementary material] 

 

Collaboration by Competition: Self-coordinated Knowledge Amalgamation for Multi-talent Student Learning

[pdf] 

[supplementary material] 

 

Do Not Disturb Me: Person Re-identification Under the Interference of Other Pedestrians

[pdf] 

[supplementary material] 

 

Learning 3D Part Assembly from a Single Image

[pdf] 

[supplementary material] 

 

PT2PC: Learning to Generate 3D Point Cloud Shapes from Part Tree Conditions

[pdf] 

[supplementary material] 

 

Highly Efficient Salient Object Detection with 100K Parameters

[pdf] 

[supplementary material] 

 

HardGAN: A Haze-Aware Representation Distillation GAN for Single Image Dehazing

[pdf] 

 

 

Lifespan Age Transformation Synthesis

[pdf] 

[supplementary material] 

 

Domain2Vec: Domain Embedding for Unsupervised Domain Adaptation

[pdf] 

[supplementary material] 

 

Simulating Content Consistent Vehicle Datasets with Attribute Descent

[pdf] 

 

 

Multiview Detection with Feature Perspective Transformation

[pdf] 

[supplementary material] 

 

Learning Object Relation Graph and Tentative Policy for Visual Navigation

[pdf] 

[supplementary material] 

 

Adversarial Self-Supervised Learning for Semi-Supervised 3D Action Recognition

[pdf] 

 

 

Across Scales & Across Dimensions: Temporal Super-Resolution using Deep Internal Learning

[pdf] 

[supplementary material] 

 

Inducing Optimal Attribute Representations for Conditional GANs

[pdf] 

[supplementary material] 

 

AR-Net: Adaptive Frame Resolution for Efficient Action Recognition

[pdf] 

[supplementary material] 

 

Image-to-Voxel Model Translation for 3D Scene Reconstruction and Segmentation

[pdf] 

[supplementary material] 

 

Consistency Guided Scene Flow Estimation

[pdf] 

[supplementary material] 

 

Autoregressive Unsupervised Image Segmentation

[pdf] 

[supplementary material] 

 

Controllable Image Synthesis via SegVAE

[pdf] 

[supplementary material] 

 

Off-Policy Reinforcement Learning for Efficient and Effective GAN Architecture Search

[pdf] 

[supplementary material] 

 

Efficient Non-Line-of-Sight Imaging from Transient Sinograms

[pdf] 

[supplementary material] 

 

Texture Hallucination for Large-Factor Painting Super-Resolution

[pdf] 

[supplementary material] 

 

Learning Progressive Joint Propagation for Human Motion Prediction

[pdf] 

[supplementary material] 

 

Image Stitching and Rectification for Hand-Held Cameras

[pdf] 

[supplementary material] 

 

ParSeNet: A Parametric Surface Fitting Network for 3D Point Clouds

[pdf] 

[supplementary material] 

 

The Group Loss for Deep Metric Learning

[pdf] 

[supplementary material] 

 

Learning Object Depth from Camera Motion and Video Object Segmentation

[pdf] 

[supplementary material] 

 

OnlineAugment: Online Data Augmentation with Less Domain Knowledge

[pdf] 

[supplementary material] 

 

Learning Pairwise Inter-Plane Relations for Piecewise Planar Reconstruction

[pdf] 

[supplementary material] 

 

Intra-class Feature Variation Distillation for Semantic Segmentation

[pdf] 

 

 

Temporal Distinct Representation Learning for Action Recognition

[pdf] 

 

 

Representative Graph Neural Network

[pdf] 

[supplementary material] 

 

Deformation-Aware 3D Model Embedding and Retrieval

[pdf] 

[supplementary material] 

 

Atlas: End-to-End 3D Scene Reconstruction from Posed Images

[pdf] 

[supplementary material] 

 

Multiple Class Novelty Detection Under Data Distribution Shift

[pdf] 

[supplementary material] 

 

Colorization of Depth Map via Disentanglement

[pdf] 

[supplementary material] 

 

Beyond Controlled Environments: 3D Camera Re-Localization in Changing Indoor Scenes

[pdf] 

[supplementary material] 

 

GeoGraph: Graph-based multi-view object detection with geometric cues end-to-end

[pdf] 

 

 

Localizing the Common Action Among a Few Videos

[pdf] 

[supplementary material] 

 

TAFSSL: Task-Adaptive Feature Sub-Space Learning for few-shot classification

[pdf] 

[supplementary material] 

 

Traffic Accident Benchmark for Causality Recognition

[pdf] 

 

 

Face Anti-Spoofing with Human Material Perception

[pdf] 

[supplementary material] 

 

How Can I See My Future? FvTraj: Using First-person View for Pedestrian Trajectory Prediction

[pdf] 

 

 

Multiple Expert Brainstorming for Domain Adaptive Person Re-identification

[pdf] 

 

 

NASA Neural Articulated Shape Approximation

[pdf] 

[supplementary material] 

 

Towards Unique and Informative Captioning of Images

[pdf] 

[supplementary material] 

 

When Does Self-supervision Improve Few-shot Learning?

[pdf] 

[supplementary material] 

 

Two-branch Recurrent Network for Isolating Deepfakes in Videos

[pdf] 

 

 

Incremental Few-Shot Meta-Learning via Indirect Discriminant Alignment

[pdf] 

[supplementary material] 

 

BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage Models

[pdf] 

[supplementary material] 

 

Differentiable Hierarchical Graph Grouping for Multi-Person Pose Estimation

[pdf] 

 

 

Global Distance-distributions Separation for Unsupervised Person Re-identification

[pdf] 

[supplementary material] 

 

I2L-MeshNet: Image-to-Lixel Prediction Network for Accurate 3D Human Pose and Mesh Estimation from a Single RGB Image

[pdf] 

[supplementary material] 

 

Pose2Mesh: Graph Convolutional Network for 3D Human Pose and Mesh Recovery from a 2D Human Pose

[pdf] 

[supplementary material] 

 

ALRe: Outlier Detection for Guided Refinement

[pdf] 

 

 

Weakly-Supervised Crowd Counting Learns from Sorting rather than Locations

[pdf] 

 

 

Unsupervised Domain Attention Adaptation Network for Caricature Attribute Recognition

[pdf] 

[supplementary material] 

 

Many-shot from Low-shot: Learning to Annotate using Mixed Supervision for Object Detection

[pdf] 

[supplementary material] 

 

Curriculum DeepSDF

[pdf] 

 

 

Meshing Point Clouds with Predicted Intrinsic-Extrinsic Ratio Guidance

[pdf] 

[supplementary material] 

 

Improved Adversarial Training via Learned Optimizer

[pdf] 

[supplementary material] 

 

Component Divide-and-Conquer for Real-World Image Super-Resolution

[pdf] 

[supplementary material] 

 

Enabling Deep Residual Networks for Weakly Supervised Object Detection

[pdf] 

[supplementary material] 

 

Deep near-light photometric stereo for spatially varying reflectances

[pdf] 

[supplementary material] 

 

Learning Visual Representations with Caption Annotations

[pdf] 

[supplementary material] 

 

Solving Long-tailed Recognition with Deep Realistic Taxonomic Classifier

[pdf] 

[supplementary material] 

 

Regression of Instance Boundary by Aggregated CNN and GCN

[pdf] 

[supplementary material] 

 

Social Adaptive Module for Weakly-supervised Group Activity Recognition

[pdf] 

 

 

RGB-D Salient Object Detection with Cross-Modality Modulation and Selection

[pdf] 

[supplementary material] 

 

RetrieveGAN: Image Synthesis via Differentiable Patch Retrieval

[pdf] 

[supplementary material] 

 

Cheaper Pre-training Lunch: An Efficient Paradigm for Object Detection

[pdf] 

[supplementary material] 

 

Faster Person Re-Identification

[pdf] 

 

 

Quantization Guided JPEG Artifact Correction

[pdf] 

[supplementary material] 

 

3PointTM: Faster Measurement of High-Dimensional Transmission Matrices

[pdf] 

 

 

Joint Bilateral Learning for Real-time Universal Photorealistic Style Transfer

[pdf] 

[supplementary material] 

 

Beyond 3DMM Space: Towards Fine-grained 3D Face Reconstruction

[pdf] 

[supplementary material] 

 

World-Consistent Video-to-Video Synthesis

[pdf] 

[supplementary material] 

 

Commonality-Parsing Network across Shape and Appearance for Partially Supervised Instance Segmentation

[pdf] 

 

 

GMNet: Graph Matching Network for Large Scale Part Semantic Segmentation in the Wild

[pdf] 

[supplementary material] 

 

Event-based Asynchronous Sparse Convolutional Networks

[pdf] 

[supplementary material] 

 

AtlantaNet: Inferring the 3D Indoor Layout from a Single 360° Image beyond the Manhattan World Assumption

[pdf] 

[supplementary material] 

 

AttentionNAS: Spatiotemporal Attention Cell Search for Video Classification

[pdf] 

[supplementary material] 

 

REMIND Your Neural Network to Prevent Catastrophic Forgetting

[pdf] 

[supplementary material] 

 

Image Classification in the Dark using Quanta Image Sensors

[pdf] 

[supplementary material] 

 

n-Reference Transfer Learning for Saliency Prediction

[pdf] 

[supplementary material] 

 

Progressively Guided Alternate Refinement Network for RGB-D Salient Object Detection

[pdf] 

[supplementary material] 

 

Bottom-Up Temporal Action Localization with Mutual Regularization

[pdf] 

[supplementary material] 

 

On Modulating the Gradient for Meta-Learning

[pdf] 

[supplementary material] 

 

Domain-Specific Mappings for Generative Adversarial Style Transfer

[pdf] 

[supplementary material] 

 

DiVA: Diverse Visual Feature Aggregation for Deep Metric Learning

[pdf] 

 

 

DHP: Differentiable Meta Pruning via HyperNetworks

[pdf] 

[supplementary material] 

 

Deep Transferring Quantization

[pdf] 

[supplementary material] 

 

Deep Credible Metric Learning for Unsupervised Domain Adaptation Person Re-identification

[pdf] 

 

 

Temporal Coherence or Temporal Motion: Which is More Critical for Video-based Person Re-identification?

[pdf] 

 

 

Arbitrary-Oriented Object Detection with Circular Smooth Label

[pdf] 

[supplementary material] 

 

Learning Event-Driven Video Deblurring and Interpolation

[pdf] 

[supplementary material] 

 

Vectorizing World Buildings: Planar Graph Reconstruction by Primitive Detection and Relationship Inference

[pdf] 

[supplementary material] 

 

Learning to Combine: Knowledge Aggregation for Multi-Source Domain Adaptation

[pdf] 

[supplementary material] 

 

CSCL: Critical Semantic-Consistent Learning for Unsupervised Domain Adaptation

[pdf] 

[supplementary material] 

 

Prototype Mixture Models for Few-shot Semantic Segmentation

[pdf] 

[supplementary material] 

 

Webly Supervised Image Classification with Self-Contained Confidence

[pdf] 

[supplementary material] 

 

Search What You Want: Barrier Panelty NAS for Mixed Precision Quantization

[pdf] 

[supplementary material] 

 

Monocular 3D Object Detection via Feature Domain Adaptation

[pdf] 

[supplementary material] 

 

AUTO3D: Novel view synthesis through unsupervisely learned variational viewpoint and global 3D representation

[pdf] 

[supplementary material] 

 

VPN: Learning Video-Pose Embedding for Activities of Daily Living

[pdf] 

[supplementary material] 

 

Soft Anchor-Point Object Detection

[pdf] 

[supplementary material] 

 

Beyond Fixed Grid: Learning Geometric Image Representation with a Deformable Grid

[pdf] 

[supplementary material] 

 

Soft Expert Reward Learning for Vision-and-Language Navigation

[pdf] 

 

 

Part-aware Prototype Network for Few-shot Semantic Segmentation

[pdf] 

[supplementary material] 

 

Learning from Extrinsic and Intrinsic Supervisions for Domain Generalization

[pdf] 

[supplementary material] 

 

Joint Learning of Social Groups, Individuals Action and Sub-group Activities in Videos

[pdf] 

[supplementary material] 

 

Whole-Body Human Pose Estimation in the Wild

[pdf] 

[supplementary material] 

 

Relative Pose Estimation of Calibrated Cameras with Known SE(3) Invariants

[pdf] 

[supplementary material] 

 

Sequential Convolution and Runge-Kutta Residual Architecture for Image Compressed Sensing

[pdf] 

[supplementary material] 

 

Deep Hough Transform for Semantic Line Detection

[pdf] 

[supplementary material] 

 

Structured Landmark Detection via Topology-Adapting Deep Graph Learning

[pdf] 

[supplementary material] 

 

3D Human Shape and Pose from a Single Low-Resolution Image with Self-Supervised Learning

[pdf] 

[supplementary material] 

 

Learning to Balance Specificity and Invariance for In and Out of Domain Generalization

[pdf] 

[supplementary material] 

 

Contrastive Learning for Unpaired Image-to-Image Translation

[pdf] 

[supplementary material] 

 

DLow: Diversifying Latent Flows for Diverse Human Motion Prediction

[pdf] 

[supplementary material] 

 

GRNet: Gridding Residual Network for Dense Point Cloud Completion

[pdf] 

[supplementary material] 

 

Gait Lateral Network: Learning Discriminative and Compact Representations for Gait Recognition

[pdf] 

 

 

Blind Face Restoration via Deep Multi-scale Component Dictionaries

[pdf] 

[supplementary material] 

 

Robust Neural Networks inspired by Strong Stability Preserving Runge-Kutta methods

[pdf] 

[supplementary material] 

 

Inequality-Constrained and Robust 3D Face Model Fitting

[pdf] 

[supplementary material] 

 

Gabor Layers Enhance Network Robustness

[pdf] 

[supplementary material] 

 

Conditional Image Repainting via Semantic Bridge and Piecewise Value Function

[pdf] 

[supplementary material] 

 

Learnable Cost Volume Using the Cayley Representation

[pdf] 

[supplementary material] 

 

HALO: Hardware-Aware Learning to Optimize

[pdf] 

[supplementary material] 

 

Structured3D: A Large Photo-realistic Dataset for Structured 3D Modeling

[pdf] 

[supplementary material] 

 

BroadFace: Looking at Tens of Thousands of People at Once for Face Recognition

[pdf] 

 

 

Interpretable Visual Reasoning via Probabilistic Formulation under Natural Supervision

[pdf] 

[supplementary material] 

 

Domain Adaptive Semantic Segmentation Using Weak Labels

[pdf] 

[supplementary material] 

 

Knowledge Distillation Meets Self-Supervision

[pdf] 

[supplementary material] 

 

Efficient Neighbourhood Consensus Networks via Submanifold Sparse Convolutions

[pdf] 

[supplementary material] 

 

Reconstructing the Noise Variance Manifold for Image Denoising

[pdf] 

[supplementary material] 

 

Occlusion-Aware Depth Estimation with Adaptive Normal Constraints

[pdf] 

[supplementary material] 

 

VisualEchoes: Spatial Image Representation Learning through Echolocation

[pdf] 

[supplementary material] 

 

Smooth-AP: Smoothing the Path Towards Large-Scale Image Retrieval

[pdf] 

[supplementary material] 

 

Naive-Student: Leveraging Semi-Supervised Learning in Video Sequences for Urban Scene Segmentation

[pdf] 

[supplementary material] 

 

Spatially Aware Multimodal Transformers for TextVQA

[pdf] 

[supplementary material] 

 

Every Pixel Matters: Center-aware Feature Alignment for Domain Adaptive Object Detector

[pdf] 

[supplementary material] 

 

URIE: Universal Image Enhancement for Visual Recognition in the Wild

[pdf] 

[supplementary material] 

 

Pyramid Multi-view Stereo Net with Self-adaptive View Aggregation

[pdf] 

[supplementary material] 

 

SPL-MLL: Selecting Predictable Landmarks for Multi-Label Learning

[pdf] 

 

 

Unpaired Image-to-Image Translation using Adversarial Consistency Loss

[pdf] 

[supplementary material] 

 

Discriminability Distillation in Group Representation Learning

[pdf] 

[supplementary material] 

 

Monocular Expressive Body Regression through Body-Driven Attention

[pdf] 

[supplementary material] 

 

Dual Adversarial Network: Toward Real-world Noise Removal and Noise Generation

[pdf] 

[supplementary material] 

 

Linguistic Structure Guided Context Modeling for Referring Image Segmentation

[pdf] 

[supplementary material] 

 

Federated Visual Classification with Real-World Data Distribution

[pdf] 

[supplementary material] 

 

Robust Re-Identification by Multiple Views Knowledge Distillation

[pdf] 

[supplementary material] 

 

Defocus Deblurring Using Dual-Pixel Data

[pdf] 

[supplementary material] 

 

RhyRNN: Rhythmic RNN for Recognizing Events in Long and Complex Videos

[pdf] 

 

 

Take an Emotion Walk: Perceiving Emotions from Gaits Using Hierarchical Attention Pooling and Affective Mapping

[pdf] 

[supplementary material] 

 

Weighing Counts: Sequential Crowd Counting by Reinforcement Learning

[pdf] 

[supplementary material] 

 

Reflection Backdoor: A Natural Backdoor Attack on Deep Neural Networks

[pdf] 

[supplementary material] 

 

Learning to Learn with Variational Information Bottleneck for Domain Generalization

[pdf] 

[supplementary material] 

 

Deep Positional and Relational Feature Learning for Rotation-Invariant Point Cloud Analysis

[pdf] 

[supplementary material] 

 

Thanks for Nothing: Predicting Zero-Valued Activations with Lightweight Convolutional Neural Networks

[pdf] 

[supplementary material] 

 

Layered Neighborhood Expansion for Incremental Multiple Graph Matching

[pdf] 

 

 

SCAN: Learning to Classify Images without Labels

[pdf] 

[supplementary material] 

 

Graph convolutional networks for learning with few clean and many noisy labels

[pdf] 

[supplementary material] 

 

Object-and-Action Aware Model for Visual Language Navigation

[pdf] 

 

 

A Comprehensive Study of Weight Sharing in Graph Networks for 3D Human Pose Estimation

[pdf] 

[supplementary material] 

 

MuCAN: Multi-Correspondence Aggregation Network for Video Super-Resolution

[pdf] 

[supplementary material] 

 

Efficient Semantic Video Segmentation with Per-frame Inference

[pdf] 

[supplementary material] 

 

Increasing the Robustness of Semantic Segmentation Models with Painting-by-Numbers

[pdf] 

[supplementary material] 

 

Deep Spiking Neural Network: Energy Efficiency Through Time based Coding

[pdf] 

 

 

InfoFocus: 3D Object Detection for Autonomous Driving with Dynamic Information Modeling

[pdf] 

[supplementary material] 

 

Utilizing Patch-level Category Activation Patterns for Multiple Class Novelty Detection

[pdf] 

[supplementary material] 

 

People as Scene Probes

[pdf] 

[supplementary material] 

 

Mapping in a Cycle: Sinkhorn Regularized Unsupervised Learning for Point Cloud Shapes

[pdf] 

[supplementary material]