



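The 2020 European Conference on Computer Vision (ECCV 2020) was held on August 24-27, 2020. It is the authoritative European conference in the field of image analysis.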



Quaternion Equivariant Capsule Networks for 3D Point Clouds


DeepFit: 3D Surface Fitting via Neural Network Weighted Least Squares


NSGANetV2: Evolutionary Multi-Objective Surrogate-Assisted Neural Architecture Search


Describing Textures using Natural Language


Empowering Relational Network by Self-Attention Augmented Conditional Random Fields for Group Activity Recognition


AiR: Attention with Reasoning Capability


Self6D: Self-Supervised Monocular 6D Object Pose Estimation


Invertible Image Rescaling


Synthesize then Compare: Detecting Failures and Anomalies for Semantic Segmentation


House-GAN: Relational Generative Adversarial Networks for Graph-constrained House Layout Generation


Crowdsampling the Plenoptic Function


VoxelPose: Towards Multi-Camera 3D Human Pose Estimation in Wild Environment




End-to-End Object Detection with Transformers


DeepSFM: Structure From Motion Via Deep Bundle Adjustment


Ladybird: Quasi-Monte Carlo Sampling for Deep Implicit Field Based 3D Reconstruction with Symmetry


Segment as Points for Efficient Online Multi-Object Tracking and Segmentation


Conditional Convolutions for Instance Segmentation


MutualNet: Adaptive ConvNet via Mutual Learning from Network Width and Resolution


Fashionpedia: Ontology, Segmentation, and an Attribute Localization Dataset


Privacy Preserving Structure-from-Motion


Rewriting a Deep Generative Model


Compare and Reweight: Distinctive Image Captioning Using Similar Images Sets


Long-term Human Motion Prediction with Scene Context


NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis


ReferIt3D: Neural Listeners for Fine-Grained 3D Object Identification in Real-World Scenes


MatryODShka: Real-time 6DoF Video View Synthesis using Multi-Sphere Images


Learning and Aggregating Deep Local Descriptors for Instance-level Recognition




A Consistently Fast and Globally Optimal Solution to the Perspective-n-Point Problem


Learn to Recover Visible Color for Video Surveillance in a Day


Deep Fashion3D: A Dataset and Benchmark for 3D Garment Reconstruction from Single Images


Spatially Adaptive Inference with Stochastic Feature Sampling and Interpolation


BorderDet: Border Feature for Dense Object Detection


Regularization with Latent Space Virtual Adversarial Training


Du²Net: Learning Depth Estimation from Dual-Cameras and Dual-Pixels


Model-Agnostic Boundary-Adversarial Sampling for Test-Time Generalization in Few-Shot learning




Targeted Attack for Deep Hashing based Retrieval


Gradient Centralization: A New Optimization Technique for Deep Neural Networks


Content-Aware Unsupervised Deep Homography Estimation


Multi-View Optimization of Local Feature Geometry


The Phong Surface: Efficient 3D Model Fitting using Lifted Optimization


Forecasting Human-Object Interaction: Joint Prediction of Motor Attention and Actions in First Person Video


Learning Stereo from Single Images


Prototype Rectification for Few-Shot Learning


Learning Feature Descriptors using Camera Pose Supervision


Semantic Flow for Fast and Accurate Scene Parsing


Appearance Consensus Driven Self-Supervised Human Mesh Recovery


Diffraction Line Imaging


Aligning and Projecting Images to Class-conditional Generative Networks


Suppress and Balance: A Simple Gated Network for Salient Object Detection


Visual Memorability for Robotic Interestingness via Unsupervised Online Learning


Post-Training Piecewise Linear Quantization for Deep Neural Networks


Joint Disentangling and Adaptation for Cross-Domain Person Re-Identification


In-Home Daily-Life Captioning Using Radio Signals


Self-Challenging Improves Cross-Domain Generalization


A Competence-aware Curriculum for Visual Concepts Learning via Question Answering


Multitask Learning Strengthens Adversarial Robustness


S2DNAS: Transforming Static CNN Model for Dynamic Inference via Neural Architecture Search


Improving Deep Video Compression by Resolution-adaptive Flow Coding


Motion Capture from Internet Videos


Appearance-Preserving 3D Convolution for Video-based Person Re-identification




Solving the Blind Perspective-n-Point Problem End-To-End With Robust Differentiable Geometric Optimization


Exploiting Deep Generative Prior for Versatile Image Restoration and Manipulation


Deep Spatial-angular Regularization for Compressive Light Field Reconstruction over Coded Apertures


Video-based Remote Physiological Measurement via Cross-verified Feature Disentangling


Combining Implicit Function Learning and Parametric Models for 3D Human Reconstruction


Orientation-aware Vehicle Re-identification with Semantics-guided Part Attention Network


Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation


CoReNet: Coherent 3D Scene Reconstruction from a Single RGB Image


Layer-wise Conditioning Analysis in Exploring the Learning Dynamics of DNNs


RAFT: Recurrent All-Pairs Field Transforms for Optical Flow


Domain-invariant Stereo Matching Networks


DeepHandMesh: A Weakly-supervised Deep Encoder-Decoder Framework for High-fidelity Hand Mesh Modeling


Content Adaptive and Error Propagation Aware Deep Video Compression


Towards Streaming Perception


Towards Automated Testing and Robustification by Semantic Adversarial Data Generation


Adversarial Generative Grammars for Human Activity Prediction


GDumb: A Simple Approach that Questions Our Progress in Continual Learning


Learning Lane Graph Representations for Motion Forecasting


What Matters in Unsupervised Optical Flow


Synthesis and Completion of Facades from Satellite Imagery


Mapillary Planet-Scale Depth Dataset




V2VNet: Vehicle-to-Vehicle Communication for Joint Perception and Prediction


Training Interpretable Convolutional Neural Networks by Differentiating Class-specific Filters


EagleEye: Fast Sub-net Evaluation for Efficient Neural Network Pruning


Intrinsic Point Cloud Interpolation via Dual Latent Space Navigation


Cross-Domain Cascaded Deep Translation


“Look Ma, no landmarks!” – Unsupervised, Model-based Dense Face Alignment


Online Invariance Selection for Local Feature Descriptors


Rethinking Image Inpainting via a Mutual Encoder-Decoder with Feature Equalizations


TextCaps: a Dataset for Image Captioning with Reading Comprehension


It is not the Journey but the Destination: Endpoint Conditioned Trajectory Prediction


Learning What to Learn for Video Object Segmentation


SIZER: A Dataset and Model for Parsing 3D Clothing and Learning Size Sensitive 3D Clothing


LIMP: Learning Latent Shape Representations with Metric Preservation Priors


Unsupervised Sketch to Photo Synthesis


A Simple Way to Make Neural Networks Robust Against Diverse Image Corruptions


SoftPoolNet: Shape Descriptor for Point Cloud Completion and Classification


Hierarchical Face Aging through Disentangled Latent Characteristics


Hybrid Models for Open Set Recognition




TopoGAN: A Topology-Aware Generative Adversarial Network


Learning to Localize Actions from Moments


ForkGAN: Seeing into the Rainy Night


TCGM: An Information-Theoretic Framework for Semi-Supervised Multi-Modality Learning


ExchNet: A Unified Hashing Network for Large-Scale Fine-Grained Image Retrieval


TSIT: A Simple and Versatile Framework for Image-to-Image Translation


ProxyBNN: Learning Binarized Neural Networks via Proxy Matrices


HMOR: Hierarchical Multi-Person Ordinal Relations for Monocular Multi-Person 3D Pose Estimation


Mask2CAD: 3D Shape Prediction by Learning to Segment and Retrieve


A Unified Framework of Surrogate Loss by Refactoring and Interpolation


Deep Reflectance Volumes: Relightable Reconstructions from Multi-View Photometric Images


Memory-augmented Dense Predictive Coding for Video Representation Learning


PointMixup: Augmentation for Point Clouds


Identity-Guided Human Semantic Parsing for Person Re-Identification




Learning Gradient Fields for Shape Generation


COCO-FUNIT: Few-Shot Unsupervised Image Translation with a Content Conditioned Style Encoder


Corner Proposal Network for Anchor-free, Two-stage Object Detection




PhraseClick: Toward Achieving Flexible Interactive Segmentation by Phrase and Click




Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing


Learning Delicate Local Representations for Multi-Person Pose Estimation




Learning to Plan with Uncertain Topological Maps


Neural Design Network: Graphic Layout Generation with Constraints


Learning Open Set Network with Discriminative Reciprocal Points


Convolutional Occupancy Networks


Multi-person 3D Pose Estimation in Crowded Scenes Based on Multi-View Geometry


TIDE: A General Toolbox for Identifying Object Detection Errors


PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding


DSA: More Efficient Budgeted Pruning via Differentiable Sparsity Allocation


Circumventing Outliers of AutoAugment with Knowledge Distillation




S2DNet: Learning Image Features for Accurate Sparse-to-Dense Matching


RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving


Video Object Segmentation with Episodic Graph Memory Networks


Rethinking Bottleneck Structure for Efficient Mobile Network Design


Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks


Towards Part-aware Monocular 3D Human Pose Estimation: An Architecture Search Approach




REVISE: A Tool for Measuring and Mitigating Bias in Visual Datasets


Contrastive Learning for Weakly Supervised Phrase Grounding


Collaborative Learning of Gesture Recognition and 3D Hand Pose Estimation with Multi-Order Feature Analysis


Making an Invisibility Cloak: Real World Adversarial Attacks on Object Detectors


TuiGAN: Learning Versatile Image-to-Image Translation with Two Unpaired Images


Semi-Siamese Training for Shallow Face Learning


GAN Slimming: All-in-One GAN Compression by A Unified Optimization Framework


Human Interaction Learning on 3D Skeleton Point Clouds for Video Violence Recognition




Binarized Neural Network for Single Image Super Resolution




Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation


Adaptive Computationally Efficient Network for Monocular 3D Hand Pose Estimation


Chained-Tracker: Chaining Paired Attentive Regression Results for End-to-End Joint Multiple-Object Detection and Tracking


Distribution-Balanced Loss for Multi-Label Classification in Long-Tailed Datasets


Hamiltonian Dynamics for Real-World Shape Interpolation


Learning to Scale Multilingual Representations for Vision-Language Tasks


Multi-modal Transformer for Video Retrieval


Feature Representation Matters: End-to-End Learning for Reference-based Image Super-resolution




RobustFusion: Human Volumetric Capture with Data-driven Visual Cues using a RGBD Camera


Surface Normal Estimation of Tilted Images via Spatial Rectifier


Multimodal Shape Completion via Conditional Generative Adversarial Networks


Generative Sparse Detection Networks for 3D Single-shot Object Detection


Grounded Situation Recognition


Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos


Unpaired Learning of Deep Image Denoising


Self-supervising Fine-grained Region Similarities for Large-scale Image Localization


Rotationally-Temporally Consistent Novel View Synthesis of Human Performance Video


Side-Aware Boundary Localization for More Precise Object Detection


SF-Net: Single-Frame Supervision for Temporal Action Localization


Negative Margin Matters: Understanding Margin in Few-shot Classification


Particularity beyond Commonality: Unpaired Identity Transfer with Multiple References


Tracking Objects as Points


CPGAN: Content-Parsing Generative Adversarial Networks for Text-to-Image Synthesis


Transporting Labels via Hierarchical Optimal Transport for Semi-Supervised Learning




MTI-Net: Multi-Scale Task Interaction Networks for Multi-Task Learning


Learning to Factorize and Relight a City


Region Graph Embedding Network for Zero-Shot Learning


GRAB: A Dataset of Whole-Body Human Grasping of Objects


DEMEA: Deep Mesh Autoencoders for Non-Rigidly Deforming Objects


RANSAC-Flow: Generic Two-stage Image Alignment


Semantic Object Prediction and Spatial Sound Super-Resolution with Binaural Sounds


Neural Object Learning for 6D Pose Estimation Using a Few Cluttered Images


Dense Hybrid Recurrent Multi-view Stereo Net with Dynamic Consistency Checking


Pixel-Pair Occlusion Relationship Map (P2ORM): Formulation, Inference & Application


MovieNet: A Holistic Dataset for Movie Understanding


Short-Term and Long-Term Context Aggregation Network for Video Inpainting


DH3D: Deep Hierarchical 3D Descriptors for Robust Large-Scale 6DoF Relocalization


Face Super-Resolution Guided by 3D Facial Priors


Label Propagation with Augmented Anchors: A Simple Semi-Supervised Learning baseline for Unsupervised Domain Adaptation


Are Labels Necessary for Neural Architecture Search?


BLSM: A Bone-Level Skinned Model of the Human Mesh


Associative Alignment for Few-shot Image Classification


Cyclic Functional Mapping: Self-supervised Correspondence between Non-isometric Deformable Shapes




View-Invariant Probabilistic Embedding for Human Pose


Contact and Human Dynamics from Monocular Video


PointPWC-Net: Cost Volume on Point Clouds for (Self-)Supervised Scene Flow Estimation


Points2Surf: Learning Implicit Surfaces from Point Clouds


Few-Shot Scene-Adaptive Anomaly Detection


Personalized Face Modeling for Improved Face Reconstruction and Motion Retargeting


Entropy Minimisation Framework for Event-based Vision Model Estimation


Reconstructing NBA Players


PIoU Loss: Towards Accurate Oriented Object Detection in Complex Environments




TENet: Triple Excitation Network for Video Salient Object Detection




Deep Feedback Inverse Problem Solver


Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification




Hallucinating Visual Instances in Total Absentia


Weakly-supervised 3D Shape Completion in the Wild


DTVNet: Dynamic Time-lapse Video Generation via Single Still Image


CLIFFNet for Monocular Depth Estimation with Hierarchical Embedding Loss


Collaborative Video Object Segmentation by Foreground-Background Integration


Adaptive Margin Diversity Regularizer for handling Data Imbalance in Zero-Shot SBIR




ETH-XGaze: A Large Scale Dataset for Gaze Estimation under Extreme Head Pose and Gaze Variation


Calibration-free Structure-from-Motion with Calibrated Radial Trifocal Tensors


Occupancy Anticipation for Efficient Exploration and Navigation


Unified Image and Video Saliency Modeling


TAO: A Large-Scale Benchmark for Tracking Any Object


A Generalization of Otsu’s Method and Minimum Error Thresholding


A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks


Big Transfer (BiT): General Visual Representation Learning


VisualCOMET: Reasoning about the Dynamic Context of a Still Image


Few-shot Action Recognition with Permutation-invariant Attention


Character Grounding and Re-Identification in Story of Videos and Text Descriptions




AABO: Adaptive Anchor Box Optimization for Object Detection via Bayesian Sub-sampling


Learning Visual Context by Comparison


Large Scale Holistic Video Understanding


Indirect Local Attacks for Context-aware Semantic Segmentation Networks


Predicting Visual Overlap of Images Through Interpretable Non-Metric Box Embeddings


Connecting Vision and Language with Localized Narratives


Adversarial T-shirt! Evading Person Detectors in A Physical World


Bounding-box Channels for Visual Relationship Detection




Minimal Rolling Shutter Absolute Pose with Unknown Focal Length and Radial Distortion


SRFlow: Learning the Super-Resolution Space with Normalizing Flow


DeepGMR: Learning Latent Gaussian Mixture Models for Registration


Active Perception using Light Curtains for Autonomous Driving


Invertible Neural BRDF for Object Inverse Rendering




Semi-supervised Semantic Segmentation via Strong-weak Dual-branch Network


Practical Deep Raw Image Denoising on Mobile Devices


SoundSpaces: Audio-Visual Navigation in 3D Environments


Two-Stream Consensus Network for Weakly-Supervised Temporal Action Localization


Erasing Appearance Preservation in Optimization-based Smoothing


Counterfactual Vision-and-Language Navigation via Adversarial Path Sampler


Guided Deep Decoder: Unsupervised Image Pair Fusion


Filter Style Transfer between Photos


JGR-P2O: Joint Graph Reasoning based Pixel-to-Offset Prediction Network for 3D Hand Pose Estimation from a Single Depth Image


Dynamic Group Convolution for Accelerating Convolutional Neural Networks


RD-GAN: Few/Zero-Shot Chinese Character Style Transfer via Radical Decomposition and Rendering




Object-Contextual Representations for Semantic Segmentation


Efficient Spatio-Temporal Recurrent Neural Network for Video Deblurring


Joint Semantic Instance Segmentation on Graphs with the Semantic Mutex Watershed


Photon-Efficient 3D Imaging with A Non-Local Neural Network


GeLaTO: Generative Latent Textured Objects


Improving Vision-and-Language Navigation with Image-Text Pairs from the Web


Directional Temporal Modeling for Action Recognition


Shonan Rotation Averaging: Global Optimality by Surfing SO(p)^n


Semantic Curiosity for Active Visual Learning


Multi-Temporal Recurrent Neural Networks For Progressive Non-Uniform Single Image Deblurring With Incremental Temporal Training


ProgressFace: Scale-Aware Progressive Learning for Face Detection


Learning Multi-layer Latent Variable Model via Variational Optimization of Short Run MCMC for Approximate Inference


CoTeRe-Net: Discovering Collaborative Ternary Relations in Videos




Modeling the Effects of Windshield Refraction for Camera Calibration


Unsupervised Domain Adaptation for Semantic Segmentation of NIR Images through Generative Latent Search


PROFIT: A Novel Training Method for sub-4-bit MobileNet Models


Visual Relation Grounding in Videos


Weakly Supervised 3D Human Pose and Shape Reconstruction with Normalizing Flows


Controlling Style and Semantics in Weakly-Supervised Image Generation


Jointly learning visual motion and confidence from local patches in event cameras


SODA: Story Oriented Dense Video Captioning Evaluation Framework


Sketch-Guided Object Localization in Natural Images


A unifying mutual information view of metric learning: cross-entropy vs. pairwise losses


Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models


The Hessian Penalty: A Weak Prior for Unsupervised Disentanglement


STAR: Sparse Trained Articulated Human Body Regressor


Optical Flow Distillation: Towards Efficient and Stable Video Style Transfer


Collaboration by Competition: Self-coordinated Knowledge Amalgamation for Multi-talent Student Learning


Do Not Disturb Me: Person Re-identification Under the Interference of Other Pedestrians


Learning 3D Part Assembly from a Single Image


PT2PC: Learning to Generate 3D Point Cloud Shapes from Part Tree Conditions


Highly Efficient Salient Object Detection with 100K Parameters


HardGAN: A Haze-Aware Representation Distillation GAN for Single Image Dehazing




Lifespan Age Transformation Synthesis


Domain2Vec: Domain Embedding for Unsupervised Domain Adaptation


Simulating Content Consistent Vehicle Datasets with Attribute Descent




Multiview Detection with Feature Perspective Transformation


Learning Object Relation Graph and Tentative Policy for Visual Navigation


Adversarial Self-Supervised Learning for Semi-Supervised 3D Action Recognition




Across Scales & Across Dimensions: Temporal Super-Resolution using Deep Internal Learning


Inducing Optimal Attribute Representations for Conditional GANs


AR-Net: Adaptive Frame Resolution for Efficient Action Recognition


Image-to-Voxel Model Translation for 3D Scene Reconstruction and Segmentation


Consistency Guided Scene Flow Estimation


Autoregressive Unsupervised Image Segmentation


Controllable Image Synthesis via SegVAE


Off-Policy Reinforcement Learning for Efficient and Effective GAN Architecture Search


Efficient Non-Line-of-Sight Imaging from Transient Sinograms


Texture Hallucination for Large-Factor Painting Super-Resolution


Learning Progressive Joint Propagation for Human Motion Prediction


Image Stitching and Rectification for Hand-Held Cameras


ParSeNet: A Parametric Surface Fitting Network for 3D Point Clouds


The Group Loss for Deep Metric Learning


Learning Object Depth from Camera Motion and Video Object Segmentation


OnlineAugment: Online Data Augmentation with Less Domain Knowledge


Learning Pairwise Inter-Plane Relations for Piecewise Planar Reconstruction


Intra-class Feature Variation Distillation for Semantic Segmentation




Temporal Distinct Representation Learning for Action Recognition




Representative Graph Neural Network


Deformation-Aware 3D Model Embedding and Retrieval


Atlas: End-to-End 3D Scene Reconstruction from Posed Images


Multiple Class Novelty Detection Under Data Distribution Shift


Colorization of Depth Map via Disentanglement


Beyond Controlled Environments: 3D Camera Re-Localization in Changing Indoor Scenes


GeoGraph: Graph-based multi-view object detection with geometric cues end-to-end




Localizing the Common Action Among a Few Videos


TAFSSL: Task-Adaptive Feature Sub-Space Learning for few-shot classification


Traffic Accident Benchmark for Causality Recognition




Face Anti-Spoofing with Human Material Perception


How Can I See My Future? FvTraj: Using First-person View for Pedestrian Trajectory Prediction




Multiple Expert Brainstorming for Domain Adaptive Person Re-identification




NASA: Neural Articulated Shape Approximation


Towards Unique and Informative Captioning of Images


When Does Self-supervision Improve Few-shot Learning?


Two-branch Recurrent Network for Isolating Deepfakes in Videos




Incremental Few-Shot Meta-Learning via Indirect Discriminant Alignment


BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage Models


Differentiable Hierarchical Graph Grouping for Multi-Person Pose Estimation




Global Distance-distributions Separation for Unsupervised Person Re-identification


I2L-MeshNet: Image-to-Lixel Prediction Network for Accurate 3D Human Pose and Mesh Estimation from a Single RGB Image


Pose2Mesh: Graph Convolutional Network for 3D Human Pose and Mesh Recovery from a 2D Human Pose


ALRe: Outlier Detection for Guided Refinement




Weakly-Supervised Crowd Counting Learns from Sorting rather than Locations




Unsupervised Domain Attention Adaptation Network for Caricature Attribute Recognition


Many-shot from Low-shot: Learning to Annotate using Mixed Supervision for Object Detection


Curriculum DeepSDF




Meshing Point Clouds with Predicted Intrinsic-Extrinsic Ratio Guidance


Improved Adversarial Training via Learned Optimizer


Component Divide-and-Conquer for Real-World Image Super-Resolution


Enabling Deep Residual Networks for Weakly Supervised Object Detection


Deep near-light photometric stereo for spatially varying reflectances


Learning Visual Representations with Caption Annotations


Solving Long-tailed Recognition with Deep Realistic Taxonomic Classifier


Regression of Instance Boundary by Aggregated CNN and GCN


Social Adaptive Module for Weakly-supervised Group Activity Recognition




RGB-D Salient Object Detection with Cross-Modality Modulation and Selection


RetrieveGAN: Image Synthesis via Differentiable Patch Retrieval


Cheaper Pre-training Lunch: An Efficient Paradigm for Object Detection


Faster Person Re-Identification




Quantization Guided JPEG Artifact Correction


3PointTM: Faster Measurement of High-Dimensional Transmission Matrices




Joint Bilateral Learning for Real-time Universal Photorealistic Style Transfer


Beyond 3DMM Space: Towards Fine-grained 3D Face Reconstruction


World-Consistent Video-to-Video Synthesis


Commonality-Parsing Network across Shape and Appearance for Partially Supervised Instance Segmentation




GMNet: Graph Matching Network for Large Scale Part Semantic Segmentation in the Wild


Event-based Asynchronous Sparse Convolutional Networks


AtlantaNet: Inferring the 3D Indoor Layout from a Single 360° Image beyond the Manhattan World Assumption


AttentionNAS: Spatiotemporal Attention Cell Search for Video Classification


REMIND Your Neural Network to Prevent Catastrophic Forgetting


Image Classification in the Dark using Quanta Image Sensors


n-Reference Transfer Learning for Saliency Prediction


Progressively Guided Alternate Refinement Network for RGB-D Salient Object Detection


Bottom-Up Temporal Action Localization with Mutual Regularization


On Modulating the Gradient for Meta-Learning


Domain-Specific Mappings for Generative Adversarial Style Transfer


DiVA: Diverse Visual Feature Aggregation for Deep Metric Learning




DHP: Differentiable Meta Pruning via HyperNetworks


Deep Transferring Quantization


Deep Credible Metric Learning for Unsupervised Domain Adaptation Person Re-identification




Temporal Coherence or Temporal Motion: Which is More Critical for Video-based Person Re-identification?




Arbitrary-Oriented Object Detection with Circular Smooth Label


Learning Event-Driven Video Deblurring and Interpolation


Vectorizing World Buildings: Planar Graph Reconstruction by Primitive Detection and Relationship Inference


Learning to Combine: Knowledge Aggregation for Multi-Source Domain Adaptation


CSCL: Critical Semantic-Consistent Learning for Unsupervised Domain Adaptation


Prototype Mixture Models for Few-shot Semantic Segmentation


Webly Supervised Image Classification with Self-Contained Confidence


Search What You Want: Barrier Panelty NAS for Mixed Precision Quantization


Monocular 3D Object Detection via Feature Domain Adaptation


AUTO3D: Novel view synthesis through unsupervisely learned variational viewpoint and global 3D representation


VPN: Learning Video-Pose Embedding for Activities of Daily Living


Soft Anchor-Point Object Detection


Beyond Fixed Grid: Learning Geometric Image Representation with a Deformable Grid


Soft Expert Reward Learning for Vision-and-Language Navigation




Part-aware Prototype Network for Few-shot Semantic Segmentation


Learning from Extrinsic and Intrinsic Supervisions for Domain Generalization


Joint Learning of Social Groups, Individuals Action and Sub-group Activities in Videos


Whole-Body Human Pose Estimation in the Wild


Relative Pose Estimation of Calibrated Cameras with Known SE(3) Invariants


Sequential Convolution and Runge-Kutta Residual Architecture for Image Compressed Sensing


Deep Hough Transform for Semantic Line Detection


Structured Landmark Detection via Topology-Adapting Deep Graph Learning


3D Human Shape and Pose from a Single Low-Resolution Image with Self-Supervised Learning


Learning to Balance Specificity and Invariance for In and Out of Domain Generalization


Contrastive Learning for Unpaired Image-to-Image Translation


DLow: Diversifying Latent Flows for Diverse Human Motion Prediction


GRNet: Gridding Residual Network for Dense Point Cloud Completion


Gait Lateral Network: Learning Discriminative and Compact Representations for Gait Recognition




Blind Face Restoration via Deep Multi-scale Component Dictionaries


Robust Neural Networks inspired by Strong Stability Preserving Runge-Kutta methods


Inequality-Constrained and Robust 3D Face Model Fitting


Gabor Layers Enhance Network Robustness


Conditional Image Repainting via Semantic Bridge and Piecewise Value Function


Learnable Cost Volume Using the Cayley Representation


HALO: Hardware-Aware Learning to Optimize


Structured3D: A Large Photo-realistic Dataset for Structured 3D Modeling


BroadFace: Looking at Tens of Thousands of People at Once for Face Recognition




Interpretable Visual Reasoning via Probabilistic Formulation under Natural Supervision


Domain Adaptive Semantic Segmentation Using Weak Labels


Knowledge Distillation Meets Self-Supervision


Efficient Neighbourhood Consensus Networks via Submanifold Sparse Convolutions


Reconstructing the Noise Variance Manifold for Image Denoising


Occlusion-Aware Depth Estimation with Adaptive Normal Constraints


VisualEchoes: Spatial Image Representation Learning through Echolocation


Smooth-AP: Smoothing the Path Towards Large-Scale Image Retrieval


Naive-Student: Leveraging Semi-Supervised Learning in Video Sequences for Urban Scene Segmentation


Spatially Aware Multimodal Transformers for TextVQA


Every Pixel Matters: Center-aware Feature Alignment for Domain Adaptive Object Detector


URIE: Universal Image Enhancement for Visual Recognition in the Wild


Pyramid Multi-view Stereo Net with Self-adaptive View Aggregation


SPL-MLL: Selecting Predictable Landmarks for Multi-Label Learning




Unpaired Image-to-Image Translation using Adversarial Consistency Loss


Discriminability Distillation in Group Representation Learning


Monocular Expressive Body Regression through Body-Driven Attention


Dual Adversarial Network: Toward Real-world Noise Removal and Noise Generation


Linguistic Structure Guided Context Modeling for Referring Image Segmentation


Federated Visual Classification with Real-World Data Distribution


Robust Re-Identification by Multiple Views Knowledge Distillation


Defocus Deblurring Using Dual-Pixel Data


RhyRNN: Rhythmic RNN for Recognizing Events in Long and Complex Videos




Take an Emotion Walk: Perceiving Emotions from Gaits Using Hierarchical Attention Pooling and Affective Mapping


Weighing Counts: Sequential Crowd Counting by Reinforcement Learning


Reflection Backdoor: A Natural Backdoor Attack on Deep Neural Networks


Learning to Learn with Variational Information Bottleneck for Domain Generalization


Deep Positional and Relational Feature Learning for Rotation-Invariant Point Cloud Analysis


Thanks for Nothing: Predicting Zero-Valued Activations with Lightweight Convolutional Neural Networks


Layered Neighborhood Expansion for Incremental Multiple Graph Matching




SCAN: Learning to Classify Images without Labels


Graph convolutional networks for learning with few clean and many noisy labels


Object-and-Action Aware Model for Visual Language Navigation




A Comprehensive Study of Weight Sharing in Graph Networks for 3D Human Pose Estimation


MuCAN: Multi-Correspondence Aggregation Network for Video Super-Resolution


Efficient Semantic Video Segmentation with Per-frame Inference


Increasing the Robustness of Semantic Segmentation Models with Painting-by-Numbers


Deep Spiking Neural Network: Energy Efficiency Through Time based Coding




InfoFocus: 3D Object Detection for Autonomous Driving with Dynamic Information Modeling


Utilizing Patch-level Category Activation Patterns for Multiple Class Novelty Detection


People as Scene Probes


Mapping in a Cycle: Sinkhorn Regularized Unsupervised Learning for Point Cloud Shapes


Label-Efficient Learning on Point Clouds using Approximate Convex Decompositions


TexMesh: Reconstructing Detailed Human Texture and Geometry from RGB-D Video


Consistency-based Semi-supervised Active Learning: Towards Minimizing Labeling Cost


Point-Set Anchors for Object Detection, Instance Segmentation and Pose Estimation




Modeling 3D Shapes by Reinforcement Learning


LST-Net: Learning a Convolutional Neural Network with a Learnable Sparse Transform


Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision


CN: Channel Normalization For Point Cloud Recognition




Rethinking the Defocus Blur Detection Problem and A Real-Time Deep DBD Model




AutoMix: Mixup Networks for Sample Interpolation via Cooperative Barycenter Learning




Scene Text Image Super-resolution in the wild


Coupling Explicit and Implicit Surface Representations for Generative 3D Modeling


Learning Disentangled Representations with Latent Variation Predictability


Deep Space-Time Video Upsampling Networks


Large-Scale Few-Shot Learning via Multi-Modal Knowledge Discovery


Fast Video Object Segmentation using the Global Context Module


Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos


Selecting Relevant Features from a Multi-domain Representation for Few-shot Classification


MessyTable: Instance Association in Multiple Camera Views


A Unified Framework for Shot Type Classification Based on Subject Centric Lens


BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues


HTML: A Parametric Hand Texture Model for 3D Hand Reconstruction and Personalization


CycAs: Self-supervised Cycle Association for Learning Re-identifiable Descriptions




Open-Edit: Open-Domain Image Manipulation with Open-Vocabulary Instructions




Towards Real-Time Multi-Object Tracking




A Balanced and Uncertainty-aware Approach for Partial Domain Adaptation




Unsupervised Deep Metric Learning with Transformed Attention Consistency and Contrastive Clustering Loss


STEm-Seg: Spatio-temporal Embeddings for Instance Segmentation in Videos


Hierarchical Style-based Networks for Motion Synthesis


Who Left the Dogs Out? 3D Animal Reconstruction with Expectation Maximization in the Loop


Learning to Count in the Crowd from Limited Labeled Data


SPOT: Selective Point Cloud Voting for Better Proposal in Point Cloud Object Detection




Explainable Face Recognition


From Shadow Segmentation to Shadow Removal


Diverse and Admissible Trajectory Prediction through Multimodal Context Understanding


CONFIG: Controllable Neural Face Image Generation


Single View Metrology in the Wild


Procedure Planning in Instructional Videos


Funnel Activation for Visual Recognition




GIQA: Generated Image Quality Assessment




Adversarial Continual Learning


Adapting Object Detectors with Conditional Domain Normalization


HARD-Net: Hardness-AwaRe Discrimination Network for 3D Early Activity Prediction




Pseudo RGB-D for Self-Improving Monocular SLAM and Depth Prediction


Interpretable and Generalizable Person Re-Identification with Query-Adaptive Convolution and Temporal Lifting


Self-supervised Bayesian Deep Learning for Image Recovery with Applications to Compressive Sensing


Graph-PCNN: Two Stage Human Pose Estimation with Graph Pose Refinement




Semi-supervised Learning with a Teacher-student Network for Generalized Attribute Prediction


Unsupervised Domain Adaptation with Noise Resistible Mutual-Training for Person Re-identification




DPDist: Comparing Point Clouds Using Deep Point Cloud Distance


Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation




DataMix: Efficient Privacy-Preserving Edge-Cloud Inference


Neural Re-Rendering of Humans from a Single Image


Reversing the cycle: self-supervised deep stereo through enhanced monocular distillation


PIPAL: a Large-Scale Image Quality Assessment Dataset for Perceptual Image Restoration




Why do These Match? Explaining the Behavior of Image Similarity Models


CooGAN: A Memory-Efficient Framework for High-Resolution Facial Attribute Editing


Progressive Transformers for End-to-End Sign Language Production


Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text Spotting


Making Affine Correspondences Work in Camera Geometry Computation


Sub-center ArcFace: Boosting Face Recognition by Large-scale Noisy Web Faces


Foley Music: Learning to Generate Music from Videos


Contrastive Multiview Coding


Regional Homogeneity: Towards Learning Transferable Universal Adversarial Perturbations Against Defenses


Generative Low-bitwidth Data Free Quantization


Local Correlation Consistency for Knowledge Distillation


Perceiving 3D Human-Object Spatial Arrangements from a Single Image in the Wild


Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation


CelebA-Spoof: Large-Scale Face Anti-Spoofing Dataset with Rich Annotations


Thinking in Frequency: Face Forgery Detection by Mining Frequency-aware Clues




Weakly-Supervised Cell Tracking via Backward-and-Forward Propagation


SeqHAND: RGB-Sequence-Based 3D Hand Pose and Shape Estimation


Rethinking the Distribution Gap of Person Re-identification with Camera-based Batch Normalization


AMLN: Adversarial-based Mutual Learning Network for Online Knowledge Distillation




Online Multi-modal Person Search in Videos




Single Image Super-Resolution via a Holistic Attention Network


Can You Read Me Now? Content Aware Rectification using Angle Supervision


Momentum Batch Normalization for Deep Learning with Small Batch Size


AdvPC: Transferable Adversarial Perturbations on 3D Point Clouds


Edge-aware Graph Representation Learning and Reasoning for Face Parsing




BBS-Net: RGB-D Salient Object Detection with a Bifurcated Backbone Strategy Network


G-LBM: Generative Low-dimensional Background Model Estimation from Video Sequences


H3DNet: 3D Object Detection Using Hybrid Geometric Primitives


Expressive Telepresence via Modular Codec Avatars


Cascade Graph Neural Networks for RGB-D Salient Object Detection




FairALM: Augmented Lagrangian Method for Training Fair Models with Little Regret


Generating Videos of Zero-Shot Compositions of Actions and Objects


ViTAA: Visual-Textual Attributes Alignment in Person Search by Natural Language


Renovating Parsing R-CNN for Accurate Multiple Human Parsing




Multi-Task Curriculum Framework for Open-Set Semi-Supervised Learning




Gradient-Induced Co-Saliency Detection


Nighttime Defogging Using High-Low Frequency Decomposition and Grayscale-Color Networks




SegFix: Model-Agnostic Boundary Refinement for Segmentation


Spatio-Temporal Graph Transformer Networks for Pedestrian Trajectory Prediction


Fast Bi-layer Neural Synthesis of One-Shot Realistic Head Avatars


Neural Geometric Parser for Single Image Camera Calibration


Learning Flow-based Feature Warping for Face Frontalization with Illumination Inconsistent Supervision


Learning Architectures for Binary Networks


Semantic View Synthesis


An Analysis of Sketched IRLS for Accelerated Sparse Residual Regression




Relative Pose from Deep Learned Depth and a Single Affine Correspondence


Video Super-Resolution with Recurrent Structure-Detail Network




Shape Adaptor: A Learnable Resizing Module


Shuffle and Attend: Video Domain Adaptation


DRG: Dual Relation Graph for Human-Object Interaction Detection


Flow-edge Guided Video Completion


End-to-End Trainable Deep Active Contour Models for Automated Image Segmentation: Delineating Buildings in Aerial Imagery


Towards End-to-end Video-based Eye-Tracking


Generating Handwriting via Decoupled Style Descriptors


LEED: Label-Free Expression Editing via Disentanglement


Fashion Captioning: Towards Generating Accurate Descriptions with Semantic Rewards


Reducing Language Biases in Visual Question Answering with Visually-Grounded Question Encoder


Unsupervised Cross-Modal Alignment for Multi-Person 3D Pose Estimation


Class-Incremental Domain Adaptation


Anti-Bandit Neural Architecture Search for Model Defense




Wavelet-Based Dual-Branch Network for Image Demoiréing


Low Light Video Enhancement using Synthetic Data Produced with an Intermediate Domain Mapping


Non-Local Spatial Propagation Network for Depth Completion


DanbooRegion: An Illustration Region Dataset


Event Enhanced High-Quality Image Recovery


PackDet: Packed Long-Head Object Detector


A Generic Graph-based Neural Architecture Encoding Scheme for Predictor-based NAS


Learning Semantic Neural Tree for Human Parsing


Sketching Image Gist: Human-Mimetic Hierarchical Scene Graph Generation


Burst Denoising via Temporally Shifted Wavelet Transforms


JSSR: A Joint Synthesis, Segmentation, and Registration System for 3D Multi-Modal Image Alignment of Large-scale Pathological CT Scans




SimAug: Learning Robust Representations from Simulation for Trajectory Prediction


ScribbleBox: Interactive Annotation Framework for Video Object Segmentation


Rethinking Pseudo-LiDAR Representation


Deep Multi Depth Panoramas for View Synthesis


MINI-Net: Multiple Instance Ranking Network for Video Highlight Detection


ContactPose: A Dataset of Grasps with Object Contact and Hand Pose


API-Net: Robust Generative Classifier via a Single Discriminator


Bias-based Universal Adversarial Patch Attack for Automatic Check-out


Imbalanced Continual Learning with Partitioning Reservoir Sampling


Guided Collaborative Training for Pixel-wise Semi-Supervised Learning


Stacking Networks Dynamically for Image Restoration Based on the Plug-and-Play Framework


Efficient Transfer Learning via Joint Adaptation of Network Architecture and Weight




Spatial Attention Pyramid Network for Unsupervised Domain Adaptation


GSIR: Generalizable 3D Shape Interpretation and Reconstruction




Weakly Supervised 3D Object Detection from Lidar Point Cloud


Two-phase Pseudo Label Densification for Self-training based Domain Adaptation


Adaptive Offline Quintuplet Loss for Image-Text Matching


Learning Object Placement by Inpainting for Compositional Data Augmentation


Deep Vectorization of Technical Drawings


CAD-Deform: Deformable Fitting of CAD Models to 3D Scans


An Image Enhancing Pattern-based Sparsity for Real-time Inference on Mobile Devices




AutoTrajectory: Label-free Trajectory Extraction and Prediction from Videos using Dynamic Points


Multi-Agent Embodied Question Answering in Interactive Environments




Conditional Sequential Modulation for Efficient Global Image Retouching


Segmenting Transparent Objects in the Wild


Length-Controllable Image Captioning


Few-Shot Semantic Segmentation with Democratic Attention Networks


Defocus Blur Detection via Depth Distillation


Motion Guided 3D Pose Estimation from Videos


Reflection Separation via Multi-bounce Polarization State Tracing


SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation




SemanticAdv: Generating Adversarial Examples via Attribute-conditioned Image Editing


Learning with Noisy Class Labels for Instance Segmentation




Deep Image Clustering with Category-Style Representation


Self-supervised Motion Representation via Scattering Local Motion Cues




Improving Monocular Depth Estimation by Leveraging Structural Awareness and Complementary Datasets


BMBC: Bilateral Motion Estimation with Bilateral Cost Volume for Video Interpolation


Hard negative examples are hard, but useful


ReActNet: Towards Precise Binary Neural Network with Generalized Activation Functions




Video Object Detection via Object-level Temporal Aggregation


Object Detection with a Unified Label Space from Multiple Datasets


Lift, Splat, Shoot: Encoding Images from Arbitrary Camera Rigs by Implicitly Unprojecting to 3D


Comprehensive Image Captioning via Scene Graph Decomposition




Symbiotic Adversarial Learning for Attribute-based Person Search


Amplifying Key Cues for Human-Object-Interaction Detection


Rethinking Few-shot Image Classification: A Good Embedding is All You Need?


Adversarial Background-Aware Loss for Weakly-supervised Temporal Activity Localization


Action Localization through Continual Predictive Learning


Generative View-Correlation Adaptation for Semi-Supervised Multi-View Learning




READ: Reciprocal Attention Discriminator for Image-to-Video Re-Identification


3D Human Shape Reconstruction from a Polarization Image


The Devil is in the Details: Self-Supervised Attention for Vehicle Re-Identification




Improving One-stage Visual Grounding by Recursive Sub-query Construction


Multi-level Wavelet-based Generative Adversarial Network for Perceptual Quality Enhancement of Compressed Video


Example-Guided Image Synthesis using Masked Spatial-Channel Attention and Self-Supervision


Content-Consistent Matching for Domain Adaptive Semantic Segmentation




AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting


History Repeats Itself: Human Motion Prediction via Motion Attention


Unsupervised Video Object Segmentation with Joint Hotspot Tracking


SRNet: Improving Generalization in 3D Human Pose Estimation with a Split-and-Recombine Approach


CAFE-GAN: Arbitrary Face Attribute Editing with Complementary Attention Feature


MimicDet: Bridging the Gap Between One-Stage and Two-Stage Object Detection




Latent Topic-aware Multi-Label Classification




Finding It at Another Side: A Viewpoint-Adapted Matching Encoder for Change Captioning




Attract, Perturb, and Explore: Learning a Feature Alignment Network for Semi-supervised Domain Adaptation


Curriculum Manager for Source Selection in Multi-Source Domain Adaptation




Powering One-shot Topological NAS with Stabilized Share-parameter Proxy




Classes Matter: A Fine-grained Adversarial Approach to Cross-domain Semantic Segmentation


Boundary-preserving Mask R-CNN




Self-supervised Single-view 3D Reconstruction via Semantic Consistency


MetaDistiller: Network Self-Boosting via Meta-Learned Top-Down Distillation


Learning Monocular Visual Odometry via Self-Supervised Long-Term Modeling


The Devil is in Classification: A Simple Framework for Long-tail Instance Segmentation


What is Learned in Deep Uncalibrated Photometric Stereo?


Prior-based Domain Adaptive Object Detection for Hazy and Rainy Conditions


Adversarial Ranking Attack and Defense


ReDro: Efficiently Learning Large-sized SPD Visual Representation


Graph-Based Social Relation Reasoning


EPNet: Enhancing Point Features with Image Semantics for 3D Object Detection


Self-Supervised Monocular 3D Face Reconstruction by Occlusion-Aware Multi-view Geometry Consistency


Asynchronous Interaction Aggregation for Action Detection


Shape and Viewpoint without Keypoints


Learning Attentive and Hierarchical Representations for 3D Shape Recognition




TF-NAS: Rethinking Three Search Freedoms of Latency-Constrained Differentiable Neural Architecture Search


Associative3D: Volumetric Reconstruction from Sparse Views




PlugNet: Degradation Aware Scene Text Recognition Supervised by a Pluggable Super-Resolution Unit


Memory Selection Network for Video Propagation


Disentangled Non-local Neural Networks


URVOS: Unified Referring Video Object Segmentation Network with a Large-Scale Benchmark


Generalizing Person Re-Identification by Camera-Aware Invariance Learning and Cross-Domain Mixup


Semi-Supervised Crowd Counting via Self-Training on Surrogate Tasks


Dynamic R-CNN: Towards High Quality Object Detection via Dynamic Training


Boosting Decision-based Black-box Adversarial Attacks with Random Sign Flip


Knowledge Transfer via Dense Cross-Layer Mutual-Distillation


Matching Guided Distillation


Clustering Driven Deep Autoencoder for Video Anomaly Detection




Learning to Compose Hypercolumns for Visual Correspondence


Stochastic Bundle Adjustment for Efficient and Scalable 3D Reconstruction


Object-based Illumination Estimation with Rendering-aware Neural Networks


Progressive Point Cloud Deconvolution Generation Network


SSCGAN: Facial Attribute Editing via Style Skip Connections




Negative Pseudo Labeling using Class Proportion for Semantic Segmentation in Pathology




Learn to Propagate Reliably on Noisy Affinity Graphs


Fair DARTS: Eliminating Unfair Advantages in Differentiable Architecture Search


TANet: Towards Fully Automatic Tooth Arrangement


UnionDet: Union-Level Detector Towards Real-Time Human-Object Interaction Detection


GSNet: Joint Vehicle Pose and Shape Reconstruction with Geometrical and Scene-aware Supervision




Resolution Switchable Networks for Runtime Efficient Image Recognition




SMAP: Single-Shot Multi-Person Absolute 3D Pose Estimation




Learning to Detect Open Classes for Universal Domain Adaptation




Visual Compositional Learning for Human-Object Interaction Detection




Deep Plastic Surgery: Robust and Controllable Image Editing with Human-Drawn Sketches




Rethinking Class Activation Mapping for Weakly Supervised Object Localization




OS2D: One-Stage One-Shot Object Detection by Matching Anchor Features




Interpretable Neural Network Decoupling




Omni-sourced Webly-supervised Learning for Video Recognition




CurveLane-NAS: Unifying Lane-Sensitive Architecture Search and Adaptive Point Blending




Contextual-Relation Consistent Domain Adaptation for Semantic Segmentation




Estimating People Flows to Better Count Them in Crowded Scenes




Generate to Adapt: Resolution Adaption Network for Surveillance Face Recognition




Learning Feature Embeddings for Discriminant Model based Tracking




WeightNet: Revisiting the Design Space of Weight Networks




Partially-Shared Variational Auto-encoders for Unsupervised Domain Adaptation with Target Shift




Learning Where to Focus for Efficient Video Object Detection




Learning Object Permanence from Video




Adaptive Text Recognition through Visual Matching




Actions as Moving Points




Learning to Exploit Multiple Vision Modalities by Using Grafted Networks




Geometric Correspondence Fields: Learned Differentiable Rendering for 3D Pose Refinement in the Wild




3D Fluid Flow Reconstruction Using Compact Light Field PIV




Contextual Diversity for Active Learning




Temporal Aggregate Representations for Long-Range Video Understanding




Stochastic Fine-grained Labeling of Multi-state Sign Glosses for Continuous Sign Language Recognition




General 3D Room Layout from a Single View by Render-and-Compare




Neural Dense Non-Rigid Structure from Motion with Latent Space Constraints




Multimodal Memorability: Modeling Effects of Semantics and Decay on Video Memorability




Yet Another Intermediate-Level Attack




Topology-Change-Aware Volumetric Fusion for Dynamic Scene Reconstruction




Early Exit Or Not: Resource-Efficient Blind Quality Enhancement for Compressed Images




PatchNets: Patch-Based Generalizable Deep Implicit 3D Shape Representations




How does Lipschitz Regularization Influence GAN Training?




Infrastructure-based Multi-Camera Calibration using Radial Projections




MotionSqueeze: Neural Motion Feature Learning for Video Understanding




Polarized Optical-Flow Gyroscope




Online Meta-Learning for Multi-Source and Semi-Supervised Domain Adaptation




An Ensemble of Epoch-wise Empirical Bayes for Few-shot Learning




On the Effectiveness of Image Rotation for Open Set Domain Adaptation




Combining Task Predictors via Enhancing Joint Predictability




Multi-Scale Positive Sample Refinement for Few-Shot Object Detection




Single-Image Depth Prediction Makes Feature Matching Easier




Deep Reinforced Attention Learning for Quality-Aware Visual Recognition




CFAD: Coarse-to-Fine Action Detector for Spatiotemporal Action Localization




Learning Joint Spatial-Temporal Transformations for Video Inpainting




Single Path One-Shot Neural Architecture Search with Uniform Sampling




Learning to Generate Novel Domains for Domain Generalization




Continuous Adaptation for Interactive Object Segmentation by Learning from Corrections




Impact of base dataset design on few-shot image classification




Invertible Zero-Shot Recognition Flows




GeoLayout: Geometry Driven Room Layout Estimation Based on Depth Maps of Planes




Location Sensitive Image Retrieval and Tagging




Joint 3D Layout and Depth Prediction from a Single Indoor Panorama Image




Guessing State Tracking for Visual Dialogue




Memory-Efficient Incremental Learning Through Feature Adaptation




Neural Voice Puppetry: Audio-driven Facial Reenactment




One-Shot Unsupervised Cross-Domain Detection




Stochastic Frequency Masking to Improve Super-Resolution and Denoising Networks




Probabilistic Future Prediction for Video Scene Understanding




Suppressing Mislabeled Data via Grouping and Self-Attention




Class-wise Dynamic Graph Convolution for Semantic Segmentation




Character-Preserving Coherent Story Visualization




GINet: Graph Interaction Network for Scene Parsing




Tensor Low-Rank Reconstruction for Semantic Segmentation




Attentive Normalization




Count- and Similarity-aware R-CNN for Pedestrian Detection




TRADI: Tracking Deep Neural network Weight Distributions




Spatiotemporal Attacks for Embodied Agents




Caption-Supervised Face Recognition: Training a State-of-the-Art Face Model without Manual Annotation




Unselfie: Translating Selfies to Neutral-pose Portraits in the Wild




Design and Interpretation of Universal Adversarial Patches in Face Detection




Few-Shot Object Detection and Viewpoint Estimation for Objects in the Wild




Weakly Supervised 3D Hand Pose Estimation via Biomechanical Constraints




Dynamic Dual-Attentive Aggregation Learning for Visible-Infrared Person Re-Identification




Contextual Heterogeneous Graph Network for Human-Object Interaction Detection




Zero-Shot Image Super-Resolution with Depth Guided Internal Degradation Learning




A Closest Point Proposal for MCMC-based Probabilistic Surface Registration




Interactive Video Object Segmentation Using Global and Local Transfer Modules




End-to-end Interpretable Learning of Non-blind Image Deblurring




Employing Multi-Estimations for Weakly-Supervised Semantic Segmentation




Learning Noise-Aware Encoder-Decoder from Noisy Labels by Alternating Back-Propagation for Saliency Detection




Rethinking Image Deraining via Rain Streaks and Vapors




Finding Non-Uniform Quantization Schemes using Multi-Task Gaussian Processes




Is Sharing of Egocentric Video Giving Away Your Biometric Signature?




Captioning Images Taken by People Who Are Blind




Improving Semantic Segmentation via Decoupled Body and Edge Supervision




Conditional Entropy Coding for Efficient Video Compression




Differentiable Feature Aggregation Search for Knowledge Distillation




Attention Guided Anomaly Localization in Images




Self-supervised Video Representation Learning by Pace Prediction




Full-Body Awareness from Partial Observations




Reinforced Axial Refinement Network for Monocular 3D Object Detection




Self-Supervised Multi-Task Procedure Learning from Instructional Videos




CosyPose: Consistent multi-view multi-object 6D pose estimation




In-Domain GAN Inversion for Real Image Editing




Key Frame Proposal Network for Efficient Pose Estimation in Videos




Exchangeable Deep Neural Networks for Set-to-Set Matching and Learning




Making Sense of CNNs: Interpreting Deep Representations & Their Invariances with INNs




Cross-Modal Weighting Network for RGB-D Salient Object Detection




Open-set Adversarial Defense




Deep Image Compression using Decoder Side Information




Meta-Sim2: Unsupervised Learning of Scene Structure for Synthetic Data Generation




A Generic Visualization Approach for Convolutional Neural Networks




Interactive Annotation of 3D Object Geometry using 2D Scribbles




Hierarchical Kinematic Human Mesh Recovery




Multi-Loss Rebalancing Algorithm for Monocular Depth Estimation




3D Bird Reconstruction: a Dataset, Model, and Shape Recovery from a Single View




We Have So Much In Common: Modeling Semantic Relational Set Abstractions in Videos




Joint Optimization for Multi-Person Shape Models from Markerless 3D-Scans




Accurate RGB-D Salient Object Detection via Collaborative Learning




Finding Your (3D) Center: 3D Object Detection Using a Learned Loss




Collaborative Training between Region Proposal Localization and Classification for Domain Adaptive Object Detection




Two Stream Active Query Suggestion for Active Learning in Connectomics




Pix2Surf: Learning Parametric 3D Surface Models of Objects from Images




6D Camera Relocalization in Ambiguous Scenes via Continuous Multimodal Inference




Modeling Artistic Workflows for Image Generation and Editing




A Large-scale Annotated Mechanical Components Benchmark for Classification and Retrieval Tasks with Deep Neural Networks




Hidden Footprints: Learning Contextual Walkability from 3D Human Trails




Self-Supervised Learning of Audio-Visual Objects from Video




GAN-based Garment Generation Using Sewing Pattern Images




Style Transfer for Co-Speech Gesture Animation: A Multi-Speaker Conditional-Mixture Approach




An LSTM Approach to Temporal 3D Object Detection in LiDAR Point Clouds




Monotonicity Prior for Cloud Tomography




Learning Trailer Moments in Full-Length Movies with Co-Contrastive Attention




Preserving Semantic Neighborhoods for Robust Cross-modal Retrieval




Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline




Learning to Generate Grounded Visual Captions without Localization Supervision




Neural Hair Rendering




JNR: Joint-based Neural Rig Representation for Compact 3D Face Modeling




On Disentangling Spoof Trace for Generic Face Anti-Spoofing




Streaming Object Detection for 3-D Point Clouds




NAS-DIP: Learning Deep Image Prior with Neural Architecture Search




Learning to Learn in a Semi-Supervised Fashion




FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning




RadarNet: Exploiting Radar for Robust Perception of Dynamic Objects




Seeing the Un-Scene: Learning Amodal Semantic Maps for Room Navigation




Learning to Separate: Detecting Heavily-Occluded Objects in Urban Scenes




Towards causal benchmarking of bias in face analysis algorithms




Learning and Memorizing Representative Prototypes for 3D Point Cloud Semantic and Instance Segmentation




Knowledge-Based Video Question Answering with Unsupervised Scene Descriptions




Transformation Consistency Regularization – A Semi-Supervised Paradigm for Image-to-Image Translation




LIRA: Lifelong Image Restoration from Unknown Blended Distortions




HDNet: Human Depth Estimation for Multi-Person Camera-Space Localization




SOLO: Segmenting Objects by Locations




Learning to See in the Dark with Events




Trajectron++: Dynamically-Feasible Trajectory Forecasting With Heterogeneous Data




Context-Gated Convolution




Polynomial Regression Network for Variable-Number Lane Detection




Structural Deep Metric Learning for Room Layout Estimation




Adaptive Task Sampling for Meta-Learning




Deep Complementary Joint Model for Complex Scene Registration and Few-shot Segmentation on Medical Images




Improving Multispectral Pedestrian Detection by Addressing Modality Imbalance Problems




High-Resolution Image Inpainting with Iterative Confidence Feedback and Guided Upsampling




Online Ensemble Model Compression using Knowledge Distillation




Deep Learning-based Pupil Center Detection for Fast and Accurate Eye Tracking System




Efficient Residue Number System Based Winograd Convolution




Robust Tracking against Adversarial Attacks




Single-Shot Neural Relighting and SVBRDF Estimation




Unsupervised 3D Human Pose Representation with Viewpoint and Pose Disentanglement




Angle-based Search Space Shrinking for Neural Architecture Search




RobustScanner: Dynamically Enhancing Positional Clues for Robust Text Recognition




Towards Fast, Accurate and Stable 3D Dense Face Alignment




Iterative Feature Transformation for Fast and Versatile Universal Style Transfer




CATCH: Context-based Meta Reinforcement Learning for Transferrable Architecture Search




Toward Faster and Simpler Matrix Normalization via Rank-1 Update




Accurate Polarimetric BRDF for Real Polarization Scene Rendering




Lensless Imaging with Focusing Sparse URA Masks in Long-Wave Infrared and its Application for Human Detection




Topology-Preserving Class-Incremental Learning




Inter-Image Communication for Weakly Supervised Localization




UFO²: A Unified Framework towards Omni-supervised Object Detection




iCaps: An Interpretable Classifier via Disentangled Capsule Networks


[supplementary material] 


Detecting Natural Disasters, Damage, and Incidents in the Wild


[supplementary material] 


Dynamic ReLU


[supplementary material] 


Acquiring Dynamic Light Fields through Coded Aperture Camera


[supplementary material] 


Gait Recognition from a Single Image using a Phase-Aware Gait Cycle Reconstruction Network


[supplementary material] 


Informative Sample Mining Network for Multi-Domain Image-to-Image Translation


[supplementary material] 


Spherical Feature Transform for Deep Metric Learning


[supplementary material] 


Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering


[supplementary material] 


Unsupervised Multi-View CNN for Salient View Selection of 3D Objects and Scenes


[supplementary material] 


Representation Sharing for Fast Object Detector Search and Beyond


[supplementary material] 


Peeking into occluded joints: A novel framework for crowd pose estimation


[supplementary material] 


RubiksNet: Learnable 3D-Shift for Efficient Video Action Recognition


[supplementary material] 


Deep Hashing with Active Pairwise Supervision


[supplementary material] 


Graph Edit Distance Reward: Learning to Edit Scene Graph




Malleable 2.5D Convolution: Learning Receptive Fields along the Depth-axis for RGB-D Scene Parsing


[supplementary material] 


Feature-metric Loss for Self-supervised Learning of Depth and Egomotion




Propagating Over Phrase Relations for One-Stage Visual Grounding




Adversarial Semantic Data Augmentation for Human Pose Estimation




Free View Synthesis


[supplementary material] 


Face Anti-Spoofing via Disentangled Representation Learning


[supplementary material] 


Prime-Aware Adaptive Distillation




Meta-Learning with Network Pruning


[supplementary material] 


Spiral Generative Network for Image Extrapolation


[supplementary material] 


SceneSketcher: Fine-Grained Image Retrieval with Scene Sketches


[supplementary material] 


Few-shot Compositional Font Generation with Dual Memory


[supplementary material] 


PUGeo-Net: A Geometry-centric Network for 3D Point Cloud Upsampling


[supplementary material] 


Handcrafted Outlier Detection Revisited


[supplementary material] 


The Average Mixing Kernel Signature


[supplementary material] 


BCNet: Learning Body and Cloth Shape from A Single Image


[supplementary material] 


Self-supervised Keypoint Correspondences for Multi-Person Pose Estimation and Tracking in Videos


[supplementary material] 


Interactive Multi-Dimension Modulation with Dynamic Controllable Residual Learning for Image Restoration


[supplementary material] 


Polysemy Deciphering Network for Human-Object Interaction Detection


[supplementary material] 


PODNet: Pooled Outputs Distillation for Small-Tasks Incremental Learning


[supplementary material] 


Learning Graph-Convolutional Representations for Point Cloud Denoising




Semantic Line Detection Using Mirror Attention and Comparative Ranking and Matching


[supplementary material] 


A Differentiable Recurrent Surface for Asynchronous Event-Based Data


[supplementary material] 


Fine-Grained Visual Classification via Progressive Multi-Granularity Training of Jigsaw Patches


[supplementary material] 


LiteFlowNet3: Resolving Correspondence Ambiguity for More Accurate Optical Flow Estimation


[supplementary material] 


Microscopy Image Restoration with Deep Wiener-Kolmogorov Filters


[supplementary material] 


ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language


[supplementary material] 


JSENet: Joint Semantic Segmentation and Edge Detection Network for 3D Point Clouds


[supplementary material] 


Motion-Excited Sampler: Video Adversarial Attack with Sparked Prior


[supplementary material] 


An Inference Algorithm for Multi-Label MRF-MAP Problems with Clique Size 100


[supplementary material] 


Dual Refinement Underwater Object Detection Network




Multiple Sound Sources Localization from Coarse to Fine


[supplementary material] 


Task-Aware Quantization Network for JPEG Image Compression


[supplementary material] 


Energy-Based Models for Deep Probabilistic Regression


[supplementary material] 


CLOTH3D: Clothed 3D Humans


[supplementary material] 


Encoding Structure-Texture Relation with P-Net for Anomaly Detection in Retinal Images


[supplementary material] 


CLNet: A Compact Latent Network for Fast Adjusting Siamese Trackers




Occlusion-Aware Siamese Network for Human Pose Estimation




Learning to Predict Salient Faces: A Novel Visual-Audio Saliency Model


[supplementary material] 


NormalGAN: Learning Detailed 3D Human from a Single RGB-D Image


[supplementary material] 


Model-based occlusion disentanglement for image-to-image translation


[supplementary material] 


Rotation-robust Intersection over Union for 3D Object Detection


[supplementary material] 


New Threats against Object Detector with Non-local Block


[supplementary material] 


Self-Supervised CycleGAN for Object-Preserving Image-to-Image Domain Adaptation


[supplementary material] 


On the Usage of the Trifocal Tensor in Motion Segmentation


[supplementary material] 


3D-Rotation-Equivariant Quaternion Neural Networks


[supplementary material] 


InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image


[supplementary material] 


Active Crowd Counting with Limited Supervision


[supplementary material] 


Self-Supervised Monocular Depth Estimation: Solving the Dynamic Object Problem by Semantic Guidance


[supplementary material] 


Hierarchical Visual-Textual Graph for Temporal Activity Localization via Language


[supplementary material] 


Do Not Mask What You Do Not Need to Mask: a Parser-Free Virtual Try-On


[supplementary material] 


NODIS: Neural Ordinary Differential Scene Understanding


[supplementary material] 


AssembleNet++: Assembling Modality Representations via Attention Connections


[supplementary material] 


Learning Propagation Rules for Attribution Map Generation


[supplementary material] 


Reparameterizing Convolutions for Incremental Multi-Task Learning without Task Interference


[supplementary material] 


Learning Predictive Models from Observation and Interaction


[supplementary material] 


Unifying Deep Local and Global Features for Image Search


[supplementary material] 


Human Body Model Fitting by Learned Gradient Descent


[supplementary material] 


DDGCN: A Dynamic Directed Graph Convolutional Network for Action Recognition


[supplementary material] 


Learning latent representations across multiple data domains using Lifelong VAEGAN


[supplementary material] 


DVI: Depth Guided Video Inpainting for Autonomous Driving


[supplementary material] 


Incorporating Reinforced Adversarial Learning in Autoregressive Image Generation


[supplementary material] 


APRICOT: A Dataset of Physical Adversarial Attacks on Object Detection


[supplementary material] 


Visual Question Answering on Image Sets


[supplementary material] 


Object as Hotspots: An Anchor-Free 3D Object Detection Approach via Firing of Hotspots


[supplementary material] 


Placepedia: Comprehensive Place Understanding with Multi-Faceted Annotations


[supplementary material] 


DELTAS: Depth Estimation by Learning Triangulation And densification of Sparse points


[supplementary material] 


Dynamic Low-light Imaging with Quanta Image Sensors


[supplementary material] 


Disambiguating Monocular Depth Estimation with a Single Transient


[supplementary material] 


DSDNet: Deep Structured self-Driving Network


[supplementary material] 


QuEST: Quantized Embedding Space for Transferring Knowledge


[supplementary material] 


EGDCL: An Adaptive Curriculum Learning Framework for Unbiased Glaucoma Diagnosis




Backpropagated Gradient Representations for Anomaly Detection


[supplementary material] 


Dense RepPoints: Representing Visual Objects with Dense Point Sets


[supplementary material] 


On Dropping Clusters to Regularize Graph Convolutional Neural Networks


[supplementary material] 


Adaptive Video Highlight Detection by Learning from User History




Improving 3D Object Detection through Progressive Population Based Augmentation


[supplementary material] 


DR-KFS: A Differentiable Visual Similarity Metric for 3D Shape Reconstruction


[supplementary material] 


SPAN: Spatial Pyramid Attention Network for Image Manipulation Localization


[supplementary material] 


Adversarial Learning for Zero-shot Domain Adaptation




YOLO in the Dark - Domain Adaptation Method for Merging Multiple Models -




Identity-Aware Multi-Sentence Video Description


[supplementary material] 


VQA-LOL: Visual Question Answering under the Lens of Logic


[supplementary material] 


Piggyback GAN: Efficient Lifelong Learning for Image Conditioned Generation




TRRNet: Tiered Relation Reasoning for Compositional Visual Question Answering




Mining Inter-Video Proposal Relations for Video Object Detection


[supplementary material] 


TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval


[supplementary material] 


Minimum Class Confusion for Versatile Domain Adaptation


[supplementary material] 


Large Batch Optimization for Object Detection: Training COCO in 12 Minutes




Towards Practical and Efficient High-Resolution HDR Deghosting with CNN


[supplementary material] 


Monocular Differentiable Rendering for Self-Supervised 3D Object Detection


[supplementary material] 


Shape Prior Deformation for Categorical 6D Object Pose and Size Estimation


[supplementary material] 


Dynamic and Static Context-aware LSTM for Multi-agent Motion Prediction


[supplementary material] 


Image-based table recognition: data, model, and evaluation


[supplementary material] 


Group Activity Prediction with Sequential Relational Anticipation Model




PiP: Planning-informed Trajectory Prediction for Autonomous Driving


[supplementary material] 


PSConv: Squeezing Feature Pyramid into One Compact Poly-Scale Convolutional Layer


[supplementary material] 


Hierarchical Context Embedding for Region-based Object Detection


[supplementary material] 


Attention-Driven Dynamic Graph Convolutional Network for Multi-Label Image Recognition


[supplementary material] 


Gen-LaneNet: A Generalized and Scalable Approach for 3D Lane Detection


[supplementary material] 


Sparse-to-Dense Depth Completion Revisited: Sampling Strategy and Graph Construction


[supplementary material] 


MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation


[supplementary material] 


Detecting Human-Object Interactions with Action Co-occurrence Priors


[supplementary material] 


Learning Connectivity of Neural Networks from a Topological Perspective




JSTASR: Joint Size and Transparency-Aware Snow Removal Algorithm Based on Modified Partial Convolution and Veiling Effect Removal


[supplementary material] 


Ocean: Object-aware Anchor-free Tracking


[supplementary material] 


Object Tracking using Spatio-Temporal Networks for Future Prediction Location




Pillar-based Object Detection for Autonomous Driving


[supplementary material] 


Sparse Adversarial Attack via Perturbation Factorization


[supplementary material] 


3D Scene Reconstruction from a Single Viewport


[supplementary material] 


Learning to Optimize Domain Specific Normalization for Domain Generalization


[supplementary material] 


Self-supervised Outdoor Scene Relighting


[supplementary material] 


Privacy Preserving Visual SLAM


[supplementary material] 


Leveraging Acoustic Images for Effective Self-Supervised Audio Representation Learning


[supplementary material] 


Learning Joint Visual Semantic Matching Embeddings for Language-guided Retrieval




Globally Optimal and Efficient Vanishing Point Estimation in Atlanta World


[supplementary material] 


StyleGAN2 Distillation for Feed-forward Image Manipulation


[supplementary material] 


Self-Prediction for Joint Instance and Semantic Segmentation of Point Clouds




Learning Disentangled Representations via Mutual Information Estimation


[supplementary material] 


Challenge-Aware RGBT Tracking




Fully Trainable and Interpretable Non-Local Sparse Models for Image Restoration


[supplementary material] 


AutoSimulate: (Quickly) Learning Synthetic Data Generation


[supplementary material] 


LatticeNet: Towards Lightweight Image Super-resolution with Lattice Block




Learning from Scale-Invariant Examples for Domain Adaptation in Semantic Segmentation


[supplementary material] 


Active Visual Information Gathering for Vision-Language Navigation


[supplementary material] 


Deep Hough-Transform Line Priors


[supplementary material] 


Unsupervised Shape and Pose Disentanglement for 3D Meshes


[supplementary material] 


CLAWS: Clustering Assisted Weakly Supervised Learning with Normalcy Suppression for Anomalous Event Detection


[supplementary material] 


Inclusive GAN: Improving Data and Minority Coverage in Generative Models


[supplementary material] 


SESAME: Semantic Editing of Scenes by Adding, Manipulating or Erasing Objects


[supplementary material] 


Dive Deeper Into Box for Object Detection


[supplementary material] 


PG-Net: Pixel to Global Matching Network for Visual Tracking


[supplementary material] 


Why Are Deep Representations Good Perceptual Quality Features?


[supplementary material] 


Geometric Estimation via Robust Subspace Recovery




Latent Embedding Feedback and Discriminative Features for Zero-Shot Classification


[supplementary material] 


Human Correspondence Consensus for 3D Object Semantic Understanding


[supplementary material] 


Learning Memory Augmented Cascading Network for Compressed Sensing of Images




Least squares surface reconstruction on arbitrary domains


[supplementary material] 


Task-conditioned Domain Adaptation for Pedestrian Detection in Thermal Imagery




Improving the Transferability of Adversarial Examples with Resized-Diverse-Inputs, Diversity-Ensemble and Region Fitting


[supplementary material] 


DADA: Differentiable Automatic Data Augmentation




SceneCAD: Predicting Object Alignments and Layouts in RGB-D Scans


[supplementary material] 


Kinship Identification through Joint Learning using Kinship Verification Ensembles


[supplementary material] 


Kernelized Memory Network for Video Object Segmentation


[supplementary material] 


A Single Stream Network for Robust and Real-time RGB-D Salient Object Detection




Splitting vs. Merging: Mining Object Regions with Discrepancy and Intersection Loss for Weakly Supervised Semantic Segmentation




Temporal Keypoint Matching and Refinement Network for Pose Estimation and Tracking


[supplementary material] 


Neural Point-Based Graphics


[supplementary material] 


FHDe²Net: Full High Definition Demoireing Network


[supplementary material] 


Learning Structural Similarity of User Interface Layouts using Graph Networks


[supplementary material] 


NAS-Count: Counting-by-Density with Neural Architecture Search


[supplementary material] 


Towards Generalization Across Depth for Monocular 3D Object Detection


[supplementary material] 


Margin-Mix: Semi-Supervised Learning for Face Expression Recognition


[supplementary material] 


Principal Feature Visualisation in Convolutional Neural Networks


[supplementary material] 


Progressive Refinement Network for Occluded Pedestrian Detection


[supplementary material] 


Monocular Real-Time Volumetric Performance Capture


[supplementary material] 


The Mapillary Traffic Sign Dataset for Detection and Classification on a Global Scale


[supplementary material] 


Measuring Generalisation to Unseen Viewpoints, Articulations, Shapes and Objects for 3D Hand Pose Estimation under Hand-Object Interaction


[supplementary material] 


Disentangling Multiple Features in Video Sequences using Gaussian Processes in Variational Autoencoders


[supplementary material] 


SEN: A Novel Feature Normalization Dissimilarity Measure for Prototypical Few-Shot Learning Networks


[supplementary material] 


Kinematic 3D Object Detection in Monocular Video


[supplementary material] 


Describing Unseen Videos via Multi-Modal Cooperative Dialog Agents


[supplementary material] 


SACA Net: Cybersickness Assessment of Individual Viewers for VR Content via Graph-based Symptom Relation Embedding


[supplementary material] 


End-to-End Low Cost Compressive Spectral Imaging with Spatial-Spectral Self-Attention


[supplementary material] 


Know Your Surroundings: Exploiting Scene Information for Object Tracking


[supplementary material] 


Practical Detection of Trojan Neural Networks: Data-Limited and Data-Free Cases


[supplementary material] 


Anatomy-Aware Siamese Network: Exploiting Semantic Asymmetry for Accurate Pelvic Fracture Detection in X-ray Images




DeepLandscape: Adversarial Modeling of Landscape Videos


[supplementary material] 


GANwriting: Content-Conditioned Generation of Styled Handwritten Word Images


[supplementary material] 


Spatial-Angular Interaction for Light Field Image Super-Resolution


[supplementary material] 


BATS: Binary ArchitecTure Search


[supplementary material] 


A Closer Look at Local Aggregation Operators in Point Cloud Analysis


[supplementary material] 


Look here! A parametric learning based approach to redirect visual attention


[supplementary material] 


Variational Diffusion Autoencoders with Random Walk Sampling


[supplementary material] 


Adaptive Variance Based Label Distribution Learning For Facial Age Estimation




Connecting the Dots: Detecting Adversarial Perturbations Using Context Inconsistency


[supplementary material] 


Perceive, Predict, and Plan: Safe Motion Planning Through Interpretable Semantic Representations


[supplementary material] 


VarSR: Variational Super-Resolution Network for Very Low Resolution Images


[supplementary material] 


Co-Heterogeneous and Adaptive Segmentation from Multi-Source and Multi-Phase CT Imaging Data: A Study on Pathological Liver and Lesion Segmentation


[supplementary material] 


Towards Recognizing Unseen Categories in Unseen Domains


[supplementary material] 


Square Attack: a query-efficient black-box adversarial attack via random search


[supplementary material] 


You Are Here: Geolocation by Embedding Maps and Images


[supplementary material] 


Segmentations-Leak: Membership Inference Attacks and Defenses in Semantic Image Segmentation


[supplementary material] 


From Image to Stability: Learning Dynamics from Human Pose


[supplementary material] 


LevelSet R-CNN: A Deep Variational Method for Instance Segmentation


[supplementary material] 


Efficient Scale-Permuted Backbone with Learned Resource Distribution


[supplementary material] 


Reducing Distributional Uncertainty by Mutual Information Maximisation and Transferable Feature Learning


[supplementary material] 


Bridging Knowledge Graphs to Generate Scene Graphs


[supplementary material] 


Implicit Latent Variable Model for Scene-Consistent Motion Forecasting


[supplementary material] 


Learning Visual Commonsense for Robust Scene Graph Generation


[supplementary material] 


MPCC: Matching Priors and Conditionals for Clustering


[supplementary material] 


PointAR: Efficient Lighting Estimation for Mobile Augmented Reality




Discrete Point Flow Networks for Efficient Point Cloud Generation


[supplementary material] 


Accelerating Deep Learning with Millions of Classes


[supplementary material] 


Password-conditioned Anonymization and Deanonymization with Face Identity Transformers


[supplementary material] 


Inertial Safety from Structured Light


[supplementary material] 


PointTriNet: Learned Triangulation of 3D Point Sets


[supplementary material] 


Toward Unsupervised, Multi-Object Discovery in Large-Scale Image Collections


[supplementary material] 


Deep Novel View Synthesis from Colored 3D Point Clouds


[supplementary material] 


Consensus-Aware Visual-Semantic Embedding for Image-Text Matching




Spatial Hierarchy Aware Residual Pyramid Network for Time-of-Flight Depth Denoising


[supplementary material] 


Sat2Graph: Road Graph Extraction through Graph-Tensor Encoding




Cross-Task Transfer for Geotagged Audiovisual Aerial Scene Recognition




Polarimetric Multi-View Inverse Rendering


[supplementary material] 


SideInfNet: A Deep Neural Network for Semi-Automatic Semantic Segmentation with Side Information


[supplementary material] 


Improving Face Recognition by Clustering Unlabeled Faces in the Wild


[supplementary material] 


NeuRoRA: Neural Robust Rotation Averaging


[supplementary material] 


SG-VAE: Scene Grammar Variational Autoencoder to generate new indoor scenes


[supplementary material] 


Unsupervised Learning of Optical Flow with Deep Feature Similarity




Blended Grammar Network for Human Parsing




P²Net: Patch-match and Plane-regularization for Unsupervised Indoor Depth Estimation


[supplementary material] 


Efficient Attention Mechanism for Visual Dialog that can Handle All the Interactions between Multiple Inputs


[supplementary material] 


Adaptive Mixture Regression Network with Local Counting Map for Crowd Counting


[supplementary material] 


BIRNAT: Bidirectional Recurrent Neural Networks with Adversarial Training for Video Snapshot Compressive Imaging


[supplementary material] 


Ultra Fast Structure-aware Deep Lane Detection


[supplementary material] 


Cross-Identity Motion Transfer for Arbitrary Objects through Pose-Attentive Video Reassembling


[supplementary material] 


Domain Adaptive Object Detection via Asymmetric Tri-way Faster-RCNN




Exclusivity-Consistency Regularized Knowledge Distillation for Face Recognition




Learning Camera-Aware Noise Models


[supplementary material] 


Towards Precise Completion of Deformable Shapes


[supplementary material] 


Iterative Distance-Aware Similarity Matrix Convolution with Mutual-Supervised Point Elimination for Efficient Point Cloud Registration


[supplementary material] 


Pairwise Similarity Knowledge Transfer for Weakly Supervised Object Localization


[supplementary material] 


Environment-agnostic Multitask Learning for Natural Language Grounded Navigation


[supplementary material] 


TPFN: Applying Outer Product along Time to Multimodal Sentiment Analysis Fusion on Incomplete Data


[supplementary material] 


ProxyNCA++: Revisiting and Revitalizing Proxy Neighborhood Component Analysis


[supplementary material] 


Learning with Privileged Information for Efficient Image Super-Resolution


[supplementary material] 


Joint Visual and Temporal Consistency for Unsupervised Domain Adaptive Person Re-Identification




Autoencoder-based Graph Construction for Semi-supervised Learning


[supplementary material] 


Virtual Multi-view Fusion for 3D Semantic Segmentation


[supplementary material] 


Decoupling GCN with DropGraph Module for Skeleton-Based Action Recognition


[supplementary material] 


Deep Shape from Polarization


[supplementary material] 


A Boundary Based Out-of-Distribution Classifier for Generalized Zero-Shot Learning




Mind the Discriminability: Asymmetric Adversarial Domain Adaptation


[supplementary material] 


SeqXY2SeqZ: Structure Learning for 3D Shapes by Sequentially Predicting 1D Occupancy Segments From 2D Coordinates


[supplementary material] 


Simultaneous Detection and Tracking with Motion Modelling for Multiple Object Tracking


[supplementary material] 


Deep FusionNet for Point Cloud Semantic Segmentation


[supplementary material] 


Deep Material Recognition in Light-Fields via Disentanglement of Spatial and Angular Information


[supplementary material] 


Dual Adversarial Network for Deep Active Learning




Fully Convolutional Networks for Continuous Sign Language Recognition


[supplementary material] 


Self-adapting confidence estimation for stereo


[supplementary material] 


Deep Surface Normal Estimation on the 2-Sphere with Confidence Guided Semantic Attention


[supplementary material] 


AutoSTR: Efficient Backbone Search for Scene Text Recognition




Mitigating Embedding and Class Assignment Mismatch in Unsupervised Image Classification


[supplementary material] 


Adversarial Training with Bi-directional Likelihood Regularization for Visual Classification


[supplementary material] 


Faster AutoAugment: Learning Augmentation Strategies Using Backpropagation




Hand-Transformer: Non-Autoregressive Structured Modeling for 3D Hand Pose Estimation


[supplementary material] 


Boundary-Aware Cascade Networks for Temporal Action Segmentation


[supplementary material] 


Towards Content-Independent Multi-Reference Super-Resolution: Adaptive Pattern Matching and Feature Aggregation


[supplementary material] 


Inference Graphs for CNN Interpretation


[supplementary material] 


An End-to-End OCR Text Re-organization Sequence Learning for Rich-text Detail Image Comprehension




Improving Query Efficiency of Black-box Adversarial Attack


[supplementary material] 


Self-similarity Student for Partial Label Histopathology Image Segmentation


[supplementary material] 


BioMetricNet: deep unconstrained face verification through learning of metrics regularized onto Gaussian distributions




A Decoupled Learning Scheme for Real-world Burst Denoising from Raw Images


[supplementary material] 


Global-and-Local Relative Position Embedding for Unsupervised Video Summarization


[supplementary material] 


Real-World Blur Dataset for Learning and Benchmarking Deblurring Algorithms


[supplementary material] 


SPARK: Spatial-aware Online Incremental Attack Against Visual Tracking


[supplementary material] 


CenterNet Heatmap Propagation for Real-time Video Object Detection


[supplementary material] 


Hierarchical Dynamic Filtering Network for RGB-D Salient Object Detection


[supplementary material] 


SOLAR: Second-Order Loss and Attention for Image Retrieval


[supplementary material] 


Fixing Localization Errors to Improve Image Classification




PatchPerPix for Instance Segmentation


[supplementary material] 


Attend and Segment: Attention Guided Active Semantic Segmentation


[supplementary material] 


Accelerating CNN Training by Pruning Activation Gradients


[supplementary material] 


Global and Local Enhancement Networks for Paired and Unpaired Image Enhancement


[supplementary material] 


Probabilistic Anchor Assignment with IoU Prediction for Object Detection


[supplementary material] 


Eyeglasses 3D shape reconstruction from a single face image


[supplementary material] 


Temporal Complementary Learning for Video Person Re-Identification




HoughNet: Integrating near and long-range evidence for bottom-up object detection


[supplementary material] 


Graph Wasserstein Correlation Analysis for Movie Retrieval


[supplementary material] 


Context-Aware RCNN: A Baseline for Action Detection in Videos




Full-Time Monocular Road Detection Using Zero-Distribution Prior of Angle of Polarization


[supplementary material] 


A Flexible Recurrent Residual Pyramid Network for Video Frame Interpolation


[supplementary material] 


Learning Enriched Features for Real Image Restoration and Enhancement


[supplementary material] 


Detail Preserved Point Cloud Completion via Separated Feature Aggregation


[supplementary material] 


LabelEnc: A New Intermediate Supervision Method for Object Detection


[supplementary material] 


Unsupervised Learning of Category-Specific Symmetric 3D Keypoints from Point Sets


[supplementary material] 


PAMS: Quantized Super-Resolution via Parameterized Max Scale




SSN: Shape Signature Networks for Multi-class Object Detection from Point Clouds


[supplementary material] 


OID: Outlier Identifying and Discarding in Blind Image Deblurring


[supplementary material] 


Few-Shot Single-View 3-D Object Reconstruction with Compositional Priors


[supplementary material] 


Enhanced Sparse Model for Blind Deblurring


[supplementary material] 


SumGraph: Video Summarization via Recursive Graph Modeling


[supplementary material] 


Feature Normalized Knowledge Distillation for Image Classification




A Metric Learning Reality Check


[supplementary material] 


FTL: A universal framework for training low-bit DNNs via Feature Transfer




XingGAN for Person Image Generation


[supplementary material] 


GATCluster: Self-Supervised Gaussian-Attention Network for Image Clustering


[supplementary material] 


VCNet: A Robust Approach to Blind Image Inpainting


[supplementary material] 


Learning to Predict Context-adaptive Convolution for Semantic Segmentation




EfficientFCN: Holistically-guided Decoding for Semantic Segmentation




GroSS: Group-Size Series Decomposition for Grouped Architecture Search


[supplementary material] 


Efficient Adversarial Attacks for Visual Object Tracking


[supplementary material] 


Globally-Optimal Event Camera Motion Estimation


[supplementary material] 


Weakly-supervised Learning of Human Dynamics


[supplementary material] 


Journey Towards Tiny Perceptual Super-Resolution


[supplementary material] 


What makes fake images detectable? Understanding properties that generalize


[supplementary material] 


Embedding Propagation: Smoother Manifold for Few-Shot Classification


[supplementary material] 


Category Level Object Pose Estimation via Neural Analysis-by-Synthesis


[supplementary material] 


High-Fidelity Synthesis with Disentangled Representation


[supplementary material] 


PL₁P - Point-line Minimal Problems under Partial Visibility in Three Views


[supplementary material] 


Prediction and Recovery for Adaptive Low-Resolution Person Re-Identification


[supplementary material] 


Learning Canonical Representations for Scene Graph to Image Generation


[supplementary material] 


Adversarial Robustness on In- and Out-Distribution Improves Explainability


[supplementary material] 


Deformable Style Transfer


[supplementary material] 


Aligning Videos in Space and Time


[supplementary material] 


Neural Wireframe Renderer: Learning Wireframe to Image Translations


[supplementary material] 


RBF-Softmax: Learning Deep Representative Prototypes with Radial Basis Function Softmax




Testing the Safety of Self-driving Vehicles by Simulating Perception and Prediction


[supplementary material] 


Determining the Relevance of Features for Deep Neural Networks


[supplementary material] 


Weakly Supervised Semantic Segmentation with Boundary Exploration




GANHopper: Multi-Hop GAN for Unsupervised Image-to-Image Translation


[supplementary material] 


DOPE: Distillation Of Part Experts for whole-body 3D pose estimation in the wild




Multi-view adaptive graph convolutions for graph classification




Instance Adaptive Self-Training for Unsupervised Domain Adaptation


[supplementary material] 


Weight Decay Scheduling and Knowledge Distillation for Active Learning




HMQ: Hardware Friendly Mixed Precision Quantization Block for CNNs




Truncated Inference for Latent Variable Optimization Problems: Application to Robust Estimation and Learning


[supplementary material] 


Geometry Constrained Weakly Supervised Object Localization


[supplementary material] 


Duality Diagram Similarity: a generic framework for initialization selection in task transfer learning


[supplementary material] 


OneGAN: Simultaneous Unsupervised Learning of Conditional Image Generation, Foreground Segmentation, and Fine-Grained Clustering


[supplementary material] 


Mining self-similarity: Label super-resolution with epitomic representations


[supplementary material] 


AE-OT-GAN: Training GANs from data specific latent distribution


[supplementary material] 


Null-sampling for Interpretable and Fair Representations


[supplementary material] 


Guiding Monocular Depth Estimation Using Depth-Attention Volume


[supplementary material] 


Tracking Emerges by Looking Around Static Scenes, with Neural 3D Mapping


[supplementary material] 


Boosting Weakly Supervised Object Detection with Progressive Knowledge Transfer


[supplementary material] 


BézierSketch: A generative model for scalable vector sketches


[supplementary material] 


Semantic Relation Preserving Knowledge Distillation for Image-to-Image Translation


[supplementary material] 


Domain Adaptation Through Task Distillation


[supplementary material] 


PatchAttack: A Black-box Texture-based Attack with Reinforcement Learning


[supplementary material] 


More Classifiers, Less Forgetting: A Generic Multi-classifier Paradigm for Incremental Learning


[supplementary material] 


Extending and Analyzing Self-Supervised Learning Across Domains


[supplementary material] 


Multi-Source Open-Set Deep Adversarial Domain Adaptation


[supplementary material] 


Neural Batch Sampling with Reinforcement Learning for Semi-Supervised Anomaly Detection


[supplementary material] 


LEMMA: A Multi-view Dataset for LEarning Multi-agent Multi-task Activities


[supplementary material] 


Teaching Cameras to Feel: Estimating Tactile Physical Properties of Surfaces From Images




Accurate Optimization of Weighted Nuclear Norm for Non-Rigid Structure from Motion


[supplementary material] 


Proposal-based Video Completion


[supplementary material] 


HGNet: Hybrid Generative Network for Zero-shot Domain Adaptation




Beyond Monocular Deraining: Stereo Image Deraining via Semantic Understanding




DBQ: A Differentiable Branch Quantizer for Lightweight Deep Neural Networks


[supplementary material] 


All at Once: Temporally Adaptive Multi-Frame Interpolation with Advanced Motion Modeling


[supplementary material] 


A Broader Study of Cross-Domain Few-Shot Learning


[supplementary material] 


Practical Poisoning Attacks on Neural Networks


[supplementary material] 


Unsupervised Domain Adaptation in the Dissimilarity Space for Person Re-identification




Learn distributed GAN with Temporary Discriminators


[supplementary material] 


SemifreddoNets: Partially Frozen Neural Networks for Efficient Computer Vision Systems




Improving Adversarial Robustness by Enforcing Local and Global Compactness


[supplementary material] 


TopoAL: An Adversarial Learning Approach for Topology-Aware Road Segmentation


[supplementary material] 


Channel selection using Gumbel Softmax


[supplementary material] 


Exploiting Temporal Coherence for Self-Supervised One-shot Video Re-identification


[supplementary material] 


An Efficient Training Framework for Reversible Neural Architectures




Box2Seg: Attention Weighted Loss and Discriminative Feature Learning for Weakly Supervised Segmentation


[supplementary material] 


FreeCam3D: Snapshot Structured Light 3D with Freely-Moving Cameras


[supplementary material] 


One-Pixel Signature: Characterizing CNN Models for Backdoor Detection




Learning to Transfer Learn: Reinforcement Learning-Based Selection for Adaptive Transfer Learning


[supplementary material] 


Structure-Aware Generation Network for Recipe Generation from Images




A Simple and Effective Framework for Pairwise Deep Metric Learning


[supplementary material] 


Meta-rPPG: Remote Heart Rate Estimation Using a Transductive Meta-Learner


[supplementary material] 


A Recurrent Transformer Network for Novel View Action Synthesis


[supplementary material] 


Multi-view Action Recognition using Cross-view Video Prediction


[supplementary material] 


Learning Discriminative Feature with CRF for Unsupervised Video Object Segmentation




SMART: Simultaneous Multi-Agent Recurrent Trajectory Prediction


[supplementary material] 


Label-Driven Reconstruction for Domain Adaptation in Semantic Segmentation


[supplementary material] 


Efficient Outdoor 3D Point Cloud Semantic Segmentation for Critical Road Objects and Distributed Contexts




Attributional Robustness Training using Input-Gradient Spatial Alignment


[supplementary material] 


Reducing the Sim-to-Real Gap for Event Cameras


[supplementary material] 


Spatial Geometric Reasoning for Room Layout Estimation via Deep Reinforcement Learning




Learning Data Augmentation Strategies for Object Detection


[supplementary material] 


DA-NAS: Data Adapted Pruning for Efficient Neural Architecture Search




A Closer Look at Generalisation in RAVEN


[supplementary material] 


Supervised Edge Attention Network for Accurate Image Instance Segmentation




Discriminative Partial Domain Adversarial Network


[supplementary material] 


Differentiable Programming for Hyperspectral Unmixing using a Physics-based Dispersion Model


[supplementary material] 


Deep Cross-species Feature Learning for Animal Face Recognition via Residual Interspecies Equivariant Network




Guidance and Evaluation: Semantic-Aware Image Inpainting for Mixed Scenes


[supplementary material] 


Sound2Sight: Generating Visual Dynamics from Sound and Context


[supplementary material] 


3D-CVF: Generating Joint Camera and LiDAR Features Using Cross-View Spatial Feature Fusion for 3D Object Detection




NoiseRank: Unsupervised Label Noise Reduction with Dependence Models




Fast Adaptation to Super-Resolution Networks via Meta-Learning




TP-LSD: Tri-Points Based Line Segment Detector


[supplementary material] 


SqueezeSegV3: Spatially-Adaptive Convolution for Efficient Point-Cloud Segmentation


[supplementary material] 


An Attention-driven Two-stage Clustering Method for Unsupervised Person Re-Identification


[supplementary material] 


Toward Fine-grained Facial Expression Manipulation




Adaptive Object Detection with Dual Multi-Label Prediction




Table Structure Recognition using Top-Down and Bottom-Up Cues


[supplementary material] 


Novel View Synthesis on Unpaired Data by Conditional Deformable Variational Auto-Encoder


[supplementary material] 


Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments


[supplementary material] 


Boundary Content Graph Neural Network for Temporal Action Proposal Generation


[supplementary material] 


Pose Augmentation: Class-agnostic Object Pose Transformation for Object Recognition


[supplementary material] 


VLANet: Video-Language Alignment Network for Weakly-Supervised Video Moment Retrieval




Attention-Based Query Expansion Learning




Interpretable Foreground Object Search As Knowledge Distillation




Improving Knowledge Distillation via Category Structure




High Resolution Zero-Shot Domain Adaptation of Synthetically Rendered Face Images


[supplementary material] 


Attentive Prototype Few-shot Learning with Capsule Network-based Embedding




Weakly Supervised Instance Segmentation by Learning Annotation Consistent Instances


[supplementary material] 


DA4AD: End-to-End Deep Attention-based Visual Localization for Autonomous Driving


[supplementary material] 


Visual-Relation Conscious Image Generation from Structured-Text


[supplementary material] 


Patch-wise Attack for Fooling Deep Neural Network


[supplementary material] 


Feature Pyramid Transformer


[supplementary material] 


MABNet: A Lightweight Stereo Network Based on Multibranch Adjustable Bottleneck Module




Guided Saliency Feature Learning for Person Re-identification in Crowded Scenes




Asymmetric Two-Stream Architecture for Accurate RGB-D Saliency Detection




Explaining Image Classifiers using Statistical Fault Localization




Deep Graph Matching via Blackbox Differentiation of Combinatorial Solvers


[supplementary material] 


Learning Video Representations by Transforming Time


[supplementary material] 


Unsupervised Monocular Depth Estimation for Night-time Images using Adversarial Domain Feature Adaptation




Variational Connectionist Temporal Classification




End-to-end Dynamic Matching Network for Multi-view Multi-person 3D Pose Estimation


[supplementary material] 


Orderly Disorder in Point Cloud Domain




Deep Decomposition Learning for Inverse Imaging Problems




FLOT: Scene Flow on Point Clouds guided by Optimal Transport


[supplementary material] 


Accurate Reconstruction of Oriented 3D Points using Affine Correspondences


[supplementary material] 


Volumetric Transformer Networks




360° Camera Alignment via Segmentation


[supplementary material] 


A Novel Line Integral Transform for 2D Affine-Invariant Shape Retrieval




Explanation-based Weakly-supervised Learning of Visual Relations with Graph Networks


[supplementary material] 


Guided Semantic Flow


[supplementary material] 


Document Structure Extraction using Prior based High Resolution Hierarchical Semantic Segmentation


[supplementary material] 


Measuring the Importance of Temporal Features in Video Saliency


[supplementary material] 


Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution


[supplementary material] 


Towards Reliable Evaluation of Algorithms for Road Network Reconstruction from Aerial Images


[supplementary material] 


Online Continual Learning under Extreme Memory Constraints


[supplementary material] 


Learning to Cluster under Domain Shift


[supplementary material] 


Defense Against Adversarial Attacks via Controlling Gradient Leaking on Embedded Manifolds


[supplementary material] 


Improving Optical Flow on a Pyramid Level


[supplementary material] 


Procrustean Regression Networks: Learning 3D Structure of Non-Rigid Objects from 2D Annotations


[supplementary material] 


Learning to Learn Parameterized Classification Networks for Scalable Input Images


[supplementary material] 


Stereo Event-based Particle Tracking Velocimetry for 3D Fluid Flow Reconstruction


[supplementary material] 


Simplicial Complex based Point Correspondence between Images warped onto Manifolds


[supplementary material] 


Representation Learning on Visual-Symbolic Graphs for Video Understanding


[supplementary material] 


Distance-Normalized Unified Representation for Monocular 3D Object Detection




Sequential Deformation for Accurate Scene Text Detection




Where to Explore Next? ExHistCNN for History-aware Autonomous 3D Exploration


[supplementary material] 


Semi-Supervised Segmentation based on Error-Correcting Supervision




Quantum-soft QUBO Suppression for Accurate Object Detection




Label-similarity Curriculum Learning


[supplementary material] 


Recurrent Image Annotation With Explicit Inter-Label Dependencies




Cross-Attention in Coupled Unmixing Nets for Unsupervised Hyperspectral Super-Resolution




SimPose: Effectively Learning DensePose and Surface Normals of People from Simulated Data




ByeGlassesGAN: Identity Preserving Eyeglasses Removal for Face Images


[supplementary material] 


Differentiable Joint Pruning and Quantization for Hardware Efficiency




Learning to Generate Customized Dynamic 3D Facial Expressions


[supplementary material] 


LandscapeAR: Large Scale Outdoor Augmented Reality by Matching Photographs with Terrain Models Using Learned Descriptors


[supplementary material] 


Learning Disentangled Feature Representation for Hybrid-distorted Image Restoration




Jointly De-biasing Face Recognition and Demographic Attribute Estimation


[supplementary material] 


Regularized Loss for Weakly Supervised Single Class Semantic Segmentation


[supplementary material] 


Spike-FlowNet: Event-based Optical Flow Estimation with Energy-Efficient Hybrid Neural Networks


[supplementary material] 


Forgetting Outside the Box: Scrubbing Deep Networks of Information Accessible from Input-Output Observations


[supplementary material] 


Inherent Adversarial Robustness of Deep Spiking Neural Networks: Effects of Discrete Input Encoding and Non-Linear Activations


[supplementary material] 


Synthesizing Coupled 3D Face Modalities by Trunk-Branch Generative Adversarial Networks


[supplementary material] 


Learning to Learn Words from Visual Scenes


[supplementary material] 


On Transferability of Histological Tissue Labels in Computational Pathology


[supplementary material] 


Learning Actionness via Long-range Temporal Order Verification


[supplementary material] 


Fully Embedding Fast Convolutional Networks on Pixel Processor Arrays


[supplementary material] 


Character Region Attention For Text Spotting




Stable Low-rank Tensor Decomposition for Compression of Convolutional Neural Network




Dual Mixup Regularized Learning for Adversarial Domain Adaptation




Robust and On-the-fly Dataset Denoising for Image Classification


[supplementary material] 


Imaging Behind Occluders Using Two-Bounce Light


[supplementary material] 


Improving Object Detection with Selective Self-Supervised Self-Training


[supplementary material] 


Deep Local Shapes: Learning Local SDF Priors for Detailed 3D Reconstruction


[supplementary material] 


Info3D: Representation Learning on 3D Objects using Mutual Information Maximization and Contrastive Learning


[supplementary material] 


Adversarial Data Augmentation via Deformation Statistics




Neural Predictor for Neural Architecture Search


[supplementary material] 


Learning Permutation Invariant Representations using Memory Networks




Feature Space Augmentation for Long-Tailed Data


[supplementary material] 


Laying the Foundations of Deep Long-Term Crowd Flow Prediction




Weakly-Supervised Action Localization with Expectation-Maximization Multi-Instance Learning




Fairness by Learning Orthogonal Disentangled Representations


[supplementary material] 


Self-supervision with Superpixels: Training Few-shot Medical Image Segmentation without Annotation


[supplementary material] 


On Diverse Asynchronous Activity Anticipation


[supplementary material] 


Representative-Discriminative Learning for Open-set Land Cover Classification of Satellite Imagery


[supplementary material] 


Structure-Aware Human-Action Generation


[supplementary material] 


Towards Efficient Coarse-to-Fine Networks for Action and Gesture Recognition


[supplementary material] 


S³Net: Semantic-Aware Self-supervised Depth Estimation with Monocular Videos and Synthetic Data


[supplementary material] 


Leveraging Seen and Unseen Semantic Relationships for Generative Zero-Shot Learning


[supplementary material] 


Weight Excitation: Built-in Attention Mechanisms in Convolutional Neural Networks


[supplementary material] 


UNITER: UNiversal Image-TExt Representation Learning


[supplementary material] 


Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks


[supplementary material] 


Improving Face Recognition from Hard Samples via Distribution Distillation Loss


[supplementary material] 


Extract and Merge: Superpixel Segmentation with Regional Attributes




Spatial-Adaptive Network for Single Image Denoising


[supplementary material] 


Physics-based Feature Dehazing Networks




Learning Surrogates via Deep Embedding




An Asymmetric Modeling for Action Assessment


[supplementary material] 


High-quality Single-model Deep Video Compression with Frame-Conv3D and Multi-frame Differential Modulation


[supplementary material] 


Instance-Aware Embedding for Point Cloud Instance Segmentation




Self-Paced Deep Regression Forests with Consideration on Underrepresented Examples




Manifold Projection for Adversarial Defense on Face Recognition


[supplementary material] 


Weakly Supervised Learning with Side Information for Noisy Labeled Images




Not only Look, but also Listen: Learning Multimodal Violence Detection under Weak Supervision


[supplementary material] 


SNE-RoadSeg: Incorporating Surface Normal Information into Semantic Segmentation for Accurate Freespace Detection


[supplementary material] 


Modeling the Space of Point Landmark Constrained Diffeomorphisms




PieNet: Personalized Image Enhancement Network


[supplementary material] 


Rotational Outlier Identification in Pose Graphs Using Dual Decomposition




Speech-driven Facial Animation using Cascaded GANs for Learning of Motion and Texture


[supplementary material] 


Solving Phase Retrieval with a Learned Reference




Dual Grid Net: Hand Mesh Vertex Regression from Single Depth Maps





