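ECCV 2020: List of Accepted Papers
The 2020 European Conference on Computer Vision (ECCV 2020), held from August 24 to 27, 2020, is Europe's top conference in the field of image analysis.
Below is a list of the papers accepted to ECCV 2020, along with reference materials.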
Quaternion Equivariant Capsule Networks for 3D Point Clouds
DeepFit: 3D Surface Fitting via Neural Network Weighted Least Squares
NSGANetV2: Evolutionary Multi-Objective Surrogate-Assisted Neural Architecture Search
Describing Textures using Natural Language
AiR: Attention with Reasoning Capability
Self6D: Self-Supervised Monocular 6D Object Pose Estimation
Synthesize then Compare: Detecting Failures and Anomalies for Semantic Segmentation
House-GAN: Relational Generative Adversarial Networks for Graph-constrained House Layout Generation
Crowdsampling the Plenoptic Function
VoxelPose: Towards Multi-Camera 3D Human Pose Estimation in Wild Environment
End-to-End Object Detection with Transformers
DeepSFM: Structure From Motion Via Deep Bundle Adjustment
Ladybird: Quasi-Monte Carlo Sampling for Deep Implicit Field Based 3D Reconstruction with Symmetry
Segment as Points for Efficient Online Multi-Object Tracking and Segmentation
Conditional Convolutions for Instance Segmentation
MutualNet: Adaptive ConvNet via Mutual Learning from Network Width and Resolution
Fashionpedia: Ontology, Segmentation, and an Attribute Localization Dataset
Privacy Preserving Structure-from-Motion
Rewriting a Deep Generative Model
Compare and Reweight: Distinctive Image Captioning Using Similar Images Sets
Long-term Human Motion Prediction with Scene Context
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
ReferIt3D: Neural Listeners for Fine-Grained 3D Object Identification in Real-World Scenes
MatryODShka: Real-time 6DoF Video View Synthesis using Multi-Sphere Images
Learning and Aggregating Deep Local Descriptors for Instance-level Recognition
A Consistently Fast and Globally Optimal Solution to the Perspective-n-Point Problem
Learn to Recover Visible Color for Video Surveillance in a Day
Deep Fashion3D: A Dataset and Benchmark for 3D Garment Reconstruction from Single Images
Spatially Adaptive Inference with Stochastic Feature Sampling and Interpolation
BorderDet: Border Feature for Dense Object Detection
Regularization with Latent Space Virtual Adversarial Training
Du²Net: Learning Depth Estimation from Dual-Cameras and Dual-Pixels
Model-Agnostic Boundary-Adversarial Sampling for Test-Time Generalization in Few-Shot learning
Targeted Attack for Deep Hashing based Retrieval
Gradient Centralization: A New Optimization Technique for Deep Neural Networks
Content-Aware Unsupervised Deep Homography Estimation
Multi-View Optimization of Local Feature Geometry
The Phong Surface: Efficient 3D Model Fitting using Lifted Optimization
Learning Stereo from Single Images
Prototype Rectification for Few-Shot Learning
Learning Feature Descriptors using Camera Pose Supervision
Semantic Flow for Fast and Accurate Scene Parsing
Appearance Consensus Driven Self-Supervised Human Mesh Recovery
Aligning and Projecting Images to Class-conditional Generative Networks
Suppress and Balance: A Simple Gated Network for Salient Object Detection
Visual Memorability for Robotic Interestingness via Unsupervised Online Learning
Post-Training Piecewise Linear Quantization for Deep Neural Networks
Joint Disentangling and Adaptation for Cross-Domain Person Re-Identification
In-Home Daily-Life Captioning Using Radio Signals
Self-Challenging Improves Cross-Domain Generalization
A Competence-aware Curriculum for Visual Concepts Learning via Question Answering
Multitask Learning Strengthens Adversarial Robustness
S2DNAS: Transforming Static CNN Model for Dynamic Inference via Neural Architecture Search
Improving Deep Video Compression by Resolution-adaptive Flow Coding
Motion Capture from Internet Videos
Appearance-Preserving 3D Convolution for Video-based Person Re-identification
Exploiting Deep Generative Prior for Versatile Image Restoration and Manipulation
Deep Spatial-angular Regularization for Compressive Light Field Reconstruction over Coded Apertures
Video-based Remote Physiological Measurement via Cross-verified Feature Disentangling
Combining Implicit Function Learning and Parametric Models for 3D Human Reconstruction
Orientation-aware Vehicle Re-identification with Semantics-guided Part Attention Network
Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation
CoReNet: Coherent 3D Scene Reconstruction from a Single RGB Image
Layer-wise Conditioning Analysis in Exploring the Learning Dynamics of DNNs
RAFT: Recurrent All-Pairs Field Transforms for Optical Flow
Domain-invariant Stereo Matching Networks
Content Adaptive and Error Propagation Aware Deep Video Compression
Towards Automated Testing and Robustification by Semantic Adversarial Data Generation
Adversarial Generative Grammars for Human Activity Prediction
GDumb: A Simple Approach that Questions Our Progress in Continual Learning
Learning Lane Graph Representations for Motion Forecasting
What Matters in Unsupervised Optical Flow
Synthesis and Completion of Facades from Satellite Imagery
Mapillary Planet-Scale Depth Dataset
V2VNet: Vehicle-to-Vehicle Communication for Joint Perception and Prediction
Training Interpretable Convolutional Neural Networks by Differentiating Class-specific Filters
EagleEye: Fast Sub-net Evaluation for Efficient Neural Network Pruning
Intrinsic Point Cloud Interpolation via Dual Latent Space Navigation
Cross-Domain Cascaded Deep Translation
“Look Ma, no landmarks!” – Unsupervised, Model-based Dense Face Alignment
Online Invariance Selection for Local Feature Descriptors
Rethinking Image Inpainting via a Mutual Encoder-Decoder with Feature Equalizations
TextCaps: a Dataset for Image Captioning with Reading Comprehension
It is not the Journey but the Destination: Endpoint Conditioned Trajectory Prediction
Learning What to Learn for Video Object Segmentation
SIZER: A Dataset and Model for Parsing 3D Clothing and Learning Size Sensitive 3D Clothing
LIMP: Learning Latent Shape Representations with Metric Preservation Priors
Unsupervised Sketch to Photo Synthesis
A Simple Way to Make Neural Networks Robust Against Diverse Image Corruptions
SoftPoolNet: Shape Descriptor for Point Cloud Completion and Classification
Hierarchical Face Aging through Disentangled Latent Characteristics
Hybrid Models for Open Set Recognition
TopoGAN: A Topology-Aware Generative Adversarial Network
Learning to Localize Actions from Moments
ForkGAN: Seeing into the Rainy Night
TCGM: An Information-Theoretic Framework for Semi-Supervised Multi-Modality Learning
ExchNet: A Unified Hashing Network for Large-Scale Fine-Grained Image Retrieval
TSIT: A Simple and Versatile Framework for Image-to-Image Translation
ProxyBNN: Learning Binarized Neural Networks via Proxy Matrices
HMOR: Hierarchical Multi-Person Ordinal Relations for Monocular Multi-Person 3D Pose Estimation
Mask2CAD: 3D Shape Prediction by Learning to Segment and Retrieve
A Unified Framework of Surrogate Loss by Refactoring and Interpolation
Deep Reflectance Volumes: Relightable Reconstructions from Multi-View Photometric Images
Memory-augmented Dense Predictive Coding for Video Representation Learning
PointMixup: Augmentation for Point Clouds
Identity-Guided Human Semantic Parsing for Person Re-Identification
Learning Gradient Fields for Shape Generation
COCO-FUNIT: Few-Shot Unsupervised Image Translation with a Content Conditioned Style Encoder
Corner Proposal Network for Anchor-free, Two-stage Object Detection
PhraseClick: Toward Achieving Flexible Interactive Segmentation by Phrase and Click
Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing
Learning Delicate Local Representations for Multi-Person Pose Estimation
Learning to Plan with Uncertain Topological Maps
Neural Design Network: Graphic Layout Generation with Constraints
Learning Open Set Network with Discriminative Reciprocal Points
Convolutional Occupancy Networks
Multi-person 3D Pose Estimation in Crowded Scenes Based on Multi-View Geometry
TIDE: A General Toolbox for Identifying Object Detection Errors
PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding
DSA: More Efficient Budgeted Pruning via Differentiable Sparsity Allocation
Circumventing Outliers of AutoAugment with Knowledge Distillation
S2DNet: Learning Image Features for Accurate Sparse-to-Dense Matching
RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving
Video Object Segmentation with Episodic Graph Memory Networks
Rethinking Bottleneck Structure for Efficient Mobile Network Design
Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks
Towards Part-aware Monocular 3D Human Pose Estimation: An Architecture Search Approach
REVISE: A Tool for Measuring and Mitigating Bias in Visual Datasets
Contrastive Learning for Weakly Supervised Phrase Grounding
Making an Invisibility Cloak: Real World Adversarial Attacks on Object Detectors
TuiGAN: Learning Versatile Image-to-Image Translation with Two Unpaired Images
Semi-Siamese Training for Shallow Face Learning
GAN Slimming: All-in-One GAN Compression by A Unified Optimization Framework
Human Interaction Learning on 3D Skeleton Point Clouds for Video Violence Recognition
Binarized Neural Network for Single Image Super Resolution
Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation
Adaptive Computationally Efficient Network for Monocular 3D Hand Pose Estimation
Distribution-Balanced Loss for Multi-Label Classification in Long-Tailed Datasets
Hamiltonian Dynamics for Real-World Shape Interpolation
Learning to Scale Multilingual Representations for Vision-Language Tasks
Multi-modal Transformer for Video Retrieval
Feature Representation Matters: End-to-End Learning for Reference-based Image Super-resolution
RobustFusion: Human Volumetric Capture with Data-driven Visual Cues using a RGBD Camera
Surface Normal Estimation of Tilted Images via Spatial Rectifier
Multimodal Shape Completion via Conditional Generative Adversarial Networks
Generative Sparse Detection Networks for 3D Single-shot Object Detection
Grounded Situation Recognition
Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos
Unpaired Learning of Deep Image Denoising
Self-supervising Fine-grained Region Similarities for Large-scale Image Localization
Rotationally-Temporally Consistent Novel View Synthesis of Human Performance Video
Side-Aware Boundary Localization for More Precise Object Detection
SF-Net: Single-Frame Supervision for Temporal Action Localization
Negative Margin Matters: Understanding Margin in Few-shot Classification
Particularity beyond Commonality: Unpaired Identity Transfer with Multiple References
CPGAN: Content-Parsing Generative Adversarial Networks for Text-to-Image Synthesis
Transporting Labels via Hierarchical Optimal Transport for Semi-Supervised Learning
MTI-Net: Multi-Scale Task Interaction Networks for Multi-Task Learning
Learning to Factorize and Relight a City
Region Graph Embedding Network for Zero-Shot Learning
GRAB: A Dataset of Whole-Body Human Grasping of Objects
DEMEA: Deep Mesh Autoencoders for Non-Rigidly Deforming Objects
RANSAC-Flow: Generic Two-stage Image Alignment
Semantic Object Prediction and Spatial Sound Super-Resolution with Binaural Sounds
Neural Object Learning for 6D Pose Estimation Using a Few Cluttered Images
Dense Hybrid Recurrent Multi-view Stereo Net with Dynamic Consistency Checking
Pixel-Pair Occlusion Relationship Map (P2ORM): Formulation, Inference & Application
MovieNet: A Holistic Dataset for Movie Understanding
Short-Term and Long-Term Context Aggregation Network for Video Inpainting
DH3D: Deep Hierarchical 3D Descriptors for Robust Large-Scale 6DoF Relocalization
Face Super-Resolution Guided by 3D Facial Priors
Are Labels Necessary for Neural Architecture Search?
BLSM: A Bone-Level Skinned Model of the Human Mesh
Associative Alignment for Few-shot Image Classification
Cyclic Functional Mapping: Self-supervised Correspondence between Non-isometric Deformable Shapes
View-Invariant Probabilistic Embedding for Human Pose
Contact and Human Dynamics from Monocular Video
PointPWC-Net: Cost Volume on Point Clouds for (Self-)Supervised Scene Flow Estimation
Points2Surf: Learning Implicit Surfaces from Point Clouds
Few-Shot Scene-Adaptive Anomaly Detection
Personalized Face Modeling for Improved Face Reconstruction and Motion Retargeting
Entropy Minimisation Framework for Event-based Vision Model Estimation
PIoU Loss: Towards Accurate Oriented Object Detection in Complex Environments
TENet: Triple Excitation Network for Video Salient Object Detection
Deep Feedback Inverse Problem Solver
Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification
Hallucinating Visual Instances in Total Absentia
Weakly-supervised 3D Shape Completion in the Wild
DTVNet: Dynamic Time-lapse Video Generation via Single Still Image
CLIFFNet for Monocular Depth Estimation with Hierarchical Embedding Loss
Collaborative Video Object Segmentation by Foreground-Background Integration
Adaptive Margin Diversity Regularizer for handling Data Imbalance in Zero-Shot SBIR
ETH-XGaze: A Large Scale Dataset for Gaze Estimation under Extreme Head Pose and Gaze Variation
Calibration-free Structure-from-Motion with Calibrated Radial Trifocal Tensors
Occupancy Anticipation for Efficient Exploration and Navigation
Unified Image and Video Saliency Modeling
TAO: A Large-Scale Benchmark for Tracking Any Object
A Generalization of Otsu’s Method and Minimum Error Thresholding
A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks
Big Transfer (BiT): General Visual Representation Learning
VisualCOMET: Reasoning about the Dynamic Context of a Still Image
Few-shot Action Recognition with Permutation-invariant Attention
Character Grounding and Re-Identification in Story of Videos and Text Descriptions
AABO: Adaptive Anchor Box Optimization for Object Detection via Bayesian Sub-sampling
Learning Visual Context by Comparison
Large Scale Holistic Video Understanding
Indirect Local Attacks for Context-aware Semantic Segmentation Networks
Predicting Visual Overlap of Images Through Interpretable Non-Metric Box Embeddings
Connecting Vision and Language with Localized Narratives
Adversarial T-shirt! Evading Person Detectors in A Physical World
Bounding-box Channels for Visual Relationship Detection
Minimal Rolling Shutter Absolute Pose with Unknown Focal Length and Radial Distortion
SRFlow: Learning the Super-Resolution Space with Normalizing Flow
DeepGMR: Learning Latent Gaussian Mixture Models for Registration
Active Perception using Light Curtains for Autonomous Driving
Invertible Neural BRDF for Object Inverse Rendering
Semi-supervised Semantic Segmentation via Strong-weak Dual-branch Network
Practical Deep Raw Image Denoising on Mobile Devices
SoundSpaces: Audio-Visual Navigation in 3D Environments
Two-Stream Consensus Network for Weakly-Supervised Temporal Action Localization
Erasing Appearance Preservation in Optimization-based Smoothing
Counterfactual Vision-and-Language Navigation via Adversarial Path Sampler
Guided Deep Decoder: Unsupervised Image Pair Fusion
Filter Style Transfer between Photos
Dynamic Group Convolution for Accelerating Convolutional Neural Networks
RD-GAN: Few/Zero-Shot Chinese Character Style Transfer via Radical Decomposition and Rendering
Object-Contextual Representations for Semantic Segmentation
Efficient Spatio-Temporal Recurrent Neural Network for Video Deblurring
Joint Semantic Instance Segmentation on Graphs with the Semantic Mutex Watershed
Photon-Efficient 3D Imaging with A Non-Local Neural Network
GeLaTO: Generative Latent Textured Objects
Improving Vision-and-Language Navigation with Image-Text Pairs from the Web
Directional Temporal Modeling for Action Recognition
Shonan Rotation Averaging: Global Optimality by Surfing SO(p)^n
Semantic Curiosity for Active Visual Learning
ProgressFace: Scale-Aware Progressive Learning for Face Detection
CoTeRe-Net: Discovering Collaborative Ternary Relations in Videos
Modeling the Effects of Windshield Refraction for Camera Calibration
PROFIT: A Novel Training Method for sub-4-bit MobileNet Models
Visual Relation Grounding in Videos
Weakly Supervised 3D Human Pose and Shape Reconstruction with Normalizing Flows
Controlling Style and Semantics in Weakly-Supervised Image Generation
Jointly learning visual motion and confidence from local patches in event cameras
SODA: Story Oriented Dense Video Captioning Evaluation Framework
Sketch-Guided Object Localization in Natural Images
A unifying mutual information view of metric learning: cross-entropy vs. pairwise losses
Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models
The Hessian Penalty: A Weak Prior for Unsupervised Disentanglement
STAR: Sparse Trained Articulated Human Body Regressor
Optical Flow Distillation: Towards Efficient and Stable Video Style Transfer
Do Not Disturb Me: Person Re-identification Under the Interference of Other Pedestrians
Learning 3D Part Assembly from a Single Image
PT2PC: Learning to Generate 3D Point Cloud Shapes from Part Tree Conditions
Highly Efficient Salient Object Detection with 100K Parameters
HardGAN: A Haze-Aware Representation Distillation GAN for Single Image Dehazing
Lifespan Age Transformation Synthesis
Domain2Vec: Domain Embedding for Unsupervised Domain Adaptation
Simulating Content Consistent Vehicle Datasets with Attribute Descent
Multiview Detection with Feature Perspective Transformation
Learning Object Relation Graph and Tentative Policy for Visual Navigation
Adversarial Self-Supervised Learning for Semi-Supervised 3D Action Recognition
Across Scales & Across Dimensions: Temporal Super-Resolution using Deep Internal Learning
Inducing Optimal Attribute Representations for Conditional GANs
AR-Net: Adaptive Frame Resolution for Efficient Action Recognition
Image-to-Voxel Model Translation for 3D Scene Reconstruction and Segmentation
Consistency Guided Scene Flow Estimation
Autoregressive Unsupervised Image Segmentation
Controllable Image Synthesis via SegVAE
Off-Policy Reinforcement Learning for Efficient and Effective GAN Architecture Search
Efficient Non-Line-of-Sight Imaging from Transient Sinograms
Texture Hallucination for Large-Factor Painting Super-Resolution
Learning Progressive Joint Propagation for Human Motion Prediction
Image Stitching and Rectification for Hand-Held Cameras
ParSeNet: A Parametric Surface Fitting Network for 3D Point Clouds
The Group Loss for Deep Metric Learning
Learning Object Depth from Camera Motion and Video Object Segmentation
OnlineAugment: Online Data Augmentation with Less Domain Knowledge
Learning Pairwise Inter-Plane Relations for Piecewise Planar Reconstruction
Intra-class Feature Variation Distillation for Semantic Segmentation
Temporal Distinct Representation Learning for Action Recognition
Representative Graph Neural Network
Deformation-Aware 3D Model Embedding and Retrieval
Atlas: End-to-End 3D Scene Reconstruction from Posed Images
Multiple Class Novelty Detection Under Data Distribution Shift
Colorization of Depth Map via Disentanglement
Beyond Controlled Environments: 3D Camera Re-Localization in Changing Indoor Scenes
GeoGraph: Graph-based multi-view object detection with geometric cues end-to-end
Localizing the Common Action Among a Few Videos
TAFSSL: Task-Adaptive Feature Sub-Space Learning for few-shot classification
Traffic Accident Benchmark for Causality Recognition
Face Anti-Spoofing with Human Material Perception
How Can I See My Future? FvTraj: Using First-person View for Pedestrian Trajectory Prediction
Multiple Expert Brainstorming for Domain Adaptive Person Re-identification
NASA: Neural Articulated Shape Approximation
Towards Unique and Informative Captioning of Images
When Does Self-supervision Improve Few-shot Learning?
Two-branch Recurrent Network for Isolating Deepfakes in Videos
Incremental Few-Shot Meta-Learning via Indirect Discriminant Alignment
BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage Models
Differentiable Hierarchical Graph Grouping for Multi-Person Pose Estimation
Global Distance-distributions Separation for Unsupervised Person Re-identification
Pose2Mesh: Graph Convolutional Network for 3D Human Pose and Mesh Recovery from a 2D Human Pose
ALRe: Outlier Detection for Guided Refinement
Weakly-Supervised Crowd Counting Learns from Sorting rather than Locations
Unsupervised Domain Attention Adaptation Network for Caricature Attribute Recognition
Many-shot from Low-shot: Learning to Annotate using Mixed Supervision for Object Detection
Meshing Point Clouds with Predicted Intrinsic-Extrinsic Ratio Guidance
Improved Adversarial Training via Learned Optimizer
Component Divide-and-Conquer for Real-World Image Super-Resolution
Enabling Deep Residual Networks for Weakly Supervised Object Detection
Deep near-light photometric stereo for spatially varying reflectances
Learning Visual Representations with Caption Annotations
Solving Long-tailed Recognition with Deep Realistic Taxonomic Classifier
Regression of Instance Boundary by Aggregated CNN and GCN
Social Adaptive Module for Weakly-supervised Group Activity Recognition
RGB-D Salient Object Detection with Cross-Modality Modulation and Selection
RetrieveGAN: Image Synthesis via Differentiable Patch Retrieval
Cheaper Pre-training Lunch: An Efficient Paradigm for Object Detection
Faster Person Re-Identification
Quantization Guided JPEG Artifact Correction
3PointTM: Faster Measurement of High-Dimensional Transmission Matrices
Joint Bilateral Learning for Real-time Universal Photorealistic Style Transfer
Beyond 3DMM Space: Towards Fine-grained 3D Face Reconstruction
World-Consistent Video-to-Video Synthesis
GMNet: Graph Matching Network for Large Scale Part Semantic Segmentation in the Wild
Event-based Asynchronous Sparse Convolutional Networks
AttentionNAS: Spatiotemporal Attention Cell Search for Video Classification
REMIND Your Neural Network to Prevent Catastrophic Forgetting
Image Classification in the Dark using Quanta Image Sensors
n-Reference Transfer Learning for Saliency Prediction
Progressively Guided Alternate Refinement Network for RGB-D Salient Object Detection
Bottom-Up Temporal Action Localization with Mutual Regularization
On Modulating the Gradient for Meta-Learning
Domain-Specific Mappings for Generative Adversarial Style Transfer
DiVA: Diverse Visual Feature Aggregation for Deep Metric Learning
DHP: Differentiable Meta Pruning via HyperNetworks
Deep Transferring Quantization
Deep Credible Metric Learning for Unsupervised Domain Adaptation Person Re-identification
Arbitrary-Oriented Object Detection with Circular Smooth Label
Learning Event-Driven Video Deblurring and Interpolation
Learning to Combine: Knowledge Aggregation for Multi-Source Domain Adaptation
CSCL: Critical Semantic-Consistent Learning for Unsupervised Domain Adaptation
Prototype Mixture Models for Few-shot Semantic Segmentation
Webly Supervised Image Classification with Self-Contained Confidence
Search What You Want: Barrier Panelty NAS for Mixed Precision Quantization
Monocular 3D Object Detection via Feature Domain Adaptation
VPN: Learning Video-Pose Embedding for Activities of Daily Living
Soft Anchor-Point Object Detection
Beyond Fixed Grid: Learning Geometric Image Representation with a Deformable Grid
Soft Expert Reward Learning for Vision-and-Language Navigation
Part-aware Prototype Network for Few-shot Semantic Segmentation
Learning from Extrinsic and Intrinsic Supervisions for Domain Generalization
Joint Learning of Social Groups, Individuals Action and Sub-group Activities in Videos
Whole-Body Human Pose Estimation in the Wild
Relative Pose Estimation of Calibrated Cameras with Known SE(3) Invariants
Sequential Convolution and Runge-Kutta Residual Architecture for Image Compressed Sensing
Deep Hough Transform for Semantic Line Detection
Structured Landmark Detection via Topology-Adapting Deep Graph Learning
3D Human Shape and Pose from a Single Low-Resolution Image with Self-Supervised Learning
Learning to Balance Specificity and Invariance for In and Out of Domain Generalization
Contrastive Learning for Unpaired Image-to-Image Translation
DLow: Diversifying Latent Flows for Diverse Human Motion Prediction
GRNet: Gridding Residual Network for Dense Point Cloud Completion
Gait Lateral Network: Learning Discriminative and Compact Representations for Gait Recognition
Blind Face Restoration via Deep Multi-scale Component Dictionaries
Robust Neural Networks inspired by Strong Stability Preserving Runge-Kutta methods
Inequality-Constrained and Robust 3D Face Model Fitting
Gabor Layers Enhance Network Robustness
Conditional Image Repainting via Semantic Bridge and Piecewise Value Function
Learnable Cost Volume Using the Cayley Representation
HALO: Hardware-Aware Learning to Optimize
Structured3D: A Large Photo-realistic Dataset for Structured 3D Modeling
BroadFace: Looking at Tens of Thousands of People at Once for Face Recognition
Interpretable Visual Reasoning via Probabilistic Formulation under Natural Supervision
Domain Adaptive Semantic Segmentation Using Weak Labels
Knowledge Distillation Meets Self-Supervision
Efficient Neighbourhood Consensus Networks via Submanifold Sparse Convolutions
Reconstructing the Noise Variance Manifold for Image Denoising
Occlusion-Aware Depth Estimation with Adaptive Normal Constraints
VisualEchoes: Spatial Image Representation Learning through Echolocation
Smooth-AP: Smoothing the Path Towards Large-Scale Image Retrieval
Naive-Student: Leveraging Semi-Supervised Learning in Video Sequences for Urban Scene Segmentation
Spatially Aware Multimodal Transformers for TextVQA
Every Pixel Matters: Center-aware Feature Alignment for Domain Adaptive Object Detector
URIE: Universal Image Enhancement for Visual Recognition in the Wild
Pyramid Multi-view Stereo Net with Self-adaptive View Aggregation
SPL-MLL: Selecting Predictable Landmarks for Multi-Label Learning
Unpaired Image-to-Image Translation using Adversarial Consistency Loss
Discriminability Distillation in Group Representation Learning
Monocular Expressive Body Regression through Body-Driven Attention
Dual Adversarial Network: Toward Real-world Noise Removal and Noise Generation
Linguistic Structure Guided Context Modeling for Referring Image Segmentation
Federated Visual Classification with Real-World Data Distribution
Robust Re-Identification by Multiple Views Knowledge Distillation
Defocus Deblurring Using Dual-Pixel Data
RhyRNN: Rhythmic RNN for Recognizing Events in Long and Complex Videos
Weighing Counts: Sequential Crowd Counting by Reinforcement Learning
Reflection Backdoor: A Natural Backdoor Attack on Deep Neural Networks
Learning to Learn with Variational Information Bottleneck for Domain Generalization
Deep Positional and Relational Feature Learning for Rotation-Invariant Point Cloud Analysis
Layered Neighborhood Expansion for Incremental Multiple Graph Matching
SCAN: Learning to Classify Images without Labels
Graph convolutional networks for learning with few clean and many noisy labels
Object-and-Action Aware Model for Visual Language Navigation
A Comprehensive Study of Weight Sharing in Graph Networks for 3D Human Pose Estimation
MuCAN: Multi-Correspondence Aggregation Network for Video Super-Resolution
Efficient Semantic Video Segmentation with Per-frame Inference
Increasing the Robustness of Semantic Segmentation Models with Painting-by-Numbers
Deep Spiking Neural Network: Energy Efficiency Through Time based Coding
InfoFocus: 3D Object Detection for Autonomous Driving with Dynamic Information Modeling
Utilizing Patch-level Category Activation Patterns for Multiple Class Novelty Detection
Mapping in a Cycle: Sinkhorn Regularized Unsupervised Learning for Point Cloud Shapes
Label-Efficient Learning on Point Clouds using Approximate Convex Decompositions
TexMesh: Reconstructing Detailed Human Texture and Geometry from RGB-D Video
Consistency-based Semi-supervised Active Learning: Towards Minimizing Labeling Cost
Point-Set Anchors for Object Detection, Instance Segmentation and Pose Estimation
Modeling 3D Shapes by Reinforcement Learning
LST-Net: Learning a Convolutional Neural Network with a Learnable Sparse Transform
Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision
CN: Channel Normalization For Point Cloud Recognition
Rethinking the Defocus Blur Detection Problem and A Real-Time Deep DBD Model
AutoMix: Mixup Networks for Sample Interpolation via Cooperative Barycenter Learning
Scene Text Image Super-resolution in the wild
Coupling Explicit and Implicit Surface Representations for Generative 3D Modeling
Learning Disentangled Representations with Latent Variation Predictability
Deep Space-Time Video Upsampling Networks
Large-Scale Few-Shot Learning via Multi-Modal Knowledge Discovery
Fast Video Object Segmentation using the Global Context Module
Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos
Selecting Relevant Features from a Multi-domain Representation for Few-shot Classification
MessyTable: Instance Association in Multiple Camera Views
A Unified Framework for Shot Type Classification Based on Subject Centric Lens
BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues
HTML: A Parametric Hand Texture Model for 3D Hand Reconstruction and Personalization
CycAs: Self-supervised Cycle Association for Learning Re-identifiable Descriptions
Open-Edit: Open-Domain Image Manipulation with Open-Vocabulary Instructions
Towards Real-Time Multi-Object Tracking
A Balanced and Uncertainty-aware Approach for Partial Domain Adaptation
STEm-Seg: Spatio-temporal Embeddings for Instance Segmentation in Videos
Hierarchical Style-based Networks for Motion Synthesis
Who Left the Dogs Out? 3D Animal Reconstruction with Expectation Maximization in the Loop
Learning to Count in the Crowd from Limited Labeled Data
SPOT: Selective Point Cloud Voting for Better Proposal in Point Cloud Object Detection
From Shadow Segmentation to Shadow Removal
Diverse and Admissible Trajectory Prediction through Multimodal Context Understanding
CONFIG: Controllable Neural Face Image Generation
Single View Metrology in the Wild
Procedure Planning in Instructional Videos
Funnel Activation for Visual Recognition
GIQA: Generated Image Quality Assessment
Adversarial Continual Learning
Adapting Object Detectors with Conditional Domain Normalization
HARD-Net: Hardness-AwaRe Discrimination Network for 3D Early Activity Prediction
Pseudo RGB-D for Self-Improving Monocular SLAM and Depth Prediction
Self-supervised Bayesian Deep Learning for Image Recovery with Applications to Compressive Sensing
Graph-PCNN: Two Stage Human Pose Estimation with Graph Pose Refinement
Semi-supervised Learning with a Teacher-student Network for Generalized Attribute Prediction
Unsupervised Domain Adaptation with Noise Resistible Mutual-Training for Person Re-identification
DPDist: Comparing Point Clouds Using Deep Point Cloud Distance
DataMix: Efficient Privacy-Preserving Edge-Cloud Inference
Neural Re-Rendering of Humans from a Single Image
Reversing the cycle: self-supervised deep stereo through enhanced monocular distillation
PIPAL: a Large-Scale Image Quality Assessment Dataset for Perceptual Image Restoration
Why do These Match? Explaining the Behavior of Image Similarity Models
CooGAN: A Memory-Efficient Framework for High-Resolution Facial Attribute Editing
Progressive Transformers for End-to-End Sign Language Production
Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text Spotting
Making Affine Correspondences Work in Camera Geometry Computation
Sub-center ArcFace: Boosting Face Recognition by Large-scale Noisy Web Faces
Foley Music: Learning to Generate Music from Videos
Generative Low-bitwidth Data Free Quantization
Local Correlation Consistency for Knowledge Distillation
Perceiving 3D Human-Object Spatial Arrangements from a Single Image in the Wild
Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation
CelebA-Spoof: Large-Scale Face Anti-Spoofing Dataset with Rich Annotations
Thinking in Frequency: Face Forgery Detection by Mining Frequency-aware Clues
Weakly-Supervised Cell Tracking via Backward-and-Forward Propagation
SeqHAND: RGB-Sequence-Based 3D Hand Pose and Shape Estimation
Rethinking the Distribution Gap of Person Re-identification with Camera-based Batch Normalization
AMLN: Adversarial-based Mutual Learning Network for Online Knowledge Distillation
Online Multi-modal Person Search in Videos
Single Image Super-Resolution via a Holistic Attention Network
Can You Read Me Now? Content Aware Rectification using Angle Supervision
Momentum Batch Normalization for Deep Learning with Small Batch Size
AdvPC: Transferable Adversarial Perturbations on 3D Point Clouds
Edge-aware Graph Representation Learning and Reasoning for Face Parsing
BBS-Net: RGB-D Salient Object Detection with a Bifurcated Backbone Strategy Network
G-LBM: Generative Low-dimensional Background Model Estimation from Video Sequences
H3DNet: 3D Object Detection Using Hybrid Geometric Primitives
Expressive Telepresence via Modular Codec Avatars
Cascade Graph Neural Networks for RGB-D Salient Object Detection
FairALM: Augmented Lagrangian Method for Training Fair Models with Little Regret
Generating Videos of Zero-Shot Compositions of Actions and Objects
ViTAA: Visual-Textual Attributes Alignment in Person Search by Natural Language
Renovating Parsing R-CNN for Accurate Multiple Human Parsing
Multi-Task Curriculum Framework for Open-Set Semi-Supervised Learning
Gradient-Induced Co-Saliency Detection
Nighttime Defogging Using High-Low Frequency Decomposition and Grayscale-Color Networks
SegFix: Model-Agnostic Boundary Refinement for Segmentation
Spatio-Temporal Graph Transformer Networks for Pedestrian Trajectory Prediction
Fast Bi-layer Neural Synthesis of One-Shot Realistic Head Avatars
Neural Geometric Parser for Single Image Camera Calibration
Learning Architectures for Binary Networks
An Analysis of Sketched IRLS for Accelerated Sparse Residual Regression
Relative Pose from Deep Learned Depth and a Single Affine Correspondence
Video Super-Resolution with Recurrent Structure-Detail Network
Shape Adaptor: A Learnable Resizing Module
Shuffle and Attend: Video Domain Adaptation
DRG: Dual Relation Graph for Human-Object Interaction Detection
Flow-edge Guided Video Completion
Towards End-to-end Video-based Eye-Tracking
Generating Handwriting via Decoupled Style Descriptors
LEED: Label-Free Expression Editing via Disentanglement
Fashion Captioning: Towards Generating Accurate Descriptions with Semantic Rewards
Reducing Language Biases in Visual Question Answering with Visually-Grounded Question Encoder
Unsupervised Cross-Modal Alignment for Multi-Person 3D Pose Estimation
Class-Incremental Domain Adaptation
Anti-Bandit Neural Architecture Search for Model Defense
Wavelet-Based Dual-Branch Network for Image Demoiréing
Low Light Video Enhancement using Synthetic Data Produced with an Intermediate Domain Mapping
Non-Local Spatial Propagation Network for Depth Completion
DanbooRegion: An Illustration Region Dataset
Event Enhanced High-Quality Image Recovery
PackDet: Packed Long-Head Object Detector
A Generic Graph-based Neural Architecture Encoding Scheme for Predictor-based NAS
Learning Semantic Neural Tree for Human Parsing
Sketching Image Gist: Human-Mimetic Hierarchical Scene Graph Generation
Burst Denoising via Temporally Shifted Wavelet Transforms
SimAug: Learning Robust Representations from Simulation for Trajectory Prediction
ScribbleBox: Interactive Annotation Framework for Video Object Segmentation
Rethinking Pseudo-LiDAR Representation
Deep Multi Depth Panoramas for View Synthesis
MINI-Net: Multiple Instance Ranking Network for Video Highlight Detection
ContactPose: A Dataset of Grasps with Object Contact and Hand Pose
API-Net: Robust Generative Classifier via a Single Discriminator
Bias-based Universal Adversarial Patch Attack for Automatic Check-out
Imbalanced Continual Learning with Partitioning Reservoir Sampling
Guided Collaborative Training for Pixel-wise Semi-Supervised Learning
Stacking Networks Dynamically for Image Restoration Based on the Plug-and-Play Framework
Efficient Transfer Learning via Joint Adaptation of Network Architecture and Weight
Spatial Attention Pyramid Network for Unsupervised Domain Adaptation
GSIR: Generalizable 3D Shape Interpretation and Reconstruction
Weakly Supervised 3D Object Detection from Lidar Point Cloud
Two-phase Pseudo Label Densification for Self-training based Domain Adaptation
Adaptive Offline Quintuplet Loss for Image-Text Matching
Learning Object Placement by Inpainting for Compositional Data Augmentation
Deep Vectorization of Technical Drawings
CAD-Deform: Deformable Fitting of CAD Models to 3D Scans
An Image Enhancing Pattern-based Sparsity for Real-time Inference on Mobile Devices
AutoTrajectory: Label-free Trajectory Extraction and Prediction from Videos using Dynamic Points
Multi-Agent Embodied Question Answering in Interactive Environments
Conditional Sequential Modulation for Efficient Global Image Retouching
Segmenting Transparent Objects in the Wild
Length-Controllable Image Captioning
Few-Shot Semantic Segmentation with Democratic Attention Networks
Defocus Blur Detection via Depth Distillation
Motion Guided 3D Pose Estimation from Videos
Reflection Separation via Multi-bounce Polarization State Tracing
SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation
SemanticAdv: Generating Adversarial Examples via Attribute-conditioned Image Editing
Learning with Noisy Class Labels for Instance Segmentation
Deep Image Clustering with Category-Style Representation
Self-supervised Motion Representation via Scattering Local Motion Cues
Improving Monocular Depth Estimation by Leveraging Structural Awareness and Complementary Datasets
BMBC: Bilateral Motion Estimation with Bilateral Cost Volume for Video Interpolation
Hard negative examples are hard, but useful
ReActNet: Towards Precise Binary Neural Network with Generalized Activation Functions
Video Object Detection via Object-level Temporal Aggregation
Object Detection with a Unified Label Space from Multiple Datasets
Lift, Splat, Shoot: Encoding Images from Arbitrary Camera Rigs by Implicitly Unprojecting to 3D
Comprehensive Image Captioning via Scene Graph Decomposition
Symbiotic Adversarial Learning for Attribute-based Person Search
Amplifying Key Cues for Human-Object-Interaction Detection
Rethinking Few-shot Image Classification: A Good Embedding is All You Need?
Adversarial Background-Aware Loss for Weakly-supervised Temporal Activity Localization
Action Localization through Continual Predictive Learning
Generative View-Correlation Adaptation for Semi-Supervised Multi-View Learning
READ: Reciprocal Attention Discriminator for Image-to-Video Re-Identification
3D Human Shape Reconstruction from a Polarization Image
The Devil is in the Details: Self-Supervised Attention for Vehicle Re-Identification
Improving One-stage Visual Grounding by Recursive Sub-query Construction
Example-Guided Image Synthesis using Masked Spatial-Channel Attention and Self-Supervision
Content-Consistent Matching for Domain Adaptive Semantic Segmentation
AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting
History Repeats Itself: Human Motion Prediction via Motion Attention
Unsupervised Video Object Segmentation with Joint Hotspot Tracking
SRNet: Improving Generalization in 3D Human Pose Estimation with a Split-and-Recombine Approach
CAFE-GAN: Arbitrary Face Attribute Editing with Complementary Attention Feature
MimicDet: Bridging the Gap Between One-Stage and Two-Stage Object Detection
Latent Topic-aware Multi-Label Classification
Finding It at Another Side: A Viewpoint-Adapted Matching Encoder for Change Captioning
Curriculum Manager for Source Selection in Multi-Source Domain Adaptation
Powering One-shot Topological NAS with Stabilized Share-parameter Proxy
Classes Matter: A Fine-grained Adversarial Approach to Cross-domain Semantic Segmentation
Boundary-preserving Mask R-CNN
Self-supervised Single-view 3D Reconstruction via Semantic Consistency
MetaDistiller: Network Self-Boosting via Meta-Learned Top-Down Distillation
Learning Monocular Visual Odometry via Self-Supervised Long-Term Modeling
The Devil is in Classification: A Simple Framework for Long-tail Instance Segmentation
What is Learned in Deep Uncalibrated Photometric Stereo?
Prior-based Domain Adaptive Object Detection for Hazy and Rainy Conditions
Adversarial Ranking Attack and Defense
ReDro: Efficiently Learning Large-sized SPD Visual Representation
Graph-Based Social Relation Reasoning
EPNet: Enhancing Point Features with Image Semantics for 3D Object Detection
Self-Supervised Monocular 3D Face Reconstruction by Occlusion-Aware Multi-view Geometry Consistency
Asynchronous Interaction Aggregation for Action Detection
Shape and Viewpoint without Keypoints
Learning Attentive and Hierarchical Representations for 3D Shape Recognition
Associative3D: Volumetric Reconstruction from Sparse Views
PlugNet: Degradation Aware Scene Text Recognition Supervised by a Pluggable Super-Resolution Unit
Memory Selection Network for Video Propagation
Disentangled Non-local Neural Networks
URVOS: Unified Referring Video Object Segmentation Network with a Large-Scale Benchmark
Generalizing Person Re-Identification by Camera-Aware Invariance Learning and Cross-Domain Mixup
Semi-Supervised Crowd Counting via Self-Training on Surrogate Tasks
Dynamic R-CNN: Towards High Quality Object Detection via Dynamic Training
Boosting Decision-based Black-box Adversarial Attacks with Random Sign Flip
Knowledge Transfer via Dense Cross-Layer Mutual-Distillation
Clustering Driven Deep Autoencoder for Video Anomaly Detection
Learning to Compose Hypercolumns for Visual Correspondence
Stochastic Bundle Adjustment for Efficient and Scalable 3D Reconstruction
Object-based Illumination Estimation with Rendering-aware Neural Networks
Progressive Point Cloud Deconvolution Generation Network
SSCGAN: Facial Attribute Editing via Style Skip Connections
Negative Pseudo Labeling using Class Proportion for Semantic Segmentation in Pathology
Learn to Propagate Reliably on Noisy Affinity Graphs
Fair DARTS: Eliminating Unfair Advantages in Differentiable Architecture Search
TANet: Towards Fully Automatic Tooth Arrangement
UnionDet: Union-Level Detector Towards Real-Time Human-Object Interaction Detection
GSNet: Joint Vehicle Pose and Shape Reconstruction with Geometrical and Scene-aware Supervision
Resolution Switchable Networks for Runtime Efficient Image Recognition
SMAP: Single-Shot Multi-Person Absolute 3D Pose Estimation
Learning to Detect Open Classes for Universal Domain Adaptation
Visual Compositional Learning for Human-Object Interaction Detection
Deep Plastic Surgery: Robust and Controllable Image Editing with Human-Drawn Sketches
Rethinking Class Activation Mapping for Weakly Supervised Object Localization
OS2D: One-Stage One-Shot Object Detection by Matching Anchor Features
Interpretable Neural Network Decoupling
Omni-sourced Webly-supervised Learning for Video Recognition
CurveLane-NAS: Unifying Lane-Sensitive Architecture Search and Adaptive Point Blending
Contextual-Relation Consistent Domain Adaptation for Semantic Segmentation
Estimating People Flows to Better Count Them in Crowded Scenes
Generate to Adapt: Resolution Adaption Network for Surveillance Face Recognition
Learning Feature Embeddings for Discriminant Model based Tracking
WeightNet: Revisiting the Design Space of Weight Networks
Partially-Shared Variational Auto-encoders for Unsupervised Domain Adaptation with Target Shift
Learning Where to Focus for Efficient Video Object Detection
Learning Object Permanence from Video
Adaptive Text Recognition through Visual Matching
Learning to Exploit Multiple Vision Modalities by Using Grafted Networks
Geometric Correspondence Fields: Learned Differentiable Rendering for 3D Pose Refinement in the Wild
3D Fluid Flow Reconstruction Using Compact Light Field PIV
Contextual Diversity for Active Learning
Temporal Aggregate Representations for Long-Range Video Understanding
General 3D Room Layout from a Single View by Render-and-Compare
Neural Dense Non-Rigid Structure from Motion with Latent Space Constraints
Multimodal Memorability: Modeling Effects of Semantics and Decay on Video Memorability
Yet Another Intermediate-Level Attack
Topology-Change-Aware Volumetric Fusion for Dynamic Scene Reconstruction
Early Exit Or Not: Resource-Efficient Blind Quality Enhancement for Compressed Images
PatchNets: Patch-Based Generalizable Deep Implicit 3D Shape Representations
How does Lipschitz Regularization Influence GAN Training?
Infrastructure-based Multi-Camera Calibration using Radial Projections
MotionSqueeze: Neural Motion Feature Learning for Video Understanding
Polarized Optical-Flow Gyroscope
Online Meta-Learning for Multi-Source and Semi-Supervised Domain Adaptation
An Ensemble of Epoch-wise Empirical Bayes for Few-shot Learning
On the Effectiveness of Image Rotation for Open Set Domain Adaptation
Combining Task Predictors via Enhancing Joint Predictability
Multi-Scale Positive Sample Refinement for Few-Shot Object Detection
Single-Image Depth Prediction Makes Feature Matching Easier
Deep Reinforced Attention Learning for Quality-Aware Visual Recognition
CFAD: Coarse-to-Fine Action Detector for Spatiotemporal Action Localization
Learning Joint Spatial-Temporal Transformations for Video Inpainting
Single Path One-Shot Neural Architecture Search with Uniform Sampling
Learning to Generate Novel Domains for Domain Generalization
Continuous Adaptation for Interactive Object Segmentation by Learning from Corrections
Impact of base dataset design on few-shot image classification
Invertible Zero-Shot Recognition Flows
GeoLayout: Geometry Driven Room Layout Estimation Based on Depth Maps of Planes
Location Sensitive Image Retrieval and Tagging
Joint 3D Layout and Depth Prediction from a Single Indoor Panorama Image
Guessing State Tracking for Visual Dialogue
Memory-Efficient Incremental Learning Through Feature Adaptation
Neural Voice Puppetry: Audio-driven Facial Reenactment
One-Shot Unsupervised Cross-Domain Detection
Stochastic Frequency Masking to Improve Super-Resolution and Denoising Networks
Probabilistic Future Prediction for Video Scene Understanding
Suppressing Mislabeled Data via Grouping and Self-Attention
Class-wise Dynamic Graph Convolution for Semantic Segmentation
Character-Preserving Coherent Story Visualization
GINet: Graph Interaction Network for Scene Parsing
Tensor Low-Rank Reconstruction for Semantic Segmentation
Count- and Similarity-aware R-CNN for Pedestrian Detection
TRADI: Tracking Deep Neural network Weight Distributions
Spatiotemporal Attacks for Embodied Agents
Unselfie: Translating Selfies to Neutral-pose Portraits in the Wild
Design and Interpretation of Universal Adversarial Patches in Face Detection
Few-Shot Object Detection and Viewpoint Estimation for Objects in the Wild
Weakly Supervised 3D Hand Pose Estimation via Biomechanical Constraints
Dynamic Dual-Attentive Aggregation Learning for Visible-Infrared Person Re-Identification
Contextual Heterogeneous Graph Network for Human-Object Interaction Detection
Zero-Shot Image Super-Resolution with Depth Guided Internal Degradation Learning
A Closest Point Proposal for MCMC-based Probabilistic Surface Registration
Interactive Video Object Segmentation Using Global and Local Transfer Modules
End-to-end Interpretable Learning of Non-blind Image Deblurring
Employing Multi-Estimations for Weakly-Supervised Semantic Segmentation
Rethinking Image Deraining via Rain Streaks and Vapors
Finding Non-Uniform Quantization Schemes using Multi-Task Gaussian Processes
Is Sharing of Egocentric Video Giving Away Your Biometric Signature?
Captioning Images Taken by People Who Are Blind
Improving Semantic Segmentation via Decoupled Body and Edge Supervision
Conditional Entropy Coding for Efficient Video Compression
Differentiable Feature Aggregation Search for Knowledge Distillation
Attention Guided Anomaly Localization in Images
Self-supervised Video Representation Learning by Pace Prediction
Full-Body Awareness from Partial Observations
Reinforced Axial Refinement Network for Monocular 3D Object Detection
Self-Supervised Multi-Task Procedure Learning from Instructional Videos
CosyPose: Consistent multi-view multi-object 6D pose estimation
In-Domain GAN Inversion for Real Image Editing
Key Frame Proposal Network for Efficient Pose Estimation in Videos
Exchangeable Deep Neural Networks for Set-to-Set Matching and Learning
Making Sense of CNNs: Interpreting Deep Representations & Their Invariances with INNs
Cross-Modal Weighting Network for RGB-D Salient Object Detection
Deep Image Compression using Decoder Side Information
Meta-Sim2: Unsupervised Learning of Scene Structure for Synthetic Data Generation
A Generic Visualization Approach for Convolutional Neural Networks
Interactive Annotation of 3D Object Geometry using 2D Scribbles
Hierarchical Kinematic Human Mesh Recovery
Multi-Loss Rebalancing Algorithm for Monocular Depth Estimation
3D Bird Reconstruction: a Dataset, Model, and Shape Recovery from a Single View
We Have So Much In Common: Modeling Semantic Relational Set Abstractions in Videos
Joint Optimization for Multi-Person Shape Models from Markerless 3D-Scans
Accurate RGB-D Salient Object Detection via Collaborative Learning
Finding Your (3D) Center: 3D Object Detection Using a Learned Loss
Two Stream Active Query Suggestion for Active Learning in Connectomics
Pix2Surf: Learning Parametric 3D Surface Models of Objects from Images
6D Camera Relocalization in Ambiguous Scenes via Continuous Multimodal Inference
Modeling Artistic Workflows for Image Generation and Editing
Hidden Footprints: Learning Contextual Walkability from 3D Human Trails
Self-Supervised Learning of Audio-Visual Objects from Video
GAN-based Garment Generation Using Sewing Pattern Images
Style Transfer for Co-Speech Gesture Animation: A Multi-Speaker Conditional-Mixture Approach
An LSTM Approach to Temporal 3D Object Detection in LiDAR Point Clouds
Monotonicity Prior for Cloud Tomography
Learning Trailer Moments in Full-Length Movies with Co-Contrastive Attention
Preserving Semantic Neighborhoods for Robust Cross-modal Retrieval
Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline
Learning to Generate Grounded Visual Captions without Localization Supervision
JNR: Joint-based Neural Rig Representation for Compact 3D Face Modeling
On Disentangling Spoof Trace for Generic Face Anti-Spoofing
Streaming Object Detection for 3-D Point Clouds
NAS-DIP: Learning Deep Image Prior with Neural Architecture Search
Learning to Learn in a Semi-Supervised Fashion
FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning
RadarNet: Exploiting Radar for Robust Perception of Dynamic Objects
Seeing the Un-Scene: Learning Amodal Semantic Maps for Room Navigation
Learning to Separate: Detecting Heavily-Occluded Objects in Urban Scenes
Towards causal benchmarking of bias in face analysis algorithms
Knowledge-Based Video Question Answering with Unsupervised Scene Descriptions
LIRA: Lifelong Image Restoration from Unknown Blended Distortions
HDNet: Human Depth Estimation for Multi-Person Camera-Space Localization
SOLO: Segmenting Objects by Locations
Learning to See in the Dark with Events
Trajectron++: Dynamically-Feasible Trajectory Forecasting With Heterogeneous Data
Polynomial Regression Network for Variable-Number Lane Detection
Structural Deep Metric Learning for Room Layout Estimation
Adaptive Task Sampling for Meta-Learning
Improving Multispectral Pedestrian Detection by Addressing Modality Imbalance Problems
High-Resolution Image Inpainting with Iterative Confidence Feedback and Guided Upsampling
Online Ensemble Model Compression using Knowledge Distillation
Deep Learning-based Pupil Center Detection for Fast and Accurate Eye Tracking System
Efficient Residue Number System Based Winograd Convolution
Robust Tracking against Adversarial Attacks
Single-Shot Neural Relighting and SVBRDF Estimation
Unsupervised 3D Human Pose Representation with Viewpoint and Pose Disentanglement
Angle-based Search Space Shrinking for Neural Architecture Search
RobustScanner: Dynamically Enhancing Positional Clues for Robust Text Recognition
Towards Fast, Accurate and Stable 3D Dense Face Alignment
Iterative Feature Transformation for Fast and Versatile Universal Style Transfer
CATCH: Context-based Meta Reinforcement Learning for Transferrable Architecture Search
Toward Faster and Simpler Matrix Normalization via Rank-1 Update
Accurate Polarimetric BRDF for Real Polarization Scene Rendering
Topology-Preserving Class-Incremental Learning
Inter-Image Communication for Weakly Supervised Localization
UFO²: A Unified Framework towards Omni-supervised Object Detection
iCaps: An Interpretable Classifier via Disentangled Capsule Networks
Detecting Natural Disasters, Damage, and Incidents in the Wild
Acquiring Dynamic Light Fields through Coded Aperture Camera
Gait Recognition from a Single Image using a Phase-Aware Gait Cycle Reconstruction Network
Informative Sample Mining Network for Multi-Domain Image-to-Image Translation
Spherical Feature Transform for Deep Metric Learning
Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering
Unsupervised Multi-View CNN for Salient View Selection of 3D Objects and Scenes
Representation Sharing for Fast Object Detector Search and Beyond
Peeking into occluded joints: A novel framework for crowd pose estimation
RubiksNet: Learnable 3D-Shift for Efficient Video Action Recognition
Deep Hashing with Active Pairwise Supervision
Graph Edit Distance Reward: Learning to Edit Scene Graph
Malleable 2.5D Convolution: Learning Receptive Fields along the Depth-axis for RGB-D Scene Parsing
Feature-metric Loss for Self-supervised Learning of Depth and Egomotion
Propagating Over Phrase Relations for One-Stage Visual Grounding
Adversarial Semantic Data Augmentation for Human Pose Estimation
Face Anti-Spoofing via Disentangled Representation Learning
Prime-Aware Adaptive Distillation
Meta-Learning with Network Pruning
Spiral Generative Network for Image Extrapolation
SceneSketcher: Fine-Grained Image Retrieval with Scene Sketches
Few-shot Compositional Font Generation with Dual Memory
PUGeo-Net: A Geometry-centric Network for 3D Point Cloud Upsampling
Handcrafted Outlier Detection Revisited
The Average Mixing Kernel Signature
BCNet: Learning Body and Cloth Shape from A Single Image
Self-supervised Keypoint Correspondences for Multi-Person Pose Estimation and Tracking in Videos
Polysemy Deciphering Network for Human-Object Interaction Detection
PODNet: Pooled Outputs Distillation for Small-Tasks Incremental Learning
Learning Graph-Convolutional Representations for Point Cloud Denoising
Semantic Line Detection Using Mirror Attention and Comparative Ranking and Matching
A Differentiable Recurrent Surface for Asynchronous Event-Based Data
Fine-Grained Visual Classification via Progressive Multi-Granularity Training of Jigsaw Patches
LiteFlowNet3: Resolving Correspondence Ambiguity for More Accurate Optical Flow Estimation
Microscopy Image Restoration with Deep Wiener-Kolmogorov Filters
ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language
JSENet: Joint Semantic Segmentation and Edge Detection Network for 3D Point Clouds
Motion-Excited Sampler: Video Adversarial Attack with Sparked Prior
An Inference Algorithm for Multi-Label MRF-MAP Problems with Clique Size 100
Dual Refinement Underwater Object Detection Network
Multiple Sound Sources Localization from Coarse to Fine
Task-Aware Quantization Network for JPEG Image Compression
Energy-Based Models for Deep Probabilistic Regression
Encoding Structure-Texture Relation with P-Net for Anomaly Detection in Retinal Images
CLNet: A Compact Latent Network for Fast Adjusting Siamese Trackers
Occlusion-Aware Siamese Network for Human Pose Estimation
Learning to Predict Salient Faces: A Novel Visual-Audio Saliency Model
NormalGAN: Learning Detailed 3D Human from a Single RGB-D Image
Model-based occlusion disentanglement for image-to-image translation
Rotation-robust Intersection over Union for 3D Object Detection
New Threats against Object Detector with Non-local Block
Self-Supervised CycleGAN for Object-Preserving Image-to-Image Domain Adaptation
On the Usage of the Trifocal Tensor in Motion Segmentation
3D-Rotation-Equivariant Quaternion Neural Networks
Active Crowd Counting with Limited Supervision
Self-Supervised Monocular Depth Estimation: Solving the Dynamic Object Problem by Semantic Guidance
Hierarchical Visual-Textual Graph for Temporal Activity Localization via Language
Do Not Mask What You Do Not Need to Mask: a Parser-Free Virtual Try-On
NODIS: Neural Ordinary Differential Scene Understanding
Learning Propagation Rules for Attribution Map Generation
Reparameterizing Convolutions for Incremental Multi-Task Learning without Task Interference
Learning Predictive Models from Observation and Interaction
Unifying Deep Local and Global Features for Image Search
Human Body Model Fitting by Learned Gradient Descent
DDGCN: A Dynamic Directed Graph Convolutional Network for Action Recognition
Learning latent representations across multiple data domains using Lifelong VAEGAN
DVI: Depth Guided Video Inpainting for Autonomous Driving
Incorporating Reinforced Adversarial Learning in Autoregressive Image Generation
APRICOT: A Dataset of Physical Adversarial Attacks on Object Detection
Visual Question Answering on Image Sets
Object as Hotspots: An Anchor-Free 3D Object Detection Approach via Firing of Hotspots
Placepedia: Comprehensive Place Understanding with Multi-Faceted Annotations
DELTAS: Depth Estimation by Learning Triangulation And densification of Sparse points
Dynamic Low-light Imaging with Quanta Image Sensors
Disambiguating Monocular Depth Estimation with a Single Transient
DSDNet: Deep Structured self-Driving Network
QuEST: Quantized Embedding Space for Transferring Knowledge
EGDCL: An Adaptive Curriculum Learning Framework for Unbiased Glaucoma Diagnosis
Backpropagated Gradient Representations for Anomaly Detection
Dense RepPoints: Representing Visual Objects with Dense Point Sets
On Dropping Clusters to Regularize Graph Convolutional Neural Networks
Adaptive Video Highlight Detection by Learning from User History
Improving 3D Object Detection through Progressive Population Based Augmentation
DR-KFS: A Differentiable Visual Similarity Metric for 3D Shape Reconstruction
SPAN: Spatial Pyramid Attention Network for Image Manipulation Localization
Adversarial Learning for Zero-shot Domain Adaptation
YOLO in the Dark - Domain Adaptation Method for Merging Multiple Models -
Identity-Aware Multi-Sentence Video Description
VQA-LOL: Visual Question Answering under the Lens of Logic
Piggyback GAN: Efficient Lifelong Learning for Image Conditioned Generation
TRRNet: Tiered Relation Reasoning for Compositional Visual Question Answering
Mining Inter-Video Proposal Relations for Video Object Detection
TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval
Minimum Class Confusion for Versatile Domain Adaptation
Large Batch Optimization for Object Detection: Training COCO in 12 Minutes
Towards Practical and Efficient High-Resolution HDR Deghosting with CNN
Monocular Differentiable Rendering for Self-Supervised 3D Object Detection
Shape Prior Deformation for Categorical 6D Object Pose and Size Estimation
Dynamic and Static Context-aware LSTM for Multi-agent Motion Prediction
Image-based table recognition: data, model, and evaluation
Group Activity Prediction with Sequential Relational Anticipation Model
PiP: Planning-informed Trajectory Prediction for Autonomous Driving
PSConv: Squeezing Feature Pyramid into One Compact Poly-Scale Convolutional Layer
Hierarchical Context Embedding for Region-based Object Detection
Attention-Driven Dynamic Graph Convolutional Network for Multi-Label Image Recognition
Gen-LaneNet: A Generalized and Scalable Approach for 3D Lane Detection
Sparse-to-Dense Depth Completion Revisited: Sampling Strategy and Graph Construction
MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation
Detecting Human-Object Interactions with Action Co-occurrence Priors
Learning Connectivity of Neural Networks from a Topological Perspective
Ocean: Object-aware Anchor-free Tracking
Object Tracking using Spatio-Temporal Networks for Future Prediction Location
Pillar-based Object Detection for Autonomous Driving
Sparse Adversarial Attack via Perturbation Factorization
3D Scene Reconstruction from a Single Viewport
Learning to Optimize Domain Specific Normalization for Domain Generalization
Self-supervised Outdoor Scene Relighting
Privacy Preserving Visual SLAM
Leveraging Acoustic Images for Effective Self-Supervised Audio Representation Learning
Learning Joint Visual Semantic Matching Embeddings for Language-guided Retrieval
Globally Optimal and Efficient Vanishing Point Estimation in Atlanta World
StyleGAN2 Distillation for Feed-forward Image Manipulation
Self-Prediction for Joint Instance and Semantic Segmentation of Point Clouds
Learning Disentangled Representations via Mutual Information Estimation
Fully Trainable and Interpretable Non-Local Sparse Models for Image Restoration
AutoSimulate: (Quickly) Learning Synthetic Data Generation
LatticeNet: Towards Lightweight Image Super-resolution with Lattice Block
Learning from Scale-Invariant Examples for Domain Adaptation in Semantic Segmentation
Active Visual Information Gathering for Vision-Language Navigation
Deep Hough-Transform Line Priors
Unsupervised Shape and Pose Disentanglement for 3D Meshes
Inclusive GAN: Improving Data and Minority Coverage in Generative Models
SESAME: Semantic Editing of Scenes by Adding, Manipulating or Erasing Objects
Dive Deeper Into Box for Object Detection
PG-Net: Pixel to Global Matching Network for Visual Tracking
Why Are Deep Representations Good Perceptual Quality Features?
Geometric Estimation via Robust Subspace Recovery
Latent Embedding Feedback and Discriminative Features for Zero-Shot Classification
Human Correspondence Consensus for 3D Object Semantic Understanding
Learning Memory Augmented Cascading Network for Compressed Sensing of Images
Least squares surface reconstruction on arbitrary domains
Task-conditioned Domain Adaptation for Pedestrian Detection in Thermal Imagery
DADA: Differentiable Automatic Data Augmentation
SceneCAD: Predicting Object Alignments and Layouts in RGB-D Scans
Kinship Identification through Joint Learning using Kinship Verification Ensembles
Kernelized Memory Network for Video Object Segmentation
A Single Stream Network for Robust and Real-time RGB-D Salient Object Detection
Temporal Keypoint Matching and Refinement Network for Pose Estimation and Tracking
FHDe²Net: Full High Definition Demoireing Network
Learning Structural Similarity of User Interface Layouts using Graph Networks
NAS-Count: Counting-by-Density with Neural Architecture Search
Towards Generalization Across Depth for Monocular 3D Object Detection
Margin-Mix: Semi-Supervised Learning for Face Expression Recognition
Principal Feature Visualisation in Convolutional Neural Networks
Progressive Refinement Network for Occluded Pedestrian Detection
Monocular Real-Time Volumetric Performance Capture
The Mapillary Traffic Sign Dataset for Detection and Classification on a Global Scale
SEN: A Novel Feature Normalization Dissimilarity Measure for Prototypical Few-Shot Learning Networks
Kinematic 3D Object Detection in Monocular Video
Describing Unseen Videos via Multi-Modal Cooperative Dialog Agents
End-to-End Low Cost Compressive Spectral Imaging with Spatial-Spectral Self-Attention
Know Your Surroundings: Exploiting Scene Information for Object Tracking
Practical Detection of Trojan Neural Networks: Data-Limited and Data-Free Cases
DeepLandscape: Adversarial Modeling of Landscape Videos
GANwriting: Content-Conditioned Generation of Styled Handwritten Word Images
Spatial-Angular Interaction for Light Field Image Super-Resolution
BATS: Binary ArchitecTure Search
A Closer Look at Local Aggregation Operators in Point Cloud Analysis
Look here! A parametric learning based approach to redirect visual attention
Variational Diffusion Autoencoders with Random Walk Sampling
Adaptive Variance Based Label Distribution Learning For Facial Age Estimation
Connecting the Dots: Detecting Adversarial Perturbations Using Context Inconsistency
Perceive, Predict, and Plan: Safe Motion Planning Through Interpretable Semantic Representations
VarSR: Variational Super-Resolution Network for Very Low Resolution Images
Towards Recognizing Unseen Categories in Unseen Domains
Square Attack: a query-efficient black-box adversarial attack via random search
You Are Here: Geolocation by Embedding Maps and Images
Segmentations-Leak: Membership Inference Attacks and Defenses in Semantic Image Segmentation
From Image to Stability: Learning Dynamics from Human Pose
LevelSet R-CNN: A Deep Variational Method for Instance Segmentation
Efficient Scale-Permuted Backbone with Learned Resource Distribution
Bridging Knowledge Graphs to Generate Scene Graphs
Implicit Latent Variable Model for Scene-Consistent Motion Forecasting
Learning Visual Commonsense for Robust Scene Graph Generation
MPCC: Matching Priors and Conditionals for Clustering
PointAR: Efficient Lighting Estimation for Mobile Augmented Reality
Discrete Point Flow Networks for Efficient Point Cloud Generation
Accelerating Deep Learning with Millions of Classes
Password-conditioned Anonymization and Deanonymization with Face Identity Transformers
Inertial Safety from Structured Light
PointTriNet: Learned Triangulation of 3D Point Sets
Toward Unsupervised, Multi-Object Discovery in Large-Scale Image Collections
Deep Novel View Synthesis from Colored 3D Point Clouds
Consensus-Aware Visual-Semantic Embedding for Image-Text Matching
Spatial Hierarchy Aware Residual Pyramid Network for Time-of-Flight Depth Denoising
Sat2Graph: Road Graph Extraction through Graph-Tensor Encoding
Cross-Task Transfer for Geotagged Audiovisual Aerial Scene Recognition
Polarimetric Multi-View Inverse Rendering
SideInfNet: A Deep Neural Network for Semi-Automatic Semantic Segmentation with Side Information
Improving Face Recognition by Clustering Unlabeled Faces in the Wild
NeuRoRA: Neural Robust Rotation Averaging
SG-VAE: Scene Grammar Variational Autoencoder to generate new indoor scenes
Unsupervised Learning of Optical Flow with Deep Feature Similarity
Blended Grammar Network for Human Parsing
P²Net: Patch-match and Plane-regularization for Unsupervised Indoor Depth Estimation
Adaptive Mixture Regression Network with Local Counting Map for Crowd Counting
Ultra Fast Structure-aware Deep Lane Detection
Cross-Identity Motion Transfer for Arbitrary Objects through Pose-Attentive Video Reassembling
Domain Adaptive Object Detection via Asymmetric Tri-way Faster-RCNN
Exclusivity-Consistency Regularized Knowledge Distillation for Face Recognition
Learning Camera-Aware Noise Models
Towards Precise Completion of Deformable Shapes
Pairwise Similarity Knowledge Transfer for Weakly Supervised Object Localization
Environment-agnostic Multitask Learning for Natural Language Grounded Navigation
TPFN: Applying Outer Product along Time to Multimodal Sentiment Analysis Fusion on Incomplete Data
ProxyNCA++: Revisiting and Revitalizing Proxy Neighborhood Component Analysis
Learning with Privileged Information for Efficient Image Super-Resolution
Joint Visual and Temporal Consistency for Unsupervised Domain Adaptive Person Re-Identification
Autoencoder-based Graph Construction for Semi-supervised Learning
Virtual Multi-view Fusion for 3D Semantic Segmentation
Decoupling GCN with DropGraph Module for Skeleton-Based Action Recognition
A Boundary Based Out-of-Distribution Classifier for Generalized Zero-Shot Learning
Mind the Discriminability: Asymmetric Adversarial Domain Adaptation
Simultaneous Detection and Tracking with Motion Modelling for Multiple Object Tracking
Deep FusionNet for Point Cloud Semantic Segmentation
Deep Material Recognition in Light-Fields via Disentanglement of Spatial and Angular Information
Dual Adversarial Network for Deep Active Learning
Fully Convolutional Networks for Continuous Sign Language Recognition
Self-adapting confidence estimation for stereo
Deep Surface Normal Estimation on the 2-Sphere with Confidence Guided Semantic Attention
AutoSTR: Efficient Backbone Search for Scene Text Recognition
Mitigating Embedding and Class Assignment Mismatch in Unsupervised Image Classification
Adversarial Training with Bi-directional Likelihood Regularization for Visual Classification
Faster AutoAugment: Learning Augmentation Strategies Using Backpropagation
Hand-Transformer: Non-Autoregressive Structured Modeling for 3D Hand Pose Estimation
Boundary-Aware Cascade Networks for Temporal Action Segmentation
Inference Graphs for CNN Interpretation
An End-to-End OCR Text Re-organization Sequence Learning for Rich-text Detail Image Comprehension
Improving Query Efficiency of Black-box Adversarial Attack
Self-similarity Student for Partial Label Histopathology Image Segmentation
A Decoupled Learning Scheme for Real-world Burst Denoising from Raw Images
Global-and-Local Relative Position Embedding for Unsupervised Video Summarization
Real-World Blur Dataset for Learning and Benchmarking Deblurring Algorithms
SPARK: Spatial-aware Online Incremental Attack Against Visual Tracking
CenterNet Heatmap Propagation for Real-time Video Object Detection
Hierarchical Dynamic Filtering Network for RGB-D Salient Object Detection
SOLAR: Second-Order Loss and Attention for Image Retrieval
Fixing Localization Errors to Improve Image Classification
PatchPerPix for Instance Segmentation
Attend and Segment: Attention Guided Active Semantic Segmentation
Accelerating CNN Training by Pruning Activation Gradients
Global and Local Enhancement Networks for Paired and Unpaired Image Enhancement
Probabilistic Anchor Assignment with IoU Prediction for Object Detection
Eyeglasses 3D shape reconstruction from a single face image
Temporal Complementary Learning for Video Person Re-Identification
HoughNet: Integrating near and long-range evidence for bottom-up object detection
Graph Wasserstein Correlation Analysis for Movie Retrieval
Context-Aware RCNN: A Baseline for Action Detection in Videos
Full-Time Monocular Road Detection Using Zero-Distribution Prior of Angle of Polarization
A Flexible Recurrent Residual Pyramid Network for Video Frame Interpolation
Learning Enriched Features for Real Image Restoration and Enhancement
Detail Preserved Point Cloud Completion via Separated Feature Aggregation
LabelEnc: A New Intermediate Supervision Method for Object Detection
Unsupervised Learning of Category-Specific Symmetric 3D Keypoints from Point Sets
PAMS: Quantized Super-Resolution via Parameterized Max Scale
SSN: Shape Signature Networks for Multi-class Object Detection from Point Clouds
OID: Outlier Identifying and Discarding in Blind Image Deblurring
Few-Shot Single-View 3-D Object Reconstruction with Compositional Priors
Enhanced Sparse Model for Blind Deblurring
SumGraph: Video Summarization via Recursive Graph Modeling
Feature Normalized Knowledge Distillation for Image Classification
A Metric Learning Reality Check
FTL: A universal framework for training low-bit DNNs via Feature Transfer
XingGAN for Person Image Generation
GATCluster: Self-Supervised Gaussian-Attention Network for Image Clustering
VCNet: A Robust Approach to Blind Image Inpainting
Learning to Predict Context-adaptive Convolution for Semantic Segmentation
EfficientFCN: Holistically-guided Decoding for Semantic Segmentation
GroSS: Group-Size Series Decomposition for Grouped Architecture Search
Efficient Adversarial Attacks for Visual Object Tracking
Globally-Optimal Event Camera Motion Estimation
Weakly-supervised Learning of Human Dynamics
Journey Towards Tiny Perceptual Super-Resolution
What makes fake images detectable? Understanding properties that generalize
Embedding Propagation: Smoother Manifold for Few-Shot Classification
Category Level Object Pose Estimation via Neural Analysis-by-Synthesis
High-Fidelity Synthesis with Disentangled Representation
PL₁P - Point-line Minimal Problems under Partial Visibility in Three Views
Prediction and Recovery for Adaptive Low-Resolution Person Re-Identification
Learning Canonical Representations for Scene Graph to Image Generation
Adversarial Robustness on In- and Out-Distribution Improves Explainability
Aligning Videos in Space and Time
Neural Wireframe Renderer: Learning Wireframe to Image Translations
RBF-Softmax: Learning Deep Representative Prototypes with Radial Basis Function Softmax
Testing the Safety of Self-driving Vehicles by Simulating Perception and Prediction
Determining the Relevance of Features for Deep Neural Networks
Weakly Supervised Semantic Segmentation with Boundary Exploration
GANHopper: Multi-Hop GAN for Unsupervised Image-to-Image Translation
DOPE: Distillation Of Part Experts for whole-body 3D pose estimation in the wild
Multi-view adaptive graph convolutions for graph classification
Instance Adaptive Self-Training for Unsupervised Domain Adaptation
Weight Decay Scheduling and Knowledge Distillation for Active Learning
HMQ: Hardware Friendly Mixed Precision Quantization Block for CNNs
Geometry Constrained Weakly Supervised Object Localization
Mining self-similarity: Label super-resolution with epitomic representations
AE-OT-GAN: Training GANs from data specific latent distribution
Null-sampling for Interpretable and Fair Representations
Guiding Monocular Depth Estimation Using Depth-Attention Volume
Tracking Emerges by Looking Around Static Scenes, with Neural 3D Mapping
Boosting Weakly Supervised Object Detection with Progressive Knowledge Transfer
BézierSketch: A generative model for scalable vector sketches
Semantic Relation Preserving Knowledge Distillation for Image-to-Image Translation
Domain Adaptation Through Task Distillation
PatchAttack: A Black-box Texture-based Attack with Reinforcement Learning
More Classifiers, Less Forgetting: A Generic Multi-classifier Paradigm for Incremental Learning
Extending and Analyzing Self-Supervised Learning Across Domains
Multi-Source Open-Set Deep Adversarial Domain Adaptation
Neural Batch Sampling with Reinforcement Learning for Semi-Supervised Anomaly Detection
LEMMA: A Multi-view Dataset for LEarning Multi-agent Multi-task Activities
Teaching Cameras to Feel: Estimating Tactile Physical Properties of Surfaces From Images
Accurate Optimization of Weighted Nuclear Norm for Non-Rigid Structure from Motion
Proposal-based Video Completion
HGNet: Hybrid Generative Network for Zero-shot Domain Adaptation
Beyond Monocular Deraining: Stereo Image Deraining via Semantic Understanding
DBQ: A Differentiable Branch Quantizer for Lightweight Deep Neural Networks
All at Once: Temporally Adaptive Multi-Frame Interpolation with Advanced Motion Modeling
A Broader Study of Cross-Domain Few-Shot Learning
Practical Poisoning Attacks on Neural Networks
Unsupervised Domain Adaptation in the Dissimilarity Space for Person Re-identification
Learn distributed GAN with Temporary Discriminators
SemifreddoNets: Partially Frozen Neural Networks for Efficient Computer Vision Systems
Improving Adversarial Robustness by Enforcing Local and Global Compactness
TopoAL: An Adversarial Learning Approach for Topology-Aware Road Segmentation
Channel selection using Gumbel Softmax
Exploiting Temporal Coherence for Self-Supervised One-shot Video Re-identification
An Efficient Training Framework for Reversible Neural Architectures
FreeCam3D: Snapshot Structured Light 3D with Freely-Moving Cameras
One-Pixel Signature: Characterizing CNN Models for Backdoor Detection
Learning to Transfer Learn: Reinforcement Learning-Based Selection for Adaptive Transfer Learning
Structure-Aware Generation Network for Recipe Generation from Images
A Simple and Effective Framework for Pairwise Deep Metric Learning
Meta-rPPG: Remote Heart Rate Estimation Using a Transductive Meta-Learner
A Recurrent Transformer Network for Novel View Action Synthesis
Multi-view Action Recognition using Cross-view Video Prediction
Learning Discriminative Feature with CRF for Unsupervised Video Object Segmentation
SMART: Simultaneous Multi-Agent Recurrent Trajectory Prediction
Label-Driven Reconstruction for Domain Adaptation in Semantic Segmentation
Attributional Robustness Training using Input-Gradient Spatial Alignment
Reducing the Sim-to-Real Gap for Event Cameras
Spatial Geometric Reasoning for Room Layout Estimation via Deep Reinforcement Learning
Learning Data Augmentation Strategies for Object Detection
DA-NAS: Data Adapted Pruning for Efficient Neural Architecture Search
A Closer Look at Generalisation in RAVEN
Supervised Edge Attention Network for Accurate Image Instance Segmentation
Discriminative Partial Domain Adversarial Network
Differentiable Programming for Hyperspectral Unmixing using a Physics-based Dispersion Model
Guidance and Evaluation: Semantic-Aware Image Inpainting for Mixed Scenes
Sound2Sight: Generating Visual Dynamics from Sound and Context
NoiseRank: Unsupervised Label Noise Reduction with Dependence Models
Fast Adaptation to Super-Resolution Networks via Meta-Learning
TP-LSD: Tri-Points Based Line Segment Detector
SqueezeSegV3: Spatially-Adaptive Convolution for Efficient Point-Cloud Segmentation
An Attention-driven Two-stage Clustering Method for Unsupervised Person Re-Identification
Toward Fine-grained Facial Expression Manipulation
Adaptive Object Detection with Dual Multi-Label Prediction
Table Structure Recognition using Top-Down and Bottom-Up Cues
Novel View Synthesis on Unpaired Data by Conditional Deformable Variational Auto-Encoder
Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments
Boundary Content Graph Neural Network for Temporal Action Proposal Generation
Pose Augmentation: Class-agnostic Object Pose Transformation for Object Recognition
VLANet: Video-Language Alignment Network for Weakly-Supervised Video Moment Retrieval
Attention-Based Query Expansion Learning
Interpretable Foreground Object Search As Knowledge Distillation
Improving Knowledge Distillation via Category Structure
High Resolution Zero-Shot Domain Adaptation of Synthetically Rendered Face Images
Attentive Prototype Few-shot Learning with Capsule Network-based Embedding
Weakly Supervised Instance Segmentation by Learning Annotation Consistent Instances
DA4AD: End-to-End Deep Attention-based Visual Localization for Autonomous Driving
Visual-Relation Conscious Image Generation from Structured-Text
Patch-wise Attack for Fooling Deep Neural Network
MABNet: A Lightweight Stereo Network Based on Multibranch Adjustable Bottleneck Module
Guided Saliency Feature Learning for Person Re-identification in Crowded Scenes
Asymmetric Two-Stream Architecture for Accurate RGB-D Saliency Detection
Explaining Image Classifiers using Statistical Fault Localization
Deep Graph Matching via Blackbox Differentiation of Combinatorial Solvers
Learning Video Representations by Transforming Time
Variational Connectionist Temporal Classification
End-to-end Dynamic Matching Network for Multi-view Multi-person 3d Pose Estimation
Orderly Disorder in Point Cloud Domain
Deep Decomposition Learning for Inverse Imaging Problems
FLOT: Scene Flow on Point Clouds guided by Optimal Transport
Accurate Reconstruction of Oriented 3D Points using Affine Correspondences
Volumetric Transformer Networks
360° Camera Alignment via Segmentation
A Novel Line Integral Transform for 2D Affine-Invariant Shape Retrieval
Explanation-based Weakly-supervised Learning of Visual Relations with Graph Networks
Document Structure Extraction using Prior based High Resolution Hierarchical Semantic Segmentation
Measuring the Importance of Temporal Features in Video Saliency
Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution
Towards Reliable Evaluation of Algorithms for Road Network Reconstruction from Aerial Images
Online Continual Learning under Extreme Memory Constraints
Learning to Cluster under Domain Shift
Defense Against Adversarial Attacks via Controlling Gradient Leaking on Embedded Manifolds
Improving Optical Flow on a Pyramid Level
Procrustean Regression Networks: Learning 3D Structure of Non-Rigid Objects from 2D Annotations
Learning to Learn Parameterized Classification Networks for Scalable Input Images
Stereo Event-based Particle Tracking Velocimetry for 3D Fluid Flow Reconstruction
Simplicial Complex based Point Correspondence between Images warped onto Manifolds
Representation Learning on Visual-Symbolic Graphs for Video Understanding
Distance-Normalized Unified Representation for Monocular 3D Object Detection
Sequential Deformation for Accurate Scene Text Detection
Where to Explore Next? ExHistCNN for History-aware Autonomous 3D Exploration
Semi-Supervised Segmentation based on Error-Correcting Supervision
Quantum-soft QUBO Suppression for Accurate Object Detection
Label-similarity Curriculum Learning
Recurrent Image Annotation With Explicit Inter-Label Dependencies
Cross-Attention in Coupled Unmixing Nets for Unsupervised Hyperspectral Super-Resolution
SimPose: Effectively Learning DensePose and Surface Normals of People from Simulated Data
ByeGlassesGAN: Identity Preserving Eyeglasses Removal for Face Images
Differentiable Joint Pruning and Quantization for Hardware Efficiency
Learning to Generate Customized Dynamic 3D Facial Expressions
Learning Disentangled Feature Representation for Hybrid-distorted Image Restoration
Jointly De-biasing Face Recognition and Demographic Attribute Estimation
Regularized Loss for Weakly Supervised Single Class Semantic Segmentation
Spike-FlowNet: Event-based Optical Flow Estimation with Energy-Efficient Hybrid Neural Networks
Synthesizing Coupled 3D Face Modalities by Trunk-Branch Generative Adversarial Networks
Learning to Learn Words from Visual Scenes
On Transferability of Histological Tissue Labels in Computational Pathology
Learning Actionness via Long-range Temporal Order Verification
Fully Embedding Fast Convolutional Networks on Pixel Processor Arrays
Character Region Attention For Text Spotting
Stable Low-rank Tensor Decomposition for Compression of Convolutional Neural Network
Dual Mixup Regularized Learning for Adversarial Domain Adaptation
Robust and On-the-fly Dataset Denoising for Image Classification
Imaging Behind Occluders Using Two-Bounce Light
Improving Object Detection with Selective Self-Supervised Self-Training
Deep Local Shapes: Learning Local SDF Priors for Detailed 3D Reconstruction
Adversarial Data Augmentation via Deformation Statistics
Neural Predictor for Neural Architecture Search
Learning Permutation Invariant Representations using Memory Networks
Feature Space Augmentation for Long-Tailed Data
Laying the Foundations of Deep Long-Term Crowd Flow Prediction
Weakly-Supervised Action Localization with Expectation-Maximization Multi-Instance Learning
Fairness by Learning Orthogonal Disentangled Representations
Self-supervision with Superpixels: Training Few-shot Medical Image Segmentation without Annotation
On Diverse Asynchronous Activity Anticipation
Representative-Discriminative Learning for Open-set Land Cover Classification of Satellite Imagery
Structure-Aware Human-Action Generation
Towards Efficient Coarse-to-Fine Networks for Action and Gesture Recognition
S³Net: Semantic-Aware Self-supervised Depth Estimation with Monocular Videos and Synthetic Data
Leveraging Seen and Unseen Semantic Relationships for Generative Zero-Shot Learning
Weight Excitation: Built-in Attention Mechanisms in Convolutional Neural Networks
UNITER: UNiversal Image-TExt Representation Learning
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
Improving Face Recognition from Hard Samples via Distribution Distillation Loss
Extract and Merge: Superpixel Segmentation with Regional Attributes
Spatial-Adaptive Network for Single Image Denoising
Physics-based Feature Dehazing Networks
Learning Surrogates via Deep Embedding
An Asymmetric Modeling for Action Assessment
Instance-Aware Embedding for Point Cloud Instance Segmentation
Self-Paced Deep Regression Forests with Consideration on Underrepresented Examples
Manifold Projection for Adversarial Defense on Face Recognition
Weakly Supervised Learning with Side Information for Noisy Labeled Images
Not only Look, but also Listen: Learning Multimodal Violence Detection under Weak Supervision
Modeling the Space of Point Landmark Constrained Diffeomorphisms
PieNet: Personalized Image Enhancement Network
Rotational Outlier Identification in Pose Graphs Using Dual Decomposition
Speech-driven Facial Animation using Cascaded GANs for Learning of Motion and Texture
Solving Phase Retrieval with a Learned Reference
Dual Grid Net: Hand Mesh Vertex Regression from Single Depth Maps
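This list was compiled by the European Computer Vision Association (ECVA), a Zurich-based non-profit organization that aims to promote the dissemination of information on research into the theory and practice of computer vision.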