3D From a Single Image and Shape-From-X (1)


l  Unsupervised Learning of Probably Symmetric Deformable 3D Objects From Images in the Wild

l  Footprints and Free Space From a Single Color Image

l  Dynamic Fluid Surface Reconstruction Using Deep Neural Network

l  CvxNet: Learnable Convex Decomposition

l  BSP-Net: Generating Compact Meshes via Binary Space Partitioning

l  Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes From a Single Image

l  Generating and Exploiting Probabilistic Monocular Depth Estimates

l  Neural Cages for Detail-Preserving 3D Deformations

l  PIFuHD: Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization

l  A Lighting-Invariant Point Processor for Shading

l  ActiveMoCap: Optimized Viewpoint Selection for Active Human Motion Capture

l  Peek-a-Boo: Occlusion Reasoning in Indoor Scenes With Plane Representations


Action and Behavior

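Papers on action and behavior. Some address human motion prediction from 3D skeletons, but my impression is that most develop either models that handle action and behavior tasks effectively on limited datasets or models that reduce the computational cost of prediction. (oral)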

l  Multi-Modal Domain Adaptation for Fine-Grained Action Recognition
l  Evolving Losses for Unsupervised Video Representation Learning
l  Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition
l  A Multigrid Method for Efficiently Training Video Models
l  Ego-Topo: Environment Affordances From Egocentric Video
l  Generative Hybrid Representations for Activity Forecasting With No-Regret Learning
l  Skeleton-Based Action Recognition With Shift Graph Convolutional Network
l  Predicting Goal-Directed Human Attention Using Inverse Reinforcement Learning
l  X3D: Expanding Architectures for Efficient Video Recognition
l  Dynamic Multiscale Graph Neural Networks for 3D Skeleton Based Human Motion Prediction
l  Use the Force, Luke! Learning to Predict Physical Forces by Simulating Effects


Adversarial Learning

Papers on adversarial learning and on adversarial attack and defense methods. (oral)

l  DaST: Data-Free Substitute Training for Adversarial Attacks

l  Towards Verifying Robustness of Neural Networks Against A Family of Semantic Perturbations

l  The Secret Revealer: Generative Model-Inversion Attacks Against Deep Neural Networks

l  A Self-supervised Approach for Adversarial Robustness

l  Adversarial Vertex Mixup: Toward Better Adversarially Robust Generalization

l  How Does Noise Help Robustness? Explanation and Exploration under the Neural SDE Framework

l  Unpaired Image Super-Resolution Using Pseudo-Supervision

l  Universal Litmus Patterns: Revealing Backdoor Attacks in CNNs

l  Robustness Guarantees for Deep Neural Networks on Videos

l  Benchmarking Adversarial Robustness on Image Classification

l  What It Thinks Is Important Is Important: Robustness Transfers Through Input Gradients

l  Transferable, Controllable, and Inconspicuous Adversarial Attacks on Person Re-identification With Deep Mis-Ranking


3D From a Single Image and Shape-From-X; Action and Behavior Recognition; Adversarial Learning


l  Video Modeling With Correlation Networks

l  Projection & Probability-Driven Black-Box Attack

l  Auxiliary Training: Towards Accurate and Robust Models

l  PaStaNet: Toward Human Activity Knowledge Engine

l  A Hierarchical Graph Network for 3D Object Detection on Point Clouds

l  Learning Generative Models of Shape Handles

l  One Man's Trash Is Another Man's Treasure: Resisting Adversarial Examples by Adversarial Examples

l  Toward a Universal Model for Shape From Texture

l  HybridPose: 6D Object Pose Estimation Under Hybrid Representations

l  Boundary-Aware 3D Building Reconstruction From a Single Overhead Image

l  Articulation-Aware Canonical Surface Mapping

l  BiFuse: Monocular 360 Depth Estimation via Bi-Projection Fusion

l  Transformation GAN for Unsupervised Image Synthesis and Representation Learning

l  PPDM: Parallel Point Detection and Matching for Real-Time Human-Object Interaction Detection

l  Height and Uprightness Invariance for 3D Prediction From a Single View

l  SCT: Set Constrained Temporal Transformer for Set Supervised Action Segmentation

l  3DV: 3D Dynamic Voxel for Action Recognition in Depth Video

l  Adaptive Interaction Modeling via Graph Operations Search

l  Front2Back: Single View 3D Shape Reconstruction via Front to Back Prediction

l  SDC-Depth: Semantic Divide-and-Conquer Network for Monocular Depth Estimation

l  Single-View View Synthesis With Multiplane Images

l  Deep Parametric Shape Predictions Using Distance Fields

l  Leveraging Photometric Consistency Over Time for Sparsely Supervised Hand-Object Reconstruction

l  Ensemble Generative Cleaning With Feedback Loops for Defending Adversarial Attacks

l  Temporal Pyramid Network for Action Recognition

l  FaceScape: A Large-Scale High Quality 3D Face Dataset and Detailed Riggable 3D Face Prediction

l  Structure-Guided Ranking Loss for Single Image Depth Prediction

l  In Perfect Shape: Certifiably Optimal 3D Shape Reconstruction From 2D Landmarks

l  When NAS Meets Robustness: In Search of Robust Architectures Against Adversarial Attacks

l  Towards Transferable Targeted Attack

l  Self-Supervised Human Depth Estimation From Monocular Videos

l  Recursive Social Behavior Graph for Trajectory Prediction

l  Context-Aware and Scale-Insensitive Temporal Repetition Counting

l  OASIS: A Large-Scale Dataset for Single Image 3D in the Wild

l  VPLNet: Deep Single View Normal Estimation With Vanishing Points and Lines

l  Adversarial Robustness: From Self-Supervised Pre-Training to Fine-Tuning

l  Defending Against Universal Attacks Through Selective Feature Regeneration

l  Universal Physical Camouflage Attacks on Object Detectors

l  Intra- and Inter-Action Understanding via Temporal Action Parsing

l  Lightweight Photometric Stereo for Facial Details Recovery

l  Bundle Pooling for Polygonal Architecture Segmentation Problem

l  AvatarMe: Realistically Renderable 3D Facial Reconstruction "In-the-Wild"

l  Defending Against Model Stealing Attacks With Adaptive Misinformation

l  Learning to Generate 3D Training Data Through Hybrid Gradient

l  Cascaded Refinement Network for Point Cloud Completion

l  Enhancing Intrinsic Adversarial Robustness via Feature Pyramid Decoder

l  Learning to Discriminate Information for Online Action Detection

l  Adversarial Examples Improve Image Recognition

l  PQ-NET: A Generative Part Seq2Seq Network for 3D Shapes

l  Actor-Transformers for Group Activity Recognition

l  SG-NN: Sparse Generative Neural Networks for Self-Supervised Scene Completion of RGB-D Scans

l  Geometry-Aware Satellite-to-Ground Image Synthesis for Urban Areas

l  Action Modifiers: Learning From Adverbs in Instructional Videos

l  ZSTAD: Zero-Shot Temporal Activity Detection

l  Geometric Structure Based and Regularized Depth Estimation From 360 Indoor Imagery

l  Deep Kinematics Analysis for Monocular 3D Human Pose Estimation

l  TEA: Temporal Excitation and Aggregation for Action Recognition

l  Oops! Predicting Unintentional Action in Video

l  Scene Recomposition by Learning-Based ICP

l  Enhancing Cross-Task Black-Box Transferability of Adversarial Examples With Dispersion Reduction

l  Single-Step Adversarial Training With Dropout Scheduling

l  Deep Non-Line-of-Sight Reconstruction

l  SSRNet: Scalable 3D Surface Reconstruction Network

l  Progressive Relation Learning for Group Activity Recognition

l  Cooling-Shrinking Attack: Blinding the Tracker With Imperceptible Noises

l  Adversarial Camouflage: Hiding Physical-World Attacks With Natural Styles

l  Weakly-Supervised Action Localization by Generative Attention Modeling

l  Towards Achieving Adversarial Robustness by Enforcing Feature Consistency Across Bit Planes

l  Polishing Decision-Based Adversarial Noise With a Customized Sampling

l  Towards Large Yet Imperceptible Adversarial Image Perturbations With Perceptual Color Distance

l  Something-Else: Compositional Action Recognition With Spatial-Temporal Interaction Networks

l  Learning Unsupervised Hierarchical Part Decomposition of 3D Objects From a Single RGB Image

l  Focus on Defocus: Bridging the Synthetic to Real Domain Gap for Depth Estimation

l  Active Vision for Early Recognition of Human Actions

l  SmallBigNet: Integrating Core and Contextual Views for Video Classification

l  Gate-Shift Networks for Video Action Recognition

l  Semantics-Guided Neural Networks for Efficient Skeleton-Based Human Action Recognition

l  Exploiting Joint Robustness to Adversarial Perturbations

l  From Image Collections to Point Clouds With Self-Supervised Shape and Pose Networks

l  Searching for Actions on the Hyperbole

l  ColorFool: Semantic Adversarial Colorization

l  Boosting the Transferability of Adversarial Samples via Attention

l  ActionBytes: Learning From Trimmed Videos to Localize Actions

l  Efficient Adversarial Training With Transferable Adversarial Examples

l  Alleviation of Gradient Exploding in GANs: Fake Can Be Real

l  On Isometry Robustness of Deep 3D Point Cloud Models Under Adversarial Attacks

l  Achieving Robustness in the Wild via Adversarial Mixing With Disentangled Representations

l  QEBA: Query-Efficient Boundary-Based Blackbox Attack

l  Learning to Simulate Dynamic Environments With GameGAN

l  Learn2Perturb: An End-to-End Feature Perturbation Learning to Improve Adversarial Robustness


3D From Multiview and Sensors (1)

Papers on 3D from multiview and on sensors (hand-held sensors, LiDAR, etc.). (oral)

l  SDFDiff: Differentiable Rendering of Signed Distance Fields for 3D Shape Optimization

l  Through the Looking Glass: Neural 3D Reconstruction of Transparent Shapes

l  TextureFusion: High-Quality Texture Acquisition for Real-Time RGB-D Scanning

l  D3VO: Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry

l  Deep Implicit Volume Compression

l  MAGSAC++, a Fast, Reliable and Accurate Robust Estimator

l  OctSqueeze: Octree-Structured Entropy Model for LiDAR Compression

l  4D Association Graph for Realtime Multi-Person Motion Capture Using Multiple Video Cameras

l  Upgrading Optical Flow to 3D Scene Flow Through Optical Expansion

l  Robust 3D Self-Portraits in Seconds


Computational Photography


l  FastDVDnet: Towards Real-Time Deep Video Denoising Without Flow Estimation

l  Learning to Have an Ear for Face Super-Resolution

l  Deep Optics for Single-Shot High-Dynamic-Range Imaging

l  Learning Rank-1 Diffractive Optics for Single-Shot High Dynamic Range Imaging

l  Deep White-Balance Editing

l  Non-Line-of-Sight Surface Reconstruction Using the Directional Light-Cone Transform

l  Seeing the World in a Bag of Chips

l  Correction Filter for Single Image Super-Resolution: Robustifying Off-the-Shelf Deep Super-Resolvers

l  Retina-Like Visual Image Reconstruction via Spiking Neural Model

l  Plug-and-Play Algorithms for Large-Scale Snapshot Compressive Imaging


Efficient Training and Inference


l  Neural Network Pruning With Residual-Connections and Limited-Data

l  AdderNet: Do We Really Need Multiplications in Deep Learning?

l  NeuralScale: Efficient Scaling of Neurons for Resource-Constrained Deep Neural Networks

l  Training Quantized Neural Networks With a Full-Precision Auxiliary Module

l  Neural Networks Are More Productive Teachers Than Human Raters: Active Mixup for Data-Efficient Knowledge Distillation From a Blackbox Model

l  Multi-Dimensional Pruning: A Unified Framework for Model Compression

l  Towards Efficient Model Compression via Learned Global Ranking

l  HRank: Filter Pruning Using High-Rank Feature Map

l  DMCP: Differentiable Markov Channel Pruning for Neural Networks

l  ReSprop: Reuse Sparsified Backpropagation

l  Adversarial Texture Optimization From RGB-D Scans


3D From Multiview and Sensors; Computational Photography; Efficient Training and Inference Methods for Networks


l  Synchronizing Probability Measures on Rotations via Optimal Transport

l  GhostNet: More Features From Cheap Operations

l  Attention-Aware Multi-View Stereo

l  Bi3D: Stereo Depth Estimation via Binary Classifications

l  Joint Filtering of Intensity Images and Neuromorphic Events for High-Resolution Noise-Robust Imaging

l  SGAS: Sequential Greedy Architecture Search

l  HVNet: Hybrid Voxel Network for LiDAR Based 3D Object Detection

l  Frequency Domain Compact 3D Convolutional Neural Networks

l  Single-Image HDR Reconstruction by Learning to Reverse the Camera Pipeline

l  DNU: Deep Non-Local Unrolling for Computational Spectral Imaging

l  Single Image Optical Flow Estimation With an Event Camera

l  Multi-View Neural Human Rendering

l  Depth Sensing Beyond LiDAR Range

l  Event Probability Mask (EPM) and Event Denoising Convolutional Neural Network (EDnCNN) for Neuromorphic Cameras

l  Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud

l  Self-Learning Video Rain Streak Removal: When Cyclic Consistency Meets Temporal Correspondence

l  Neuromorphic Camera Guided High Dynamic Range Imaging

l  Learning in the Frequency Domain

l  Polarized Reflection Removal With Perfect Alignment in the Wild

l  Learning Multiview 3D Point Cloud Registration

l  A Sparse Resultant Based Method for Efficient Minimal Solvers

l  Zero-Reference Deep Curve Estimation for Low-Light Image Enhancement

l  BlendedMVS: A Large-Scale Dataset for Generalized Multi-View Stereo Networks

l  Convolution in the Cloud: Learning Deformable Kernels in 3D Graph Convolution Networks for Point Cloud Analysis

l  A Semi-Supervised Assessor of Neural Architectures

l  Learning a Reinforced Agent for Flexible Exposure Bracketing Selection

l  CARS: Continuous Evolution for Efficient Neural Architecture Search

l  Joint 3D Instance Segmentation and Object Detection for Autonomous Driving

l  View-GCN: View-Based Graph Convolutional Network for 3D Shape Analysis

l  Collaborative Distillation for Ultra-Resolution Universal Style Transfer

l  TomoFluid: Reconstructing Dynamic Fluid From Sparse View Videos

l  Instance Shadow Detection

l  Self2Self With Dropout: Learning Self-Supervised Denoising From Single Image

l  Discrete Model Compression With Resource Constraint for Deep Neural Networks

l  Structured Compression by Weight Encryption for Unstructured Pruning and Quantization

l  End-to-End Learning Local Multi-View Descriptors for 3D Point Clouds

l  Minimal Solutions for Relative Pose With a Single Affine Correspondence

l  Point Cloud Completion by Skip-Attention Network With Hierarchical Folding

l  Fast-MVSNet: Sparse-to-Dense Multi-View Stereo With Learned Propagation and Gauss-Newton Refinement

l  AANet: Adaptive Aggregation Network for Efficient Stereo Matching

l  Towards Unified INT8 Training for Convolutional Neural Network

l  Active 3D Motion Visualization Based on Spatiotemporal Light-Ray Integration

l  Block-Wisely Supervised Neural Architecture Search With Knowledge Distillation

l  GreedyNAS: Towards Fast One-Shot NAS With Greedy Supernet

l  Learning Filter Pruning Criteria for Deep Convolutional Neural Networks Acceleration

l  DIST: Rendering Deep Implicit Signed Distance Function With Differentiable Sphere Tracing

l  Visually Imbalanced Stereo Matching

l  Mesh-Guided Multi-View Stereo With Pyramid Architecture

l  BiDet: An Efficient Binarized Object Detector

l  Local Non-Rigid Structure-From-Motion From Diffeomorphic Mappings

l  Seeing Around Street Corners: Non-Line-of-Sight Detection and Tracking In-the-Wild Using Doppler Radar

l  APQ: Joint Search for Network Architecture, Pruning and Quantization Policy

l  On the Acceleration of Deep Learning Model Parallelism With Staleness

l  RevealNet: Seeing Behind Objects in RGB-D Scans

l  MemNAS: Memory-Efficient Neural Architecture Search With Grow-Trim Learning

l  StegaStamp: Invisible Hyperlinks in Physical Photographs

l  L2-GCN: Layer-Wise and Learned Efficient Training of Graph Convolutional Networks

l  Polarized Non-Line-of-Sight Imaging

l  AdaBits: Neural Network Quantization With Adaptive Bit-Widths

l  Multi-Scale Boosted Dehazing Network With Dense Feature Fusion

l  ClusterVO: Clustering Moving Instances and Estimating Visual Odometry for Self and Surroundings

l  Automatic Neural Network Compression by Sparsity-Quantization Joint Learning: A Constrained Optimization-Based Approach

l  Normal Assisted Stereo Depth Estimation

l  Fusing Wearable IMUs With Multi-View Images for Human Pose Estimation: A Geometric Approach

l  gDLS*: Generalized Pose-and-Scale Estimation Given Scale and Gravity Priors

l  Embodied Language Grounding With 3D Visual Feature Representations

l  Learning to Autofocus

l  Joint Demosaicing and Denoising With Self Guidance

l  Forward and Backward Information Retention for Accurate Binary Neural Networks

l  Light Field Spatial Super-Resolution via Deep Combinatorial Geometry Embedding and Structural Consistency Regularization

l  A Multi-Hypothesis Approach to Color Constancy

l  Learning to Restore Low-Light Images via Decomposition-and-Enhancement

l  Background Matting: The World Is Your Green Screen

l  Supervised Raw Video Denoising With a Benchmark Dataset on Dynamic Scenes

l  Photometric Stereo via Discrete Hypothesis-and-Test Search

l  Dynamic Convolutions: Exploiting Spatial Sparsity for Faster Inference

l  Fixed-Point Back-Propagation Training

l  Heterogeneous Knowledge Distillation Using Information Flow Modeling

l  Rethinking Differentiable Search for Mixed-Precision Neural Networks

l  Residual Feature Aggregation Network for Image Super-Resolution

l  Resolution Adaptive Networks for Efficient Inference

l  Learning to Forget for Meta-Learning

l  Deep Learning for Handling Kernel/model Uncertainty in Image Deconvolution

l  Reflection Scene Separation From a Single Image

l  Wavelet Synthesis Net for Disparity Estimation to Synthesize DSLR Calibre Bokeh Effect on Smartphones

l  Bundle Adjustment on a Graph Processor

l  3D-ZeF: A 3D Zebrafish Tracking Benchmark Dataset

l  PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models

l  Scalability in Perception for Autonomous Driving: Waymo Open Dataset


3D From a Single Image and Shape-From-X (2); 3D From Multiview and Sensors (2)


l  Extreme Relative Pose Network Under Hybrid Representations

l  Single-Shot Monocular RGB-D Imaging Using Uneven Double Refraction

l  Inverse Rendering for Complex Indoor Scenes: Shape, Spatially-Varying Lighting and SVBRDF From a Single Image

l  3D Packing for Self-Supervised Monocular Depth Estimation

l  Cascade Cost Volume for High-Resolution Multi-View Stereo and Stereo Matching

l  From Two Rolling Shutters to One Global Shutter

l  Deep Global Registration

l  Deep Stereo Using Adaptive Thin Volume Representation With Uncertainty Awareness

l  Why Having 10,000 Parameters in Your Camera Model Is Better Than Twelve

l  Blur Aware Calibration of Multi-Focus Plenoptic Camera

l  Learning Fused Pixel and Feature-Based View Reconstructions for Light Fields

l  SAL: Sign Agnostic Learning of Shapes From Raw Data

l  Google Landmarks Dataset v2 - A Large-Scale Benchmark for Instance-Level Recognition and Retrieval


Image Retrieval; Datasets and Evaluation


l  Instance Guided Proposal Network for Person Search

l  Which Is Plagiarism: Fashion Image Retrieval Based on Regional Representation for Design Protection

l  Inter-Task Association Critic for Cross-Resolution Person Re-Identification

l  FineGym: A Hierarchical Video Dataset for Fine-Grained Action Understanding

l  Mapillary Street-Level Sequences: A Dataset for Lifelong Place Recognition

l  BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning

l  Rethinking Computer-Aided Tuberculosis Diagnosis

l  IntrA: 3D Intracranial Aneurysm Dataset for Deep Learning

l  Revisiting Saliency Metrics: Farthest-Neighbor Area Under Curve

l  Computing the Testing Error Without a Testing Set

l  Improving Confidence Estimates for Unfamiliar Examples

l  CycleISP: Real Image Restoration via Improved Data Synthesis


Low-Level and Physics-Based Vision


l  Enhanced Blind Face Restoration With Multi-Exemplar Images and Adaptive Spatial Feature Fusion

l  Explorable Super Resolution

l  Syn2Real Transfer Learning for Image Deraining Using Gaussian Processes

l  Deblurring by Realistic Blurring

l  Bringing Old Photos Back to Life

l  A Physics-Based Noise Formation Model for Extreme Low-Light Raw Denoising

l  Learning to Super Resolve Intensity Images From Events

l  Camouflaged Object Detection

l  Holistically-Attracted Wireframe Parsing


3D From a Single Image and Shape-From-X; 3D From Multiview and Sensors; Image Retrieval; Datasets and Evaluation; Low-Level and Physics-Based Vision


l  Conv-MPN: Convolutional Message Passing Neural Network for Structured Outdoor Architecture Reconstruction

l  Domain Adaptation for Image Dehazing

l  Auto-Encoding Twin-Bottleneck Hashing

l  Agriculture-Vision: A Large Aerial Image Database for Agricultural Pattern Analysis

l  Bi-Directional Interaction Network for Person Search

l  Meshlet Priors for 3D Mesh Reconstruction

l  Space-Time-Aware Multi-Resolution Video Enhancement

l  FSS-1000: A 1000-Class Dataset for Few-Shot Segmentation

l  MSeg: A Composite Dataset for Multi-Domain Semantic Segmentation

l  DeeperForensics-1.0: A Large-Scale Dataset for Real-World Face Forgery Detection

l  Learning Multi-Granular Hypergraphs for Video-Based Person Re-Identification

l  Online Joint Multi-Metric Adaptation From Frequent Sharing-Subset Mining for Person Re-Identification

l  Taking a Deeper Look at Co-Salient Object Detection

l  Single-Stage 6D Object Pose Estimation

l  OccuSeg: Occupancy-Aware 3D Instance Segmentation

l  Camera Trace Erasing

l  Deep Metric Learning via Adaptive Learnable Assessment

l  Deep Representation Learning on Long-Tailed Data: A Learnable Embedding Augmentation Perspective

l  Fantastic Answers and Where to Find Them: Immersive Question-Directed Visual Attention

l  HUMBI: A Large Multiview Dataset of Human Body Expressions

l  Image Search With Text Feedback by Visiolinguistic Attention Learning

l  Image Processing Using Multi-Code GAN Prior

l  What Does Plate Glass Reveal About Camera Calibration?

l  Zero-Assignment Constraint for Graph Matching With Outliers

l  Cascaded Deep Video Deblurring Using Temporal Sharpness Prior

l  JL-DCF: Joint Learning and Densely-Cooperative Fusion Framework for RGB-D Salient Object Detection

l  From Fidelity to Perceptual Quality: A Semi-Supervised Approach for Low-Light Image Enhancement

l  Unsupervised Adaptation Learning for Hyperspectral Imagery Super-Resolution

l  Central Similarity Quantization for Efficient Image and Video Retrieval

l  ARCH: Animatable Reconstruction of Clothed Humans

l  A Model-Driven Deep Neural Network for Single Image Rain Removal

l  Novel Object Viewpoint Estimation Through Reconstruction Alignment

l  Creating Something From Nothing: Unsupervised Knowledge Distillation for Cross-Modal Hashing

l  Evaluating Weakly Supervised Object Localization Methods Right

l  Style Normalization and Restitution for Generalizable Person Re-Identification

l  Reconstruct Locally, Localize Globally: A Model Free Method for Object Pose Estimation

l  RoboTHOR: An Open Simulation-to-Real Embodied AI Platform

l  All in One Bad Weather Removal Using Architectural Search

l  Relation-Aware Global Attention for Person Re-Identification

l  HOnnotate: A Method for 3D Annotation of Hand and Object Poses

l  Celeb-DF: A Large-Scale Challenging Dataset for DeepFake Forensics

l  Deep Unfolding Network for Image Super-Resolution

l  On the Uncertainty of Self-Supervised Monocular Depth Estimation

l  Proxy Anchor Loss for Deep Metric Learning

l  Unsupervised Learning for Intrinsic Image Decomposition From a Single Image

l  Multi-Domain Learning for Accurate and Few-Shot Color Constancy

l  PANDA: A Gigapixel-Level Human-Centric Video Dataset

l  Cross-View Tracking for Multi-Human 3D Pose Estimation at Over 100 FPS

l  Spatial-Temporal Graph Convolutional Network for Video-Based Person Re-Identification

l  Salience-Guided Cascaded Suppression Network for Person Re-Identification

l  Fashion Outfit Complementary Item Retrieval

l  Learning Event-Based Motion Deblurring

l  Domain Decluttering: Simplifying Images to Mitigate Synthetic-Real Domain Shift and Improve Depth Estimation

l  Neural Blind Deconvolution Using Deep Priors

l  Anisotropic Convolutional Networks for 3D Semantic Scene Completion

l  TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution

l  Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution

l  Fast MSER

l  Unsupervised Person Re-Identification via Softened Similarity Learning

l  COCAS: A Large-Scale Clothes Changing Person Dataset for Re-Identification

l  Learning Formation of Physically-Based Face Attributes

l  Generalized Product Quantization Network for Semi-Supervised Image Retrieval

l  Stereoscopic Flash and No-Flash Photography for Shape and Albedo Recovery

l  Context-Aware Group Captioning via Self-Attention and Contrastive Features

l  MEBOW: Monocular Estimation of Body Orientation in the Wild

l  Distilling Image Dehazing With Heterogeneous Task Imitation

l  Select, Supplement and Focus for RGB-D Saliency Detection

l  Transfer Learning From Synthetic to Real-Noise Denoising With Adaptive Instance Normalization

l  On Joint Estimation of Pose, Geometry and svBRDF From a Handheld Scanner

l  Differentiable Volumetric Rendering: Learning Implicit 3D Representations Without 3D Supervision

l  Meta-Transfer Learning for Zero-Shot Super-Resolution

l  Solving Jigsaw Puzzles With Eroded Boundaries

l  Context-Aware Attention Network for Image-Text Retrieval

l  M-LVC: Multiple Frames Prediction for Learned Video Compression

l  Efficient Dynamic Scene Deblurring Using Spatially Variant Deconvolution Network With Optical Flow Guided Training

l  Single Image Reflection Removal Through Cascaded Refinement

l  From Patches to Pictures (PaQ-2-PiQ): Mapping the Perceptual Space of Picture Quality

l  Video to Events: Recycling Video Datasets for Event Cameras

l  Composed Query Image Retrieval Using Locally Bounded Features

l  Spatially-Attentive Patch-Hierarchical Network for Adaptive Motion Deblurring

l  End-to-End Illuminant Estimation Based on Deep Metric Learning

l  Variational-EM-Based Deep Learning for Noise-Blind Image Deblurring

l  Image Demoireing with Learnable Bandpass Filters

l  Assessing Image Quality Issues for Real-World Problems

l  Memory-Efficient Hierarchical Neural Architecture Search for Image Denoising

l  Blindly Assess Image Quality in the Wild Guided by a Self-Adaptive Hyper Network

l  Perceptual Quality Assessment of Smartphone Photography

l  Don't Hit Me! Glass Detection in Real-World Scenes

l  Progressive Mirror Detection


Scene Analysis and Understanding


l  Category-Level Articulated Object Pose Estimation

l  Unbiased Scene Graph Generation From Biased Training

l  Dynamic Graph Message Passing Networks

l  Weakly Supervised Visual Semantic Parsing

l  GPS-Net: Graph Property Sensing Network for Scene Graph Generation

l  End-to-End Optimization of Scene Layout

l  Unsupervised Intra-Domain Adaptation for Semantic Segmentation Through Self-Supervision

l  Dual Super-Resolution Learning for Semantic Segmentation

l  Self-Supervised Scene De-Occlusion

l  BANet: Bidirectional Aggregation Network With Occlusion Handling for Panoptic Segmentation


Medical, Biological and Cell Microscopy

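Medical papers. For example, processing gigapixel whole slide images from microscopy has become common in recent years, and one study here argues that the right approach is to use only the suspicious regions, just as a physician would. Readers working in the medical field should look here. (oral)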

l  CPR-GCN: Conditional Partial-Residual Graph Convolutional Network in Automated Anatomical Labeling of Coronary Arteries

l  Cross-View Correspondence Reasoning Based on Bipartite Graph Convolutional Network for Mammogram Mass Detection

l  MPM: Joint Representation of Motion and Position Map for Cell Tracking

l  Deep Distance Transform for Tubular Structure Segmentation in CT Scans

l  Instance Segmentation of Biological Images Using Harmonic Embeddings

l  Multi-scale Domain-adversarial Multiple-instance CNN for Cancer Subtype Classification with Unannotated Histopathological Images

l  SOS: Selective Objective Switch for Rapid Immunofluorescence Whole Slide Image Classification


Transfer/Low-Shot/Semi/Unsupervised Learning (1)

Papers on Transfer/Low-Shot/Semi/Unsupervised Learning are collected here. Some of them are quite surprising, so this may be a good place to start browsing. (oral)

l  Task Agnostic Robust Learning on Corrupt Outputs by Correlation-Guided Mixture Density Networks

l  METAL: Minimum Effort Temporal Activity Localization in Untrimmed Videos

l  Neural Data Server: A Large-Scale Search Engine for Transfer Learning Data

l  Revisiting Knowledge Distillation via Label Smoothing Regularization

l  WCP: Worst-Case Perturbations for Semi-Supervised Deep Learning

l  DEPARA: Deep Attribution Graph for Deep Knowledge Transferability

l  Conditional Channel Gated Networks for Task-Aware Continual Learning

l  Towards Discriminability and Diversity: Batch Nuclear-Norm Maximization Under Label Insufficient Situations


Scene Analysis and Understanding; Medical, Biological and Cell Microscopy; Transfer/Low-Shot/Semi/Unsupervised Learning


l  FocalMix: Semi-Supervised Learning for 3D Medical Image Detection

l  Learning 3D Semantic Scene Graphs From 3D Indoor Reconstructions

l  Self-Supervised Viewpoint Learning From Image Collections

l  Two-Shot Spatially-Varying BRDF and Shape Estimation

l  Variational Context-Deformable ConvNets for Indoor Scene Parsing

l  Strip Pooling: Rethinking Spatial Pooling for Scene Parsing

l  Few-Shot Object Detection With Attention-RPN and Multi-Relation Detector

l  What Can Be Transferred: Unsupervised Domain Adaptation for Endoscopic Lesions Segmentation

l  ADINet: Attribute Driven Incremental Network for Retinal Image Classification

l  Unsupervised Domain Adaptation With Hierarchical Gradient Synchronization

l  Deep Grouping Model for Unified Perceptual Parsing

l  Where Am I Looking At? Joint Location and Orientation Estimation by Cross-View Matching

l  Gum-Net: Unsupervised Geometric Matching for Fast and Accurate 3D Subtomogram Image Alignment and Averaging

l  FDA: Fourier Domain Adaptation for Semantic Segmentation

l  Foreground-Aware Relation Network for Geospatial Object Segmentation in High Spatial Resolution Remote Sensing Imagery

l  When2com: Multi-Agent Perception via Communication Graph Grouping

l  Learning Human-Object Interaction Detection Using Interaction Points

l  C2FNAS: Coarse-to-Fine Neural Architecture Search for 3D Medical Image Segmentation

l  Adaptive Subspaces for Few-Shot Learning

l  Learning to Detect Important People in Unlabelled Images for Semi-Supervised Important People Detection

l  Stochastic Sparse Subspace Clustering

l  CRNet: Cross-Reference Networks for Few-Shot Segmentation

l  Shoestring: Graph-Based Semi-Supervised Classification With Severely Limited Labeled Data

l  Uninformed Students: Student-Teacher Anomaly Detection With Discriminative Latent Embeddings

l  3D Sketch-Aware Semantic Scene Completion via Semi-Supervised Structure Prior

l  Graph-Guided Architecture Search for Real-Time Semantic Segmentation

l  Composing Good Shots by Exploiting Mutual Relations

l  Organ at Risk Segmentation for Head and Neck Cancer Using Stratified Learning and Neural Architecture Search

l  G2L-Net: Global to Local Network for Real-Time 6D Pose Estimation With Embedding Vector Features

l  Unsupervised Instance Segmentation in Microscopy Images via Panoptic Domain Adaptation and Task Re-Weighting

l  Single-Stage Semantic Segmentation From Image Labels

l  Cascaded Human-Object Interaction Recognition

l  DuDoRNet: Learning a Dual-Domain Recurrent Network for Fast MRI Reconstruction With Deep T1 Prior

l  Learning Integral Objects With Intra-Class Discriminator for Weakly-Supervised Semantic Segmentation

l  FPConv: Learning Local Flattening for Point Convolution

l  Rotation Equivariant Graph Convolutional Network for Spherical Image Classification

l  FOAL: Fast Online Adaptive Learning for Cardiac Motion Estimation

l  ScrabbleGAN: Semi-Supervised Varying Length Handwritten Text Generation

l  Cross-Domain Semantic Segmentation via Domain-Invariant Interactive Relation Transfer

l  Inflated Episodic Memory With Region Self-Attention for Long-Tailed Visual Recognition

l  Multimodal Future Localization and Emergence Prediction for Objects in Egocentric View With a Reachability Prior

l  Structure Preserving Generative Cross-Domain Learning

l  Reverse Perspective Network for Perspective-Aware Object Counting

l  Multi-Path Region Mining for Weakly Supervised 3D Semantic Segmentation on Point Clouds

l  Reliable Weighted Optimal Transport for Unsupervised Domain Adaptation

l  ImVoteNet: Boosting 3D Object Detection in Point Clouds With Image Votes

l  Understanding Road Layout From Videos as a Whole

l  Bi-Directional Relationship Inferring Network for Referring Image Segmentation

l  Perspective Plane Program Induction From a Single Image

l  DeepFLASH: An Efficient Network for Learning-Based Medical Image Registration

l  Semi-Supervised Learning for Few-Shot Image-to-Image Translation

l  Semantic Correspondence as an Optimal Transport Problem

l  How Much Time Do You Have? Modeling Multi-Duration Saliency

l  Fine-Grained Generalized Zero-Shot Learning via Dense Attribute-Based Attention

l  Online Depth Learning Against Forgetting in Monocular Videos

l  Few-Shot Learning of Part-Specific Probability Space for 3D Shape Segmentation

l  Pattern-Structure Diffusion for Multi-Task Learning

l  Training Noise-Robust Deep Neural Networks via Meta-Learning

l  Fusion-Aware Point Convolution for Online Semantic 3D Scene Segmentation

l  Universal Source-Free Domain Adaptation

l  Exploring Spatial-Temporal Multi-Frequency Analysis for High-Fidelity and Temporal-Consistency Video Prediction

l  Varicolored Image De-Hazing

l  SpSequenceNet: Semantic Segmentation Network on 4D Point Clouds

l  Separating Particulate Matter From a Single Microscopic Image

l  Adaptive Dilated Network With Self-Correction Supervision for Counting

l  PointPainting: Sequential Fusion for 3D Object Detection

l  Rethinking Zero-Shot Video Classification: End-to-End Training for Realistic Applications

l  Learning to Select Base Classes for Few-Shot Classification

l  CONSAC: Robust Multi-Model Fitting by Conditional Sample Consensus

l  Fast Symmetric Diffeomorphic Image Registration with Convolutional Neural Networks

l  Distilled Semantics for Comprehensive Scene Understanding from Videos

l  Modeling Biological Immunity to Adversarial Examples

l  DOA-GAN: Dual-Order Attentive Generative Adversarial Network for Image Copy-Move Forgery Detection and Localization

l  Correspondence-Free Material Reconstruction using Sparse Surface Constraints

l  Augmenting Colonoscopy Using Extended and Directional CycleGAN for Lossy Image Translation

l  Attention Scaling for Crowd Counting

l  Shape Reconstruction by Learning Differentiable Surface Representations

l  A Spatiotemporal Volumetric Interpolation Network for 4D Dynamic Medical Image

l  Attention-Based Context Aware Reasoning for Situation Recognition

l  PatchVAE: Learning Local Latent Codes for Recognition

l  Self-Supervised Monocular Trained Depth Estimation Using Self-Attention and Discrete Disparity Volume

l  STAViS: Spatio-Temporal AudioVisual Saliency Network

l  More Grounded Image Captioning by Distilling Image-Text Matching Model

l  DUNIT: Detection-Based Unsupervised Image-to-Image Translation

l  Learning to Observe: Approximating Human Perceptual Thresholds for Detection of Suprathreshold Image Transformations

l  Show, Edit and Tell: A Framework for Editing Image Captions

l  Structure Boundary Preserving Segmentation for Medical Image With Ambiguous Boundary

l  Predicting Cognitive Declines Using Longitudinally Enriched Representations for Imaging Biomarkers

l  Predicting Lymph Node Metastasis Using Histopathological Images Based on Multiple Instance Learning With Deep Graph Convolution

l  Extremely Dense Point Correspondences Using a Learned Feature Descriptor

3D From Multiview and Sensors (3)

This is the final oral session for 3D From Multiview and Sensors. The posters follow further down, so if you are interested in 3D From Multiview and Sensors, keep going until you reach them. (oral)

l  Local Deep Implicit Functions for 3D Shape

l  PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation

l  Cost Volume Pyramid Based Depth Inference for Multi-View Stereo

l  RoutedFusion: Learning Real-Time Depth Map Fusion

l  VOLDOR: Visual Odometry From Log-Logistic Dense Optical Flow Residuals

l  Learning to Optimize Non-Rigid Tracking

l  KFNet: Learning Temporal Camera Relocalization Using Kalman Filtering

l  Information-Driven Direct RGB-D Odometry

l  SuperGlue: Learning Feature Matching With Graph Neural Networks

l  Reinforced Feature Points: Optimizing Feature Detection and Description for a High-Level Task


Face, Gesture, and Body Pose (1)


l  ReDA: Reinforced Differentiable Attribute for 3D Face Reconstruction

l  EventCap: Monocular 3D Capture of High-Speed Human Motions Using an Event Camera

l  Cross-Modal Deep Face Normals With Deactivable Skip Connections

l  Weakly-Supervised Mesh-Convolutional Hand Reconstruction in the Wild

l  Face X-Ray for More General Face Forgery Detection

l  A Morphable Face Albedo Model

l  Cascade EF-GAN: Progressive Facial Expression Editing With Local Focuses

l  GanHand: Predicting Human Grasp Affordances in Multi-Object Scenes

l  Deep Spatial Gradient and Temporal Depth Learning for Face Anti-Spoofing

l  DeepCap: Monocular Human Performance Capture Using Weak Supervision

l  Attention Mechanism Exploits Temporal Contexts: Real-Time 3D Human Pose Reconstruction

l  Advancing High Fidelity Identity Swapping for Forgery Detection


Image and Video Synthesis (1)


l  Controllable Person Image Synthesis With Attribute-Decomposed GAN

l  Attentive Normalization for Conditional Image Generation

l  SEAN: Image Synthesis With Semantic Region-Adaptive Normalization

l  Blurry Video Frame Interpolation

l  Learning Physics-Guided Face Relighting Under Directional Light

l  Disentangled Image Generation Through Structured Noise Injection

l  Cross-Domain Correspondence Learning for Exemplar-Based Image Translation

l  Disentangled and Controllable Face Image Generation via 3D Imitative-Contrastive Learning

l  Single Image Reflection Removal With Physically-Based Training Images

l  SketchyCOCO: Image Generation From Freehand Scene Sketches

l  Image Based Virtual Try-On Network From Unpaired Data

l  PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer


3D From Multiview and Sensors; Face, Gesture, and Body Pose; Image and Video Synthesis


l  RetinaFace: Single-Shot Multi-Level Face Localisation in the Wild

l  Semantic Image Manipulation Using Scene Graphs

l  A Stochastic Conditioning Scheme for Diverse Human Motion Prediction

l  Transferring Dense Pose to Proximal Animal Classes

l  Weakly-Supervised 3D Human Pose Learning via Multi-View Images in the Wild

l  VIBE: Video Inference for Human Body Pose and Shape Estimation

l  G3AN: Disentangling Appearance and Motion for Video Generation

l  Domain Adaptive Image-to-Image Translation

l  GAN Compression: Efficient Architectures for Interactive Conditional GANs

l  Searching Central Difference Convolutional Networks for Face Anti-Spoofing

l  TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting

l  AdaCoF: Adaptive Collaboration of Flows for Video Frame Interpolation

l  FReeNet: Multi-Identity Face Reenactment

l  Novel View Synthesis of Dynamic Scenes With Globally Coherent Depths From a Monocular Camera

l  Monocular Real-Time Hand Shape and Motion Capture Using Multi-Modal Data

l  The GAN That Warped: Semantic Attribute Editing With Unpaired Data

l  4D Visualization of Dynamic Events From Unconstrained Multi-View Videos

l  Global-Local Bidirectional Reasoning for Unsupervised Representation Learning of 3D Point Clouds

l  HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation

l  Detecting Attended Visual Targets in Video

l  Closed-Loop Matters: Dual Regression Networks for Single Image Super-Resolution

l  Neural Voxel Renderer: Learning an Accurate and Controllable Rendering Tool

l  Neural Contours: Learning to Draw Lines From 3D Shapes

l  Softmax Splatting for Video Frame Interpolation

l  CIAGAN: Conditional Identity Anonymization Generative Adversarial Networks

l  Probabilistic Structural Latent Representation for Unsupervised Embedding

l  Semantically Multi-Modal Image Synthesis

l  Nested Scale-Editing for Conditional Image Synthesis

l  UnrealText: Synthesizing Realistic Scene Text Images From the Unreal World

l  Fast Texture Synthesis via Pseudo Optimizer

l  Towards Learning Structure via Consensus for Face Segmentation and Parsing

l  CookGAN: Causality Based Text-to-Image Synthesis

l  Weakly Supervised Discriminative Feature Learning With State Information for Person Identification

l  Future Video Synthesis With Object Motion Prediction

l  MaskGAN: Towards Diverse and Interactive Facial Image Manipulation

l  A Graduated Filter Method for Large Scale Robust Estimation

l  Deep Face Super-Resolution With Iterative Collaboration Between Attentive Recovery and Landmark Estimation

l  Coherent Reconstruction of Multiple Humans From a Single Image

l  PointASNL: Robust Point Clouds Processing Using Nonlocal Neural Networks With Adaptive Sampling

l  A Neural Rendering Framework for Free-Viewpoint Relighting

l  A Multi-Task Mean Teacher for Semi-Supervised Shadow Detection

l  GroupFace: Learning Latent Groups and Constructing Group-Based Representations for Face Recognition

l  Channel Attention Based Iterative Residual Learning for Depth Map Super-Resolution

l  Time Flies: Animating a Still Image With Time-Lapse Video As Reference

l  SER-FIQ: Unsupervised Estimation of Face Image Quality Based on Stochastic Embedding Robustness

l  Grid-GCN for Fast and Scalable Point Cloud Learning

l  Domain Balancing: Face Recognition on Long-Tailed Domains

l  AdversarialNAS: Adversarial Neural Architecture Search for GANs

l  Image Super-Resolution With Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars Mining

l  The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation

l  Data Uncertainty Learning in Face Recognition

l  Regularizing Discriminative Capability of CGANs for Semi-Supervised Generative Learning

l  FM2u-Net: Face Morphological Multi-Branch Network for Makeup-Invariant Face Verification

l  UCTGAN: Diverse Image Inpainting Based on Unsupervised Cross-Space Translation

l  Decoupled Representation Learning for Skeleton-Based Gesture Recognition

l  An Efficient PointLSTM for Point Clouds Based Gesture Recognition

l  Editing in Style: Uncovering the Local Semantics of GANs

l  On the Detection of Digital Face Manipulation

l  Learning Texture Transformer Network for Image Super-Resolution

l  Reference-Based Sketch Image Colorization Using Augmented-Self Reference and Dense Semantic Correspondence

l  Deblurring Using Analysis-Synthesis Networks Pair

l  Exploring Unlabeled Faces for Novel Attribute Discovery

l  Neural Pose Transfer by Spatially Adaptive Instance Normalization

l  Fine-Grained Image-to-Image Transformation Towards Visual Recognition

l  Deep Facial Non-Rigid Multi-View Stereo

l  Attention-Driven Cropping for Very High Resolution Facial Landmark Detection

l  Towards Unsupervised Learning of Generative Models for 3D Controllable Image Synthesis

l  End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection

l  Towards High-Fidelity 3D Face Reconstruction From In-the-Wild Images Using Graph Convolutional Networks

l  CurricularFace: Adaptive Curriculum Learning Loss for Deep Face Recognition

l  Rotate-and-Render: Unsupervised Photorealistic Face Rotation From Single-View Images

l  One-Shot Domain Adaptation for Face Generation

l  BidNet: Binocular Image Dehazing Without Explicit Disparity Estimation

l  Deep Shutter Unrolling Network

l  Joint Texture and Geometry Optimization for RGB-D Reconstruction

l  Deep 3D Capture: Geometry and Reflectance From Sparse Multi-View Images

l  Auto-Tuning Structured Light by Optical Stochastic Gradient Descent

l  MARMVS: Matching Ambiguity Reduced Multiple View Stereo for Efficient Large Scale Scene Reconstruction

l  Uncertainty Based Camera Model Selection

l  Local Implicit Grid Representations for 3D Scenes

l  TetraTSDF: 3D Human Reconstruction From a Single Image With a Tetrahedral Outer Shell

l  Averaging Essential and Fundamental Matrices in Collinear Camera Settings

l  On the Distribution of Minima in Intrinsic-Metric Rotation Averaging

l  Lightweight Multi-View 3D Pose Estimation Through Camera-Disentangled Representation

l  A Novel Recurrent Encoder-Decoder Structure for Large-Scale Multi-View Stereo Reconstruction From an Open Aerial Dataset

l  Factorized Higher-Order CNNs With an Application to Spatio-Temporal Emotion Estimation

l  Effectively Unbiased FID and Inception Score and Where to Find Them

l  Robust Homography Estimation via Dual Principal Component Pursuit

l  Non-Adversarial Video Synthesis With Learned Priors

l  Uncertainty-Aware Mesh Decoder for High Fidelity 3D Face Reconstruction


Face, Gesture, and Body Pose (2)

This is part 2 of Face, Gesture, and Body Pose, which already appeared above. (oral)

l  3FabRec: Fast Few-Shot Face Alignment by Reconstruction

l  Weakly-Supervised Domain Adaptation via GAN and Mesh Model for Estimating 3D Hand Poses Interacting Objects

l  Vec2Face: Unveil Human Faces From Their Blackbox Features in Face Recognition

l  StyleRig: Rigging StyleGAN for 3D Control Over Portrait Images

l  Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis

l  Learning Meta Face Recognition in Unseen Domains

l  Cascaded Deep Monocular 3D Human Pose Estimation With Evolutionary Training Data

l  GHUM & GHUML: Generative 3D Human Shape and Articulated Pose Models

l  Generating 3D People in Scenes Without People

l  Transferring Cross-Domain Knowledge for Video Sign Language Recognition

l  Bodies at Rest: 3D Human Pose and Shape Estimation From a Pressure Image Using Synthetic Data

l  Bayesian Adversarial Human Motion Synthesis


Motion and Tracking (1)


l  LSM: Learning Subspace Minimization for Low-Level Vision

l  Learning a Neural Solver for Multiple Object Tracking

l  GLU-Net: Global-Local Universal Network for Dense Flow and Correspondences

l  SiamCAR: Siamese Fully Convolutional Classification and Regression for Visual Tracking

l  MaskFlownet: Asymmetric Feature Matching With Learnable Occlusion Mask

l  Tracking by Instance Detection: A Meta-Learning Approach

l  High-Performance Long-Term Tracking With Meta-Updater

l  TubeTK: Adopting Tubes to Track Multi-Object in a One-Step Training Model

l  Collaborative Motion Prediction via Neural Motion Message Passing

l  P2B: Point-to-Box Network for 3D Object Tracking in Point Clouds

l  Self-Supervised Deep Visual Odometry With Online Adaptation

l  Globally Optimal Contrast Maximisation for Event-Based Motion Estimation

Representation Learning


l  D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features

l  Towards Backward-Compatible Representation Learning

l  PointAugment: An Auto-Augmentation Framework for Point Cloud Classification

l  Cross-Batch Memory for Embedding Learning

l  Circle Loss: A Unified Perspective of Pair Similarity Optimization

l  Steering Self-Supervised Feature Learning Beyond Local Pixel Statistics

l  Hyperbolic Image Embeddings

l  Controllable Orthogonalization in Training DNNs

l  An Investigation Into the Stochasticity of Batch Whitening


Face, Gesture, and Body Pose; Motion and Tracking; Representation Learning


l  High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification

l  Same Features, Different Day: Weakly Supervised Feature Learning for Seasonal Invariance

l  Learning to Dress 3D People in Generative Clothing

l  MAST: A Memory-Augmented Self-Supervised Tracker

l  Learning by Analogy: Reliable Supervision From Transformations for Unsupervised Optical Flow Estimation

l  GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking With 2D-3D Multi-Feature Learning

l  ClusterFit: Improving Generalization of Visual Representations

l  Learning Dynamic Relationships for 3D Human Motion Prediction

l  Knowledge As Priors: Cross-Modal Knowledge Generalization for Datasets Without Superior Knowledge

l  S3VAE: Self-Supervised Sequential VAE for Representation Disentanglement and Data Generation

l  Video Playback Rate Perception for Self-Supervised Spatio-Temporal Representation Learning

l  Learning to Manipulate Individual Objects in an Image

l  PADS: Policy-Adapted Sampling for Visual Similarity Learning

l  Siam R-CNN: Visual Tracking by Re-Detection

l  ASLFeat: Learning Local Features of Accurate Shape and Localization

l  Filter Grafting for Deep Neural Networks

l  HOPE-Net: A Graph-Based Model for Hand-Object Pose Estimation

l  DeepFaceFlow: In-the-Wild Dense 3D Facial Motion Estimation

l  Learning for Video Compression With Hierarchical Quality and Recurrent Enhancement

l  Learning Better Lossless Compression Using Lossy Compression

l  Flow2Stereo: Effective Self-Supervised Learning of Optical Flow and Stereo Matching

l  Multi-Scale Fusion Subspace Clustering Using Similarity Constraint

l  Siamese Box Adaptive Network for Visual Tracking

l  Cross-Domain Face Presentation Attack Detection via Multi-Domain Disentangled Representation Learning

l  Online Deep Clustering for Unsupervised Representation Learning

l  Density-Aware Feature Embedding for Face Clustering

l  Self-Supervised Learning of Pretext-Invariant Representations

l  ROAM: Recurrently Optimizing Tracking Model

l  Deformable Siamese Attention Networks for Visual Object Tracking

l  15 Keypoints Is All You Need

l  Optical Flow in the Dark

l  Sketch-BERT: Learning Sketch Bidirectional Encoder Representation From Transformers by Self-Supervised Learning of Sketch Gestalt

l  A Unified Object Motion and Affinity Model for Online Multi-Object Tracking

l  Sub-Frame Appearance and 6D Pose Estimation of Fast Moving Objects

l  How to Train Your Deep Multi-Object Tracker

l  TPNet: Trajectory Proposal Network for Motion Prediction

l  Large Scale Video Representation Learning via Relational Graph Clustering

l  Towards Universal Representation Learning for Deep Face Recognition

l  Robust Partial Matching for Person Search in the Wild

l  Correlation-Guided Attention for Corner Detection Based Visual Tracking

l  Learning Multi-Object Tracking and Segmentation From Automatic Annotations

l  PandaNet: Anchor-Based Single-Shot Multi-Person 3D Pose Estimation

l  Rotation Consistent Margin Loss for Efficient Low-Bit Face Recognition

l  Joint Spatial-Temporal Optimization for Stereo 3D Object Tracking

l  Unity Style Transfer for Person Re-Identification

l  Suppressing Uncertainties for Large-Scale Facial Expression Recognition

l  Multiview-Consistent Semi-Supervised Learning for 3D Human Pose Estimation

l  Regularizing Neural Networks via Minimizing Hyperspherical Energy

l  Learning Representations by Predicting Bags of Visual Words

l  AnimalWeb: A Large-Scale Hierarchical Dataset of Annotated Animal Faces

l  A Transductive Approach for Video Object Segmentation

l  Dynamic Face Video Segmentation via Reinforcement Learning

l  Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion

l  Semantic Drift Compensation for Class-Incremental Learning

l  Context-Aware Human Motion Prediction

l  DeepDeform: Learning Non-Rigid RGB-D Reconstruction With Semi-Supervised Data

l  Optical Non-Line-of-Sight Physics-Based 3D Human Pose Estimation

l  Learning to Transfer Texture From Clothing Images to 3D Humans

l  UniPose: Unified Human Pose Estimation in Single Images and Videos

l  Minimal Solutions to Relative Pose Estimation From Two Views Sharing a Common Direction With Unknown Focal Length

l  3D Human Mesh Regression With Dense Correspondence

l  Cross-Modal Pattern-Propagation for RGB-T Tracking

l  Distilling Knowledge From Graph Convolutional Networks

l  Learning Identity-Invariant Motion Representations for Cross-ID Face Reenactment

l  Distribution-Aware Coordinate Representation for Human Pose Estimation

l  Parsing-Based View-Aware Embedding Network for Vehicle Re-Identification

l  HandVoxNet: Deep Voxel-Based Network for 3D Hand Shape and Pose Estimation From a Single Depth Map

l  Determinant Regularization for Gradient-Efficient Graph Matching

l  D3S - A Discriminative Single Shot Segmentation Tracker

l  MANTRA: Memory Augmented Networks for Multiple Trajectory Prediction

l  End-to-End Model-Free Reinforcement Learning for Urban Driving Using Implicit Affordances

l  GraphTER: Unsupervised Learning of Graph Transformation Equivariant Representations via Auto-Encoding Node-Wise Transformations

l  Can Facial Pose and Expression Be Separated With Weak Perspective Camera?

l  Probabilistic Regression for Visual Tracking

l  3DRegNet: A Deep Neural Network for 3D Point Registration

l  Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation

l  Three-Dimensional Reconstruction of Human Interactions

l  Distribution-Induced Bidirectional Generative Adversarial Network for Graph Representation Learning

l  Minimal Solvers for 3D Scan Alignment With Pairs of Intersecting Lines

l  Wavelet Integrated CNNs for Noise-Robust Image Classification

l  Embedding Expansion: Augmentation in Embedding Space for Deep Metric Learning

l  PropagationNet: Propagate Points to Curve to Learn Structure Information

l  Sequential 3D Human Pose and Shape Estimation From Point Clouds

l  Improving the Robustness of Capsule Networks to Image Affine Transformations

l  Noise Modeling, Synthesis and Classification for Generic Object Anti-Spoofing

l  Quaternion Product Units for Deep Learning on 3D Rotation Groups

l  Unsupervised Representation Learning for Gaze Estimation

l  P-nets: Deep Polynomial Neural Networks

l  Hierarchically Robust Representation Learning

l  How Useful Is Self-Supervised Pretraining for Visual Tasks?

Face, Gesture, and Body Pose (3); Motion and Tracking (2)

This session covers Face, Gesture, and Body Pose (3) and Motion and Tracking (2). (oral)

l  Copy and Paste GAN: Face Hallucination From Shaded Thumbnails

l  TailorNet: Predicting Clothing in 3D as a Function of Human Pose, Shape and Garment Style

l  Object-Occluded Human Shape and Pose Estimation From a Single Color Image

l  Recursive Least-Squares Estimator-Aided Online Learning for Visual Tracking

l  Self-Supervised Monocular Scene Flow Estimation

l  Learning Fast and Robust Target Models for Video Object Segmentation

l  Reciprocal Learning Networks for Human Trajectory Prediction

l  Nonparametric Object and Parts Modeling With Lie Group Dynamics


Image and Video Synthesis (2); Neural Generative Models

These papers are about synthesis, though the emphasis leans somewhat toward Neural Generative Models. (oral)

l  Learning to Shadow Hand-Drawn Sketches

l  Intuitive, Interactive Beard and Hair Synthesis With Generative Models

l  Semantic Pyramid for Image Generation

l  SynSin: End-to-End View Synthesis From a Single Image

l  A Characteristic Function Approach to Deep Implicit Generative Modeling

l  High-Resolution Daytime Translation Without Domain Labels

l  Leveraging 2D Data to Learn Textured 3D Mesh Generation

l  Contextual Residual Aggregation for Ultra High-Resolution Image Inpainting

l  Flow Contrastive Estimation of Energy-Based Models


Optimization and Learning Methods


l  Hardware-in-the-Loop End-to-End Optimization of Camera Image Processing Pipelines

l  Search to Distill: Pearls Are Everywhere but Not the Eyes

l  Total Deep Variation for Linear Inverse Problems

l  Relative Interior Rule in Block-Coordinate Descent

l  Learning Combinatorial Solver for Graph Matching

l  SampleNet: Differentiable Point Cloud Sampling

l  Can We Learn Heuristics for Graphical Model Inference Using Reinforcement Learning?

l  Quasi-Newton Solver for Robust Non-Rigid Registration

l  Rethinking Class-Balanced Methods for Long-Tailed Visual Recognition From a Domain Adaptation Perspective

l  Optimizing Rank-Based Metrics With Blackbox Differentiation


Face, Gesture, and Body Pose; Motion and Tracking; Image and Video Synthesis; Neural Generative Models; Optimization and Learning Methods


l  DualSDF: Semantic Shape Manipulation Using a Two-Level Representation

l  Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives

l  Deep Homography Estimation for Dynamic Scenes

l  PF-Net: Point Fractal Network for 3D Point Cloud Completion

l  On the Regularization Properties of Structured Dropout

l  Learning Oracle Attention for High-Fidelity Face Completion

l  Deep Image Spatial Transformation for Person Image Generation

l  Learning to Optimize on SPD Manifolds

l  Deep 3D Portrait From a Single Image

l  RDCFace: Radial Distortion Correction for Face Recognition

l  Global-Local GCN: Large-Scale Label Noise Cleansing for Face Recognition

l  MISC: Multi-Condition Injection and Spatially-Adaptive Compositing for Conditional Person Image Synthesis

l  SAINT: Spatially Aware Interpolation NeTwork for Medical Slice Synthesis

l  Recurrent Feature Reasoning for Image Inpainting

l  Structure-Preserving Super Resolution With Gradient Guidance

l  Epipolar Transformers

l  Diversified Arbitrary Style Transfer via Deep Feature Perturbation

l  MSG-GAN: Multi-Scale Gradients for Generative Adversarial Networks

l  Overcoming Multi-Model Forgetting in One-Shot NAS With Diversity Maximization

l  Select to Better Learn: Fast and Accurate Deep Learning Using Data Selection From Nonlinear Manifolds

l  Neural Point Cloud Rendering via Multi-Plane Projection

l  Wish You Were Here: Context-Aware Human Generation

l  Towards Photo-Realistic Virtual Try-On by Adaptively Generating-Preserving Image Content

l  Breaking the Cycle - Colleagues Are All You Need

l  Local Class-Specific and Global Image-Level Generative Adversarial Networks for Semantic-Guided Scene Generation

l  ManiGAN: Text-Guided Image Manipulation

l  Watch Your Up-Convolution: CNN Based Generative Deep Neural Networks Are Failing to Reproduce Spectral Distributions

l  Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems

l  Barycenters of Natural Images - Constrained Wasserstein Barycenters for Image Morphing

l  Guided Variational Autoencoder for Disentanglement Learning

l  Cross-Spectral Face Hallucination via Disentangling Independent Factors

l  Learned Image Compression With Discretized Gaussian Mixture Likelihoods and Attention Modules

l  C-Flow: Conditional Generative Flow Models for Images and 3D Point Clouds