Catching Up on the Latest AI Papers

CVPR2020: Taking a Peek, Part 2

Papers

I recommend researching papers and prior work from the sessions and techniques that interest you.

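Sessions marked with a number in parentheses, such as (1), continue later under the same session name, so if a session you are interested in carries one, read all the way to the end.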

Paper summaries by cvpaperchallenge (now over 1,000 papers!)

Slides from the CVPR2020 technical report session are now public!

Technical materials

3D From a Single Image and Shape-From-X (1)

Papers about shape: generating 3D from a single image, reconstructing shape, and so on. (oral)

l  Unsupervised Learning of Probably Symmetric Deformable 3D Objects From Images in the Wild

l  Footprints and Free Space From a Single Color Image

l  Dynamic Fluid Surface Reconstruction Using Deep Neural Network

l  CvxNet: Learnable Convex Decomposition

l  BSP-Net: Generating Compact Meshes via Binary Space Partitioning

l  Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes From a Single Image

l  Generating and Exploiting Probabilistic Monocular Depth Estimates

l  Neural Cages for Detail-Preserving 3D Deformations

l  PIFuHD: Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization

l  A Lighting-Invariant Point Processor for Shading

l  ActiveMoCap: Optimized Viewpoint Selection for Active Human Motion Capture

l  Peek-a-Boo: Occlusion Reasoning in Indoor Scenes With Plane Representations

 

Action and Behavior

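Papers on actions and behavior. There is work on predicting human motion from 3D skeletons, but my overall impression is that most papers develop models that recognize actions and behavior effectively from limited datasets, or models that reduce the computational cost of prediction. (oral)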

l  Multi-Modal Domain Adaptation for Fine-Grained Action Recognition
l  Evolving Losses for Unsupervised Video Representation Learning
l  Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition
l  A Multigrid Method for Efficiently Training Video Models
l  Ego-Topo: Environment Affordances From Egocentric Video
l  Generative Hybrid Representations for Activity Forecasting With No-Regret Learning
l  Skeleton-Based Action Recognition With Shift Graph Convolutional Network
l  Predicting Goal-Directed Human Attention Using Inverse Reinforcement Learning
l  X3D: Expanding Architectures for Efficient Video Recognition
l  Dynamic Multiscale Graph Neural Networks for 3D Skeleton Based Human Motion Prediction
l  Use the Force, Luke! Learning to Predict Physical Forces by Simulating Effects

 

Adversarial Learning

Papers on adversarial learning, including adversarial attack and defense methods. (oral)

l  DaST: Data-Free Substitute Training for Adversarial Attacks

l  Towards Verifying Robustness of Neural Networks Against A Family of Semantic Perturbations

l  The Secret Revealer: Generative Model-Inversion Attacks Against Deep Neural Networks

l  A Self-supervised Approach for Adversarial Robustness

l  Adversarial Vertex Mixup: Toward Better Adversarially Robust Generalization

l  How Does Noise Help Robustness? Explanation and Exploration under the Neural SDE Framework

l  Unpaired Image Super-Resolution Using Pseudo-Supervision

l  Universal Litmus Patterns: Revealing Backdoor Attacks in CNNs

l  Robustness Guarantees for Deep Neural Networks on Videos

l  Benchmarking Adversarial Robustness on Image Classification

l  What It Thinks Is Important Is Important: Robustness Transfers Through Input Gradients

l  Transferable, Controllable, and Inconspicuous Adversarial Attacks on Person Re-identification With Deep Mis-Ranking
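To make "adversarial attack" concrete, here is a minimal sketch of the fast gradient sign method (FGSM), a classic one-step attack that is not one of the papers listed above. It assumes a toy binary logistic-regression model with hand-picked weights (all names and numbers are hypothetical) and nudges every input feature by `eps` in the sign direction of the loss gradient.

```python
from __future__ import annotations

import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(x: list[float], w: list[float], b: float) -> float:
    """Probability of class 1 under a toy logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def _sign(g: float) -> int:
    return (g > 0) - (g < 0)


def fgsm(x: list[float], y: int, w: list[float], b: float, eps: float) -> list[float]:
    """One FGSM step: move each feature by eps in the direction that raises the loss.

    For binary cross-entropy with a logistic model, d(loss)/dx = (p - y) * w.
    """
    p = predict(x, w, b)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * _sign(gi) for xi, gi in zip(x, grad)]


# Hypothetical toy example: an input correctly classified as label 1.
w, b = [2.0, -1.0], 0.0
x = [1.0, 0.5]
x_adv = fgsm(x, 1, w, b, eps=0.6)
print(predict(x, w, b) > 0.5, predict(x_adv, w, b) > 0.5)  # True False
```

Real attacks compute the same gradient through a deep network with automatic differentiation; many of the defense papers above are about making models robust to exactly this kind of small, bounded perturbation.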

 

3D From a Single Image and Shape-From-X; Action and Behavior Recognition; Adversarial Learning

From here on are the poster papers for the three sessions above, combined.

l  Video Modeling With Correlation Networks

l  Projection & Probability-Driven Black-Box Attack

l  Auxiliary Training: Towards Accurate and Robust Models

l  PaStaNet: Toward Human Activity Knowledge Engine

l  A Hierarchical Graph Network for 3D Object Detection on Point Clouds

l  Learning Generative Models of Shape Handles

l  One Man's Trash Is Another Man's Treasure: Resisting Adversarial Examples by Adversarial Examples

l  Toward a Universal Model for Shape From Texture

l  HybridPose: 6D Object Pose Estimation Under Hybrid Representations

l  Boundary-Aware 3D Building Reconstruction From a Single Overhead Image

l  Articulation-Aware Canonical Surface Mapping

l  BiFuse: Monocular 360 Depth Estimation via Bi-Projection Fusion

l  Transformation GAN for Unsupervised Image Synthesis and Representation Learning

l  PPDM: Parallel Point Detection and Matching for Real-Time Human-Object Interaction Detection

l  Height and Uprightness Invariance for 3D Prediction From a Single View

l  SCT: Set Constrained Temporal Transformer for Set Supervised Action Segmentation

l  3DV: 3D Dynamic Voxel for Action Recognition in Depth Video

l  Adaptive Interaction Modeling via Graph Operations Search

l  Front2Back: Single View 3D Shape Reconstruction via Front to Back Prediction

l  SDC-Depth: Semantic Divide-and-Conquer Network for Monocular Depth Estimation

l  Single-View View Synthesis With Multiplane Images

l  Deep Parametric Shape Predictions Using Distance Fields

l  Leveraging Photometric Consistency Over Time for Sparsely Supervised Hand-Object Reconstruction

l  Ensemble Generative Cleaning With Feedback Loops for Defending Adversarial Attacks

l  Temporal Pyramid Network for Action Recognition

l  FaceScape: A Large-Scale High Quality 3D Face Dataset and Detailed Riggable 3D Face Prediction

l  Structure-Guided Ranking Loss for Single Image Depth Prediction

l  In Perfect Shape: Certifiably Optimal 3D Shape Reconstruction From 2D Landmarks

l  When NAS Meets Robustness: In Search of Robust Architectures Against Adversarial Attacks

l  Towards Transferable Targeted Attack

l  Self-Supervised Human Depth Estimation From Monocular Videos

l  Recursive Social Behavior Graph for Trajectory Prediction

l  Context-Aware and Scale-Insensitive Temporal Repetition Counting

l  OASIS: A Large-Scale Dataset for Single Image 3D in the Wild

l  VPLNet: Deep Single View Normal Estimation With Vanishing Points and Lines

l  Adversarial Robustness: From Self-Supervised Pre-Training to Fine-Tuning

l  Defending Against Universal Attacks Through Selective Feature Regeneration

l  Universal Physical Camouflage Attacks on Object Detectors

l  Intra- and Inter-Action Understanding via Temporal Action Parsing

l  Lightweight Photometric Stereo for Facial Details Recovery

l  Bundle Pooling for Polygonal Architecture Segmentation Problem

l  AvatarMe: Realistically Renderable 3D Facial Reconstruction "In-the-Wild"

l  Defending Against Model Stealing Attacks With Adaptive Misinformation

l  Learning to Generate 3D Training Data Through Hybrid Gradient

l  Cascaded Refinement Network for Point Cloud Completion

l  Enhancing Intrinsic Adversarial Robustness via Feature Pyramid Decoder

l  Learning to Discriminate Information for Online Action Detection

l  Adversarial Examples Improve Image Recognition

l  PQ-NET: A Generative Part Seq2Seq Network for 3D Shapes

l  Actor-Transformers for Group Activity Recognition

l  SG-NN: Sparse Generative Neural Networks for Self-Supervised Scene Completion of RGB-D Scans

l  Geometry-Aware Satellite-to-Ground Image Synthesis for Urban Areas

l  Action Modifiers: Learning From Adverbs in Instructional Videos

l  ZSTAD: Zero-Shot Temporal Activity Detection

l  Geometric Structure Based and Regularized Depth Estimation From 360 Indoor Imagery

l  Deep Kinematics Analysis for Monocular 3D Human Pose Estimation

l  TEA: Temporal Excitation and Aggregation for Action Recognition

l  Oops! Predicting Unintentional Action in Video

l  Scene Recomposition by Learning-Based ICP

l  Enhancing Cross-Task Black-Box Transferability of Adversarial Examples With Dispersion Reduction

l  Single-Step Adversarial Training With Dropout Scheduling

l  Deep Non-Line-of-Sight Reconstruction

l  SSRNet: Scalable 3D Surface Reconstruction Network

l  Progressive Relation Learning for Group Activity Recognition

l  Cooling-Shrinking Attack: Blinding the Tracker With Imperceptible Noises

l  Adversarial Camouflage: Hiding Physical-World Attacks With Natural Styles

l  Weakly-Supervised Action Localization by Generative Attention Modeling

l  Towards Achieving Adversarial Robustness by Enforcing Feature Consistency Across Bit Planes

l  Polishing Decision-Based Adversarial Noise With a Customized Sampling

l  Towards Large Yet Imperceptible Adversarial Image Perturbations With Perceptual Color Distance

l  Something-Else: Compositional Action Recognition With Spatial-Temporal Interaction Networks

l  Learning Unsupervised Hierarchical Part Decomposition of 3D Objects From a Single RGB Image

l  Focus on Defocus: Bridging the Synthetic to Real Domain Gap for Depth Estimation

l  Active Vision for Early Recognition of Human Actions

l  SmallBigNet: Integrating Core and Contextual Views for Video Classification

l  Gate-Shift Networks for Video Action Recognition

l  Semantics-Guided Neural Networks for Efficient Skeleton-Based Human Action Recognition

l  Exploiting Joint Robustness to Adversarial Perturbations

l  From Image Collections to Point Clouds With Self-Supervised Shape and Pose Networks

l  Searching for Actions on the Hyperbole

l  ColorFool: Semantic Adversarial Colorization

l  Boosting the Transferability of Adversarial Samples via Attention

l  ActionBytes: Learning From Trimmed Videos to Localize Actions

l  Efficient Adversarial Training With Transferable Adversarial Examples

l  Alleviation of Gradient Exploding in GANs: Fake Can Be Real

l  On Isometry Robustness of Deep 3D Point Cloud Models Under Adversarial Attacks

l  Achieving Robustness in the Wild via Adversarial Mixing With Disentangled Representations

l  QEBA: Query-Efficient Boundary-Based Blackbox Attack

l  Learning to Simulate Dynamic Environments With GameGAN

l  Learn2Perturb: An End-to-End Feature Perturbation Learning to Improve Adversarial Robustness

 

3D From Multiview and Sensors (1)

Papers on 3D from multiple views and from sensors (hand-held sensors, LiDAR, etc.). (oral)

l  SDFDiff: Differentiable Rendering of Signed Distance Fields for 3D Shape Optimization

l  Through the Looking Glass: Neural 3D Reconstruction of Transparent Shapes

l  TextureFusion: High-Quality Texture Acquisition for Real-Time RGB-D Scanning

l  D3VO: Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry

l  Deep Implicit Volume Compression

l  MAGSAC++, a Fast, Reliable and Accurate Robust Estimator

l  OctSqueeze: Octree-Structured Entropy Model for LiDAR Compression

l  4D Association Graph for Realtime Multi-Person Motion Capture Using Multiple Video Cameras

l  Upgrading Optical Flow to 3D Scene Flow Through Optical Expansion

l  Robust 3D Self-Portraits in Seconds

 

Computational Photography

Papers whose main focus is the "image" itself. Super-resolution is also included here. (oral)

l  FastDVDnet: Towards Real-Time Deep Video Denoising Without Flow Estimation

l  Learning to Have an Ear for Face Super-Resolution

l  Deep Optics for Single-Shot High-Dynamic-Range Imaging

l  Learning Rank-1 Diffractive Optics for Single-Shot High Dynamic Range Imaging

l  Deep White-Balance Editing

l  Non-Line-of-Sight Surface Reconstruction Using the Directional Light-Cone Transform

l  Seeing the World in a Bag of Chips

l  Correction Filter for Single Image Super-Resolution: Robustifying Off-the-Shelf Deep Super-Resolvers

l  Retina-Like Visual Image Reconstruction via Spiking Neural Model

l  Plug-and-Play Algorithms for Large-Scale Snapshot Compressive Imaging

 

Efficient Training and Inference

Papers whose main focus is "efficiency". This also covers speeding models up and reaching good accuracy efficiently. (oral)

l  Neural Network Pruning With Residual-Connections and Limited-Data

l  AdderNet: Do We Really Need Multiplications in Deep Learning?

l  NeuralScale: Efficient Scaling of Neurons for Resource-Constrained Deep Neural Networks

l  Training Quantized Neural Networks With a Full-Precision Auxiliary Module

l  Neural Networks Are More Productive Teachers Than Human Raters: Active Mixup for Data-Efficient Knowledge Distillation From a Blackbox Model

l  Multi-Dimensional Pruning: A Unified Framework for Model Compression

l  Towards Efficient Model Compression via Learned Global Ranking

l  HRank: Filter Pruning Using High-Rank Feature Map

l  DMCP: Differentiable Markov Channel Pruning for Neural Networks

l  ReSprop: Reuse Sparsified Backpropagation

l  Adversarial Texture Optimization From RGB-D Scans
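Several papers in the Efficient Training and Inference session above are about filter pruning, that is, removing the channels that contribute least. As a rough illustration only, the sketch below uses the simple L1-norm magnitude criterion, a common baseline rather than the criterion of any specific paper listed here (HRank, for instance, ranks filters by the rank of their feature maps instead). The layer weights are hypothetical.

```python
from __future__ import annotations


def l1_filter_prune(filters: list[list[float]], keep_ratio: float) -> list[int]:
    """Return indices of the filters to keep, ranked by L1 norm.

    Each filter is its weights flattened into a list. Filters with a small
    L1 norm are treated as least important and dropped; at least one is kept.
    """
    if not 0 < keep_ratio <= 1:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = max(1, round(len(filters) * keep_ratio))
    norms = [sum(abs(v) for v in f) for f in filters]
    ranked = sorted(range(len(filters)), key=lambda i: norms[i], reverse=True)
    return sorted(ranked[:n_keep])  # keep the original channel order


# Hypothetical 4-filter layer, pruned to half its width.
layer = [[0.1, -0.2], [1.0, 0.5], [-0.05, 0.0], [0.3, -0.9]]
print(l1_filter_prune(layer, 0.5))  # [1, 3]
```

In a real network the kept indices would also be used to slice the next layer's input channels, and the pruned model is usually fine-tuned afterwards to recover accuracy.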

 

3D From Multiview and Sensors; Computational Photography; Efficient Training and Inference Methods for Networks

From here on are the poster papers for the three sessions above, combined.

l  Synchronizing Probability Measures on Rotations via Optimal Transport

l  GhostNet: More Features From Cheap Operations

l  Attention-Aware Multi-View Stereo

l  Bi3D: Stereo Depth Estimation via Binary Classifications

l  Joint Filtering of Intensity Images and Neuromorphic Events for High-Resolution Noise-Robust Imaging

l  SGAS: Sequential Greedy Architecture Search

l  HVNet: Hybrid Voxel Network for LiDAR Based 3D Object Detection

l  Frequency Domain Compact 3D Convolutional Neural Networks

l  Single-Image HDR Reconstruction by Learning to Reverse the Camera Pipeline

l  DNU: Deep Non-Local Unrolling for Computational Spectral Imaging

l  Single Image Optical Flow Estimation With an Event Camera

l  Multi-View Neural Human Rendering

l  Depth Sensing Beyond LiDAR Range

l  Event Probability Mask (EPM) and Event Denoising Convolutional Neural Network (EDnCNN) for Neuromorphic Cameras

l  Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud

l  Self-Learning Video Rain Streak Removal: When Cyclic Consistency Meets Temporal Correspondence

l  Neuromorphic Camera Guided High Dynamic Range Imaging

l  Learning in the Frequency Domain

l  Polarized Reflection Removal With Perfect Alignment in the Wild

l  Learning Multiview 3D Point Cloud Registration

l  A Sparse Resultant Based Method for Efficient Minimal Solvers

l  Zero-Reference Deep Curve Estimation for Low-Light Image Enhancement

l  BlendedMVS: A Large-Scale Dataset for Generalized Multi-View Stereo Networks

l  Convolution in the Cloud: Learning Deformable Kernels in 3D Graph Convolution Networks for Point Cloud Analysis

l  A Semi-Supervised Assessor of Neural Architectures

l  Learning a Reinforced Agent for Flexible Exposure Bracketing Selection

l  CARS: Continuous Evolution for Efficient Neural Architecture Search

l  Joint 3D Instance Segmentation and Object Detection for Autonomous Driving

l  View-GCN: View-Based Graph Convolutional Network for 3D Shape Analysis

l  Collaborative Distillation for Ultra-Resolution Universal Style Transfer

l  TomoFluid: Reconstructing Dynamic Fluid From Sparse View Videos

l  Instance Shadow Detection

l  Self2Self With Dropout: Learning Self-Supervised Denoising From Single Image

l  Discrete Model Compression With Resource Constraint for Deep Neural Networks

l  Structured Compression by Weight Encryption for Unstructured Pruning and Quantization

l  End-to-End Learning Local Multi-View Descriptors for 3D Point Clouds

l  Minimal Solutions for Relative Pose With a Single Affine Correspondence

l  Point Cloud Completion by Skip-Attention Network With Hierarchical Folding

l  Fast-MVSNet: Sparse-to-Dense Multi-View Stereo With Learned Propagation and Gauss-Newton Refinement

l  AANet: Adaptive Aggregation Network for Efficient Stereo Matching

l  Towards Unified INT8 Training for Convolutional Neural Network

l  Active 3D Motion Visualization Based on Spatiotemporal Light-Ray Integration

l  Block-Wisely Supervised Neural Architecture Search With Knowledge Distillation

l  GreedyNAS: Towards Fast One-Shot NAS With Greedy Supernet

l  Learning Filter Pruning Criteria for Deep Convolutional Neural Networks Acceleration

l  DIST: Rendering Deep Implicit Signed Distance Function With Differentiable Sphere Tracing

l  Visually Imbalanced Stereo Matching

l  Mesh-Guided Multi-View Stereo With Pyramid Architecture

l  BiDet: An Efficient Binarized Object Detector

l  Local Non-Rigid Structure-From-Motion From Diffeomorphic Mappings

l  Seeing Around Street Corners: Non-Line-of-Sight Detection and Tracking In-the-Wild Using Doppler Radar

l  APQ: Joint Search for Network Architecture, Pruning and Quantization Policy

l  On the Acceleration of Deep Learning Model Parallelism With Staleness

l  RevealNet: Seeing Behind Objects in RGB-D Scans

l  MemNAS: Memory-Efficient Neural Architecture Search With Grow-Trim Learning

l  StegaStamp: Invisible Hyperlinks in Physical Photographs

l  L2-GCN: Layer-Wise and Learned Efficient Training of Graph Convolutional Networks

l  Polarized Non-Line-of-Sight Imaging

l  AdaBits: Neural Network Quantization With Adaptive Bit-Widths

l  Multi-Scale Boosted Dehazing Network With Dense Feature Fusion

l  ClusterVO: Clustering Moving Instances and Estimating Visual Odometry for Self and Surroundings

l  Automatic Neural Network Compression by Sparsity-Quantization Joint Learning: A Constrained Optimization-Based Approach

l  Normal Assisted Stereo Depth Estimation

l  Fusing Wearable IMUs With Multi-View Images for Human Pose Estimation: A Geometric Approach

l  gDLS*: Generalized Pose-and-Scale Estimation Given Scale and Gravity Priors

l  Embodied Language Grounding With 3D Visual Feature Representations

l  Learning to Autofocus

l  Joint Demosaicing and Denoising With Self Guidance

l  Forward and Backward Information Retention for Accurate Binary Neural Networks

l  Light Field Spatial Super-Resolution via Deep Combinatorial Geometry Embedding and Structural Consistency Regularization

l  A Multi-Hypothesis Approach to Color Constancy

l  Learning to Restore Low-Light Images via Decomposition-and-Enhancement

l  Background Matting: The World Is Your Green Screen

l  Supervised Raw Video Denoising With a Benchmark Dataset on Dynamic Scenes

l  Photometric Stereo via Discrete Hypothesis-and-Test Search

l  Dynamic Convolutions: Exploiting Spatial Sparsity for Faster Inference

l  Fixed-Point Back-Propagation Training

l  Heterogeneous Knowledge Distillation Using Information Flow Modeling

l  Rethinking Differentiable Search for Mixed-Precision Neural Networks

l  Residual Feature Aggregation Network for Image Super-Resolution

l  Resolution Adaptive Networks for Efficient Inference

l  Learning to Forget for Meta-Learning

l  Deep Learning for Handling Kernel/model Uncertainty in Image Deconvolution

l  Reflection Scene Separation From a Single Image

l  Wavelet Synthesis Net for Disparity Estimation to Synthesize DSLR Calibre Bokeh Effect on Smartphones

l  Bundle Adjustment on a Graph Processor

l  3D-ZeF: A 3D Zebrafish Tracking Benchmark Dataset

l  PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models

l  Scalability in Perception for Autonomous Driving: Waymo Open Dataset

 

3D From a Single Image and Shape-From-X (2); 3D From Multiview and Sensors (2)

Part 2 of the sessions that already appeared above. (oral)

l  Extreme Relative Pose Network Under Hybrid Representations

l  Single-Shot Monocular RGB-D Imaging Using Uneven Double Refraction

l  Inverse Rendering for Complex Indoor Scenes: Shape, Spatially-Varying Lighting and SVBRDF From a Single Image

l  3D Packing for Self-Supervised Monocular Depth Estimation

l  Cascade Cost Volume for High-Resolution Multi-View Stereo and Stereo Matching

l  From Two Rolling Shutters to One Global Shutter

l  Deep Global Registration

l  Deep Stereo Using Adaptive Thin Volume Representation With Uncertainty Awareness

l  Why Having 10,000 Parameters in Your Camera Model Is Better Than Twelve

l  Blur Aware Calibration of Multi-Focus Plenoptic Camera

l  Learning Fused Pixel and Feature-Based View Reconstructions for Light Fields

l  SAL: Sign Agnostic Learning of Shapes From Raw Data

l  Google Landmarks Dataset v2 - A Large-Scale Benchmark for Instance-Level Recognition and Retrieval

 

Image Retrieval; Datasets and Evaluation

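Papers on building datasets and evaluation methods for tasks such as image retrieval and person search; in short, think of this as the session where new datasets are announced. The fields vary, from a 3D intracranial aneurysm dataset to a dataset for fashion retrieval, and papers that propose a new model but also build or evaluate a dataset are grouped here as well. (oral)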

l  Instance Guided Proposal Network for Person Search

l  Which Is Plagiarism: Fashion Image Retrieval Based on Regional Representation for Design Protection

l  Inter-Task Association Critic for Cross-Resolution Person Re-Identification

l  FineGym: A Hierarchical Video Dataset for Fine-Grained Action Understanding

l  Mapillary Street-Level Sequences: A Dataset for Lifelong Place Recognition

l  BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning

l  Rethinking Computer-Aided Tuberculosis Diagnosis

l  IntrA: 3D Intracranial Aneurysm Dataset for Deep Learning

l  Revisiting Saliency Metrics: Farthest-Neighbor Area Under Curve

l  Computing the Testing Error Without a Testing Set

l  Improving Confidence Estimates for Unfamiliar Examples

l  CycleISP: Real Image Restoration via Improved Data Synthesis

 

Low-Level and Physics-Based Vision

Papers on degraded data and noisy images. Most of them deal with restoration and denoising. (oral)

l  Enhanced Blind Face Restoration With Multi-Exemplar Images and Adaptive Spatial Feature Fusion

l  Explorable Super Resolution

l  Syn2Real Transfer Learning for Image Deraining Using Gaussian Processes

l  Deblurring by Realistic Blurring

l  Bringing Old Photos Back to Life

l  A Physics-Based Noise Formation Model for Extreme Low-Light Raw Denoising

l  Learning to Super Resolve Intensity Images From Events

l  Camouflaged Object Detection

l  Holistically-Attracted Wireframe Parsing

 

3D From a Single Image and Shape-From-X; 3D From Multiview and Sensors; Image Retrieval; Datasets and Evaluation; Low-Level and Physics-Based Vision

From here on are the poster papers for the four sessions above, combined.

l  Conv-MPN: Convolutional Message Passing Neural Network for Structured Outdoor Architecture Reconstruction

l  Domain Adaptation for Image Dehazing

l  Auto-Encoding Twin-Bottleneck Hashing

l  Agriculture-Vision: A Large Aerial Image Database for Agricultural Pattern Analysis

l  Bi-Directional Interaction Network for Person Search

l  Meshlet Priors for 3D Mesh Reconstruction

l  Space-Time-Aware Multi-Resolution Video Enhancement

l  FSS-1000: A 1000-Class Dataset for Few-Shot Segmentation

l  MSeg: A Composite Dataset for Multi-Domain Semantic Segmentation

l  DeeperForensics-1.0: A Large-Scale Dataset for Real-World Face Forgery Detection

l  Learning Multi-Granular Hypergraphs for Video-Based Person Re-Identification

l  Online Joint Multi-Metric Adaptation From Frequent Sharing-Subset Mining for Person Re-Identification

l  Taking a Deeper Look at Co-Salient Object Detection

l  Single-Stage 6D Object Pose Estimation

l  OccuSeg: Occupancy-Aware 3D Instance Segmentation

l  Camera Trace Erasing

l  Deep Metric Learning via Adaptive Learnable Assessment

l  Deep Representation Learning on Long-Tailed Data: A Learnable Embedding Augmentation Perspective

l  Fantastic Answers and Where to Find Them: Immersive Question-Directed Visual Attention

l  HUMBI: A Large Multiview Dataset of Human Body Expressions

l  Image Search With Text Feedback by Visiolinguistic Attention Learning

l  Image Processing Using Multi-Code GAN Prior

l  What Does Plate Glass Reveal About Camera Calibration?

l  Zero-Assignment Constraint for Graph Matching With Outliers

l  Cascaded Deep Video Deblurring Using Temporal Sharpness Prior

l  JL-DCF: Joint Learning and Densely-Cooperative Fusion Framework for RGB-D Salient Object Detection

l  From Fidelity to Perceptual Quality: A Semi-Supervised Approach for Low-Light Image Enhancement

l  Unsupervised Adaptation Learning for Hyperspectral Imagery Super-Resolution

l  Central Similarity Quantization for Efficient Image and Video Retrieval

l  ARCH: Animatable Reconstruction of Clothed Humans

l  A Model-Driven Deep Neural Network for Single Image Rain Removal

l  Novel Object Viewpoint Estimation Through Reconstruction Alignment

l  Creating Something From Nothing: Unsupervised Knowledge Distillation for Cross-Modal Hashing

l  Evaluating Weakly Supervised Object Localization Methods Right

l  Style Normalization and Restitution for Generalizable Person Re-Identification

l  Reconstruct Locally, Localize Globally: A Model Free Method for Object Pose Estimation

l  RoboTHOR: An Open Simulation-to-Real Embodied AI Platform

l  All in One Bad Weather Removal Using Architectural Search

l  Relation-Aware Global Attention for Person Re-Identification

l  HOnnotate: A Method for 3D Annotation of Hand and Object Poses

l  Celeb-DF: A Large-Scale Challenging Dataset for DeepFake Forensics

l  Deep Unfolding Network for Image Super-Resolution

l  On the Uncertainty of Self-Supervised Monocular Depth Estimation

l  Proxy Anchor Loss for Deep Metric Learning

l  Unsupervised Learning for Intrinsic Image Decomposition From a Single Image

l  Multi-Domain Learning for Accurate and Few-Shot Color Constancy

l  PANDA: A Gigapixel-Level Human-Centric Video Dataset

l  Cross-View Tracking for Multi-Human 3D Pose Estimation at Over 100 FPS

l  Spatial-Temporal Graph Convolutional Network for Video-Based Person Re-Identification

l  Salience-Guided Cascaded Suppression Network for Person Re-Identification

l  Fashion Outfit Complementary Item Retrieval

l  Learning Event-Based Motion Deblurring

l  Domain Decluttering: Simplifying Images to Mitigate Synthetic-Real Domain Shift and Improve Depth Estimation

l  Neural Blind Deconvolution Using Deep Priors

l  Anisotropic Convolutional Networks for 3D Semantic Scene Completion

l  TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution

l  Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution

l  Fast MSER

l  Unsupervised Person Re-Identification via Softened Similarity Learning

l  COCAS: A Large-Scale Clothes Changing Person Dataset for Re-Identification

l  Learning Formation of Physically-Based Face Attributes

l  Generalized Product Quantization Network for Semi-Supervised Image Retrieval

l  Stereoscopic Flash and No-Flash Photography for Shape and Albedo Recovery

l  Context-Aware Group Captioning via Self-Attention and Contrastive Features

l  MEBOW: Monocular Estimation of Body Orientation in the Wild

l  Distilling Image Dehazing With Heterogeneous Task Imitation

l  Select, Supplement and Focus for RGB-D Saliency Detection

l  Transfer Learning From Synthetic to Real-Noise Denoising With Adaptive Instance Normalization

l  On Joint Estimation of Pose, Geometry and svBRDF From a Handheld Scanner

l  Differentiable Volumetric Rendering: Learning Implicit 3D Representations Without 3D Supervision

l  Meta-Transfer Learning for Zero-Shot Super-Resolution

l  Solving Jigsaw Puzzles With Eroded Boundaries

l  Context-Aware Attention Network for Image-Text Retrieval

l  M-LVC: Multiple Frames Prediction for Learned Video Compression

l  Efficient Dynamic Scene Deblurring Using Spatially Variant Deconvolution Network With Optical Flow Guided Training

l  Single Image Reflection Removal Through Cascaded Refinement

l  From Patches to Pictures (PaQ-2-PiQ): Mapping the Perceptual Space of Picture Quality

l  Video to Events: Recycling Video Datasets for Event Cameras

l  Composed Query Image Retrieval Using Locally Bounded Features

l  Spatially-Attentive Patch-Hierarchical Network for Adaptive Motion Deblurring

l  End-to-End Illuminant Estimation Based on Deep Metric Learning

l  Variational-EM-Based Deep Learning for Noise-Blind Image Deblurring

l  Image Demoireing with Learnable Bandpass Filters

l  Assessing Image Quality Issues for Real-World Problems

l  Memory-Efficient Hierarchical Neural Architecture Search for Image Denoising

l  Blindly Assess Image Quality in the Wild Guided by a Self-Adaptive Hyper Network

l  Perceptual Quality Assessment of Smartphone Photography

l  Don't Hit Me! Glass Detection in Real-World Scenes

l  Progressive Mirror Detection

 

Scene Analysis and Understanding

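Papers on scene understanding. They include work on scene graph generation (SGG) and work on the gap between synthetic and real data. When data is scarce, models are often trained on synthetic data first and then on real data to narrow that gap, but this does not measure how far apart the synthetic and real distributions actually are; these papers aim to understand the gap by evaluating it. (oral)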

l  Category-Level Articulated Object Pose Estimation

l  Unbiased Scene Graph Generation From Biased Training

l  Dynamic Graph Message Passing Networks

l  Weakly Supervised Visual Semantic Parsing

l  GPS-Net: Graph Property Sensing Network for Scene Graph Generation

l  End-to-End Optimization of Scene Layout

l  Unsupervised Intra-Domain Adaptation for Semantic Segmentation Through Self-Supervision

l  Dual Super-Resolution Learning for Semantic Segmentation

l  Self-Supervised Scene De-Occlusion

l  BANet: Bidirectional Aggregation Network With Occlusion Handling for Panoptic Segmentation
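For readers new to scene graph generation (SGG): the output is usually a graph whose nodes are detected objects and whose edges are <subject, predicate, object> relationships. The sketch below is a minimal, hypothetical data structure for that output, assuming plain string labels for objects and predicates. It is not the representation used by any particular paper above, and it omits the bounding boxes and confidence scores a real SGG model would also produce.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Objects are nodes; relationships are directed, labelled edges."""

    objects: list[str] = field(default_factory=list)
    # Each relation is (subject index, predicate, object index).
    relations: list[tuple[int, str, int]] = field(default_factory=list)

    def add_object(self, label: str) -> int:
        self.objects.append(label)
        return len(self.objects) - 1

    def relate(self, subj: int, predicate: str, obj: int) -> None:
        self.relations.append((subj, predicate, obj))

    def triples(self) -> list[tuple[str, str, str]]:
        """Human-readable <subject, predicate, object> triples."""
        return [(self.objects[s], p, self.objects[o]) for s, p, o in self.relations]


g = SceneGraph()
man, horse, hat = g.add_object("man"), g.add_object("horse"), g.add_object("hat")
g.relate(man, "riding", horse)
g.relate(man, "wearing", hat)
print(g.triples())  # [('man', 'riding', 'horse'), ('man', 'wearing', 'hat')]
```

Storing relations as index triples keeps two objects with the same label (say, two "man" nodes) distinct, which plain string triples cannot do.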

 

Medical, Biological and Cell Microscopy

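Papers related to medicine. For example, processing gigapixel whole slide images from microscopy has become common in recent years, and one study here argues that, just as a physician does, a model should look only at the suspicious regions. If you work in a medical field, this is the session to check. (oral)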

l  CPR-GCN: Conditional Partial-Residual Graph Convolutional Network in Automated Anatomical Labeling of Coronary Arteries

l  Cross-View Correspondence Reasoning Based on Bipartite Graph Convolutional Network for Mammogram Mass Detection

l  MPM: Joint Representation of Motion and Position Map for Cell Tracking

l  Deep Distance Transform for Tubular Structure Segmentation in CT Scans

l  Instance Segmentation of Biological Images Using Harmonic Embeddings

l  Multi-scale Domain-adversarial Multiple-instance CNN for Cancer Subtype Classification with Unannotated Histopathological Images

l  SOS: Selective Objective Switch for Rapid Immunofluorescence Whole Slide Image Classification
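The "look only at the suspicious regions" idea can be sketched as a two-stage pipeline: a cheap screening function scores every patch of a tiled whole slide image, and a more expensive classifier runs only on the top-k patches. This is a generic, hypothetical illustration (`screen`, `classify`, the float stand-ins for patches, and the mean aggregation are all assumptions), not the method of the SOS paper or any other paper listed above.

```python
from __future__ import annotations

import heapq
from collections.abc import Callable, Sequence


def select_top_k(scores: Sequence[float], k: int) -> list[int]:
    """Indices of the k highest-scoring patches (ties keep the earlier patch)."""
    top = heapq.nlargest(k, enumerate(scores), key=lambda pair: pair[1])
    return [i for i, _ in top]


def classify_slide(
    patches: Sequence[float],
    screen: Callable[[float], float],
    classify: Callable[[float], float],
    k: int,
) -> tuple[float, list[int]]:
    """Screen every patch cheaply, then run the expensive classifier on k patches only."""
    scores = [screen(p) for p in patches]               # cheap pass over all patches
    selected = select_top_k(scores, k)
    probs = [classify(patches[i]) for i in selected]    # expensive pass, k calls
    return sum(probs) / len(probs), selected            # slide score = mean over selected


# Hypothetical slide of 5 patches, each reduced to one number for illustration.
patches = [0.1, 0.9, 0.3, 0.8, 0.2]
score, selected = classify_slide(patches, screen=lambda p: p, classify=lambda p: p, k=2)
print(selected)  # [1, 3]
```

Because `classify` is called only k times, the cost of the expensive model no longer grows with the number of patches in the gigapixel slide; the trade-off is that anything the screening pass misses is never examined.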

 

Transfer/Low-Shot/Semi/Unsupervised Learning (1)

This session gathers transfer, low-shot, semi-supervised, and unsupervised learning. Some of the papers are quite surprising, so this might be a good place to start browsing. (oral)

l  Task Agnostic Robust Learning on Corrupt Outputs by Correlation-Guided Mixture Density Networks

l  METAL: Minimum Effort Temporal Activity Localization in Untrimmed Videos

l  Neural Data Server: A Large-Scale Search Engine for Transfer Learning Data

l  Revisiting Knowledge Distillation via Label Smoothing Regularization

l  WCP: Worst-Case Perturbations for Semi-Supervised Deep Learning

l  DEPARA: Deep Attribution Graph for Deep Knowledge Transferability

l  Conditional Channel Gated Networks for Task-Aware Continual Learning

l  Towards Discriminability and Diversity: Batch Nuclear-Norm Maximization Under Label Insufficient Situations

 

Scene Analysis and Understanding; Medical, Biological and Cell Microscopy; Transfer/Low-Shot/Semi/Unsupervised Learning

From here on are the poster papers for the three sessions above, combined.

l  FocalMix: Semi-Supervised Learning for 3D Medical Image Detection

l  Learning 3D Semantic Scene Graphs From 3D Indoor Reconstructions

l  Self-Supervised Viewpoint Learning From Image Collections

l  Two-Shot Spatially-Varying BRDF and Shape Estimation

l  Variational Context-Deformable ConvNets for Indoor Scene Parsing

l  Strip Pooling: Rethinking Spatial Pooling for Scene Parsing

l  Few-Shot Object Detection With Attention-RPN and Multi-Relation Detector

l  What Can Be Transferred: Unsupervised Domain Adaptation for Endoscopic Lesions Segmentation

l  ADINet: Attribute Driven Incremental Network for Retinal Image Classification

l  Unsupervised Domain Adaptation With Hierarchical Gradient Synchronization

l  Deep Grouping Model for Unified Perceptual Parsing

l  Where Am I Looking At? Joint Location and Orientation Estimation by Cross-View Matching

l  Gum-Net: Unsupervised Geometric Matching for Fast and Accurate 3D Subtomogram Image Alignment and Averaging

l  FDA: Fourier Domain Adaptation for Semantic Segmentation

l  Foreground-Aware Relation Network for Geospatial Object Segmentation in High Spatial Resolution Remote Sensing Imagery

l  When2com: Multi-Agent Perception via Communication Graph Grouping

l  Learning Human-Object Interaction Detection Using Interaction Points

l  C2FNAS: Coarse-to-Fine Neural Architecture Search for 3D Medical Image Segmentation

l  Adaptive Subspaces for Few-Shot Learning

l  Learning to Detect Important People in Unlabelled Images for Semi-Supervised Important People Detection

l  Stochastic Sparse Subspace Clustering

l  CRNet: Cross-Reference Networks for Few-Shot Segmentation

l  Shoestring: Graph-Based Semi-Supervised Classification With Severely Limited Labeled Data

l  Uninformed Students: Student-Teacher Anomaly Detection With Discriminative Latent Embeddings

l  3D Sketch-Aware Semantic Scene Completion via Semi-Supervised Structure Prior

l  Graph-Guided Architecture Search for Real-Time Semantic Segmentation

l  Composing Good Shots by Exploiting Mutual Relations

l  Organ at Risk Segmentation for Head and Neck Cancer Using Stratified Learning and Neural Architecture Search

l  G2L-Net: Global to Local Network for Real-Time 6D Pose Estimation With Embedding Vector Features

l  Unsupervised Instance Segmentation in Microscopy Images via Panoptic Domain Adaptation and Task Re-Weighting

l  Single-Stage Semantic Segmentation From Image Labels

l  Cascaded Human-Object Interaction Recognition

l  DuDoRNet: Learning a Dual-Domain Recurrent Network for Fast MRI Reconstruction With Deep T1 Prior

l  Learning Integral Objects With Intra-Class Discriminator for Weakly-Supervised Semantic Segmentation

l  FPConv: Learning Local Flattening for Point Convolution

l  Rotation Equivariant Graph Convolutional Network for Spherical Image Classification

l  FOAL: Fast Online Adaptive Learning for Cardiac Motion Estimation

l  ScrabbleGAN: Semi-Supervised Varying Length Handwritten Text Generation

l  Cross-Domain Semantic Segmentation via Domain-Invariant Interactive Relation Transfer

l  Inflated Episodic Memory With Region Self-Attention for Long-Tailed Visual Recognition

l  Multimodal Future Localization and Emergence Prediction for Objects in Egocentric View With a Reachability Prior

l  Structure Preserving Generative Cross-Domain Learning

l  Reverse Perspective Network for Perspective-Aware Object Counting

l  Multi-Path Region Mining for Weakly Supervised 3D Semantic Segmentation on Point Clouds

l  Reliable Weighted Optimal Transport for Unsupervised Domain Adaptation

l  ImVoteNet: Boosting 3D Object Detection in Point Clouds With Image Votes

l  Understanding Road Layout From Videos as a Whole

l  Bi-Directional Relationship Inferring Network for Referring Image Segmentation

l  Perspective Plane Program Induction From a Single Image

l  DeepFLASH: An Efficient Network for Learning-Based Medical Image Registration

l  Semi-Supervised Learning for Few-Shot Image-to-Image Translation

l  Semantic Correspondence as an Optimal Transport Problem

l  How Much Time Do You Have? Modeling Multi-Duration Saliency

l  Fine-Grained Generalized Zero-Shot Learning via Dense Attribute-Based Attention

l  Online Depth Learning Against Forgetting in Monocular Videos

l  Few-Shot Learning of Part-Specific Probability Space for 3D Shape Segmentation

l  Pattern-Structure Diffusion for Multi-Task Learning

l  Training Noise-Robust Deep Neural Networks via Meta-Learning

l  Fusion-Aware Point Convolution for Online Semantic 3D Scene Segmentation

l  Universal Source-Free Domain Adaptation

l  Exploring Spatial-Temporal Multi-Frequency Analysis for High-Fidelity and Temporal-Consistency Video Prediction

l  Varicolored Image De-Hazing

l  SpSequenceNet: Semantic Segmentation Network on 4D Point Clouds

l  Separating Particulate Matter From a Single Microscopic Image

l  Adaptive Dilated Network With Self-Correction Supervision for Counting

l  PointPainting: Sequential Fusion for 3D Object Detection

l  Rethinking Zero-Shot Video Classification: End-to-End Training for Realistic Applications

l  Learning to Select Base Classes for Few-Shot Classification

l  CONSAC: Robust Multi-Model Fitting by Conditional Sample Consensus

l  Fast Symmetric Diffeomorphic Image Registration with Convolutional Neural Networks

l  Distilled Semantics for Comprehensive Scene Understanding from Videos

l  Modeling Biological Immunity to Adversarial Examples

l  DOA-GAN: Dual-Order Attentive Generative Adversarial Network for Image Copy-Move Forgery Detection and Localization

l  Correspondence-Free Material Reconstruction using Sparse Surface Constraints

l  Augmenting Colonoscopy Using Extended and Directional CycleGAN for Lossy Image Translation

l  Attention Scaling for Crowd Counting

l  Shape Reconstruction by Learning Differentiable Surface Representations

l  A Spatiotemporal Volumetric Interpolation Network for 4D Dynamic Medical Image

l  Attention-Based Context Aware Reasoning for Situation Recognition

l  PatchVAE: Learning Local Latent Codes for Recognition

l  Self-Supervised Monocular Trained Depth Estimation Using Self-Attention and Discrete Disparity Volume

l  STAViS: Spatio-Temporal AudioVisual Saliency Network

l  More Grounded Image Captioning by Distilling Image-Text Matching Model

l  DUNIT: Detection-Based Unsupervised Image-to-Image Translation

l  Learning to Observe: Approximating Human Perceptual Thresholds for Detection of Suprathreshold Image Transformations

l  Show, Edit and Tell: A Framework for Editing Image Captions

l  Structure Boundary Preserving Segmentation for Medical Image With Ambiguous Boundary

l  Predicting Cognitive Declines Using Longitudinally Enriched Representations for Imaging Biomarkers

l  Predicting Lymph Node Metastasis Using Histopathological Images Based on Multiple Instance Learning With Deep Graph Convolution

l  Extremely Dense Point Correspondences Using a Learned Feature Descriptor

3D From Multiview and Sensors (3)

This is the final oral session of 3D From Multiview and Sensors. Its posters come further below, so if 3D From Multiview and Sensors interests you, keep reading until you reach them. (oral)

l  Local Deep Implicit Functions for 3D Shape

l  PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation

l  Cost Volume Pyramid Based Depth Inference for Multi-View Stereo

l  RoutedFusion: Learning Real-Time Depth Map Fusion

l  VOLDOR: Visual Odometry From Log-Logistic Dense Optical Flow Residuals

l  Learning to Optimize Non-Rigid Tracking

l  KFNet: Learning Temporal Camera Relocalization Using Kalman Filtering

l  Information-Driven Direct RGB-D Odometry

l  SuperGlue: Learning Feature Matching With Graph Neural Networks

l  Reinforced Feature Points: Optimizing Feature Detection and Description for a High-Level Task

 

Face, Gesture, and Body Pose (1)

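Papers on faces and bodies. This session also covers face spoofing (a face-security problem) and 3D faces. Recommended for anyone working on face recognition technology, including those doing this kind of work in industry. (oral)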

l  ReDA:Reinforced Differentiable Attribute for 3D Face Reconstruction

l  EventCap: Monocular 3D Capture of High-Speed Human Motions Using an Event Camera

l  Cross-Modal Deep Face Normals With Deactivable Skip Connections

l  Weakly-Supervised Mesh-Convolutional Hand Reconstruction in the Wild

l  Face X-Ray for More General Face Forgery Detection

l  A Morphable Face Albedo Model

l  Cascade EF-GAN: Progressive Facial Expression Editing With Local Focuses

l  GanHand: Predicting Human Grasp Affordances in Multi-Object Scenes

l  Deep Spatial Gradient and Temporal Depth Learning for Face Anti-Spoofing

l  DeepCap: Monocular Human Performance Capture Using Weak Supervision

l  Attention Mechanism Exploits Temporal Contexts: Real-Time 3D Human Pose Reconstruction

l  Advancing High Fidelity Identity Swapping for Forgery Detection

 

Image and Video Synthesis (1)

The emphasis here is on synthesis: papers that generate something new, whether GAN-based or not. (oral)

l  Controllable Person Image Synthesis With Attribute-Decomposed GAN

l  Attentive Normalization for Conditional Image Generation

l  SEAN: Image Synthesis With Semantic Region-Adaptive Normalization

l  Blurry Video Frame Interpolation

l  Learning Physics-Guided Face Relighting Under Directional Light

l  Disentangled Image Generation Through Structured Noise Injection

l  Cross-Domain Correspondence Learning for Exemplar-Based Image Translation

l  Disentangled and Controllable Face Image Generation via 3D Imitative-Contrastive Learning

l  Single Image Reflection Removal With Physically-Based Training Images

l  SketchyCOCO: Image Generation From Freehand Scene Sketches

l  Image Based Virtual Try-On Network From Unpaired Data

l  PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer

 

3D From Multiview and Sensors; Face, Gesture, and Body Pose; Image and Video Synthesis

The following posters cover the three sessions above.

l  RetinaFace: Single-Shot Multi-Level Face Localisation in the Wild

l  Semantic Image Manipulation Using Scene Graphs

l  A Stochastic Conditioning Scheme for Diverse Human Motion Prediction

l  Transferring Dense Pose to Proximal Animal Classes

l  Weakly-Supervised 3D Human Pose Learning via Multi-View Images in the Wild

l  VIBE: Video Inference for Human Body Pose and Shape Estimation

l  G3AN: Disentangling Appearance and Motion for Video Generation

l  Domain Adaptive Image-to-Image Translation

l  GAN Compression: Efficient Architectures for Interactive Conditional GANs

l  Searching Central Difference Convolutional Networks for Face Anti-Spoofing

l  TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting

l  AdaCoF: Adaptive Collaboration of Flows for Video Frame Interpolation

l  FReeNet: Multi-Identity Face Reenactment

l  Novel View Synthesis of Dynamic Scenes With Globally Coherent Depths From a Monocular Camera

l  Monocular Real-Time Hand Shape and Motion Capture Using Multi-Modal Data

l  The GAN That Warped: Semantic Attribute Editing With Unpaired Data

l  4D Visualization of Dynamic Events From Unconstrained Multi-View Videos

l  Global-Local Bidirectional Reasoning for Unsupervised Representation Learning of 3D Point Clouds

l  HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation

l  Detecting Attended Visual Targets in Video

l  Closed-Loop Matters: Dual Regression Networks for Single Image Super-Resolution

l  Neural Voxel Renderer: Learning an Accurate and Controllable Rendering Tool

l  Neural Contours: Learning to Draw Lines From 3D Shapes

l  Softmax Splatting for Video Frame Interpolation

l  CIAGAN: Conditional Identity Anonymization Generative Adversarial Networks

l  Probabilistic Structural Latent Representation for Unsupervised Embedding

l  Semantically Multi-Modal Image Synthesis

l  Nested Scale-Editing for Conditional Image Synthesis

l  UnrealText: Synthesizing Realistic Scene Text Images From the Unreal World

l  Fast Texture Synthesis via Pseudo Optimizer

l  Towards Learning Structure via Consensus for Face Segmentation and Parsing

l  CookGAN: Causality Based Text-to-Image Synthesis

l  Weakly Supervised Discriminative Feature Learning With State Information for Person Identification

l  Future Video Synthesis With Object Motion Prediction

l  MaskGAN: Towards Diverse and Interactive Facial Image Manipulation

l  A Graduated Filter Method for Large Scale Robust Estimation

l  Deep Face Super-Resolution With Iterative Collaboration Between Attentive Recovery and Landmark Estimation

l  Coherent Reconstruction of Multiple Humans From a Single Image

l  PointASNL: Robust Point Clouds Processing Using Nonlocal Neural Networks With Adaptive Sampling

l  A Neural Rendering Framework for Free-Viewpoint Relighting

l  A Multi-Task Mean Teacher for Semi-Supervised Shadow Detection

l  GroupFace: Learning Latent Groups and Constructing Group-Based Representations for Face Recognition

l  Channel Attention Based Iterative Residual Learning for Depth Map Super-Resolution

l  Time Flies: Animating a Still Image With Time-Lapse Video As Reference

l  SER-FIQ: Unsupervised Estimation of Face Image Quality Based on Stochastic Embedding Robustness

l  Grid-GCN for Fast and Scalable Point Cloud Learning

l  Domain Balancing: Face Recognition on Long-Tailed Domains

l  AdversarialNAS: Adversarial Neural Architecture Search for GANs

l  Image Super-Resolution With Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars Mining

l  The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation

l  Data Uncertainty Learning in Face Recognition

l  Regularizing Discriminative Capability of CGANs for Semi-Supervised Generative Learning

l  FM2u-Net: Face Morphological Multi-Branch Network for Makeup-Invariant Face Verification

l  UCTGAN: Diverse Image Inpainting Based on Unsupervised Cross-Space Translation

l  Decoupled Representation Learning for Skeleton-Based Gesture Recognition

l  An Efficient PointLSTM for Point Clouds Based Gesture Recognition

l  Editing in Style: Uncovering the Local Semantics of GANs

l  On the Detection of Digital Face Manipulation

l  Learning Texture Transformer Network for Image Super-Resolution

l  Reference-Based Sketch Image Colorization Using Augmented-Self Reference and Dense Semantic Correspondence

l  Deblurring Using Analysis-Synthesis Networks Pair

l  Exploring Unlabeled Faces for Novel Attribute Discovery

l  Neural Pose Transfer by Spatially Adaptive Instance Normalization

l  Fine-Grained Image-to-Image Transformation Towards Visual Recognition

l  Deep Facial Non-Rigid Multi-View Stereo

l  Attention-Driven Cropping for Very High Resolution Facial Landmark Detection

l  Towards Unsupervised Learning of Generative Models for 3D Controllable Image Synthesis

l  End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection

l  Towards High-Fidelity 3D Face Reconstruction From In-the-Wild Images Using Graph Convolutional Networks

l  CurricularFace: Adaptive Curriculum Learning Loss for Deep Face Recognition

l  Rotate-and-Render: Unsupervised Photorealistic Face Rotation From Single-View Images

l  One-Shot Domain Adaptation for Face Generation

l  BidNet: Binocular Image Dehazing Without Explicit Disparity Estimation

l  Deep Shutter Unrolling Network

l  Joint Texture and Geometry Optimization for RGB-D Reconstruction

l  Deep 3D Capture: Geometry and Reflectance From Sparse Multi-View Images

l  Auto-Tuning Structured Light by Optical Stochastic Gradient Descent

l  MARMVS: Matching Ambiguity Reduced Multiple View Stereo for Efficient Large Scale Scene Reconstruction

l  Uncertainty Based Camera Model Selection

l  Local Implicit Grid Representations for 3D Scenes

l  TetraTSDF: 3D Human Reconstruction From a Single Image With a Tetrahedral Outer Shell

l  Averaging Essential and Fundamental Matrices in Collinear Camera Settings

l  On the Distribution of Minima in Intrinsic-Metric Rotation Averaging

l  Lightweight Multi-View 3D Pose Estimation Through Camera-Disentangled Representation

l  A Novel Recurrent Encoder-Decoder Structure for Large-Scale Multi-View Stereo Reconstruction From an Open Aerial Dataset

l  Factorized Higher-Order CNNs With an Application to Spatio-Temporal Emotion Estimation

l  Effectively Unbiased FID and Inception Score and Where to Find Them

l  Robust Homography Estimation via Dual Principal Component Pursuit

l  Non-Adversarial Video Synthesis With Learned Priors

l  Uncertainty-Aware Mesh Decoder for High Fidelity 3D Face Reconstruction

 

Face, Gesture, and Body Pose (2)

Part 2 of Face, Gesture, and Body Pose, which already appeared above. (oral)

l  3FabRec: Fast Few-Shot Face Alignment by Reconstruction

l  Weakly-Supervised Domain Adaptation via GAN and Mesh Model for Estimating 3D Hand Poses Interacting Objects

l  Vec2Face: Unveil Human Faces From Their Blackbox Features in Face Recognition

l  StyleRig: Rigging StyleGAN for 3D Control Over Portrait Images

l  Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis

l  Learning Meta Face Recognition in Unseen Domains

l  Cascaded Deep Monocular 3D Human Pose Estimation With Evolutionary Training Data

l  GHUM & GHUML: Generative 3D Human Shape and Articulated Pose Models

l  Generating 3D People in Scenes Without People

l  Transferring Cross-Domain Knowledge for Video Sign Language Recognition

l  Bodies at Rest: 3D Human Pose and Shape Estimation From a Pressure Image Using Synthetic Data

l  Bayesian Adversarial Human Motion Synthesis

 

Motion and Tracking (1)

Papers on motion and tracking. (oral)

l  LSM: Learning Subspace Minimization for Low-Level Vision

l  Learning a Neural Solver for Multiple Object Tracking

l  GLU-Net: Global-Local Universal Network for Dense Flow and Correspondences

l  SiamCAR: Siamese Fully Convolutional Classification and Regression for Visual Tracking

l  MaskFlownet: Asymmetric Feature Matching With Learnable Occlusion Mask

l  Tracking by Instance Detection: A Meta-Learning Approach

l  High-Performance Long-Term Tracking With Meta-Updater

l  TubeTK: Adopting Tubes to Track Multi-Object in a One-Step Training Model

l  Collaborative Motion Prediction via Neural Motion Message Passing

l  P2B: Point-to-Box Network for 3D Object Tracking in Point Clouds

l  Self-Supervised Deep Visual Odometry With Online Adaptation

l  Globally Optimal Contrast Maximisation for Event-Based Motion Estimation

Representation Learning

Papers on representation learning. (oral)

l  D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features

l  Towards Backward-Compatible Representation Learning

l  PointAugment: An Auto-Augmentation Framework for Point Cloud Classification

l  Cross-Batch Memory for Embedding Learning

l  Circle Loss: A Unified Perspective of Pair Similarity Optimization

l  Steering Self-Supervised Feature Learning Beyond Local Pixel Statistics

l  Hyperbolic Image Embeddings

l  Controllable Orthogonalization in Training DNNs

l  An Investigation Into the Stochasticity of Batch Whitening

 

Face, Gesture, and Body Pose; Motion and Tracking; Representation Learning

The following posters cover the three sessions above.

l  High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification

l  Same Features, Different Day: Weakly Supervised Feature Learning for Seasonal Invariance

l  Learning to Dress 3D People in Generative Clothing

l  MAST: A Memory-Augmented Self-Supervised Tracker

l  Learning by Analogy: Reliable Supervision From Transformations for Unsupervised Optical Flow Estimation

l  GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking With 2D-3D Multi-Feature Learning

l  ClusterFit: Improving Generalization of Visual Representations

l  Learning Dynamic Relationships for 3D Human Motion Prediction

l  Knowledge As Priors: Cross-Modal Knowledge Generalization for Datasets Without Superior Knowledge

l  S3VAE: Self-Supervised Sequential VAE for Representation Disentanglement and Data Generation

l  Video Playback Rate Perception for Self-Supervised Spatio-Temporal Representation Learning

l  Learning to Manipulate Individual Objects in an Image

l  PADS: Policy-Adapted Sampling for Visual Similarity Learning

l  Siam R-CNN: Visual Tracking by Re-Detection

l  ASLFeat: Learning Local Features of Accurate Shape and Localization

l  Filter Grafting for Deep Neural Networks

l  HOPE-Net: A Graph-Based Model for Hand-Object Pose Estimation

l  DeepFaceFlow: In-the-Wild Dense 3D Facial Motion Estimation

l  Learning for Video Compression With Hierarchical Quality and Recurrent Enhancement

l  Learning Better Lossless Compression Using Lossy Compression

l  Flow2Stereo: Effective Self-Supervised Learning of Optical Flow and Stereo Matching

l  Multi-Scale Fusion Subspace Clustering Using Similarity Constraint

l  Siamese Box Adaptive Network for Visual Tracking

l  Cross-Domain Face Presentation Attack Detection via Multi-Domain Disentangled Representation Learning

l  Online Deep Clustering for Unsupervised Representation Learning

l  Density-Aware Feature Embedding for Face Clustering

l  Self-Supervised Learning of Pretext-Invariant Representations

l  ROAM: Recurrently Optimizing Tracking Model

l  Deformable Siamese Attention Networks for Visual Object Tracking

l  15 Keypoints Is All You Need

l  Optical Flow in the Dark

l  Sketch-BERT: Learning Sketch Bidirectional Encoder Representation From Transformers by Self-Supervised Learning of Sketch Gestalt

l  A Unified Object Motion and Affinity Model for Online Multi-Object Tracking

l  Sub-Frame Appearance and 6D Pose Estimation of Fast Moving Objects

l  How to Train Your Deep Multi-Object Tracker

l  TPNet: Trajectory Proposal Network for Motion Prediction

l  Large Scale Video Representation Learning via Relational Graph Clustering

l  Towards Universal Representation Learning for Deep Face Recognition

l  Robust Partial Matching for Person Search in the Wild

l  Correlation-Guided Attention for Corner Detection Based Visual Tracking

l  Learning Multi-Object Tracking and Segmentation From Automatic Annotations

l  PandaNet: Anchor-Based Single-Shot Multi-Person 3D Pose Estimation

l  Rotation Consistent Margin Loss for Efficient Low-Bit Face Recognition

l  Joint Spatial-Temporal Optimization for Stereo 3D Object Tracking

l  Unity Style Transfer for Person Re-Identification

l  Suppressing Uncertainties for Large-Scale Facial Expression Recognition

l  Multiview-Consistent Semi-Supervised Learning for 3D Human Pose Estimation

l  Regularizing Neural Networks via Minimizing Hyperspherical Energy

l  Learning Representations by Predicting Bags of Visual Words

l  AnimalWeb: A Large-Scale Hierarchical Dataset of Annotated Animal Faces

l  A Transductive Approach for Video Object Segmentation

l  Dynamic Face Video Segmentation via Reinforcement Learning

l  Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion

l  Semantic Drift Compensation for Class-Incremental Learning

l  Context-Aware Human Motion Prediction

l  DeepDeform: Learning Non-Rigid RGB-D Reconstruction With Semi-Supervised Data

l  Optical Non-Line-of-Sight Physics-Based 3D Human Pose Estimation

l  Learning to Transfer Texture From Clothing Images to 3D Humans

l  UniPose: Unified Human Pose Estimation in Single Images and Videos

l  Minimal Solutions to Relative Pose Estimation From Two Views Sharing a Common Direction With Unknown Focal Length

l  3D Human Mesh Regression With Dense Correspondence

l  Cross-Modal Pattern-Propagation for RGB-T Tracking

l  Distilling Knowledge From Graph Convolutional Networks

l  Learning Identity-Invariant Motion Representations for Cross-ID Face Reenactment

l  Distribution-Aware Coordinate Representation for Human Pose Estimation

l  Parsing-Based View-Aware Embedding Network for Vehicle Re-Identification

l  HandVoxNet: Deep Voxel-Based Network for 3D Hand Shape and Pose Estimation From a Single Depth Map

l  Determinant Regularization for Gradient-Efficient Graph Matching

l  D3S - A Discriminative Single Shot Segmentation Tracker

l  MANTRA: Memory Augmented Networks for Multiple Trajectory Prediction

l  End-to-End Model-Free Reinforcement Learning for Urban Driving Using Implicit Affordances

l  GraphTER: Unsupervised Learning of Graph Transformation Equivariant Representations via Auto-Encoding Node-Wise Transformations

l  Can Facial Pose and Expression Be Separated With Weak Perspective Camera?

l  Probabilistic Regression for Visual Tracking

l  3DRegNet: A Deep Neural Network for 3D Point Registration

l  Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation

l  Three-Dimensional Reconstruction of Human Interactions

l  Distribution-Induced Bidirectional Generative Adversarial Network for Graph Representation Learning

l  Minimal Solvers for 3D Scan Alignment With Pairs of Intersecting Lines

l  Wavelet Integrated CNNs for Noise-Robust Image Classification

l  Embedding Expansion: Augmentation in Embedding Space for Deep Metric Learning

l  PropagationNet: Propagate Points to Curve to Learn Structure Information

l  Sequential 3D Human Pose and Shape Estimation From Point Clouds

l  Improving the Robustness of Capsule Networks to Image Affine Transformations

l  Noise Modeling, Synthesis and Classification for Generic Object Anti-Spoofing

l  Quaternion Product Units for Deep Learning on 3D Rotation Groups

l  Unsupervised Representation Learning for Gaze Estimation

l  P-nets: Deep Polynomial Neural Networks

l  Hierarchically Robust Representation Learning

l  How Useful Is Self-Supervised Pretraining for Visual Tasks?

Face, Gesture, and Body Pose (3); Motion and Tracking (2)

Part (3) of Face, Gesture, and Body Pose and part (2) of Motion and Tracking. (oral)

l  Copy and Paste GAN: Face Hallucination From Shaded Thumbnails

l  TailorNet: Predicting Clothing in 3D as a Function of Human Pose, Shape and Garment Style

l  Object-Occluded Human Shape and Pose Estimation From a Single Color Image

l  Recursive Least-Squares Estimator-Aided Online Learning for Visual Tracking

l  Self-Supervised Monocular Scene Flow Estimation

l  Learning Fast and Robust Target Models for Video Object Segmentation

l  Reciprocal Learning Networks for Human Trajectory Prediction

l  Nonparametric Object and Parts Modeling With Lie Group Dynamics

 

Image and Video Synthesis (2); Neural Generative Models

These papers are also about synthesis, but the emphasis shifts somewhat toward neural generative models. (oral)

l  Learning to Shadow Hand-Drawn Sketches

l  Intuitive, Interactive Beard and Hair Synthesis With Generative Models

l  Semantic Pyramid for Image Generation

l  SynSin: End-to-End View Synthesis From a Single Image

l  A Characteristic Function Approach to Deep Implicit Generative Modeling

l  High-Resolution Daytime Translation Without Domain Labels

l  Leveraging 2D Data to Learn Textured 3D Mesh Generation

l  Contextual Residual Aggregation for Ultra High-Resolution Image Inpainting

l  Flow Contrastive Estimation of Energy-Based Models

 

Optimization and Learning Methods

The emphasis here is on optimization. (oral)

l  Hardware-in-the-Loop End-to-End Optimization of Camera Image Processing Pipelines

l  Search to Distill: Pearls Are Everywhere but Not the Eyes

l  Total Deep Variation for Linear Inverse Problems

l  Relative Interior Rule in Block-Coordinate Descent

l  Learning Combinatorial Solver for Graph Matching

l  SampleNet: Differentiable Point Cloud Sampling

l  Can We Learn Heuristics for Graphical Model Inference Using Reinforcement Learning?

l  Quasi-Newton Solver for Robust Non-Rigid Registration

l  Rethinking Class-Balanced Methods for Long-Tailed Visual Recognition From a Domain Adaptation Perspective

l  Optimizing Rank-Based Metrics With Blackbox Differentiation

 

Face, Gesture, and Body Pose; Motion and Tracking; Image and Video Synthesis; Neural Generative Models; Optimization and Learning Methods

The following posters cover the five sessions above.

l  DualSDF: Semantic Shape Manipulation Using a Two-Level Representation

l  Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives

l  Deep Homography Estimation for Dynamic Scenes

l  PF-Net: Point Fractal Network for 3D Point Cloud Completion

l  On the Regularization Properties of Structured Dropout

l  Learning Oracle Attention for High-Fidelity Face Completion

l  Deep Image Spatial Transformation for Person Image Generation

l  Learning to Optimize on SPD Manifolds

l  Deep 3D Portrait From a Single Image

l  RDCFace: Radial Distortion Correction for Face Recognition

l  Global-Local GCN: Large-Scale Label Noise Cleansing for Face Recognition

l  MISC: Multi-Condition Injection and Spatially-Adaptive Compositing for Conditional Person Image Synthesis

l  SAINT: Spatially Aware Interpolation NeTwork for Medical Slice Synthesis

l  Recurrent Feature Reasoning for Image Inpainting

l  Structure-Preserving Super Resolution With Gradient Guidance

l  Epipolar Transformers

l  Diversified Arbitrary Style Transfer via Deep Feature Perturbation

l  MSG-GAN: Multi-Scale Gradients for Generative Adversarial Networks

l  Overcoming Multi-Model Forgetting in One-Shot NAS With Diversity Maximization

l  Select to Better Learn: Fast and Accurate Deep Learning Using Data Selection From Nonlinear Manifolds

l  Neural Point Cloud Rendering via Multi-Plane Projection

l  Wish You Were Here: Context-Aware Human Generation

l  Towards Photo-Realistic Virtual Try-On by Adaptively Generating-Preserving Image Content

l  Breaking the Cycle - Colleagues Are All You Need

l  Local Class-Specific and Global Image-Level Generative Adversarial Networks for Semantic-Guided Scene Generation

l  ManiGAN: Text-Guided Image Manipulation

l  Watch Your Up-Convolution: CNN Based Generative Deep Neural Networks Are Failing to Reproduce Spectral Distributions

l  Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems

l  Barycenters of Natural Images Constrained Wasserstein Barycenters for Image Morphing

l  Guided Variational Autoencoder for Disentanglement Learning

l  Cross-Spectral Face Hallucination via Disentangling Independent Factors

l  Learned Image Compression With Discretized Gaussian Mixture Likelihoods and Attention Modules

l  C-Flow: Conditional Generative Flow Models for Images and 3D Point Clouds

l  Cogradient Descent for Bilinear Optimization

l  Instance-Aware Image Colorization

l  Joint Training of Variational Auto-Encoder and Latent Energy-Based Model

l  Adaptive Loss-Aware Quantization for Multi-Bit Networks

l  ScopeFlow: Dynamic Scene Scoping for Optical Flow

l  Video Super-Resolution With Temporal Group Attention

l  Group Sparsity: The Hinge Between Filter Pruning and Decomposition for Network Compression

l  3D Photography Using Context-Aware Layered Depth Inpainting

l  MixNMatch: Multifactor Disentanglement and Encoding for Conditional Image Generation

l  Low-Rank Compression of Neural Nets: Learning the Rank of Each Layer

l  Global Texture Enhancement for Fake Face Detection in the Wild

l  Panoptic-Based Image Synthesis

l  Lighthouse: Predicting Lighting Volumes for Spatially-Coherent Illumination

l  Learning to Cartoonize Using White-Box Cartoon Representations

l  End-to-End Learnable Geometric Vision by Backpropagating PnP Optimization

l  Analyzing and Improving the Image Quality of StyleGAN

l  Fashion Editing With Adversarial Parsing Learning

l  Augment Your Batch: Improving Generalization Through Instance Repetition

l  ARShadowGAN: Shadow Generative Adversarial Network for Augmented Reality in Single Light Scenes

l  An End-to-End Edge Aggregation Network for Moving Object Segmentation

l  Learning Video Stabilization Using Optical Flow

l  Reusing Discriminators for Encoding: Towards Unsupervised Image-to-Image Translation

l  Robust Design of Deep Neural Networks Against Adversarial Attacks Based on Lyapunov Theory

l  StarGAN v2: Diverse Image Synthesis for Multiple Domains

l  Warping Residual Based Image Stitching for Large Parallax

l  A U-Net Based Discriminator for Generative Adversarial Networks

l  Unpaired Portrait Drawing Generation via Asymmetric Cycle Mapping

l  When to Use Convolutional Neural Networks for Inverse Problems

l  LUVLi Face Alignment: Estimating Landmarks' Location, Uncertainty, and Visibility Likelihood

l  Affinity Graph Supervision for Visual Recognition

l  Unsupervised Magnification of Posture Deviations Across Subjects

l  Accurate Estimation of Body Height From a Single Depth Image via a Four-Stage Developing Network

l  Fast Soft Color Segmentation

l  Global Optimality for Point Set Registration Using Semidefinite Programming

l  Image2StyleGAN++: How to Edit the Embedded Images?

l  SQE: a Self Quality Evaluation Metric for Parameters Optimization in Multi-Object Tracking

l  EventSR: From Asynchronous Events to Image Reconstruction, Restoration, and Super-Resolution via End-to-End Adversarial Learning

l  Hierarchical Pyramid Diverse Attention Networks for Face Recognition

l  RGBD-Dog: Predicting Canine Pose from RGBD Sensors

l  Multi-Scale Progressive Fusion Network for Single Image Deraining

l  Learning a Neural 3D Texture Space From 2D Exemplars

l  BachGAN: High-Resolution Image Synthesis From Salient Object Layout

l  Rethinking Data Augmentation for Image Super-resolution: A Comprehensive Analysis and a New Strategy

l  On Positive-Unlabeled Classification in GAN

l  DoveNet: Deep Image Harmonization via Domain Verification

l  Noise Robust Generative Adversarial Networks

l  Normalizing Flows With Multi-Scale Autoregressive Priors

l  Robust Reference-Based Super-Resolution With Similarity-Aware Deformable Convolution

l  Painting Many Pasts: Synthesizing Time Lapse Videos of Paintings

l  GeoDA: A Geometric Framework for Black-Box Adversarial Attacks

l  GAMIN: Generative Adversarial Multiple Imputation Network for Highly Missing Data

l  An Internal Covariate Shift Bounding Algorithm for Deep Neural Networks by Unitizing Layers' Outputs

l  A Unified Optimization Framework for Low-Rank Inducing Penalties

l  Single-Side Domain Generalization for Face Anti-Spoofing

l  The Knowledge Within: Methods for Data-Free Model Compression

l  Scale-Space Flow for End-to-End Optimized Video Compression

l  Dynamic Neural Relational Inference

 

Segmentation, Grouping and Shape (1)

Research on segmentation, including U-Net-style approaches. (oral)

l  Real-Time Panoptic Segmentation From Dense Detections

l  Deep Snake for Real-Time Instance Segmentation

l  AdaCoSeg: Adaptive Shape Co-Segmentation With Group Consistency Loss

l  Learning Dynamic Routing for Semantic Segmentation

l  Boosting Semantic Human Matting With Coarse Annotations

l  BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation

l  UC-Net: Uncertainty Inspired RGB-D Saliency Detection via Conditional Variational Autoencoders

l  Deep Geometric Functional Maps: Robust Feature Learning for Shape Correspondence

l  Deep Polarization Cues for Transparent Object Segmentation

l  DualConvMesh-Net: Joint Geodesic and Euclidean Convolutions on 3D Meshes

l  F-BRS: Rethinking Backpropagating Refinement for Interactive Segmentation

l  Approximating shapes in images with low-complexity polygons

 

Explainable AI; Fairness, Accountability, Transparency and Ethics in Vision

Papers on the explainability, fairness, and ethics of AI. (oral)

l  Towards Visually Explaining Variational Autoencoders

l  Towards Global Explanations of Convolutional Neural Networks With Concept Attribution

l  Interpretable and Accurate Fine-grained Recognition via Region Grouping

l  SAM: The Sensitivity of Attribution Methods to Hyperparameters

l  High-Frequency Component Helps Explain the Generalization of Convolutional Neural Networks

l  CNN-Generated Images Are Surprisingly Easy to Spot... for Now

l  FALCON: A Fourier Transform Based Approach for Fast and Secure Convolutional Neural Network Predictions

 

Transfer/Low-Shot/Semi/Unsupervised Learning (2)

Part (2) of the papers on Transfer/Low-Shot/Semi/Unsupervised Learning. (oral)

l  Dreaming to Distill: Data-Free Knowledge Transfer via DeepInversion

l  Unsupervised Domain Adaptation via Structurally Regularized Deep Clustering

l  HyperSTAR: Task-Aware Hyperparameters for Deep Networks

l  ActBERT: Learning Global-Local Video-Text Representations

l  State-Relabeling Adversarial Active Learning

l  Erasing Integrated Learning: A Simple Yet Effective Approach for Weakly Supervised Object Localization

l  A Shared Multi-Attention Framework for Multi-Label Zero-Shot Learning

l  Self-Supervised Learning of Interpretable Keypoints From Unlabelled Videos

 

Segmentation, Grouping and Shape; Explainable AI; Fairness, Accountability, Transparency and Ethics in Vision; Transfer/Low-Shot/Semi/Unsupervised Learning

The following are the poster papers from the three sessions above.

l  Few-Shot Open-Set Recognition Using Meta-Learning

l  Few-Shot Learning via Embedding Adaptation With Set-to-Set Functions

l  Temporally Distributed Networks for Fast Video Semantic Segmentation

l  Benchmarking the Robustness of Semantic Segmentation Models

l  There and Back Again: Revisiting Backpropagation Saliency Methods

l  Deep Semantic Clustering by Partition Confidence Maximisation

l  StructEdit: Learning Structural Shape Variations

l  Harmonizing Transferability and Discriminability for Adapting Object Detectors

l  Fast Video Object Segmentation With Temporal Aggregation Network and Dynamic Template Matching

l  CascadePSP: Toward Class-Agnostic and Very High-Resolution Segmentation via Global and Local Refinement

l  Correlating Edge, Pose With Parsing

l  VecRoad: Point-Based Iterative Graph Exploration for Road Graphs Extraction

l  Towards Fairness in Visual Recognition: Effective Strategies for Bias Mitigation

l  Hierarchical Human Parsing With Typed Part-Relation Reasoning

l  Compositional Convolutional Neural Networks: A Deep Architecture With Innate Robustness to Partial Occlusion

l  Spatial Pyramid Based Graph Reasoning for Semantic Segmentation

l  Learning Video Object Segmentation From Unlabeled Videos

l  Part-Aware Context Network for Human Parsing

l  SCOUT: Self-Aware Discriminant Counterfactual Explanations

l  Weakly-Supervised Semantic Segmentation via Sub-Category Exploration

l  Continual Learning With Extended Kronecker-Factored Approximate Curvature

l  Phase Consistent Ecological Domain Adaptation

l  AD-Cluster: Augmented Discriminative Clustering for Domain Adaptive Person Re-Identification

l  3D-MPA: Multi-Proposal Aggregation for 3D Semantic Instance Segmentation

l  Deep Active Learning for Biased Datasets via Fisher Kernel Self-Supervision

l  Adaptive Graph Convolutional Network With Attention Graph Clustering for Co-Saliency Detection

l  A2dele: Adaptive and Attentive Depth Distiller for Efficient RGB-D Salient Object Detection

l  Deep Fair Clustering for Visual Learning

l  Bidirectional Graph Reasoning Network for Panoptic Segmentation

l  Exploit Clues From Views: Self-Supervised and Regularized Learning for Multiview Object Recognition

l  Spherical Space Domain Adaptation With Robust Pseudo-Label Loss

l  Stochastic Classifiers for Unsupervised Domain Adaptation

l  Unsupervised Learning of Intrinsic Structural Representation Points

l  PolyTransform: Deep Polygon Transformer for Instance Segmentation

l  Interactive Two-Stream Decoder for Accurate and Fast Saliency Detection

l  Towards Better Generalization: Joint Depth-Pose Learning Without PoseNet

l  LT-Net: Label Transfer by Learning Reversible Voxel-Wise Correspondence for One-Shot Medical Image Segmentation

l  FGN: Fully Guided Network for Few-Shot Instance Segmentation

l  A Quantum Computational Approach to Correspondence Problems on Point Sets

l  Data-Efficient Semi-Supervised Learning by Reliable Edge Mining

l  NestedVAE: Isolating Common Factors via Weak Supervision

l  Progressive Adversarial Networks for Fine-Grained Domain Adaptation

l  A Disentangling Invertible Interpretation Network for Explaining Latent Representations

l  Modeling the Background for Incremental Learning in Semantic Segmentation

l  Interpreting the Latent Space of GANs for Semantic Face Editing

l  Super-BPD: Super Boundary-to-Pixel Direction for Fast Image Segmentation

l  Self-Learning With Rectification Strategy for Human Parsing

l  Hyperbolic Visual Embedding Learning for Zero-Shot Recognition

l  Sequential Mastery of Multiple Visual Tasks: Networks Naturally Learn to Learn and Forget to Forget

l  Distilling Effective Supervision From Severe Label Noise

l  Eternal Sunshine of the Spotless Net: Selective Forgetting in Deep Networks

l  CenterMask: Single Shot Instance Segmentation With Point Representation

l  Mitigating Bias in Face Recognition Using Skewness-Aware Reinforcement Learning

l  MineGAN: Effective Knowledge Transfer From GANs to Target Domains With Few Images

l  DLWL: Improving Detection for Lowshot Classes With Weakly Labelled Data

l  Unsupervised Deep Shape Descriptor With Point Distribution Learning

l  Stylization-Based Architecture for Fast Deep Exemplar Colorization

l  Cars Can't Fly Up in the Sky: Improving Urban-Scene Segmentation via Height-Driven Attention Networks

l  State-Aware Tracker for Real-Time Video Object Segmentation

l  Iteratively-Refined Interactive 3D Medical Image Segmentation With Multi-Agent Reinforcement Learning

l  ENSEI: Efficient Secure Inference via Frequency-Domain Homomorphic Convolution for Privacy-Preserving Visual Recognition

l  Multi-Scale Interactive Network for Salient Object Detection

l  Interactive Multi-Label CNN Learning With Partial Labels

l  ViewAL: Active Learning With Viewpoint Entropy for Semantic Segmentation

l  Scene-Adaptive Video Frame Interpolation via Meta-Learning

l  Action Segmentation With Joint Self-Supervised Temporal Domain Adaptation

l  Pixel Consensus Voting for Panoptic Segmentation

l  Minimizing Discrete Total Curvature for Image Processing

l  Towards Robust Image Classification Using Sequential Attention Models

l  Discovering Synchronized Subsets of Sequences: A Large Scale Solution

l  Going Deeper With Lean Point Networks

l  Efficient and Robust Shape Correspondence via Sparsity-Enforced Quadratic Assignment

l  Explainable Object-Induced Action Decision for Autonomous Vehicles

l  Spatially Attentive Output Layer for Image Classification

l  Attack to Explain Deep Representation

l  Computing Valid P-Values for Image Segmentation by Selective Inference

l  Unsupervised Learning From Video With Deep Neural Embeddings

l  Partial Weight Adaptation for Robust DNN Inference

l  Probability Weighted Compact Feature for Domain Adaptive Retrieval

l  Where Does It End? - Reasoning About Hidden Surfaces by Object Intersection Constraints

l  PolarNet: An Improved Grid Representation for Online LiDAR Point Clouds Semantic Segmentation

l  Pathological Retinal Region Segmentation From OCT Images Using Geometric Relation Based Augmentation

l  Transferring and Regularizing Prediction for Semantic Segmentation

l  PREDICT & CLUSTER: Unsupervised Skeleton Based Action Recognition

l  Model Adaptation: Unsupervised Domain Adaptation Without Source Data

l  Evade Deep Image Retrieval by Stashing Private Images in the Hash Space

l  Advisable Learning for Self-Driving Vehicles by Internalizing Observation-to-Action Rules

l  ProAlignNet: Unsupervised Learning for Progressively Aligning Noisy Contours

l  Attribution in Scale and Space

l  Towards Causal VQA: Revealing and Reducing Spurious Correlations by Invariant and Covariant Semantic Editing

Recognition (Detection, Categorization) (1)

Papers on recognition (detection, categorization, and so on). (oral)

l  Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection

l  Large-Scale Object Detection in the Wild From Imbalanced Multi-Labels

l  BBN: Bilateral-Branch Network With Cumulative Learning for Long-Tailed Visual Recognition

l  Momentum Contrast for Unsupervised Visual Representation Learning

l  Classifying, Segmenting, and Tracking Object Instances in Video with Mask Propagation

l  Weakly Supervised Fine-Grained Image Classification via Guassian Mixture Model Oriented Discriminative Learning

l  Bridging the Gap Between Anchor-Based and Anchor-Free Detection via Adaptive Training Sample Selection

l  Learning User Representations for Open Vocabulary Image Hashtag Prediction

l  Sketch Less for More: On-the-Fly Fine-Grained Sketch-Based Image Retrieval

l  Few-Shot Pill Recognition

l  PointRend: Image Segmentation As Rendering

l  ABCNet: Real-Time Scene Text Spotting With Adaptive Bezier-Curve Network

 

Video Analysis and Understanding

Papers on video. Because this session is about understanding, many of these papers evaluate aspects of video analysis that had not been examined before. (oral)

l  Learning Temporal Co-Attention Models for Unsupervised Video Action Localization

l  Spatiotemporal Fusion in 3D CNNs: A Probabilistic View

l  Uncertainty-Aware Score Distribution Learning for Action Quality Assessment

l  Learning Interactions and Relationships Between Movie Characters

l  Video Panoptic Segmentation

l  Understanding Human Hands in Contact at Internet Scale

l  End-to-End Learning of Visual Representations From Uncurated Instructional Videos

l  You2Me: Inferring Body Pose in Egocentric Video via First and Second Person Interactions

l  Learning a Weakly-Supervised Video Actor-Action Segmentation Model With a Wise Selection

l  Learning to Measure the Static Friction Coefficient in Cloth Contact

l  SpeedNet: Learning the Speediness in Videos

l  Telling Left From Right: Learning Spatial Correspondence of Sight and Sound

 

Vision & Language

Research that combines vision and language, for example methods that generate captions following the user's intent. (oral)

l  Visual-Textual Capsule Routing for Text-Based Video Segmentation

l  Graph-Structured Referring Expression Reasoning in the Wild

l  Say As You Wish: Fine-Grained Control of Image Caption Generation With Abstract Scene Graphs

l  Hierarchical Conditional Relation Networks for Video Question Answering

l  REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments

l  Iterative Answer Prediction With Pointer-Augmented Multimodal Transformers for TextVQA

l  SQuINTing at VQA Models: Introspecting VQA Models With Sub-Questions

l  Vision-Language Navigation With Self-Supervised Auxiliary Reasoning Tasks

l  Sign Language Transformers: Joint End-to-End Sign Language Recognition and Translation

l  Multi-Task Collaborative Network for Joint Referring Expression Comprehension and Segmentation

l  Counterfactual Vision and Language Learning

l  Iterative Context-Aware Graph Inference for Visual Dialog

l  TA-Student VQA: Multi-Agents Training by Self-Questioning

 

Recognition (Detection, Categorization); Video Analysis and Understanding; Vision + Language

The following are the poster papers from the three sessions above.

l  Exploring Self-Attention for Image Recognition

l  Cops-Ref: A New Dataset and Task on Compositional Referring Expression Comprehension

l  Improving Convolutional Networks With Self-Calibrated Convolutions

l  Modality Shifting Attention Network for Multi-Modal Video Question Answering

l  Learning to Structure an Image With Few Colors

l  On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering

l  From Paris to Berlin: Discovering Fashion Style Influences Around the World

l  A Local-to-Global Approach to Multi-Modal Movie Scene Segmentation

l  G-TAD: Sub-Graph Localization for Temporal Action Detection

l  Detailed 2D-3D Joint Representation for Human-Object Interaction

l  One-Shot Adversarial Attacks on Visual Tracking With Dual Attention

l  Rethinking Classification and Localization for Object Detection

l  Correspondence Networks With Adaptive Neighbourhood Consensus

l  Multiple Anchor Learning for Visual Object Detection

l  PhraseCut: Language-Based Image Segmentation in the Wild

l  Mask Encoding for Single Shot Instance Segmentation

l  Action Genome: Actions As Compositions of Spatio-Temporal Scene Graphs

l  Learning Unseen Concepts via Hierarchical Decomposition and Composition

l  Hi-CMD: Hierarchical Cross-Modality Disentanglement for Visible-Infrared Person Re-Identification

l  In Defense of Grid Features for Visual Question Answering

l  Multi-Mutual Consistency Induced Transfer Subspace Learning for Human Motion Segmentation

l  Dense Regression Network for Video Grounding

l  Neural Architecture Search for Lightweight Non-Local Networks

l  Learning Saliency Propagation for Semi-Supervised Instance Segmentation

l  Speech2Action: Cross-Modal Supervision for Action Recognition

l  Normalized and Geometry-Aware Self-Attention Network for Image Captioning

l  Memory Enhanced Global-Local Aggregation for Video Object Detection

l  Solving Mixed-Modal Jigsaw Puzzle for Fine-Grained Sketch-Based Image Retrieval

l  LG-GAN: Label Guided Adversarial Network for Flexible Targeted Attack of Point Cloud Based Deep Networks

l  Memory Aggregation Networks for Efficient Interactive Video Object Segmentation

l  VQA With No Questions-Answers Training

l  Counting Out Time: Class Agnostic Video Repetition Counting in the Wild

l  SaccadeNet: A Fast and Accurate Object Detector

l  Multi-Granularity Reference-Aided Attentive Feature Aggregation for Video-Based Person Re-Identification

l  Video Object Grounding Using Semantic Roles in Language Description

l  Designing Network Design Spaces

l  12-in-1: Multi-Task Vision and Language Representation Learning

l  MLCVNet: Multi-Level Context VoteNet for 3D Object Detection

l  Listen to Look: Action Recognition by Previewing Audio

l  Attention Convolutional Binary Neural Tree for Fine-Grained Visual Categorization

l  Music Gesture for Visual Sound Separation

l  Referring Image Segmentation via Cross-Modal Progressive Comprehension

l  Cloth in the Wind: A Case Study of Physical Measurement Through Simulation

l  The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction

l  CentripetalNet: Pursuing High-Quality Keypoint Pairs for Object Detection

l  PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection

l  Graph Embedded Pose Clustering for Anomaly Detection

l  Disp R-CNN: Stereo 3D Object Detection via Shape Prior Guided Instance Disparity Estimation

l  Deepstrip: High-Resolution Boundary Refinement

l  Smoothing Adversarial Domain Attack and P-Memory Reconsolidation for Cross-Domain Person Re-Identification

l  Meshed-Memory Transformer for Image Captioning

l  Learning From Noisy Anchors for One-Stage Object Detection

l  Instance-Aware, Context-Focused, and Memory-Efficient Weakly Supervised Object Detection

l  Density-Based Clustering for 3D Object Detection in Point Clouds

l  Few-Shot Video Classification via Temporal Alignment

l  Densely Connected Search Space for More Flexible Neural Architecture Search

l  Fine-Grained Video-Text Retrieval With Hierarchical Graph Reasoning

l  Warp to the Future: Joint Forecasting of Features and Feature Motion

l  Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio

l  Where Does It Exist: Spatio-Temporal Video Grounding for Multi-Form Sentences

l  Cross-Modal Cross-Domain Moment Alignment Network for Person Search

l  Self-Training With Noisy Student Improves ImageNet Classification

l  Learning Longterm Representations for Person Re-Identification Using Radio Signals

l  LatentFusion: End-to-End Differentiable Reconstruction and Rendering for Unseen Object Pose Estimation

l  Learning Instance Occlusion for Panoptic Segmentation

l  Vision-Dialog Navigation by Exploring Cross-Modal Memory

l  ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks

l  NMS by Representative Region: Towards Crowded Pedestrian Detection by Proposal Pairing

l  Visual Commonsense R-CNN

l  What Deep CNNs Benefit From Global Covariance Pooling: An Optimization Perspective

l  EfficientDet: Scalable and Efficient Object Detection

l  Fast Template Matching and Update for Video Object Tracking and Segmentation

l  Counterfactual Samples Synthesizing for Robust Visual Question Answering

l  Local-Global Video-Text Interactions for Temporal Grounding

l  Set-Constrained Viterbi for Set-Supervised Action Segmentation

l  Probabilistic Video Prediction From Noisy Data With a Posterior Confidence

l  Beyond Short-Term Snippet: Video Relation Detection With Spatio-Temporal Global Context

l  Visual Grounding in Video for Unsupervised Word Translation

l  Two Causal Principles for Improving Visual Dialog

l  Spatio-Temporal Graph for Video Captioning With Knowledge Distillation

l  A Real-Time Cross-Modality Correlation Filtering Method for Referring Expression Comprehension

l  Better Captioning With Sequence-Level Exploration

l  Violin: A Large-Scale Dataset for Video-and-Language Inference

l  RiFeGAN: Rich Feature Generation for Text-to-Image Synthesis From Prior Knowledge

l  Graph Structured Network for Image-Text Matching

l  Straight to the Point: Fast-Forwarding Videos via Reinforcement Learning Using Textual Data

l  Multi-Modality Cross Attention Network for Image and Sentence Matching

l  Generalized ODIN: Detecting Out-of-Distribution Image Without Learning From Out-of-Distribution Data

l  Learning Augmentation Network via Influence Functions

l  X-Linear Attention Networks for Image Captioning

 

Recognition (Detection, Categorization) (2)

Part (2) of the papers on recognition (detection, categorization, and so on). (oral)

l  Unsupervised Person Re-Identification via Multi-Label Classification

l  Overcoming Classifier Imbalance for Long-Tail Object Detection With Balanced Group Softmax

l  What You See is What You Get: Exploiting Visibility for 3D Object Detection

l  Deep Structure-Revealed Network for Texture Recognition

l  Online Knowledge Distillation via Collaborative Learning

l  Dynamic Convolution: Attention Over Convolution Kernels

l  3DSSD: Point-Based 3D Single Stage Object Detector

l  Deep Degradation Prior for Low-Quality Image Classification

l  ViBE: Dressing for Diverse Body Shapes

l  Don't Judge an Object by Its Context: Learning to Overcome Contextual Bias

l  SESS: Self-Ensembling Semi-Supervised 3D Object Detection

l  Combining Detection and Tracking for Human Pose Estimation in Videos

Vision for Robotics and Autonomous Vehicles

This session is all about robotics. If you work on robots, this is the place to look. (oral)

l  SAPIEN: A SimulAted Part-Based Interactive ENvironment

l  RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds

l  SurfelGAN: Synthesizing Realistic Sensor Data for Autonomous Driving

l  A Programmatic and Semantic Approach to Explaining and Debugging Neural Network Based Object Detectors

l  Predicting Semantic Map Representations From Images Using Pyramid Occupancy Networks

l  Efficient Derivative Computation for Cumulative B-Splines on Lie Groups

l  RL-CycleGAN: Reinforcement Learning Aware Simulation-to-Real

l  LiDARsim: Realistic LiDAR Simulation by Leveraging the Real World

l  Just Go With the Flow: Self-Supervised Scene Flow Estimation

l  TITAN: Future Forecast Using Action Priors

 

Machine Learning Architectures and Formulations

Papers on the architectures and formulations of machine learning models themselves. (oral)

l  Robust Learning Through Cross-Task Consistency

l  Dynamic Refinement Network for Oriented and Densely Packed Object Detection

l  AOWS: Adaptive and Optimal Network Width Search With Latency Constraints

l  High-Dimensional Convolutional Networks for Geometric Pattern Recognition

l  Filter Response Normalization Layer: Eliminating Batch Dependence in the Training of Deep Neural Networks

l  Deep Iterative Surface Normal Estimation

l  Dataless Model Selection With the Deep Frame Potential

l  UNAS: Differentiable Architecture Search Meets Reinforcement Learning

l  Local Context Normalization: Revisiting Local Normalization

 

Recognition (Detection, Categorization); Vision for Robotics and Autonomous Vehicles; Machine Learning Architectures and Formulations

The following are the poster papers from the three sessions above.

l  ACNe: Attentive Context Normalization for Robust Permutation-Equivariant Learning

l  Learning Situational Driving

l  From Depth What Can You See? Depth Completion via Auxiliary Image Reconstruction

l  Symmetry and Group in Attribute-Object Compositions

l  Noise-Aware Fully Webly Supervised Object Detection

l  3D Part Guided Image Editing for Fine-Grained Object Understanding

l  STINet: Spatio-Temporal-Interactive Network for Pedestrian Detection and Trajectory Prediction

l  Rethinking Performance Estimation in Neural Architecture Search

l  Feature-Metric Registration: A Fast Semi-Supervised Approach for Robust Point Cloud Registration Without Correspondences

l  Learning Multi-View Camera Relocalization With Graph Neural Networks

l  MotionNet: Joint Perception and Motion Prediction for Autonomous Driving Based on Bird's Eye View Maps

l  EcoNAS: Finding Proxies for Economical Neural Architecture Search

l  Hit-Detector: Hierarchical Trinity Architecture Search for Object Detection

l  Geometrically Principled Connections in Graph Neural Networks

l  On Vocabulary Reliance in Scene Text Recognition

l  Generating Accurate Pseudo-Labels in Semi-Supervised Learning and Avoiding Overconfident Predictions via Hermite Polynomial Activations

l  GraspNet-1Billion: A Large-Scale Benchmark for General Object Grasping

l  PFRL: Pose-Free Reinforcement Learning for 6D Pose Estimation

l  Through Fog High-Resolution Imaging Using Millimeter Wave Radar

l  Disentangling Physical Dynamics From Unknown Factors for Unsupervised Video Prediction

l  D2Det: Towards High Quality Object Detection and Instance Segmentation

l  LiDAR-Based Online 3D Video Object Detection With Graph-Based Message Passing and Spatiotemporal Transformer Attention

l  Orthogonal Convolutional Neural Networks

l  Self-Robust 3D Point Recognition via Gather-Vector Guidance

l  VectorNet: Encoding HD Maps and Agent Dynamics From Vectorized Representation

l  ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks

l  MTL-NAS: Task-Agnostic Neural Architecture Search Towards General-Purpose Multi-Task Learning

l  PnPNet: End-to-End Perception and Prediction With Tracking in the Loop

l  Revisiting the Sibling Head in Object Detector

l  Visual Reaction: Learning to Play Catch With Your Drone

l  Prime Sample Attention in Object Detection

l  SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization

l  KeyPose: Multi-View 3D Labeling and Keypoint Estimation for Transparent Objects

l  SegGCN: Efficient 3D Point Cloud Segmentation With Fuzzy Spherical Kernel

l  nuScenes: A Multimodal Dataset for Autonomous Driving

l  PVN3D: A Deep Point-Wise 3D Keypoints Voting Network for 6DoF Pose Estimation

l  Probabilistic Pixel-Adaptive Refinement Networks

l  Discovering Human Interactions With Novel Objects via Zero-Shot Learning

l  Equalization Loss for Long-Tailed Object Recognition

l  Learning Depth-Guided Convolutions for Monocular 3D Object Detection

l  Seeing Through Fog Without Seeing Fog: Deep Multimodal Sensor Fusion in Unseen Adverse Weather

l  Don't Even Look Once: Synthesizing Features for Zero-Shot Detection

l  EPOS: Estimating 6D Pose of Objects With Symmetries

l  Train in Germany, Test in the USA: Making 3D Object Detectors Generalize

l  Exploring Categorical Regularization for Domain Adaptive Object Detection

l  Neural Implicit Embedding for Point Cloud Analysis

l  Pose-Guided Visible Part Matching for Occluded Person ReID

l  ContourNet: Taking a Further Step Toward Accurate Arbitrary-Shaped Scene Text Detection

l  Exploring Data Aggregation in Policy Learning for Vision-Based Urban Autonomous Driving

l  Look-Into-Object: Self-Supervised Structure Modeling for Object Recognition

l  Recognizing Objects From Any View With Object and Viewer-Centered Representations

l  Gated Channel Transformation for Visual Recognition

l  Non-Local Neural Networks With Grouped Bilinear Attentional Transforms

l  Generative-Discriminative Feature Representations for Open-Set Recognition

l  RPM-Net: Robust Point Matching Using Learned Features

l  Sideways: Depth-Parallel Training of Video Models

l  Basis Prediction Networks for Effective Burst Denoising With Large Kernels

l  Private-kNN: Practical Differential Privacy for Computer Vision

l  SP-NAS: Serial-to-Parallel Backbone Search for Object Detection

l  Structure Aware Single-Stage 3D Object Detection From Point Cloud

l  Looking at the Right Stuff - Guided Semantic-Gaze for Autonomous Driving

l  What's Hidden in a Randomly Weighted Neural Network?

l  Structured Multi-Hashing for Model Compression

l  DOPS: Learning to Detect 3D Objects and Predict Their 3D Shapes

l  AutoTrack: Towards High-Performance Visual Tracking for UAV With Automatic Spatio-Temporal Regularization

l  GP-NAS: Gaussian Process Based Neural Architecture Search

l  NAS-FCOS: Fast Neural Architecture Search for Object Detection

l  TCTS: A Task-Consistent Two-Stage Framework for Person Search

l  SCATTER: Selective Context Attentional Scene Text Recognizer

l  Learning Canonical Shape Space for Category-Level 6D Object Pose and Size Estimation

l  Hierarchical Scene Coordinate Classification and Regression for Visual Localization

l  MiLeNAS: Efficient Neural Architecture Search via Mixed-Level Reformulation

l  Scalable Uncertainty for Computer Vision With Functional Variational Inference

l  Uncertainty-Aware CNNs for Depth Completion: Uncertainty from Beginning to End

l  Butterfly Transform: An Efficient FFT Based Neural Architecture Design

l  A Certifiably Globally Optimal Solution to Generalized Essential Matrix Estimation

l  MUXConv: Information Multiplexing in Convolutional Neural Networks

l  PointGMM: A Neural GMM Network for Point Clouds

l  Noisier2Noise: Learning to Denoise From Unpaired Noisy Data

l  TRPLP - Trifocal Relative Pose From Lines at Points

l  DSNAS: Direct Neural Architecture Search Without Parameter Retraining

l  MonoPair: Monocular 3D Object Detection Using Pairwise Spatial Relationships

l  Regularization on Spatio-Temporally Smoothed Feature for Action Recognition

l  Towards Accurate Scene Text Recognition With Semantic Reasoning Networks

l  Unsupervised Reinforcement Learning of Transferable Meta-Skills for Embodied Navigation

l  Inferring Attention Shift Ranks of Objects for Image Saliency

l  Camera On-Boarding for Person Re-Identification Using Hypothesis Transfer Learning

l  Joint Graph-Based Depth Refinement and Normal Estimation

l  DR Loss: Improving Object Detection by Distributional Ranking

l  Self-Trained Deep Ordinal Regression for End-to-End Video Anomaly Detection

l  Few-Shot Class-Incremental Learning

l  PolarMask: Single Shot Instance Segmentation With Polar Representation

l  DeepEMD: Few-Shot Image Classification With Differentiable Earth Mover's Distance and Structured Classifiers

l  Detection in Crowded Scenes: One Proposal, Multiple Predictions

l  Autolabeling 3D Objects With Differentiable Rendering of SDF Shape Priors

l  Interactive Object Segmentation With Inside-Outside Guidance

l  Mnemonics Training: Multi-Class Incremental Learning Without Forgetting

l  Learning to Segment 3D Point Clouds in 2D Image Space

l  Smooth Shells: Multi-Scale Shape Registration With Functional Maps

l  Self-Supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation

Vision Applications and Systems; Vision & Other Modalities; Visual Reasoning and Logical Representation

Research on vision applications and systems, along with other vision papers that did not fit into the categories above. (oral)

l  Efficient Neural Vision Systems Based on Convolutional Image Acquisition

l  Visual Chirality

l  What Machines See Is Not What They Get: Fooling Scene Text Recognition Models With Adversarial Text Images

l  Dynamic Traffic Modeling From Overhead Imagery

l  Satellite Image Time Series Classification With Pixel-Set Encoders and Temporal Self-Attention

l  DAVD-Net: Deep Audio-Aided Video Decompression of Talking Heads

l  Learning When and Where to Zoom With Deep Reinforcement Learning

 

Transfer/Low-Shot/Semi/Unsupervised Learning (3)

Part (3) of the papers on Transfer/Low-Shot/Semi/Unsupervised Learning. (oral)

l  Cross-Domain Detection via Graph-Induced Prototype Alignment

l  Meta-Learning of Neural Architectures for Few-Shot Learning

l  Towards Inheritable Models for Open-Set Domain Adaptation

l  Learning From Synthetic Animals

l  Distilling Cross-Task Knowledge via Relationship Matching

l  Open Compound Domain Adaptation

 

Recognition (Detection, Categorization); Segmentation, Grouping and Shape; Vision Applications and Systems; Vision & Other Modalities; Transfer/Low-Shot/Semi/Unsupervised Learning

The following are the poster papers from the five sessions above.

l  Context Prior for Scene Segmentation

l  Tangent Images for Mitigating Spherical Distortion

l  Learning a Dynamic Map of Visual Appearance

l  Webly Supervised Knowledge Embedding Model for Visual Reasoning

l  Gradually Vanishing Bridge for Adversarial Domain Adaptation

l  Active Speakers in Context

l  Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation

l  Inter-Region Affinity Distillation for Road Marking Segmentation

l  Unified Dynamic Convolutional Network for Super-Resolution With Variational Degradations

l  Making Better Mistakes: Leveraging Class Hierarchies With Deep Networks

l  Data-Free Knowledge Amalgamation via Group-Stack Dual-GAN

l  Screencast Tutorial Video Understanding

l  DSGN: Deep Stereo Geometry Network for 3D Object Detection

l  Weakly-Supervised Salient Object Detection via Scribble Annotations

l  Learning to Learn Single Domain Generalization

l  Severity-Aware Semantic Segmentation With Reinforced Wasserstein Training

l  Boosting Few-Shot Learning With Adaptive Margin Loss

l  JA-POLS: A Moving-Camera Background Model via Joint Alignment and Partially-Overlapping Local Subspaces

l  AugFPN: Improving Multi-Scale Feature Learning for Object Detection

l  xMUDA: Cross-Modal Unsupervised Domain Adaptation for 3D Semantic Segmentation

l  Norm-Aware Embedding for Efficient Person Search

l  Intelligent Home 3D: Automatic 3D-House Design From Linguistic Descriptions Only

l  Differential Treatment for Stuff and Things: A Simple Unsupervised Domain Adaptation Method for Semantic Segmentation

l  Robust Object Detection Under Occlusion With Context-Aware CompositionalNets

l  IMRAM: Iterative Matching With Recurrent Attention Memory for Cross-Modal Image-Text Retrieval

l  Domain-Aware Visual Bias Eliminating for Generalized Zero-Shot Learning

l  Semi-Supervised Semantic Segmentation With Cross-Consistency Training

l  Learning to Learn Cropping Models for Different Aspect Ratio Requirements

l  What Makes Training Multi-Modal Classification Networks Hard?

l  Selective Transfer With Reinforced Transfer Network for Partial Domain Adaptation

l  Semi-Supervised Semantic Image Segmentation With Self-Correcting Networks

l  Exemplar Normalization for Learning Deep Representation

l  Imitative Non-Autoregressive Modeling for Trajectory Forecasting and Imputation

l  Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text

l  StereoGAN: Bridging Synthetic-to-Real Domain Gap by Joint Optimization of Domain Translation and Stereo Matching

l  Self-Supervised Domain-Aware Generative Network for Generalized Zero-Shot Learning

l  Sparse Layered Graphs for Multi-Object Segmentation

l  Visual-Semantic Matching by Exploring High-Order Attention and Distraction

l  End-to-End 3D Point Cloud Instance Segmentation Without Detection

l  Deep Adversarial Decomposition: A Unified Framework for Separating Superimposed Images

l  Differentiable Adaptive Computation Time for Visual Reasoning

l  DeepLPF: Deep Local Parametric Filters for Image Enhancement

l  Instance Credibility Inference for Few-Shot Learning

l  Learning From Web Data With Self-Organizing Memory Module

l  TransMatch: A Transfer-Learning Scheme for Semi-Supervised Few-Shot Learning

l  Learning the Redundancy-Free Features for Generalized Zero-Shot Object Recognition

l  Neural Topological SLAM for Visual Navigation

l  WaveletStereo: Learning Wavelet Coefficients of Disparity Map in Stereo Matching

l  Robust Superpixel-Guided Attentional Adversarial Attack

l  BEDSR-Net: A Deep Shadow Removal Network From a Single Document Image

l  Cross-Domain Document Object Detection: Benchmark Suite and Method

l  Explaining Knowledge Distillation by Quantifying the Knowledge

l  Exploring Bottom-Up and Top-Down Cues With Attentive Learning for Webly Supervised Object Detection

l  Enhancing Generic Segmentation With Learned Region Representations

l  Adaptive Hierarchical Down-Sampling for Point Cloud Classification

l  FBNetV2: Differentiable Neural Architecture Search for Spatial and Channel Dimensions

l  Learning Texture Invariant Representation for Domain Adaptation of Semantic Segmentation

l  Putting Visual Object Recognition in Context

l  SLV: Spatial Likelihood Voting for Weakly Supervised Object Detection

l  Universal Weighting Metric Learning for Cross-Modal Matching

l  IDA-3D: Instance-Depth-Aware 3D Object Detection From Stereo Vision for Autonomous Driving

l  Label Decoupling Framework for Salient Object Detection

l  Transform and Tell: Entity-Aware News Image Captioning

l  HAMBox: Delving Into Mining High-Quality Anchors on Face Detection

l  Hierarchical Feature Embedding for Attribute Recognition

l  Squeeze-and-Attention Networks for Semantic Segmentation

l  Context R-CNN: Long Term Temporal Context for Per-Camera Object Detection

l  Mixture Dense Regression for Object Detection and Human Pose Estimation

l  Syntax-Aware Action Targeting for Video Captioning

l  Learning Visual Emotion Representations From Web Data

l  The Edge of Depth: Explicit Constraints Between Segmentation and Depth

l  A Context-Aware Loss Function for Action Spotting in Soccer Videos

l  Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-Training

l  Video Instance Segmentation Tracking With a Modified VAE Architecture

l  Deformation-Aware Unpaired Image Translation for Pose Estimation on Laboratory Animals

l  ZeroQ: A Novel Zero Shot Quantization Framework

l  Disparity-Aware Domain Adaptation in Stereo Image Restoration

l  Offset Bin Classification Network for Accurate Object Detection

l  TBT: Targeted Neural Network Attack With Bit Trojan

l  Maintaining Discrimination and Fairness in Class Incremental Learning

l  Background Data Resampling for Outlier-Aware Classification

l  STEFANN: Scene Text Editor Using Font Adaptive Neural Network

l  Geometry and Learning Co-Supported Normal Estimation for Unstructured Point Cloud

l  Sequential Motif Profiles and Topological Plots for Offline Signature Verification

l  Optical Flow in Dense Foggy Scenes Using Semi-Supervised Learning

l  A Spatial RNN Codec for End-to-End Image Compression

l  Object Relational Graph With Teacher-Recommended Learning for Video Captioning

l  MMTM: Multimodal Transfer Module for CNN Fusion

l  Generalized Zero-Shot Learning via Over-Complete Distribution

l  Gait Recognition via Semi-supervised Disentangled Representation Learning to Identity and Covariate Features

 

Miscellaneous

These are "miscellaneous" papers that did not fit into any other category. All of them are posters.

l  Unifying Training and Inference for Panoptic Segmentation

l  Associate-3Ddet: Perceptual-to-Conceptual Association for 3D Point Cloud Object Detection

l  Interactive Image Segmentation With First Click Attention

l  NETNet: Neighbor Erasing and Transferring Network for Better Single Shot Object Detection

l  Scale-Equalizing Pyramid Convolution for Object Detection

l  Learning to Cluster Faces via Confidence and Connectivity Estimation

l  Cross-Modality Person Re-Identification With Shared-Specific Feature Transfer

l  DPGN: Distribution Propagation Graph Network for Few-Shot Learning

l  Density-Aware Graph for Deep Semi-Supervised Visual Recognition

l  Unsupervised Multi-Modal Image Registration via Geometry Preserving Image-to-Image Translation

l  Binarizing MobileNet via Evolution-Based Searching

l  Temporal-Context Enhanced Detection of Heavily Occluded Pedestrians

l  Orderless Recurrent Models for Multi-Label Classification

l  Gold Seeker: Information Gain From Policy Distributions for Goal-Oriented Vision-and-Language Reasoning

l  Rethinking the Route Towards Weakly Supervised Object Localization

l  Adversarial Feature Hallucination Networks for Few-Shot Learning

l  Conditional Gaussian Distribution Learning for Open Set Recognition

l  Connect-and-Slice: An Hybrid Approach for Reconstructing 3D Objects

l  Attentive Weights Generation for Few Shot Learning via Information Maximization

l  Assessing Eye Aesthetics for Automatic Multi-Reference Eye In-Painting

l  PuppeteerGAN: Arbitrary Portrait Animation With Semantic-Aware Appearance Transformation

l  SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition

l  Texture and Shape Biased Two-Stream Networks for Clothing Classification and Attribute Recognition

l  Distortion Agnostic Deep Watermarking

l  RMP-SNN: Residual Membrane Potential Neuron for Enabling Deeper High-Accuracy and Low-Latency Spiking Neural Network

l  BFBox: Searching Face-Appropriate Backbone and Feature Pyramid Network for Face Detector

l  PFCNN: Convolutional Neural Networks on 3D Surfaces Using Parallel Frames

l  iTAML: An Incremental Task-Agnostic Meta-learning Approach

l  Optimal least-squares solution to the hand-eye calibration problem

l  MnasFPN: Learning Latency-Aware Pyramid Architecture for Object Detection on Mobile Devices

l  VSGNet: Spatial Attention Network for Detecting Human Object Interactions Using Graph Convolutions

l  End-to-End Camera Calibration for Broadcast Videos

l  Regularizing CNN Transfer Learning With Randomised Regression

l  KeypointNet: A Large-Scale 3D Keypoint Dataset Aggregated From Numerous Human Annotations

l  Hierarchical Clustering With Hard-Batch Triplet Loss for Person Re-Identification

l  Joint Semantic Segmentation and Boundary Detection Using Iterative Pyramid Contexts

l  Attention-Guided Hierarchical Structure Aggregation for Image Matting

l  MetaFuse: A Pre-trained Fusion Model for Human Pose Estimation

l  Prior Guided GAN Based Semantic Inpainting

l  Weakly Supervised Semantic Point Cloud Segmentation: Towards 10x Fewer Labels

l  Physically Realizable Adversarial Examples for LiDAR Object Detection

l  Combating Noisy Labels by Agreement: A Joint Training Method with Co-Regularization

l  Light-weight Calibrator: A Separable Component for Unsupervised Domain Adaptation

l  Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition

l  Learning Selective Self-Mutual Attention for RGB-D Saliency Detection

l  Cross-domain Object Detection through Coarse-to-Fine Feature Adaptation

l  Estimating Low-Rank Region Likelihood Maps

l  Neural Head Reenactment with Latent Pose Descriptors

l  Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis

l  Self-Supervised Learning of Video-Induced Visual Invariances

l  Two-Stage Peer-Regularized Feature Recombination for Arbitrary Image Style Transfer

l  MINA: Convex Mixed-Integer Programming for Non-Rigid Shape Alignment

l  Improving One-Shot NAS by Suppressing the Posterior Fading

l  Incremental Few-Shot Object Detection

l  Synthetic Learning: Learn From Distributed Asynchronized Discriminator GAN Without Sharing Medical Image Data

l  Exploring Category-Agnostic Clusters for Open-Set Domain Adaptation

l  Regularizing Class-Wise Predictions via Self-Knowledge Distillation

l  Hierarchical Graph Attention Network for Visual Relationship Detection

l  M2m: Imbalanced Classification via Major-to-Minor Translation

l  CenterMask: Real-Time Anchor-Free Instance Segmentation

l  Multi-Path Learning for Object Pose Estimation Across Domains

l  Incremental Learning in Online Scenario

l  Enhanced Transport Distance for Unsupervised Domain Adaptation

l  TESA: Tensor Element Self-Attention via Matricization

l  Training a Steerable CNN for Guidewire Detection

l  Superpixel Segmentation With Fully Convolutional Networks

l  SharinGAN: Combining Synthetic and Real Data for Unsupervised Geometry Estimation

l  Label Distribution Learning on Auxiliary Label Space Graphs for Facial Expression Recognition

l  Deep Residual Flow for Out of Distribution Detection

l  FeatureFlow: Robust Video Interpolation via Structure-to-Texture Generation

l  Learning Nanoscale Motion Patterns of Vesicles in Living Cells

l  Improving Action Segmentation via Graph-Based Temporal Reasoning

l  Episode-Based Prototype Generating Network for Zero-Shot Learning

l  Learning to Segment the Tail

l  Learning to Evaluate Perception Models Using Planner-Centric Metrics

l  Where, What, Whether: Multi-Modal Learning Meets Pedestrian Detection

l  CoverNet: Multimodal Behavior Prediction Using Trajectory Sets

l  Real-World Person Re-Identification via Degradation Invariance Learning

l  Defending and Harnessing the Bit-Flip Based Adversarial Weight Attack

l  Adversarial Latent Autoencoders

l  Adaptive Fractional Dilated Convolution Network for Image Aesthetics Assessment

l  Deep Generative Model for Robust Imbalance Classification

l  Learning Deep Network for Detecting 3D Object Keypoints and 6D Poses

l  MetaIQA: Deep Meta-Learning for No-Reference Image Quality Assessment

l  Sketchformer: Transformer-Based Representation for Sketched Structure

l  Cylindrical Convolutional Networks for Joint Object Detection and Viewpoint Estimation

l  Learning a Unified Sample Weighting Network for Object Detection

l  Old Is Gold: Redefining the Adversarially Learned One-Class Classifier Training Paradigm

l  An Adaptive Neural Network for Unsupervised Mosaic Consistency Analysis in Image Forensics

l  McFlow: Monte Carlo Flow Models for Data Imputation

l  Learning to See Through Obstructions

l  GaitPart: Temporal Part-Based Model for Gait Recognition

l  EmotiCon: Context-Aware Multimodal Emotion Recognition Using Frege's Principle

l  Can Deep Learning Recognize Subtle Human Activities?

l  PhysGAN: Generating Physical-World-Resilient Adversarial Examples for Autonomous Driving

l  ILFO: Adversarial Attack on Adaptive Neural Networks

l  On Translation Invariance in CNNs: Convolutional Layers Can Exploit Absolute Spatial Location

l  Diverse Image Generation via Self-Conditioned GANs

l  Inducing Hierarchical Compositional Model by Sparsifying Generator Network

l  CARP: Compression Through Adaptive Recursive Partitioning for Multi-Dimensional Images

l  GrappaNet: Combining Parallel Imaging With Deep Learning for Multi-Coil MRI Reconstruction

l  Can Weight Sharing Outperform Random Architecture Search? An Investigation With TuNAS

l  Context Aware Graph Convolution for Skeleton-Based Action Recognition

l  Fast(er) Reconstruction of Shredded Text Documents via Self-Supervised Deep Asymmetric Metric Learning

l  Revisiting Pose-Normalization for Fine-Grained Few-Shot Recognition

l  RankMI: A Mutual Information Maximizing Ranking Loss

l  Learning Memory-Guided Normality for Anomaly Detection

l  Appearance Shock Grammar for Fast Medial Axis Extraction From Real Images

l  Generalizing Hand Segmentation in Egocentric Videos With Uncertainty-Guided Model Adaptation

l  DeFeat-Net: General Monocular Depth via Simultaneous Unsupervised Representation Learning

l  Learning Visual Motion Segmentation Using Event Surfaces

l  Social-STGCNN: A Social Spatio-Temporal Graph Convolutional Neural Network for Human Trajectory Prediction

l  Discriminative Multi-Modality Speech Recognition

l  Clean-Label Backdoor Attacks on Video Recognition Models

l  Detecting Adversarial Samples Using Influence Functions and Nearest Neighbors

l  Unsupervised Model Personalization While Preserving Privacy and Scalability: An Open Problem

l  GIFnets: Differentiable GIF Encoding Framework

l  Learning Invariant Representation for Unsupervised Image Restoration

l  Improved Few-Shot Visual Classification

l  Learning Weighted Submanifolds With Variational Autoencoders and Riemannian Variational Autoencoders

l  Learning Geocentric Object Pose in Oblique Monocular Images

l  Understanding Adversarial Examples From the Mutual Influence of Images and Perturbations

l  Your Local GAN: Designing Two Dimensional Local Attention Mechanisms for Generative Models

l  MoreFusion: Multi-object Reasoning for 6D Pose Estimation from Volumetric Fusion

l  HCNAF: Hyper-Conditioned Neural Autoregressive Flow and its Application for Probabilistic Occupancy Map Forecasting

l  Detail-recovery Image Deraining via Context Aggregation Networks

l  MCEN: Bridging Cross-Modal Gap between Cooking Recipes and Dish Images with Latent Variable Model

l  Hypergraph Attention Networks for Multimodal Learning

l  Moving in the Right Direction: A Regularization for Deep Metric Learning

l  Rethinking Depthwise Separable Convolutions: How Intra-Kernel Correlations Lead to Improved MobileNets

l  Seeing without Looking: Contextual Rescoring of Object Detections for AP Maximization

l  End-to-End Adversarial-Attention Network for Multi-Modal Clustering

l  Fast Sparse ConvNets

l  Few Sample Knowledge Distillation for Efficient Network Compression

l  Predicting Sharp and Accurate Occlusion Boundaries in Monocular Depth Estimation Using Displacement Fields

l  Shape correspondence using anisotropic Chebyshev spectral CNNs

l  RetinaTrack: Online Single Stage Joint Detection and Tracking

l  Multimodal Categorization of Crisis Events in Social Media

l  SPARE3D: A Dataset for SPAtial REasoning on Three-View Line Drawings

l  SwapText: Image Based Texts Transfer in Scenes

l  OrigamiNet: Weakly-Supervised, Segmentation-Free, One-Step, Full Page Text Recognition by learning to unfold

l  FroDO: From Detections to 3D Objects

 

If you notice anything in this article that could be improved,
we would appreciate it if you could let the AI-SCHOLAR editorial team know via the contact form.
Thank you.
