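This session covers dataset creation and evaluation methods for techniques such as image retrieval and person search. In short, you can think of it as the place where new datasets are announced. The fields vary widely, from a 3D intracranial aneurysm dataset to datasets for fashion retrieval. A paper belongs here if it includes dataset creation or evaluation, even when it also proposes a new model. (oral)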

l Instance Guided Proposal Network for Person Search

l Which Is Plagiarism: Fashion Image Retrieval Based on Regional Representation for Design Protection

l Inter-Task Association Critic for Cross-Resolution Person Re-Identification

l FineGym: A Hierarchical Video Dataset for Fine-Grained Action Understanding

l Mapillary Street-Level Sequences: A Dataset for Lifelong Place Recognition

l BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning

l Rethinking Computer-Aided Tuberculosis Diagnosis

l IntrA: 3D Intracranial Aneurysm Dataset for Deep Learning

l Revisiting Saliency Metrics: Farthest-Neighbor Area Under Curve

l Computing the Testing Error Without a Testing Set

l Improving Confidence Estimates for Unfamiliar Examples

l CycleISP: Real Image Restoration via Improved Data Synthesis

Low-Level and Physics-Based Vision

Papers on degraded data and noisy images. Most of them deal with restoration or denoising. (oral)

l Enhanced Blind Face Restoration With Multi-Exemplar Images and Adaptive Spatial Feature Fusion

l Explorable Super Resolution

l Syn2Real Transfer Learning for Image Deraining Using Gaussian Processes

l Deblurring by Realistic Blurring

l Bringing Old Photos Back to Life

l A Physics-Based Noise Formation Model for Extreme Low-Light Raw Denoising

l Learning to Super Resolve Intensity Images From Events

l Camouflaged Object Detection

l Holistically-Attracted Wireframe Parsing

3D From a Single Image and Shape-From-X; 3D From Multiview and Sensors; Image Retrieval; Datasets and Evaluation; Low-Level and Physics-Based Vision

From here on are the posters that combine the four sessions above.

l Conv-MPN: Convolutional Message Passing Neural Network for Structured Outdoor Architecture Reconstruction

l Domain Adaptation for Image Dehazing

l Auto-Encoding Twin-Bottleneck Hashing

l Agriculture-Vision: A Large Aerial Image Database for Agricultural Pattern Analysis

l Bi-Directional Interaction Network for Person Search

l Meshlet Priors for 3D Mesh Reconstruction

l Space-Time-Aware Multi-Resolution Video Enhancement

l FSS-1000: A 1000-Class Dataset for Few-Shot Segmentation

l MSeg: A Composite Dataset for Multi-Domain Semantic Segmentation

l DeeperForensics-1.0: A Large-Scale Dataset for Real-World Face Forgery Detection

l Learning Multi-Granular Hypergraphs for Video-Based Person Re-Identification

l Online Joint Multi-Metric Adaptation From Frequent Sharing-Subset Mining for Person Re-Identification

l Taking a Deeper Look at Co-Salient Object Detection

l Single-Stage 6D Object Pose Estimation

l OccuSeg: Occupancy-Aware 3D Instance Segmentation

l Camera Trace Erasing

l Deep Metric Learning via Adaptive Learnable Assessment

l Deep Representation Learning on Long-Tailed Data: A Learnable Embedding Augmentation Perspective

l Fantastic Answers and Where to Find Them: Immersive Question-Directed Visual Attention

l HUMBI: A Large Multiview Dataset of Human Body Expressions

l Image Search With Text Feedback by Visiolinguistic Attention Learning

l Image Processing Using Multi-Code GAN Prior

l What Does Plate Glass Reveal About Camera Calibration?

l Zero-Assignment Constraint for Graph Matching With Outliers

l Cascaded Deep Video Deblurring Using Temporal Sharpness Prior

l JL-DCF: Joint Learning and Densely-Cooperative Fusion Framework for RGB-D Salient Object Detection

l From Fidelity to Perceptual Quality: A Semi-Supervised Approach for Low-Light Image Enhancement

l Unsupervised Adaptation Learning for Hyperspectral Imagery Super-Resolution

l ARCH: Animatable Reconstruction of Clothed Humans

l A Model-Driven Deep Neural Network for Single Image Rain Removal

l Novel Object Viewpoint Estimation Through Reconstruction Alignment

l Creating Something From Nothing: Unsupervised Knowledge Distillation for Cross-Modal Hashing

l Evaluating Weakly Supervised Object Localization Methods Right

l Style Normalization and Restitution for Generalizable Person Re-Identification

l Reconstruct Locally, Localize Globally: A Model Free Method for Object Pose Estimation

l RoboTHOR: An Open Simulation-to-Real Embodied AI Platform

l All in One Bad Weather Removal Using Architectural Search

l Relation-Aware Global Attention for Person Re-Identification

l HOnnotate: A Method for 3D Annotation of Hand and Object Poses

l Celeb-DF: A Large-Scale Challenging Dataset for DeepFake Forensics

l Deep Unfolding Network for Image Super-Resolution

l On the Uncertainty of Self-Supervised Monocular Depth Estimation

l Proxy Anchor Loss for Deep Metric Learning

l Unsupervised Learning for Intrinsic Image Decomposition From a Single Image

l Multi-Domain Learning for Accurate and Few-Shot Color Constancy

l PANDA: A Gigapixel-Level Human-Centric Video Dataset

l Cross-View Tracking for Multi-Human 3D Pose Estimation at Over 100 FPS

l Spatial-Temporal Graph Convolutional Network for Video-Based Person Re-Identification

l Salience-Guided Cascaded Suppression Network for Person Re-Identification

l Fashion Outfit Complementary Item Retrieval

l Learning Event-Based Motion Deblurring

l Domain Decluttering: Simplifying Images to Mitigate Synthetic-Real Domain Shift and Improve Depth Estimation

l Neural Blind Deconvolution Using Deep Priors

l Anisotropic Convolutional Networks for 3D Semantic Scene Completion

l TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution

l Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution

l Fast MSER

l Unsupervised Person Re-Identification via Softened Similarity Learning

l COCAS: A Large-Scale Clothes Changing Person Dataset for Re-Identification

l Learning Formation of Physically-Based Face Attributes

l Generalized Product Quantization Network for Semi-Supervised Image Retrieval

l Stereoscopic Flash and No-Flash Photography for Shape and Albedo Recovery

l Context-Aware Group Captioning via Self-Attention and Contrastive Features

l MEBOW: Monocular Estimation of Body Orientation in the Wild

l Distilling Image Dehazing With Heterogeneous Task Imitation

l Select, Supplement and Focus for RGB-D Saliency Detection

l Transfer Learning From Synthetic to Real-Noise Denoising With Adaptive Instance Normalization

l On Joint Estimation of Pose, Geometry and svBRDF From a Handheld Scanner

l Differentiable Volumetric Rendering: Learning Implicit 3D Representations Without 3D Supervision

l Meta-Transfer Learning for Zero-Shot Super-Resolution

l Solving Jigsaw Puzzles With Eroded Boundaries

l Context-Aware Attention Network for Image-Text Retrieval

l M-LVC: Multiple Frames Prediction for Learned Video Compression

l Efficient Dynamic Scene Deblurring Using Spatially Variant Deconvolution Network With Optical Flow Guided Training

l Single Image Reflection Removal Through Cascaded Refinement

l From Patches to Pictures (PaQ-2-PiQ): Mapping the Perceptual Space of Picture Quality

l Video to Events: Recycling Video Datasets for Event Cameras

l Composed Query Image Retrieval Using Locally Bounded Features

l Spatially-Attentive Patch-Hierarchical Network for Adaptive Motion Deblurring

l End-to-End Illuminant Estimation Based on Deep Metric Learning

l Variational-EM-Based Deep Learning for Noise-Blind Image Deblurring

l Image Demoireing with Learnable Bandpass Filters

l Assessing Image Quality Issues for Real-World Problems

l Memory-Efficient Hierarchical Neural Architecture Search for Image Denoising

l Blindly Assess Image Quality in the Wild Guided by a Self-Adaptive Hyper Network

l Perceptual Quality Assessment of Smartphone Photography

l Don't Hit Me! Glass Detection in Real-World Scenes

l Progressive Mirror Detection

Scene Analysis and Understanding

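Papers on scene graph generation (SGG) and related scene understanding topics. For example, when data is limited, some work trains on synthetic data and then on real data to narrow the gap, but it never evaluates the gap to the real distribution. One line of research here evaluates that gap in order to understand it (hence "Understanding"). (oral)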

l Category-Level Articulated Object Pose Estimation

l Unbiased Scene Graph Generation From Biased Training

l Dynamic Graph Message Passing Networks

l Weakly Supervised Visual Semantic Parsing

l GPS-Net: Graph Property Sensing Network for Scene Graph Generation

l End-to-End Optimization of Scene Layout

l Unsupervised Intra-Domain Adaptation for Semantic Segmentation Through Self-Supervision

l Dual Super-Resolution Learning for Semantic Segmentation

l Self-Supervised Scene De-Occlusion

l BANet: Bidirectional Aggregation Network With Occlusion Handling for Panoptic Segmentation

Medical, Biological and Cell Microscopy

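Medical papers. For example, processing gigapixel whole slide images from microscopy has become common in recent years, and one study argues that the right approach is to use only the suspicious regions, just as a physician would. Anyone working in the medical field should look here. (oral)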

l CPR-GCN: Conditional Partial-Residual Graph Convolutional Network in Automated Anatomical Labeling of Coronary Arteries

l Cross-View Correspondence Reasoning Based on Bipartite Graph Convolutional Network for Mammogram Mass Detection

l MPM: Joint Representation of Motion and Position Map for Cell Tracking

l Deep Distance Transform for Tubular Structure Segmentation in CT Scans

l Instance Segmentation of Biological Images Using Harmonic Embeddings

l Multi-scale Domain-adversarial Multiple-instance CNN for Cancer Subtype Classification with Unannotated Histopathological Images

l SOS: Selective Objective Switch for Rapid Immunofluorescence Whole Slide Image Classification

Transfer/Low-Shot/Semi/Unsupervised Learning (1)

This session groups together Transfer/Low-Shot/Semi/Unsupervised Learning. Some of the papers are quite surprising, so this may be a good place to start browsing. (oral)

l Task Agnostic Robust Learning on Corrupt Outputs by Correlation-Guided Mixture Density Networks

l METAL: Minimum Effort Temporal Activity Localization in Untrimmed Videos

l Neural Data Server: A Large-Scale Search Engine for Transfer Learning Data

l Revisiting Knowledge Distillation via Label Smoothing Regularization

l WCP: Worst-Case Perturbations for Semi-Supervised Deep Learning

l DEPARA: Deep Attribution Graph for Deep Knowledge Transferability

l Conditional Channel Gated Networks for Task-Aware Continual Learning

l Towards Discriminability and Diversity: Batch Nuclear-Norm Maximization Under Label Insufficient Situations

Scene Analysis and Understanding; Medical, Biological and Cell Microscopy; Transfer/Low-Shot/Semi/Unsupervised Learning

From here on are the posters that combine the three sessions above.

l FocalMix: Semi-Supervised Learning for 3D Medical Image Detection

l Learning 3D Semantic Scene Graphs From 3D Indoor Reconstructions

l Self-Supervised Viewpoint Learning From Image Collections

l Two-Shot Spatially-Varying BRDF and Shape Estimation

l Variational Context-Deformable ConvNets for Indoor Scene Parsing

l Strip Pooling: Rethinking Spatial Pooling for Scene Parsing

l Few-Shot Object Detection With Attention-RPN and Multi-Relation Detector

l What Can Be Transferred: Unsupervised Domain Adaptation for Endoscopic Lesions Segmentation

l ADINet: Attribute Driven Incremental Network for Retinal Image Classification

l Unsupervised Domain Adaptation With Hierarchical Gradient Synchronization

l Deep Grouping Model for Unified Perceptual Parsing

l Where Am I Looking At? Joint Location and Orientation Estimation by Cross-View Matching

l Gum-Net: Unsupervised Geometric Matching for Fast and Accurate 3D Subtomogram Image Alignment and Averaging

l FDA: Fourier Domain Adaptation for Semantic Segmentation

l Foreground-Aware Relation Network for Geospatial Object Segmentation in High Spatial Resolution Remote Sensing Imagery

l When2com: Multi-Agent Perception via Communication Graph Grouping

l Learning Human-Object Interaction Detection Using Interaction Points

l C2FNAS: Coarse-to-Fine Neural Architecture Search for 3D Medical Image Segmentation

l Adaptive Subspaces for Few-Shot Learning

l Learning to Detect Important People in Unlabelled Images for Semi-Supervised Important People Detection

l Stochastic Sparse Subspace Clustering

l CRNet: Cross-Reference Networks for Few-Shot Segmentation

l Shoestring: Graph-Based Semi-Supervised Classification With Severely Limited Labeled Data

l Uninformed Students: Student-Teacher Anomaly Detection With Discriminative Latent Embeddings

l 3D Sketch-Aware Semantic Scene Completion via Semi-Supervised Structure Prior

l Graph-Guided Architecture Search for Real-Time Semantic Segmentation

l Composing Good Shots by Exploiting Mutual Relations

l Organ at Risk Segmentation for Head and Neck Cancer Using Stratified Learning and Neural Architecture Search

l G2L-Net: Global to Local Network for Real-Time 6D Pose Estimation With Embedding Vector Features

l Unsupervised Instance Segmentation in Microscopy Images via Panoptic Domain Adaptation and Task Re-Weighting

l Single-Stage Semantic Segmentation From Image Labels

l Cascaded Human-Object Interaction Recognition

l DuDoRNet: Learning a Dual-Domain Recurrent Network for Fast MRI Reconstruction With Deep T1 Prior

l Learning Integral Objects With Intra-Class Discriminator for Weakly-Supervised Semantic Segmentation

l FPConv: Learning Local Flattening for Point Convolution

l Rotation Equivariant Graph Convolutional Network for Spherical Image Classification

l FOAL: Fast Online Adaptive Learning for Cardiac Motion Estimation

l ScrabbleGAN: Semi-Supervised Varying Length Handwritten Text Generation

l Cross-Domain Semantic Segmentation via Domain-Invariant Interactive Relation Transfer

l Inflated Episodic Memory With Region Self-Attention for Long-Tailed Visual Recognition

l Multimodal Future Localization and Emergence Prediction for Objects in Egocentric View With a Reachability Prior

l Structure Preserving Generative Cross-Domain Learning

l Reverse Perspective Network for Perspective-Aware Object Counting

l Multi-Path Region Mining for Weakly Supervised 3D Semantic Segmentation on Point Clouds

l Reliable Weighted Optimal Transport for Unsupervised Domain Adaptation

l ImVoteNet: Boosting 3D Object Detection in Point Clouds With Image Votes

l Understanding Road Layout From Videos as a Whole

l Bi-Directional Relationship Inferring Network for Referring Image Segmentation

l Perspective Plane Program Induction From a Single Image

l DeepFLASH: An Efficient Network for Learning-Based Medical Image Registration

l Semi-Supervised Learning for Few-Shot Image-to-Image Translation

l Semantic Correspondence as an Optimal Transport Problem

l How Much Time Do You Have? Modeling Multi-Duration Saliency

l Fine-Grained Generalized Zero-Shot Learning via Dense Attribute-Based Attention

l Online Depth Learning Against Forgetting in Monocular Videos

l Few-Shot Learning of Part-Specific Probability Space for 3D Shape Segmentation

l Pattern-Structure Diffusion for Multi-Task Learning

l Training Noise-Robust Deep Neural Networks via Meta-Learning

l Fusion-Aware Point Convolution for Online Semantic 3D Scene Segmentation

l Universal Source-Free Domain Adaptation

l Exploring Spatial-Temporal Multi-Frequency Analysis for High-Fidelity and Temporal-Consistency Video Prediction

l Varicolored Image De-Hazing

l SpSequenceNet: Semantic Segmentation Network on 4D Point Clouds

l Separating Particulate Matter From a Single Microscopic Image

l Adaptive Dilated Network With Self-Correction Supervision for Counting

l PointPainting: Sequential Fusion for 3D Object Detection

l Rethinking Zero-Shot Video Classification: End-to-End Training for Realistic Applications

l Learning to Select Base Classes for Few-Shot Classification

l CONSAC: Robust Multi-Model Fitting by Conditional Sample Consensus

l Fast Symmetric Diffeomorphic Image Registration with Convolutional Neural Networks

l Distilled Semantics for Comprehensive Scene Understanding from Videos

l Modeling Biological Immunity to Adversarial Examples

l DOA-GAN: Dual-Order Attentive Generative Adversarial Network for Image Copy-Move Forgery Detection and Localization

l Correspondence-Free Material Reconstruction using Sparse Surface Constraints

l Augmenting Colonoscopy Using Extended and Directional CycleGAN for Lossy Image Translation

l Attention Scaling for Crowd Counting

l Shape Reconstruction by Learning Differentiable Surface Representations

l A Spatiotemporal Volumetric Interpolation Network for 4D Dynamic Medical Image

l Attention-Based Context Aware Reasoning for Situation Recognition

l PatchVAE: Learning Local Latent Codes for Recognition

l Self-Supervised Monocular Trained Depth Estimation Using Self-Attention and Discrete Disparity Volume

l STAViS: Spatio-Temporal AudioVisual Saliency Network

l More Grounded Image Captioning by Distilling Image-Text Matching Model

l DUNIT: Detection-Based Unsupervised Image-to-Image Translation

l Learning to Observe: Approximating Human Perceptual Thresholds for Detection of Suprathreshold Image Transformations

l Show, Edit and Tell: A Framework for Editing Image Captions

l Structure Boundary Preserving Segmentation for Medical Image With Ambiguous Boundary

l Predicting Cognitive Declines Using Longitudinally Enriched Representations for Imaging Biomarkers

l Predicting Lymph Node Metastasis Using Histopathological Images Based on Multiple Instance Learning With Deep Graph Convolution

l Extremely Dense Point Correspondences Using a Learned Feature Descriptor

3D From Multiview and Sensors (3)

This is the last oral session for 3D From Multiview and Sensors. The posters follow further down, so if you are interested in 3D From Multiview and Sensors, keep going until you reach them. (oral)

l Local Deep Implicit Functions for 3D Shape

l PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation

l Cost Volume Pyramid Based Depth Inference for Multi-View Stereo

l RoutedFusion: Learning Real-Time Depth Map Fusion

l VOLDOR: Visual Odometry From Log-Logistic Dense Optical Flow Residuals

l Learning to Optimize Non-Rigid Tracking

l KFNet: Learning Temporal Camera Relocalization Using Kalman Filtering

l Information-Driven Direct RGB-D Odometry

l SuperGlue: Learning Feature Matching With Graph Neural Networks

l Reinforced Feature Points: Optimizing Feature Detection and Description for a High-Level Task

Face, Gesture, and Body Pose (1)

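Papers on faces and bodies. Research on spoofing, a face-security problem, and on 3D faces is also included here. Recommended for anyone working on face recognition technology or on similar topics in industry. (oral)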

l ReDA: Reinforced Differentiable Attribute for 3D Face Reconstruction

l EventCap: Monocular 3D Capture of High-Speed Human Motions Using an Event Camera

l Cross-Modal Deep Face Normals With Deactivable Skip Connections

l Weakly-Supervised Mesh-Convolutional Hand Reconstruction in the Wild

l Face X-Ray for More General Face Forgery Detection

l A Morphable Face Albedo Model

l Cascade EF-GAN: Progressive Facial Expression Editing With Local Focuses

l GanHand: Predicting Human Grasp Affordances in Multi-Object Scenes

l Deep Spatial Gradient and Temporal Depth Learning for Face Anti-Spoofing

l DeepCap: Monocular Human Performance Capture Using Weak Supervision

l Attention Mechanism Exploits Temporal Contexts: Real-Time 3D Human Pose Reconstruction

l Advancing High Fidelity Identity Swapping for Forgery Detection

Image and Video Synthesis (1)

The focus here is on "synthesis". Whether GAN-based or not, these are all papers that synthesize something new. (oral)

l Controllable Person Image Synthesis With Attribute-Decomposed GAN

l Attentive Normalization for Conditional Image Generation

l SEAN: Image Synthesis With Semantic Region-Adaptive Normalization

l Blurry Video Frame Interpolation

l Learning Physics-Guided Face Relighting Under Directional Light

l Disentangled Image Generation Through Structured Noise Injection

l Cross-Domain Correspondence Learning for Exemplar-Based Image Translation

l Disentangled and Controllable Face Image Generation via 3D Imitative-Contrastive Learning

l Single Image Reflection Removal With Physically-Based Training Images

l SketchyCOCO: Image Generation From Freehand Scene Sketches

l Image Based Virtual Try-On Network From Unpaired Data

l PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer

3D From Multiview and Sensors; Face, Gesture, and Body Pose; Image and Video Synthesis

From here on are the posters that combine the three sessions above.

l RetinaFace: Single-Shot Multi-Level Face Localisation in the Wild

l Semantic Image Manipulation Using Scene Graphs

l A Stochastic Conditioning Scheme for Diverse Human Motion Prediction

l Transferring Dense Pose to Proximal Animal Classes

l Weakly-Supervised 3D Human Pose Learning via Multi-View Images in the Wild

l VIBE: Video Inference for Human Body Pose and Shape Estimation

l G3AN: Disentangling Appearance and Motion for Video Generation

l Domain Adaptive Image-to-Image Translation

l GAN Compression: Efficient Architectures for Interactive Conditional GANs

l Searching Central Difference Convolutional Networks for Face Anti-Spoofing

l TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting

l AdaCoF: Adaptive Collaboration of Flows for Video Frame Interpolation

l FReeNet: Multi-Identity Face Reenactment

l Novel View Synthesis of Dynamic Scenes With Globally Coherent Depths From a Monocular Camera

l Monocular Real-Time Hand Shape and Motion Capture Using Multi-Modal Data

l The GAN That Warped: Semantic Attribute Editing With Unpaired Data

l 4D Visualization of Dynamic Events From Unconstrained Multi-View Videos

l Global-Local Bidirectional Reasoning for Unsupervised Representation Learning of 3D Point Clouds

l HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation

l Detecting Attended Visual Targets in Video

l Closed-Loop Matters: Dual Regression Networks for Single Image Super-Resolution

l Neural Voxel Renderer: Learning an Accurate and Controllable Rendering Tool

l Neural Contours: Learning to Draw Lines From 3D Shapes

l Softmax Splatting for Video Frame Interpolation

l CIAGAN: Conditional Identity Anonymization Generative Adversarial Networks

l Probabilistic Structural Latent Representation for Unsupervised Embedding

l Semantically Multi-Modal Image Synthesis

l Nested Scale-Editing for Conditional Image Synthesis

l UnrealText: Synthesizing Realistic Scene Text Images From the Unreal World

l Fast Texture Synthesis via Pseudo Optimizer

l Towards Learning Structure via Consensus for Face Segmentation and Parsing

l CookGAN: Causality Based Text-to-Image Synthesis

l Weakly Supervised Discriminative Feature Learning With State Information for Person Identification

l Future Video Synthesis With Object Motion Prediction

l MaskGAN: Towards Diverse and Interactive Facial Image Manipulation

l A Graduated Filter Method for Large Scale Robust Estimation

l Deep Face Super-Resolution With Iterative Collaboration Between Attentive Recovery and Landmark Estimation

l Coherent Reconstruction of Multiple Humans From a Single Image

l PointASNL: Robust Point Clouds Processing Using Nonlocal Neural Networks With Adaptive Sampling

l A Neural Rendering Framework for Free-Viewpoint Relighting

l A Multi-Task Mean Teacher for Semi-Supervised Shadow Detection

l GroupFace: Learning Latent Groups and Constructing Group-Based Representations for Face Recognition

l Channel Attention Based Iterative Residual Learning for Depth Map Super-Resolution

l Time Flies: Animating a Still Image With Time-Lapse Video As Reference

l SER-FIQ: Unsupervised Estimation of Face Image Quality Based on Stochastic Embedding Robustness

l Grid-GCN for Fast and Scalable Point Cloud Learning

l Domain Balancing: Face Recognition on Long-Tailed Domains

l AdversarialNAS: Adversarial Neural Architecture Search for GANs

l Image Super-Resolution With Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars Mining

l The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation

l Data Uncertainty Learning in Face Recognition

l Regularizing Discriminative Capability of CGANs for Semi-Supervised Generative Learning

l FM2u-Net: Face Morphological Multi-Branch Network for Makeup-Invariant Face Verification

l UCTGAN: Diverse Image Inpainting Based on Unsupervised Cross-Space Translation

l Decoupled Representation Learning for Skeleton-Based Gesture Recognition

l An Efficient PointLSTM for Point Clouds Based Gesture Recognition

l Editing in Style: Uncovering the Local Semantics of GANs

l On the Detection of Digital Face Manipulation

l Learning Texture Transformer Network for Image Super-Resolution

l Reference-Based Sketch Image Colorization Using Augmented-Self Reference and Dense Semantic Correspondence

l Deblurring Using Analysis-Synthesis Networks Pair

l Exploring Unlabeled Faces for Novel Attribute Discovery

l Neural Pose Transfer by Spatially Adaptive Instance Normalization

l Fine-Grained Image-to-Image Transformation Towards Visual Recognition

l Deep Facial Non-Rigid Multi-View Stereo

l Attention-Driven Cropping for Very High Resolution Facial Landmark Detection

l Towards Unsupervised Learning of Generative Models for 3D Controllable Image Synthesis

l End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection

l Towards High-Fidelity 3D Face Reconstruction From In-the-Wild Images Using Graph Convolutional Networks

l CurricularFace: Adaptive Curriculum Learning Loss for Deep Face Recognition

l Rotate-and-Render: Unsupervised Photorealistic Face Rotation From Single-View Images

l One-Shot Domain Adaptation for Face Generation

l BidNet: Binocular Image Dehazing Without Explicit Disparity Estimation

l Deep Shutter Unrolling Network

l Joint Texture and Geometry Optimization for RGB-D Reconstruction

l Deep 3D Capture: Geometry and Reflectance From Sparse Multi-View Images

l Auto-Tuning Structured Light by Optical Stochastic Gradient Descent

l MARMVS: Matching Ambiguity Reduced Multiple View Stereo for Efficient Large Scale Scene Reconstruction

l Uncertainty Based Camera Model Selection

l Local Implicit Grid Representations for 3D Scenes

l TetraTSDF: 3D Human Reconstruction From a Single Image With a Tetrahedral Outer Shell

l Averaging Essential and Fundamental Matrices in Collinear Camera Settings

l On the Distribution of Minima in Intrinsic-Metric Rotation Averaging

l Lightweight Multi-View 3D Pose Estimation Through Camera-Disentangled Representation

l A Novel Recurrent Encoder-Decoder Structure for Large-Scale Multi-View Stereo Reconstruction From an Open Aerial Dataset

l Factorized Higher-Order CNNs With an Application to Spatio-Temporal Emotion Estimation

l Effectively Unbiased FID and Inception Score and Where to Find Them

l Robust Homography Estimation via Dual Principal Component Pursuit

l Non-Adversarial Video Synthesis With Learned Priors

l Uncertainty-Aware Mesh Decoder for High Fidelity 3D Face Reconstruction

Face, Gesture, and Body Pose (2)

Part 2 of Face, Gesture, and Body Pose, which already appeared above. (oral)

l 3FabRec: Fast Few-Shot Face Alignment by Reconstruction

l Weakly-Supervised Domain Adaptation via GAN and Mesh Model for Estimating 3D Hand Poses Interacting Objects

l Vec2Face: Unveil Human Faces From Their Blackbox Features in Face Recognition

l StyleRig: Rigging StyleGAN for 3D Control Over Portrait Images

l Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis

l Learning Meta Face Recognition in Unseen Domains

l Cascaded Deep Monocular 3D Human Pose Estimation With Evolutionary Training Data

l GHUM & GHUML: Generative 3D Human Shape and Articulated Pose Models

l Generating 3D People in Scenes Without People

l Transferring Cross-Domain Knowledge for Video Sign Language Recognition

l Bodies at Rest: 3D Human Pose and Shape Estimation From a Pressure Image Using Synthetic Data

l Bayesian Adversarial Human Motion Synthesis

Motion and Tracking (1)

Papers on motion and tracking. (oral)

l LSM: Learning Subspace Minimization for Low-Level Vision

l Learning a Neural Solver for Multiple Object Tracking

l GLU-Net: Global-Local Universal Network for Dense Flow and Correspondences

l SiamCAR: Siamese Fully Convolutional Classification and Regression for Visual Tracking

l MaskFlownet: Asymmetric Feature Matching With Learnable Occlusion Mask

l Tracking by Instance Detection: A Meta-Learning Approach

l High-Performance Long-Term Tracking With Meta-Updater

l TubeTK: Adopting Tubes to Track Multi-Object in a One-Step Training Model

l Collaborative Motion Prediction via Neural Motion Message Passing

l P2B: Point-to-Box Network for 3D Object Tracking in Point Clouds

l Self-Supervised Deep Visual Odometry With Online Adaptation

l Globally Optimal Contrast Maximisation for Event-Based Motion Estimation

Representation Learning

Papers on representation learning. (oral)

l D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features

l Towards Backward-Compatible Representation Learning

l PointAugment: An Auto-Augmentation Framework for Point Cloud Classification

l Cross-Batch Memory for Embedding Learning

l Circle Loss: A Unified Perspective of Pair Similarity Optimization

l Steering Self-Supervised Feature Learning Beyond Local Pixel Statistics

l Hyperbolic Image Embeddings

l Controllable Orthogonalization in Training DNNs

l An Investigation Into the Stochasticity of Batch Whitening

Face, Gesture, and Body Pose; Motion and Tracking; Representation Learning

From here on are the posters that combine the three sessions above.

l High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification

l Same Features, Different Day: Weakly Supervised Feature Learning for Seasonal Invariance

l Learning to Dress 3D People in Generative Clothing

l MAST: A Memory-Augmented Self-Supervised Tracker

l Learning by Analogy: Reliable Supervision From Transformations for Unsupervised Optical Flow Estimation

l GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking With 2D-3D Multi-Feature Learning

l ClusterFit: Improving Generalization of Visual Representations

l Learning Dynamic Relationships for 3D Human Motion Prediction

l Knowledge As Priors: Cross-Modal Knowledge Generalization for Datasets Without Superior Knowledge

l S3VAE: Self-Supervised Sequential VAE for Representation Disentanglement and Data Generation

l Video Playback Rate Perception for Self-Supervised Spatio-Temporal Representation Learning

l Learning to Manipulate Individual Objects in an Image

l PADS: Policy-Adapted Sampling for Visual Similarity Learning

l Siam R-CNN: Visual Tracking by Re-Detection

l ASLFeat: Learning Local Features of Accurate Shape and Localization

l Filter Grafting for Deep Neural Networks

l HOPE-Net: A Graph-Based Model for Hand-Object Pose Estimation

l DeepFaceFlow: In-the-Wild Dense 3D Facial Motion Estimation

l Learning for Video Compression With Hierarchical Quality and Recurrent Enhancement

l Learning Better Lossless Compression Using Lossy Compression

l Flow2Stereo: Effective Self-Supervised Learning of Optical Flow and Stereo Matching

l Multi-Scale Fusion Subspace Clustering Using Similarity Constraint

l Siamese Box Adaptive Network for Visual Tracking

l Cross-Domain Face Presentation Attack Detection via Multi-Domain Disentangled Representation Learning

l Online Deep Clustering for Unsupervised Representation Learning

l Density-Aware Feature Embedding for Face Clustering

l Self-Supervised Learning of Pretext-Invariant Representations

l ROAM: Recurrently Optimizing Tracking Model

l Deformable Siamese Attention Networks for Visual Object Tracking

l 15 Keypoints Is All You Need

l Optical Flow in the Dark

l Sketch-BERT: Learning Sketch Bidirectional Encoder Representation From Transformers by Self-Supervised Learning of Sketch Gestalt

l A Unified Object Motion and Affinity Model for Online Multi-Object Tracking

l Sub-Frame Appearance and 6D Pose Estimation of Fast Moving Objects

l How to Train Your Deep Multi-Object Tracker

l TPNet: Trajectory Proposal Network for Motion Prediction

l Large Scale Video Representation Learning via Relational Graph Clustering

l Towards Universal Representation Learning for Deep Face Recognition

l Robust Partial Matching for Person Search in the Wild

l Correlation-Guided Attention for Corner Detection Based Visual Tracking

l Learning Multi-Object Tracking and Segmentation From Automatic Annotations

l PandaNet: Anchor-Based Single-Shot Multi-Person 3D Pose Estimation

l Rotation Consistent Margin Loss for Efficient Low-Bit Face Recognition

l Joint Spatial-Temporal Optimization for Stereo 3D Object Tracking

l Unity Style Transfer for Person Re-Identification

l Suppressing Uncertainties for Large-Scale Facial Expression Recognition

l Multiview-Consistent Semi-Supervised Learning for 3D Human Pose Estimation

l Regularizing Neural Networks via Minimizing Hyperspherical Energy

l Learning Representations by Predicting Bags of Visual Words

l AnimalWeb: A Large-Scale Hierarchical Dataset of Annotated Animal Faces

l A Transductive Approach for Video Object Segmentation

l Dynamic Face Video Segmentation via Reinforcement Learning

l Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion

l Semantic Drift Compensation for Class-Incremental Learning

l Context-Aware Human Motion Prediction

l DeepDeform: Learning Non-Rigid RGB-D Reconstruction With Semi-Supervised Data

l Optical Non-Line-of-Sight Physics-Based 3D Human Pose Estimation

l Learning to Transfer Texture From Clothing Images to 3D Humans

l UniPose: Unified Human Pose Estimation in Single Images and Videos

l Minimal Solutions to Relative Pose Estimation From Two Views Sharing a Common Direction With Unknown Focal Length

l 3D Human Mesh Regression With Dense Correspondence

l Cross-Modal Pattern-Propagation for RGB-T Tracking

l Distilling Knowledge From Graph Convolutional Networks

l Learning Identity-Invariant Motion Representations for Cross-ID Face Reenactment

l Distribution-Aware Coordinate Representation for Human Pose Estimation

l Parsing-Based View-Aware Embedding Network for Vehicle Re-Identification

l HandVoxNet: Deep Voxel-Based Network for 3D Hand Shape and Pose Estimation From a Single Depth Map

l Determinant Regularization for Gradient-Efficient Graph Matching

l D3S - A Discriminative Single Shot Segmentation Tracker

l MANTRA: Memory Augmented Networks for Multiple Trajectory Prediction

l End-to-End Model-Free Reinforcement Learning for Urban Driving Using Implicit Affordances

l GraphTER: Unsupervised Learning of Graph Transformation Equivariant Representations via Auto-Encoding Node-Wise Transformations

l Can Facial Pose and Expression Be Separated With Weak Perspective Camera?

l Probabilistic Regression for Visual Tracking

l 3DRegNet: A Deep Neural Network for 3D Point Registration

l Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation

l Three-Dimensional Reconstruction of Human Interactions

l Distribution-Induced Bidirectional Generative Adversarial Network for Graph Representation Learning

l Minimal Solvers for 3D Scan Alignment With Pairs of Intersecting Lines

l Wavelet Integrated CNNs for Noise-Robust Image Classification

l Embedding Expansion: Augmentation in Embedding Space for Deep Metric Learning

l PropagationNet: Propagate Points to Curve to Learn Structure Information

l Sequential 3D Human Pose and Shape Estimation From Point Clouds

l Improving the Robustness of Capsule Networks to Image Affine Transformations

l Noise Modeling, Synthesis and Classification for Generic Object Anti-Spoofing

l Quaternion Product Units for Deep Learning on 3D Rotation Groups

l Unsupervised Representation Learning for Gaze Estimation

l P-nets: Deep Polynomial Neural Networks

l Hierarchically Robust Representation Learning

l How Useful Is Self-Supervised Pretraining for Visual Tasks?

Face, Gesture, and Body Pose (3); Motion and Tracking (2)

This is part (3) of Face, Gesture, and Body Pose and part (2) of Motion and Tracking. (oral)

l Copy and Paste GAN: Face Hallucination From Shaded Thumbnails

l TailorNet: Predicting Clothing in 3D as a Function of Human Pose, Shape and Garment Style

l Object-Occluded Human Shape and Pose Estimation From a Single Color Image

l Recursive Least-Squares Estimator-Aided Online Learning for Visual Tracking

l Self-Supervised Monocular Scene Flow Estimation

l Learning Fast and Robust Target Models for Video Object Segmentation

l Reciprocal Learning Networks for Human Trajectory Prediction

l Nonparametric Object and Parts Modeling With Lie Group Dynamics

Image and Video Synthesis (2); Neural Generative Models

These papers are about synthesis, though the emphasis leans somewhat toward Neural Generative Models. (oral)

l Learning to Shadow Hand-Drawn Sketches

l Intuitive, Interactive Beard and Hair Synthesis With Generative Models

l Semantic Pyramid for Image Generation

l SynSin: End-to-End View Synthesis From a Single Image

l A Characteristic Function Approach to Deep Implicit Generative Modeling

l High-Resolution Daytime Translation Without Domain Labels

l Leveraging 2D Data to Learn Textured 3D Mesh Generation

l Contextual Residual Aggregation for Ultra High-Resolution Image Inpainting

l Flow Contrastive Estimation of Energy-Based Models

Optimization and Learning Methods

The emphasis here is on optimization. (oral)

l Hardware-in-the-Loop End-to-End Optimization of Camera Image Processing Pipelines

l Search to Distill: Pearls Are Everywhere but Not the Eyes

l Total Deep Variation for Linear Inverse Problems

l Relative Interior Rule in Block-Coordinate Descent

l Learning Combinatorial Solver for Graph Matching

l SampleNet: Differentiable Point Cloud Sampling

l Can We Learn Heuristics for Graphical Model Inference Using Reinforcement Learning?

l Quasi-Newton Solver for Robust Non-Rigid Registration

l Rethinking Class-Balanced Methods for Long-Tailed Visual Recognition From a Domain Adaptation Perspective

l Optimizing Rank-Based Metrics With Blackbox Differentiation

Face, Gesture, and Body Pose; Motion and Tracking; Image and Video Synthesis; Neural Generative Models; Optimization and Learning Methods

From here on are the posters that combine the three sessions above.

l DualSDF: Semantic Shape Manipulation Using a Two-Level Representation

l Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives

l Deep Homography Estimation for Dynamic Scenes

l PF-Net: Point Fractal Network for 3D Point Cloud Completion

l On the Regularization Properties of Structured Dropout

l Learning Oracle Attention for High-Fidelity Face Completion

l Deep Image Spatial Transformation for Person Image Generation

l Learning to Optimize on SPD Manifolds

l Deep 3D Portrait From a Single Image

l RDCFace: Radial Distortion Correction for Face Recognition

l Global-Local GCN: Large-Scale Label Noise Cleansing for Face Recognition

l MISC: Multi-Condition Injection and Spatially-Adaptive Compositing for Conditional Person Image Synthesis

l SAINT: Spatially Aware Interpolation NeTwork for Medical Slice Synthesis

l Recurrent Feature Reasoning for Image Inpainting

l Structure-Preserving Super Resolution With Gradient Guidance

l Epipolar Transformers

l Diversified Arbitrary Style Transfer via Deep Feature Perturbation

l MSG-GAN: Multi-Scale Gradients for Generative Adversarial Networks

l Overcoming Multi-Model Forgetting in One-Shot NAS With Diversity Maximization

l Select to Better Learn: Fast and Accurate Deep Learning Using Data Selection From Nonlinear Manifolds

l Neural Point Cloud Rendering via Multi-Plane Projection

l Wish You Were Here: Context-Aware Human Generation

l Towards Photo-Realistic Virtual Try-On by Adaptively Generating-Preserving Image Content

l Breaking the Cycle - Colleagues Are All You Need

l Local Class-Specific and Global Image-Level Generative Adversarial Networks for Semantic-Guided Scene Generation

l ManiGAN: Text-Guided Image Manipulation

l Watch Your Up-Convolution: CNN Based Generative Deep Neural Networks Are Failing to Reproduce Spectral Distributions

l Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems

l Barycenters of Natural Images Constrained Wasserstein Barycenters for Image Morphing

l Guided Variational Autoencoder for Disentanglement Learning

l Cross-Spectral Face Hallucination via Disentangling Independent Factors

l Learned Image Compression With Discretized Gaussian Mixture Likelihoods and Attention Modules

l C-Flow: Conditional Generative Flow Models for Images and 3D Point Clouds

l Cogradient Descent for Bilinear Optimization

l Instance-Aware Image Colorization

l Joint Training of Variational Auto-Encoder and Latent Energy-Based Model

l Adaptive Loss-Aware Quantization for Multi-Bit Networks

l ScopeFlow: Dynamic Scene Scoping for Optical Flow

l Video Super-Resolution With Temporal Group Attention

l Group Sparsity: The Hinge Between Filter Pruning and Decomposition for Network Compression

l 3D Photography Using Context-Aware Layered Depth Inpainting

l MixNMatch: Multifactor Disentanglement and Encoding for Conditional Image Generation

l Low-Rank Compression of Neural Nets: Learning the Rank of Each Layer

l Global Texture Enhancement for Fake Face Detection in the Wild

l Panoptic-Based Image Synthesis

l Lighthouse: Predicting Lighting Volumes for Spatially-Coherent Illumination

l Learning to Cartoonize Using White-Box Cartoon Representations

l End-to-End Learnable Geometric Vision by Backpropagating PnP Optimization

l Analyzing and Improving the Image Quality of StyleGAN

l Fashion Editing With Adversarial Parsing Learning

l Augment Your Batch: Improving Generalization Through Instance Repetition

l ARShadowGAN: Shadow Generative Adversarial Network for Augmented Reality in Single Light Scenes

l An End-to-End Edge Aggregation Network for Moving Object Segmentation

l Learning Video Stabilization Using Optical Flow

l Reusing Discriminators for Encoding: Towards Unsupervised Image-to-Image Translation

l Robust Design of Deep Neural Networks Against Adversarial Attacks Based on Lyapunov Theory

l StarGAN v2: Diverse Image Synthesis for Multiple Domains

l Warping Residual Based Image Stitching for Large Parallax

l A U-Net Based Discriminator for Generative Adversarial Networks

l Unpaired Portrait Drawing Generation via Asymmetric Cycle Mapping

l When to Use Convolutional Neural Networks for Inverse Problems

l LUVLi Face Alignment: Estimating Landmarks' Location, Uncertainty, and Visibility Likelihood

l Affinity Graph Supervision for Visual Recognition

l Unsupervised Magnification of Posture Deviations Across Subjects

l Accurate Estimation of Body Height From a Single Depth Image via a Four-Stage Developing Network

l Fast Soft Color Segmentation

l Global Optimality for Point Set Registration Using Semidefinite Programming

l Image2StyleGAN++: How to Edit the Embedded Images?

l SQE: a Self Quality Evaluation Metric for Parameters Optimization in Multi-Object Tracking

l EventSR: From Asynchronous Events to Image Reconstruction, Restoration, and Super-Resolution via End-to-End Adversarial Learning

l Hierarchical Pyramid Diverse Attention Networks for Face Recognition

l RGBD-Dog: Predicting Canine Pose from RGBD Sensors

l Multi-Scale Progressive Fusion Network for Single Image Deraining

l Learning a Neural 3D Texture Space From 2D Exemplars

l BachGAN: High-Resolution Image Synthesis From Salient Object Layout

l Rethinking Data Augmentation for Image Super-resolution: A Comprehensive Analysis and a New Strategy

l On Positive-Unlabeled Classification in GAN

l DoveNet: Deep Image Harmonization via Domain Verification

l Noise Robust Generative Adversarial Networks

l Normalizing Flows With Multi-Scale Autoregressive Priors

l Robust Reference-Based Super-Resolution With Similarity-Aware Deformable Convolution

l Painting Many Pasts: Synthesizing Time Lapse Videos of Paintings

l GeoDA: A Geometric Framework for Black-Box Adversarial Attacks

l GAMIN: Generative Adversarial Multiple Imputation Network for Highly Missing Data

l An Internal Covariate Shift Bounding Algorithm for Deep Neural Networks by Unitizing Layers' Outputs

l A Unified Optimization Framework for Low-Rank Inducing Penalties

l Single-Side Domain Generalization for Face Anti-Spoofing

l The Knowledge Within: Methods for Data-Free Model Compression

l Scale-Space Flow for End-to-End Optimized Video Compression

l Dynamic Neural Relational Inference

Segmentation, Grouping and Shape (1)

Research on segmentation, such as U-Net. (oral)

l Real-Time Panoptic Segmentation From Dense Detections

l Deep Snake for Real-Time Instance Segmentation

l AdaCoSeg: Adaptive Shape Co-Segmentation With Group Consistency Loss

l Learning Dynamic Routing for Semantic Segmentation

l Boosting Semantic Human Matting With Coarse Annotations

l BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation

l UC-Net: Uncertainty Inspired RGB-D Saliency Detection via Conditional Variational Autoencoders

l Deep Geometric Functional Maps: Robust Feature Learning for Shape Correspondence

l Deep Polarization Cues for Transparent Object Segmentation

l DualConvMesh-Net: Joint Geodesic and Euclidean Convolutions on 3D Meshes

l F-BRS: Rethinking Backpropagating Refinement for Interactive Segmentation

l Approximating shapes in images with low-complexity polygons

Explainable AI; Fairness, Accountability, Transparency and Ethics in Vision

Papers on the explainability, fairness, and ethics of AI. (oral)

l Towards Visually Explaining Variational Autoencoders

l Towards Global Explanations of Convolutional Neural Networks With Concept Attribution

l Interpretable and Accurate Fine-grained Recognition via Region Grouping

l SAM: The Sensitivity of Attribution Methods to Hyperparameters

l High-Frequency Component Helps Explain the Generalization of Convolutional Neural Networks

l CNN-Generated Images Are Surprisingly Easy to Spot... for Now

l FALCON: A Fourier Transform Based Approach for Fast and Secure Convolutional Neural Network Predictions

Transfer/Low-Shot/Semi/Unsupervised Learning (2)

This is part (2) of Transfer/Low-Shot/Semi/Unsupervised Learning. (oral)

l Dreaming to Distill: Data-Free Knowledge Transfer via DeepInversion

l Unsupervised Domain Adaptation via Structurally Regularized Deep Clustering

l HyperSTAR: Task-Aware Hyperparameters for Deep Networks

l ActBERT: Learning Global-Local Video-Text Representations

l State-Relabeling Adversarial Active Learning

l Erasing Integrated Learning: A Simple Yet Effective Approach for Weakly Supervised Object Localization

l A Shared Multi-Attention Framework for Multi-Label Zero-Shot Learning

l Self-Supervised Learning of Interpretable Keypoints From Unlabelled Videos

Segmentation, Grouping and Shape; Explainable AI; Fairness, Accountability, Transparency and Ethics in Vision; Transfer/Low-Shot/Semi/Unsupervised Learning

From here on are the posters that combine the three sessions above.

l Few-Shot Open-Set Recognition Using Meta-Learning

l Few-Shot Learning via Embedding Adaptation With Set-to-Set Functions

l Temporally Distributed Networks for Fast Video Semantic Segmentation

l Benchmarking the Robustness of Semantic Segmentation Models

l There and Back Again: Revisiting Backpropagation Saliency Methods

l Deep Semantic Clustering by Partition Confidence Maximisation

l StructEdit: Learning Structural Shape Variations

l Harmonizing Transferability and Discriminability for Adapting Object Detectors

l Fast Video Object Segmentation With Temporal Aggregation Network and Dynamic Template Matching

l CascadePSP: Toward Class-Agnostic and Very High-Resolution Segmentation via Global and Local Refinement

l Correlating Edge, Pose With Parsing

l VecRoad: Point-Based Iterative Graph Exploration for Road Graphs Extraction

l Towards Fairness in Visual Recognition: Effective Strategies for Bias Mitigation

l Hierarchical Human Parsing With Typed Part-Relation Reasoning

l Compositional Convolutional Neural Networks: A Deep Architecture With Innate Robustness to Partial Occlusion

l Spatial Pyramid Based Graph Reasoning for Semantic Segmentation

l Learning Video Object Segmentation From Unlabeled Videos

l Part-Aware Context Network for Human Parsing

l SCOUT: Self-Aware Discriminant Counterfactual Explanations

l Weakly-Supervised Semantic Segmentation via Sub-Category Exploration

l Continual Learning With Extended Kronecker-Factored Approximate Curvature

l Phase Consistent Ecological Domain Adaptation

l AD-Cluster: Augmented Discriminative Clustering for Domain Adaptive Person Re-Identification

l 3D-MPA: Multi-Proposal Aggregation for 3D Semantic Instance Segmentation

l Deep Active Learning for Biased Datasets via Fisher Kernel Self-Supervision

l Adaptive Graph Convolutional Network With Attention Graph Clustering for Co-Saliency Detection

l A2dele: Adaptive and Attentive Depth Distiller for Efficient RGB-D Salient Object Detection

l Deep Fair Clustering for Visual Learning

l Bidirectional Graph Reasoning Network for Panoptic Segmentation

l Exploit Clues From Views: Self-Supervised and Regularized Learning for Multiview Object Recognition

l Spherical Space Domain Adaptation With Robust Pseudo-Label Loss

l Stochastic Classifiers for Unsupervised Domain Adaptation

l Unsupervised Learning of Intrinsic Structural Representation Points

l PolyTransform: Deep Polygon Transformer for Instance Segmentation

l Interactive Two-Stream Decoder for Accurate and Fast Saliency Detection

l Towards Better Generalization: Joint Depth-Pose Learning Without PoseNet

l LT-Net: Label Transfer by Learning Reversible Voxel-Wise Correspondence for One-Shot Medical Image Segmentation

l FGN: Fully Guided Network for Few-Shot Instance Segmentation

l A Quantum Computational Approach to Correspondence Problems on Point Sets

l Data-Efficient Semi-Supervised Learning by Reliable Edge Mining

l NestedVAE: Isolating Common Factors via Weak Supervision

l Progressive Adversarial Networks for Fine-Grained Domain Adaptation

l A Disentangling Invertible Interpretation Network for Explaining Latent Representations

l Modeling the Background for Incremental Learning in Semantic Segmentation

l Interpreting the Latent Space of GANs for Semantic Face Editing

l Super-BPD: Super Boundary-to-Pixel Direction for Fast Image Segmentation

l Self-Learning With Rectification Strategy for Human Parsing

l Hyperbolic Visual Embedding Learning for Zero-Shot Recognition

l Sequential Mastery of Multiple Visual Tasks: Networks Naturally Learn to Learn and Forget to Forget

l Distilling Effective Supervision From Severe Label Noise

l Eternal Sunshine of the Spotless Net: Selective Forgetting in Deep Networks

l CenterMask: Single Shot Instance Segmentation With Point Representation

l Mitigating Bias in Face Recognition Using Skewness-Aware Reinforcement Learning

l MineGAN: Effective Knowledge Transfer From GANs to Target Domains With Few Images

l DLWL: Improving Detection for Lowshot Classes With Weakly Labelled Data

l Unsupervised Deep Shape Descriptor With Point Distribution Learning

l Stylization-Based Architecture for Fast Deep Exemplar Colorization

l Cars Can't Fly Up in the Sky: Improving Urban-Scene Segmentation via Height-Driven Attention Networks

l State-Aware Tracker for Real-Time Video Object Segmentation

l Iteratively-Refined Interactive 3D Medical Image Segmentation With Multi-Agent Reinforcement Learning

l ENSEI: Efficient Secure Inference via Frequency-Domain Homomorphic Convolution for Privacy-Preserving Visual Recognition

l Multi-Scale Interactive Network for Salient Object Detection

l Interactive Multi-Label CNN Learning With Partial Labels

l ViewAL: Active Learning With Viewpoint Entropy for Semantic Segmentation

l Scene-Adaptive Video Frame Interpolation via Meta-Learning

l Action Segmentation With Joint Self-Supervised Temporal Domain Adaptation

l Pixel Consensus Voting for Panoptic Segmentation

l Minimizing Discrete Total Curvature for Image Processing

l Towards Robust Image Classification Using Sequential Attention Models

l Discovering Synchronized Subsets of Sequences: A Large Scale Solution

l Going Deeper With Lean Point Networks

l Efficient and Robust Shape Correspondence via Sparsity-Enforced Quadratic Assignment

l Explainable Object-Induced Action Decision for Autonomous Vehicles

l Spatially Attentive Output Layer for Image Classification

l Attack to Explain Deep Representation

l Computing Valid P-Values for Image Segmentation by Selective Inference

l Unsupervised Learning From Video With Deep Neural Embeddings

l Partial Weight Adaptation for Robust DNN Inference

l Probability Weighted Compact Feature for Domain Adaptive Retrieval

l Where Does It End? - Reasoning About Hidden Surfaces by Object Intersection Constraints

l PolarNet: An Improved Grid Representation for Online LiDAR Point Clouds Semantic Segmentation

l Pathological Retinal Region Segmentation From OCT Images Using Geometric Relation Based Augmentation

l Transferring and Regularizing Prediction for Semantic Segmentation

l PREDICT & CLUSTER: Unsupervised Skeleton Based Action Recognition

l Model Adaptation: Unsupervised Domain Adaptation Without Source Data

l Evade Deep Image Retrieval by Stashing Private Images in the Hash Space

l Advisable Learning for Self-Driving Vehicles by Internalizing Observation-to-Action Rules

l ProAlignNet: Unsupervised Learning for Progressively Aligning Noisy Contours

l Attribution in Scale and Space

l Towards Causal VQA: Revealing and Reducing Spurious Correlations by Invariant and Covariant Semantic Editing

Recognition (Detection, Categorization) (1)

Papers on recognition (detection and so on). (oral)

l Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection

l Large-Scale Object Detection in the Wild From Imbalanced Multi-Labels

l BBN: Bilateral-Branch Network With Cumulative Learning for Long-Tailed Visual Recognition

l Momentum Contrast for Unsupervised Visual Representation Learning

l Classifying, Segmenting, and Tracking Object Instances in Video with Mask Propagation

l Weakly Supervised Fine-Grained Image Classification via Guassian Mixture Model Oriented Discriminative Learning

l Bridging the Gap Between Anchor-Based and Anchor-Free Detection via Adaptive Training Sample Selection

l Learning User Representations for Open Vocabulary Image Hashtag Prediction

l Sketch Less for More: On-the-Fly Fine-Grained Sketch-Based Image Retrieval

l Few-Shot Pill Recognition

l PointRend: Image Segmentation As Rendering

l ABCNet: Real-Time Scene Text Spotting With Adaptive Bezier-Curve Network

Video Analysis and Understanding

Papers on video. Because the session also covers understanding, it includes papers that evaluate aspects of video analysis that had not been examined before. (oral)

l Learning Temporal Co-Attention Models for Unsupervised Video Action Localization

l Spatiotemporal Fusion in 3D CNNs: A Probabilistic View

l Uncertainty-Aware Score Distribution Learning for Action Quality Assessment

l Learning Interactions and Relationships Between Movie Characters

l Video Panoptic Segmentation

l Understanding Human Hands in Contact at Internet Scale

l End-to-End Learning of Visual Representations From Uncurated Instructional Videos

l You2Me: Inferring Body Pose in Egocentric Video via First and Second Person Interactions

l Learning a Weakly-Supervised Video Actor-Action Segmentation Model With a Wise Selection

l Learning to Measure the Static Friction Coefficient in Cloth Contact

l SpeedNet: Learning the Speediness in Videos

l Telling Left From Right: Learning Spatial Correspondence of Sight and Sound

Vision & Language

Research on vision and language. One example is a captioning method that generates captions matching the user's intent. (oral)

l Visual-Textual Capsule Routing for Text-Based Video Segmentation

l Graph-Structured Referring Expression Reasoning in the Wild

l Say As You Wish: Fine-Grained Control of Image Caption Generation With Abstract Scene Graphs

l Hierarchical Conditional Relation Networks for Video Question Answering

l REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments

l Iterative Answer Prediction With Pointer-Augmented Multimodal Transformers for TextVQA

l SQuINTing at VQA Models: Introspecting VQA Models With Sub-Questions

l Vision-Language Navigation With Self-Supervised Auxiliary Reasoning Tasks

l Sign Language Transformers: Joint End-to-End Sign Language Recognition and Translation

l Multi-Task Collaborative Network for Joint Referring Expression Comprehension and Segmentation

l Counterfactual Vision and Language Learning

l Iterative Context-Aware Graph Inference for Visual Dialog

l TA-Student VQA: Multi-Agents Training by Self-Questioning

Recognition (Detection, Categorization); Video Analysis and Understanding; Vision & Language

From here on are the posters that combine the three sessions above.

l Exploring Self-Attention for Image Recognition

l Cops-Ref: A New Dataset and Task on Compositional Referring Expression Comprehension

l Improving Convolutional Networks With Self-Calibrated Convolutions

l Modality Shifting Attention Network for Multi-Modal Video Question Answering

l Learning to Structure an Image With Few Colors

l On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering

l From Paris to Berlin: Discovering Fashion Style Influences Around the World

l A Local-to-Global Approach to Multi-Modal Movie Scene Segmentation

l G-TAD: Sub-Graph Localization for Temporal Action Detection

l Detailed 2D-3D Joint Representation for Human-Object Interaction

l One-Shot Adversarial Attacks on Visual Tracking With Dual Attention

l Rethinking Classification and Localization for Object Detection

l Correspondence Networks With Adaptive Neighbourhood Consensus

l Multiple Anchor Learning for Visual Object Detection

l PhraseCut: Language-Based Image Segmentation in the Wild

l Mask Encoding for Single Shot Instance Segmentation

l Action Genome: Actions As Compositions of Spatio-Temporal Scene Graphs

l Learning Unseen Concepts via Hierarchical Decomposition and Composition

l Hi-CMD: Hierarchical Cross-Modality Disentanglement for Visible-Infrared Person Re-Identification

l In Defense of Grid Features for Visual Question Answering

l Multi-Mutual Consistency Induced Transfer Subspace Learning for Human Motion Segmentation

l Dense Regression Network for Video Grounding

l Neural Architecture Search for Lightweight Non-Local Networks

l Learning Saliency Propagation for Semi-Supervised Instance Segmentation

l Speech2Action: Cross-Modal Supervision for Action Recognition

l Normalized and Geometry-Aware Self-Attention Network for Image Captioning

l Memory Enhanced Global-Local Aggregation for Video Object Detection

l Solving Mixed-Modal Jigsaw Puzzle for Fine-Grained Sketch-Based Image Retrieval

l LG-GAN: Label Guided Adversarial Network for Flexible Targeted Attack of Point Cloud Based Deep Networks

l Memory Aggregation Networks for Efficient Interactive Video Object Segmentation

l VQA With No Questions-Answers Training

l Counting Out Time: Class Agnostic Video Repetition Counting in the Wild

l SaccadeNet: A Fast and Accurate Object Detector

l Multi-Granularity Reference-Aided Attentive Feature Aggregation for Video-Based Person Re-Identification

l Video Object Grounding Using Semantic Roles in Language Description

l Designing Network Design Spaces

l 12-in-1: Multi-Task Vision and Language Representation Learning

l MLCVNet: Multi-Level Context VoteNet for 3D Object Detection

l Listen to Look: Action Recognition by Previewing Audio

l Attention Convolutional Binary Neural Tree for Fine-Grained Visual Categorization

l Music Gesture for Visual Sound Separation

l Referring Image Segmentation via Cross-Modal Progressive Comprehension

l Cloth in the Wind: A Case Study of Physical Measurement Through Simulation

l The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction

l CentripetalNet: Pursuing High-Quality Keypoint Pairs for Object Detection

l PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection

l Graph Embedded Pose Clustering for Anomaly Detection

l Disp R-CNN: Stereo 3D Object Detection via Shape Prior Guided Instance Disparity Estimation

l Deepstrip: High-Resolution Boundary Refinement

l Smoothing Adversarial Domain Attack and P-Memory Reconsolidation for Cross-Domain Person Re-Identification

l Meshed-Memory Transformer for Image Captioning

l Learning From Noisy Anchors for One-Stage Object Detection

l Instance-Aware, Context-Focused, and Memory-Efficient Weakly Supervised Object Detection

l Density-Based Clustering for 3D Object Detection in Point Clouds

l Few-Shot Video Classification via Temporal Alignment

l Densely Connected Search Space for More Flexible Neural Architecture Search

l Fine-Grained Video-Text Retrieval With Hierarchical Graph Reasoning

l Warp to the Future: Joint Forecasting of Features and Feature Motion

l Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio

l Where Does It Exist: Spatio-Temporal Video Grounding for Multi-Form Sentences

l Cross-Modal Cross-Domain Moment Alignment Network for Person Search

l Self-Training With Noisy Student Improves ImageNet Classification

l Learning Longterm Representations for Person Re-Identification Using Radio Signals

l LatentFusion: End-to-End Differentiable Reconstruction and Rendering for Unseen Object Pose Estimation

l Learning Instance Occlusion for Panoptic Segmentation

l Vision-Dialog Navigation by Exploring Cross-Modal Memory

l ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks

l NMS by Representative Region: Towards Crowded Pedestrian Detection by Proposal Pairing

l Visual Commonsense R-CNN

l What Deep CNNs Benefit From Global Covariance Pooling: An Optimization Perspective

l EfficientDet: Scalable and Efficient Object Detection

l Fast Template Matching and Update for Video Object Tracking and Segmentation

l Counterfactual Samples Synthesizing for Robust Visual Question Answering

l Local-Global Video-Text Interactions for Temporal Grounding

l Set-Constrained Viterbi for Set-Supervised Action Segmentation

l Probabilistic Video Prediction From Noisy Data With a Posterior Confidence

l Beyond Short-Term Snippet: Video Relation Detection With Spatio-Temporal Global Context

l Visual Grounding in Video for Unsupervised Word Translation

l Two Causal Principles for Improving Visual Dialog

l Spatio-Temporal Graph for Video Captioning With Knowledge Distillation

l A Real-Time Cross-Modality Correlation Filtering Method for Referring Expression Comprehension

l Better Captioning With Sequence-Level Exploration

l Violin: A Large-Scale Dataset for Video-and-Language Inference

l RiFeGAN: Rich Feature Generation for Text-to-Image Synthesis From Prior Knowledge

l Graph Structured Network for Image-Text Matching

l Straight to the Point: Fast-Forwarding Videos via Reinforcement Learning Using Textual Data

l Multi-Modality Cross Attention Network for Image and Sentence Matching

l Generalized ODIN: Detecting Out-of-Distribution Image Without Learning From Out-of-Distribution Data

l Learning Augmentation Network via Influence Functions

l X-Linear Attention Networks for Image Captioning

Recognition (Detection, Categorization) (2)

Part (2) of the papers on recognition (detection and so on). (oral)

l Unsupervised Person Re-Identification via Multi-Label Classification

l Overcoming Classifier Imbalance for Long-Tail Object Detection With Balanced Group Softmax

l What You See is What You Get: Exploiting Visibility for 3D Object Detection

l Deep Structure-Revealed Network for Texture Recognition

l Online Knowledge Distillation via Collaborative Learning

l Dynamic Convolution: Attention Over Convolution Kernels

l 3DSSD: Point-Based 3D Single Stage Object Detector

l Deep Degradation Prior for Low-Quality Image Classification

l ViBE: Dressing for Diverse Body Shapes

l Don't Judge an Object by Its Context: Learning to Overcome Contextual Bias

l SESS: Self-Ensembling Semi-Supervised 3D Object Detection

l Combining Detection and Tracking for Human Pose Estimation in Videos

Vision for Robotics and Autonomous Vehicles

This session is all about robotics. If you work on robots, take a look here. (oral)

l SAPIEN: A SimulAted Part-Based Interactive ENvironment

l RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds

l SurfelGAN: Synthesizing Realistic Sensor Data for Autonomous Driving

l A Programmatic and Semantic Approach to Explaining and Debugging Neural Network Based Object Detectors

l Predicting Semantic Map Representations From Images Using Pyramid Occupancy Networks

l Efficient Derivative Computation for Cumulative B-Splines on Lie Groups

l RL-CycleGAN: Reinforcement Learning Aware Simulation-to-Real

l LiDARsim: Realistic LiDAR Simulation by Leveraging the Real World

l Just Go With the Flow: Self-Supervised Scene Flow Estimation

l TITAN: Future Forecast Using Action Priors

Machine Learning Architectures and Formulations

These are papers on the architecture of machine learning models themselves. (oral)

l Robust Learning Through Cross-Task Consistency

l Dynamic Refinement Network for Oriented and Densely Packed Object Detection

l AOWS: Adaptive and Optimal Network Width Search With Latency Constraints

l High-Dimensional Convolutional Networks for Geometric Pattern Recognition

l Filter Response Normalization Layer: Eliminating Batch Dependence in the Training of Deep Neural Networks

l Deep Iterative Surface Normal Estimation

l Dataless Model Selection With the Deep Frame Potential

l UNAS: Differentiable Architecture Search Meets Reinforcement Learning

l Local Context Normalization: Revisiting Local Normalization

Recognition (Detection, Categorization); Vision for Robotics and Autonomous Vehicles; Machine Learning Architectures and Formulations

From here on are the poster papers that combine the three sessions above.

l ACNe: Attentive Context Normalization for Robust Permutation-Equivariant Learning

l Learning Situational Driving

l From Depth What Can You See? Depth Completion via Auxiliary Image Reconstruction

l Symmetry and Group in Attribute-Object Compositions

l Noise-Aware Fully Webly Supervised Object Detection

l 3D Part Guided Image Editing for Fine-Grained Object Understanding

l STINet: Spatio-Temporal-Interactive Network for Pedestrian Detection and Trajectory Prediction

l Rethinking Performance Estimation in Neural Architecture Search

l Feature-Metric Registration: A Fast Semi-Supervised Approach for Robust Point Cloud Registration Without Correspondences

l Learning Multi-View Camera Relocalization With Graph Neural Networks

l MotionNet: Joint Perception and Motion Prediction for Autonomous Driving Based on Bird's Eye View Maps

l EcoNAS: Finding Proxies for Economical Neural Architecture Search

l Hit-Detector: Hierarchical Trinity Architecture Search for Object Detection

l Geometrically Principled Connections in Graph Neural Networks

l On Vocabulary Reliance in Scene Text Recognition

l Generating Accurate Pseudo-Labels in Semi-Supervised Learning and Avoiding Overconfident Predictions via Hermite Polynomial Activations

l GraspNet-1Billion: A Large-Scale Benchmark for General Object Grasping

l PFRL: Pose-Free Reinforcement Learning for 6D Pose Estimation

l Through Fog High-Resolution Imaging Using Millimeter Wave Radar

l Disentangling Physical Dynamics From Unknown Factors for Unsupervised Video Prediction

l D2Det: Towards High Quality Object Detection and Instance Segmentation

l LiDAR-Based Online 3D Video Object Detection With Graph-Based Message Passing and Spatiotemporal Transformer Attention

l Orthogonal Convolutional Neural Networks

l Self-Robust 3D Point Recognition via Gather-Vector Guidance

l VectorNet: Encoding HD Maps and Agent Dynamics From Vectorized Representation

l ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks

l MTL-NAS: Task-Agnostic Neural Architecture Search Towards General-Purpose Multi-Task Learning

l PnPNet: End-to-End Perception and Prediction With Tracking in the Loop

l Revisiting the Sibling Head in Object Detector

l Visual Reaction: Learning to Play Catch With Your Drone

l Prime Sample Attention in Object Detection

l SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization

l KeyPose: Multi-View 3D Labeling and Keypoint Estimation for Transparent Objects

l SegGCN: Efficient 3D Point Cloud Segmentation With Fuzzy Spherical Kernel

l nuScenes: A Multimodal Dataset for Autonomous Driving

l PVN3D: A Deep Point-Wise 3D Keypoints Voting Network for 6DoF Pose Estimation

l Probabilistic Pixel-Adaptive Refinement Networks

l Discovering Human Interactions With Novel Objects via Zero-Shot Learning

l Equalization Loss for Long-Tailed Object Recognition

l Learning Depth-Guided Convolutions for Monocular 3D Object Detection

l Seeing Through Fog Without Seeing Fog: Deep Multimodal Sensor Fusion in Unseen Adverse Weather

l Don't Even Look Once: Synthesizing Features for Zero-Shot Detection

l EPOS: Estimating 6D Pose of Objects With Symmetries

l Train in Germany, Test in the USA: Making 3D Object Detectors Generalize

l Exploring Categorical Regularization for Domain Adaptive Object Detection

l Neural Implicit Embedding for Point Cloud Analysis

l Pose-Guided Visible Part Matching for Occluded Person ReID

l ContourNet: Taking a Further Step Toward Accurate Arbitrary-Shaped Scene Text Detection

l Exploring Data Aggregation in Policy Learning for Vision-Based Urban Autonomous Driving

l Look-Into-Object: Self-Supervised Structure Modeling for Object Recognition

l Recognizing Objects From Any View With Object and Viewer-Centered Representations

l Gated Channel Transformation for Visual Recognition

l Non-Local Neural Networks With Grouped Bilinear Attentional Transforms

l Generative-Discriminative Feature Representations for Open-Set Recognition

l RPM-Net: Robust Point Matching Using Learned Features

l Sideways: Depth-Parallel Training of Video Models

l Basis Prediction Networks for Effective Burst Denoising With Large Kernels

l Private-kNN: Practical Differential Privacy for Computer Vision

l SP-NAS: Serial-to-Parallel Backbone Search for Object Detection

l Structure Aware Single-Stage 3D Object Detection From Point Cloud

l Looking at the Right Stuff - Guided Semantic-Gaze for Autonomous Driving

l What's Hidden in a Randomly Weighted Neural Network?

l Structured Multi-Hashing for Model Compression

l DOPS: Learning to Detect 3D Objects and Predict Their 3D Shapes

l AutoTrack: Towards High-Performance Visual Tracking for UAV With Automatic Spatio-Temporal Regularization

l GP-NAS: Gaussian Process Based Neural Architecture Search

l NAS-FCOS: Fast Neural Architecture Search for Object Detection

l TCTS: A Task-Consistent Two-Stage Framework for Person Search

l SCATTER: Selective Context Attentional Scene Text Recognizer

l Learning Canonical Shape Space for Category-Level 6D Object Pose and Size Estimation

l Hierarchical Scene Coordinate Classification and Regression for Visual Localization

l MiLeNAS: Efficient Neural Architecture Search via Mixed-Level Reformulation

l Scalable Uncertainty for Computer Vision With Functional Variational Inference

l Uncertainty-Aware CNNs for Depth Completion: Uncertainty from Beginning to End

l Butterfly Transform: An Efficient FFT Based Neural Architecture Design

l A Certifiably Globally Optimal Solution to Generalized Essential Matrix Estimation

l MUXConv: Information Multiplexing in Convolutional Neural Networks

l PointGMM: A Neural GMM Network for Point Clouds

l Noisier2Noise: Learning to Denoise From Unpaired Noisy Data

l TRPLP - Trifocal Relative Pose From Lines at Points

l DSNAS: Direct Neural Architecture Search Without Parameter Retraining

l MonoPair: Monocular 3D Object Detection Using Pairwise Spatial Relationships

l Regularization on Spatio-Temporally Smoothed Feature for Action Recognition

l Towards Accurate Scene Text Recognition With Semantic Reasoning Networks

l Unsupervised Reinforcement Learning of Transferable Meta-Skills for Embodied Navigation

l Inferring Attention Shift Ranks of Objects for Image Saliency

l Camera On-Boarding for Person Re-Identification Using Hypothesis Transfer Learning

l Joint Graph-Based Depth Refinement and Normal Estimation

l DR Loss: Improving Object Detection by Distributional Ranking

l Self-Trained Deep Ordinal Regression for End-to-End Video Anomaly Detection

l Few-Shot Class-Incremental Learning

l PolarMask: Single Shot Instance Segmentation With Polar Representation

l DeepEMD: Few-Shot Image Classification With Differentiable Earth Mover's Distance and Structured Classifiers

l Detection in Crowded Scenes: One Proposal, Multiple Predictions

l Autolabeling 3D Objects With Differentiable Rendering of SDF Shape Priors

l Interactive Object Segmentation With Inside-Outside Guidance

l Mnemonics Training: Multi-Class Incremental Learning Without Forgetting

l Learning to Segment 3D Point Clouds in 2D Image Space

l Smooth Shells: Multi-Scale Shape Registration With Functional Maps

l Self-Supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation

Vision Applications and Systems; Vision & Other Modalities; Visual Reasoning and Logical Representation

These are papers on vision applications and systems, along with other vision papers that did not fit into any of the other categories. (oral)

l Efficient Neural Vision Systems Based on Convolutional Image Acquisition

l Visual Chirality

l What Machines See Is Not What They Get: Fooling Scene Text Recognition Models With Adversarial Text Images

l Dynamic Traffic Modeling From Overhead Imagery

l Satellite Image Time Series Classification With Pixel-Set Encoders and Temporal Self-Attention

l DAVD-Net: Deep Audio-Aided Video Decompression of Talking Heads

l Learning When and Where to Zoom With Deep Reinforcement Learning

Transfer/Low-Shot/Semi/Unsupervised Learning (3)

These are the papers for Transfer/Low-Shot/Semi/Unsupervised Learning (3). (oral)

l Cross-Domain Detection via Graph-Induced Prototype Alignment

l Meta-Learning of Neural Architectures for Few-Shot Learning

l Towards Inheritable Models for Open-Set Domain Adaptation

l Learning From Synthetic Animals

l Distilling Cross-Task Knowledge via Relationship Matching

l Open Compound Domain Adaptation

Recognition (Detection, Categorization); Segmentation, Grouping and Shape; Vision Applications and Systems; Vision & Other Modalities; Transfer/Low-Shot/Semi/Unsupervised Learning

From here on are the poster papers that combine the sessions listed above.

l Context Prior for Scene Segmentation

l Tangent Images for Mitigating Spherical Distortion

l Learning a Dynamic Map of Visual Appearance

l Webly Supervised Knowledge Embedding Model for Visual Reasoning

l Gradually Vanishing Bridge for Adversarial Domain Adaptation

l Active Speakers in Context

l Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation

l Inter-Region Affinity Distillation for Road Marking Segmentation

l Unified Dynamic Convolutional Network for Super-Resolution With Variational Degradations

l Making Better Mistakes: Leveraging Class Hierarchies With Deep Networks

l Data-Free Knowledge Amalgamation via Group-Stack Dual-GAN