MVANet: The Most Powerful Model For Background Removal 31/01/2025 Neural Network
[Zero-shot Learning] AI Voice Cloning And Lip-syncing Verification And Explanation 29/01/2025 Neural Network
MaskDiT: Low Learning Cost Diffusion Model For Image Generation 27/01/2025 Image Generation
E-commerce Background Image Generation Based On Product Category And Brand Style 17/01/2025 Image Generation
MimicBrush, A New Image Editing Method "Imitative Editing" Is Proposed 16/01/2025 Image Editing
Object Background Generation Using Text-2-Image Diffusion Model 10/01/2025 Image Generation
Giving LLMs A Whiteboard To Write Down Their Reasoning Process Greatly Improves Their Visual Reasoning Ability! 26/12/2024 Prompting Method
MicroDiffusion: A Thousand-dollar Generative Image Quality Model That Outperforms Multi-million-dollar Models 25/12/2024 Image Generation
Human-robot Cooperative Assembly Realized By Large-scale Language Models 24/12/2024 Robot
[GenAI-Arena] New Platform To Evaluate Generated Models By User Votes 20/12/2024 Large Language Models
[SKETCHPAD] Enhanced Inference Of Multimodal Language Models With Intermediate Sketches 18/12/2024 Large Language Models
[Plot2Code] Benchmark For Testing Multimodal LLM Code Generation 17/12/2024 Large Language Models
LAVE, An Agent-assisted Video Editing Tool That Utilizes LLM 13/12/2024 Large Language Models
YesBut: The Emergence Of A Dataset That Makes The VLM Understand Irony And Caricature! 22/11/2024 Dataset
Comprehensive Evaluation Of Generalized Emotion Recognition (GER) Using GPT-4V 06/11/2024 Large Language Models
[MMSEARCH] Multimodal Search System Integrating Image And Text 29/10/2024 Large Language Models
GestaltMML, A Multimodal Model For The Diagnosis Of Rare Genetic Disorders 13/10/2024 Large Language Models
[Qwen2-VL] Latest VLM That Can Process Images And Videos In Different Resolutions 01/10/2024 Large Language Models