MicroDiffusion: A Thousand-dollar Generative Image Quality Model That Outperforms Multi-million-dollar Models 25/12/2024 Image Generation
Human-robot Cooperative Assembly Realized By Large-scale Language Models 24/12/2024 Robot
[GenAI-Arena] New Platform To Evaluate Generated Models By User Votes 20/12/2024 Large Language Models
[SKETCHPAD] Enhanced Inference Of Multimodal Language Models With Intermediate Sketches 18/12/2024 Large Language Models
[Plot2Code] Benchmark For Testing Multimodal LLM Code Generation 17/12/2024 Large Language Models
LAVE, An Agent-assisted Video Editing Tool That Utilizes LLM 13/12/2024 Large Language Models
YesBut: The Emergence Of A Dataset That Makes The VLM Understand Irony And Caricature! 22/11/2024 Dataset
Comprehensive Evaluation Of Generalized Emotion Recognition (GER) Using The GPT-4V 06/11/2024 Large Language Models
[MMSEARCH] Multimodal Search System Integrating Image And Text 29/10/2024 Large Language Models
GestaltMML, A Multimodal Model For The Diagnosis Of Rare Genetic Disorders 13/10/2024 Large Language Models
[Qwen2-VL] Latest VLM That Can Process Images And Videos In Different Resolutions 01/10/2024 Large Language Models
TryOnDiffusion: The Most Powerful Model For Generating Fitting Images 30/09/2024 Image Generation
See Finer, See More: Implicit Modality Alignment For Text-Based Person Search 29/09/2024 Deep Learning
[OmniGen] All Image-related Tasks Can Be Performed With Only One Generation Model! 29/09/2024 Image Generation
[LDDGAN] Diffusion Model With The Highest Speed Inference 29/09/2024 Diffusion Model
[NVLM] Multimodal LLM Outperforms GPT-4o In Image And Language Tasks 27/09/2024 Large Language Models
New Frontier Of Deep Faking Detection Using CLIP 30/08/2024 Fake Detection
GenTron: Diffusion Transformers For Image And Video Generation 26/08/2024 Image Generation