Image Recognition Articles | AI-SCHOLAR.TECH | AI-SCHOLAR | AI: (Artificial Intelligence) Articles and technical information media

MicroDiffusion: A Thousand-dollar Image Generation Model That Outperforms Multi-million-dollar Models

25/12/2024 Image Generation

Human-robot Cooperative Assembly Realized By Large-scale Language Models

24/12/2024 Robot

[GenAI-Arena] New Platform To Evaluate Generative Models By User Votes

20/12/2024 Large Language Models

[SKETCHPAD] Enhanced Inference Of Multimodal Language Models With Intermediate Sketches

18/12/2024 Large Language Models

[Plot2Code] Benchmark For Testing Multimodal LLM Code Generation

17/12/2024 Large Language Models

LAVE, An Agent-assisted Video Editing Tool That Uses LLMs

13/12/2024 Large Language Models

YesBut: A New Dataset That Helps VLMs Understand Irony And Caricature!

22/11/2024 Dataset

Comprehensive Evaluation Of Generalized Emotion Recognition (GER) Using GPT-4V

06/11/2024 Large Language Models

[MMSEARCH] Multimodal Search System Integrating Image And Text

29/10/2024 Large Language Models

GestaltMML, A Multimodal Model For The Diagnosis Of Rare Genetic Disorders

13/10/2024 Large Language Models

[Qwen2-VL] Latest VLM That Can Process Images And Videos At Different Resolutions

01/10/2024 Large Language Models

TryOnDiffusion: The Most Powerful Model For Generating Virtual Try-on Images

30/09/2024 Image Generation

See Finer, See More: Implicit Modality Alignment For Text-Based Person Search

29/09/2024 Deep Learning

[OmniGen] All Image-related Tasks Can Be Performed With Only One Generation Model!

29/09/2024 Image Generation

Image Recognition

MicroDiffusion: A Thousand-dollar Image Generation Model That Outperforms Multi-million-dollar Models

Human-robot Cooperative Assembly Realized By Large-scale Language Models

[GenAI-Arena] New Platform To Evaluate Generative Models By User Votes

[SKETCHPAD] Enhanced Inference Of Multimodal Language Models With Intermediate Sketches

[Plot2Code] Benchmark For Testing Multimodal LLM Code Generation

LAVE, An Agent-assisted Video Editing Tool That Uses LLMs

YesBut: A New Dataset That Helps VLMs Understand Irony And Caricature!

Comprehensive Evaluation Of Generalized Emotion Recognition (GER) Using GPT-4V

[MMSEARCH] Multimodal Search System Integrating Image And Text

GestaltMML, A Multimodal Model For The Diagnosis Of Rare Genetic Disorders

[Qwen2-VL] Latest VLM That Can Process Images And Videos At Different Resolutions

TryOnDiffusion: The Most Powerful Model For Generating Virtual Try-on Images

See Finer, See More: Implicit Modality Alignment For Text-Based Person Search

[OmniGen] All Image-related Tasks Can Be Performed With Only One Generation Model!

[LDDGAN] Diffusion Model With The Fastest Inference

[NVLM] Multimodal LLM Outperforms GPT-4o In Image And Language Tasks

New Frontier Of Deepfake Detection Using CLIP

GenTron: Diffusion Transformers For Image And Video Generation