Image Recognition
Stable Flow: Visualization Of The "really Important Layers" Behind Image Generation
Open Vocabulary Object Detection Enabled By OWL-ViT
Neural Network
[Libra] A New Multimodal Design Of Large Language Models Using Separate Vision Systems
Large Language Models
MVANet: The Most Powerful Model For Background Removal
Neural Network
[Zero-shot Learning] AI Voice Cloning And Lip-syncing Verification And Explanation
Neural Network
MaskDiT: Low Learning Cost Diffusion Model For Image Generation
Image Generation
E-commerce Background Image Generation Based On Product Category And Brand Style
Image Generation
MimicBrush, A New Image Editing Method "Imitative Editing" Is Proposed
Image Editing
Object Background Generation Using Text-2-Image Diffusion Model
Image Generation
Giving LLMs A Whiteboard To Write Down Their Reasoning Process Greatly Improves Their Visual Reasoning Ability!
Prompting Method
MicroDiffusion: A Thousand-dollar Generative Image Quality Model That Outperforms Multi-million-dollar Models
Image Generation
Human-robot Cooperative Assembly Realized By Large-scale Language Models
Robot
[GenAI-Arena] New Platform To Evaluate Generated Models By User Votes
Large Language Models
[SKETCHPAD] Enhanced Inference Of Multimodal Language Models With Intermediate Sketches
Large Language Models
[Plot2Code] Benchmark For Testing Multimodal LLM Code Generation
Large Language Models
LAVE, An Agent-assisted Video Editing Tool That Utilizes LLM
Large Language Models
YesBut: The Emergence Of A Dataset That Makes The VLM Understand Irony And Caricature!
Dataset