Computer Vision
Insight-V: A New Strategy For Multimodal Reasoning Connecting Vision And Thought
Stable Flow: Visualization Of The "Really Important Layers" Behind Image Generation
[SOK-Bench] Situational Video Inference Benchmark Using Real-World Knowledge In Video
Computer Vision
Vript-Hard, A New Benchmark For Testing Comprehension Of Long-form Video
Large Language Models
Machine Learning In Non-Euclidean Space Enabled By The Kuramoto Model
Computer Vision
[InsectMamba] Classification Of Pests Using State Space Models To Support Smart Agriculture
Computer Vision
[CoMat] Resolve The Discrepancy Between Text And Image
Computer Vision
[OW-VISCap] Look Out For Unseen Objects - A New Approach To Understanding Open World Video
Computer Vision
Assessing The Robustness Of Zero-shot Image Understanding Models Through CLIP
Contrastive Learning
[VideoAgent] Understanding Long-form Video Using A Large-scale Language Model As An Agent
Computer Vision
[Segment Anything] Zero-shot Segmentation Model
Segmentation
Apple Developed A Large Scale Autoregressive Image Model That Is Scalable Like An LLM.
Computer Vision
[Swin Transformer] Transformer-based Image Recognition Models To Keep Now!
Image Recognition
[DiffYOLO] Innovative Framework Improves Object Detection With Low Quality Data
Computer Vision
InstructPix2Pix: A New Model For Image Editing At The User's Direction
Computer Vision
[mPLUG-Owl] Developing An LLM That Can Understand Images And Text
Computation And Language
T2I-Adapter: Frontiers In Text-to-Image Conversion Technology
Computer Vision