Catch up on the latest AI articles

What is AI-SCHOLAR?

LongVie: A New Era Of 1-minute Ultra-High Quality Video Generation Realized By Multimodal Control

LongVie: A New Era Of 1-minute Ultra-High Quality Video Generation Realized By Multimodal Control

HiWave: Innovation In Wavelet Diffusion Generation For 4K Images Without Additional Learning

HiWave: Innovation In Wavelet Diffusion Generation For 4K Images Without Additional Learning

RoboTwin 2.0: Scalable Synthetic Data Generation And Benchmark Design For Dual-Arm Manipulation Robots

RoboTwin 2.0: Scalable Synthetic Data Generation And Benchmark Design For Dual-Arm Manipulation Robo ...

What Is DualTHOR? Next Generation Simulator For Dual-Arm Robots' Adaptability To Reality

What Is DualTHOR? Next Generation Simulator For Dual-Arm Robots' Adaptability To Reality

Democratizing GPT-4o Level Image Generation: The Janus-4o And ShareGPT-4o-Image Challenge

Democratizing GPT-4o Level Image Generation: The Janus-4o And ShareGPT-4o-Image Challenge

Toward AI That Doesn't Forget Images, CoMemo Pioneers Next-generation Vision And Language Models

Toward AI That Doesn't Forget Images, CoMemo Pioneers Next-generation Vision And Language Models

PictSure: A New Method To Challenge Few-Shot Classification With The Power Of Visual Embedding

PictSure: A New Method To Challenge Few-Shot Classification With The Power Of Visual Embedding

Ultra-Sparse Memory Network: A New Method To Change Transformer Memory Efficiency

Ultra-Sparse Memory Network: A New Method To Change Transformer Memory Efficiency

Insight-V: A New Strategy For Multimodal Reasoning Connecting Vision And Thought

Insight-V: A New Strategy For Multimodal Reasoning Connecting Vision And Thought

Stable Flow: Visualization Of The "Really Important Layers" Behind Image Generation

[SOK-Bench] Situational Video Inference Benchmark Using Real-World Knowledge In Video

28/02/2025 Computer Vision

Vript-Hard, A New Benchmark For Testing Comprehension Of Long-form Video

Vript-Hard, A New Benchmark For Testing Comprehension Of Long-form Video

21/01/2025 Large Language Models

Machine Learning In Non-Euclidean Space Enabled By The Kuramoto Model

Machine Learning In Non-Euclidean Space Enabled By The Kuramoto Model

04/12/2024 Computer Vision

[InsectMamba] Classification Of Pests Using State Space Models To Support Smart Agriculture

[InsectMamba] Classification Of Pests Using State Space Models To Support Smart Agriculture

04/09/2024 Computer Vision

[CoMat] Resolve The Discrepancy Between Text And Image

[CoMat] Resolve The Discrepancy Between Text And Image

28/08/2024 Computer Vision

[OW-VISCap] Look Out For Unseen Objects - A New Approach To Understanding Open World Video

[OW-VISCap] Look Out For Unseen Objects - A New Approach To Understanding Open World Video

21/08/2024 Computer Vision

Assessing The Robustness Of Zero-shot Image Understanding Models Through CLIP

Assessing The Robustness Of Zero-shot Image Understanding Models Through CLIP

24/06/2024 Contrastive Learning

[VideoAgent] Understanding Long-form Video Using A Large-scale Language Model As An Agent

[VideoAgent] Understanding Long-form Video Using A Large-scale Language Model As An Agent

21/06/2024 Computer Vision