Catch up on the latest AI articles

LongVie: A New Era Of 1-minute Ultra-High Quality Video Generation Realized By Multimodal Control

Skywork UniPic: Next-generation Multimodal Model That Integrates Image Understanding, Generation, And Editing With High Efficiency

MATE: Multi-agent Accessibility-specific Modality Transformation Framework

RoboTwin 2.0: Scalable Synthetic Data Generation And Benchmark Design For Dual-Arm Manipulation Robots

What Is DualTHOR? Next Generation Simulator For Dual-Arm Robots' Adaptability To Reality

Democratizing GPT-4o Level Image Generation: The Janus-4o And ShareGPT-4o-Image Challenge

FedNano: Lightweight And Efficient Distributed Learning Of Large-scale Multimodal Models

ImmerseGen: Agent-guided, Lightweight X Highly Realistic Next-generation VR Scene Generation

Toward AI That Doesn't Forget Images, CoMemo Pioneers Next-generation Vision And Language Models

SCIVER's Future: The Frontiers Of Multimodal Scientific Claim Verification

The Challenge Of "Embodied Web Agents," The Next Generation AI That Fuses The Physical And Digital

GenRecal, A General-purpose Distillation Framework For Lightweight, High-performance Distillation

ProtoReasoning: General-purpose Reasoning Skills Honed Through Logic And Planning

A Proposal For Mixed-first Optimization That Revolutionizes The Inference Performance Of Multimodal LLMs!

UnifiedCrawl: A New Approach To Low-Resource Language Data Collection And Efficient LLM Adaptation

30/06/2025 Other

Insight-V: A New Strategy For Multimodal Reasoning Connecting Vision And Thought

The Future Of Music Education, Flute X GPT And LAUI's Potential To Change Large-Scale Language Models

24/01/2025 Large Language Models

Giving LLMs A Whiteboard To Write Down Their Reasoning Process Greatly Improves Their Visual Reasoning Ability!

26/12/2024 Prompting Method