Multimodal
What Is DualTHOR? Next Generation Simulator For Dual-Arm Robots' Adaptability To Reality
Democratizing GPT-4o Level Image Generation: The Janus-4o And ShareGPT-4o-Image Challenge
FedNano: Lightweight And Efficient Distributed Learning Of Large-scale Multimodal Models
ImmerseGen: Agent-guided, Lightweight X Highly Realistic Next-generation VR Scene Generation
Toward AI That Doesn't Forget Images, CoMemo Pioneers Next-generation Vision And Language Models
SCIVER's Future: The Frontiers Of Multimodal Scientific Claim Verification
The Challenge Of "Embodied Web Agents," The Next Generation AI That Fuses The Physical And Digital
GenRecal, A General-purpose Distillation Framework For Lightweight, High-performance Distillation
ProtoReasoning: General-purpose Reasoning Skills Honed Through Logic And Planning
A Proposal For Mixed-first Optimization That Revolutionizes The Inference Performance Of Multimodal LLMs!
UnifiedCrawl: A New Approach To Low-Resource Language Data Collection And Efficient LLM Adaptation
Other
Insight-V: A New Strategy For Multimodal Reasoning Connecting Vision And Thought
The Future Of Music Education, Flute X GPT And LAUI's Potential To Change Large-Scale Language Models
Large Language Models
Giving LLMs A Whiteboard To Write Down Their Reasoning Process Greatly Improves Their Visual Reasoning Ability!
Prompting Method
[SKETCHPAD] Enhanced Inference Of Multimodal Language Models With Intermediate Sketches
Large Language Models
[Plot2Code] Benchmark For Testing Multimodal LLM Code Generation
Large Language Models
Cross-Layer Attention Significantly Reduces Transformer Memory
Transformer