Multimodal
GenRecal, A General-purpose Distillation Framework For Lightweight, High-performance Distillation
ProtoReasoning: General-purpose Reasoning Skills Honed Through Logic And Planning
A Proposal For Mixed-first Optimization That Revolutionizes The Inference Performance Of Multimodal LLMs!
UnifiedCrawl: A New Approach To Low-Resource Language Data Collection And Efficient LLM Adaptation
Other
Insight-V: A New Strategy For Multimodal Reasoning Connecting Vision And Thought
The Future Of Music Education, Flute X GPT And LAUI's Potential To Change Large-Scale Language Models
Large Language Models
Giving LLMs A Whiteboard To Write Down Their Reasoning Process Greatly Improves Their Visual Reasoning Ability!
Prompting Method
[SKETCHPAD] Enhanced Inference Of Multimodal Language Models With Intermediate Sketches
Large Language Models
[Plot2Code] Benchmark For Testing Multimodal LLM Code Generation
Large Language Models
Cross-Layer Attention Significantly Reduces Transformer Memory
Transformer
Comprehensive Evaluation Of Generalized Emotion Recognition (GER) Using The GPT-4V
Large Language Models
[MMSEARCH] Multimodal Search System Integrating Image And Text
Large Language Models
Systematic Investigation Of Gen-RecSys, A Recommender System Evolving With Generative And Large-scale Language Models
Large Language Models
[Qwen2-VL] Latest VLM That Can Process Images And Videos In Different Resolutions
Large Language Models
[NVLM] Multimodal LLM Outperforms GPT-4o In Image And Language Tasks
Large Language Models
Bread That "tastes Like Love" Created By AI X Craftsmanship
Human-Computer Interaction
Ferret-UI, A Multimodal Large-scale Language Model For Mobile UI
Large Language Models