[SKETCHPAD] Enhanced Inference Of Multimodal Language Models With Intermediate Sketches 18/12/2024 Large Language Models
[Plot2Code] Benchmark For Testing Multimodal LLM Code Generation 17/12/2024 Large Language Models
Cross-Layer Attention Significantly Reduces Transformer Memory 10/12/2024 Transformer
Comprehensive Evaluation Of Generalized Emotion Recognition (GER) Using The GPT-4V 06/11/2024 Large Language Models
[MMSEARCH] Multimodal Search System Integrating Image And Text 29/10/2024 Large Language Models
Systematic Investigation Of Gen-RecSys, A Recommender System Evolving With Generative And Large-scale Language Models 28/10/2024 Large Language Models
[Qwen2-VL] Latest VLM That Can Process Images And Videos In Different Resolutions 01/10/2024 Large Language Models
[NVLM] Multimodal LLM Outperforms GPT-4o In Image And Language Tasks 27/09/2024 Large Language Models
Bread That "Tastes Like Love" Created By AI X Craftsmanship 25/09/2024 Human-Computer Interaction
Ferret-UI, A Multimodal Large-scale Language Model For Mobile UI 02/09/2024 Large Language Models
Fusion Of Speech And Image! Does The Multimodal Method "AV-HuBERT" Shine In Speech Recognition For The Dysarthric? 31/08/2024 Speech Recognition For The Dysarthric
SkySense: Multimodal Remote Sensing Foundation Model 30/08/2024 CVPR
"ScreenAI" Understands Images And Text From Infographics To UI 24/06/2024 Large Language Models