[Qwen2-VL] Latest VLM That Can Process Images And Videos In Different Resolutions 01/10/2024 Large Language Models
[NVLM] Multimodal LLM Outperforms GPT-4o In Image And Language Tasks 27/09/2024 Large Language Models
Bread That "tastes Like Love" Created By AI X Craftsmanship 25/09/2024 Human-Computer Interaction
Ferret-UI, A Multimodal Large-scale Language Model For Mobile UI 02/09/2024 Large Language Models
Fusion Of Speech And Image! Does The Multimodal Method "AV-HuBERT" Shine In Speech Recognition For The Dysarthric? 31/08/2024 Speech Recognition For The Dysarthric
SkySense: Multimodal Remote Sensing Foundation Model 30/08/2024 CVPR
"ScreenAI" Understands Images And Text From Infographics To UI 24/06/2024 Large Language Models