Articles
The Challenge Of "Embodied Web Agents," The Next Generation AI That Fuses The Physical And Digital
A New Wave Of Multispeaker Speech Recognition! The Challenge Of High Accuracy Systems By DiCoW And DiariZen
GenRecal, A General-purpose Distillation Framework For Lightweight, High-performance Distillation
ProtoReasoning: General-purpose Reasoning Skills Honed Through Logic And Planning
A Proposal For Mixed-first Optimization That Revolutionizes The Inference Performance Of Multimodal LLMs!
UnifiedCrawl: A New Approach To Low-Resource Language Data Collection And Efficient LLM Adaptation
Other
OpenScholar: Knowledge Synthesis And Reliability Enhancement Of Scientific Literature With LLM
LLMs As Mentors Instead Of Humans? Reinforcement Learning Agents Trained In Natural Language
Ultra-Sparse Memory Network: A New Method To Change Transformer Memory Efficiency
Hymba, A New Architecture That Pushes The Limits Of Small LLMs
Insight-V: A New Strategy For Multimodal Reasoning Connecting Vision And Thought
Stable Flow: Visualization Of The "Really Important Layers" Behind Image Generation
Open Vocabulary Object Detection Enabled By OWL-ViT
Neural Network
[SOK-Bench] Situational Video Inference Benchmark Using Real-World Knowledge In Video
Computer Vision
[Libra] A New Multimodal Design Of Large Language Models Using Separate Vision Systems
Large Language Models
[DrHouse] Diagnostic System Using Sensor Information And Expertise
Medical
A New Method For Global Description Of Heterogeneous Graph Neural Networks Using Description Logic
GNN
Wavelet Diffusion: The Fastest Diffusion Model
Image Generation
Improved Accuracy And Transparency Of Face Recognition With ChatGPT, New Developments In Soft Biometrics
Large Language Models
[RL-GPT] A Framework To Acquire Diamonds Several Times Faster Than Usual In Minecraft Is Now Available
Machine Learning
A Framework Is Now Available That Allows LLMs To Assess Human Personality Using The MBTI!
ChatGPT
Mask R-CNN: Efficient Detection Of Objects In Images
Computer Vision
The First Framework To Utilize LLM To Detect Fake News Is Now Available!
Fakenews
Democratizing GPT-4o Level Image Generation: The Janus-4o And ShareGPT-4o-Image Challenge
Graphs Are So Awesome! Review Of Integration With Deep Learning
GNN
LLM Agents Successfully Lead Customers To Purchase 35% Of The Time!
ChatGPT
Seed Diffusion Preview: Next-generation Code Generation Model That Combines Fast Inference And High Performance
Superior To ViT! A New Underlying Model For Large-Scale CNNs: InternImage
Deep Learning
ImageReward: A Reward Model That Learns Human Evaluation In Text-to-image
Alignment
RLHF: How To Train Reinforcement Learning Agents Using Human Evaluation
Alignment
Integration Of Large-scale Language Models In HCI Research And Ethical Issues
Large Language Models