Catch up on the latest AI articles

MATE: Multi-agent Accessibility-specific Modality Transformation Framework

The Time Has Come For Everyone To Speak English! Zero-shot Text-to-speech Technology For Multiple Languages Makes It Easy For Anyo ...

04/02/2025 Speech Recognition For The Dysarthric

[Zero-shot Learning] AI Voice Cloning And Lip-syncing Verification And Explanation

29/01/2025 Neural Network

The Future Of Music Education, Flute X GPT And LAUI's Potential To Change Large-Scale Language Models

24/01/2025 Large Language Models

[HiFi-GAN] GAN-based Vocoder Capable Of Generating 22 kHz Audio On A Single GPU

10/07/2024 Speech Synthesis

[VoiceCraft] A Language Model That Synthesizes Natural Speech At The Highest Level In The Industry

01/07/2024 Speech Synthesis

[MusicLDM] Text-to-Music Model With Low Risk Of Plagiarism

22/01/2024 Diffusion Model

[CLAP] Contrastive Learning Model Of Speech And Text

21/12/2023 Contrastive Learning

[LP-MusicCaps] Automatic Generation Of Music Captions Using LLM

20/11/2023 Contrastive Learning

Now There's A Technique For Editing The Facial Movements Of Characters In A Video To Match Any Emotion!

05/08/2022 CVPR

FreeMo, A Model That Automatically Generates Upper Body Gestures In Response To Speech, Is Here!

19/07/2022 Speech Synthesis