
Question the "Norm"! Noise Suppression Using an Ultra-Low-Complexity DNN
3 main points
✔️ Successfully developed a DNN free of structural complexity! Computation and model size are reduced significantly while state-of-the-art performance is maintained
✔️ A two-stage processing framework balances computational efficiency and speech-enhancement performance
✔️ A modified compression method improves subjective test scores
Ultra Low Complexity Deep Learning Based Noise Suppression
written by Shrishti Saha Shetu, Soumitro Chakrabarty, Oliver Thiergart, Edwin Mabande
[Submitted on 13 Dec 2023]
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Signal Processing (eess.SP)
code:
The images used in this article are from the paper, the introductory slides, or were created based on them.
Aiming for Clearer and More Audible Voice...
Read at Least This Part! Super Summary of The Paper!
I am sure many of you have noticed that telephone calls and recorded audio used to have some kind of rough noise in them, yet recent calls and recordings sound much cleaner. "Eh? I don't care about that..."
Well, don't say that... Behind this improvement is noise suppression (denoising) technology. Before machine learning, the mainstream approach was classical signal processing: look at the audio waveform and its spectrum and work out how to remove the noise.
Machine learning, however, comes with a problem: the amount of computation required is enormous, and the model size grows accordingly. This makes it impossible to use on small devices with limited computing power, such as smartphones.
This paper addresses the question of how to reduce computational complexity while maintaining high-performance denoising.
As a result, the authors succeeded in reducing computational complexity and model size by approximately 80% while maintaining processing performance on par with state-of-the-art models.
In the past, it was commonly believed that larger models were necessary to achieve high performance. However, this study shows that more efficient and effective architecture and compression methods are more important than the size of the model.
In speech recognition, too, a wave of ever-larger models is sweeping the industry, but building a large model requires vast amounts of data and money, and universities cannot compete on those terms.
However, it is a different story when it comes to rethinking models and aiming for greater efficiency. I believe many artificial intelligence models originating from universities will appear in the future, and their application to small devices will become more common.
What is The Two-Step Processing Framework? What Kind of Structure Does It Have...
My greatest thanks to you for reading this far!
Now that you've read this far, you're interested in this paper, right? Let's take it a little further from here...
Now look at the diagram above. No one can be expected to understand it at a glance, so I will take my time and explain it in as much detail as possible. I think this is one of the most important and interesting parts of the paper.
I would like to explain the two-stage processing framework, the model proposed in this paper that I briefly touched on in the summary above, though again in broad strokes.
Before we get into the explanation of the main unit, let's take a quick look back at the background that led to the development of this model. The field of noise suppression, which is the subject of this article, used to be done by a method called speech signal processing. That has now been replaced by machine learning. But machine learning is computationally expensive and the models are large, so it is difficult to apply it to small devices.
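As a quick aside, the classical signal-processing approach mentioned above can be sketched as spectral subtraction: estimate the noise magnitude spectrum from a noise-only segment, subtract it from each frame of the noisy signal, and resynthesize. This is a minimal illustration of the pre-machine-learning era, not anything from the paper; the function name and all parameters here are my own.

```python
import numpy as np

def spectral_subtraction(noisy, noise_only, n_fft=256, hop=128):
    """Classic pre-ML denoising sketch: estimate the noise magnitude
    spectrum from a noise-only segment and subtract it frame by frame."""
    win = np.hanning(n_fft)
    noise_mag = np.abs(np.fft.rfft(noise_only[:n_fft] * win))
    out = np.zeros(len(noisy))
    norm = np.zeros(len(noisy))
    for i in range(0, len(noisy) - n_fft + 1, hop):
        spec = np.fft.rfft(noisy[i:i + n_fft] * win)
        # Subtract the noise estimate, flooring negative magnitudes at zero.
        mag = np.maximum(np.abs(spec) - noise_mag, 0.0)
        frame = np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n=n_fft)
        out[i:i + n_fft] += frame * win   # overlap-add resynthesis
        norm[i:i + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)
```

Methods like this are cheap but tend to distort speech and leave "musical noise" residue, which is exactly the quality gap that DNN-based suppression closes.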
In other words, we want the amazing performance of machine learning, but we also want a small model: a selfish wish, I know.
Now let's dig into the structure of the model. We'll start with the first stage (red dotted line).
- A noisy signal is input.
- Pre-processing is applied to the signal and features are extracted.
- Each channel of the audio signal is processed individually.
- The features are split into multiple pieces, each receiving its own treatment.
- The separated features are recombined to generate an intermediate mask.
- After this computation, the first-stage features are produced.
It looks like this. Then the second stage (black line):
- The features generated in the first stage are fed to a CNN.
- After passing through the convolutional layers, a final mask is generated.
- The mask is used to estimate the noise-free speech.
- The denoised audio is generated using a compression method.
By following this process, effective speech enhancement is achieved with high computational efficiency: the more complex processing happens in the first stage, and a lightweight CNN handles the second stage, delivering high-quality enhancement while keeping the overall computational load down.
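The two-stage flow above can be caricatured in a few lines of NumPy. To be clear, this is a toy sketch of the idea (stage one derives an intermediate mask from extracted features, stage two refines it cheaply into the final mask applied to the spectrogram), not the paper's actual architecture: `stage1_mask` and `stage2_refine` are placeholders I made up to stand in for the feature-extraction stage and the lightweight CNN.

```python
import numpy as np

def stft_mag(x, n_fft=256, hop=128):
    # Magnitude spectrogram via a simple framed FFT (sketch only).
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win for i in range(0, len(x) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1))  # (frames, bins)

def stage1_mask(mag):
    # Stage 1 placeholder: a cheap statistics-based intermediate mask,
    # standing in for the paper's feature-extraction stage.
    noise_floor = mag.mean(axis=0, keepdims=True)
    return np.clip(1.0 - noise_floor / (mag + 1e-8), 0.0, 1.0)

def stage2_refine(mask):
    # Stage 2 placeholder: a tiny smoothing filter over time, standing in
    # for the lightweight CNN that produces the final mask.
    kernel = np.array([0.25, 0.5, 0.25])
    return np.apply_along_axis(
        lambda m: np.convolve(m, kernel, mode="same"), 0, mask)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 4000)
noisy = np.sin(2 * np.pi * 200 * t) + 0.3 * rng.standard_normal(4000)

mag = stft_mag(noisy)
final_mask = stage2_refine(stage1_mask(mag))
enhanced_mag = final_mask * mag  # the mask attenuates noise-dominated bins
```

The design point the sketch tries to convey: the expensive-looking work produces only a rough mask, and the second, cheap stage refines it, so quality does not have to be bought with a single huge network.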
What are The Results of The Noise Suppression Experiment? Try Two Tests, Subjective and Objective...
The authors proposed a model that builds a compact DNN with the least computation possible.
The result, as I mentioned at the beginning, was a successful miniaturization while maintaining state-of-the-art performance, but let's dig a little deeper to see how the performance was evaluated.
The model proposed in this work is the purple one in the figure above, and its performance is on par with the other models.
Now let's look at the evaluation process.
Two types of tests, subjective and objective, are used to measure the results of this model.
Let's look at the subjective test first. This is the experiment shown in the figure above. Several listeners are recruited and asked to listen to the audio, and their impressions of what they heard are quantified. As the figure shows, the results were very good.
Objective tests measure voice quality and voice distortion using specialized metrics. Without going into details, the results for distortion were good, but not so good for voice quality.
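For reference, objective speech-enhancement metrics compare the enhanced signal against a clean reference. A commonly used one in the literature is SI-SDR (scale-invariant signal-to-distortion ratio, in dB; higher is better). I'm using it purely as an illustration of how such metrics work, without claiming it is the exact metric used in the paper.

```python
import numpy as np

def si_sdr(reference, estimate, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio (dB): higher is better."""
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Optimal scaling: project the estimate onto the reference signal.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    distortion = estimate - target
    return 10.0 * np.log10(
        (np.dot(target, target) + eps) / (np.dot(distortion, distortion) + eps))

# Toy check: a lightly corrupted signal scores higher than a heavily corrupted one.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 8000)
clean = np.sin(2 * np.pi * 440 * t)
noise = rng.standard_normal(8000)
score_light = si_sdr(clean, clean + 0.01 * noise)
score_heavy = si_sdr(clean, clean + 0.1 * noise)
```

Because the metric is scale-invariant, simply making the output louder or quieter cannot game the score, which is why this family of metrics is popular for measuring distortion.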
It is Difficult to Rethink The "Ordinary" and Create The "New"... That's Why It's So Interesting.
It is very difficult to doubt the obvious. If you take the "obvious" for granted, life is easy and you fit in with others. On the other hand, you may get strange looks when you doubt the "obvious." However, discovery is not found in nothing at all, but rather in the unexpected.
In your research and in your personal life, why don't you try to question the norm? It is difficult when you are not used to it, and it is hard to change the way you think because it is already established, isn't it? It's difficult, and that's what makes it interesting. Discovering something new is exciting and will add color to your daily life that rarely changes.
So, in this article, I have introduced a study of noise suppression that questioned the obvious and produced new results. I am very happy if I have satisfied your intellectual curiosity even a little.
See you then! See you in the next article~~!
A Little Chat with A Chick Writer, Ogasawara
We are looking for companies and graduate students who are interested in conducting collaborative research!
My specialty is speech recognition (experimental), especially speech from people with dysarthria.
This field has limited resources available, and there will always be a limit to what one person can tackle alone.
Who would like to join us in solving social issues using the latest technology?