Apparatuses for Providing a Processed Audio Signal, Apparatuses for Providing Neural Network Parameters, Methods and Computer Program
Simple Summary
Content extracted from patent full text and abstract with AI.
This patent describes methods and apparatuses for speech enhancement based on normalizing flows, a type of deep generative model. It introduces three architectural changes: an All-Pole Gammatone Filterbank for the conditional input representation of the noisy speech, a double coupling scheme in the affine coupling layers, and depthwise separable convolutions. Mimicking aspects of human auditory perception, the technology extracts a cleaner speech signal from noisy input, offering improved perceptual audio quality (especially at low signal-to-noise ratios) over conventional or GAN-based approaches. The invention can be implemented in hardware, software, or hybrid systems and is supported by experimental tests showing favorable perceived sound quality.
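To make the coupling idea concrete, here is a minimal NumPy sketch of a conditional affine coupling step and a "double coupling" block built from two such steps. It is an illustration under stated assumptions, not the patented network: the sub-network is a hypothetical random linear map (`make_toy_net`), the vector sizes are arbitrary, and "double coupling" is read here as a second coupling step with the roles of the two halves swapped, so that every element is transformed within one block. The check at the end shows the property a normalizing flow relies on: the block can be inverted exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_toy_net(in_dim, out_dim):
    """Hypothetical stand-in for a learned sub-network: one random linear map."""
    W = rng.normal(scale=0.1, size=(in_dim, 2 * out_dim))
    def net(h):
        out = h @ W
        log_s, t = out[:out_dim], out[out_dim:]
        return np.tanh(log_s), t  # bounded log-scale keeps exp() well behaved
    return net

def coupling_forward(x, cond, net, swap=False):
    """Affine coupling: one half passes through, the other is scaled and shifted."""
    a, b = np.split(x, 2)
    if swap:
        a, b = b, a
    log_s, t = net(np.concatenate([a, cond]))  # parameters depend on a and the condition
    b = b * np.exp(log_s) + t
    y = np.concatenate([b, a]) if swap else np.concatenate([a, b])
    return y, log_s.sum()  # second value is the log-determinant of the Jacobian

def coupling_inverse(y, cond, net, swap=False):
    """Exact inverse of coupling_forward; the untouched half gives back log_s and t."""
    a, b = np.split(y, 2)
    if swap:
        a, b = b, a
    log_s, t = net(np.concatenate([a, cond]))
    b = (b - t) * np.exp(-log_s)
    return np.concatenate([b, a]) if swap else np.concatenate([a, b])

def double_coupling_forward(x, cond, net1, net2):
    """Assumed reading of 'double coupling': two steps with swapped roles."""
    y, ld1 = coupling_forward(x, cond, net1, swap=False)
    z, ld2 = coupling_forward(y, cond, net2, swap=True)
    return z, ld1 + ld2

def double_coupling_inverse(z, cond, net1, net2):
    y = coupling_inverse(z, cond, net2, swap=True)
    return coupling_inverse(y, cond, net1, swap=False)

dim, cond_dim = 8, 4                      # arbitrary toy sizes
net1 = make_toy_net(dim // 2 + cond_dim, dim // 2)
net2 = make_toy_net(dim // 2 + cond_dim, dim // 2)
x = rng.normal(size=dim)                  # stands in for a clean-speech frame
cond = rng.normal(size=cond_dim)          # stands in for noisy-speech features
z, log_det = double_coupling_forward(x, cond, net1, net2)
x_rec = double_coupling_inverse(z, cond, net1, net2)
max_err = float(np.max(np.abs(x - x_rec)))
print("reconstruction error:", max_err)
```

With a single coupling step, half of the input would leave the block unchanged; in this sketch both halves of `z` differ from `x`, which is the motivation usually given for transforming both halves per block.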
Use Cases
- Enhancement of speech clarity in mobile phones, teleconferencing, and voice communication devices, especially in noisy environments.
- Improving input quality for automatic speech recognition systems by reducing background noise.
- Application in hearing aids and cochlear implants to provide clearer sound to users by reducing noise.
- Broadcast and streaming audio post-processing to deliver cleaner, more intelligible dialog and content.
- Separating audio objects for advanced audio formats, e.g., for MPEG-H or dialog enhancement in broadcasting applications.
- Improving personal assistants (e.g., smart speakers) by providing them with cleaner audio input for wake word detection and command recognition.
- Enhancing audio recordings in forensic, surveillance, or archival scenarios to recover intelligible speech from degraded audio.
Benefits
- Delivers higher perceived audio quality than state-of-the-art GAN-based or traditional enhancement methods, especially at low SNRs.
- Efficient neural network architecture reduces computational complexity and resource requirements, making it suitable for real-time or embedded applications.
- Leverages human auditory models (All-Pole Gammatone Filterbanks), optimizing enhancement for perceptual relevance.
- Double coupling flow blocks improve the expressive power and quality of the generative model, allowing for more effective noise removal.
- Depthwise separable convolutions further reduce model complexity and memory usage without sacrificing performance.
- Versatile implementation: applicable in software (as a program or app), hardware, or hybrid systems (DSPs, FPGAs, etc.).
- Suitable for both offline and real-time processing scenarios due to efficient design and invertibility properties.
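The complexity claim for depthwise separable convolutions can be illustrated with simple parameter counting. The sketch below compares a standard 1-D convolution with a depthwise convolution followed by a 1x1 pointwise convolution. The channel counts and kernel size are hypothetical values chosen for illustration; the layer sizes used in the patent are not given in this summary, and the count says nothing about audio quality.

```python
def conv1d_params(c_in, c_out, kernel, bias=True):
    """Weights of a standard 1-D convolution: every output channel sees every input channel."""
    return c_in * c_out * kernel + (c_out if bias else 0)

def depthwise_separable_params(c_in, c_out, kernel, bias=True):
    """Depthwise step (one filter per input channel) plus 1x1 pointwise channel mixing."""
    depthwise = c_in * kernel + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise

c_in, c_out, kernel = 64, 64, 3  # hypothetical layer sizes
standard = conv1d_params(c_in, c_out, kernel)
separable = depthwise_separable_params(c_in, c_out, kernel)
print(standard, separable, round(separable / standard, 3))
```

For these example sizes the separable layer needs 4416 parameters against 12352 for the standard layer, roughly 36 percent. The saving grows with the kernel size, since the kernel dimension no longer multiplies the channel-mixing term.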
Technical Classifications (CPCs)
Main Classifications
Physics & Measurement
Sub Classifications
Musical Instruments & Acoustics
CPC Codes
Inventors & Applicants
Applicants
Fraunhofer Ges Forschung
Univ Friedrich Alexander Er
Patent Abstract
Embodiments according to the invention relate to apparatuses for providing a processed audio signal, apparatuses for providing neural network parameters, methods and computer programs. Embodiments according to the invention relate to Improved Normalizing Flow-Based Speech Enhancement Using an All-Pole Gammatone Filterbank for Conditional Input Representation. Embodiments according to the invention relate to Improved Normalizing Flow-Based Speech Enhancement with Varied Input Conditions.
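As background for the All-Pole Gammatone Filterbank named in the abstract, the following sketch builds one channel in its textbook form: a cascade of identical two-pole resonators with no zeros. The filter order of 4, the bandwidth rule (1.019 times the Glasberg and Moore equivalent rectangular bandwidth), the sampling rate, the centre frequencies, and the simple pole-mapping discretisation are all assumptions made for illustration. The filter design in the patent, and the way the filterbank output is fed to the flow as conditional input, may differ.

```python
import numpy as np

def erb_hz(fc):
    """Equivalent rectangular bandwidth (Glasberg and Moore) at centre frequency fc in Hz."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def apgf_impulse_response(fc, fs, order=4, n_samples=4096):
    """Impulse response of `order` cascaded two-pole resonators (all poles, no zeros)."""
    b = 1.019 * erb_hz(fc)               # bandwidth parameter in Hz
    r = np.exp(-2.0 * np.pi * b / fs)    # pole radius
    theta = 2.0 * np.pi * fc / fs        # pole angle
    a1, a2 = 2.0 * r * np.cos(theta), -r * r
    y = np.zeros(n_samples)
    y[0] = 1.0                           # unit impulse as input
    for _ in range(order):
        x, y = y, np.zeros(n_samples)
        for n in range(n_samples):
            # y[n] = x[n] + 2 r cos(theta) y[n-1] - r^2 y[n-2]
            y[n] = x[n]
            if n >= 1:
                y[n] += a1 * y[n - 1]
            if n >= 2:
                y[n] += a2 * y[n - 2]
    return y

fs = 16000                               # hypothetical sampling rate in Hz
centre_freqs = [250.0, 1000.0, 4000.0]   # hypothetical channel centre frequencies
peaks = []
for fc in centre_freqs:
    h = apgf_impulse_response(fc, fs)
    spectrum = np.abs(np.fft.rfft(h))
    peaks.append(float(np.argmax(spectrum) * fs / len(h)))
print(peaks)
```

Each channel's magnitude response peaks close to its centre frequency, and the bandwidth widens with frequency in the way the ERB scale prescribes, which is the sense in which such a filterbank mimics auditory frequency resolution.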
Key Information
Publication No.
WO2023186934A1
Family ID
81325295
Publication Date
2023-10-05
Application No.
EP2023058055W
Application Date
2023-03-28
Priority Date
2022-03-28
Granted
No
Possible Cooperation
For further information please contact the transfer office.