Apparatuses for Providing a Processed Audio Signal, Apparatuses for Providing Neural Network Parameters, Methods and Computer Program

Publication: WO2023186934A1

Published: 2023-10-05

Family Size: 4

Granted: No

Simple Summary

Content extracted from patent full text and abstract with AI.

This patent describes methods and apparatuses for speech enhancement based on normalizing flows, a type of deep generative model. It introduces several architectural innovations, including an All-Pole Gammatone Filterbank for the conditioning input, double coupling schemes in the affine coupling layers, and depthwise separable convolutions, all aimed at conditioning on and processing noisy speech signals. By using a filterbank that mimics aspects of human auditory perception, the technology extracts a cleaner speech signal from noisy input and offers improved perceived audio quality over conventional and GAN-based approaches, especially at low signal-to-noise ratios (SNRs). The invention can be implemented in hardware, software, or a hybrid of both, and experimental tests reported in the patent indicate favorable perceived sound quality.
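
The summary names affine coupling layers with a double coupling scheme as a core building block. The pure-Python sketch below illustrates why such layers are exactly invertible and have a cheap log-determinant. It assumes that "double coupling" means one block couples both halves in turn (half B given half A, then half A given the updated half B); this reading, the hypothetical `toy_conditioner` standing in for the neural network, and all function names are illustrative assumptions, not the patent's implementation.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
# Hypothetical conditioner type: (given half, conditioning half) -> (log-scale, shift).
Conditioner = Callable[[Vector, Vector], Tuple[Vector, Vector]]


def toy_conditioner(given: Vector, cond: Vector) -> Tuple[Vector, Vector]:
    """Hypothetical stand-in for the network that predicts scale and shift.

    A fixed elementwise function of the given half and the noisy-speech
    conditioning, used only to make the sketch runnable.
    """
    log_s = [0.5 * math.tanh(g + c) for g, c in zip(given, cond)]
    shift = [0.1 * g - 0.2 * c for g, c in zip(given, cond)]
    return log_s, shift


def affine_couple(given: Vector, target: Vector, cond: Vector,
                  net: Conditioner) -> Tuple[Vector, float]:
    """y = target * exp(log_s) + shift; returns y and log|det J| = sum(log_s)."""
    log_s, shift = net(given, cond)
    y = [v * math.exp(ls) + t for v, ls, t in zip(target, log_s, shift)]
    return y, sum(log_s)


def affine_uncouple(given: Vector, y: Vector, cond: Vector,
                    net: Conditioner) -> Vector:
    """Exact inverse of affine_couple for the same `given` and `cond`."""
    log_s, shift = net(given, cond)
    return [(v - t) * math.exp(-ls) for v, ls, t in zip(y, log_s, shift)]


def double_coupling_forward(x: Vector, cond: Vector,
                            net: Conditioner) -> Tuple[Vector, float]:
    """Transform half B given half A, then half A given the updated half B."""
    if len(x) % 2 or len(x) != len(cond):
        raise ValueError("x and cond must have the same even length")
    h = len(x) // 2
    xa, xb, ca, cb = x[:h], x[h:], cond[:h], cond[h:]
    yb, ld_b = affine_couple(xa, xb, cb, net)
    ya, ld_a = affine_couple(yb, xa, ca, net)
    return ya + yb, ld_a + ld_b


def double_coupling_inverse(z: Vector, cond: Vector, net: Conditioner) -> Vector:
    """Undo the two couplings in reverse order."""
    h = len(z) // 2
    ya, yb, ca, cb = z[:h], z[h:], cond[:h], cond[h:]
    xa = affine_uncouple(yb, ya, ca, net)
    xb = affine_uncouple(xa, yb, cb, net)
    return xa + xb
```

Calling `double_coupling_inverse(z, noisy, toy_conditioner)` on the output of `double_coupling_forward(x, noisy, toy_conditioner)` recovers `x` up to floating-point round-off. In flow-based enhancers of this general kind, training typically maximizes the likelihood of clean speech mapped to a Gaussian latent given the noisy input, and enhancement runs the inverse pass on a sampled latent.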

Use Cases

Content extracted from patent full text and abstract with AI.

Enhancement of speech clarity in mobile phones, teleconferencing, and voice communication devices, especially in noisy environments.
Improving input quality for automatic speech recognition systems by reducing background noise.
Application in hearing aids and cochlear implants to provide clearer sound to users by reducing noise.
Broadcast and streaming audio post-processing to deliver cleaner, more intelligible dialog and content.
Separating audio objects for advanced audio formats, e.g., for MPEG-H or dialog enhancement in broadcasting applications.
Improving personal assistants (e.g., smart speakers) by providing them with cleaner audio input for wake word detection and command recognition.
Enhancing audio recordings in forensic, surveillance, or archival scenarios to recover intelligible speech from degraded audio.

Benefits

Content extracted from patent full text and abstract with AI.

Delivers higher perceived audio quality than state-of-the-art GAN-based or traditional enhancement methods, especially at low SNRs.
Efficient neural network architecture reduces computational complexity and resource requirements, making it suitable for real-time or embedded applications.
Leverages human auditory models (All-Pole Gammatone Filterbanks), optimizing enhancement for perceptual relevance.
Double coupling flow blocks improve the expressive power and quality of the generative model, allowing for more effective noise removal.
Depthwise separable convolutions further reduce model complexity and memory usage without sacrificing performance.
Versatile implementation: applicable in software (as a program or app), hardware, or hybrid systems (DSPs, FPGAs, etc.).
Suitable for both offline and real-time processing scenarios due to efficient design and invertibility properties.
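
The complexity savings attributed to depthwise separable convolutions can be checked with simple arithmetic. A standard 1-D convolution with C_in input channels, C_out output channels and kernel size K has C_in·C_out·K weights, whereas a depthwise convolution (one K-tap kernel per channel) followed by a pointwise 1×1 convolution has C_in·K + C_in·C_out. The pure-Python sketch below implements both forms and the weight count; the function names, channel counts and kernel size are hypothetical illustrations (biases omitted), not values taken from the patent.

```python
from typing import List, Tuple

Signal = List[List[float]]  # indexed as [channel][time]


def conv1d(x: Signal, w: List[List[List[float]]]) -> Signal:
    """Standard 'valid' 1-D convolution (cross-correlation); w is [out][in][tap]."""
    k, t = len(w[0][0]), len(x[0])
    return [[sum(w[o][i][j] * x[i][n + j] for i in range(len(x)) for j in range(k))
             for n in range(t - k + 1)] for o in range(len(w))]


def depthwise_separable_conv1d(x: Signal, dw: List[List[float]],
                               pw: List[List[float]]) -> Signal:
    """Depthwise conv (dw is [in][tap]) followed by pointwise 1x1 conv (pw is [out][in])."""
    k, t = len(dw[0]), len(x[0])
    # Depthwise step: each channel is filtered by its own kernel, no channel mixing.
    d = [[sum(dw[i][j] * x[i][n + j] for j in range(k)) for n in range(t - k + 1)]
         for i in range(len(x))]
    # Pointwise step: mix channels independently at every time step.
    return [[sum(pw[o][i] * d[i][n] for i in range(len(d))) for n in range(len(d[0]))]
            for o in range(len(pw))]


def weight_counts(c_in: int, c_out: int, k: int) -> Tuple[int, int]:
    """(standard, depthwise separable) weight counts, biases ignored."""
    return c_in * c_out * k, c_in * k + c_in * c_out
```

For example, `weight_counts(64, 64, 3)` returns `(12288, 4288)`, roughly a 2.9× reduction. The separable form is equivalent to a standard convolution whose kernel factorizes as `w[o][i][j] = pw[o][i] * dw[i][j]`, which is the restriction that buys the savings.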

Technical Classifications (CPCs)

Main Classifications

Physics & Measurement

Sub Classifications

Musical Instruments & Acoustics

CPC Codes

G10L21/0208

G10L25/30

Inventors & Applicants

Inventors

Martin Strauss

Bernd Edler

Matteo Torcoli

Applicants

Fraunhofer Ges Forschung

Univ Friedrich Alexander Er

Patent Abstract

Embodiments according to the invention relate to apparatuses for providing a processed audio signal, apparatuses for providing neural network parameters, methods and computer programs. Embodiments according to the invention relate to improved normalizing flow-based speech enhancement using an all-pole gammatone filterbank for conditional input representation, and to improved normalizing flow-based speech enhancement with varied input conditions.
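
The abstract names an all-pole gammatone filterbank (APGF) as the conditional input representation. As a rough illustration only, the sketch below builds such a filterbank in a common form: each channel is a cascade of identical two-pole resonators, with bandwidths following the Glasberg and Moore ERB scale and centre frequencies equally spaced on the ERB-number scale. The filter order of 4, the bandwidth factor 1.019 (borrowed from the classic gammatone), the band layout, the per-band log-energy summary and all function names are assumptions; the patent's actual parameters are not stated in this summary.

```python
import cmath
import math
from typing import List


def erb(fc: float) -> float:
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore, 1990)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)


def erb_space(f_lo: float, f_hi: float, n: int) -> List[float]:
    """n centre frequencies equally spaced on the ERB-number scale."""
    def to_erb(f: float) -> float:
        return 21.4 * math.log10(4.37 * f / 1000.0 + 1.0)

    def from_erb(e: float) -> float:
        return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

    lo, hi = to_erb(f_lo), to_erb(f_hi)
    return [from_erb(lo + (hi - lo) * k / (n - 1)) for k in range(n)]


def apgf(x: List[float], fc: float, fs: float, order: int = 4,
         bw_factor: float = 1.019) -> List[float]:
    """All-pole gammatone filter: `order` identical two-pole resonators in cascade."""
    theta = 2.0 * math.pi * fc / fs
    r = math.exp(-2.0 * math.pi * bw_factor * erb(fc) / fs)  # pole radius
    a1, a2 = -2.0 * r * math.cos(theta), r * r  # denominator 1 + a1 z^-1 + a2 z^-2
    zinv = cmath.exp(-1j * theta)
    g = abs(1.0 + a1 * zinv + a2 * zinv * zinv)  # unit gain per stage at fc
    y = list(x)
    for _ in range(order):
        out, y1, y2 = [], 0.0, 0.0
        for v in y:
            cur = g * v - a1 * y1 - a2 * y2
            out.append(cur)
            y1, y2 = cur, y1
        y = out
    return y


def band_log_energies(x: List[float], fs: float, centres: List[float]) -> List[float]:
    """Log10 mean energy of each APGF band output (one value per band)."""
    feats = []
    for fc in centres:
        y = apgf(x, fc, fs)
        feats.append(math.log10(sum(v * v for v in y) / len(y) + 1e-12))
    return feats
```

For instance, passing a 1 kHz sine sampled at 16 kHz through `band_log_energies` with `erb_space(100.0, 7000.0, 24)` centres puts the largest value in the band centred closest to 1 kHz. A practical conditioner would compute such energies frame by frame rather than over the whole signal.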

Key Information

Publication No.

WO2023186934A1

Family ID

81325295

Publication Date

2023-10-05

Application No.

EP2023058055W

Application Date

2023-03-28

Priority Date

2022-03-28

Granted

No

Possible Cooperation

For further information, please contact the transfer office.
