Apparatus, Method or Computer Program for Generating a Bandwidth-Enhanced Audio Signal Using a Neural Network Processor

Publication: WO2019081070A1

Published: 2019-05-02

Family Size: 12

Granted: Yes (5/12)

Simple Summary

Content extracted from patent full text and abstract with AI.

This invention describes an apparatus, method, and software for enhancing the bandwidth of audio signals (e.g., speech decoded by codecs that transmit only a limited frequency range) using a combination of traditional signal processing and a neural network. The system generates the frequency components missing from the input signal in two steps: it first creates a synthetic signal (the 'raw signal') in the higher frequency range, and then shapes that signal with parameters predicted by a trained neural network. The result is a more natural, intelligible, and pleasant-sounding audio output with extended bandwidth. The approach blends efficient traditional audio signal processing (e.g., spectral patching and whitening) with low-complexity convolutional/recurrent neural networks that generate parametric representations for the missing frequencies, thereby overcoming the limitations of narrowband codecs without any need to modify existing communications infrastructure.
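The raw-signal step mentioned in the summary (spectral patching followed by whitening) can be illustrated with a minimal NumPy sketch. It is not taken from the patent text: the function names `copy_up_patch` and `whiten`, the moving-average envelope estimate, the smoothing width, and the synthetic test spectrum are all assumptions chosen for illustration.

```python
import numpy as np

def copy_up_patch(low_spec: np.ndarray, n_high: int) -> np.ndarray:
    """Hypothetical patcher: tile the low-band bins upward to fill n_high bins."""
    reps = -(-n_high // len(low_spec))  # ceiling division
    return np.tile(low_spec, reps)[:n_high]

def whiten(spec: np.ndarray, smooth_bins: int = 9, eps: float = 1e-12) -> np.ndarray:
    """Flatten the spectral envelope by dividing by a moving-average magnitude.

    The moving average is an assumed envelope estimate, not the patent's method.
    """
    kernel = np.ones(smooth_bins)
    # Number of bins actually covered by the window (smaller at the edges).
    counts = np.convolve(np.ones(len(spec)), kernel, mode="same")
    envelope = np.convolve(np.abs(spec), kernel, mode="same") / counts
    return spec / (envelope + eps)

# Synthetic low-band frame with a strong downward spectral tilt (assumed data).
rng = np.random.default_rng(0)
n_low, n_high = 64, 64
tilt = np.linspace(1.0, 0.05, n_low)
low_spec = tilt * (rng.standard_normal(n_low) + 1j * rng.standard_normal(n_low))

# Raw signal for the enhancement range: patched, then whitened.
raw = whiten(copy_up_patch(low_spec, n_high))
```

Whitening removes the envelope inherited from the low band, so that the envelope of the enhancement range can be set entirely by the predicted parameters in the subsequent shaping step.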

Use Cases

Enhancing telephone call quality by providing wideband or fullband audio from narrowband signals (e.g., in GSM/AMR-NB telephony).
Improving the intelligibility and naturalness of recorded or live speech that has been band-limited (e.g., archival or voice logging systems).
Providing real-time, on-device speech enhancement in mobile phones, hearing aids, or communication devices with minimal computational resources.
Restoring frequency content to streamed or decoded speech/audio in VoIP, teleconferencing, or broadcasting without increasing bandwidth or requiring new codecs.
Supplying error concealment for lost or corrupted audio frames during streaming by synthesizing missing frequency components.
Serving as a post-processing tool for audio restoration in media, forensic analysis, or entertainment applications.

Benefits

Provides substantial improvement in perceived audio quality and intelligibility by reconstructing lost high-frequency information.
Requires no change to existing communications networks or codecs; it can be implemented on the receiver/decoder side only.
Uses a neural network in an efficient, low-complexity fashion (parametric prediction only), enabling real-time operation on resource-constrained devices.
Introduces no significant additional delay, making it suitable for live telecommunication applications.
Adaptable to various core codecs and bandwidth settings, supporting both blind and guided (low bitrate side information) bandwidth enhancement modes.
Versatile in handling both speech and general audio data, and robust to different languages, speakers, and recording conditions as demonstrated by training and evaluation on multi-lingual corpora.

Technical Classifications (CPCs)

Main Classifications

Electrical & Electronic Tech

Physics & Measurement

Sub Classifications

Computing & Calculating

Electric Communication Technique

Musical Instruments & Acoustics

CPC Codes

G06N3/063, G06N3/084, G06N20/10, G10L19/005, G10L19/02, G10L21/02, G10L21/038, G10L21/0388, H04L65/80

Inventors & Applicants

Inventors

Konstantin Schmidt

Christian Uhle

Bernd Edler

Applicants

Fraunhofer Ges Forschung

Univ Friedrich Alexander Er

Patent Abstract

An apparatus for generating a bandwidth enhanced audio signal from an input audio signal (50) having an input audio signal frequency range, comprises: a raw signal generator (10) configured for generating a raw signal (60) having an enhancement frequency range, wherein the enhancement frequency range is not included in the input audio signal frequency range; a neural network processor (30) configured for generating a parametric representation (70) for the enhancement frequency range using the input audio frequency range of the input audio signal and a trained neural network (31); and a raw signal processor (20) for processing the raw signal (60) using the parametric representation (70) for the enhancement frequency range to obtain a processed raw signal (80) having frequency components in the enhancement frequency range, wherein the processed raw signal (80) or the processed raw signal and the input audio signal frequency range of the input audio signal represent the bandwidth enhanced audio signal.
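The three components named in the abstract can be sketched as composable functions. This is a hedged illustration under stated assumptions, not the patented implementation: the parametric representation is assumed to be one target energy per enhancement band, and `neural_network_processor` is a fixed placeholder rule standing in for the trained network (31), not a real model. All function names and numeric choices are hypothetical.

```python
import numpy as np

def raw_signal_generator(low_spec, n_high):
    """(10) Stand-in: tile low-band bins into the enhancement range."""
    reps = -(-n_high // len(low_spec))  # ceiling division
    return np.tile(low_spec, reps)[:n_high]

def neural_network_processor(low_spec, n_bands):
    """(30) Placeholder for the trained network (31); NOT a real model.

    Returns one target energy per enhancement band, assumed here to halve
    from band to band starting from the mean low-band energy.
    """
    low_energy = np.mean(np.abs(low_spec) ** 2)
    return low_energy * 0.5 ** np.arange(1, n_bands + 1)

def raw_signal_processor(raw, band_energies, eps=1e-12):
    """(20) Scale each band of the raw signal to its target mean energy."""
    shaped = []
    for band, target in zip(np.array_split(raw, len(band_energies)), band_energies):
        current = np.mean(np.abs(band) ** 2)
        shaped.append(band * np.sqrt(target / (current + eps)))
    return np.concatenate(shaped)

def enhance(low_spec, n_high=64, n_bands=4):
    """Combine the untouched input range with the processed raw signal (80)."""
    raw = raw_signal_generator(low_spec, n_high)              # raw signal (60)
    params = neural_network_processor(low_spec, n_bands)      # parameters (70)
    processed = raw_signal_processor(raw, params)             # processed (80)
    return np.concatenate([low_spec, processed]), params

# Synthetic one-frame spectrum used only to exercise the sketch.
rng = np.random.default_rng(1)
low_spec = rng.standard_normal(64) + 1j * rng.standard_normal(64)
wide_spec, params = enhance(low_spec)
```

In this arrangement the network only has to output a handful of parameters per frame rather than the enhancement-range signal itself, which is consistent with the low-complexity claim in the summary.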

Key Information

Publication No.: WO2019081070A1

Family ID: 60268209

Publication Date: 2019-05-02

Application No.: EP2018059593W

Application Date: 2018-04-13

Priority Date: 2017-10-27

Granted: Yes (5/12)
