Apparatus and Method for Encoding a Plurality of Audio Objects Using Direction Information During a Downmixing or Apparatus and Method for Decoding Using an Optimized Covariance Synthesis

Publication: WO2022079044A1
Published: 2022-04-21
Family Size: 15
Granted: Yes (4/15)

Simple Summary

Content extracted from patent full text and abstract with AI.

This patent describes a method and apparatus for encoding and decoding multiple audio objects (such as voices or instruments in a sound scene) that uses each object's direction information during downmixing, together with an optimized covariance synthesis for decoding. The encoder determines which objects are most significant in each time-frequency segment, carries their directions as metadata, and downmixes the objects accordingly into one or more transport channels suited to low-bitrate coding. The decoder reconstructs the audio scene with an optimized covariance synthesis, intended to place sounds accurately in space while reducing computational complexity.
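The per-segment selection of significant objects mentioned in the summary can be illustrated with a minimal sketch. It assumes a per-object power estimate is already available for each time-frequency tile. The function name `dominant_objects`, the `power` layout, and the choice of keeping `k` objects per tile are hypothetical and are not taken from the patent text.

```python
def dominant_objects(power, k=2):
    """Return, for each time-frequency tile, the indices of the k strongest objects.

    power[obj][tile] is a hypothetical per-object power estimate;
    the patent summary does not specify how significance is measured.
    """
    n_objects = len(power)
    n_tiles = len(power[0])
    selected = []
    for tile in range(n_tiles):
        # Rank object indices by their power in this tile, strongest first.
        ranked = sorted(range(n_objects), key=lambda i: power[i][tile], reverse=True)
        selected.append(ranked[:k])
    return selected


# Three objects, three tiles (illustrative values only).
power = [
    [0.9, 0.1, 0.2],  # object 0
    [0.3, 0.8, 0.1],  # object 1
    [0.1, 0.2, 0.7],  # object 2
]
selected = dominant_objects(power, k=2)
```

In a scheme like this, only the direction metadata of the selected objects would need to be carried for each tile, which is one way the side information could be kept small.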

Use Cases

  • 3D/immersive audio playback in home entertainment systems, cinemas, or VR/AR applications.
  • Efficient transmission and rendering of multi-speaker teleconference calls or live events with spatialized audio.
  • Streaming or broadcasting high-fidelity spatial audio over networks with limited bandwidth, such as mobile or internet radio.
  • Content creation tools for music and film, allowing producers to encode complex multi-object audio scenes for interactive or adaptive playback platforms.
  • Hearing aids or assistive listening devices, where spatial cues help users localize sound sources more naturally.

Benefits

  • Delivers immersive audio experiences with accurate spatial rendering, enhancing realism and user engagement.
  • Efficiently codes complex sound scenes with multiple objects, using less data than traditional methods, enabling low-bitrate applications.
  • Optimized decoding reduces computational requirements, allowing deployment on devices with limited processing power (like mobile phones or smart speakers).
  • Flexible: works for various numbers of audio objects and target output formats, supporting everything from stereo to sophisticated 3D speaker setups.
  • Improved audio quality compared to prior art, with minimized artifacts (such as those caused by generic decorrelators).
  • Object position (direction) metadata enables interactive or personalized renders (e.g., rotating or customizing sound scenes for the listener).

Technical Classifications (CPCs)

Main Classifications

Electrical & Electronic Tech

Physics & Measurement

Sub Classifications

Computing & Calculating

Electric Communication Technique

Musical Instruments & Acoustics

CPC Codes

G06F3/162, G10L19/008, G10L19/032, G10L25/03, H04S3/008, H04S7/302

Inventors & Applicants

Applicants

Fraunhofer Ges Forschung

Univ Friedrich Alexander Er

Patent Abstract

An apparatus for encoding a plurality of audio objects and related metadata indicating direction information on the plurality of audio objects, comprises: a downmixer (400) for downmixing the plurality of audio objects to obtain one or more transport channels; a transport channel encoder (300) for encoding one or more transport channels to obtain one or more encoded transport channels; and an output interface (200) for outputting an encoded audio signal comprising the one or more encoded transport channels, wherein the downmixer (400) is configured to downmix the plurality of audio objects in response to the direction information on the plurality of audio objects.
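To make the idea of a direction-dependent downmix concrete, the following is a minimal sketch under stated assumptions. It derives each object's gain in each transport channel from the object's azimuth, using a cardioid-shaped weighting around two assumed "virtual microphone" directions (+90° and −90°). The cardioid weighting, the two-channel layout, and the names `cardioid_gain` and `downmix` are illustrative choices; the abstract only states that the downmix responds to the direction information.

```python
import math


def cardioid_gain(object_azimuth_deg, mic_azimuth_deg):
    """Cardioid-shaped gain: 1 when the object lies on the mic axis, 0 opposite to it."""
    diff = math.radians(object_azimuth_deg - mic_azimuth_deg)
    return 0.5 * (1.0 + math.cos(diff))


def downmix(objects, azimuths_deg, mic_azimuths_deg=(90.0, -90.0)):
    """Mix object signals into one transport channel per assumed virtual microphone.

    objects[i] is a list of samples for object i; azimuths_deg[i] is its direction.
    """
    n_samples = len(objects[0])
    channels = []
    for mic in mic_azimuths_deg:
        # One gain per object for this transport channel, derived from direction only.
        gains = [cardioid_gain(az, mic) for az in azimuths_deg]
        channel = [
            sum(g * obj[n] for g, obj in zip(gains, objects))
            for n in range(n_samples)
        ]
        channels.append(channel)
    return channels


# Two objects placed hard left (+90 degrees) and hard right (-90 degrees).
objects = [[1.0, 0.0], [0.0, 1.0]]
azimuths = [90.0, -90.0]
left, right = downmix(objects, azimuths)
```

With these inputs, each object ends up almost entirely in the transport channel facing it, so spatial separation is partly preserved in the downmix itself rather than only in the metadata.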

Key Information

Publication No.: WO2022079044A1
Family ID: 78087389
Publication Date: 2022-04-21
Application No.: EP2021078209W
Application Date: 2021-10-12
Priority Date: 2020-10-13
Granted: Yes (4/15)
