Apparatus and Method for Perception-Based Clustering of Object-Based Audio Scenes

Publication: EP4346234A1

Published: 2024-04-03

Family Size: 2

Granted: No

Simple Summary

Content extracted from patent full text and abstract with AI.

This patent describes an apparatus and method for clustering multiple audio objects in immersive or object-based audio scenes using perception-based models. The clustering process combines audio objects that are perceptually close to each other, based on human auditory perception metrics like localization accuracy, masking effects, and perceived loudness maps. This reduces the number of audio objects needed to recreate an audio scene, aiming to maintain high perceptual sound quality while improving efficiency in audio processing, storage, and transmission.
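The idea of merging perceptually close objects can be sketched as a greedy agglomerative loop. This is a minimal illustration, not the patent's model: `AudioObject`, `merge_cost`, and `cluster_objects` are hypothetical names, and the cost used here (angular separation scaled by the quieter object's loudness) is only an assumed stand-in for the localization, masking, and loudness metrics mentioned above.

```python
import math
from dataclasses import dataclass


@dataclass
class AudioObject:
    name: str
    azimuth_deg: float  # horizontal direction of the object
    loudness: float     # linear loudness weight (hypothetical unit)


def angular_distance(a, b):
    """Smallest absolute difference between two azimuths, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def merge_cost(a, b):
    """Assumed stand-in for a perceptual distance (NOT the patent's model).

    Objects that are close in direction are cheap to merge, and a quiet
    object is cheaper to fold into another one than a loud object is.
    """
    return angular_distance(a.azimuth_deg, b.azimuth_deg) * min(a.loudness, b.loudness)


def centroid(cluster):
    """Loudness-weighted mean direction, computed on the unit circle."""
    x = sum(o.loudness * math.cos(math.radians(o.azimuth_deg)) for o in cluster)
    y = sum(o.loudness * math.sin(math.radians(o.azimuth_deg)) for o in cluster)
    total = sum(o.loudness for o in cluster)
    name = "+".join(o.name for o in cluster)
    return AudioObject(name, math.degrees(math.atan2(y, x)) % 360.0, total)


def cluster_objects(objects, target_count):
    """Repeatedly merge the cheapest pair until target_count clusters remain."""
    if target_count < 1:
        raise ValueError("target_count must be at least 1")
    clusters = [[o] for o in objects]
    while len(clusters) > target_count:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = merge_cost(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


scene = [
    AudioObject("dialog", 0.0, 1.0),
    AudioObject("footsteps", 5.0, 0.2),
    AudioObject("music_left", 90.0, 0.8),
    AudioObject("ambience", 270.0, 0.5),
]
for cluster in cluster_objects(scene, target_count=3):
    print([o.name for o in cluster])
```

With this toy cost, the quiet footsteps at 5 degrees are folded into the loud dialog at 0 degrees, while the music and ambience objects stay separate.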

Use Cases

Efficient encoding and streaming of immersive 3D audio content for movies, games, and VR/AR experiences.
Real-time audio rendering for virtual reality or gaming, reducing computational demands while maintaining audio quality.
Object-based audio broadcasting and delivery, e.g., over broadband or terrestrial broadcast, especially where bandwidth is limited.
Compression and downmixing of complex audio productions for consumer devices with limited processing power or storage (e.g., mobile phones, smart speakers).
Flexible rendering of object-based sound in smart home entertainment systems, automotive audio, or public spaces.
Automated audio scene optimization and adaptation (e.g., tailoring the mix to specific listening environments or user preferences).

Benefits

Significantly reduces the number of active audio objects required, saving computational resources and bandwidth.
Maintains high perceived sound quality by leveraging psychoacoustic principles, so that objects merged into one cluster are ones a listener is unlikely to tell apart.
Enables real-time or low-latency rendering in resource-constrained environments, such as VR/AR headsets and mobile devices.
Allows for scalable audio delivery, adapting quality and object count to channel capacities or application constraints.
Improves transmission and storage efficiency for immersive audio, supporting applications like streaming, downloads, and broadcast.
Provides flexibility to achieve either constant quality or constant object rate, depending on application requirements.
Reduces redundancy and irrelevance in audio scenes, leading to smarter and more perceptually efficient audio rendering.
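The choice between constant object rate and constant quality can be read as a choice of stopping rule for the merging loop. The sketch below is an assumption-laden illustration: `cluster_azimuths`, `max_clusters`, and `max_merge_cost` are hypothetical names, and the merge cost is plain angular distance between cluster means (without wrap-around), not a perceptual metric from the patent.

```python
def cluster_azimuths(azimuths, max_clusters=None, max_merge_cost=None):
    """Greedily merge neighbouring clusters of azimuth angles (degrees).

    Exactly one stopping rule must be given:
      max_clusters   -> "constant object rate": stop at a fixed cluster count.
      max_merge_cost -> "constant quality": stop once the cheapest remaining
                        merge would exceed the allowed cost.
    The cost is a hypothetical stand-in, not the patent's perceptual model.
    """
    if (max_clusters is None) == (max_merge_cost is None):
        raise ValueError("give exactly one of max_clusters or max_merge_cost")
    clusters = [[a] for a in sorted(azimuths)]
    while len(clusters) > 1:
        if max_clusters is not None and len(clusters) <= max_clusters:
            break
        # Cost of merging neighbours i and i+1: distance between their means.
        means = [sum(c) / len(c) for c in clusters]
        costs = [means[i + 1] - means[i] for i in range(len(clusters) - 1)]
        i = min(range(len(costs)), key=costs.__getitem__)
        if max_merge_cost is not None and costs[i] > max_merge_cost:
            break
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return clusters


scene = [0, 4, 10, 60, 65, 180]
print(cluster_azimuths(scene, max_clusters=3))    # fixed number of clusters
print(cluster_azimuths(scene, max_merge_cost=5))  # fixed cost ceiling
```

With the fixed count, the output size is predictable but the merge cost varies from scene to scene. With the fixed cost ceiling, the cost per merge is bounded but the number of clusters depends on the scene.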

Technical Classifications (CPCs)

Main Classifications

Electrical & Electronic Tech

Sub Classifications

Electric Communication Technique

CPC Codes

H04S7/302

Inventors & Applicants

Inventors

Sascha Dick

Jürgen Herre

Applicants

Fraunhofer Ges Forschung

Univ Friedrich Alexander Er

Patent Abstract

An apparatus (100) according to an embodiment is provided. The apparatus (100) comprises an input interface (110) for receiving information on three or more audio objects. Moreover, the apparatus (100) comprises a cluster generator (120) for generating two or more audio object clusters by associating each of the three or more audio objects with at least one of the two or more audio object clusters, such that, for each of the two or more audio object clusters, at least one of the three or more audio objects is associated with said audio object cluster, and such that, for each of at least one of the two or more audio object clusters, at least two of the three or more audio objects are associated with said audio object cluster. The cluster generator (120) is configured to generate the two or more audio object clusters depending on a perception-based model.
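The structural conditions in the abstract can be restated as a small check on a candidate clustering. This is only a reading aid: `satisfies_abstract_constraints` is a hypothetical helper, clusters are modelled as sets of object identifiers, and the perception-based model itself is not covered.

```python
def satisfies_abstract_constraints(object_ids, clusters):
    """Check the structural conditions stated in the abstract.

    object_ids: iterable of audio object identifiers (three or more needed).
    clusters:   list of collections of identifiers (two or more needed).
    """
    objects = set(object_ids)
    cluster_sets = [set(c) for c in clusters]
    if len(objects) < 3 or len(cluster_sets) < 2:
        return False
    # Every cluster has at least one object associated with it.
    if any(len(c) == 0 for c in cluster_sets):
        return False
    # Every object is associated with at least one cluster
    # (an object may belong to more than one cluster).
    if set().union(*cluster_sets) != objects:
        return False
    # At least one cluster holds two or more objects.
    return any(len(c) >= 2 for c in cluster_sets)


print(satisfies_abstract_constraints("abc", [{"a", "b"}, {"c"}]))    # valid
print(satisfies_abstract_constraints("abc", [{"a"}, {"b"}, {"c"}]))  # no merge
```

The first call passes every condition. The second fails because no cluster contains two or more objects, so nothing has actually been combined.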

Key Information

Publication No.

EP4346234A1

Family ID

83508489

Publication Date

2024-04-03

Application No.

EP22198817A

Application Date

2022-09-29

Priority Date

2022-09-29

Granted

No

Possible Cooperation

For further information please contact the transfer office.
