Apparatus and Method for Harmonic-Percussive-Residual Sound Separation Using a Structure Tensor on Spectrograms
Simple Summary
Content extracted from patent full text and abstract with AI.
This invention relates to an improved apparatus and method for separating audio signals into harmonic, percussive, and residual components using a structure tensor applied to spectrogram images. By analyzing the local orientation and anisotropy in the spectrogram, the approach enables more accurate classification of audio elements, especially for sounds with complex frequency modulations (e.g., vibrato) that previously could not be effectively separated. The method uses mathematical tools from image processing, such as edge and corner detection, to enhance the accuracy of audio component separation.
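The local orientation and anisotropy analysis described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the patented implementation: the function names, the use of `np.gradient` as the derivative operator, the Gaussian smoothing parameters, and the exact anisotropy formula (squared normalized eigenvalue difference) are all assumptions made for this example.

```python
import numpy as np

def _smooth(img, sigma=1.5, radius=4):
    """Separable Gaussian smoothing with zero-padded borders (NumPy only).

    Assumes each image dimension is at least 2 * radius + 1 samples long.
    """
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    for axis in (0, 1):
        img = np.apply_along_axis(np.convolve, axis, img, kernel, mode="same")
    return img

def structure_tensor_features(spec, sigma=1.5, eps=1e-9):
    """Return (alpha, anisotropy) for a magnitude spectrogram.

    spec: 2D array, rows = frequency bins, columns = time frames.
    alpha: angle of the direction of lowest change, in (-pi/2, pi/2];
           0 is a horizontal (constant-frequency) structure,
           pi/2 is a vertical (broadband, transient) structure.
    anisotropy: value in [0, 1]; 0 means no dominant orientation.
    """
    spec = np.asarray(spec, dtype=float)
    d_freq, d_time = np.gradient(spec)  # derivatives along axis 0, axis 1

    # Smoothed structure tensor entries
    t11 = _smooth(d_time * d_time, sigma)
    t12 = _smooth(d_time * d_freq, sigma)
    t22 = _smooth(d_freq * d_freq, sigma)

    # Eigenvalues of the symmetric 2x2 tensor (mu >= lam)
    half_trace = 0.5 * (t11 + t22)
    root = np.sqrt(0.25 * (t11 - t22) ** 2 + t12 ** 2)
    mu, lam = half_trace + root, half_trace - root

    # The dominant gradient direction is the eigenvector of mu; the
    # direction of lowest change is perpendicular to it.
    grad_angle = 0.5 * np.arctan2(2.0 * t12, t11 - t22)
    alpha = grad_angle + np.pi / 2
    alpha = np.where(alpha > np.pi / 2, alpha - np.pi, alpha)

    # Anisotropy: 1 for a single clear orientation, 0 for flat regions
    total = mu + lam
    anisotropy = np.zeros_like(total)
    mask = total > eps
    anisotropy[mask] = ((mu[mask] - lam[mask]) / total[mask]) ** 2
    return alpha, anisotropy
```

On a synthetic spectrogram containing a single horizontal line, this sketch yields an angle near 0 and an anisotropy near 1 along the line, and an anisotropy of 0 in the empty regions, where the angle carries no meaning.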
Use Cases
- Automatic music transcription (detecting melody, rhythm, and noise components)
- Beat and rhythm analysis in music tracks
- Audio source separation for remixing or enhancing audio (e.g., isolating vocals or instruments)
- Real-time equalization of harmonic and percussive components in music production or DJ software
- Spatial upmixing for surround sound systems (assigning different audio components to different audio channels)
- Voice or singing detection in audio streams
- Improving speech and music recognition systems by pre-processing audio data
- Noise reduction in audio recordings by isolating and removing residual (non-harmonic and non-percussive) components
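The spatial upmixing use case can be illustrated as a per-channel weighted sum of the separated components. The patent abstract only states the ordering of the weights (harmonic weights larger for left, center and right than for the surround channels; percussive weights smaller for left, center and right than for the surround channels). The numeric weights, the residual weight, the channel labels, and the function name below are hypothetical values chosen for this sketch.

```python
import numpy as np

# Hypothetical (harmonic, percussive, residual) weights per output channel.
# Only the ordering front-vs-surround follows the abstract; the numbers and
# the residual weight are assumptions.
UPMIX_WEIGHTS = {
    "L":  (0.8, 0.3, 0.5),
    "C":  (0.8, 0.3, 0.5),
    "R":  (0.8, 0.3, 0.5),
    "Ls": (0.3, 0.8, 0.5),
    "Rs": (0.3, 0.8, 0.5),
}

def upmix(harmonic, percussive, residual, weights=UPMIX_WEIGHTS):
    """Mix three separated component signals into five output channels."""
    h, p, r = (np.asarray(x, dtype=float)
               for x in (harmonic, percussive, residual))
    return {ch: wh * h + wp * p + wr * r
            for ch, (wh, wp, wr) in weights.items()}
```

With these example weights, harmonic content is emphasized in the front channels and percussive content in the surround channels.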
Benefits
- More accurate separation of audio signals into harmonic, percussive, and residual components, especially for sounds with frequency modulation like vibrato.
- Improved robustness to noise due to the use of structure tensor-based feature extraction and smoothing.
- Enhanced performance in audio analysis applications such as transcription, beat tracking, and source separation compared to traditional methods.
- Enables new audio manipulation capabilities, such as dynamic upmixing and selective component amplification or suppression.
- Applicable to both real-time and offline processing, with implementations possible in software or hardware.
- Optimization of multi-channel audio generation for surround sound formats, leading to richer listening experiences.
- Flexible and configurable for various audio analysis and transformation tasks.
- Potentially reduces the leakage of tonal information into residual components, improving overall sound quality in separated tracks.
Technical Classifications (CPCs)
Main Classifications
Physics & Measurement
Sub Classifications
Computing & Calculating
Musical Instruments & Acoustics
CPC Codes
Inventors & Applicants
Applicants
Fraunhofer Ges Forschung
Friedrich-Alexander-Universität Erlangen-Nürnberg
Patent Abstract
Apparatus and method for analysing a magnitude spectrogram of an audio signal for Harmonic-Percussive-Residual Sound Separation (HPSS), comprising: determining a change of a frequency for each time-frequency bin of a plurality of time-frequency bins of the magnitude spectrogram of the audio signal; and classifying each time-frequency bin into a signal component group depending on the change of the frequency.
A structure tensor is applied to the image of the spectrogram for preprocessing or feature extraction by edge and corner detection, in particular by calculating predominant orientation angles in the spectrogram. The structure tensor can be considered a black box, where the input is a gray scale image and the outputs are angles n for each pixel corresponding to the direction of lowest change and a certainty or anisotropy measure for this direction for each pixel. A local frequency change is extracted from the angles: it can be determined whether a time-frequency bin in the spectrogram belongs to a harmonic component (low local frequency change) or to a percussive component (high or infinite local frequency change).
Examples of application:
- (figure 1) Distinguish between harmonic, percussive, and residual signal components by employing this orientation information.
- (figure 5) Analyse an audio signal for upmixing to five audio output channels: front left, center, right, left surround and right surround. The harmonic weighting factor may be greater for generating the left, center and right output channels than for generating the left surround and right surround output channels. The percussive weighting factor may be smaller for generating the left, center and right output channels than for generating the left surround and right surround output channels.
- (figure 6) Compute source separation metrics (source to distortion ratio SDR, source to interference ratio SIR, and source to artifacts ratio SAR) in a recorded audio signal.
For example: a vibrato in a singing voice has a high instantaneous frequency change rate; an assignment of a bin in the spectrogram to "residual" is dependent on the bin anisotropy.
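The classification step described in the abstract can be sketched as follows, assuming the orientation angle and anisotropy of every time-frequency bin are already available. An angle whose tangent is a slope in bins per frame converts to a frequency change rate in Hz per second by multiplying by the bin spacing (fs / N) and dividing by the frame spacing (H / fs). The function name, the threshold values `r_h`, `r_p` and `c`, and the exact decision rule are illustrative assumptions, not values taken from this page.

```python
import numpy as np

def classify_bins(alpha, anisotropy, sample_rate, hop, n_fft,
                  r_h=10000.0, r_p=10000.0, c=0.2):
    """Assign each time-frequency bin to harmonic, percussive or residual.

    alpha:      orientation angles in radians (0 = constant frequency,
                pi/2 = vertical, i.e. very high or infinite change)
    anisotropy: certainty of the orientation, in [0, 1]
    r_h, r_p:   hypothetical rate thresholds in Hz/s
    c:          hypothetical anisotropy threshold
    Returns three boolean masks (harmonic, percussive, residual).
    """
    alpha = np.asarray(alpha, dtype=float)
    anisotropy = np.asarray(anisotropy, dtype=float)

    # tan(alpha) is a slope in bins per frame; convert it to Hz per second.
    rate = sample_rate ** 2 / (hop * n_fft) * np.tan(alpha)

    reliable = anisotropy > c
    harmonic = reliable & (np.abs(rate) <= r_h)
    percussive = reliable & (np.abs(rate) > r_p)
    # Bins without a reliable orientation fall into the residual group.
    residual = ~(harmonic | percussive)
    return harmonic, percussive, residual
```

In a full separation pipeline, masks like these would typically be multiplied with the complex spectrogram before an inverse transform. Raising `r_h` lets strongly modulated partials, such as vibrato, remain in the harmonic group.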
Key Information
Publication No.
EP3220386A1
Family ID
55646318
Publication Date
2017-09-20
Application No.
EP16161251A
Application Date
2016-03-18
Priority Date
2016-03-18
Granted
Yes (9/19)
Possible Cooperation
For further information please contact the transfer office.