System and Method for Voice Modification
Simple Summary
Content extracted from patent full text and abstract with AI.
This invention provides a system and method for modifying and synthesizing speech by altering the fundamental frequency (F0) trajectory of the voice signal with machine learning. Rather than reusing the F0 values measured in the original speech, the system generates a new F0 trajectory from extracted linguistic and speaker features, which supports voice modification and anonymization. The resulting speech is intended to sound natural and remain clear and intelligible while being hard to trace back to the original speaker.
Use Cases
- Anonymizing the voices of users interacting with virtual assistants or smart devices to protect privacy.
- Generating anonymous voices for participants in teleconferences, legal proceedings, or whistleblower interviews.
- Creating personalized synthetic voices in the metaverse or other virtual environments, enabling users to adopt new voice characteristics or avatars.
- Producing disguised or pseudonymous voices for entertainment (e.g., dubbing, gaming, voice acting) without revealing the original actor's identity.
- Enhancing compliance with data protection laws (e.g., GDPR) by anonymizing stored or processed voice data.
- Reversibly de-anonymizing voices for authorized personnel where necessary, such as law enforcement or customer service.
- Voice modification for language learning tools to allow users to practice with different vocal characteristics.
Benefits
- Improved speaker privacy and anonymization by decoupling the original voice features from the output speech signal.
- Enhanced naturalness and intelligibility of the modified/anonymized voice compared to prior art systems, avoiding 'unnatural' synthetic speech.
- Adaptability for both anonymization and de-anonymization scenarios using reversible mappings when required.
- Supports cross-gender and cross-identity voice conversion with more natural-sounding results.
- Reduces word error rate (WER) and maintains speech recognition accuracy.
- Low-complexity solution suitable for integration into smart devices, cloud services, and virtual environments.
- Scalable across different languages and voice characteristics due to machine learning-based feature extraction and synthesis.
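For readers unfamiliar with the word error rate (WER) named in the list above, the following is a minimal sketch of how it is conventionally computed: the word-level edit distance between a reference transcript and a recognizer's output, divided by the number of reference words. The function name and the example sentences are illustrative assumptions; the summary does not say how WER was measured for this invention.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one deleted word and one substituted word out of five.
wer = word_error_rate("turn on the kitchen lights", "turn on kitchen light")
```

A lower WER on the anonymized audio means a speech recognizer still recovers the spoken words, which is the sense in which intelligibility is preserved.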
Technical Classifications (CPCs)
Main Classifications
Physics & Measurement
Sub Classifications
Musical Instruments & Acoustics
CPC Codes
Inventors & Applicants
Applicants
Fraunhofer Ges Forschung
Univ Friedrich Alexander Er
Patent Abstract
A system for conducting voice modification on an audio input signal comprising speech to obtain an audio output signal according to an embodiment is provided. The system comprises a feature extractor (210) for extracting feature information of the speech from the audio input signal. Moreover, the system comprises a fundamental frequencies generator (230) to generate modified fundamental frequency information depending on the feature information, such that the modified fundamental frequency information comprises modified fundamental frequencies being different from real fundamental frequencies of the speech, and/or such that the modified fundamental frequency information indicates a modified fundamental frequency trajectory being different from a real fundamental frequency trajectory of the speech. Furthermore, the system comprises a synthesizer (240) for generating the audio output signal depending on the modified fundamental frequency information and depending on the feature information.
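The three components named in the abstract can be pictured as a small data flow: extract features, generate a new F0 trajectory from those features, then synthesize from both. The sketch below is a toy illustration only, not the patented implementation. Every name in it is hypothetical, a rule-based slope stands in for the machine-learned F0 generator, and a plain sine oscillator stands in for the synthesizer. The frame length, target pitch, and test tone are arbitrary assumptions.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Features:
    """Hypothetical per-frame feature information (not the patent's feature set)."""
    voiced: List[bool]    # voicing decision per frame
    energy: List[float]   # RMS energy per frame
    real_f0: List[float]  # measured F0 in Hz (0.0 if unvoiced), kept only for comparison


def estimate_f0(frame, sr, fmin=80.0, fmax=400.0):
    """Crude autocorrelation pitch estimate; returns 0.0 for frames judged unvoiced."""
    energy = sum(x * x for x in frame)
    if energy < 1e-8:
        return 0.0
    best_lag, best_corr = 0, 0.0
    for lag in range(int(sr / fmax), int(sr / fmin) + 1):
        corr = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sr / best_lag if best_corr > 0.3 * energy else 0.0


def extract_features(signal, sr, frame_len=256):
    """Stand-in for the feature extractor: split into frames and measure each one."""
    voiced, energy, real_f0 = [], [], []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        f0 = estimate_f0(frame, sr)
        real_f0.append(f0)
        voiced.append(f0 > 0.0)
        energy.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return Features(voiced, energy, real_f0)


def generate_modified_f0(features, target_mean_hz=200.0, declination=0.1):
    """Stand-in for the F0 generator (assumption: a fixed rule, not a learned model).

    It reads only the voicing flags, so the measured F0 values are never copied.
    """
    n = len(features.voiced)
    f0 = []
    for k, is_voiced in enumerate(features.voiced):
        # Gentle downward slope across the utterance.
        slope = 1.0 - declination * (k / max(n - 1, 1))
        f0.append(target_mean_hz * slope if is_voiced else 0.0)
    return f0


def synthesize(features, f0, sr, frame_len=256):
    """Stand-in for the synthesizer: a sine oscillator driven by the new F0 and frame energy."""
    out, phase = [], 0.0
    for rms, hz in zip(features.energy, f0):
        for _ in range(frame_len):
            phase += 2.0 * math.pi * hz / sr
            # A sine with peak amplitude A has RMS A / sqrt(2).
            out.append(rms * math.sqrt(2.0) * math.sin(phase) if hz > 0.0 else 0.0)
    return out


# Toy input: half a second of a 120 Hz tone standing in for voiced speech.
sr = 8000
speech_like = [0.5 * math.sin(2 * math.pi * 120.0 * n / sr) for n in range(sr // 2)]
feats = extract_features(speech_like, sr)
new_f0 = generate_modified_f0(feats)
anonymized = synthesize(feats, new_f0, sr)
```

The point of the sketch is the data flow: the synthesizer receives the generated trajectory rather than the measured one, so the output pitch contour differs from the input's while the frame timing and energy are carried over from the extracted features.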
Key Information
Publication No.
EP4318472A1
Family ID
82850208
Publication Date
2024-02-07
Application No.
EP22189150A
Application Date
2022-08-05
Priority Date
2022-08-05
Granted
No
Possible Cooperation
For further information please contact the transfer office.