SSL: Sound Source Localization and Discrimination

Description

We have pioneered the first viable deep learning framework (task definition, network architecture, and training paradigm) for fundamental auditory tasks such as sound source localization, speaker identification, and speech/non-speech classification. The framework is suited to highly noisy environments and overcomes the limitations of previous methods. Those methods rely heavily on idealized sound and environment models and are therefore inadequate for everyday situations involving multiple sound sources, background noise, short utterances, and an unknown number of sound sources. The method learns sound source localization models from limited training resources by leveraging simulated data and weakly labeled real audio data.

Publications

He, W., Motlicek, P. and Odobez, J.-M. (2021). Multi-task Neural Network for Robust Multiple Speaker Embedding Extraction. Interspeech 2021.
He, W., Motlicek, P. and Odobez, J.-M. (2021). Neural Network Adaptation and Data Augmentation for Multi-Speaker Direction-of-Arrival Estimation. IEEE/ACM Transactions on Audio, Speech, and Language Processing.

Links

Video: https://www.youtube.com/watch?v=Cfsc0zXAMVU

Advantages

Robust speaker localization and identification
Works in highly noisy environments

Applications

Any application that needs to localize and identify speakers

Technology Readiness Level

TRL 6


Contact us for more information

Interested in using our technologies?
Interested in knowing more about the licensing possibilities and conditions?
