Target Confusion in End-to-end Speaker Extraction: Analysis and Approaches

Zifeng Zhao¹, Dongchao Yang¹, Rongzhi Gu¹, Haoran Zhang¹, Yuexian Zou¹,²
¹ Peking University
² Peng Cheng Laboratory

Introduction

This is the demo page for our paper, Target Confusion in End-to-end Speaker Extraction: Analysis and Approaches. Below, we present cases in which the baseline model suffers from the target confusion problem and compare its outputs with those of our proposed methods.

Examples

Female - Male Mixtures

speech mixture	enrollment utterance	baseline	proposed methods	ground-truth

Male - Male Mixtures

speech mixture	enrollment utterance	baseline	proposed methods	ground-truth

Female - Female Mixtures

speech mixture	enrollment utterance	baseline	proposed methods	ground-truth

Links

[Paper] [Bibtex] [Demo GitHub]

News

2022-06-15 Paper accepted by INTERSPEECH 2022
2022-04-15 Paper available on arXiv
