
Target Confusion in End-to-end Speaker Extraction: Analysis and Approaches

Zifeng Zhao¹, Dongchao Yang¹, Rongzhi Gu¹, Haoran Zhang¹, Yuexian Zou¹,²
¹ Peking University
² Peng Cheng Laboratory

Introduction

This is the demo page for our paper "Target Confusion in End-to-end Speaker Extraction: Analysis and Approaches". Below, we present several cases in which the baseline model suffers from the target confusion problem, and compare them with the results of our proposed methods.

Examples

Female - Male Mixtures

speech mixture
enrollment utterance
baseline
proposed methods
ground-truth

Male - Male Mixtures

speech mixture
enrollment utterance
baseline
proposed methods
ground-truth

Female - Female Mixtures

speech mixture
enrollment utterance
baseline
proposed methods
ground-truth


