A deep learning approach to integrate convolutional neural networks in speaker recognition

Soufiane Hourri, Nikola S. Nikolov, Jamal Kharroubi

Research output: Contribution to journal › Article › peer-review

Abstract

We propose a novel use of convolutional neural networks (CNNs) for speaker recognition. Although CNNs were designed primarily for computer vision problems, they have recently been applied to speaker recognition by using spectrograms as input images. We believe this approach is suboptimal, as it may compound errors from solving both a computer vision and a speaker recognition problem. In this work, we aim to integrate CNNs into speaker recognition without relying on images. We use Restricted Boltzmann Machines (RBMs) to extract speaker models as matrices and introduce a new way to model target and non-target speakers in order to perform speaker verification. A CNN is then used to discriminate between target and non-target matrices. Experiments were conducted on the THUYG-20 SRE corpus under three noise conditions: clean, 9 dB, and 0 dB. The results demonstrate that our method outperforms state-of-the-art approaches, decreasing the error rate by up to 60%.
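As a rough illustration of the verification stage described in the abstract, the sketch below shows a small CNN that classifies speaker-model matrices as target or non-target. This is not the authors' code: the matrix size (40×40), the layer configuration, and the class names are illustrative assumptions; in the paper, the matrices are derived from MFCC features via RBMs.

```python
# Minimal sketch (assumptions, not the authors' implementation): a CNN that
# discriminates target vs. non-target speaker-model matrices.
import torch
import torch.nn as nn

class TargetNonTargetCNN(nn.Module):
    def __init__(self, in_size: int = 40):  # matrix side length is an assumed value
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # treat the model matrix as a 1-channel input
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * (in_size // 4) ** 2, 2)  # two classes: target / non-target

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(x.flatten(1))

if __name__ == "__main__":
    model = TargetNonTargetCNN(in_size=40)
    batch = torch.randn(8, 1, 40, 40)  # 8 hypothetical speaker-model matrices
    logits = model(batch)              # shape: (8, 2)
    print(logits.shape)
```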

Original language: English
Pages (from-to): 615-623
Number of pages: 9
Journal: International Journal of Speech Technology
Volume: 23
Issue number: 3
DOIs
Publication status: Published - 1 Sep 2020

Keywords

  • Convolutional neural network
  • Deep learning
  • MFCC
  • Restricted Boltzmann Machine
  • Speaker recognition
