Zhao, Xiaojia

2014, Doctor of Philosophy, Ohio State University, Computer Science and Engineering.
As a primary topic in speaker recognition, speaker identification (SID) aims to identify the underlying speaker(s) given a speech utterance. SID systems perform well under matched training and test conditions. In real-world environments, however, mismatch caused by background noise, room reverberation, or a competing voice significantly degrades the performance of such systems. Making SID systems robust to such mismatch has therefore become an important research problem. Existing approaches address this problem from different perspectives, such as proposing robust speaker features, introducing noise to clean speaker models, and using speech enhancement methods to restore clean speech characteristics. Inspired by auditory perception, computational auditory scene analysis (CASA) typically segregates speech from interference by producing a time-frequency mask. This dissertation aims to address the SID robustness problem in the CASA framework.

We first deal with the noise robustness of SID systems. We employ an auditory feature, gammatone frequency cepstral coefficient (GFCC), and show that this feature captures speaker characteristics and performs substantially better than conventional speaker features under noisy conditions. To deal with noisy speech, we apply CASA separation and then either reconstruct or marginalize corrupted components indicated by a CASA mask. We find that both reconstruction and marginalization are effective. We further combine these two methods into a single system based on their complementary advantages, and this system achieves significant performance improvements over related systems under a wide range of signal-to-noise ratios (SNRs). In addition, we conduct a systematic investigation into why GFCC shows superior noise robustness and conclude that nonlinear log rectification is likely the reason.
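The marginalization idea mentioned above can be illustrated with a minimal sketch. It is not the dissertation's implementation. It assumes each speaker is modeled by a diagonal-covariance Gaussian mixture over spectral (time-frequency) features, and that a binary mask marks each feature dimension as reliable (1) or corrupted (0). The function names and the model layout are hypothetical. With diagonal covariances, marginalizing a corrupted dimension amounts to leaving its factor out of the likelihood.

```python
import math


def log_gauss_1d(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def marginal_log_likelihood(frame, mask, weights, means, variances):
    """Log-likelihood of one frame under a diagonal GMM, reliable dims only.

    Assumption: mask[d] is 1 where the feature is reliable and 0 where it is
    corrupted. Corrupted dimensions are integrated out, which for a diagonal
    covariance simply drops their factor from the product.
    """
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        total = math.log(w)
        for x, reliable, m, v in zip(frame, mask, mu, var):
            if reliable:
                total += log_gauss_1d(x, m, v)
        component_logs.append(total)
    # Log-sum-exp over mixture components for numerical stability.
    peak = max(component_logs)
    return peak + math.log(sum(math.exp(c - peak) for c in component_logs))


def identify(frames, masks, speaker_models):
    """Return the speaker whose model gives the highest total log-likelihood.

    speaker_models maps a speaker name to (weights, means, variances);
    this layout is an illustrative assumption.
    """
    scores = {}
    for name, (weights, means, variances) in speaker_models.items():
        scores[name] = sum(
            marginal_log_likelihood(f, m, weights, means, variances)
            for f, m in zip(frames, masks)
        )
    return max(scores, key=scores.get)
```

In this sketch, a frame whose dimensions are all masked as corrupted contributes the same value to every speaker's score, so it does not affect the decision.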

Speech is often corrupted by both noise and reverberation. Each has been studied on its own, but their combined effects have rarely been studied. We address this issue in two phases. We first remove background noise through binary masking using a deep neural network (DNN) classifier. Then we perform robust SID with speaker models trained in selected reverberant conditions, on the basis of bounded marginalization and direct masking. Evaluation results show that the proposed method substantially improves SID performance compared to related systems across a wide range of reverberation times and SNRs.
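As a rough illustration of bounded marginalization, the sketch below uses one common formulation and is not necessarily the one used in the dissertation. It assumes non-negative spectral energy features, so the clean value in a corrupted unit lies between a lower bound (0 here) and the observed noisy value. A corrupted dimension then contributes the Gaussian probability mass on that interval instead of being dropped. All names, the lower bound, and the floor constant are illustrative assumptions.

```python
import math


def log_gauss_1d(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def normal_cdf(x, mean, var):
    """Cumulative distribution function of a one-dimensional Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))


def bounded_term(x, mean, var, lower=0.0):
    """Gaussian probability mass on [lower, x].

    Assumption: the clean energy cannot exceed the observed noisy energy x
    and cannot fall below `lower`.
    """
    mass = normal_cdf(x, mean, var) - normal_cdf(lower, mean, var)
    return max(mass, 1e-300)  # floor avoids log(0)


def bounded_log_likelihood(frame, mask, weights, means, variances):
    """Log-likelihood of one frame under a diagonal GMM.

    Reliable dimensions (mask 1) use the Gaussian density; corrupted
    dimensions (mask 0) use the bounded integral.
    """
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        total = math.log(w)
        for x, reliable, m, v in zip(frame, mask, mu, var):
            if reliable:
                total += log_gauss_1d(x, m, v)
            else:
                total += math.log(bounded_term(x, m, v))
        component_logs.append(total)
    # Log-sum-exp over mixture components for numerical stability.
    peak = max(component_logs)
    return peak + math.log(sum(math.exp(c - peak) for c in component_logs))
```

Compared with dropping corrupted dimensions entirely, the bounded form still penalizes a model whose expected clean energy is well above the observed value.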

The aforementioned studies handle mixtures of target speech and non-speech intrusions by taking advantage of their different characteristics. Such methods may not apply if the intrusion is a competing voice, which has characteristics similar to those of the target. SID in cochannel speech, where two speakers are talking simultaneously over a single recording channel, is a well-known challenge. Previous studies address this problem in the anechoic environment under the Gaussian mixture model (GMM) framework. Cochannel SID in reverberant conditions, on the other hand, has not been addressed. This dissertation studies cochannel SID in both anechoic and reverberant conditions. We first investigate GMM-based approaches and propose a combined system that integrates two cochannel SID methods. We then explore DNNs for cochannel SID and propose a DNN-based recognition system. Evaluation results demonstrate that our proposed systems significantly improve SID performance over recent approaches in both anechoic and reverberant conditions and at various target-to-interferer ratios.
DeLiang Wang, Professor (Advisor)
Eric Fosler-Lussier, Professor (Committee Member)
Mikhail Belkin, Professor (Committee Member)
155 p.

Recommended Citations

APA Citation

Zhao, X. (2014). CASA-BASED ROBUST SPEAKER IDENTIFICATION. (Electronic Thesis or Dissertation). Retrieved from

MLA Citation

Zhao, Xiaojia. "CASA-BASED ROBUST SPEAKER IDENTIFICATION." Electronic Thesis or Dissertation. Ohio State University, 2014. OhioLINK Electronic Theses and Dissertations Center. 16 Dec 2017.

Chicago Citation

Zhao, Xiaojia. "CASA-BASED ROBUST SPEAKER IDENTIFICATION." Electronic Thesis or Dissertation. Ohio State University, 2014.


Dissertation_XiaojiaZhao.pdf (1.26 MB)