Doctor of Philosophy, The Ohio State University, 2024, Computer Science and Engineering
Our daily conversations often occur in acoustic environments filled with background noise, reverberation, and competing speech. In such settings, the performance of speech processing systems drastically declines, as they are typically designed to process clean speech. To address this challenge, speaker separation is employed to segregate speech signals. For real-world applications, speaker separation must be talker-independent to accommodate speakers that are not included in the training data. This dissertation focuses on talker-independent speaker separation in conversational or meeting environments, in single- and multi-microphone scenarios.
Conversational speaker separation systems are required to process long audio recordings and handle overlapping speech from a variable number of speakers. Current methods utilize continuous speaker separation (CSS), which divides an audio stream into short, partially overlapped segments of 2-3 seconds, each containing up to two speakers. CSS employs a talker-independent speaker separation model based on deep neural networks (DNN) to process each segment. Training a talker-independent model requires that each output layer of the DNN be associated with a distinct speaker in the mixture; ambiguity in this speaker assignment would lead to conflicting gradients during training. To ensure talker independence, the CSS separation model is trained with permutation invariant training (PIT), which evaluates all possible output-speaker permutations and trains on the one with the lowest loss.
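The two ideas in this paragraph, overlapped segmentation and permutation-invariant assignment, can be illustrated with a minimal NumPy sketch. It is not the dissertation's implementation: the function names `segment_stream` and `pit_mse`, the 3-second window, the 1.5-second hop, and the mean-squared-error loss are all assumptions chosen for illustration, and a real system would compute the loss on DNN outputs inside a training loop.

```python
from itertools import permutations

import numpy as np


def segment_stream(audio, sample_rate, window_s=3.0, hop_s=1.5):
    """Split a 1-D audio stream into partially overlapped segments.

    window_s and hop_s are illustrative values (assumptions); any
    trailing samples that do not fill a whole window are dropped.
    """
    window = int(window_s * sample_rate)
    hop = int(hop_s * sample_rate)
    last_start = max(len(audio) - window, 0)
    return [audio[start:start + window] for start in range(0, last_start + 1, hop)]


def pit_mse(estimates, targets):
    """Permutation-invariant MSE (hypothetical helper).

    estimates, targets: arrays of shape (num_speakers, num_samples).
    Returns the lowest loss and the permutation achieving it, where
    perm[i] is the index of the output assigned to target speaker i.
    """
    num_speakers = targets.shape[0]
    best_loss, best_perm = np.inf, None
    # Try every output-speaker assignment and keep the cheapest one.
    for perm in permutations(range(num_speakers)):
        loss = np.mean((estimates[list(perm)] - targets) ** 2)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm
```

Because the search is exhaustive, its cost grows factorially with the number of outputs; that stays cheap when each segment is limited to two speakers, as described above.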
Another approach to processing conversational speech involves combining speaker separation with diarization. Speaker diarization is designed to determine "who spoke when" within an audio stream, and when used in conjunction with speaker separation, it enables the creation of a distinct, clean audio stream for each speaker. This process is closely related to speaker recognition, which seeks to identify "who is speaking."
This dissertation begins by investigating the impact of single- and multi-ch…
Committee: Donald Williamson (Committee Member); Eric Fosler-Lussier (Committee Member); DeLiang Wang (Advisor)
Subjects: Artificial Intelligence; Computer Science; Electrical Engineering