Doctor of Philosophy, The Ohio State University, 2005, Computer and Information Science
At a cocktail party, we can selectively attend to a single voice and filter out interfering sounds. This perceptual ability has motivated a new field of study known as computational auditory scene analysis (CASA), which aims to build speech separation systems that incorporate auditory principles. The psychological process of figure-ground segregation suggests that the target signal should be segregated as foreground while the remaining stimuli are treated as background. Accordingly, the computational goal of CASA should be to estimate an ideal time-frequency (T-F) binary mask, which selects the target if it is stronger than the interference in a local T-F unit. This dissertation investigates four aspects of CASA processing: location-based speech segregation, binaural tracking of multiple moving sources, binaural sound segregation in reverberation, and monaural segregation of reverberant speech. For localization, the auditory system utilizes the interaural time difference (ITD) and interaural intensity difference (IID) between the ears. We observe that within a narrow frequency band, modifications to the relative strength of the target source with respect to the interference trigger systematic changes in ITD and IID, resulting in a characteristic clustering. Consequently, we propose a supervised learning approach to estimate the ideal binary mask. A systematic evaluation shows that the resulting system produces masks very close to the ideal binary masks and yields large improvements in speech intelligibility. In realistic environments, source motion requires consideration. Binaural cues are strongly correlated with locations in T-F units dominated by one source, resulting in channel-dependent conditional probabilities. Consequently, we propose a method that integrates these probabilities across channels to compute the likelihood function in a target space.
Finally, a hidden Markov model is employed for forming continuous tracks and automatically detecting the number of act...
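The ideal binary mask defined in the abstract (keep a T-F unit when the target is stronger than the interference in that unit) can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function name `ideal_binary_mask`, the nested-list energy layout (rows as frequency channels, columns as time frames), and the sample values are all assumptions made for illustration, and the sketch presumes the target and interference energies are known separately, which is what makes the mask "ideal".

```python
def ideal_binary_mask(target_energy, interference_energy):
    """Return a 0/1 mask with 1 where the target is stronger than the interference.

    Both arguments are hypothetical nested lists of per-unit energies,
    indexed as [frequency_channel][time_frame], with identical shapes.
    """
    mask = []
    for target_row, interference_row in zip(target_energy, interference_energy):
        # A unit is assigned to the target only if its energy is strictly greater.
        mask.append([1 if t > i else 0 for t, i in zip(target_row, interference_row)])
    return mask


# Illustrative energies for 2 frequency channels and 3 time frames (made-up values).
target = [[4.0, 0.5, 2.0],
          [0.1, 3.0, 1.0]]
interference = [[1.0, 2.0, 2.0],
                [0.2, 0.5, 0.0]]

mask = ideal_binary_mask(target, interference)
print(mask)
```

In a separation system the mask would be applied to the T-F representation of the mixture, retaining the units marked 1 and discarding the rest. Ties are assigned to the background here because the abstract's criterion is "stronger than".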
Committee: DeLiang Wang (Advisor)