Files
Tao, Sun Accepted Dissertation 3-14-22 Sp 22.pdf (6.25 MB)
Time-domain Deep Neural Networks for Speech Separation
Author Info
Sun, Tao
ORCID® Identifier
http://orcid.org/0000-0002-8967-8760
Permalink: http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1647344440927022
Abstract Details
Year and Degree
2022, Doctor of Philosophy (PhD), Ohio University, Electrical Engineering & Computer Science (Engineering and Technology).
Abstract
Speech separation extracts the speech of interest from background noise (speech enhancement) or from interfering speakers (speaker separation). While the human auditory system has extraordinary speech separation capabilities, designing artificial models with similar functions has proven very challenging. Recently, waveform-based deep neural networks (DNNs) have become the dominant approach to speech separation, with great success. Improving speech quality and intelligibility is a primary goal of speech separation, and integrating elements of human speech into waveform DNNs has proven to be a simple yet effective strategy for boosting the objective performance (speech quality and intelligibility) of separation models. In this dissertation, three solutions are proposed to integrate human speech elements into waveform speech separation models in an effective manner. First, we propose a knowledge-assisted framework that integrates pretrained self-supervised speech representations to boost the performance of speech enhancement networks: to improve output intelligibility, we design auxiliary perceptual loss functions, built on speech representations pretrained on large datasets, that push the denoised network outputs to sound like clean human speech. Our second solution targets speaker separation: we design a speaker-conditioned model that adopts a pretrained speaker identification model to generate speaker embeddings rich in speech information. Our third solution takes a different approach to improving speaker separation: to suppress information from non-target speakers in auxiliary-loss-based solutions, we introduce a loss function that maximizes the distance between the representations of separated speech and the clean speech of non-target speakers.
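The auxiliary perceptual loss idea can be sketched in a few lines. This is a minimal NumPy toy, not the dissertation's actual model: the fixed random projection `toy_representation` is a hypothetical stand-in for a frozen pretrained self-supervised speech encoder, and the loss simply measures distance between the representations of the network output and the clean target.

```python
import numpy as np

def toy_representation(wave, dim=8):
    # Stand-in for a frozen pretrained speech encoder; a fixed
    # random projection of 16-sample frames keeps the example
    # self-contained. (Hypothetical; not a real pretrained model.)
    rng = np.random.default_rng(0)          # fixed weights: "pretrained"
    frames = wave.reshape(-1, 16)
    proj = rng.standard_normal((16, dim))
    return frames @ proj

def perceptual_loss(enhanced, clean):
    # Auxiliary loss: distance in representation space between the
    # denoised output and the clean target, encouraging outputs that
    # "sound like" clean speech to the pretrained encoder.
    e = toy_representation(enhanced)
    c = toy_representation(clean)
    return float(np.mean((e - c) ** 2))

rng = np.random.default_rng(1)
clean = rng.standard_normal(160)
noisy = clean + 0.3 * rng.standard_normal(160)
print(perceptual_loss(clean, clean))        # 0.0 for a perfect output
print(perceptual_loss(noisy, clean) > 0.0)  # positive otherwise
```

In training, such a term would be added to the waveform loss, with the encoder's weights kept frozen so gradients shape only the enhancement network.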
This dissertation also addresses a practical issue in frame-based DNN speech enhancement: frame stitching. Because the input context a network can observe is often limited, boundary discontinuities appear in network outputs. We use a recurrent neural network (RNN) to connect depthwise fully convolutional networks (FCNs), allowing temporal information to propagate across the per-frame networks. Our FCN + RNN model demonstrates an excellent smoothing effect on short frames, enabling speech enhancement systems with very short delays.
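The frame-stitching effect of carrying recurrent state across frame boundaries can be illustrated with a toy example. This sketch (my own illustration, not the dissertation's FCN + RNN architecture) uses a one-tap recurrent smoother in place of the per-frame networks: when the hidden state survives frame boundaries, frame-wise processing matches full-signal processing; when the state resets per frame, seams appear.

```python
import numpy as np

def framewise_with_state(signal, frame_len=4, alpha=0.5):
    # Per-frame processing where the recurrent state h is carried
    # across frame boundaries, mimicking an RNN linking per-frame FCNs.
    out, h = [], 0.0
    for start in range(0, len(signal), frame_len):
        for x in signal[start:start + frame_len]:
            h = alpha * h + (1 - alpha) * x   # state survives the boundary
            out.append(h)
    return np.array(out)

def framewise_independent(signal, frame_len=4, alpha=0.5):
    # Same smoother, but the state resets at every frame boundary,
    # producing discontinuities at the seams.
    out = []
    for start in range(0, len(signal), frame_len):
        h = 0.0
        for x in signal[start:start + frame_len]:
            h = alpha * h + (1 - alpha) * x
            out.append(h)
    return np.array(out)

t = np.linspace(0, 1, 32)
sig = np.sin(2 * np.pi * 2 * t)
with_state = framewise_with_state(sig)
independent = framewise_independent(sig)
full = framewise_with_state(sig, frame_len=len(sig))  # whole-signal reference
print(np.allclose(with_state, full))   # True: no boundary artifacts
print(np.allclose(independent, full))  # False: seams at frame edges
```

The same principle lets very short frames (and hence very short delays) be used without paying for the lost context at each boundary.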
Committee
Jundong Liu (Advisor)
Razvan Bunescu (Committee Member)
Li Xu (Committee Member)
Avinash Karanth (Committee Member)
Martin J. Mohlenkamp (Committee Member)
Jeffrey Dill (Committee Member)
Pages
101 p.
Subject Headings
Computer Science
Keywords
Speech Separation; Deep Neural Networks; Self-supervised Learning; Speech Enhancement; Speaker Separation
Recommended Citations
APA Style (7th edition)
Sun, T. (2022). Time-domain Deep Neural Networks for Speech Separation [Doctoral dissertation, Ohio University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1647344440927022

MLA Style (8th edition)
Sun, Tao. Time-domain Deep Neural Networks for Speech Separation. 2022. Ohio University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1647344440927022.

Chicago Manual of Style (17th edition)
Sun, Tao. "Time-domain Deep Neural Networks for Speech Separation." Doctoral dissertation, Ohio University, 2022. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1647344440927022
Document number:
ohiou1647344440927022
Download Count:
195
Copyright Info
© 2022, all rights reserved.
This open access ETD is published by Ohio University and OhioLINK.