Time-domain Deep Neural Networks for Speech Separation


2022, Doctor of Philosophy (PhD), Ohio University, Electrical Engineering & Computer Science (Engineering and Technology).
Speech separation separates the speech of interest from background noise (speech enhancement) or from interfering speech (speaker separation). While the human auditory system has extraordinary speech separation capabilities, designing artificial models with similar functions has proven very challenging. Recently, waveform-based deep neural networks (DNNs) have become the dominant approach to speech separation, with great success. Improving speech quality and intelligibility is a primary goal of speech separation tasks. Integrating human speech elements into waveform DNNs has proven to be a simple yet effective strategy for boosting the objective performance (including speech quality and intelligibility) of speech separation models. In this dissertation, three solutions are proposed to integrate human speech elements into waveform speech separation solutions in an effective manner. First, we propose a knowledge-assisted framework that integrates pretrained self-supervised speech representations to boost the performance of speech enhancement networks. To enhance output intelligibility, we design auxiliary perceptual loss functions that rely on speech representations pretrained on large datasets, ensuring that the denoised network outputs sound like clean human speech. Our second solution targets speaker separation: we design a speaker-conditioned model that adopts a pretrained speaker identification model to generate speaker embeddings with rich speech information. Our third solution takes a different approach to improving speaker separation. To suppress information from non-target speakers in auxiliary-loss-based solutions, we introduce a loss function that maximizes the distance between the speech representations of separated speech and the clean speech of non-target speakers.
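As a rough illustration of the auxiliary perceptual loss idea described above, the sketch below compares a network output and its clean reference in the feature space of a pretrained encoder. The `feature_extractor` here is a toy stand-in (a fixed random projection of overlapping frames), and its frame size and hop are illustrative assumptions, not the pretrained self-supervised models used in the dissertation.

```python
import numpy as np

def feature_extractor(wave, dim=8):
    """Toy stand-in for a frozen pretrained speech encoder:
    a fixed random projection of overlapping 160-sample frames.
    Purely illustrative -- not an actual self-supervised model."""
    rng = np.random.default_rng(0)  # fixed "pretrained" weights
    W = rng.standard_normal((dim, 160))
    frames = np.lib.stride_tricks.sliding_window_view(wave, 160)[::80]
    return frames @ W.T  # (n_frames, dim) representations

def perceptual_loss(enhanced, clean):
    """Auxiliary loss: distance between the representations of the
    network output and the clean reference, encouraging outputs
    that 'sound like' clean speech in the encoder's feature space."""
    fe, fc = feature_extractor(enhanced), feature_extractor(clean)
    return float(np.mean((fe - fc) ** 2))

# Usage: the loss vanishes for a perfect output and grows with noise.
clean = np.sin(0.01 * np.arange(1600))
noisy = clean + 0.3 * np.random.default_rng(1).standard_normal(1600)
print(perceptual_loss(clean, clean), perceptual_loss(noisy, clean))
```

In practice such a loss is added to a waveform-domain training objective and the encoder is kept frozen, so gradients shape the enhancement network rather than the representation.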
In this dissertation, we also address a practical issue in frame-based DNN speech enhancement solutions: frame stitching. Because the input context a network can observe is often limited, boundary discontinuities arise in the network outputs. We use a recurrent neural network (RNN) to connect depthwise fully convolutional networks (FCNs), allowing temporal information to propagate along the networks across individual frames. Our FCN + RNN model demonstrates an excellent smoothing effect on short frames, enabling speech enhancement systems with very short delays.
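The FCN + RNN idea can be caricatured as follows: each frame is processed by a shared per-frame network, while a recurrent hidden state carries temporal context across frame boundaries, smoothing the stitched output. The `rnn_stitch` function below, with its toy weights and dimensions, is an illustrative assumption, not the dissertation's actual architecture.

```python
import numpy as np

def rnn_stitch(frames, hidden_dim=4):
    """Toy sketch of frame-wise processing with a recurrent state:
    frames are handled one at a time (low latency), but the hidden
    state h propagates temporal information across frame boundaries,
    mitigating stitching discontinuities."""
    rng = np.random.default_rng(0)  # fixed toy weights
    n_samples = frames.shape[1]
    Wx = 0.1 * rng.standard_normal((hidden_dim, n_samples))
    Wh = 0.1 * rng.standard_normal((hidden_dim, hidden_dim))
    Wo = 0.1 * rng.standard_normal((n_samples, hidden_dim))
    h = np.zeros(hidden_dim)
    out = []
    for x in frames:                  # one short frame at a time
        h = np.tanh(Wx @ x + Wh @ h)  # recurrent state links frames
        out.append(x + Wo @ h)        # residual per-frame output
    return np.stack(out)

# Usage: output has the same frame layout as the input,
# and each output frame depends on earlier frames via the state.
frames = np.random.default_rng(2).standard_normal((5, 16))
print(rnn_stitch(frames).shape)
```

The key design point is that the per-frame network stays causal and cheap, while the recurrent connection supplies the cross-frame context a single short frame cannot provide.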
Jundong Liu (Advisor)
Razvan Bunescu (Committee Member)
Li Xu (Committee Member)
Avinash Karanth (Committee Member)
Martin J. Mohlenkamp (Committee Member)
Jeffrey Dill (Committee Member)
101 p.

Recommended Citations


  • Sun, T. (2022). Time-domain Deep Neural Networks for Speech Separation [Doctoral dissertation, Ohio University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1647344440927022

    APA Style (7th edition)

  • Sun, Tao. Time-domain Deep Neural Networks for Speech Separation. 2022. Ohio University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1647344440927022.

    MLA Style (8th edition)

  • Sun, Tao. "Time-domain Deep Neural Networks for Speech Separation." PhD diss., Ohio University, 2022. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1647344440927022

    Chicago Manual of Style (17th edition)