Novel Architectures for Human Voice and Environmental Sound Recognition
using Machine Learning Algorithms

Dhakal, Parashar

Parashar_Dhakal_Thesis.pdf (1.37 MB)

Author Info

Dhakal, Parashar

ORCID® Identifier

http://orcid.org/0000-0002-7306-4430

Permalink:

http://rave.ohiolink.edu/etdc/view?acc_num=toledo1531349806743278

Year and Degree

2018, Master of Science, University of Toledo, Electrical Engineering.

Abstract

Real-time voice recognition and environmental sound detection play an important role in security, home control systems, robotics, and speech forensics. Their advantages and potential demand in these industries motivated this work. Both tasks are challenging because sound signals are highly variable, and environmental noise makes recognition even more difficult. Various methods and architectures have been introduced for both voice and sound recognition to date. To overcome the limitations of these earlier architectures, we propose two architectures, one for voice recognition and one for background sound detection. For environmental sound detection, we present a real-time method in which features are extracted using standard signal processing techniques and classification is performed with standard machine learning (ML) classifiers. The extracted features are time-domain features, namely zero-crossing rate (ZCR) and short-time energy (STE), and frequency-domain features, namely spectral centroid (SC), spectral roll-off (SR), and spectral flux (SF). Pitch was determined using the Average Magnitude Difference Function (AMDF). For classification, we used robust and accurate ML techniques: support vector machine (SVM), random forest (RF), and deep neural network (DNN). Similarly, for voice recognition, we present a novel pipelined real-time end-to-end voice recognition architecture that enhances performance by exploiting the advantages of GF and convolutional neural networks (CNN). This architecture was developed to provide a voice-user interface and to aid voice-based authentication and integration with an existing natural language processing (NLP) system. Gaining secure access to existing NLP systems was also one of the primary goals. In this work, we first identify the challenges of real-time voice recognition and highlight up-to-date research in the field.
We then analyze the functional requirements of a voice recognition system and introduce the mechanisms that address these requirements through our novel architecture. Subsequently, we discuss the effect of different mechanisms such as CNN, GF, and statistical parameters on feature extraction. For classification, the standard classifiers SVM, RF, and DNN are investigated. To verify the validity and effectiveness of the proposed architecture, we compared its accuracy, sensitivity, and specificity against those of the standard AlexNet architecture.
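
As a rough illustration of the feature extraction described in the abstract, the following Python sketch computes the zero-crossing rate (ZCR), short-time energy (STE), and spectral centroid (SC) of a single frame, and estimates pitch as the lag that minimizes the Average Magnitude Difference Function (AMDF). It is not the thesis implementation: the function names, the naive DFT, and the pitch search range (f_min, f_max) are assumptions made for this example, and spectral roll-off and spectral flux are left out for brevity.

```python
import cmath
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2, using a naive O(N^2) DFT."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(frame)
        )
        mags.append(abs(acc))
    return mags


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency of the frame, in Hz."""
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    if total == 0:
        return 0.0
    n = len(frame)
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total


def amdf_pitch(frame, sample_rate, f_min=50.0, f_max=500.0):
    """Pitch estimate in Hz: sample_rate divided by the AMDF-minimizing lag.

    f_min and f_max are assumed search bounds, not values from the thesis.
    """
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_val = lag_min, float("inf")
    for lag in range(lag_min, lag_max + 1):
        count = len(frame) - lag
        val = sum(abs(frame[i] - frame[i + lag]) for i in range(count)) / count
        if val < best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag
```

For a 400-sample frame of a 200 Hz sine sampled at 8000 Hz, amdf_pitch(frame, 8000, f_min=150.0, f_max=400.0) returns 200 Hz. The narrow search range matters: the AMDF is also near zero at integer multiples of the true period, so a wide range can lock onto a sub-harmonic. Feature vectors built this way would then be passed to a classifier such as SVM, RF, or DNN.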

Committee

Vijay Devabhaktuni (Committee Chair)
Ahmad Javaid (Committee Co-Chair)
Richard Molyet (Committee Member)

Pages

88 p.

Subject Headings

Computer Engineering; Electrical Engineering

Keywords

classifiers; end-to-end architecture; feature extraction; machine learning; speaker recognition; voice interface; background sound identification

Recommended Citations

APA Style (7th edition)
Dhakal, P. (2018). Novel Architectures for Human Voice and Environmental Sound Recognition using Machine Learning Algorithms [Master's thesis, University of Toledo]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1531349806743278

MLA Style (8th edition)
Dhakal, Parashar. Novel Architectures for Human Voice and Environmental Sound Recognition using Machine Learning Algorithms. 2018. University of Toledo, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=toledo1531349806743278.

Chicago Manual of Style (17th edition)
Dhakal, Parashar. "Novel Architectures for Human Voice and Environmental Sound Recognition using Machine Learning Algorithms." Master's thesis, University of Toledo, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1531349806743278

Document number:

toledo1531349806743278

Download Count:

963

Copyright Info

Novel Architectures for Human Voice and Environmental Sound Recognition using Machine Learning Algorithms by Parashar Dhakal is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Based on a work at etd.ohiolink.edu.
This open access ETD is published by University of Toledo and OhioLINK.
