Files
Parashar_Dhakal_Thesis.pdf (1.37 MB)
Novel Architectures for Human Voice and Environmental Sound Recognition using Machine Learning Algorithms
Author Info
Dhakal, Parashar
ORCID® Identifier
http://orcid.org/0000-0002-7306-4430
Permalink:
http://rave.ohiolink.edu/etdc/view?acc_num=toledo1531349806743278
Year and Degree
2018, Master of Science, University of Toledo, Electrical Engineering.
Abstract
Real-time voice recognition and environmental sound detection play an important role in security, home control systems, robotics, and speech forensics. The advantages of these technologies and the potential need for them in these industries motivated this work. Both tasks are challenging because sound signals are highly variable, and environmental noise makes recognition even more difficult. Various methods and architectures have been introduced to date for both voice and sound recognition. However, because of limitations in these architectures, we propose two architectures, one for voice recognition and one for background sound detection, that try to overcome the limitations seen in the architectures proposed by previous researchers.
For environmental sound detection, we present a real-time method in which features are extracted using standard signal processing techniques and classification is performed by standard machine learning (ML) classifiers. The extracted features are time-domain features, namely zero-crossing rate (ZCR) and short-time energy (STE), and frequency-domain features, namely spectral centroid (SC), spectral roll-off (SR), and spectral flux (SF). Pitch was determined using the Average Magnitude Difference Function (AMDF). For classification, we used robust and accurate ML techniques: support vector machines (SVM), random forests (RF), and deep neural networks (DNN).
Similarly, for voice recognition, we present a novel pipelined, real-time, end-to-end architecture that enhances recognition performance by exploiting the advantages of GF and convolutional neural networks (CNN). This architecture was developed to provide a voice-user interface, to aid voice-based authentication, and to integrate with an existing natural language processing (NLP) system. Gaining secure access to existing NLP systems was also one of the primary goals. We first identify the challenges related to real-time voice recognition and highlight up-to-date research in the field.
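As a rough illustration of the time-domain features and the AMDF pitch estimate named in the abstract, the following Python sketch computes ZCR, STE, and an AMDF-based pitch for a single frame. It is not the thesis implementation: the function names, the 8 kHz sample rate, the 800-sample frame, the 120 to 400 Hz search range, and the choice to define STE as mean squared amplitude are all assumptions made for this example.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for prev, cur in zip(frame, frame[1:]) if (prev >= 0) != (cur >= 0)
    )
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Mean squared amplitude of the frame (one common STE definition)."""
    return sum(x * x for x in frame) / len(frame)


def amdf_pitch(frame, sample_rate, fmin=120.0, fmax=400.0):
    """Estimate pitch in Hz as sample_rate / lag, where lag minimizes the AMDF.

    Assumes the frame is longer than the largest lag, sample_rate / fmin.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_value = min_lag, float("inf")
    for lag in range(min_lag, max_lag + 1):
        n_terms = len(frame) - lag
        # AMDF at this lag: average absolute difference between the frame
        # and a copy of itself shifted by `lag` samples.
        value = sum(abs(frame[n] - frame[n + lag]) for n in range(n_terms)) / n_terms
        if value < best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag


# Hypothetical test signal: a 200 Hz sine sampled at 8 kHz for 0.1 s.
SAMPLE_RATE = 8000
frame = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(800)]

zcr = zero_crossing_rate(frame)
ste = short_time_energy(frame)
pitch = amdf_pitch(frame, SAMPLE_RATE)
```

For this synthetic tone the period is exactly 40 samples, so the AMDF minimum falls at lag 40 and the estimate is 200 Hz. The search range is kept narrow on purpose: the AMDF also dips at multiples of the true period, so a wider range can produce octave errors on real recordings.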
Further, we analyze the functional requirements of a voice recognition system and introduce the mechanisms in our novel architecture that address these requirements. We then discuss the effect of different mechanisms, such as CNN, GF, and statistical parameters, on feature extraction. For classification, standard classifiers such as SVM, RF, and DNN are investigated. To verify the validity and effectiveness of the proposed architecture, we compared its performance, measured by accuracy, sensitivity, and specificity, with that of the standard AlexNet architecture.
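The three evaluation measures named above can be computed from confusion-matrix counts. The sketch below covers the binary case only, and the function name and the labels are made up for illustration; the abstract does not say how the thesis computed these measures, and a multi-class speaker task would typically apply the same formulas per class in a one-vs-rest fashion.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity, and specificity for binary labels.

    Assumes at least one positive and one negative example in y_true,
    otherwise a denominator below is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Made-up labels for illustration only (1 = target speaker, 0 = anyone else).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = binary_metrics(y_true, y_pred)
```

With these labels there are 3 true positives, 1 false negative, 4 true negatives, and 2 false positives, giving an accuracy of 0.7, a sensitivity of 0.75, and a specificity of about 0.67.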
Committee
Vijay Devabhaktuni (Committee Chair)
Ahmad Javaid (Committee Co-Chair)
Richard Molyet (Committee Member)
Pages
88 p.
Subject Headings
Computer Engineering; Electrical Engineering
Keywords
classifiers; end-to-end architecture; feature extraction; machine learning; speaker recognition; voice interface; background sound identification
Recommended Citations
APA Style (7th edition)
Dhakal, P. (2018). Novel Architectures for Human Voice and Environmental Sound Recognition using Machine Learning Algorithms [Master's thesis, University of Toledo]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1531349806743278
MLA Style (8th edition)
Dhakal, Parashar. Novel Architectures for Human Voice and Environmental Sound Recognition using Machine Learning Algorithms. 2018. University of Toledo, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=toledo1531349806743278.
Chicago Manual of Style (17th edition)
Dhakal, Parashar. "Novel Architectures for Human Voice and Environmental Sound Recognition using Machine Learning Algorithms." Master's thesis, University of Toledo, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1531349806743278
Document number:
toledo1531349806743278
Download Count:
963
Copyright Info
© 2018, some rights reserved.
Novel Architectures for Human Voice and Environmental Sound Recognition using Machine Learning Algorithms by Parashar Dhakal is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Based on a work at etd.ohiolink.edu.
This open access ETD is published by University of Toledo and OhioLINK.