Doctor of Philosophy (PhD), Wright State University, 2021, Biomedical Sciences PhD
Computational models may assist in the identification and prioritization of chemicals in large chemical libraries. Recent experimental and data curation efforts, such as those of the Tox21 consortium, have produced toxicological datasets covering increasing numbers of chemicals and toxicity endpoints, creating a golden opportunity to explore the multi-label learning and deep learning approaches examined in this thesis. Multi-label classification (MLC) methods may improve model predictivity by accounting for label dependence. However, current measures of label dependence, such as the correlation coefficient, are inappropriate for datasets with extreme class imbalance, which is common in toxicological datasets. In this thesis, we propose a novel label dependence measure that directly models the conditional probability of a label pair and displays greater sensitivity than the correlation coefficient for labels with low prior probabilities. MLC models using data-driven label partitioning based on this measure were generally non-inferior to MLC models using random label partitioning.
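The contrast between a correlation coefficient and a conditional probability under class imbalance can be illustrated with a small sketch. This is not the measure proposed in the thesis, whose definition is not given in this abstract; it only compares the Pearson (phi) correlation of two binary labels with a plain empirical conditional probability. The function names and the toy label vectors are hypothetical.

```python
import numpy as np


def phi_coefficient(a, b):
    """Pearson correlation between two binary label vectors (phi coefficient)."""
    return float(np.corrcoef(a, b)[0, 1])


def conditional_probability(a, b):
    """Empirical P(b = 1 | a = 1); returns NaN when label a has no positives."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    n_a = a.sum()
    return float((a & b).sum() / n_a) if n_a else float("nan")


# Hypothetical toy data: 1000 chemicals, label A active for 20 of them,
# label B active for 200, and every A-active chemical is also B-active.
label_a = np.zeros(1000, dtype=int)
label_a[:20] = 1
label_b = np.zeros(1000, dtype=int)
label_b[:200] = 1

phi = phi_coefficient(label_a, label_b)
p_b_given_a = conditional_probability(label_a, label_b)
p_a_given_b = conditional_probability(label_b, label_a)

print(f"phi = {phi:.3f}")
print(f"P(B=1 | A=1) = {p_b_given_a:.3f}")
print(f"P(A=1 | B=1) = {p_a_given_b:.3f}")
```

In this toy case the correlation is modest (about 0.29) even though label B is active whenever label A is, while the conditional probability P(B=1 | A=1) is 1.0. The conditional view is also asymmetric, which a single correlation value cannot express.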
Marginal improvements in model predictivity have prompted toxicology modelers to shy away from deep learning and resort to 'simpler' models, such as k-nearest neighbors, for their greater explainability. Given the prevalence of local, linear quantitative structure-activity relationship (QSAR) models in computational toxicology, we hypothesize that toxicological datasets have locally linear data structures, resulting in heterogeneous classification spaces that challenge the basic assumptions of most machine learning algorithms. We propose the locality-sensitive deep learner, a modification of deep neural networks that uses an attention mechanism to learn datapoint locality. On carefully constructed synthetic data with extremely unbalanced classes (10% active) and 60% cluster-specific noise, the locality-sensitive deep learner with learned feature weights retained high test performance (AUC>0.9), while the feed-forward n…
Committee: Michael L. Raymer Ph.D. (Advisor); David R. Cool Ph.D. (Committee Member); Lynn K. Hartzler Ph.D. (Committee Member); Travis E. Doom Ph.D. (Committee Member); Courtney E.W. Sulentic Ph.D. (Committee Member)
Subjects: Computer Science; Toxicology