Search Results

(Total results 609)

  • 1. SUN, HAN High-dimensional Variable Selection: A Novel Ensemble-based Method and Stability Investigation

    Doctor of Philosophy, Case Western Reserve University, 2025, Epidemiology and Biostatistics

    Variable selection in high-dimensional data analysis poses substantial methodological challenges. While numerous penalized variable selection methods and machine learning approaches exist, many demonstrate instability in real-world applications. This thesis makes two primary contributions: developing a novel ensemble algorithm for variable selection in competing risks modeling and conducting a comprehensive stability analysis of established variable selection methods. The first component introduces the Random Approximate Elastic Net (RAEN), an innovative methodology that offers a stable and generalizable solution for large-p-small-n variable selection in competing risks data. RAEN's flexible framework enables its application across various time-to-event regression models, including competing risks quantile regression and accelerated failure time models. We demonstrate that our computationally-intensive algorithm substantially improves both variable selection accuracy and parameter estimation in a numerical study. We have implemented RAEN in a user-friendly R package, freely available for public use. To demonstrate its practical utility, we apply RAEN to a cancer study, successfully identifying influential genes associated with mortality and disease progression in bladder cancer patients. The second component comprises a systematic evaluation of eight variable selection methods' stability under varying conditions. Through comprehensive numerical studies, we examine how factors such as sample sizes, number of predictors, correlation levels, and signal strength influence performance. Based on these findings, we provide evidence-based recommendations for implementing variable selection methods in real-world data analysis.

    Committee: Xiaofeng Wang (Advisor); John Barnard (Committee Member); Mark Schluchter (Committee Member); William Bush (Committee Chair) Subjects: Bioinformatics; Biostatistics; Genetics; Statistics
  • 2. Winget, Aaron Bayesian Optimization of Rare Earth Element Lennard-Jones Force Field Parameters

    Master of Science (M.S.), University of Dayton, 2024, Materials Engineering

    Rare earth elements (REEs) are essential to many modern-day technological applications. Due to their difficult and environmentally harmful refining methods, many of these REEs are imported to the U.S. from various other countries. With countries like China dominating the market, the U.S. supply chain is at risk. A potential solution to this issue would involve the use of proteins to extract these REEs in an environmentally sustainable manner. Custom proteins would be designed to extract specific REEs from their mixed metal ores through computer simulations, namely molecular dynamics. Currently the design process is stymied by the lack of working force fields for REEs within many molecular dynamics programs. This work seeks to address this issue by creating custom force fields designed around replicating basic experimental properties the REE ions have with water, counterions, and REE binding proteins. This is done utilizing a Bayesian optimization algorithm which can efficiently and accurately choose new parameters to test and verify for a wide variety of systems.

    Committee: Kevin Hinkle (Advisor); Michael Elsass (Committee Member); Rajiv Berry (Committee Member) Subjects: Biochemistry; Bioinformatics; Materials Science; Molecular Biology; Molecular Chemistry; Molecular Physics
  • 3. Nwogu, Onyekachi Use of Antibody Structural Information in Disease Prediction Models Reveals Antigen Specific B Cell Receptor Sequences in Bulk Repertoire Data

    PhD, University of Cincinnati, 2024, Medicine: Biomedical Informatics

    Antibodies are secreted protein forms of B cell receptors (BCRs) that can detect, bind, and neutralize antigens. A person's BCR repertoire contains immune information about the antigens they have been exposed to. A substantial number of modern high-throughput sequencing technologies can be applied to the sequencing, monitoring, and characterization of antibodies, thereby improving our understanding of how antibodies respond to disease antigens and of the antibody compartment responsible for pathogen neutralization. With the vast amount of antibody sequence data created via high-throughput technologies and the advancement in computational methods, there is increasing interest in using machine learning to identify patterns within the BCR repertoire, aiming to leverage these insights for disease classification and predictive diagnostics. However, there exist complexities hindering the success of these goals, including the presence of multiple immune states per individual and the fact that the relationship between a BCR sequence and its antigen is hard to decipher. Convergent antibodies are highly similar antibodies elicited in multiple individuals in response to the same antigen. Convergent antibodies provide insight into the shared immunological responses and show great promise as diagnostic biomarkers. They have typically been identified using amino acid sequence similarity and used in machine learning models for HIV infection status prediction with high accuracy. However, antibodies with similar sequences can have low structural similarity, and because structure is linked to specificity, the sequence-similarity approach to identifying convergent antibodies has limitations. In this thesis, I extended the definition of convergent antibodies to use isotype and structural information and benchmarked their performance by their ability to predict disease status. Additionally, I obtained a reduced set of highly predictive convergent antibody groups and explored the feature (open full item for complete abstract)

    Committee: Krishna Roskin PhD (Committee Chair); Corey Watson Ph.D. (Committee Member); Sandra Andorf Ph.D. (Committee Member); Jaroslaw Meller Ph.D. (Committee Member) Subjects: Bioinformatics
  • 4. Tursi, Amanda Application and Development of Computational Tools for Cytometry

    PhD, University of Cincinnati, 2024, Medicine: Biomedical Informatics

    The advent of single-cell profiling allows for large-scale cellular characterization at a detailed level that was unthinkable only two decades ago. Various omics technologies have flourished and researchers use them in conjunction with a variety of biomedical fields. Notably, the integration of omics approaches with immunology research has been beneficial in uncovering the full complexity of the immune system. Proteomic profiling of cells has proven particularly useful in characterizing immune cell types. While different instruments and methods exist for protein detection on or within cells, a particularly popular tool in enabling single-cell profiling is flow cytometry. Although conventional flow cytometers have been used since well before the turn of the 21st century, offshoot technologies such as spectral flow cytometry and mass cytometry were developed within the last two decades. Advancements in cytometers have increased the number of measurable parameters, improved speed, and reduced costs. Consequently, cytometry is performed in an increasingly high-throughput and high-dimensional manner. This results in more complex data that require novel computational approaches to aid in processing and interpretation. The research presented here integrates programmatic approaches to advance immunological research through cell profiling via flow and mass cytometry. The viability of a computational workflow is exhibited through its usage on two separate research initiatives. One project aimed to elucidate the relationship between food allergy and vitamin D levels in a cohort of infants. The cytometry data and corresponding demographic information were also described and publicly shared to promote data reuse. The second study described here examined PFAS levels in adults and their impact on the immune system. Each study used computational methods combined with immunological knowle (open full item for complete abstract)

    Committee: Sandra Andorf Ph.D. (Committee Chair); Tamara Tilburgs Ph.D. (Committee Member); Yan Xu Ph.D. (Committee Member); Krishna Roskin PhD (Committee Member) Subjects: Bioinformatics
  • 5. Meng, Guanqun STATISTICAL CONSIDERATIONS IN CELL TYPE DECONVOLUTION AND CELL-TYPE-SPECIFIC DIFFERENTIAL EXPRESSION ANALYSIS

    Doctor of Philosophy, Case Western Reserve University, 2024, Epidemiology and Biostatistics

    Interpreting sequencing data precisely is often the primary task in genomic research, aiming to uncover gene expression alterations associated with various phenotypes. Biopsy or tissue samples collected in clinical and research settings are typically a mosaic of at least several pure cell types. The observed changes in gene expression could be caused by variations in cell type compositions or differentially expressed (DE) genes within specific cell types. Therefore, cellular deconvolution is a critical step before the cell-type-specific Differentially Expressed (csDE) gene study. Many statistical approaches have been proposed for csDE studies. However, a systematic review that examines the assumptions underlying these models and how these assumptions influence their performance under different scenarios has not yet been conducted. Additionally, there is a lack of statistical tools to assess the power of csDE studies. Furthermore, current deconvolution methods largely depend on the assumption that all subjects share an identical population-level reference panel, which ignores inter-subject heterogeneity. This may compromise the validity of results, especially in studies that involve repetitive and longitudinal measurements. Moreover, while machine learning and deep learning-based deconvolution methods have been extensively developed for bulk transcriptomic data such as RNA-seq and microarrays, their application to imaging data, such as Immunohistochemistry (IHC), remains unexplored. We first benchmarked a few popular statistical models for detecting csDE genes between different phenotypes of interest. Based on our comprehensive and flexible data simulation pipelines, we developed a power evaluation toolbox, cypress, to guide researchers in designing experiments for csDE studies. cypress can conduct extensive simulations using existing or provided parameters, model biological/technical variations, and provide thorough assessments by multiple metrics. Additio (open full item for complete abstract)

    Committee: Hao Feng (Advisor); Fredrick R. Schumacher (Committee Chair); Qian Li (Committee Member); Jenný Brynjarsdóttir (Committee Member); Lijun Zhang (Committee Member) Subjects: Bioinformatics; Biostatistics; Genetics; Public Health; Statistics
  • 6. Pan, Yiheng Leveraging Real-world Patient Data for Health Outcome Research

    Doctor of Philosophy, Case Western Reserve University, 2025, EECS - Computer and Information Sciences

    This dissertation explores the potential of real-world data (RWD) to advance health outcomes research, focusing on disease network analysis and treatment evaluation with diverse datasets such as FAERS, UK Biobank, TriNetX, and Explorys. In the domain of disease networks, association rule mining identified hypothyroidism, hyperthyroidism, and type 2 diabetes as novel comorbidities of opioid use disorder (OUD), validated with electronic health records. Causal network analysis highlighted the impact of unobserved confounding on complex disease interactions, emphasizing the need for advanced approaches and for integrating other rich health data. In treatment evaluation, findings indicate moderate associations between gabapentin, pregabalin, and cardiovascular risks in diabetic neuropathy and fibromyalgia patients. Ketamine's impact on reducing suicidal ideation in major depressive disorder (MDD) patients was also demonstrated. These findings highlight the unique value of RWD in uncovering treatment outcomes often unattainable in controlled clinical settings. The findings are expected to have implications for both artificial intelligence (AI) development and public health. By advancing data-driven methods to manage complex, heterogeneous datasets, this research emphasizes the need to refine AI models designed for real-world healthcare scenarios. In public health, the insights gained from RWD analyses can inform policy decisions and promote personalized medicine.

    Committee: Rong Xu (Advisor) Subjects: Bioinformatics; Computer Science
  • 7. Hoskins, Emily Leveraging multi-omics and big data to detect and describe rare genomic alterations in cancer that can potentially be targeted with precision therapies

    Doctor of Philosophy, The Ohio State University, 2024, Biomedical Sciences

    Cancer is a complex disease that arises from acquired mutations in normal cells, resulting in uncontrolled cell growth. The body harbors defensive mechanisms to prevent cancer cells from proliferating by repairing mistakes in DNA replication and eradicating abnormal pre-cancerous cells. However, cancer cells can acquire mutations that allow them to surpass this defensive barrier and continue to develop into a malignant disease. Since cancer develops from diverse alterations, the tumor biology of each patient is unique, creating a need for customizable treatments. Fortunately, genomic sequencing has enabled researchers and clinicians to identify and target the genomic alterations driving individual patients' cancers. Targeted therapies, including immunotherapy and tyrosine kinase inhibitors, have significantly improved the overall survival and quality of life of cancer patients. Established biomarkers associated with good response to a specific treatment help guide treatment. For example, established biomarkers for immune checkpoint inhibitor therapy, a type of immunotherapy, include tumor mutational burden, microsatellite instability, and immunohistochemistry of programmed cell death ligand 1 (PD-L1). Many gene fusions and short variants involving kinase genes, including ALK, ROS1, FGFR1, FGFR2, FGFR3, RET, NTRK1, NTRK2, and NTRK3, can be targeted with clinically-approved tyrosine kinase inhibitors (TKIs). However, not all patients are eligible for targeted therapy. In this work, we strive to expand this eligibility by identifying and describing oncogenic alterations that may be clinically targetable, with a focus on a specific type of genomic alteration. Structural variations, which are rare chromosomal rearrangements that can lead to carcinogenesis, include copy number variations, large deletions, tandem duplications, large insertions, and translocations, otherwise known as gene fusions. Here, leveraging large data sources, we identify and describe rare oncogeni (open full item for complete abstract)

    Committee: Sameek Roychowdhury (Advisor); Daniel Stover (Committee Member); Lianbo Yu (Committee Member); Robert Baiocchi (Committee Member); Amanda Toland (Committee Member) Subjects: Bioinformatics; Biology; Genetics; Medicine
  • 8. Penaloza, Jacqueline Exploring Long non-coding RNAs in Congenital Heart Disease Etiology through Multiomic Computational Techniques

    Doctor of Philosophy, The Ohio State University, 2024, Biomedical Sciences

    This dissertation explores the role of long non-coding RNAs in congenital heart disease (CHD), the most common birth defect worldwide, where many genetic contributors remain unknown. We investigate lncRNAs in two key areas: copy number variants (CNVs) and single nucleotide variants (SNVs), aiming to identify novel lncRNA candidates and develop new tools for predicting their pathogenicity. While CNVs are known to contribute to CHD, the involvement of lncRNAs within these regions has been largely unexplored. Additionally, no existing computational tools specifically assess the pathogenic potential of heart-specific lncRNA variants, presenting a critical gap in CHD research. The first study focuses on identifying lncRNAs within CNVs that are linked to CHD. By combining CNV data with transcriptomic information from human heart development, we discover lncRNA candidates that may play significant regulatory roles in heart formation. This study provides a reproducible platform for investigating lncRNAs in the context of CNVs, advancing our understanding of their contributions to CHD. The second study introduces HeartiLNC, a novel machine learning-based score designed to predict the pathogenicity of SNVs in lncRNAs. This tool integrates heart-specific lncRNA expression profiles, population frequency data, and RNA secondary structure analysis. HeartiLNC offers a unique approach to identifying potentially deleterious variants that may contribute to CHD. This method represents a significant advancement in computational genomics, offering new insights into the genetic regulation of heart development. Together, these studies contribute to the understanding of lncRNAs in CHD, providing new tools for genetic analysis. This work highlights the importance of investigating both CNVs and SNVs in the non-coding genome to uncover the complex genetic architecture underlying CHD. We hope that one day these findings will lead to improvement in CHD diagnosis (open full item for complete abstract)

    Committee: Peter White (Advisor); Kim McBride (Advisor); Michelle Wedemeyer (Committee Member); Ralf Bundschuh (Committee Member); Dawn Chandler (Committee Member) Subjects: Bioinformatics; Biomedical Research
  • 9. Habegger, Alexander Quantum Computing in Protein Folding: Integrating Lattice Models and Energy Functions

    Master of Science in Computer Science, Miami University, 2024, Computer Science and Software Engineering

    Protein folding is an NP-Hard problem in computational biology due to the three-dimensional nature of proteins and the vast conformational space. Quantum computing shows promise in addressing this challenge by using unique quantum algorithms to explore protein folding landscapes more efficiently. Key contributions of this thesis include: the Relative Normalized Movement Score, a novel metric for evaluating protein structure fidelity in a distance-independent manner, novel energy models, such as the Hydrophobic-Polar-Acidic-Basic (HPAB) model, and the development and application of CA-2-HCOMB, a Protein Chain Lattice Fitting (PCLF) program, which simplifies protein structures by fitting alpha-carbon traces into discrete lattice models to reduce computational complexity while maintaining structural fidelity. Building upon previous research, we formulated a novel energy encoding for both the Triangular Prismatic Honeycomb (HCOMB-8) lattice model and the Tetrahedral-Octahedral Honeycomb (HCOMB-12) lattice model, also known as the face-centered cubic (FCC) lattice. These lattice models enable more precise protein structure modeling and can be reduced to a Quadratic Unconstrained Binary Optimization (QUBO) problem. To demonstrate the proof of concept, we folded a small peptide sequence using the HCOMB-8 and HCOMB-12 models with the LeapHybrid Algorithm on D-Wave's quantum computer.

    Committee: Khodakhast Bibak (Advisor); Chun Liang (Committee Member); James Kiper (Committee Member) Subjects: Bioinformatics; Biology; Computer Science
  • 10. Burkey, Carren Biocontrol of Pythium Pathogens in Hydroponic Greenhouse Systems: "Water, Just Perfect for Water Molds"

    Doctor of Philosophy (Ph.D.), Bowling Green State University, 2024, Biological Sciences

    Due to drastic climatic changes, the fresh market production of vegetables like spinach, lettuce, and arugula is shifting to hydroponic greenhouse operations. In hydroponic systems, Pythium species are problematic plant pathogens and can be introduced on airborne dust particles from neighboring farm fields. They cause root rot that results in stunting or yellowing of leaves. Hydroponically farmed vegetables are mostly eaten raw, so chemical control is not an option for controlling Pythium infections. In this study, I sought to investigate microbial biocontrol as a safer way to manage this challenge. To identify bacterial antagonists of Pythium, I surveyed a collection of 192 pseudomonads from a Lake Erie diatom bloom, also known to contain oomycetes, and 96 bacterial strains from soils around Wood County. Using a high throughput competitive plate assay, I have identified and sequenced nine strains of Pseudomonas fluorescens that exhibit contact-dependent killing of Pythium dissotocum (A1, A2, SP3, SP2), P. oopapillum, P. ultimum, P. heterothalicum, Saprolegnia parasitica, and several other yet to be identified Pythium isolates from commercial greenhouses around the US [TBL (isolate from butter lettuce), TLC (isolate from leaf lettuce), TA (isolate from arugula), E1 and CAL] at a lowest concentration of 4000 CFUs/ml. To address the utility of LE6_D7 as a biopesticide in a complex hydroponic mixture, 50 ml of bacterial (LE6_D7) culture (OD600 2.5) was added to a 5 L tub of contaminated nutrient solution from our experimental hydroponic system and incubated for 24 hours. Aliquots of the experimental nutrient solution treated with LE6_D7 post 24 hours were filtered and the filters were grown overnight on antibiotic V8 agar plates. LE6_D7 kills over 90% of the Pythium propagules in a complex hydroponic solution containing algae, other bacteria, and Pythium. Bioinformatic analysis of the P. fluorescens (LE6_D7) sequence that was used in mutation experiments, indica (open full item for complete abstract)

    Committee: Paul Morris Ph.D. (Committee Chair); Julia Halo Ph.D. (Committee Member); Vipaporn Phuntumart Ph.D. (Committee Member); Christopher Ward Ph.D. (Committee Member); Deborah Wooldridge Ph.D. (Other) Subjects: Bioinformatics; Biology; Microbiology; Molecular Biology
  • 11. Luu, Hoang What Will Our Forests Look Like in the Future? Modeling Regeneration Dynamics and Their Effects on Species Composition and Management Practices Under Climate Change

    Doctor of Philosophy (PhD), Ohio University, 2024, Plant Biology (Arts and Sciences)

    This dissertation enhances a forest gap model (ForClim) by incorporating seed production and seedling establishment processes, addressing a critical gap in understanding forest regeneration under climate change. The regeneration of forests in the Pacific Northwest (PNW) is a key driver of biodiversity, shaping species composition and ecosystem structure, and climate change is expected to significantly alter these processes, leading to shifts in both biodiversity and timber productivity. Simulations in this study revealed that seedling survival plays a more critical role than seed production in determining future species composition, particularly as climate variability increases. Resilient species like Pseudotsuga menziesii and Pinus ponderosa may sustain or increase their dominance, while species such as Abies grandis and Tsuga mertensiana face declines due to reduced seedling survival. Additionally, current forest management practices may need adjustment, with "no management" maximizing harvest volume for Coastal Douglas fir, while Mountain Douglas fir may experience reduced yields under future extreme climate scenarios. These findings highlight the importance of integrating regeneration processes into forest models to predict forest biodiversity and timber industry outcomes.

    Committee: Rebecca Snell (Advisor) Subjects: Applied Mathematics; Bioinformatics; Biology; Biostatistics; Ecology; Environmental Management; Environmental Science; Environmental Studies; Natural Resource Management; Plant Biology
  • 12. Wang, Chao The dysregulation of repetitive elements in human cancers and their role in circular RNA formation

    Doctor of Philosophy, Miami University, 2024, Cell, Molecular and Structural Biology (CMSB)

    The dissertation is structured into five chapters. Chapter 1: I provided an overarching introduction to the dissertation, including the dysregulation of repetitive elements (REs) in human cancer genomes and the importance of repeat-derived reverse complementary matches (RCMs) in forming a new type of RNA: circular RNA. Chapter 2: I investigated the dysregulation of transposable elements (TEs) in osteosarcoma (OS) cancer patients by integrative analysis of RNA-seq, whole-genome sequence (WGS), and methylation data. I found that TEs, including LINE-1, Alu, SVA, and HERV-K, are significantly up-regulated in OS tumors at the subfamily level. By filtering polymorphic TE insertions, I discovered that most OS patient-specific TE insertions (3175 out of 3326) are germline insertions associated with genes critical for cancer development. In addition to 68 TE-affected cancer genes, I found recurrent germline TE insertions in 72 non-cancer genes with high frequencies among patients. I also found reduced LINE-1 (young) and Alu methylation levels in OS tumor samples. Finally, with TE activities in OS tumors, I showed that higher TE insertions are associated with a longer event-free survival time. Chapter 3: I determined the differentially expressed REs at locus-specific levels stratified by their genomic context (i.e., genic or intergenic REs) among 12 common cancer types. I found uniquely dysregulated genic REs associated with distinct biological functions and intergenic REs containing important information to cluster different sample types. In addition, I found that genes associated with recurrently up-regulated REs are involved in the cell cycle process, whereas the extracellular matrix is associated with recurrently down-regulated REs. Furthermore, 4 out of 5 REs consistently down-regulated across 12 cancer types are located in the same intronic region of a tumor suppressor gene: TMEM252. TMEM252 is down-regulated in 10 out of 12 cancer types. 
Finally, with the DNA met (open full item for complete abstract)

    Committee: Chun Liang (Advisor); Tereza Jezkova (Committee Chair); Haifei Shi (Committee Member); Michael J. O'Connell (Committee Member); Michael Robinson (Committee Member) Subjects: Bioinformatics
  • 13. Olanrewaju, Gbolaga Integrated Omics Investigation of the Gravitropic Signaling Pathway in Arabidopsis thaliana: Insights From Spaceflight and Ground-Based Experiments

    Doctor of Philosophy (PhD), Ohio University, 2024, Molecular and Cellular Biology (Arts and Sciences)

    Gravity is a fundamental driving force of plant evolution, profoundly influencing numerous developmental and growth processes in plants. Gravity's most evident impact is the provision of directional cues to germinating seeds, guiding the roots downward and shoots upward. Known as gravitropism, this directional response to gravity is crucial to plants' overall health and productivity. Although biochemical and physiological studies have identified key ionic, chemical, and genetic factors involved in gravitropic signaling, the coordination of these actors remains poorly understood. Recent advances in omics technologies and the emergence of ways to isolate the effects of gravity on plants such as spaceflight experiments aboard the International Space Station (ISS), ground-based simulated gravities using clinostat and random positioning machines, and simple reorientation experiments have provided opportunities to investigate the molecular intricacies of this signaling cascade. Hence, this dissertation utilized both transcriptomics and proteomics to investigate gravitropic signaling in Arabidopsis plants during spaceflight in the Biological Research In Canister – Light Emitting Diode (BRIC LED) hardware. The results revealed key adaptive responses to the spaceflight environment, including destabilization and rearrangement of cell wall components, increased metabolic energy demands, and hypersensitivity of the photosystem. These adaptations were accompanied by a lack of direct correlation between transcriptomics and proteomics datasets, prompting further analysis using statistical and machine learning models. It was found that comparisons at the metabolic pathways level provided more comprehensive insights than simple gene-to-protein correlations. In addition, a meta-analysis of four existing plant proteomics datasets from spaceflight experiments aboard the ISS was conducted to assess variability. 
Factors such as spaceflight hardware, seedling age, li (open full item for complete abstract)

    Committee: Sarah Wyatt (Advisor); Michael Held (Committee Member); Erin Murphy (Committee Member); Allan Showalter (Committee Member) Subjects: Bioinformatics; Cellular Biology; Molecular Biology; Plant Biology
  • 14. Xu, Wanying Deciphering Regulatory Circuits in Mammalian Cells

    Doctor of Philosophy, Case Western Reserve University, 2025, Genetics

    Gene regulation is an essential process, spanning transcription, post-transcriptional processing, and translation, that operates continuously within each individual cell. Different cell types show distinctive gene expression patterns. Dysregulation of critical genes can cause various human diseases or defects in human development. Understanding the mechanisms behind gene regulation has therefore become a crucial open question. Genetic variants in both coding and non-coding regions have been shown to be associated with gene expression, leading to a variety of downstream effects. For example, genome-wide association studies (GWAS) have identified over 70,000 genetic variants associated with human diseases. Moreover, changes in histone modifications, such as enhancer and promoter signals, are also likely to disrupt gene expression. As a result, it is difficult to identify a common mechanism that explains how a given gene is regulated across different tissues and cell types. Recently, 3D genome interactions, an additional layer of information within nuclei that includes compartments, topologically associating domains (TADs), and chromatin loops mediated mainly by the architectural proteins CTCF and cohesin, have been shown to be essential to gene regulation. Therefore, we first explored how STAG2 truncation mutations influence glioblastoma pathology, since STAG2, a subunit of the cohesin complex, is the most commonly mutated protein in a wide range of cancers. Our results suggested that STAG2 has minor effects on 3D genome interactions and that the downstream targets of STAG2 mutation vary from cell line to cell line. We observed that polycomb signaling is affected by STAG2 mutation. Hence, we hypothesized that glioblastoma may be associated with STAG2-mediated polycomb signaling changes. Beyond cohesin, we explored how the insulator protein CTCF governs gene regulation within mouse embryonic stem cells (mESCs). In this case, we annotated all functional insul (open full item for complete abstract)

    Committee: Fulai Jin (Advisor); Ming Hu (Committee Member); Zhenghe Wang (Committee Member); Thomas LaFramboise (Committee Chair) Subjects: Bioinformatics; Biomedical Research
  • 15. Fries, Brian Mass Spectrometric Investigations of Phosphoproteins in Cell Culture and Primary Colon Cancer Samples

    Doctor of Philosophy, The Ohio State University, 2024, Chemistry

    Colon cancer is projected to become the third leading cause of death amongst Americans by 2024. Early onset colorectal cancer (CRC) is also increasing, with predictions showing CRC to be the leading cause of cancer mortality in people between the ages of 20 and 49 in the United States (US) by 2030. Increasing our understanding of this disease will allow quicker, less invasive, and more accurate diagnosis and treatment options. One such tool to deepen our chemical understanding of CRC is mass spectrometry. Mass spectrometry (MS) has allowed us to broadly survey the proteins (proteomics), lipids (lipidomics), and metabolites (metabolomics) within a biological sample. All of these -omics disciplines using MS are able to identify a peptide and/or small molecule by matching various tandem MS spectra to either a previously collected or an in silico generated library of tandem MS spectra. The normalized intensities of these molecules are compared statistically with those of control samples to determine whether a particular analyte is differentially expressed. A literature review of these topics is described in Chapter 1 of this thesis. Starting in Chapter 2, this thesis describes how MS-based omics has been used to deepen our understanding of the biology of a three-dimensional CRC cell culture model when exposed to various chemotherapeutics. Chapters 2 and 3 describe the distinct molecular differences between two different CRC cell lines inhibited with two generations of Fatty Acid Synthase (FAS) inhibitors. FAS is the enzyme responsible for synthesizing the 16-carbon saturated fatty acid (FA) palmitate, supplying the cellular FA pool with palmitate to be used to synthesize more complex lipids. FAS is also observed to be upregulated in various cancers, increasing the interest in drugging this target for therapy. It was observed that the first generation inhibitor caused drastic morphological changes to CRC spheroids. Using untargeted (open full item for complete abstract)

    Committee: Amanda Hummon (Advisor); Abraham Badu-Tawiah (Committee Member); Vicki Wysocki (Committee Member) Subjects: Analytical Chemistry; Bioinformatics; Chemistry
  • 16. Yan, Ming Elucidating the Under-explored Genomic Diversity and Metabolic Potential of the Rumen Microbiome through Multi-Omics Approaches

    Doctor of Philosophy, The Ohio State University, 2024, Animal Sciences

    The rumen hosts a diverse array of prokaryotic (bacteria and archaea) and eukaryotic (fungi and protozoa) microbes. Collectively, they hydrolyze complex plant cell wall materials into simple sugars, which are further fermented into volatile fatty acids (VFA) that supply a substantial share of the host's energy needs. By incorporating inorganic ammonia generated from feed protein and urea, rumen microbes also provide a significant portion of the host's protein requirements. As regulators of the microbial ecosystem, rumen viruses (bacteriophages and eukaryotic viruses) also influence rumen fermentation and microbial protein synthesis. They achieve this by directly lysing microbes, thereby modulating microbial composition, or by modifying the metabolism of infected bacterial cells. Additionally, they drive co-evolution between microbes and viruses, acting as vectors for horizontal gene transfer or through dynamic defense and counterdefense interactions with microbes. The anaerobic microbial cultivation techniques developed by Robert Hungate enable rumen microbiologists to explore the diverse spectrum of rumen microbial physiology and metabolism. However, despite continuous efforts in anaerobic cultivation, the culturable rumen microbes (including viruses) represent only a limited fraction of the overall diversity. Moreover, microbial cultures, whether monocultures or mixed cultures, fail to fully replicate the intricate microbial interactions observed in vivo, such as cross-feeding and predation. Fortunately, multi-omics technologies complement traditional culture-dependent analyses, enabling us to explore microbial ecology by uncovering the genomes (via genome-resolved metagenomics) and metabolism (via metatranscriptomics, metaproteomics and enzymatic activities) of the unculturable majority. 
Utilizing advancements in multi-omics and bioinformatics, this research aims to bridge the gap in rumen microbial genomics within the context of microbial ecology and rumen fermentation (open full item for complete abstract)

    Committee: Zhongtang Yu Dr. (Advisor); Jeffrey Firkins Dr. (Committee Member); Tansol Park Dr. (Committee Member); Chanhee Lee Dr. (Committee Member); Alejandro Relling Dr. (Committee Member) Subjects: Animal Sciences; Bioinformatics; Microbiology
  • 17. Parsons, Danielle Big data in biodiversity science: using specimen-based biodiversity data to elucidate hidden diversity

    Doctor of Philosophy, The Ohio State University, 2024, Evolution, Ecology and Organismal Biology

    While over 1 million species have been formally described, this number likely represents just 1-10% of the actual number of existing species. Because species are the basic units of biological classification, the challenges posed by this discrepancy extend beyond the field of systematic biology, with implications for conservation, biodiversity, evolutionary theory, and the prevention of zoonotic disease. Our ability to effectively address these challenges is hindered by the presence of cryptic species, defined as two or more distinct species that have been misclassified as a single species due to physical similarity. Fortunately, while the common notion of species discovery usually involves expeditions to remote regions, many undescribed cryptic species are already present in natural history collections. These collections provide critical data for the biological sciences by housing not only physical specimens, but also a wide variety of associated metadata (e.g., locality, life history traits, genetic sequences). My dissertation work will facilitate species discovery by providing a framework through which this information can be utilized to answer longstanding questions about cryptic diversity. In Chapter 2, I use publicly available genetic data to estimate levels of cryptic diversity in mammals (class: Mammalia) and develop a machine learning model using trait data to identify characteristics that make certain mammals more likely to harbor cryptic species. I find that small-bodied taxa with large, climatically variable ranges are most likely to contain cryptic diversity. In Chapter 3, I apply this framework to salamanders (clade: Caudata), a group which differs from mammals in several key aspects, including species richness and sampling intensity. 
While I was not able to pinpoint a specific predictor variable as the most important for predicting undescribed diversity, results of the machine learning model suggest that cryptic diversity in salamanders is likel (open full item for complete abstract)

    Committee: Bryan Carstens (Advisor); Ryan Norris (Committee Member); Andreas Chavez (Committee Member) Subjects: Bioinformatics; Biology; Genetics; Molecular Biology; Museum Studies; Organismal Biology; Zoology
  • 18. SARPONG, DAVID Characterizing a Toxin-Antitoxin Locus in Shigella flexneri

    Doctor of Philosophy (PhD), Ohio University, 2024, Molecular and Cellular Biology (Arts and Sciences)

    Members of the genus Shigella are Gram-negative, non-spore-forming bacilli that cause shigellosis, a severe bacillary dysentery in humans, mostly affecting children under the age of five, immunocompromised individuals, and people living in developing countries. With an estimated 27,000 annual cases of antibiotic-resistant Shigella infections in the US alone, and no success in vaccine development, Shigella infections pose a serious health concern, creating the need for more targeted therapeutics. Key to the development of novel therapeutics against Shigella is a comprehensive understanding of the diverse molecular strategies that underlie the pathogen's physiology. An emerging phenomenon in bacterial gene expression is that of Toxin-antitoxin (TA) systems. TA systems are two-component genetic loci in bacteria that encode two genes: a toxin gene whose expression is lethal to the organism producing it, and an antitoxin gene whose product protects the organism from the unwanted expression of the toxin. There are currently eight types of TA systems studied in bacteria, differing based on whether the toxin or antitoxin is a protein or an sRNA, as well as on the mechanism of action of either the toxin or antitoxin. This dissertation characterizes a Toxin-Antitoxin (TA) system in Shigella flexneri, focusing on the ryf locus, which includes the toxin-encoding ryfA gene and two small RNAs (sRNAs), ryfB and ryfB1, with distinct regulatory functions. The 305-nucleotide toxin RNA, ryfA, inhibits bacterial growth by inducing membrane lysis and ATP depletion. ryfB, approximately 100 nucleotides in length, neutralizes ryfA's toxicity via nucleic acid complementarity, without reducing its transcript abundance. ryfB1, although 77% identical to ryfB, does not function as an antitoxin but modulates global gene expression, mainly affecting metabolism. 
Initial in silico analyses identified key genetic elements within the ryf locus, including promoters, open reading frames, and Shine-Dalgarno sequences, as well as potential tar (open full item for complete abstract)

    Committee: Erin Murphy (Advisor); Tingyue Gu (Committee Chair); Peter Coschigano (Committee Member); Nathan Weyand (Committee Member) Subjects: Bioinformatics; Biology; Biomedical Research; Microbiology; Molecular Biology
  • 19. Powell, Joseph Integration of Digital Health Resources for Deep Phenotypic Remote Monitoring of Patient Health

    Doctor of Philosophy, Case Western Reserve University, 2024, Systems Biology and Bioinformatics

    The rapid advancement of personal wearable devices has enabled novel applications of deep phenotyping for the characterization of disease. Advancing deep phenotyping and analysis methods for personal wearable devices is crucial to the advancement of personalized remote patient monitoring. We developed an end-to-end digital health infrastructure designed for fast, secure, and effective patient recruitment, data collection, and analysis reporting. We analyzed the efficacy of patient recruitment through our end-to-end patient interface and found that traditional recruitment methods, such as through clinical sources and university sources, resulted in more consents ([0.015, 0.030]; p << 0.001) and more active patients initially (χ² = 23.65; p < 0.005). Additionally, we noted that online recruitment through Facebook advertising and Google advertising produced a more ethnically diverse population compared to regional clinical recruitment (χ² = 231.47; p < 0.001). We investigated the use of the previously reported NightSignal algorithm, originally developed for SARS-CoV-2 detection, for the detection of abnormal resting heart rate observations in cardiothoracic surgical patients, collected through our infrastructure. We found that the NightSignal algorithm had a sensitivity of 81%, a specificity of 75%, a negative predictive value of 97%, and a positive predictive value of 28% for the detection of postoperative events. When compared to patients who did not experience a postoperative event, patients who did experience a postoperative event had a significantly higher proportion of red alerts issued by the NightSignal algorithm during the first 30 days after surgery (0.325 vs. 0.063; p < 0.05). Finally, we investigated the potential for latent subgroup identification using physiological parameters generated from personal wearable devices. We found latent subgroups at 30 days, 60 days, and 90 days post-operatively. 
Each latent group was we (open full item for complete abstract)

    Committee: Mark Cameron (Committee Chair); Jing Li (Committee Member); Wai Hong Wilson Tang (Committee Member); Xiao Li (Advisor) Subjects: Bioinformatics; Biomedical Research
  • 20. Mercer, Heather The Role of ADAR editing in Parkinson's Disease

    PHD, Kent State University, 2024, College of Arts and Sciences / Department of Biological Sciences

    Parkinson's Disease (PD) is a multifactorial disease with heterogeneous phenotypes that vary across individuals, as well as by age and sex. Therefore, it is likely that multiple interacting factors, such as environmental influences and aging, as well as genetic factors, including dynamic RNA editing via ADARs (Adenosine Deaminases Acting on RNA), may play a role in PD pathology. Here we explored changes in ADAR editing in PD in two datasets: one consisting of skeletal muscle transcriptomes from a small cohort of male PD patients and controls, including those who engaged in a rehabilitative exercise training program, and a second dataset of 317 transcriptomes of healthy controls, PD patients, and prodromal patients aged 65 years or older, from the Parkinson's Progression Markers Initiative dataset. We observed differences in ADAR expression, number of putative ADAR edits, editing index, and the number of high and moderate impact edits between control groups and diseased samples, between sexes, and between PD samples pre- and post-exercise, particularly when ADAR editing is associated with nonsense-mediated decay (NMD). Likewise, differentially expressed genes between comparison groups were linked to NMD-related pathways. NMD is an important process for detecting deleterious nonsense sequences in mRNA transcripts and eliminating them from the cell. Thus, NMD regulation serves an important role in neurodevelopment, neural differentiation, and neural maturation. RNA misprocessing, which includes dysregulation of NMD, is known to play an important role in neurodegenerative diseases such as amyotrophic lateral sclerosis (ALS) and frontotemporal dementia. Our results suggest that NMD may also be an important factor in PD physiology.

    Committee: Helen Piontkivska Ph.D. (Advisor) Subjects: Bioinformatics; Biology; Genetics