Search Results

(Total results 76)

  • 1. Keymanesh, Moniba Adaptive Summarization for Low-resource Domains and Algorithmic Fairness

    Doctor of Philosophy, The Ohio State University, 2022, Computer Science and Engineering

    The wealth of data available at a single click often adds to the information overload problem. Summarization is an intuitive way to address this problem by constructing a condensed equivalent of the available data. However, the content of interest and the desired format or length are user-dependent. Most existing summarization systems yield generic summaries disconnected from users' preferences and agnostic about the salience of information in the target domain. Moreover, neural summarization models require a large training corpus, which is not available in many domains. Motivated by these limitations, we focus on controllable summarization that allows users to control different aspects of the generated summaries. (i) To enable users to control the length of summaries, we propose a multi-level summarizer (MLS), a supervised approach to construct abstractive summaries at controllable lengths. Following an extract-then-compress paradigm, we develop the Pointer-Magnifier network, a length-aware encoder-decoder network that constructs length-constrained summaries by shortening or expanding a prototype summary inferred from the document. The key enabler of this network is an array of semantic kernels with clearly defined human-interpretable syntactic/semantic roles in constructing the summary given a desired length. We discuss this architecture in Chapter 2. (ii) We acknowledge that many recent advancements in summarization research, including sequence-to-sequence models, cannot be adopted in many domains due to the scarcity of training data for summarization. Legal contracts are considered a low-resource domain for automatic text summarization because the available training data is limited. On the other hand, unsupervised methods rely on structural features of documents, such as lexical repetition, to identify and extract important content. These heuristics showed poor empirical performance on a few low-resource domains. 
In (open full item for complete abstract)

    Committee: Srinivasan Parthasarathy Prof. (Advisor); Tanya Berger-Wolf Prof. (Committee Member); Micha Elsner Prof. (Committee Member); Meow Hui Goh Prof. (Other) Subjects: Artificial Intelligence; Computer Science; Information Systems; Linguistics
  • 2. Uduehi, Oseremen Computational Models for Transformation-Based Combinational Creativity

    Doctor of Philosophy (PhD), Ohio University, 2024, Electrical Engineering & Computer Science (Engineering and Technology)

    Computational Creativity is an interdisciplinary field dedicated to developing systems that generate innovative ideas or artifacts, capturing the essence of human creativity in a form that machines can emulate and enhance. The goal of this dissertation is to contribute to the advancement of this field by introducing computational models and methodologies that can automatically produce creative artifacts from data. The core approach utilized involved learning transformations from data distributions and applying them to new, often out-of-distribution data, leading to the emergence of creative outputs. Models for generating creative artifacts are presented and explored across various domains. In the realm of symbolic representations, binary bit sequences and text are examined. Bit sequences are transformed to generate surprising patterns, while for text, a model is introduced for crafting creative short narratives. For raw representations, images of simple geometric shapes are transformed to produce novel outputs. In this dissertation, I also introduce methods for the detection of creative artifacts. First, distribution-based measurement approaches that evaluate and quantify the level of creativity in text by measuring their degree of surprise are presented. Next, an expectation-realization model is introduced to detect creative usage of words, specifically metaphors. The model achieves this by analyzing the deviation of the realized meaning of words in context from the expected literal words for the same context. Through detailed experiments, I demonstrate the practical implementations of the developed methodologies across various domains and datasets, presenting empirical evidence of their effectiveness in modeling creativity.

    Committee: Razvan Bunescu (Advisor); Liu Jundong (Advisor); Chang Edmond (Committee Member); Zhewei Wang (Committee Member); Rida Benhaddou (Committee Member); David Juedes (Committee Member) Subjects: Computer Science; Electrical Engineering; Technology
  • 3. Sridhar, Sarikaa Towards Sustainable Knowledge Gap Identification with Tiny Machine Learning Techniques

    Master of Science, The Ohio State University, 2024, Computer Science and Engineering

    Identifying the lack of cognitive capabilities in artificially intelligent systems has been a growing field and a necessary step. Knowledge gaps (KG) are a lack of sufficient information, which may lead to poor cognitive capabilities. Knowledge gap identification can help predict where intelligent systems go wrong. This work proposes methods to identify knowledge gaps in Visual Question Answering (VQA) datasets. We created a model to automatically classify question and image pairs into different knowledge gap categories that can later be used to resolve shortcomings of VQA models. Additionally, artificially intelligent systems often require several days of training to learn complex features and provide the most accurate predictions. Testing or running inference with trained models also requires huge amounts of energy and emits massive amounts of CO2. Thus, this work also aims to train a classification model, the Knowledge Gap Identification (KGI) model, in resource-constrained environments using TinyML (Tiny Machine Learning) techniques proposed by previous research. The two main techniques implemented are quantization-aware scaling and sparse update. Finally, this work compares the original model with its tiny version (Sustainable KGI) using accuracy, processing time, energy consumed, and estimated carbon emissions as evaluation metrics.

    Committee: John Paparrizos (Committee Member); Srinivasan Parthasarathy (Advisor) Subjects: Computer Engineering; Computer Science; Sustainability
  • 4. Maneriker, Pranav Ravindra The Role of Structure in Building Adaptive Machine Learning

    Doctor of Philosophy, The Ohio State University, 2024, Computer Science and Engineering

    The success of neural networks and the advent of specialized hardware such as GPUs have led to larger models with increasingly large unstructured datasets in machine learning. Curating and assembling a large, high-quality dataset is a time-consuming process. Further, training models on these datasets requires expensive computing resources. Some of these issues are alleviated with the advent of paradigms such as self-supervised and transfer learning. However, when the data drift and change over time, models must be periodically retrained to keep up. Graph structures, both implicit and explicit, are ubiquitous in Natural Language Processing. Implicit structures can be derived from language morphology, syntax, and semantics and expressed using attributed tree graphs. External structures capture world knowledge and semantics using knowledge graphs and ontologies. Additionally, textual data may have associated metadata in external graphs, such as network structure for social media interactions. In this dissertation, we posit that an abundance of associated structural information needs to be utilized for scaling and adaptation. The prevalence of these structures behooves us to utilize them to improvise, adapt, and overcome the challenges posed by scaling and drifts in data. In our work, we focus on three broad directions for using these structures: augmenting existing text models with structure, exploring the role of structure in creating adversarial testing samples, and structure-enhanced monitoring of model performance over time. The first direction that we explore is the impact of incorporating structure into text representation learning pipelines. In our first contribution, we study how the implicit structure of text data (here, URLs) can be used to design domain-specific losses and adversarial attacks to build a state-of-the-art system for phishing URL detection. 
This work comprehensively analyzes transformer models on the phishing URL detection ta (open full item for complete abstract)

    Committee: Srinivasan Parthasarathy (Advisor); Andrew Perrault (Committee Member); Micha Elsner (Committee Member); Amy Sheneman (Committee Member) Subjects: Computer Science
  • 5. Smith, Michael IDENTIFYING TOXIC EVENTS IN TIME

    MS, Kent State University, 2024, College of Arts and Sciences / Department of Computer Science

    Online communities have long suffered from issues caused by a lack of accountability for participants exhibiting toxic behaviors. Difficulty with providing effective moderation, sufficiently dissuading would-be offenders, identifying problem users, and mitigating toxic activity in real time has led to an unwelcoming environment for users. It is difficult to police communication networks effectively and provide a safe environment when participants are anonymous and cannot be sufficiently identified as problematic. Our study employs temporal multivariate data mining, pattern analysis, and natural language processing techniques to examine organic conversations across a large collection of online gaming communities' messages. By analyzing instances of toxic behavior, arguments, and profane conversation, our objective is to identify the distinct features that characterize toxicity in digital environments. Our study analyzed conversational data extracted from four video-game-focused Discord communities. The dataset encompasses a rich collection of 685,432 public messages. Using the Perspective API, messages were classified against six metrics relating to toxicity. To elucidate the temporal dynamics and complex patterns of these interactions, we employed Temporal Multidimensional Scaling and utilized a Shannon Entropy Visualization method. Additionally, manual review was performed on a subset of 140,000 comments' worth of toxic events. We then leveraged BERTopic for cluster analysis to deduce related thematic concerns. For a nuanced representation of these themes, we customized the topic modeling using OpenAI's GPT-3.5 Turbo language model, enriching our understanding of the contextual underpinnings of toxicity in online gaming discourse. Our study found that toxic events occurred without warning and rapidly dissipated as the conversation went on. 
Toxicity is extremely rare relative to the general activity of the community and is largely contributed by eith (open full item for complete abstract)

    Committee: Ruoming Jin (Advisor) Subjects: Artificial Intelligence; Computer Science
  • 6. Agarwal, Ankita Data-Driven Strategies for Disease Management in Patients Admitted for Heart Failure

    Doctor of Philosophy (PhD), Wright State University, 2023, Computer Science and Engineering PhD

    Heart failure is a syndrome which affects a patient's quality of life adversely. It can be caused by different underlying conditions or abnormalities and involves both cardiovascular and non-cardiovascular comorbidities. Heart failure cannot be cured, but a patient's quality of life can be improved by effective treatment through medicines and surgery, and lifestyle management. As effective treatment of heart failure incurs cost for the patients and resource allocation for the hospitals, predicting length of stay of these patients during each hospitalization becomes important. Heart failure can be classified into two types: left-sided heart failure and right-sided heart failure. Left-sided heart failure can be further divided into two types: systolic heart failure or heart failure with reduced ejection fraction (HFrEF) and diastolic heart failure or heart failure with preserved ejection fraction (HFpEF). As right-sided heart failure develops as a result of left-sided heart failure, it is important to predict the two types of heart failure categorized based on their ejection volume to manage heart failure. Electronic Health Records (EHRs) of the patients contain information about the diagnostic codes, procedure reports, physiological vitals, medications administered, and discharge summary for each hospitalization. These EHRs can be leveraged to build predictive models to predict outcomes like length of stay and type of heart failure (HFrEF or HFpEF) in the patients. However, these predictive models can be demographically biased and so can lead to unfair decisions. Thus, it is necessary to mitigate these biases in the predictive models without impacting their performance on downstream tasks. In this regard, first I used diagnostic codes and procedure reports of heart failure patients during each hospitalization to identify their clinical phenotypes through a probabilistic framework, using Latent Dirichlet Allocation (LDA). 
I found 12 clinical phenotypes in the form of (open full item for complete abstract)

    Committee: Tanvi Banerjee Ph.D. (Committee Co-Chair); William L. Romine Ph.D. (Committee Co-Chair); Krishnaprasad Thirunarayan Ph.D. (Committee Member); Lingwei Chen Ph.D. (Committee Member); Mia Cajita Ph.D. (Committee Member) Subjects: Computer Engineering; Computer Science
  • 7. Sain, Joy Reliable Named Entity Recognition Using Incomplete Domain-Specific Dictionaries

    Doctor of Philosophy (PhD), Wright State University, 2023, Computer Science and Engineering PhD

    Information Extraction (IE) techniques are essential to gleaning valuable information about entities and their relationships from unstructured text and creating a structured representation of the text for downstream Natural Language Processing (NLP) tasks including question answering, text summarization, and knowledge graph construction. Supervised Machine Learning (ML) techniques have been widely used in IE. While the resulting extraction algorithms are very effective, they require a large amount of annotated data, which can be expensive to acquire and time-consuming to create. Additionally, creating high-quality gold-standard annotations can be challenging, particularly when dealing with new domains or languages that lack sufficient resources to facilitate annotations. This dissertation develops minimally-supervised approaches to extract Named Entities (NEs) from text, specifically addressing the challenges arising from using distantly-supervised techniques for NE extraction from the text in which domain-specific dictionaries are used to automatically match and assign labels to data, which can subsequently be used to train an ML model for the extraction task. A key challenge in learning an effective ML model for distant learning techniques is the incompleteness of the dictionaries being used which can result in incomplete, partial, or noisy annotations. In case of incomplete or missing annotations, training a sequence labeling model for NER may result in suboptimal learning. To address these challenges, in this dissertation, I propose novel approaches to improve dictionary coverage that utilize a state-of-the-art phrase extraction technique and domain-specific dictionary to extract phrases from unlabeled text data. Leveraging the lexical, syntactic, and contextual features of the entities present in the initial dictionaries, I propose headword and span-based classification approaches to categorize the extracted phrases into corresponding entity classes. 
Th (open full item for complete abstract)

    Committee: Michael Raymer Ph.D. (Advisor); Krishnaprasad Thirunarayan Ph.D. (Advisor); Tanvi Banerjee Ph.D. (Committee Member); Charese Smiley Ph.D. (Committee Member) Subjects: Artificial Intelligence; Computer Science
  • 8. Okpala, Izunna Perception Analysis: A Knowledge Discovery and Inference Generation Approach to Crisis Informatics

    PhD, University of Cincinnati, 2023, Education, Criminal Justice, and Human Services: Information Technology

    This study explores perception analysis via the lenses of machine learning and natural language processing. To reflect the efficacy of machines in analyzing crisis situations, the research incorporated the concepts of knowledge discovery and inference generation through three interconnected studies. The first study demonstrates the capability of an intelligent system to capture human perception toward COVID-19. This intelligent system features text preprocessing, identification of cues in a sentence structure, and the use of the TextBlob tool to gauge the perception of the identified cues. The second study tackled one of the limitations of the first research - the issue of negation and multiple negatives in natural language processing. To solve this problem, the study employed a negation disambiguation framework when dealing with negations and multiple negatives. The third study combines the benefits of the first two studies in gauging human perception by taking into account sentence structure, contextualization, text transformation, and, most importantly, people's views with respect to a causative agent (entity). The three studies show a transition from one solution to another, an indication that an automated system can achieve better accuracy in classifying crisis-related data and ultimately improve response techniques available to various stakeholders.

    Committee: Jess Kropczynski Ph.D. (Committee Chair); Chengcheng Li Ph.D. (Committee Member); Shane Halse Ph.D. (Committee Member); Kelly Cohen Ph.D. (Committee Member) Subjects: Information Technology
  • 9. AlSlaiman, Muhanned Effective Systems for Insider Threat Detection

    Doctor of Philosophy (PhD), Wright State University, 2023, Computer Science and Engineering PhD

    Insider threats to information security have become a burden for organizations. Understanding insider activities leads to an effective improvement in identifying insider attacks and limits their threats. This dissertation presents three systems to detect insider threats effectively. The aim is to reduce the false negative rate (FNR), provide better dataset use, and reduce dimensionality and zero padding effects. The systems developed utilize deep learning techniques and are evaluated using the CERT 4.2 dataset. The dataset is analyzed and reformed so that each row represents a variable length sample of user activities. Two data representations are implemented to model extracted features in gray encoding (GE) and kernel density estimator (KDE) with cumulative distribution function (CDF). Additionally, sentiment analysis and unique coding are assigned to each category of user activities so that the detection model can distinguish all activities, the correlation between activities, and the temporal characteristics of the activities. The first detection system is a Long Short-Term Memory (LSTM) network. It reduced the FNR, but its performance degraded as the dataset's size increased. The second detection system combines convolutional neural networks (CNN) and LSTM networks. Processing and modeling of the dataset created two problems that hindered the performance of these two detection systems: (1) dimensionality and (2) vanishing short rows due to zero padding. The last detection system aims to reduce the curse of dimensionality and the vanishing of short rows. Two neural models are utilized: an embedding layer and an autoencoder. The embedding layer removes padded zeros and produces dense embedded output. The autoencoder compresses the input data samples to a shorter length and feeds the processed data samples to the detection model. All detection systems presented a high performance in classifying users' activities and detecting insider threats. 
The first (open full item for complete abstract)

    Committee: Bin Wang Ph.D. (Advisor); Soon M. Chung Ph.D. (Committee Member); Meilin Liu Ph.D. (Committee Member); Zhiqiang Wu Ph.D. (Committee Member) Subjects: Artificial Intelligence; Computer Engineering; Computer Science; Engineering; Information Science; Information Technology
  • 10. Paudel, Prashish Improved Multimodal Data Acquisition and Synchronization through NLP Enabled Event Detection in Simulation-Based Medical Education

    Master of Science, University of Toledo, 2023, Engineering (Computer Science)

    Significant advancements have been made in the field of education due to the introduction of innovative technologies and methodologies. Notably, simulation-based learning has had a profound impact on various learning domains, including Healthcare, Aviation and Aerospace, Military, and Emergency Services, among others. The adoption of Simulation-Based Medical Education (SBME) in healthcare has proven effective for training and evaluating Healthcare Professionals (HCPs). Multimodal data from various levels such as the instructor, learner, and training environment is crucial for a comprehensive assessment of learners within SBME. Currently, these assessments are conducted using either paper-based scales or standard checklists. A platform that provides multimodal assessment capabilities at each of these levels is necessary. This research aims to enhance the data fidelity and availability of a novel multimodal assessment platform (PREPARE) that is used for learner assessment and performance monitoring during training and real-world events. Currently, the platform provides multimodal data acquisition; however, data collected at the instructor and training environment levels is not always synchronized with learner-level data. This research aims to address some of these limitations by incorporating Natural Language Processing (NLP). The goal is to detect key events during training (by processing audio data collected at the training environment level) and to synchronize instructor (observer-based) assessment with learner-level performance data. We also introduce a foundation for automated performance assessment, which is intended to measure learner performance and includes the derivation of objective performance measures such as time to diagnosis, time to treatment/intervention, etc. 
The NLP-based module added to this existing platform has the potential to revolutionize the assessment process in SBME, providing more accurate and timely feedback (open full item for complete abstract)

    Committee: Liang Cheng (Committee Chair); Devinder Kaur (Committee Member); Scott Pappada (Committee Co-Chair) Subjects: Computer Engineering; Computer Science
  • 11. Jiang, Nanjiang The Why and How of Label Variation in Natural Language Inference

    Doctor of Philosophy, The Ohio State University, 2023, Linguistics

    Given a pair of sentences, a premise and a hypothesis, the task of natural language inference (NLI) consists of identifying whether the hypothesis is true (Entailment), false (Contradiction), or neither (Neutral), assuming that the premise is true. NLI is arguably one of the most important tasks for natural language understanding. Datasets have been collected in which pairs of sentences are annotated by multiple annotators with one of the three labels. However, it has been shown that annotation disagreement, or human label variation (Plank, 2022), is prevalent and systematic for NLI: human annotators sometimes do not give the same label for the same pair of sentences (Pavlick and Kwiatkowski, 2019, i.a.). Label variation questions the widespread assumption in natural language processing that each item has a single ground truth label and casts doubt on the validity of measuring models' ability to produce such ground truth labels. In this dissertation, I investigate the question of why there is label variation in NLI and how to build models to capture it. First, I analyze the reasons for label variation from the perspective of linguists, by developing a taxonomy of reasons for label variation. I found that NLI label variation can arise for a wide range of reasons: some are due to uncertainty in the sentence meaning, while others are inherent to the NLI task definition. However, it is unclear how well the perspective of linguists reflects that of linguistically-uninformed annotators. Therefore, I collect annotators' explanations for the NLI labels they chose, creating the LiveNLI dataset containing ecologically valid explanations. I found that the annotators' reasons for label variation are similar to the taxonomy across the board, but some other reasons also emerged. Explanations also reveal that there exists within-label variation: annotators can choose the same label for different reasons. 
There is thus a wide range of variation that NLI models should capture. (open full item for complete abstract)

    Committee: Marie-Catherine de Marneffe (Advisor); Michael White (Committee Member); Chenhao Tan (Committee Member); Micha Elsner (Committee Member) Subjects: Computer Science; Linguistics
  • 12. Chang, Shuaichen Reliable Natural Language Interfaces to Heterogeneous Structured Data

    Doctor of Philosophy, The Ohio State University, 2023, Computer Science and Engineering

    A vast amount of human knowledge and information has been structured and stored in heterogeneous formats, such as tables, relational databases, visualization images, etc. The development of natural language interfaces (NLIs) to structured data makes it easy and efficient for people to retrieve desired information from massive structured data by asking natural language questions. Such systems are required to understand the information from the data modality and text modality (natural language questions). Despite the impressive performance of machine learning models on various datasets, a good in-dataset result does not necessarily indicate model reliability. In the context of this dissertation, reliability encompasses two key aspects: (1) robustness to data perturbations and (2) the ability to generalize across various domains. The lack of robustness and generalizability in NLI systems prevents them from being widely adopted in real-world applications. Two main obstacles prevent us from building a reliable system: (1) the limited benchmarks to evaluate the reliability of models, and (2) the mismatch between the text modality and data modality. In this dissertation, we present our work for improving the reliability of natural language interfaces to structured data. We focus on three common structured data types: tables, relational databases, and choropleth maps. To address the lack of evaluation data for model reliability, we curate evaluation data by resplitting existing data or applying perturbations. We make three attempts in this direction. First, we resplit a table text-to-SQL dataset based on the frequency of table schemas. We find that models fall short on unfamiliar table schemas, especially in a zero-shot setting which consists of schemas that models have never encountered before. We examine model generalizability that is covered by the existing evaluation setting. 
Second, we evaluate the robustness of relational database text-to-SQL m (open full item for complete abstract)

    Committee: Eric Fosler-Lussier (Advisor); Micha Elsner (Committee Member); Michael White (Committee Member) Subjects: Artificial Intelligence; Computer Engineering; Computer Science
  • 13. Bhandari, Nabin Speech-To-Model: A Framework for Creating Software Models Using Voice Commands

    Master of Science, Miami University, 2023, Computer Science and Software Engineering

    Traditionally, software modeling has relied on conventional input devices such as keyboards and mice. However, as new interaction methods become more popular, development environments must adapt to these evolving needs. Moreover, nontraditional interfaces offer the potential for improved accessibility. This thesis introduces an innovative framework for intelligent voice-driven software modeling. The framework leverages advanced technologies, including speech-to-text conversion, natural language processing, and domain-specific input commands. By combining these elements, this research presents a powerful and intuitive system that allows users to create software models through voice commands. The framework's effectiveness has been evaluated primarily via two different user studies and secondarily using cross-validation to evaluate trained machine learning models. The evaluation of the final implementation of the framework resulted in an average command accuracy of 79.4% and an average overall rating of 7.85 out of 10 from the participants. Overall, this study demonstrates the viability and potential of voice commands as an effective interface for software modeling. By embracing voice-driven interactions, this thesis aims to improve accessibility, user experience, and overall efficiency in software engineering.

    Committee: Eric Rapos (Advisor); Christopher Vendome (Committee Member); Xianglong Feng (Committee Member) Subjects: Computer Science; Engineering
  • 14. Zitu, Md Muntasir Adverse Drug Event Detection from Clinical Narratives of Electronic Medical Records Using Artificial Intelligence.

    Doctor of Philosophy, The Ohio State University, 0, Biomedical Sciences

    Clinical narratives in Electronic Health Records (EHRs) provide longitudinal information about drug-induced adverse events. However, manually reviewing those clinical narratives and extracting adverse drug events (ADEs) is time-consuming and labor-intensive. A robust automated system needs to be included in current clinical settings for early detection of ADEs, so an automated system that uses Artificial Intelligence (AI) to process those clinical narratives and extract ADEs is in demand. Moreover, a generalized system will work on different types of clinical notes, thus reducing the technical dependencies and associated costs. Natural Language Processing (NLP), a field of AI, can automatically process free texts and extract semantic information. So, the central hypothesis of this research is that NLP models can automatically detect ADEs from unstructured EHRs. The long-term goal is to build an automated system in clinical settings for the early detection of ADEs. This dissertation has three interconnected aims that work toward the long-term goal. Aim 1 focuses on the generalizability of the NLP model to identify drug-induced ADEs from different EHR sources. The primary objective of Aim 1 is to evaluate the applicability of the NLP model in determining drug-induced ADEs across various EHR systems. To facilitate this goal, we also created a novel gold standard corpus. Aim 2 develops an ADE detection model to identify drug-induced adverse events at the patient level. Aim 3 identifies drug discontinuation information to develop a temporal model for novel causal drug-ADE relation discovery.

    Committee: Lang Li (Advisor) Subjects: Bioinformatics; Biomedical Research; Oncology
  • 15. Guo, Feng Revisiting Item Semantics in Measurement: A New Perspective Using Modern Natural Language Processing Embedding Techniques

    Doctor of Philosophy (Ph.D.), Bowling Green State University, 2023, Psychology/Industrial-Organizational

    Language understanding plays a crucial role in psychological measurement and so it is important that semantic cues should be studied for more effective and accurate measurement practices. With advancements in computer science, natural language processing (NLP) techniques have emerged as efficient methods for analyzing textual data and have been used to improve psychological measurement. This dissertation investigates the application of NLP embeddings to address fundamental methodological challenges in psychological measurement, specifically scale development and validation. In Study 1, a word embedding-based approach was used to develop a corporate personality measure, which resulted in a three-factor solution closely mirroring three dimensions out of the Big Five framework (i.e., Extraversion, Agreeableness, and Conscientiousness). This research furthers our conceptual understanding of corporate personality by identifying similarities and differences between human and organizational personality traits. In Study 2, the sentence-based embedding model was applied to predict empirical pairwise item response relationships, comparing its performance with human ratings. This study also demonstrated the effectiveness of fine-tuned NLP models for classifying item pair relationships into trivial/low or moderate/high empirical relationships, which provides preliminary validity evidence without collecting human responses. The research seeks to enhance psychological measurement practices by leveraging NLP techniques, fostering innovation and improved understanding in the field of social sciences.

    Committee: Michael Zickar Ph.D. (Committee Chair); Neil Baird Ph.D. (Other); Richard Anderson Ph.D. (Committee Member); Samuel McAbee Ph.D. (Committee Member) Subjects: Psychological Tests; Psychology; Quantitative Psychology
  • 16. Williams, Scott Comparative Adjudication of Noisy and Subjective Data Annotation Disagreements for Deep Learning

    Master of Science (MS), Wright State University, 2023, Computer Science

    Obtaining accurate inferences from deep neural networks is difficult when models are trained on instances with conflicting labels. Algorithmic recognition of online hate speech illustrates this. No human annotator is perfectly reliable, so multiple annotators evaluate and label online posts in a corpus. Labeling scheme limitations, differences in annotators' beliefs, and limits to annotators' honesty and carefulness cause some labels to disagree. Consequently, decisive and accurate inferences become less likely. Some practical applications such as social research can tolerate some indecisiveness. However, an online platform using an indecisive classifier for automated content moderation could create more problems than it solves. Disagreements can be addressed in training by using the label a majority of annotators assigned (majority vote), training only with unanimously annotated cases (clean filtering), and representing training labels as probabilities (soft labeling). This study shows clean filtering occasionally outperforming majority voting, and soft labeling outperforming both.

    Committee: Krishnaprasad Thirunarayan Ph.D. (Advisor); Shu Schiller Ph.D. (Committee Member); Michael Raymer Ph.D. (Committee Member) Subjects: Computer Science
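
    The three strategies named in the Williams abstract above (majority vote, clean filtering, soft labeling) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the `(text, labels)` data layout, and the choice to return `None` on a tied vote are all assumptions made for illustration.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label chosen by most annotators.

    Assumption: a tie for first place returns None (no decision).
    """
    ranked = Counter(labels).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def clean_filter(dataset):
    """Keep only items on which every annotator assigned the same label.

    `dataset` is a list of (text, labels) pairs (hypothetical layout).
    """
    return [(text, labels[0]) for text, labels in dataset
            if len(set(labels)) == 1]


def soft_label(labels, classes):
    """Represent the annotations as a probability for each class."""
    counts = Counter(labels)
    total = len(labels)
    return [counts[c] / total for c in classes]


# Hypothetical toy corpus: each post was labeled by three annotators.
corpus = [
    ("post A", ["hate", "hate", "hate"]),
    ("post B", ["hate", "ok", "hate"]),
]
majority = [(text, majority_vote(labels)) for text, labels in corpus]
unanimous = clean_filter(corpus)
soft = [(text, soft_label(labels, ["hate", "ok"])) for text, labels in corpus]
```

    Soft labels keep the disagreement on "post B" as a probability, whereas majority voting discards it and clean filtering drops the post entirely.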
  • 17. Heng, E Jinq A Cloud Computing-based Dashboard for the Visualization of Motivational Interviewing Metrics

    Master of Science (MS), Wright State University, 2022, Computer Science

    Motivational Interviewing (MI) is an evidence-based brief interventional technique that has been demonstrated to be effective in triggering behavior change in patients. To facilitate behavior change, healthcare practitioners adopt a nonconfrontational, empathetic dialogic style, a core component of MI. Despite its advantages, MI has been severely underutilized mainly due to the cognitive overload on the part of the MI dialogue evaluator, who has to assess MI dialogue in real time and calculate MI characteristic metrics (number of open-ended questions, close-ended questions, reflection, and scale-based sentences) for immediate post-session evaluation both in MI training and clinical settings. To automate dialogue assessment and produce instantaneous feedback, several technology-assisted MI (TAMI) tools like ReadMI based on Natural Language Processing (NLP) have been developed on mobile computing platforms like Android. These tools, however, are ill-equipped to support remote work and education settings, a consequence of the COVID-19 pandemic. Furthermore, these tools lack data visualization features to intuitively understand and track MI progress. In this thesis, to address the aforementioned shortcomings in the current landscape of TAMI, a web-based MI data visualization dashboard tool ReadMI.org has been designed and developed. The proposed dashboard leverages the high-performance computing capacity of cloud-based Amazon Web Service (AWS) to implement the NLP-based dialogue assessment functionality of ReadMI and a vibrant data visualization capability to intuitively understand and track MI progress. Additionally, through a simple Uniform Resource Locator (URL) address, ReadMI.org allows MI practitioners and trainers to access the proposed dashboard anywhere and anytime. Therefore, by leveraging the high-performance computing and distribution capability of cloud computing services, ReadMI.org has the potential to reach the growing population of MI practitioner (open full item for complete abstract)

    Committee: Ashutosh Shivakumar Ph.D. (Committee Chair); Yong Pei Ph.D. (Committee Co-Chair); Thomas Wischgoll Ph.D. (Committee Member); Paul J. Hershberger Ph.D. (Committee Member) Subjects: Behavioral Psychology; Computer Engineering; Computer Science
  • 18. Lloyd, Benjamin An Investigation Into ALM as a Knowledge Representation Library Language

    Master of Computer Science, Miami University, 2022, Computer Science and Software Engineering

    Text parsing and natural language processing are well-researched areas; story understanding and natural language understanding are less so. In an attempt to create an efficient and effective way for computers to gain a better understanding of stories, we analyze the current state and effectiveness of ALM as a knowledge representation language. We wish to develop a library of commonsense knowledge on verbs that surpasses the existing libraries in breadth and effectiveness. The library will utilize the modular action language ALM and draw inspiration from evaluations of COREALMLib and VERBNET. In addition, TEXT2ALM, the tool available for analysis of ALM, will be reviewed for its usefulness, modularity, and accuracy of execution.

    Committee: Daniela Inclezan (Advisor); Alan Ferrenberg (Committee Member); Norm Krumpe (Committee Member) Subjects: Computer Science
  • 19. Zhen, Wang Toward Knowledge-Centric Natural Language Processing: Acquisition, Representation, Transfer, and Reasoning

    Doctor of Philosophy, The Ohio State University, 2022, Computer Science and Engineering

    Past decades have witnessed the great success of modern Artificial Intelligence (AI) via learning incredible statistical correlations from large-scale data. However, a knowledge gap still exists between the statistical learning of AI and the human-like learning process. Unlike machines, humans can first accumulate enormous background knowledge about how the world works and then quickly adapt it to new environments by understanding the underlying concepts. For example, given limited life experience with mammals, a child can quickly learn the new concept of a dog and infer knowledge: a dog is a mammal, a mammal has a heart, and thus a dog has a heart. Then the child can generalize the concept to new cases, such as a golden retriever, a beagle, or a chihuahua. However, an AI system trained on a large-scale dataset of mammals that is not focused on dogs cannot do such learning and generalization. AI techniques will fundamentally influence our everyday lives, and bridging this knowledge gap to empower existing AI systems with more explicit human knowledge is both timely and necessary to make them more generalizable, robust, trustworthy, interpretable, and efficient. To close this gap, we seek inspiration from how humans learn, such as the ability to abstract knowledge from data, generalize knowledge to new tasks, and reason to solve complex problems. Inspired by the human learning process, in this dissertation, we present our research efforts to address the knowledge gap between AI and human learning with a systematic study of the full life cycle of how to incorporate more explicit human knowledge in intelligent systems. Specifically, we first need to extract high-quality knowledge from the real world (knowledge acquisition), such as raw data or model parameters. We then transform various types of knowledge into neural representations (knowledge representation). We can also transfer existing knowledge between neural systems (knowledge transfer) or perform human-like co (open full item for complete abstract)

    Committee: Huan Sun (Advisor); Wei-Lun Chao (Committee Member); Yu Su (Committee Member); Srinivasan Parthasarathy (Committee Member) Subjects: Computer Science; Language; Linguistics
  • 20. Bhowmik, Kowshik Leveraging Degree of Isomorphism to Improve Cross-Lingual Embedding Space for Low-Resource Languages

    PhD, University of Cincinnati, 2022, Engineering and Applied Science: Computer Science and Engineering

    Distributed representations of words, or word embeddings, have been successfully utilized in many Natural Language Processing (NLP) tasks. However, not all monolingual embedding spaces are trained with the same amount of data. Interest in transferring knowledge across languages, especially from languages rich with resources to low-resource ones, has given rise to cross-lingual word embeddings (CLWEs). CLWEs represent words belonging to different languages in a shared semantic space. In this joint embedding space, vector representations of semantically equivalent words share a low distance, irrespective of which language they belong to. CLWEs form the basis of Bilingual Lexicon Induction (BLI) and Machine Translation (MT) as they make comparing word meanings across languages possible. The similar geometric arrangement of similar concepts in monolingual word embeddings of different languages has led to the learning of linear, and more specifically, orthogonal transformations from one embedding space to another. Mapping-based methods of learning CLWEs hinge on the premise that there exists invariance among languages resulting in their embedding spaces being isomorphic. This assumption significantly weakens for etymologically distant language pairs and/or those disparate in terms of their available resources. This weak assumption has been utilized to measure the degree of isomorphism between monolingual embedding space pairs and has also been used to measure their typological distance. In this dissertation, we propose to first cluster a set of monolingual embedding spaces based on their pairwise degrees of isomorphism. We present a qualitative analysis of the comparative impact of typological relations among the languages and the size of the embedding spaces. The goal is to determine the combination of clustering algorithm and measure of isomorphism that is able to cluster related languages together. Low-resource languages in the cluster are then enabled to leverage related (open full item for complete abstract)

    Committee: Anca Ralescu Ph.D. (Committee Member); Kenneth Berman Ph.D. (Committee Member); Dan Ralescu Ph.D. (Committee Member); James Lee (Committee Member); David Musgrave Ph.D. (Committee Member); Chia Han Ph.D. (Committee Member) Subjects: Artificial Intelligence
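
    The orthogonal transformation between embedding spaces mentioned in the Bhowmik abstract above is commonly learned by solving the orthogonal Procrustes problem. The sketch below shows that standard closed-form solution with NumPy; it is not taken from the dissertation, and the function name `learn_orthogonal_map` and the synthetic data are assumptions for illustration. It presumes the rows of `X` and `Y` are embeddings of translation pairs from a seed dictionary.

```python
import numpy as np


def learn_orthogonal_map(X, Y):
    """Solve min_W ||X W - Y||_F subject to W being orthogonal.

    X, Y: (n, d) arrays whose i-th rows are the source- and target-language
    embeddings of the i-th dictionary pair (assumed layout).
    Closed form: W = U V^T, where U S V^T is the SVD of X^T Y.
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


# Synthetic check: the "target" space is an exact rotation/reflection
# of the "source" space, i.e. the two spaces are perfectly isomorphic.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))  # random orthogonal matrix
Y = X @ Q

W = learn_orthogonal_map(X, Y)
residual = np.linalg.norm(X @ W - Y)  # near zero when spaces are isomorphic
```

    On real language pairs the residual stays above zero, and it tends to grow as the isomorphism assumption weakens, which is the situation the abstract describes for distant or resource-disparate pairs.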