Search Results

(Total results 32)


  • 1. Wang, Fan SEEDEEP: A System for Exploring and Querying Deep Web Data Sources

    Doctor of Philosophy, The Ohio State University, 2010, Computer Science and Engineering

    A popular trend in data dissemination involves online data sources that are hidden behind query forms, forming what is referred to as the deep web. Deep web data is stored in hidden databases and can only be accessed after a user submits a query by filling in an online form. Currently, hundreds of large, complex, and in many cases related and/or overlapping deep web data sources have become available, and their number is still increasing rapidly every year. The emergence of the deep web poses many new challenges in data integration and query answering. First, the metadata of the deep web and the data records stored in deep web databases are hidden from the data integration system. Second, multiple deep web data sources may have data redundancy; furthermore, similar data sources may provide data with different data quality, and even conflicting data. Therefore, data source selection is of great importance for a data integration system. Third, deep web data sources in a domain often have inter-dependencies, i.e., the output from one data source may be the input of another. Thus, answering a query over a set of deep web data sources often involves accessing a sequence of inter-dependent data sources in an intelligent order. Fourth, the common way of accessing data in deep web data sources is through standardized input interfaces. On the one hand, these interfaces provide a very simple query mechanism; on the other hand, they significantly constrain the types of queries that can be automatically executed. Finally, all deep web data sources are network based, and both the data source servers and the network links are vulnerable to congestion and failures; therefore, handling fault tolerance is also necessary for a data integration system. In our work, we propose SEEDEEP, an automatic system for exploring and querying deep web data sources. 
The SEEDEEP system is able to integrate deep web data sources in a particular (open full item for complete abstract)

    Committee: Gagan Agrawal PhD (Advisor); Feng Qin PhD (Committee Member); P Sadayappan PhD (Committee Member) Subjects: Computer Science
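    The "intelligent order" over inter-dependent sources described in this abstract amounts to a topological sort of the dependency graph. A minimal sketch, using hypothetical source names (this is not SEEDEEP's actual query planner):

```python
from graphlib import TopologicalSorter

# Hypothetical deep web sources: each entry maps a source to the sources
# whose output it consumes as input.
dependencies = {
    "protein_lookup": {"gene_search"},     # needs gene IDs first
    "pathway_db":     {"protein_lookup"},  # needs protein IDs first
    "gene_search":    set(),               # takes the user query directly
}

def query_order(deps):
    """Return an order in which the sources can be queried, so that every
    source is visited only after the sources it depends on."""
    return list(TopologicalSorter(deps).static_order())

print(query_order(dependencies))
# gene_search comes first, then protein_lookup, then pathway_db
```

    A real planner would also have to handle cycles, source failures, and alternative routes through overlapping sources; this sketch only shows the ordering step.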
  • 2. Janga, Prudhvi Integration of Heterogeneous Web-based Information into a Uniform Web-based Presentation

    PhD, University of Cincinnati, 2014, Engineering and Applied Science: Computer Science and Engineering

    With the continuing explosive growth of the world wide web, a wealth of information has become available online, and the web has become one of the major sources of information for both individual users and large organizations. To find information, individual users can either use search engines or navigate to a particular website by following links. The former method returns links to vast amounts of data in seconds, while the latter can be tedious and time consuming. The former presents results as a web page with links to the actual web data sources (or websites); the latter takes the user to the actual web data source itself. Even using these two most popular forms of web data presentation/retrieval, web data can hardly be queried, manipulated, and analyzed easily, even though it is publicly and readily available. Many companies also rely on the web for information; their challenge is to build web-based analytical and decision support systems, often referred to as web data warehouses. However, the information present on the web is extremely complex and heterogeneous, which makes integrating and presenting retrieved web data in a uniform format challenging. Hence, there is a need for web data integration frameworks that can integrate and present web data in a uniform format. To achieve a homogeneous representation of web data, we need a framework that extracts relevant structured and semi-structured web data from different web data sources, generates schemas from structured as well as semi-structured web data, integrates the schemas generated from the different sources into a merged schema, populates it with data, and presents it to the end user in a uniform format. We propose a modular framework for homogeneous presentation of web data. 
This framework consists of different standalone modules that can also be used to create independent systems that solve other schema unification problem (open full item for complete abstract)

    Committee: Karen Davis Ph.D. (Committee Chair); Raj Bhatnagar Ph.D. (Committee Member); Hsiang-Li Chiang Ph.D. (Committee Member); Ali Minai Ph.D. (Committee Member); Carla Purdy Ph.D. (Committee Member) Subjects: Computer Science
  • 3. Jain, Prateek Linked Open Data Alignment & Querying

    Doctor of Philosophy (PhD), Wright State University, 2012, Computer Science and Engineering PhD

    The recent emergence of the “Linked Data” approach for publishing data represents a major step forward in realizing the original vision of a web that can "understand and satisfy the requests of people and machines to use the web content", i.e., the Semantic Web. This new approach has resulted in the Linked Open Data (LOD) Cloud, which includes more than 295 large datasets contributed by experts belonging to diverse communities such as geography, entertainment, and life sciences. However, the current interlinks between datasets in the LOD Cloud, as we will illustrate, are too shallow to realize much of the promised benefit. If this limitation is left unaddressed, the LOD Cloud will merely be more data suffering from the same kinds of problems that plague the Web of Documents, and the vision of the Semantic Web will fall short. This thesis presents a comprehensive solution to the issues of alignment and relationship identification using a bootstrapping-based approach. By alignment we mean the process of determining correspondences between the classes and properties of ontologies. We identify subsumption, equivalence, and part-of relationships between classes, part-of relationships between instances, and subsumption and equivalence relationships between properties. By bootstrapping we mean utilizing the information contained within the datasets to improve the data within them. The work showcases the use of bootstrapping-based methods to identify and create richer relationships between LOD datasets. The BLOOMS project (http://wiki.knoesis.org/index.php/BLOOMS) and the PLATO project, both built as part of this research, have provided evidence of the feasibility and applicability of the solution.

    Committee: Amit Sheth PhD (Advisor); Pascal Hitzler PhD (Committee Member); Krishnaprasad Thirunarayan PhD (Committee Member); Kunal Verma PhD (Committee Member); Peter Yeh PhD (Committee Member) Subjects: Computer Science
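    The class-level relations this abstract names (equivalence, subsumption) can be illustrated with a deliberately naive toy matcher over made-up class hierarchies. This is not the BLOOMS algorithm, which bootstraps alignments from external community-built knowledge; it only shows what the output relations look like:

```python
# Two toy ontologies: each maps a class to its parent (None = root).
onto_a = {"River": "WaterBody", "WaterBody": "Feature", "Feature": None}
onto_b = {"Stream": "River", "River": "Place", "Place": None}

def ancestors(onto, cls):
    """Walk parent links upward and collect all ancestors of cls."""
    out = []
    while (cls := onto.get(cls)) is not None:
        out.append(cls)
    return out

def align(a, b):
    """Naive label-based alignment: identical labels yield equivalence;
    a label matching an ancestor of the other class yields subsumption."""
    relations = []
    for ca in a:
        for cb in b:
            if ca == cb:
                relations.append((ca, "equivalent", cb))
            elif ca in ancestors(b, cb):
                relations.append((cb, "subsumed_by", ca))
    return relations

print(align(onto_a, onto_b))
```

    Real alignment has to go beyond label identity (synonyms, context, part-of relations between instances), which is exactly the gap the bootstrapping approach addresses.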
  • 4. Pschorr, Joshua SemSOS : an Architecture for Query, Insertion, and Discovery for Semantic Sensor Networks

    Master of Science (MS), Wright State University, 2013, Computer Science

    With sensors, storage, and bandwidth becoming ever cheaper, there has recently been a drive to make sensor data accessible on the Web. However, because of the vast number of sensors collecting data about our environment, finding relevant sensors on the Web and then interpreting their observations is a non-trivial challenge. The Open Geospatial Consortium (OGC) defines a web service specification known as the Sensor Observation Service (SOS) that is designed to standardize the way sensors and sensor data are discovered and accessed on the Web. Though this standard goes a long way in providing interoperability between sensor data producers and consumers, it is predicated on the idea that the consuming application is equipped to handle raw sensor data. Sensor data consuming end-points are generally interested not just in the raw data itself, but in actionable information regarding their environment. The approaches for dealing with this are either to make each individual consuming application smarter or to make the data served to them smarter. This thesis presents an application of the latter approach, accomplished by providing a more meaningful representation of sensor data by leveraging semantic web technologies. Specifically, this thesis describes an approach to sensor data modeling, reasoning, discovery, and query over richer semantic data derived from raw sensor descriptions and observations. The artifacts resulting from this research include an implementation of an SOS service that hews to both Sensor Web and Semantic Web standards in order to bridge the gap between syntactic and semantic sensor data consumers, and that has been proven by use in a number of research applications storing large amounts of data; it thereby serves as an example of an approach for designing applications that integrate syntactic services over semantic models and allow for interaction with external reasoning systems. 
As more sensors and observations move o (open full item for complete abstract)

    Committee: Krishnaprasad Thirunarayan Ph.D. (Advisor); Amit Sheth Ph.D. (Committee Member); Bin Wang Ph.D. (Committee Member) Subjects: Computer Science; Geographic Information Science; Information Systems; Remote Sensing; Systems Design; Web Studies
  • 5. Krishnan, Niranjan Rao A Web-Based Software Platform for Data Processing Workflows and its Applications in Aerial Data Analysis

    MS, University of Cincinnati, 2019, Engineering and Applied Science: Computer Science

    Given the rapid advances in the development of unmanned aerial vehicles (UAVs), the employment of drones in various business functions becomes reasonable and affordable. Using drones as data collection tools gives us access to a new set of geo-referenced images and videos that were not easily accessible in the past. The ultimate objective of this web-based platform for data processing workflows, from here on referred to as the common operating platform, is to enable users to archive, process, and visualize aerial data without the need for advanced hardware and software locally. This work details the development of the common operating platform, which consists of a web-based frontend and a backend. The frontend is a web app built on Django, Twitter Bootstrap, and Javascript, where the user authenticates, uploads data, submits processing tasks, and visualizes the results. The backend, developed using Python 3, is where data is stored and various processing tasks are run by commercial (Pix4D), open source (OpenDroneMap), and custom (Traffic Monitoring) processing engines. First, the intricacies of the data processing workflow are discussed, including a detailed look at the various steps of the processing workflows using proprietary, open, and custom software. Second, procedures for integrating commercial processing engines as well as the development of the in-house traffic parameter extraction system are shown, and the results of running various case studies and their processing performance are discussed. Finally, the system architecture design and implementation are detailed in light of the scalability, modularity, extensibility, and reliability requirements. The idea is to have a secure system, accessible to a broad audience, that can receive and service all of their processing requirements. 
In doing so, it dismisses the need for an uninitiated audience to install highly specialized software on their pe (open full item for complete abstract)

    Committee: Arthur Helmicki Ph.D. (Committee Chair); Victor Hunt Ph.D. (Committee Member); Nan Niu Ph.D. (Committee Member) Subjects: Computer Science
  • 6. Krisnadhi, Adila Ontology Pattern-Based Data Integration

    Doctor of Philosophy (PhD), Wright State University, 2015, Computer Science and Engineering PhD

    Data integration is concerned with providing a unified access to data residing at multiple sources. Such a unified access is realized by having a global schema and a set of mappings between the global schema and the local schemas of each data source, which specify how user queries at the global schema can be translated into queries at the local schemas. Data sources are typically developed and maintained independently, and thus, highly heterogeneous. This causes difficulties in integration because of the lack of interoperability in the aspect of architecture, data format, as well as syntax and semantics of the data. This dissertation represents a study on how small, self-contained ontologies, called ontology design patterns, can be employed to provide semantic interoperability in a cross-repository data integration system. The idea of this so-called ontology pattern-based data integration is that a collection of ontology design patterns can act as the global schema that still contains sufficient semantics, but is also flexible and simple enough to be used by linked data providers. On the one side, this differs from existing ontology-based solutions, which are based on large, monolithic ontologies that provide very rich semantics, but enforce too restrictive ontological choices, hence are shunned by many data providers. On the other side, this also differs from the purely linked data based solutions, which do offer simplicity and flexibility in data publishing, but too little in terms of semantic interoperability. We demonstrate the feasibility of this idea through the actual development of a large scale data integration project involving seven ocean science data repositories from five institutions in the U.S. In addition, we make two contributions as part of this dissertation work, which also play crucial roles in the aforementioned data integration project. 
First, we develop a collection of more than a dozen ontology design patterns that capture the key noti (open full item for complete abstract)

    Committee: Pascal Hitzler Ph.D. (Advisor); Krzysztof Janowicz Ph.D. (Committee Member); Khrisnaprasad Thirunarayan Ph.D. (Committee Member); Michelle Cheatham Ph.D. (Committee Member) Subjects: Computer Science; Information Systems; Information Technology; Logic
  • 7. Jayapandian, Catherine Cloudwave: A Cloud Computing Framework for Multimodal Electrophysiological Big Data

    Doctor of Philosophy, Case Western Reserve University, 2014, EECS - Computer and Information Sciences

    Multimodal electrophysiological data, such as electroencephalography (EEG) and electrocardiography (ECG), are central to effective patient care and clinical research in many disease domains (e.g., epilepsy, sleep medicine, and cardiovascular medicine). Electrophysiological data is an example of clinical 'big data' characterized by volume (in the order of terabytes (TB) of data generated every year), velocity (gigabytes (GB) of data per month per facility) and variety (about 20-200 multimodal parameters per study), referred to as the '3Vs of Big Data.' Current approaches for storing and analyzing signal data using desktop machines and conventional file formats are inadequate to meet the challenges in the growing volume of data and the need for supporting multi-center collaborative studies with real-time and interactive access. This dissertation introduces a web-based electrophysiological data management framework called Cloudwave using a highly scalable open-source cloud computing approach and hierarchical data format. Cloudwave has been developed as a part of the National Institute of Neurological Disorders and Stroke (NINDS) funded multi-center project called Prevention and Risk Identification of SUDEP Mortality (PRISM). The key contributions of this dissertation are: 1. An expressive data representation format called Cloudwave Signal Format (CSF) suitable for data-interchange in cloud-based web applications; 2. Cloud based storage of CSF files processed from EDF using Hadoop MapReduce and HDFS; 3. Web interface for visualization of multimodal electrophysiological data in CSF; and 4. Computational processing of ECG signals using Hadoop MapReduce for measuring cardiac functions. 
    Comparative evaluations of Cloudwave with traditional desktop approaches demonstrate one order of magnitude improvement in performance over 77GB of patient data for storage, one order of magnitude improvement in computing cardiac measures for single-channel ECG data, and 20 times improv (open full item for complete abstract)

    Committee: Guo-Qiang Zhang PhD (Committee Chair); Satya Sahoo PhD (Committee Member); Xiang Zhang PhD (Committee Member); Samden Lhatoo MD, FRCP (Committee Member) Subjects: Bioinformatics; Biomedical Research; Computer Science; Neurosciences
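    The MapReduce-style computation of cardiac measures mentioned above can be illustrated in plain Python. The channel names and interval values below are made up, and this is only the map/shuffle/reduce pattern, not the actual Hadoop jobs Cloudwave runs over CSF files:

```python
from itertools import groupby
from operator import itemgetter

# Input records: (channel, RR interval in milliseconds) pairs.
records = [("ECG1", 800), ("ECG1", 820), ("ECG2", 1000), ("ECG2", 990)]

def map_phase(recs):
    # Emit (key, value) pairs, as a Hadoop mapper would.
    return [(ch, rr) for ch, rr in recs]

def reduce_phase(pairs):
    # Shuffle/sort by key, then reduce each group to a cardiac measure:
    # mean heart rate in beats per minute (60000 ms / mean RR interval).
    pairs = sorted(pairs, key=itemgetter(0))
    result = {}
    for ch, group in groupby(pairs, key=itemgetter(0)):
        rrs = [rr for _, rr in group]
        result[ch] = round(60000 / (sum(rrs) / len(rrs)), 1)
    return result

print(reduce_phase(map_phase(records)))
# {'ECG1': 74.1, 'ECG2': 60.3}
```

    The point of the pattern is that the reduce step for each channel is independent, so it parallelizes across a cluster as data volume grows.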
  • 8. Ho, Chia-Ling The impact of the presence of voting capability and weblog on a website on the public's perceived interactivity and relationship with a website

    Master of Arts, The Ohio State University, 2007, Graduate School

    Committee: Not Provided (Other) Subjects:
  • 9. Yilmaz, Serhan Robust, Fair and Accessible: Algorithms for Enhancing Proteomics and Under-Studied Proteins in Network Biology

    Doctor of Philosophy, Case Western Reserve University, 2023, EECS - Computer and Information Sciences

    This dissertation presents a comprehensive approach to advancing proteomics and under-studied proteins in network biology, emphasizing the development of reliable algorithms, fair evaluation practices, and accessible computational tools. A key contribution of this work is the introduction of RoKAI, a novel algorithm that integrates multiple sources of functional information to infer kinase activity. By capturing coordinated changes in signaling pathways, RoKAI significantly improves kinase activity inference, facilitating the identification of dysregulated kinases in diseases. This enables deeper insights into cellular signaling networks, supporting targeted therapy development and expanding our understanding of disease mechanisms. To ensure fairness in algorithm evaluation, this research carefully examines potential biases arising from the under-representation of under-studied proteins and proposes strategies to mitigate these biases, promoting a more comprehensive evaluation and encouraging the discovery of novel findings. Additionally, this dissertation focuses on enhancing accessibility by developing user-friendly computational tools. The RoKAI web application provides a convenient and intuitive interface for performing RoKAI analysis. Moreover, the RokaiXplorer web tool simplifies proteomic and phospho-proteomic data analysis for researchers without specialized expertise. It enables tasks such as normalization, statistical testing, and pathway enrichment, provides interactive visualizations, and offers researchers the ability to deploy their own data browsers, promoting the sharing of findings and fostering collaboration. Overall, this interdisciplinary research contributes to proteomics and network biology by providing robust algorithms, fair evaluation practices, and accessible tools. 
It lays the foundation for further advancements in the field, bringing us closer to uncovering new biomarkers and potential therapeutic targets in diseases like cancer, Alzheimer' (open full item for complete abstract)

    Committee: Mehmet Koyutürk (Committee Chair); Mark Chance (Committee Member); Vincenzo Liberatore (Committee Member); Kevin Xu (Committee Member); Michael Lewicki (Committee Member) Subjects: Bioinformatics; Biomedical Research; Computer Science
  • 10. Markle, Scott INVESTIGATORY ANALYSIS OF BIG DATA'S ROLE AND IMPACT ON LOCAL ORGANIZATIONS, INSTITUTIONS, AND BUSINESSES' DECISION-MAKING AND DAY-TO-DAY OPERATIONS

    MS, Kent State University, 2023, College of Arts and Sciences / Department of Computer Science

    With the employment of Big Data techniques and technologies burgeoning in a variety of industries and sectors with each passing day, it is critical for institutions of higher education to maintain a comprehensive understanding of Big Data's current usage in those fields. To aid in this ongoing need, this thesis project contacted, via electronic survey, a broad range of institutions and businesses in the greater Northeast Ohio area, spanning numerous industries, to identify, first, whether Big Data techniques and technologies were practiced by those organizations and, second, how they are used (if at all). Based on research into the organizations before survey distribution, the institutions were divided into two frames: groups and companies that were most likely utilizing Big Data in some capacity, and groups and companies that possibly utilize Big Data in some capacity, though clear indication of usage could not be discerned from preliminary research. Businesses and institutions in the former frame received a survey more closely tailored to the likelihood of Big Data usage than those in the latter frame, though responses carried equal weight in the research findings regardless of frame membership. From the information provided by respondents, web scraping was performed on each of their web pages to determine whether similar or identical answers to the aforementioned information could be identified. The primary conclusion drawn from this research was that Big Data and its techniques and technologies are indeed in active use across multiple industry sectors, though their usage is much more limited compared to traditional data analysis. While some survey respondents disclosed their usage of Big Data, as well as its active impact upon day-to-day operations and related decision-making performed by their organizations or institutions, other informants stated that Big Data has no active role in their organization to the best of their knowledge. 
Regardless of response, web scraping was per (open full item for complete abstract)

    Committee: Ruoming Jin PhD (Committee Member); Jong-Hoon Kim PhD (Committee Member); Omar De La Cruz Cabrera PhD (Advisor) Subjects: Computer Science
  • 11. Clunis, Julaine Semantic Analysis Mapping Framework for Clinical Coding Schemes: A Design Science Research Approach

    PHD, Kent State University, 2021, College of Communication and Information

    The coronavirus disease 2019 (COVID-19) pandemic has revealed challenges and opportunities for data analytics, semantic interoperability, and decision making. The sharing of COVID-19 data has become crucial for leveraging research, testing drug effectiveness and therapeutic strategies, and developing policies for control, intervention, and potential eradication of this disease. Translating healthcare data between various clinical coding schemes is critical to their functioning, and semantic mappings must be established to ensure interoperability. Using design science research methodology as a guide, this work explains 1) how an ETL (Extract Transform Load) workflow tool can support the task of clinical coding scheme mapping, 2) how the mapping output from such a tool can support or affect annotation of clinical trials, particularly those used in COVID-19 research, and 3) whether aspects of the socio-technical model can be leveraged to explain and assess mapping to achieve semantic interoperability in clinical coding schemes. Research outcomes include a reproducible and shareable artifact that can be utilized beyond the domain of biomedicine, in addition to observations and recommendations from the knowledge gained during the design and evaluation of the artifact.

    Committee: Marcia Zeng (Advisor); Athena Salaba (Committee Member); Mary Anthony (Committee Member); Yi Hong (Committee Member); Rebecca Meehan (Committee Member) Subjects: Bioinformatics; Information Science
  • 12. Saraf, Nikita Sandip Leveraging Commercial and Open Source Software to Process and Visualize Advanced 3D Models on a Web-Based Software Platform

    MS, University of Cincinnati, 2020, Engineering and Applied Science: Computer Science

    Today, most successful business models make wide use of software programs to bridge the gap between data and business requirements. Changes in business strategies also require software programs to adapt with them. As a result, available software products are continuously evolving, changing rapidly with new technologies and user requirements. Earlier, in 2017, the Ohio Department of Transportation (ODOT) and the University of Cincinnati started developing a web application, called the Common Operating Platform (COP), to remotely process drone-captured images into 3D models using commercial (Pix4D) and open-source (OpenDroneMap) software. The idea is to engage shared hardware and software resources to perform such complex tasks. The platform immediately gained popularity and is actively used by personnel at ODOT. A preliminary study shows that the Common Operating Platform has a lot of room to incorporate more features. Hence, this thesis introduces Common Operating Platform v11.0, which comes with more complex 3D modeling and visualization workflows. The purpose of this work is to enhance the functionality, reliability, efficiency, and usability of the Common Operating Platform. Initially, this document lists shortcomings of the existing system and proposes solutions to eliminate them. Secondly, the proposed system architecture is compared against the existing architecture. In the final stage, the proposed enhancements are implemented by leveraging commercial (Pix4D) and open-source (MeshLabJS) software tools. Other miscellaneous features to improve system performance, efficiency, and reliability are also discussed.

    Committee: Arthur Helmicki Ph.D. (Committee Chair); Victor Hunt Ph.D. (Committee Member); Nan Niu Ph.D. (Committee Member) Subjects: Computer Science
  • 13. Partin, Michael Scalable, Pluggable, and Fault Tolerant Multi-Modal Situational Awareness Data Stream Management Systems

    Master of Science in Computer Engineering (MSCE), Wright State University, 2020, Computer Engineering

    Features and attributes that describe an event (disasters, social movements, etc.) are heterogeneous in nature. For virtually all events that impact humans, technology enables us to capture a large amount and variety of data from many sources, including humans (i.e., social media) and sensors/internet of things (IoTs). The corresponding modalities of data include text, imagery, voice and video, along with structured data such as gazetteers (i.e., location-based data) and government and statistical data. However, even though there is often an abundance of information produced, this information is fragmented across the various modalities and sources. The DisasterRecord system aims to provide a way to combine (interlink and integrate) data streams in different modalities in a meaningful way, with the in-depth use case of flood events. The DisasterRecord project was originally developed as a demo to showcase the efforts of the team at Kno.e.sis in the area of combining and analyzing multimodal data for the IBM CallForCode challenge in 2018. This thesis represents extensive follow-on work in the areas of deployability, flexibility, and reliability. Specific topics addressed are: a method that utilizes current technologies to easily deploy into cloud infrastructure; the modifications made to add flexibility to add and modify the multimodal analysis pipeline; and reliability improvements to make it a stable and reliable system.

    Committee: Amit Sheth Ph.D. (Advisor); Krishnaprasad Thirunarayan Ph.D. (Committee Member); Valerie Shalin Ph.D. (Committee Member) Subjects: Computer Engineering; Computer Science; Web Studies
  • 14. Chittella, Rama Someswar Leveraging Schema Information For Improved Knowledge Graph Navigation

    Master of Science (MS), Wright State University, 2019, Computer Science

    Over the years, the semantic web has emerged as a new generation of the world wide web featuring advanced technologies and research contributions. It has revolutionized the usage of information by allowing users to capture and publish machine-understandable data, employing methods such as ontologies to do so. These ontologies help in the formal representation of a specified domain and foster comprehensive machine understanding. Although the engineering of ontologies and the use of logic have been an integral part of web semantics, new areas of research such as semantic web search, the linking and usage of open data on the web, and the subsequent use of these technologies in building semantic web applications have also become significant in recent times. One such research contribution that we focus on is the browsing of linked RDF data. The semantic web advocates the linked data methodology for publishing structured data on the web. Most linked data is available as browsable RDF data built from triples that state facts in the form of subject-predicate-object. These triples can be tabulated by sorting the three parts into separate columns. To browse the linked data of the semantic web, several web browsers such as CubicWeb, VisiNav, and Pubby were designed. These browsers provide users with a tabular browsing experience, displaying the data in nested tables, and help users navigate through the various subjects and their respective objects via the links associated with them. Several other browsers, such as Tabulator, were developed to enable real-time editing of semantic web resources. However, with the tabulated interface, users may sometimes find it difficult to discern the relationships between the various documents. Also, navigating using the links between subjects and their predicates inside the documents is time consuming, which makes the overall user experience tedious. 
To i (open full item for complete abstract)

    Committee: Pascal Hitzler Ph.D. (Advisor); Mateen M. Rizki Ph.D. (Committee Member); Yong Pei Ph.D. (Committee Member) Subjects: Computer Science
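    The subject-predicate-object tabulation this abstract describes can be sketched directly; the triples below are hypothetical examples under an `ex:` prefix, not data from any of the cited browsers:

```python
# RDF statements as (subject, predicate, object) triples.
triples = [
    ("ex:Dayton", "ex:locatedIn", "ex:Ohio"),
    ("ex:Dayton", "ex:type", "ex:City"),
    ("ex:Ohio", "ex:locatedIn", "ex:UnitedStates"),
]

def tabulate(ts):
    """Sort the three parts of each triple into separate columns."""
    header = "{:<12} {:<14} {}".format("subject", "predicate", "object")
    rows = ["{:<12} {:<14} {}".format(s, p, o) for s, p, o in ts]
    return "\n".join([header] + rows)

print(tabulate(triples))
```

    The browsers mentioned in the abstract render essentially this table, with each subject and object cell turned into a navigable link to the resource it names.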
  • 15. Emeka-Nweze, Chika ICU_POC: AN EMR-BASED POINT OF CARE SYSTEM DESIGN FOR THE INTENSIVE CARE UNIT

    Doctor of Philosophy, Case Western Reserve University, 2017, EECS - Computer Engineering

    In this era of technological transformation in medicine, there is a need to revolutionize the approach and procedures involved in the treatment of diseases and to restructure our understanding of the role of data and technology in the medical industry. Data is a key factor in the diagnosis, management, and treatment of patients in any medical institution. Proper management and usage of patient data will go a long way toward helping society save money and time and save patients' lives. Having data is one thing; providing a system or means of translating the data is another. This dissertation proposes the design of a Point of Care system for the Intensive Care Unit (a.k.a. ICU_POC), a system that integrates the capabilities of the bedside monitors, the bedside eFlowsheet, and the Electronic Medical Records in such a manner that clinicians interact with one another in real time from different locations to view, analyze, and even make necessary diagnoses of patients' ailments based on their medical records. It demonstrates how patient data from the monitors can be imported, processed, and transformed into meaningful and useful information, then stored, reproduced, and transferred automatically to all necessary locations securely and efficiently without any human manipulation. ICU_POC will grant physicians the remote capability to manage patients properly by providing accurate patient data, easy analysis, and fast diagnosis of patient conditions. It creates an interface for physicians to query historical data and make proper assumptions based on previous medical conditions. The problem lies in managing data transfer securely between one hospital EMR database and another for easy accessibility of data by physicians. This work is challenged by designing a system that can provide a fast, accurate, secure, and effective (FASE) diagnosis of the medical conditions of patients in the ICU. 
The proposed system also has the potential to reduce patients' length of stay.

    Committee: Kenneth Loparo (Advisor); Farhad Kaffashi (Committee Member); Vira Chankong (Committee Member); Michael Degeorgia (Committee Member) Subjects: Computer Engineering; Computer Science; Engineering
  • 16. Gunaratna, Kalpa Semantics-based Summarization of Entities in Knowledge Graphs

    Doctor of Philosophy (PhD), Wright State University, 2017, Computer Science and Engineering PhD

    The processing of structured and semi-structured content on the Web has been gaining attention with the rapid progress of the Linking Open Data project and the development of commercial knowledge graphs. Knowledge graphs capture domain-specific or encyclopedic knowledge in the form of a data layer and add rich, explicit semantics on top of that layer to infer additional knowledge. The data layer of a knowledge graph represents entities and their descriptions. The semantic layer on top of the data layer is called the schema (ontology), where the relationships of the entity descriptions, their classes, and the hierarchies of those relationships and classes are defined. Today, there exist large knowledge graphs in the research community (e.g., encyclopedic datasets like DBpedia and Yago) and the corporate world (e.g., the Google knowledge graph) that encapsulate a large amount of knowledge for human and machine consumption. Typically, they consist of millions of entities and billions of facts describing these entities. While it is good to have this much knowledge available on the Web for consumption, it leads to information overload, and hence proper summarization (and presentation) techniques need to be explored. In this dissertation, we focus on creating both comprehensive and concise entity summaries at: (i) the single entity level and (ii) the multiple entity level. To summarize a single entity, we propose a novel approach called FACeted Entity Summarization (FACES) that considers both the importance of a fact, computed by combining its popularity and uniqueness, and the diversity of the facts selected for the summary. We first conceptually group facts using semantic expansion and hierarchical incremental clustering techniques and form facets (i.e., groupings) that go beyond syntactic similarity. Then we rank both the facts and facets using Information Retrieval (IR) ranking techniques to pick the highest-ranked facts from these facets for the summary.
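The facet-then-rank idea behind FACES can be illustrated with a minimal, self-contained sketch. The facts, the hand-written `EXPANSION` table (a stand-in for the semantic-expansion step, which in the actual work uses lexical resources rather than a fixed dictionary), and the toy ranking function are all hypothetical; only the overall shape — group facts into facets by shared expanded terms, then pick the highest-ranked fact per facet — follows the abstract.

```python
from collections import defaultdict

# Hypothetical facts about one entity: (property, value) pairs.
FACTS = [
    ("birthPlace", "Honolulu"),
    ("residence", "Washington"),
    ("party", "Democratic Party"),
    ("office", "President"),
    ("almaMater", "Harvard"),
    ("education", "Columbia"),
]

# Hand-written "semantic expansion": maps each property to broader concept
# terms (a stand-in for the expansion step described in the abstract).
EXPANSION = {
    "birthPlace": {"place"},
    "residence": {"place"},
    "party": {"politics"},
    "office": {"politics"},
    "almaMater": {"school"},
    "education": {"school"},
}

def facet_summary(facts, rank, k=3):
    """Group facts into facets by shared expansion terms, then take the
    highest-ranked fact from each facet, up to k facts total."""
    facets = defaultdict(list)
    for fact in facts:
        key = frozenset(EXPANSION.get(fact[0], {fact[0]}))
        facets[key].append(fact)
    # Best fact from each facet, ordered by rank.
    best_per_facet = sorted(
        (max(fs, key=rank) for fs in facets.values()),
        key=rank, reverse=True)
    return best_per_facet[:k]

# Toy ranking: prefer rarer (more "unique") property names.
freq = defaultdict(int)
for p, _ in FACTS:
    freq[p] += 1
summary = facet_summary(FACTS, rank=lambda f: 1.0 / freq[f[0]])
print(summary)
```

Because each selected fact comes from a different facet, the summary stays diverse even when several high-ranked facts say nearly the same thing.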

    Committee: Amit Sheth Ph.D. (Committee Co-Chair); Krishnaprasad Thirunarayan Ph.D. (Committee Co-Chair); Keke Chen Ph.D. (Committee Member); Gong Cheng Ph.D. (Committee Member); Edward Curry Ph.D. (Committee Member); Hamid Motahari Nezhad Ph.D. (Committee Member) Subjects: Computer Science
  • 17. Yu, Andrew NBA ON-BALL SCREENS: AUTOMATIC IDENTIFICATION AND ANALYSIS OF BASKETBALL PLAYS

    Master of Computer and Information Science, Cleveland State University, 2017, Washkewicz College of Engineering

    The on-ball screen is a fundamental offensive play in basketball; it is often used to trigger a chain reaction of player and ball movement to obtain an effective shot. All teams in the National Basketball Association (NBA) employ the on-ball screen on offense. Conversely, a defense can mitigate its effectiveness by anticipating the on-ball screen and its goals. In the past, it was difficult to measure a defender's ability to disrupt the on-ball screen, and that ability was often described with abstract words like instincts, experience, and communication. In recent years, player motion-tracking data for NBA games has become available through the development of sophisticated data collection tools. This thesis presents methods to construct a framework that can extract, transform, and analyze the motion-tracking data to automatically identify the presence of on-ball screens. The framework also helps NBA players and coaches adjust their game plans regarding the on-ball screen using trends from past games. With the help of support vector machines, the framework identifies on-ball screens with an accuracy of 85%, a considerable improvement over currently published results in the existing literature.
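The classification step described above can be sketched with scikit-learn. The features here (screener-to-defender distance, screener speed, a screen angle) and the synthetic data are invented for illustration; the thesis derives its actual features from real motion-tracking data, and its feature set is not reproduced here.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins for features one might derive from motion-tracking
# data around a candidate screen moment (all three are hypothetical):
# [screener-defender distance (ft), screener speed (ft/s), screen angle (rad)]
n = 400
screens = rng.normal([2.0, 1.0, 0.3], 0.5, size=(n // 2, 3))      # label 1
non_screens = rng.normal([9.0, 8.0, 1.5], 1.0, size=(n // 2, 3))  # label 0
X = np.vstack([screens, non_screens])
y = np.array([1] * (n // 2) + [0] * (n // 2))

# Train an RBF-kernel SVM and measure held-out accuracy.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)
accuracy = clf.score(X_te, y_te)
print(f"held-out accuracy: {accuracy:.2f}")
```

On real tracking data the classes overlap far more than in this toy setup, which is why the reported 85% accuracy is a meaningful result rather than a near-trivial one.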

    Committee: Sunnie Chung Ph.D. (Committee Chair); Yongjian Fu Ph.D. (Committee Member); Nigamanth Sridhar Ph.D. (Committee Member) Subjects: Artificial Intelligence; Computer Science
  • 18. Miracle, Jacob De-Anonymization Attack Anatomy and Analysis of Ohio Nursing Workforce Data Anonymization

    Master of Science in Cyber Security (M.S.C.S.), Wright State University, 2016, Computer Engineering

    Data generalization (anonymization) is a widely misunderstood technique for preserving individual privacy in non-interactive data publishing. Easily avoidable anonymization failures are still occurring 14 years after the discovery of basic techniques to protect against them. The identities of individuals in anonymized datasets are at risk of disclosure by cyber attackers who exploit these failures. To demonstrate the importance of proper data anonymization, we present three perspectives on the problem. First, we examine several de-anonymization attacks to formalize the anatomy used to conduct attacks on anonymous data. Second, we examine the vulnerabilities of an anonymous nursing workforce survey to convey how this attack anatomy can still be applied to recently published anonymous datasets. We then analyze the impact proper generalization techniques have on the utility of the nursing workforce data. Finally, we discuss the impact emerging technologies will have on the sophistication and feasibility of de-anonymization attacks in the future.
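The generalization defense the abstract refers to is commonly formalized as k-anonymity: every combination of quasi-identifier values must be shared by at least k records. A minimal sketch, using hypothetical survey records (not the actual nursing workforce data) and ZIP-code truncation as the generalization step:

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values appears in at
    least k rows, i.e. no record is distinguishable within a group
    smaller than k."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) >= k

def generalize_zip(row, digits=3):
    """Coarsen a ZIP code by truncation, a simple generalization step."""
    out = dict(row)
    out["zip"] = row["zip"][:digits] + "*" * (5 - digits)
    return out

# Hypothetical survey records.
records = [
    {"zip": "45435", "age_band": "30-39", "license": "RN"},
    {"zip": "45431", "age_band": "30-39", "license": "RN"},
    {"zip": "45435", "age_band": "30-39", "license": "RN"},
    {"zip": "45440", "age_band": "40-49", "license": "LPN"},
    {"zip": "45442", "age_band": "40-49", "license": "LPN"},
]

qi = ["zip", "age_band"]
print(is_k_anonymous(records, qi, k=2))       # -> False: exact ZIPs isolate people
generalized = [generalize_zip(r) for r in records]
print(is_k_anonymous(generalized, qi, k=2))   # -> True after truncating ZIPs
```

The trade-off the abstract analyzes is visible even here: truncating ZIP codes buys anonymity at the cost of geographic precision, i.e. data utility.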

    Committee: Michelle Cheatham Ph.D. (Committee Chair); John Gallagher Ph.D. (Committee Member); Thomas Wischgoll Ph.D. (Committee Member); Robert Fyffe Ph.D. (Other); Mateen Rizki Ph.D. (Other) Subjects: Computer Engineering; Computer Science; Information Science; Information Technology
  • 19. Marupudi, Surendra Brahma Framework for Semantic Integration and Scalable Processing of City Traffic Events

    Master of Science (MS), Wright State University, 2016, Computer Science

    Intelligent traffic management requires analysis of a large volume of multimodal data from diverse domains. For the development of intelligent traffic applications, we need to address diversity in observations from physical sensors, which provide weather, traffic flow, and parking information; we also need to do the same for social media, which provides live commentary on various events in a city. The extraction of relevant events and the semantic integration of numeric values from sensors, unstructured text from Twitter, and semi-structured data from city authorities is a challenging physical-cyber-social data integration problem. To address the challenges of both scalability and semantic integration, we developed a semantics-enabled distributed framework that supports processing of multimodal data arriving at high volume. To semantically integrate complementary, traffic-event-related data from multimodal streams, we developed a Traffic Event Ontology consistent with the Semantic Web approach. We used Apache Spark and the Parquet data store to address the volume issue and to build a scalable infrastructure that can process and extract traffic events from historical as well as streaming data from 511.org (sensor data) and Twitter (textual data). We present a large-scale evaluation of our system on real-world traffic-related data from the San Francisco Bay Area over one year, with promising results. Our scalable approach decreased the processing time of the test case presented in this work from two months to less than 24 hours. We evaluated scalability by varying input data loads, and the system's performance remained stable. Additionally, we evaluated the performance of our semantic integration method by answering questions about traffic anomalies using multimodal data.
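The integration idea — normalizing a numeric sensor reading and a free-text tweet into a common event record typed by an ontology class — can be sketched without the Spark machinery. The `TrafficEvent` record, the keyword lexicon, and the congestion threshold below are all hypothetical simplifications of the Traffic Event Ontology and extraction logic described in the abstract.

```python
from dataclasses import dataclass

@dataclass
class TrafficEvent:
    """Minimal stand-in for an instance of the Traffic Event Ontology."""
    event_type: str   # ontology class, e.g. "Accident", "Congestion"
    location: str
    source: str       # "511" or "twitter"

# Hypothetical keyword lexicon mapping text cues to ontology classes.
LEXICON = {"crash": "Accident", "accident": "Accident",
           "jam": "Congestion", "backed up": "Congestion"}

def from_sensor(reading):
    """Map a 511-style speed reading to an event: traffic moving at less
    than half the free-flow speed is reported as Congestion."""
    if reading["speed_mph"] < 0.5 * reading["free_flow_mph"]:
        return TrafficEvent("Congestion", reading["segment"], "511")
    return None

def from_tweet(text, location):
    """Extract an event from tweet text via the keyword lexicon."""
    lowered = text.lower()
    for cue, cls in LEXICON.items():
        if cue in lowered:
            return TrafficEvent(cls, location, "twitter")
    return None

events = [e for e in (
    from_sensor({"segment": "US-101 N", "speed_mph": 18, "free_flow_mph": 65}),
    from_tweet("Huge crash near the Bay Bridge, avoid!", "Bay Bridge"),
    from_tweet("Lovely sunset over the bay tonight", "Embarcadero"),
) if e is not None]
print(events)
```

Once both modalities emit the same typed record, downstream queries (e.g. "all Congestion events near an Accident") no longer care whether an observation came from a sensor or a tweet — which is the point of the ontology layer.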

    Committee: Amit P. Sheth Ph.D. (Advisor); Krishnaprasad Thirunarayan Ph.D. (Committee Member); Tanvi Banerjee Ph.D. (Committee Member) Subjects: Computer Science
  • 20. Li, Yuanxu HealthyLifeData Analytics: A DATA ANALYTICS TOOL FOR THE HealthyLifeHRA HEALTH RISK ASSESSMENT SYSTEM

    Master of Sciences, Case Western Reserve University, 2016, EECS - Computer and Information Sciences

    Traditional HRA (Health Risk Assessment) tools mostly focus on providing questionnaires and generating reports for users. However, given the need for more detailed information on the relationships between people's lifestyles and health risks, a new data analytics tool for HRA is necessary. This thesis proposes and implements a family of data analytics tools as part of the HealthyLife HRA application. It consists of Data Analytics A – Population and Range Based Aggregation and Visualization Queries, Data Analytics B – Time Series Queries, and Data Analytics C – Single User Targeted Time Series Queries. The tool has three front-end graphical interfaces, one for each family member, and a back-end execution engine. It enables users to specify general and time-series queries in a simple yet expressive way, without any prior knowledge of the database structure or SQL. Visualization functionality is also provided as part of the tool.
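The core of such a tool is translating a GUI-level query specification into SQL so the user never writes it by hand. A minimal sketch, with an invented schema and spec format (the actual HealthyLife HRA schema and interface are not reproduced here); a production version would use parameterized queries rather than string formatting:

```python
import sqlite3

def build_aggregation_query(table, measure, group_by, age_range):
    """Translate a simple GUI-style query spec (all names hypothetical)
    into SQL: average a measure over a population filtered by age."""
    lo, hi = age_range
    return (f"SELECT {group_by}, AVG({measure}) FROM {table} "
            f"WHERE age BETWEEN {lo} AND {hi} GROUP BY {group_by}")

# In-memory stand-in for an HRA answers table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE hra (user_id INT, age INT, gender TEXT, bmi REAL)")
con.executemany("INSERT INTO hra VALUES (?, ?, ?, ?)", [
    (1, 34, "F", 22.1), (2, 41, "M", 27.4),
    (3, 38, "M", 25.0), (4, 52, "F", 24.2),
])

# The GUI would assemble this spec from dropdowns and sliders.
sql = build_aggregation_query("hra", "bmi", "gender", (30, 45))
rows = con.execute(sql).fetchall()
print(rows)   # average BMI per gender for the 30-45 age band
```

Keeping query construction behind a spec-to-SQL layer is what lets the front end stay expressive while hiding the schema from the user, as the abstract describes.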

    Committee: Gultekin Ozsoyoglu (Advisor) Subjects: Computer Science; Health Care