Search Results

(Total results 42)

  • 1. Umapathy, Prashanth An Analysis of GPT API for Wrangling Web Scraping Data

    Master of Science, The Ohio State University, 2024, Computer Science and Engineering

    In my thesis, I investigate three methods to extract product data such as brand, flavor, strain, units, and THC and CBD levels from online cannabis product stores, aiming to find the most effective approach. The process starts with Python's regex capabilities, a method that is quite precise but needs many specific rules to be set up. This technique pulls product details from websites using patterns, but it can get complicated because each distinct data format requires its own rule. After discussing regex, I introduce the use of the GPT LLM API, an artificial intelligence natural language processing tool that reads and understands product descriptions from raw product website data to extract information automatically. The goal here is to see whether this AI can do the job as well as or better than the manual method or the rule-based regex approach. It is a way to potentially streamline the process, reducing the need for so many specific rules. Then, I describe how we also used a manual method, where people collect the data by hand. This serves as a standard to measure the other methods against, providing a benchmark for accuracy and completeness. A significant part of my thesis is dedicated to explaining how I clean and organize the data from these methods, which is crucial for making it usable and reliable. I detail the strengths and limitations of the GPT API in this context, clarifying what it can handle and where it might need help. Furthermore, I thoroughly document all the procedures and rules used in the study. This is important for transparency and allows others to replicate or build on this work. In the end, I present two datasets, one extracted and corrected by humans and the other produced through the GPT extraction method. As results, I showcase the different levels of accuracy obtained through these approaches. 
Through this thesis, I shed light on the future of data extraction in specialized fields, for a shift towards more (open full item for complete abstract)

    Committee: Jian Chen (Advisor); Ce Shang (Committee Member) Subjects: Computer Science
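    The rule-based baseline described in this abstract can be sketched in a few lines. The field names and patterns below are hypothetical illustrations, not the thesis's actual extraction rules; a real store would need one pattern per data format encountered.

```python
import re

# Hypothetical patterns illustrating the rule-based approach; each distinct
# product-description format would need its own pattern in practice.
PATTERNS = {
    "thc": re.compile(r"THC[:\s]*([\d.]+)\s*(%|mg)", re.IGNORECASE),
    "cbd": re.compile(r"CBD[:\s]*([\d.]+)\s*(%|mg)", re.IGNORECASE),
    "units": re.compile(r"(\d+)\s*(?:pack|ct|count|units?)", re.IGNORECASE),
}

def extract_fields(description: str) -> dict:
    """Pull structured fields out of one raw product description."""
    result = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(description)
        if match:
            # join captured groups, skipping any that did not participate
            result[field] = "".join(g for g in match.groups() if g)
    return result
```

The brittleness the abstract describes is visible here: a page that writes "Total THC 85.5 percent" would slip past these patterns until a new rule is added.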
  • 2. Li, Haoyu Efficient Visualization for Machine-Learning-Represented Scientific Data

    Doctor of Philosophy, The Ohio State University, 2024, Computer Science and Engineering

    Recent progress in high-performance computing now allows researchers to run extremely high-resolution computational models, simulating detailed physical phenomena. Yet, efficiently analyzing and visualizing the extensive data from these simulations is challenging. Adopting machine learning models to reduce the storage cost of, or extract salient features from, large scientific data has proven to be a successful approach to analyzing and visualizing these datasets effectively. Machine learning (ML) models like neural networks and Gaussian process models are powerful tools for data representation. They can capture the internal structures or "features" of the dataset, which is useful for compressing the data or exploring the subset of data that is of interest. However, applying machine learning models to scientific data brings new challenges to visualization. Machine learning models are usually computationally expensive: neural networks are expensive to reconstruct on a dense grid representing a high-resolution scalar field, and Gaussian processes are notorious for their cubic time complexity in the number of data points. If we consider other variables in the data modeling, for example, the time dimension and the simulation parameters in ensemble data, the curse of dimensionality makes the computation cost even higher. The long inference time of machine learning models puts us in a dilemma between the high storage cost of the original data representation and the high computation cost of the machine learning representation. These challenges demonstrate a great need for techniques and algorithms that increase the speed of ML model inference. Despite many generic efforts to increase ML efficiency, for example, using better hardware acceleration or designing more efficient architectures, we tackle a more specific problem: how to query the ML model more efficiently for a specific scientific visualization task. 
In this dissertation, we c (open full item for complete abstract)

    Committee: Han-Wei Shen (Advisor); Hanqi Guo (Committee Member); Rephael Wenger (Committee Member) Subjects: Computer Engineering; Computer Science
  • 3. Hassan, Wael Comparing Geomorphometric Pattern Recognition Methods for Semi-Automated Landform Mapping

    Master of Science (MS), Ohio University, 2020, Geography (Arts and Sciences)

    Landscape regions and hydrological features such as wetlands, rivers, and lakes are frequently mapped and stored digitally as features. Their boundaries can be mapped and identified at the physically observable wetland-dryland interface. However, landforms such as mountains, hills, mesas, and valleys, which are cognized as component features of or objects attached to the terrestrial surface, are not easily delineated due to the lack of clear or unambiguous criteria for defining their boundaries. It is quite challenging to determine where the boundary of a mountain, hill, or valley starts and ends because terrain type, culture, language, and other subjective factors greatly affect how the same portion of the terrestrial surface may be discretized, classified, labeled, and characterized by people. Cartographers have traditionally used point and line symbols as labels to describe landforms on a map, but this approach ignores the problem of representing the possible physical shape and extent of landforms. This thesis advanced prior work in the fields of geomorphometry and geographic information science to test the viability of existing semi-automated terrain analysis methods for mesoscale landforms that are easily recognized by people because of local topographic and cultural salience. The focus was on finding methods that can help automate the extraction of three broad categories of landforms: non-linear eminences (e.g., peak, mount, pillar, mountain, hill, mesa, butte), linear eminences (e.g., ridge and spur), and linear depressions (e.g., channel, valley, and hollow). Three methods proposed by Wood (1996), Jasiewicz and Stepinski (2013), and Weiss (2001) were selected because they are popular in terrain characterization, have shown promising results for mapping discrete terrain features intended to resemble landforms recognized intuitively by people, and are easily available for experimentation in freely available software. 
These methods require onl (open full item for complete abstract)

    Committee: Gaurav Sinha Associate Professor (Committee Chair); Dorothy Sack Professor (Committee Member); Timothy Anderson Associate Professor (Committee Member) Subjects: Geography
  • 4. Dai, Honghao Unsupervised Learning Using Change Point Features Of Time-Series Data For Improved PHM

    PhD, University of Cincinnati, 2023, Engineering and Applied Science: Mechanical Engineering

    Prognostics and health management (PHM), which aims to convert preventive maintenance (periodic maintenance) into predictive maintenance (condition-based maintenance), has gained increasing attention in the current era of the Internet of Things (IoT), Industry 4.0, and Industrial AI. A significant amount of research has been conducted using a variety of signal processing, statistical analysis, and machine learning algorithms to develop different PHM systems. Feature learning is a crucial task in bridging the gap between data and models. Time-series data in sensor environments exhibit continuous changes and drifts, which require PHM models to balance static and time-independent uncertainty for feature learning. In this dissertation, a novel deep autoencoder with time-lagged regularization is proposed. This method can learn features from the time and frequency domains and detect underlying weak-sense stationarity. A change point detection strategy is developed by combining the time-lagged autoencoder with a dissimilarity-based anomaly detector. The effectiveness of the proposed change point detection algorithm is validated using public benchmarking datasets, fault detection and prognostics of ion milling etching machine data, non-artificial segment recognition, and long-term assessment of intracranial pressure signals. The proposed methodology is compared with state-of-the-art benchmark approaches and found to establish an improved PHM model with sustainable performance in discovering change point features in time-series signals.

    Committee: Jay Lee Ph.D. (Committee Chair); Brandon Foreman M.D. (Committee Member); Jing Shi Ph.D. (Committee Member); Jay Kim Ph.D. (Committee Member); Xiaodong Jia Ph.D. (Committee Member) Subjects: Mechanical Engineering
  • 5. Joly, Genevieve Comparing Semi-Automated Feature Extraction Methods for Mapping Topographic Eminences

    Master of Science (MS), Ohio University, 2023, Geography (Arts and Sciences)

    In current maps and geospatial datasets, representations of landforms such as mountains, hills, and ridgelines cannot be drawn to their full extent. Due to the lack of a clearly observable boundary, the visualizations of these features are often limited to a single point or line feature. This representation does not allow for an understanding of the true extent of landforms or the potential hierarchies that exist within the landscape. While manual attempts to delineate the extents of such features are always possible, they cannot be scaled to large areas with tens of thousands of features. In any case, there is no prescriptive way to delimit landforms, so no single set of delineated features can be considered sufficient for all people and contexts. In addition, the delineation of landforms depends on what type of landform is being searched for and the scale of delineation. Thus, this is not a deterministic process and needs to be context dependent. There needs to be flexibility, and the focus should be on customizable methods rather than canonical representations of landforms. The author of this thesis builds upon previous work within the field of geomorphometry and semi-automated feature extraction approaches by exploring and testing the applicability of several methods for delineating landforms within a range of study areas. The goal is to assess which methods produce linear (ridges) and non-linear eminences (peaks, summits, mountains) that match common-sense expectations for what these features should look like in the real world, and by extension on maps. The six methods explored within this research were proposed by Wood (1996), Jasiewicz and Stepinski (2013), Lundblad et al. (2006), Chaudhry and Mackaness (2008), Sinha (2008), and Miliaresis and Argialas (1999). 
The methods were selected based on their popularity within the research community and/or the author's judgment of the potential of the method for providing accurate mappings of terrain features (open full item for complete abstract)

    Committee: Gaurav Sinha (Advisor); Timothy Anderson (Committee Member); Dorothy Sack (Committee Member) Subjects: Geographic Information Science; Geography; Geomorphology
  • 6. Si, Gaoshoutong Improving the Quality of LiDAR Point Cloud Data for Greenhouse Crop Monitoring

    Master of Science, The Ohio State University, 2022, Food, Agricultural and Biological Engineering

    Crop monitoring is of great interest for improving production efficiency, especially in a controlled environment where high-value crops are grown. The advent of small unmanned aerial systems (sUAS) provides an opportunity to acquire high-quality spatial and temporal information for crop monitoring using point cloud data collected with Light Detection and Ranging (LiDAR). However, point clouds collected using LiDAR can have several limitations, such as occlusion, low point cloud density, outliers, and geometrical distortion, which must be addressed before the data can be used effectively for further applications. It is necessary to preprocess the data so that the information extracted from the point cloud is accurate and reliable. It also becomes critical to collect multiple point clouds from different viewing perspectives; these point clouds then need to be stitched together to address concerns related to occlusion and low point cloud density. This study addressed the challenges of adapting the Iterative Closest Point (ICP) algorithm for a greenhouse environment application. A pipeline for point cloud registration was established and evaluated to process the LiDAR data collected in a greenhouse. An experiment was conducted in a commercial greenhouse in which point cloud data of crops were collected using a LiDAR mounted on an sUAS. The pipeline identifies the ground floor boundary as a key subset and uses it to improve the initial condition, called coarse registration. Then the ICP algorithm is performed to achieve a fine registration. This pipeline was applied to different combinations of point cloud data collected from multiple viewing perspectives. The performance of point cloud registration was evaluated using metrics including visualization, Root Mean Square Error (RMSE), estimation of the volume of reference objects, and the distribution of point cloud density. 
This study finds that point cloud registration is affected by several factors, including the overlapped ratio between point clouds, quality (open full item for complete abstract)

    Committee: Sami Khanal (Advisor); Peter Ling (Advisor) Subjects: Agricultural Engineering
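    The fine-registration step mentioned above rests on the standard ICP loop. The sketch below is a minimal brute-force version assuming rigid motion and full point overlap; it does not attempt the thesis's coarse registration from the ground floor boundary.

```python
import numpy as np

def best_rigid_transform(A, B):
    """Least-squares rotation R and translation t mapping points A onto B (Kabsch)."""
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    H = (A - ca).T @ (B - cb)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cb - R @ ca

def icp(source, target, iters=30):
    """Alternate nearest-neighbor matching with rigid re-alignment
    (brute-force matching for clarity; real pipelines use a k-d tree)."""
    src = source.copy()
    for _ in range(iters):
        dists = ((src[:, None, :] - target[None, :, :]) ** 2).sum(axis=-1)
        matched = target[dists.argmin(axis=1)]
        R, t = best_rigid_transform(src, matched)
        src = src @ R.T + t
    return src
```

Convergence depends on the initial misalignment, which is exactly why the pipeline's coarse registration step matters before ICP is applied.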
  • 7. Xu, Jiayi Scalable Extraction and Visualization of Scientific Features with Load-Balanced Parallelism

    Doctor of Philosophy, The Ohio State University, 2021, Computer Science and Engineering

    Extracting and visualizing features from scientific data can help scientists derive valuable insights. An extraction and visualization pipeline usually includes three steps: (1) scientific feature detection, (2) union-find for features' connected component labeling, and (3) visualization and analysis. As the scale of scientific data generated by experiments and simulations grows, it has become common practice to use distributed computing to handle large-scale data with data-parallelism, where data is partitioned and distributed over parallel processors. Three challenges arise for feature extraction and visualization in scientific applications. First, traditional feature detectors may not be effective and robust enough to capture features of interest across different scientific settings, because scientific features are usually highly nonlinear and are recognized through domain scientists' soft knowledge. Second, existing union-find algorithms are either serial or not scalable enough to deal with the extreme-scale datasets generated in the modern era. Third, existing parallel feature extraction and visualization algorithms fail to automatically reduce communication costs when optimizing the performance of processing units. This dissertation studies scalable scientific feature extraction and visualization to tackle these three challenges. First, we design human-centric interactive visual analytics based on scientists' requirements to address domain-specific feature detection and tracking. We focus on an essential problem in earth sciences: spatiotemporal analysis of viscous and gravitational fingers. Viscous and gravitational flow instabilities cause a displacement front to break up into finger-like fluids. Previously, scientists mainly detected finger features using density thresholding, where scientists specify certain density thresholds and extract super-level sets from input density scalar fields. 
However, the results of density thresholding are sensitive to the select (open full item for complete abstract)

    Committee: Han-Wei Shen (Advisor); Rephael Wenger (Committee Member); Jian Chen (Committee Member) Subjects: Computer Engineering; Computer Science
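    Step (2) of the pipeline, connected component labeling with union-find, can be sketched serially as below; the dissertation's contribution is a scalable distributed version, which this toy example does not attempt.

```python
class UnionFind:
    """Disjoint-set forest with path halving and union by rank."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]   # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1

def label_components(mask):
    """Group 4-connected True cells of a 2D boolean grid
    (e.g. a thresholded super-level set) into components."""
    rows, cols = len(mask), len(mask[0])
    uf = UnionFind(rows * cols)
    for i in range(rows):
        for j in range(cols):
            if not mask[i][j]:
                continue
            if i > 0 and mask[i - 1][j]:
                uf.union(i * cols + j, (i - 1) * cols + j)
            if j > 0 and mask[i][j - 1]:
                uf.union(i * cols + j, i * cols + j - 1)
    return uf
```

The scalability problem the abstract raises is visible here: this single-pass loop assumes the whole grid fits on one processor, which extreme-scale data does not.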
  • 8. Nemati, Mohammadreza Machine Learning Approaches in Kidney Transplantation Survival Analysis using Multiple Feature Representations of Donor and Recipient

    Master of Science, University of Toledo, 2020, Engineering (Computer Science)

    Kidney transplantation is the therapy of choice for many people suffering from end-stage renal disease (ESRD). A successful kidney transplant can enhance a patient's standard of living and diminish their risk of dying. Also, people who undergo kidney transplantation do not require hours of dialysis treatment on a regular basis. Although this is the optimal treatment, transplanted kidneys do not work perpetually, and kidney re-transplantation may be required. There is thus a high demand for kidneys, and an ever-increasing number of people have to wait to get them; fewer people would have to wait if average kidney survival times could be increased. One of the critical factors that can impact survival times is Human Leukocyte Antigen (HLA) matching between donors and recipients. Using machine learning (ML) based predictive survival analysis algorithms, this research carries out an analysis for patients with ESRD, taking into account a novel representation of clinical features to measure the relation between the clinical covariates and graft survival time. The results of four survival algorithms on four feature representations suggest that gradient boosting (GB) has the highest accuracy in predicting post-transplant kidney survival time. Moreover, comparison of the basic feature representation with the other three representations, including mismatches, HLA types, and HLA pairs, shows that incorporating them into the proposed models can enhance prediction power. By avoiding a drop in prediction accuracy, the information obtained from HLA pairs also enables a novel HLA pair analysis method. Furthermore, the results of this analysis indicate that some HLA pairs can have an advantageous or disadvantageous impact on kidney graft survival time beyond the number of mismatches.

    Committee: Kevin Xu (Committee Chair); Stanislaw Stepkowski (Committee Member); Ahmad Javaid (Committee Member) Subjects: Computer Science
  • 9. Tian, Runfeng An Enhanced Approach using Time Series Segmentation for Fault Detection of Semiconductor Manufacturing Process

    MS, University of Cincinnati, 2019, Engineering and Applied Science: Mechanical Engineering

    The semiconductor etching process is an essential and complex manufacturing process in which degradation is unobservable. Due to issues related to data quality and limited data quantity, fault detection for the semiconductor etching process remains difficult. Dozens of past studies focused on developing algorithms based on local models to adapt to process drift; however, the issues mentioned above have not been solved completely. To improve data and feature quality, an enhanced feature extraction approach using time series segmentation can be implemented. This approach absorbs the advantages of both statistical features and structural features. Meanwhile, selecting a suitable time series segmentation algorithm for feature extraction during fault detection is also important. This thesis focuses on fault detection of the semiconductor etching process and the implementation of the enhanced feature extraction algorithm using time series segmentation, as well as a comparison of three residual-based algorithms and a benchmark of three time series segmentation algorithms. The enhanced feature extraction algorithm is based on time series segmentation instead of conventional feature extraction, and its implementation requires dynamic time warping and other techniques. Performances of the different algorithms are evaluated, and the results are discussed. By implementing time series segmentation for feature extraction, improvement in model performance is observed in this study, with a high fault detection rate and a low false alarm rate compared to results using conventional feature extraction methods.

    Committee: Jay Lee Ph.D. (Committee Chair); Janet Jiaxiang Dong Ph.D. (Committee Member); Jay Kim Ph.D. (Committee Member) Subjects: Mechanical Engineering
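    Dynamic time warping, which the segmentation step relies on, aligns two sequences through a shortest-cost path in a cumulative distance table. A minimal sketch (textbook formulation, not the thesis's implementation):

```python
def dtw_distance(a, b):
    """Classic O(n*m) dynamic time warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    INF = float("inf")
    # D[i][j] = cost of the best alignment of a[:i] with b[:j]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # stretch a
                                 D[i][j - 1],      # stretch b
                                 D[i - 1][j - 1])  # advance both
    return D[n][m]
```

Unlike Euclidean distance, DTW tolerates the timing drift between cycles that the abstract identifies as a core difficulty of the etching process data.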
  • 10. Bharadwaj, Akshay A Perception Payload for Small-UAS Navigation in Structured Environments

    Master of Science (MS), Ohio University, 0, Electrical Engineering & Computer Science (Engineering and Technology)

    Unmanned Aircraft Systems (UAS) are proving to be increasingly favorable in military and commercial applications. The range of applications includes surveillance, aerial photography, environmental observation, search and rescue, mapping, forestry, agricultural survey, and law enforcement, among many others. Small unmanned multi-copters are highly capable and cost effective for low altitude operations and have extended access to dangerous and hazardous environments that were previously unreachable. Irrespective of the application, a position and navigation solution is necessary to fly the UAS completely autonomously or even to control it manually with ease. The Global Navigation Satellite System (GNSS) has become one of the most dependable solutions for position and navigation outdoors but does not perform well in indoor environments, as the signal is obstructed by the roof and walls. Hence, there is a need for non-Global Positioning System (GPS) position and navigation methods for indoors. Simultaneous Localization and Mapping (SLAM) and feature-based integrated navigation are two methods that can be used for this purpose, using various types of sensors such as ranging sensors, cameras, and Inertial Measurement Units (IMU). This thesis focuses on integrating depth imagery, Short Wave Infrared (SWIR) imagery, and Long Wave Infrared (LWIR) imagery with an IMU to obtain an estimate of both the position and the map of the environment. In this discussion, the region of operation is restricted to structured environments and will be extended to unstructured environments in the future. This work includes preliminary flight test results from a small-size Blackout quadcopter operated in a structured indoor environment for maintenance purposes. The quadcopter has been equipped with a 3DR Pixhawk flight controller and an Odroid XU4 onboard computer running Ubuntu. 
The Robotics Operating System (ROS) is used to interface with and integrate all the sensors and control (open full item for complete abstract)

    Committee: Maarten Uijt de Haag (Advisor); Frank Van Graas (Committee Member); Jim Zhu (Committee Member); Martin J Mohlenkamp (Committee Member) Subjects: Electrical Engineering
  • 11. Dhakal, Parashar Novel Architectures for Human Voice and Environmental Sound Recognition using Machine Learning Algorithms

    Master of Science, University of Toledo, 2018, Electrical Engineering

    Real-time voice recognition and environmental sound detection play an important role in the fields of security, home control systems, robotics, and speech forensics. The advantages and potential need in these industries have been a great motivation behind this work. The task of voice recognition and environmental sound detection is challenging due to high variability in sound signals. Furthermore, the presence of environmental noise makes the task of recognition even more difficult. Various methods and architectures have been introduced for both voice and sound recognition to date. However, due to limitations in these architectures, we propose two different architectures, one for voice recognition and one for background sound detection, through which we try to overcome the limitations of architectures previously proposed by various researchers. For environmental sound detection, we present a real-time method in which features are extracted using standard signal processing techniques and classification is done using standard ML-based classifiers. The extracted features are time-domain features like ZCR and STE and frequency-domain features like SC, SR, and SF. Pitch was determined using the Average Magnitude Difference Function (AMDF). For classification, we used robust and accurate ML techniques like SVM, RF, and DNN. Similarly, for voice recognition, we present a novel pipelined real-time end-to-end voice recognition architecture that enhances the performance of voice recognition by exploiting the advantages of GF and CNN. This architecture has been developed to provide a voice-user interface and to aid in voice-based authentication and integration with an existing NLP system. Gaining secure access to existing NLP systems also served as one of the primary goals. 
Initially, in this work, we identify challenges related to real-time voice recognition and highlight the up-to-date research in the fiel (open full item for complete abstract)

    Committee: Vijay Devabhaktuni (Committee Chair); Ahmad Javaid (Committee Co-Chair); Richard Molyet (Committee Member) Subjects: Computer Engineering; Electrical Engineering
  • 12. Warrier, Gayathri Multi-Data Correlation in Papillary Thyroid Cancer

    Master of Science, The Ohio State University, 2017, Public Health

    Multi-modal data integration has large scope in cancer research, spanning diagnosis, therapy, and prognosis. The integration and analysis of different data types as one also has wide impact in personalized medicine. In papillary thyroid cancer, the most common but least aggressive type of thyroid cancer, integrating histopathology features with gene expression profiles, survival data, and tumor stage information can help improve prognosis of the condition. The study indicated that certain nuclear features extracted from slides are significantly associated with several genes. Notably, many features related most significantly to genes involved in immune responses and general immunity.

    Committee: Kevin Coombes (Committee Member); Kun Huang (Advisor); Courtney Hebert (Committee Member); Susan Olivo-Marston (Committee Member) Subjects: Bioinformatics; Public Health
  • 13. Deshpande, Sagar Semi-automated Methods to Create a Hydro-flattened DEM using Single Photon and Linear Mode LiDAR Points

    Doctor of Philosophy, The Ohio State University, 2017, Geodetic Science

    LiDAR pulses are mostly absorbed by water bodies, thereby creating voids. The LiDAR points available over water surfaces are not reliable due to near-surface features such as ripples, waves, or near-surface ground objects. A bare ground DEM surface created using such points results in an uneven water surface that appears unnatural and cartographically unpleasing. Contours created from such a surface are not consistent with USGS contours, which are produced using traditional methods. Hence, the LiDAR point cloud needs to be hydro-flattened to produce a bare ground surface consistent with traditionally produced DEMs. Hydro-flattening is the process of creating a LiDAR-derived DEM in which the water surfaces appear and behave as they would in a traditional topographic DEM generated from photogrammetric digital terrain models (DTMs). Hydro-flattened DEMs created using LiDAR data exclude LiDAR points over water bodies and include three-dimensional (3D) bank shorelines. In this dissertation, a methodology for creating hydro-flattened bare ground surfaces using linear mode (LM) or Single Photon (SP) LiDAR point clouds is presented. First, the properties of both sensors are compared and the need for hydro-flattening is discussed. Then, the method is described in detail for both sensors. A LiDAR point cloud and an approximate stream centerline are the primary data for this process. In the first step, a continuous bare ground surface (CBGS) is created by eliminating non-ground LiDAR points and adding artificial underwater points. In the second step, the lowest elevation from the LiDAR point cloud within a radius of the river centerline is used to create a virtual water surface (VWS). This VWS is revised to account for water surface undulations such as ripples or waves, protruding underwater objects, etc. The revised VWS is then intersected with the CBGS to locate the two-dimensional (2D) bank shorelines. 
The 2D shorelines are assigned the (open full item for complete abstract)

    Committee: Alper Yilmaz PhD (Advisor); Alan Saalfeld PhD (Committee Member); Charles Toth PhD (Committee Member) Subjects: Civil Engineering; Geographic Information Science
  • 14. Geary, Kevin Color Feature Integration with Directional Ringlet Intensity Feature Transform for Enhanced Object Tracking

    Master of Science (M.S.), University of Dayton, 2016, Electrical Engineering

    Object tracking, both in wide area motion imagery (WAMI) and in general use cases, is often subject to many different challenges, such as illumination changes, background variation, rotation, scaling, and object occlusions. As WAMI datasets become more common, so too do color WAMI datasets. When color data is present, it can offer very strong potential features to enhance the capabilities of an object tracker. A novel color histogram-based feature descriptor is proposed in this thesis research to improve the accuracy of object tracking in challenging sequences where color data is available. The use of a three-dimensional color histogram is explored, and various color spaces are tested. It is found to be effective but overly costly in terms of calculation time when comparing reference features to the test features. A reduced, two-dimensional histogram is proposed, created from three-channel color spaces by removing the intensity/luminosity channel before calculating the histogram. The two-dimensional histogram is also evaluated as a feature for object tracking, and it is found that the HSV two-dimensional histogram performs significantly better than other color space histograms, and that it performs at a level very near that of the three-dimensional histogram while being an order of magnitude less complex in the feature distance calculation. The proposed color feature descriptor is then integrated with the Directional Ringlet Intensity Feature Transform (DRIFT) object tracker. The two-dimensional HSV color histogram is enhanced further by using the DRIFT Gaussian ringlets as a mask for the histogram, resulting in a set of weighted histograms as the color feature descriptor. This is calculated alongside the existing DRIFT features of intensity and Kirsch mask edge detection. 
The distance scores for the color feature and DRIFT features are calculated separately, given the same weight, and then added together to form the final hybrid featu (open full item for complete abstract)

    Committee: Vijayan Asari Ph.D. (Committee Member); Eric Balster Ph.D. (Committee Member); Theus Aspiras Ph.D. (Committee Member) Subjects: Computer Engineering
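    The core idea of the color descriptor, dropping the intensity/value channel and histogramming hue against saturation, can be sketched as below; the bin count is an illustrative choice, and the DRIFT ringlet weighting is omitted.

```python
import numpy as np

def hs_histogram(hsv_pixels, bins=8):
    """2-D hue-saturation histogram of HSV pixels (channels scaled to [0, 1]);
    the V (intensity) channel is discarded, as described above."""
    h = np.minimum((hsv_pixels[:, 0] * bins).astype(int), bins - 1)
    s = np.minimum((hsv_pixels[:, 1] * bins).astype(int), bins - 1)
    hist = np.zeros((bins, bins))
    np.add.at(hist, (h, s), 1)        # unbuffered accumulation of counts
    return hist / hist.sum()          # normalize so histograms are comparable
```

Comparing reference and test features now touches bins^2 entries instead of bins^3, which is where the order-of-magnitude saving in the distance calculation comes from.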
  • 15. Madeti, Preetham Using Apache Spark's MLlib to Predict Closed Questions on Stack Overflow

    Master of Computing and Information Systems, Youngstown State University, 2016, Department of Computer Science and Information Systems

    Monitoring post quality on the Stack Overflow website is of critical importance to keep the experience smooth for its users. The site strongly discourages unproductive discussion and unrelated questions. Questions can get closed for several reasons, ranging from questions that are unrelated to programming to questions that do not lead to a productive answer. Manual moderation of the site's content is a tedious task, as approximately seventeen thousand new questions are posted every day. Therefore, leveraging machine learning algorithms to identify bad questions would be a smart and time-saving approach for the community. The goal of this thesis is to build a machine learning classifier that can predict whether a question will be closed, given various textual and post-related features. A training model was created using Apache Spark's machine learning libraries. This model not only predicts closed questions with good accuracy, but also computes the result in a very small time frame.

    Committee: Alina Lazar PhD (Advisor); Bonita Sharif PhD (Committee Member); Yong Zhang PhD (Committee Member) Subjects: Computer Science; Information Systems
  • 16. Doo, Seung Ho Analysis, Modeling & Exploitation of Variability in Radar Images

    Doctor of Philosophy, The Ohio State University, 2016, Electrical and Computer Engineering

    This dissertation explores the variability in radar measurements that arises due to small changes in target aspect angle, proposes a target modeling approach with augmented point scatterers, and develops aspect-invariant feature extraction techniques. First, the causes of the measurement variability are evaluated quantitatively and qualitatively and attributed to four scatterer types in addition to the pulse compression process. Based on this analysis, an augmented point scatterer model is proposed that allows fast generation of realistic radar data. The proposed model is applied to targets used in the MSTAR data, and its results are compared with actual target measurements. In addition, the analysis of the measurement variability is also used to design feature vectors for target classification that are invariant to target aspect angle. The proposed features are designed to reduce and to exploit variability information in target data. A novel grid cell structure is designed for efficient target information extraction that shows high angular stability by considering the physical structures of potential targets. Lastly, target classification is undertaken using the proposed feature vectors, demonstrating their utility when working with measured radar data.
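
    The aspect-angle variability the dissertation analyzes can be seen in a minimal point-scatterer sketch: the coherent return of a few ideal point scatterers swings with sub-degree rotations because the scatterers' phase terms interfere. The scatterer positions, amplitudes, and wavelength below are illustrative assumptions, not values from the MSTAR targets or the augmented model itself.

```python
import cmath
import math

WAVELENGTH = 0.03  # metres (roughly X-band); an assumed value

def target_return(scatterers, aspect_deg):
    """Coherent sum of ideal point-scatterer responses at one aspect angle.

    scatterers: list of (x, y, amplitude) with positions in metres.
    Two-way propagation gives the 4*pi/lambda phase factor.
    """
    theta = math.radians(aspect_deg)
    k = 4.0 * math.pi / WAVELENGTH
    total = 0j
    for x, y, amp in scatterers:
        # Down-range projection of this scatterer at the given aspect.
        r = x * math.cos(theta) + y * math.sin(theta)
        total += amp * cmath.exp(1j * k * r)
    return total

target = [(0.0, 0.0, 1.0), (1.2, 0.4, 0.8), (-0.9, 1.1, 0.6)]
# Magnitude-squared return at three aspects only 0.2 degrees apart.
rcs = [abs(target_return(target, a)) ** 2 for a in (30.0, 30.2, 30.4)]
```

    Even these tiny angular steps change the summed magnitude noticeably, which is why features built directly on raw returns are aspect-sensitive and why the dissertation designs aspect-invariant feature vectors instead.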

    Committee: Graeme Smith (Advisor); Joel Johnson (Committee Member); Robert Burkholder (Committee Member) Subjects: Electrical Engineering
  • 17. Coleman, Ashley Feature Extraction using Dimensionality Reduction Techniques: Capturing the Human Perspective

    Master of Science (MS), Wright State University, 2015, Computer Science

    The purpose of this paper is to determine whether any of four commonly used dimensionality reduction techniques are reliable at extracting the same features that humans perceive as distinguishable features. The four dimensionality reduction techniques used in this experiment were Principal Component Analysis (PCA), Multi-Dimensional Scaling (MDS), Isomap, and Kernel Principal Component Analysis (KPCA). These four techniques were applied to a dataset of images consisting of five infrared military vehicles. Three of the five resulting dimensions of PCA matched a human feature. One of the five dimensions of MDS matched a human feature. Two of the five dimensions of Isomap matched a human feature. Lastly, none of the resulting dimensions of KPCA matched any of the features that humans listed. Therefore, PCA was the most reliable technique for extracting the same features as humans when given a set number of images.
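
    PCA, the technique the thesis found most reliable, can be sketched compactly: project centered data onto the eigenvectors of its covariance matrix, ordered by explained variance. The toy data below is an illustrative stand-in for the infrared vehicle images, which would first be flattened into feature vectors; the dominant direction plays the role of a strong human-perceivable feature.

```python
import numpy as np

def pca(X, n_components):
    """Return the top principal components and the projected data."""
    Xc = X - X.mean(axis=0)                 # centre each feature
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]       # re-sort by descending variance
    components = eigvecs[:, order[:n_components]]
    return components, Xc @ components

rng = np.random.default_rng(0)
# Toy data: variance concentrated along one direction, plus small noise.
X = rng.normal(size=(200, 1)) @ np.array([[3.0, 1.0, 0.2]]) \
    + 0.1 * rng.normal(size=(200, 3))
components, scores = pca(X, n_components=2)
```

    The first component recovers the dominant direction almost exactly, which is the behavior that lets a PCA dimension line up with a feature humans also notice.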

    Committee: Pascal Hitzler Ph.D. (Advisor); Mateen Rizki Ph.D. (Committee Member); John Gallagher Ph.D. (Committee Member) Subjects: Computer Science
  • 18. Bhattacharya, Arindam Gradient Dependent Reconstruction from Scalar Data

    Doctor of Philosophy, The Ohio State University, 2015, Computer Science and Engineering

    Computed Tomography (CT) is widely accepted as an important tool in medicine. Increasingly, CT is finding a wide variety of applications in materials science and engineering fields. CT is being used for non-destructive inspection and characterization of aeronautic and automobile components. These components have wide variations in geometry and material characteristics, from single solid pieces of metal such as aluminum to exotic composite materials, and from micro-scale engine parts to large-scale airplane tail fins. Unlike organic parts, machine parts often have 'sharp' features. Consequently, feature-sensitive reconstruction from volume data has seen sporadic but critical work in recent years. A number of these papers present algorithms to construct isosurfaces with sharp edges and corners from Hermite data, i.e. data containing the exact surface normals at the exact intersection of the surface and grid edges. Such surface normals are not available with CT data. In this thesis, we discuss some fundamental problems with the previous algorithms and the difficulties in using these algorithms on real CT (scalar) data, and further describe a new approach to feature reconstruction from volume data. Feature-sensitive reconstruction is based on the ability to approximate surface normals from scalar field gradients. Changes in gradients can also be used to measure local directions of geometric structures in CT data. We describe a method to extract fiber bundle directions from industrial CT of fiber composites using gradients.
    Specifically, this dissertation proposes: 1) a method to reconstruct isosurfaces from scalar data while preserving sharp features (edges and corners), given a scalar grid and the gradients at the grid locations; 2) a method to select the correct gradients at the grid locations to be used as input to the above algorithm; and 3) a method for extracting and visualizing fiber bundles in fiber-reinforced composites scanned with X-ray (open full item for complete abstract)
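
    The approach above hinges on approximating surface normals from scalar-field gradients. A minimal sketch of that first step is a central-difference gradient on a sampled scalar grid; the quadratic test field and grid size below are illustrative, not from the dissertation's CT data.

```python
def gradient_at(grid, i, j, k, spacing=1.0):
    """Central-difference gradient of a 3D scalar grid at interior (i, j, k)."""
    gx = (grid[i + 1][j][k] - grid[i - 1][j][k]) / (2.0 * spacing)
    gy = (grid[i][j + 1][k] - grid[i][j - 1][k]) / (2.0 * spacing)
    gz = (grid[i][j][k + 1] - grid[i][j][k - 1]) / (2.0 * spacing)
    return (gx, gy, gz)

# Sample f(x, y, z) = x^2 + y^2 + z^2 on a small grid.
# Its analytic gradient is (2x, 2y, 2z), so the estimate is easy to check.
N = 9
grid = [[[float(x * x + y * y + z * z) for z in range(N)]
         for y in range(N)] for x in range(N)]
g = gradient_at(grid, 3, 5, 2)
```

    Normalizing such a gradient gives the approximate surface normal at an isosurface crossing, which is what replaces the exact Hermite normals that CT data cannot provide.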

    Committee: Rephael Wenger (Advisor); Han-Wei Shen (Committee Member); Tamal Dey (Committee Member) Subjects: Computer Science
  • 19. Krieger, Evan Directional Ringlet Intensity Feature Transform for Tracking in Enhanced Wide Area Motion Imagery

    Master of Science (M.S.), University of Dayton, 2015, Electrical Engineering

    Object tracking in wide area motion imagery (WAMI) data may be subject to many challenges, including object occlusion, rotation, scaling, illumination changes, and background variations. In addition, the target objects for tracking in WAMI data are typically low resolution and captured in complex lighting conditions. A novel feature extraction method along with a preprocessing stage is proposed in this thesis research to improve the accuracy of object tracking in these challenging environments. The local preprocessing algorithm performs illumination and spatial enhancement. The illumination enhancement algorithm utilizes a self-tunable transformation function (STTF), which is a nonlinear inverse sine transform, along with an improved color restoration and a halo reduction technique. The spatial enhancement algorithm is a single image super resolution technique that uses Fourier phase features and an adaptive kernel regression technique on the intensity channel. An intelligent methodology for integrating the illumination and spatial enhancement algorithms is developed to improve the tracking performance in the WAMI data. A robust feature-based tracking solution based on the Gaussian ringlet intensity distribution (GRID) feature extraction method is proposed in this thesis. GRID uses Gaussian ring histograms to create features that are robust to object rotation, illumination, and partial object occlusion. However, certain conditions, such as background variations and object structural information distortions, continue to cause feature mismatching. Hence, a new ringlet masking strategy that utilizes the rotational invariance of the Gaussian ringlet and the directional edge information of the Kirsch kernel is proposed as the feature descriptor.
The new Directional Ringlet Intensity Feature Transform (DRIFT) descriptor weights the intensity and edge information of the reference object with the ringlet features to achieve robustness to object distortions and background variations (open full item for complete abstract)
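
    The directional edge information DRIFT draws on comes from the Kirsch kernel: eight directional 3x3 masks, with the edge magnitude at a pixel taken as the maximum response over all eight directions. The sketch below computes that response for one pixel; the tiny step-edge image is illustrative, and this is only the edge-detection building block, not the full DRIFT descriptor.

```python
NORTH = [[5, 5, 5], [-3, 0, -3], [-3, -3, -3]]  # Kirsch north mask

def rotate_kirsch(mask):
    """Rotate the mask's outer ring by one position (45 degrees)."""
    ring = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    vals = [mask[r][c] for r, c in ring]
    vals = vals[-1:] + vals[:-1]
    out = [[0] * 3 for _ in range(3)]
    for (r, c), v in zip(ring, vals):
        out[r][c] = v
    return out

def kirsch_masks():
    """All eight directional masks, generated by successive rotation."""
    masks, m = [], NORTH
    for _ in range(8):
        masks.append(m)
        m = rotate_kirsch(m)
    return masks

def kirsch_response(image, r, c):
    """Maximum directional Kirsch response at interior pixel (r, c)."""
    best = None
    for mask in kirsch_masks():
        resp = sum(mask[dr][dc] * image[r - 1 + dr][c - 1 + dc]
                   for dr in range(3) for dc in range(3))
        best = resp if best is None else max(best, resp)
    return best

# Horizontal step edge: a bright row above two dark rows.
img = [[10, 10, 10],
       [0, 0, 0],
       [0, 0, 0]]
```

    The north-facing mask aligns with the bright row and dominates, while a uniform region gives zero response for every direction, since each mask's coefficients sum to zero.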

    Committee: Vijayan Asari (Committee Chair); Eric Balster (Committee Member); Russell Hardie (Committee Member) Subjects: Electrical Engineering
  • 20. Mora, Omar Morphology-Based Identification of Surface Features to Support Landslide Hazard Detection Using Airborne LiDAR Data

    Doctor of Philosophy, The Ohio State University, 2015, Civil Engineering

    Landslides are natural disasters that cause environmental and infrastructure damage worldwide. In order to reduce the future risk they pose, effective detection and monitoring methods are needed. Landslide susceptibility and hazard mapping is a method for identifying areas susceptible to landslide activity. This task is typically performed in a manual, semi-automatic, or automatic form, or a combination of these, and can be accomplished using different sensors and techniques. As landslide hazards continue to impact our environment and impede the lives of many, it is imperative to improve the tools and methods for effective and reliable detection of such events. Recent developments in remote sensing have significantly improved topographic mapping capabilities, resulting in higher spatial resolution and more accurate surface representations. Dense 3D point clouds can be directly obtained by airborne Light Detection and Ranging (LiDAR) or created photogrammetrically, allowing for better exploitation of surface morphology. The potential of extracting spatial features typical of landslides, especially small-scale failures, provides a unique opportunity to advance the landslide detection, modeling, and prediction process. The selection of this dissertation topic was motivated by three primary reasons. First, 3D data structures based on LiDAR data, including data representation, surface morphology, feature extraction, spatial indexing, and classification, in particular shape-based grouping, offer a unique opportunity for many 3D modeling applications. Second, massive 3D data, such as point clouds or surfaces obtained by state-of-the-art remote sensing technologies, have not been fully exploited for landslide detection and monitoring.
    Third, unprecedented advances in LiDAR technology and its availability to the broader mapping community should be explored at the appropriate level to assess the current and future advantages and limitations of LiDAR-based detection and modeling of landslides (open full item for complete abstract)
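
    One of the most basic morphological attributes derived from LiDAR-based elevation models in this kind of landslide mapping is surface slope. The sketch below computes slope in degrees from a gridded digital elevation model (DEM) using central differences; the tiny synthetic DEM and the one-metre cell size are illustrative assumptions, not the dissertation's method or data.

```python
import math

def slope_deg(dem, r, c, cell=1.0):
    """Slope (degrees) at an interior DEM cell via central differences.

    dem: 2D list of elevations on a regular grid with spacing `cell`.
    """
    dz_dx = (dem[r][c + 1] - dem[r][c - 1]) / (2.0 * cell)
    dz_dy = (dem[r + 1][c] - dem[r - 1][c]) / (2.0 * cell)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))

# A planar surface rising 1 m per cell in the x direction: slope = 45 deg.
dem = [[float(c) for c in range(5)] for _ in range(5)]
```

    Steep, rough, or abruptly changing slope cells are the kind of morphological signature that shape-based grouping and classification can then flag as candidate landslide features.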

    Committee: Dorota Grejner-Brzezinska (Advisor); Charles Toth (Advisor); Tien Wu (Committee Member) Subjects: Civil Engineering