Search Results

(Total results 5)

  • 1. Cunningham, James Efficient, Parameter-Free Online Clustering

    Master of Science, The Ohio State University, 2020, Computer Science and Engineering

    As the number of data sources and dynamic datasets grows, so does the need for efficient online algorithms to process them. Clustering is an important data exploration tool used in fields ranging from data science to machine learning: it finds structure, without supervision, in otherwise indecipherable collections of data. Online clustering is the process of grouping samples in a data stream into meaningful collections as they appear over time, and as more data is collected in these streams it becomes more important than ever. Unfortunately, few efficient online clustering algorithms are available, because their design is difficult and often requires trading significant clustering performance for efficiency. Those that do exist require expert domain knowledge of the data space to set hyperparameters such as the desired number of clusters. This domain knowledge is often unavailable, so resources must be spent tuning hyperparameters to reach acceptable performance. In this thesis we present Stream FINCH (S-FINCH), an online modification to FINCH, the recent state-of-the-art parameter-free clustering algorithm by Sarfraz et al. We examine the stages of the FINCH algorithm and leverage key insights to produce an algorithm that reduces the online update complexity of FINCH. We then compare the performance of S-FINCH and FINCH over several toy and real-world datasets, showing theoretically and empirically that S-FINCH is more efficient than FINCH in the online domain and has reasonable real-time update performance. We also present several alternative cluster representatives that can be used to build different agglomerative cluster hierarchies with the S-FINCH algorithm, and we compare their cluster quality and clustering time against the original FINCH algorithm. The S-FINCH algorithm presented in this thesis allows for fast and efficient online (open full item for complete abstract)

    Committee: James Davis PhD (Advisor); Juan Vasquez PhD (Committee Member); Kyle Tarplee PhD (Committee Member); Wei-Lun Chao PhD (Committee Member) Subjects: Computer Science; Information Science; Information Technology
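The hyperparameter problem the abstract describes shows up even in the simplest single-pass online clusterer. As background only (this is the classic "leader" baseline, not the S-FINCH algorithm from the thesis), a minimal sketch in Python, where the distance threshold `tau` is exactly the kind of hand-tuned knob that parameter-free methods such as FINCH and S-FINCH aim to eliminate:

```python
from math import dist

def leader_cluster(stream, tau):
    """Classic single-pass 'leader' clustering: each arriving point joins
    the nearest cluster whose leader (founding point) lies within distance
    tau, otherwise it founds a new cluster.  O(k) work per sample, where k
    is the current number of clusters.  Background illustration only, not
    the thesis algorithm."""
    leaders, labels = [], []
    for x in stream:
        dists = [dist(x, r) for r in leaders]
        if dists and min(dists) <= tau:
            # join the nearest existing cluster
            labels.append(min(range(len(dists)), key=dists.__getitem__))
        else:
            # no leader within tau: start a new cluster led by this point
            leaders.append(x)
            labels.append(len(leaders) - 1)
    return labels
```

Choosing `tau` requires exactly the kind of domain knowledge the abstract says is often unavailable, which motivates parameter-free alternatives.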
  • 2. Dornala, Maninder Visualization of Clustering Solutions for Large Multi-dimensional Sequential Datasets

    Master of Computing and Information Systems, Youngstown State University, 2018, Department of Computer Science and Information Systems

    The research presented here focuses on the development of algorithms for analyzing data in the form of categorical time series or sequences. The wide availability of mobile devices and sensors connected to the Internet makes it easier to collect datasets that model long-term user behavior. Nevertheless, performing fundamental analytical operations, such as clustering to group these data by similarity patterns, has proved challenging due to the categorical nature of the data, the multiple variables to consider, and their corruption by missing values. Classical metric-type similarity distances must be replaced with "edit"-type distances, such as Optimal Matching. We developed this approach to study the effect of similarity measure choice on clustering and dimensionality reduction methods applied to long-term life cycle trajectories. The discovered patterns can help provide better decision making and public policy design.

    Committee: Alina Lazar PhD (Advisor); Bonita Sharif PhD (Committee Member); John Sullins PhD (Committee Member) Subjects: Computer Science; Information Systems; Information Technology
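As background on the "edit"-type distances mentioned in the abstract, a minimal sketch of an Optimal Matching distance between two categorical sequences. It is a Levenshtein-style dynamic program; the flat insertion/deletion and substitution costs here are an illustrative assumption (real OM analyses typically derive the substitution-cost matrix from the data, e.g. from transition rates):

```python
def optimal_matching(a, b, indel=1.0, sub=2.0):
    """Edit distance between two categorical sequences with a flat
    insertion/deletion cost and a flat substitution cost.  O(len(a) *
    len(b)) dynamic program; cost values are illustrative only."""
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel                     # delete all of a[:i]
    for j in range(1, m + 1):
        d[0][j] = j * indel                     # insert all of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else sub
            d[i][j] = min(d[i - 1][j] + indel,   # deletion
                          d[i][j - 1] + indel,   # insertion
                          d[i - 1][j - 1] + cost)  # match / substitution
    return d[n][m]
```

A matrix of such pairwise distances is what sequence-clustering and dimensionality-reduction methods then operate on.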
  • 3. Fitton, N Why and How to Report Distributions of Optima in Experiments on Heuristic Algorithms

    MS, University of Cincinnati, 2001, Engineering : Computer Science

    The goal of this thesis is to make good science easier to do by promoting and facilitating the comparison of algorithms. Algorithms for solving difficult problems, and in particular heuristic algorithms for intractable problems, often do not lend themselves to standard methods of theoretical analysis: thus a researcher must make an empirical, statistical argument for or against an algorithm's quality. This thesis discusses the design and evaluation of experiments in this sometimes slippery area of study. We contend that a contribution to science requires more than the reporting of an algorithm's optimal results: researchers should choose sample sizes rationally; they should then report, at minimum, both the distribution of results overall and the distribution of a set of optimal values, which may involve further sampling to reach desired goals; they should account for sampling itself by including confidence intervals for their claims; and if they compare results of different algorithms, then their comparisons should be statistically sound and rigorous. We offer an experimental procedure with guidelines for each of these steps, and within the procedure, we offer a new method for sampling and comparing sets of optimal values, which we call k% optima. We also present a demonstration experiment in which we apply our procedure to report and compare the performance of three heuristic algorithms for VLSI circuit partitioning.

    Committee: Dr. Carla Purdy (Advisor) Subjects:
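The reporting practices argued for in the abstract, distributions of results plus confidence intervals for claims, can be sketched generically. The `k_percent_optima` helper below is one plausible reading of the thesis's "k% optima" (the best k% of sampled results); the exact definition is in the full text. The percentile bootstrap is a standard, not thesis-specific, way to attach uncertainty to a reported statistic:

```python
import random
import statistics

def k_percent_optima(results, k):
    """Return the best k% of a sample of minimization results -- one
    plausible reading of the thesis's 'k% optima'; see the full text for
    the exact definition."""
    ordered = sorted(results)
    top = max(1, int(len(ordered) * k / 100))
    return ordered[:top]

def bootstrap_ci(sample, stat=statistics.mean, reps=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic of the
    sample: resample with replacement, recompute the statistic, and take
    the central 1 - alpha span of the resampled values."""
    rng = random.Random(seed)
    boots = sorted(stat(rng.choices(sample, k=len(sample)))
                   for _ in range(reps))
    lo = boots[int(reps * alpha / 2)]
    hi = boots[int(reps * (1 - alpha / 2)) - 1]
    return lo, hi
```

Reporting an interval rather than a single best value is what makes comparisons between stochastic heuristics statistically defensible.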
  • 4. Gadde, Srimanth Graph Partitioning Algorithms for Minimizing Inter-node Communication on a Distributed System

    Master of Science in Electrical Engineering, University of Toledo, 2013, College of Engineering

    Processing large graph datasets represents an increasingly important area in computing research and applications. The size of many graph datasets has grown well beyond the processing capacity of a single computing node, necessitating distributed approaches. As these datasets are processed over a distributed system of nodes, an inter-node communication cost problem arises (also known as inter-partition communication), negatively affecting system performance. This research proposes new graph partitioning algorithms that minimize inter-node communication while achieving a sufficiently balanced partition. First, an intuitive graph partitioning algorithm is developed using a Random Selection (RS) method coupled with Breadth First Search (BFS). Second, another graph partitioning algorithm is developed using Particle Swarm Optimization (PSO) with BFS to reduce inter-node communication further. Simulation results demonstrate that PSO with BFS gives better results (approximately 6% to 10% greater reduction) than the RS method with BFS. However, both algorithms minimize inter-node communication efficiently and thereby improve the performance of a distributed system.

    Committee: Robert Green (Committee Chair); Vijay Devabhaktuni (Committee Co-Chair); William Acosta (Committee Member); Mansoor Alam (Committee Member) Subjects: Computer Engineering; Computer Science
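A toy illustration of the BFS-growth idea behind the RS+BFS approach described above: grow one block breadth-first from a seed vertex until it holds half the vertices, so the block stays connected and crossing edges stay few. The thesis's actual algorithm additionally randomizes seed selection and targets balance explicitly; `adj` here is an adjacency-list dict, an assumption of this sketch:

```python
from collections import deque

def bfs_bipartition(adj, seed):
    """Grow one block by breadth-first search from `seed` until it holds
    half the vertices; the remaining vertices form the second block.
    A simplified stand-in for the thesis's RS+BFS partitioner."""
    target = len(adj) // 2
    block = {seed}
    q = deque([seed])
    while q and len(block) < target:
        u = q.popleft()
        for v in adj[u]:
            if v not in block and len(block) < target:
                block.add(v)
                q.append(v)
    return block, set(adj) - block

def cut_size(adj, block):
    """Count edges crossing the partition -- the inter-node communication
    cost the thesis seeks to minimize."""
    return sum(1 for u in block for v in adj[u] if v not in block)
```

Growing the block breadth-first keeps neighboring vertices on the same node, which is why it tends to produce small cutsets compared with arbitrary balanced splits.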
  • 5. MUPPIDI, SRINIVAS GENETIC ALGORITHMS FOR MULTI-OBJECTIVE PARTITIONING

    MS, University of Cincinnati, 2004, Engineering : Computer Engineering

    Circuit partitioning is the process of dividing a large circuit netlist into a set of smaller netlists while maintaining the design integrity. With the increasing complexity of micro-electronic circuits, it has become difficult to fit a large design on a single chip; instead of a single large chip, it is easier and more economical to spread the design across several smaller chips. It is imperative to consider the various design requirements as well as cutset minimization while partitioning a design. Effective utilization of available resources is impeded by several factors, including interchip wiring, the area requirements of the components, timing delay between components, and basic I/O requirements. When we try to meet or optimize all of these constraints simultaneously while partitioning a design, it is called multi-objective partitioning. Move-based algorithms such as the Kernighan-Lin algorithm, the Fiduccia-Mattheyses algorithm, and clustering algorithms are widely used for partitioning, but these are iterative algorithms that refine a solution by slight modifications rather than searching widely over the solution space. Evolution-based algorithms such as Simulated Annealing and Genetic Algorithms (GAs), by contrast, can search for solutions over a wide search space and refine those solutions as well. This thesis looks at the use of GAs for multi-objective partitioning. GAs can be classified into various modes based on their evolution strategies; we examine several modes of GAs and classify them according to their efficiency. In these algorithms we use population replacement, where part of the population of solutions is replaced to provide fresh impetus and keep the GA from getting stuck at a local minimum. Three different sets of benchmarks are considered for the evaluation of these different modes of GAs. They range from two-terminal bi-partitioning and multi-terminal bi-partitioning to multi-terminal multi-partition (open full item for complete abstract)

    Committee: Dr. Ranga Vemuri (Advisor) Subjects:
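The GA-with-population-replacement idea in the abstract can be sketched for the simplest case, balanced bi-partitioning of a graph with a single cutset objective. This is a generic GA sketch under assumed operators (bit-string chromosomes, one-point crossover, single bit-flip mutation, a balance penalty folded into fitness), not the multi-objective formulation or benchmark setup evaluated in the thesis:

```python
import random

def ga_bipartition(adj, pop_size=20, gens=60, replace_frac=0.3, seed=1):
    """Tiny GA for balanced min-cut bi-partitioning.  Each chromosome is a
    bit string mapping vertex -> block.  Each generation, the worst
    `replace_frac` of the population is replaced by fresh offspring --
    the 'population replacement' idea used to escape local minima."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    index = {u: i for i, u in enumerate(nodes)}
    n = len(nodes)

    def cut(bits):
        # each crossing edge is seen from both endpoints, hence // 2
        return sum(1 for u in nodes for v in adj[u]
                   if bits[index[u]] != bits[index[v]]) // 2

    def fitness(bits):
        # cutset size plus a penalty for unbalanced block sizes
        return cut(bits) + abs(2 * sum(bits) - n)

    def offspring(p, q):
        x = rng.randrange(1, n)            # one-point crossover
        c = p[:x] + q[x:]
        c[rng.randrange(n)] ^= 1           # single bit-flip mutation
        return c

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness)
        keep = pop[: pop_size - int(pop_size * replace_frac)]
        pop = keep + [offspring(rng.choice(keep), rng.choice(keep))
                      for _ in range(pop_size - len(keep))]
    best = min(pop, key=fitness)
    return best, cut(best)
```

Unlike move-based refinement, the crossover step can combine distant solutions, which is the wide-search property the abstract attributes to evolution-based methods.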