Graphs are an important tool for representing discrete data. Recently, graph representations have enabled powerful machine learning approaches such as manifold learning, kernel methods, and semi-supervised learning. With the advent of large-scale real-world networks, such as biological networks (disease networks, drug-target networks, etc.) and social networks (the DBLP co-authorship network, Facebook friendship networks, etc.), machine learning and data mining algorithms have found new application areas and have advanced our understanding of the properties and phenomena governing real-world networks.
When dealing with real-world data represented as networks, two problems arise naturally:
I) How can the knowledge encoded in multiple heterogeneous networks be integrated and aligned? For instance, how can similar genes be identified across co-disease and protein-protein interaction networks?
II) How can the evolution of a dynamic network be modeled and predicted? For example, given N yearly snapshots of an evolving social network, how can we build a model that captures the temporal evolution and makes reliable predictions?
In this dissertation, we present an innovative graph embedding framework that identifies the key components for modeling the temporal evolution of a dynamic graph. Unlike many state-of-the-art graph link prediction and modeling algorithms, it formulates the link prediction problem from a geometric perspective that can capture the dynamics of the evolution of the intrinsic continuous graph manifold. The framework is attractive for its simplicity and for its potential to relax the mining problem into a feasible domain, which enables standard machine learning and regression models to utilize historical graph time series data.
To address the first problem, we first propose a novel probability-based similarity measure, which leads to promising applications in content-based image retrieval and image annotation. We then present a manifold alignment framework for aligning multiple heterogeneous networks, which demonstrates its power in mining biological networks.
Finally, the dynamic graph mining framework generalizes most current dynamic link prediction algorithms based on graph embedding. Comprehensive experimental results on both synthetic and real-world datasets demonstrate that our proposed algorithmic framework for multiple heterogeneous networks and dynamic networks can lead to a better and more insightful understanding of real-world networks. We also address the scalability of our algorithms by employing a MapReduce cloud computing architecture.
Committee: Anca Ralescu, PhD (Committee Chair); Anil Jegga, DVM, MRes (Committee Member); Fred Annexstein, PhD (Committee Member); Kenneth Berman, PhD (Committee Member); Yizong Cheng, PhD (Committee Member); Dan Ralescu, PhD (Committee Member)