This thesis addresses clustering problems in two distinct settings. In the first setting we explore the notion of what it means for a graph to admit a good partition into k-clusters. We say a partition of a graph is strong if each cluster has small external conductance, and large internal conductance. We consider a natural embedding of the graph into R^{k} involving the first k eigenvectors of the graph's Laplacian matrix. We show that for graphs which admit a sufficiently strong k-partition, much of the l_{2}-mass of each partition element concentrates around a distinct point in the embedding. Further, we show how to estimate the degree of concentration as well as the separation of the k embedding points from the input graph. We exploit this fact to design a simple greedy spectral clustering algorithm, spectral k-clustering, which returns a partition that is provably close to a strong partition, provided that the input graph admits one. A recent result shows that strong partitions exist for graphs with a sufficiently large spectral gap between the k-th and (k+1)-st eigenvalues. Taking this together with our main theorem gives a spectral algorithm which finds a good partition in graphs with a sufficiently large spectral gap. Finally, we provide an experimental evaluation of the algorithm on inputs which do not necessarily satisfy the conditions required by our theorems.
The second setting concerns metric spaces which change over time. Here, our goal is to find a temporally coherent clustering. We introduce a framework for clustering finite sequences of metric spaces taken from a common ambient metric space and study generalizations of some classic clustering problems to this setting. Formally, let P denote this sequence of metric subspaces, which we call a temporal-sampling. Given a collection of clusters, C, we quantify the quality of the clustering under multiple objectives: complexity (size, |.|), a spatial clustering cost (μ), and a temporal cost (δ) which is small when the clusters exhibit good locality. For a fixed choice of objectives, we say that a temporal-sampling admits a temporal (k, r, δ)-clustering for some k in N, r in R_{≥0}, δ in R_{≥0} if there exists a clustering C of P such that |C| <= k, μ(C) <= r, and δ(C) <= δ. For certain choices of objectives, we study the problem of deciding in polynomial-time whether or not a temporal-sampling admits a temporal (k, r, δ)-clustering when k, r, and δ are taken to be some functions of the input. Further, let α, β, γ be positive real numbers which are all at least 1. An algorithm is a temporal (α, β, γ)-approximation if it is guaranteed to return a temporal (α k, β r, γ δ)-clustering whenever P admits a temporal (k, r, δ)-clustering. We study the existence of polynomial-time temporal (α, β, γ)-approximations under various assumptions. We give several approximation algorithms of this kind and exhibit some conditions under which approximation is NP-hard. Last, we introduce a model for hierarchical temporal clusterings and give a polynomial-time approximation algorithm. |