Doctor of Philosophy, The Ohio State University, 2008, Electrical and Computer Engineering
A major goal of pattern recognition algorithms is to achieve the performance of the Bayes optimal rule, i.e., the minimum probability of classification error. Unfortunately, this goal is usually out of reach, because the original data distributions are unknown. This leaves us with the need to estimate the true, underlying class distributions from samples. This estimation procedure introduces classification errors of two major kinds. First, the form of the density function used in our estimate may not correctly describe the data. Second, noise and the limited amount of available data may yield incorrect estimates. The first problem typically arises when the data representations share a common norm (spherical data). Since the Gaussian model is much easier to estimate than spherical models, researchers generally resort to the former. In this thesis, we show that in some particular cases, which we name spherical-homoscedastic, one can use the Gaussian model and still obtain Bayes optimal classifications.
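The sketch below illustrates the modeling choice discussed above: class-conditional Gaussians are estimated from unit-norm (spherical) data and used in the Bayes decision rule. It is a minimal illustration on assumed toy data, not the spherical-homoscedastic analysis developed in the thesis; the function names and data are ours.

# Minimal sketch (not the thesis's algorithm): Gaussian class-conditional
# estimates applied to L2-normalized ("spherical") data, classified with
# the Bayes decision rule under the estimated model.
import numpy as np


def fit_gaussian(X):
    """Estimate the mean and (ridge-regularized) covariance of one class."""
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
    return mu, Sigma


def log_gaussian(x, mu, Sigma):
    """Log density of a multivariate Gaussian evaluated at x."""
    d = x - mu
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (d @ np.linalg.solve(Sigma, d) + logdet + len(x) * np.log(2 * np.pi))


def bayes_classify(x, class_params, priors):
    """Bayes rule under the estimated model: pick the largest log posterior."""
    scores = [np.log(p) + log_gaussian(x, mu, S)
              for (mu, S), p in zip(class_params, priors)]
    return int(np.argmax(scores))


# Toy "spherical" data: two classes projected onto the unit hypersphere.
rng = np.random.default_rng(0)
X0 = rng.normal(loc=[2.0, 0.0, 0.0], size=(100, 3))
X1 = rng.normal(loc=[0.0, 2.0, 0.0], size=(100, 3))
X0 /= np.linalg.norm(X0, axis=1, keepdims=True)
X1 /= np.linalg.norm(X1, axis=1, keepdims=True)

class_params = [fit_gaussian(X0), fit_gaussian(X1)]
print(bayes_classify(X0[0], class_params, priors=[0.5, 0.5]))  # most likely class 0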
We apply the developed theory to several practical problems, including text classification, gene expression analysis, and shape analysis. For the analysis of shapes, we introduce the new concept of rotation-invariant kernels and derive a criterion to select the parameter of this kernel so that the shape distributions become spherical-homoscedastic in the kernel space.
The second major problem is addressed by proposing a feature extraction algorithm that takes the Bayes optimality of the solution into account. As in the classification problem above, most algorithms defined to date extract the classification information based on some discriminant criterion rather than on the Bayes error itself. This is due to the difficulty of calculating the Bayes error. In the second part of this thesis, we design an algorithm that can extract the 1-dimensional subspace where the Bayes error is minimized for homoscedastic (i.e., same covariance) distributions.
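For background on this part, the sketch below works through the standard two-class case it builds on: for two homoscedastic Gaussians with equal priors, the Bayes error after projecting onto a direction w has a closed form and is minimized by a direction proportional to the inverse covariance times the mean difference. This is illustrative background only, not the thesis's algorithm.

# Standard two-class result (equal priors, shared covariance Sigma): the Bayes
# error of the data projected onto w is Phi(-0.5 * |w'(mu1-mu2)| / sqrt(w'Sigma w)),
# minimized over directions by w proportional to Sigma^{-1}(mu1 - mu2).
import numpy as np
from scipy.stats import norm


def projected_bayes_error(w, mu1, mu2, Sigma):
    """Bayes error of the two projected (1-D) Gaussians, equal priors assumed."""
    delta = abs(w @ (mu1 - mu2)) / np.sqrt(w @ Sigma @ w)
    return norm.cdf(-0.5 * delta)


mu1, mu2 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])

w_opt = np.linalg.solve(Sigma, mu1 - mu2)   # Bayes-optimal 1-D direction
w_other = np.array([1.0, -1.0])             # an arbitrary direction for comparison

print(projected_bayes_error(w_opt, mu1, mu2, Sigma))    # smallest over all w
print(projected_bayes_error(w_other, mu1, mu2, Sigma))  # larger (or equal)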
Committee: Aleix M. Martinez PhD (Advisor); Andrea Serrani PhD (Committee Member); Yoonkyung Lee PhD (Committee Member); Mikhail Belkin PhD (Committee Member)
Subjects: Computer Science; Electrical Engineering; Statistics