Nonparametric Robust Methods for Computer Vision
Ph.D. Thesis Dorin I. Comaniciu
Abstract
Low level computer vision tasks are misleadingly difficult and can yield unreliable
results, since often the employed techniques rely upon inaccurate parametric models.
This thesis introduces to computer vision a nonparametric procedure for the analysis
of multimodal data based on the mean shift property, and demonstrates its superior
performance in various applications. The convergence of the mean shift procedure to
the closest mode of the underlying distribution is proven, both for the Epanechnikov
kernel and the general case of kernels with convex and monotonically decreasing profiles.
Exploiting parallel mean shift processes, a robust clustering method was developed
for the analysis of complex feature spaces derived from real data. The cluster centers
are obtained by finding the modes of the underlying distribution and their basins of
attraction define the cluster boundaries. Examples of image segmentation in color
spaces are presented to show its superior performance. The mean shift based analysis
was also employed in the joint spatial-range (value) domain of gray level and color
images for discontinuity preserving filtering and image segmentation. Several examples,
for gray and color images, show the versatility of the method and compare favorably
with results described in the literature for the same images. The application of the mean
shift for the tracking of visual features is also investigated and examples of nonrigid
object tracking using color histograms are given.
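
The mode-seeking and clustering idea described above can be illustrated with a minimal sketch. It is not the implementation from the thesis. It assumes the Epanechnikov kernel, for which the mean shift update reduces to the sample mean of the points inside a window of radius `bandwidth`. The function names, the tolerance, and the rule that merges modes closer than half a bandwidth are hypothetical choices made for this illustration only.

```python
import math


def mean_shift_mode(points, start, bandwidth, tol=1e-6, max_iter=500):
    """Move `start` toward a nearby density mode.

    Assumes the Epanechnikov kernel, so each update is the plain average
    of the points lying within `bandwidth` of the current position.
    """
    x = tuple(start)
    for _ in range(max_iter):
        window = [p for p in points if math.dist(p, x) <= bandwidth]
        if not window:
            break  # empty window: no neighbours to average
        new_x = tuple(sum(c) / len(window) for c in zip(*window))
        shift = math.dist(new_x, x)  # length of the mean shift vector
        x = new_x
        if shift < tol:
            break
    return x


def mean_shift_cluster(points, bandwidth):
    """Run one mean shift process per point and group points by mode.

    Points whose processes end at (nearly) the same mode share a label,
    i.e. they lie in the same basin of attraction. The merge radius of
    bandwidth / 2 is an illustrative assumption.
    """
    modes, labels = [], []
    for p in points:
        m = mean_shift_mode(points, p, bandwidth)
        for k, c in enumerate(modes):
            if math.dist(m, c) < bandwidth / 2:
                labels.append(k)
                break
        else:
            modes.append(m)
            labels.append(len(modes) - 1)
    return modes, labels
```

Calling `mean_shift_cluster(points, bandwidth)` on a list of coordinate tuples returns the detected modes and one label per point. The processes are independent of each other, which is what makes a parallel implementation natural.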
The image segmenter was the central module of a content-based image retrieval
system we developed to support decision making in clinical pathology. The Image
Guided Decision Support (IGDS) system locates, retrieves and displays cases which
exhibit morphological profiles consistent with the case in question. The reliability of
the segmentation made possible unsupervised online analysis of the query image and
extraction of the features of interest: shape, area, and texture of the nucleus. The
system performance was assessed through ten-fold cross-validated classification and
compares favorably with that of three human experts. To facilitate a natural
man-machine interface, speech recognition and voice feedback engines were integrated. The
system also contains components for both remote microscope control and multiuser
visualization.
A general methodology for indexing with multivariate features based on the
Bhattacharyya distance was also analyzed. To reduce the amount of computation and
the size of each logical database entry, we propose the approximation of the Bhattacharyya
distance in the low-dimensional subspace of the first few principal components. The
retrieval performance was assessed for three texture databases (VisTex, Brodatz, and
MeasTex) and two texture representations (MRSAR model and Gabor features), and
was consistently superior to the traditional Mahalanobis-distance-based approaches.
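
The subspace approximation mentioned above can be sketched as follows. The sketch assumes that each database entry is summarized by a Gaussian mean and covariance, so that the closed-form Bhattacharyya distance between two Gaussians applies. It also assumes that the principal components are estimated from a pooled set of feature samples. Both assumptions, and all function names, are illustrative and are not taken from the thesis.

```python
import numpy as np


def bhattacharyya_gaussian(mu1, cov1, mu2, cov2):
    """Closed-form Bhattacharyya distance between N(mu1, cov1) and N(mu2, cov2)."""
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    # Mean term: (1/8) * diff^T cov^{-1} diff
    term_mean = 0.125 * diff @ np.linalg.solve(cov, diff)
    # Covariance term: (1/2) * ln(det(cov) / sqrt(det(cov1) * det(cov2)))
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    term_cov = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))
    return float(term_mean + term_cov)


def pca_basis(samples, k):
    """Top-k principal directions (as columns) of an (n, d) sample matrix."""
    centered = samples - samples.mean(axis=0)
    cov = centered.T @ centered / (len(samples) - 1)
    _, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :k]    # keep the k largest


def project_gaussian(mu, cov, basis):
    """Project a Gaussian (mean, covariance) onto the given subspace."""
    return basis.T @ mu, basis.T @ cov @ basis
```

Each entry would then store only the k-dimensional mean and the k-by-k covariance returned by `project_gaussian`, and queries would be ranked with `bhattacharyya_gaussian` on the projected parameters. With k equal to the full dimension the projected distance equals the original one, and smaller k trades accuracy for storage and computation.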
The thesis is available in four parts: part1, part2, part3, and part4. The compressed files total about 7 MB. The thesis contains 102 pages.