Nonparametric Robust Methods for Computer Vision

Ph.D. Thesis Dorin I. Comaniciu


Abstract

Low-level computer vision tasks are misleadingly difficult and can yield unreliable results, since the techniques employed often rely on inaccurate parametric models. This thesis introduces to computer vision a nonparametric procedure for the analysis of multimodal data based on the mean shift property, and demonstrates its superior performance in various applications. The convergence of the mean shift procedure to the closest mode of the underlying distribution is proven, both for the Epanechnikov kernel and for the general case of kernels with a convex and monotonically decreasing profile.

Exploiting parallel mean shift processes, a robust clustering method was developed for the analysis of complex feature spaces derived from real data. The cluster centers are obtained by finding the modes of the underlying distribution, and their basins of attraction define the cluster boundaries. Examples of image segmentation in color spaces are presented to show its superior performance. The mean shift based analysis was also employed in the joint spatial-range (value) domain of gray level and color images for discontinuity-preserving filtering and image segmentation. Several examples, for gray and color images, show the versatility of the method and compare favorably with results described in the literature for the same images. The application of the mean shift to the tracking of visual features is also investigated, and examples of nonrigid object tracking using color histograms are given.

The image segmenter was the central module of a content-based image retrieval system we developed to support decision making in clinical pathology. The Image Guided Decision Support (IGDS) system locates, retrieves, and displays cases which exhibit morphological profiles consistent with the case in question. The reliability of the segmentation made possible unsupervised on-line analysis of the query image and extraction of the features of interest: shape, area, and texture of the nucleus.
The system performance was assessed through ten-fold cross-validated classification and compared favorably with that of three human experts. To facilitate a natural man-machine interface, speech recognition and voice feedback engines were integrated. The system also contains components for both remote microscope control and multiuser visualization.

A general methodology for indexing with multivariate features based on the Bhattacharyya distance was also analyzed. To reduce the amount of computation and the size of each logical database entry, we propose approximating the Bhattacharyya distance in the low-dimensional subspace of the first few principal components. The retrieval performance was assessed for three texture databases (VisTex, Brodatz, and MeasTex) and two texture representations (MRSAR model and Gabor features), and was consistently superior to that of the traditional approaches based on the Mahalanobis distance.
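
To illustrate the mode-seeking idea behind the mean shift procedure mentioned in the abstract, here is a minimal sketch; it is not the thesis implementation. It assumes the flat-window form of the update, in which each step moves the current estimate to the mean of the samples lying within a fixed radius, which is the form the mean shift takes for the Epanechnikov kernel. The function name mean_shift_mode, the bandwidth value, the stopping tolerance, and the synthetic two-cluster data are all illustrative assumptions.

```python
import numpy as np


def mean_shift_mode(points, start, bandwidth, tol=1e-6, max_iter=500):
    """Follow mean shift steps from `start` toward a nearby density mode.

    Hypothetical helper for illustration: each iteration replaces the
    current estimate by the mean of the samples inside a window of
    radius `bandwidth` centered on it.
    """
    x = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        # Select the samples that fall inside the window around x.
        dist = np.linalg.norm(points - x, axis=1)
        window = points[dist <= bandwidth]
        if len(window) == 0:
            break  # empty window: nothing to average, so stop here
        new_x = window.mean(axis=0)
        # Stop once the shift between successive estimates is negligible.
        if np.linalg.norm(new_x - x) < tol:
            return new_x
        x = new_x
    return x


# Synthetic bimodal data (assumed for the example): two tight 2-D clusters.
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.3, size=(200, 2)),
    rng.normal(loc=(5.0, 5.0), scale=0.3, size=(200, 2)),
])

# Each starting point climbs to the mode of the cluster it lies closest to.
mode_a = mean_shift_mode(points, start=(0.8, -0.5), bandwidth=1.0)
mode_b = mean_shift_mode(points, start=(4.2, 5.6), bandwidth=1.0)
```

Running one such process from every sample and grouping the samples by the mode they reach gives a clustering in the spirit described above, with the basins of attraction acting as cluster boundaries. The result depends on the chosen bandwidth.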

The thesis is available in four parts: part1, part2, part3, and part4. The compressed files total about 7 MB. The thesis contains 102 pages.

