Projection based Robust Estimators for Computer Vision

Ph.D. Thesis Haifeng Chen


Robust regression is the generic name for techniques that estimate regression models in the presence of a significant number of data points not belonging to the model, i.e., outliers. In most computer vision applications the data is only rarely homogeneous, so tolerance to outliers is a necessary condition for satisfactory performance. Often multiple structures are present in the data, for example, when analyzing a dynamic scene taken by a moving camera. Relative to the structure of interest, the inliers of the other structures are called "structured outliers". The purpose of this thesis is twofold. First, we analyze the conditions necessary to achieve robust behavior when solving computer vision tasks. Second, we develop a new family of robust regression techniques whose performance is superior to that of current methods.
The theory of robust estimators was developed in statistics over the last thirty years, and its two main classes of methods, M-estimators and least median of squares (LMedS), have been successfully applied to many vision problems. However, some of the "most" robust techniques used in the vision community, the Hough transform and RANSAC, are innate to the field and were developed independently to meet the specific needs of processing visual data. We show that all four classes of robust estimation techniques can be regarded as particular cases of M-estimators with auxiliary scale. The robust techniques imported from statistics differ from those developed by the vision community in the way the scale is obtained. For the former the scale is estimated from the data, while for the latter its value is set a priori. In most cases, the success of a robust method depends on the reliability of the scale. Whenever the employed scale is not correct, the robust regression may fail to recover the inlier structure.
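The dependence on the scale can be illustrated with a minimal sketch. It assumes a zero-one loss (a band of half-width equal to the scale) and uses the median absolute deviation (MAD) as the data-driven scale; both are common textbook choices picked here for illustration and are not taken from the thesis. The residual values and the fixed scale of 0.15 are hypothetical.

```python
import statistics

def mad_scale(residuals):
    """Scale estimated from the data: 1.4826 * median absolute deviation."""
    med = statistics.median(residuals)
    return 1.4826 * statistics.median([abs(r - med) for r in residuals])

def count_inliers(residuals, scale, center=0.0):
    """Zero-one loss with a given scale: residuals inside the band count as inliers."""
    return sum(1 for r in residuals if abs(r - center) <= scale)

# Hypothetical residuals: 21 inliers in [-0.1, 0.1] plus gross outliers.
inliers = [0.01 * i for i in range(-10, 11)]
light = inliers + [5.0 + j for j in range(9)]    # 30% outliers
heavy = inliers + [5.0 + j for j in range(49)]   # 70% outliers

scale_light = mad_scale(light)   # stays close to the inlier spread
scale_heavy = mad_scale(heavy)   # inflated once outliers are the majority

# A scale set a priori (assumed user choice) separates the inliers at either
# contamination level, but only because the chosen value happens to be correct.
fixed = 0.15
```

With the outliers in the majority, the estimated scale grows far beyond the inlier spread and the band starts to admit outliers, while a fixed scale that is too small (for example 0.01) would discard most inliers.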
One main contribution of the thesis is a new family of robust regression techniques whose performance depends much less on the accuracy of additional information such as the scale. We reinterpret M-estimation within the projection pursuit framework and show how the need for a reliable scale estimate can be avoided. Projection pursuit seeks "interesting" low-dimensional projections of multidimensional data. The informative value of a projection is measured with a projection index, which is a scalar function of the probability distribution estimated from the projected data. The "best" projection corresponds to an extremum of the projection index. To obtain this extremum, the probabilistic sampling technique customary in some robust regression methods is replaced with a simplex-based multidimensional direct search. The technique, called the projection based M-estimator (pbM-estimator), provides a satisfactory inlier/outlier dichotomy for a wider range of contaminations than the traditional M-estimators. The pbM-estimator has been tested in several experiments with both synthetic and real data, and gave satisfactory results in spite of not using any prior knowledge about the data.
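The idea of maximizing a projection index can be sketched for a 2D line written as x cos(theta) + y sin(theta) = alpha. The sketch below is an illustration under stated assumptions, not the pbM-estimator itself: the index is taken to be the peak of a kernel density estimate of the projected points, the bandwidth rule (n^(-1/5) times the MAD of the projections) and the Epanechnikov kernel are assumed choices, and a plain grid search over theta stands in for the simplex-based direct search. The function names and the test data are hypothetical.

```python
import math
import random
import statistics

def epanechnikov(u):
    """Epanechnikov kernel profile on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def projection_index(points, theta):
    """Return (peak density, location of the peak) of the projected data."""
    c, s = math.cos(theta), math.sin(theta)
    z = [x * c + y * s for x, y in points]
    med = statistics.median(z)
    mad = statistics.median([abs(v - med) for v in z])
    h = max(len(z) ** -0.2 * mad, 1e-9)  # assumed data-driven bandwidth
    best = (-1.0, 0.0)
    for v in z:  # evaluate the density estimate at the projected points only
        f = sum(epanechnikov((v - w) / h) for w in z) / (len(z) * h)
        best = max(best, (f, v))
    return best

def fit_line(points, steps=180):
    """Grid search over theta in [0, pi); a stand-in for the simplex search."""
    results = []
    for k in range(steps):
        theta = math.pi * k / steps
        f, alpha = projection_index(points, theta)
        results.append((f, theta, alpha))
    _, theta, alpha = max(results)
    return theta, alpha

# Hypothetical data: 40 points on y = 2x + 1 and 20 uniformly spread outliers.
rng = random.Random(0)
line_points = [(0.25 * i, 2.0 * 0.25 * i + 1.0) for i in range(40)]
clutter = [(rng.uniform(0, 10), rng.uniform(0, 25)) for _ in range(20)]
theta, alpha = fit_line(line_points + clutter)
```

No scale is supplied by the user: the direction along which the projected inliers pile up most densely is selected, and the location of the density peak gives the intercept alpha.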
A new robust information fusion method is developed, based on a nonparametric method for detecting the modes of a multivariate probability distribution, the mean shift procedure. From data with uncertainties, the method recovers the number and characteristics of the sources that generated it. The uncertainties of the measurements are encoded as the bandwidths of the mean shift, which makes the behavior much more robust. The robust fusion technique is successfully applied to solve the model-based object recognition problem.
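A one-dimensional sketch of mean shift with per-measurement bandwidths may help fix the idea. It assumes Gaussian kernels whose standard deviations equal the measurement uncertainties, in which case the fixed-point update weights each measurement by exp(-d^2 / (2 sigma^2)) / sigma^3. The function names, the merge tolerance, and the data are hypothetical, and the thesis treats the multivariate case with covariance matrices.

```python
import math

def mean_shift_mode(start, values, sigmas, max_iter=500, tol=1e-10):
    """Climb from `start` to a mode of the variable-bandwidth density estimate."""
    x = start
    for _ in range(max_iter):
        # Gaussian weight of each measurement, scaled by its own uncertainty.
        weights = [math.exp(-0.5 * ((x - v) / s) ** 2) / s ** 3
                   for v, s in zip(values, sigmas)]
        new_x = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        if abs(new_x - x) < tol:
            break
        x = new_x
    return x

def find_modes(values, sigmas, merge_tol=1e-2):
    """Run mean shift from every measurement and merge coinciding modes."""
    modes = []
    for v in values:
        m = mean_shift_mode(v, values, sigmas)
        if all(abs(m - known) > merge_tol for known in modes):
            modes.append(m)
    return sorted(modes)

# Hypothetical measurements from two sources, each with its own uncertainty.
values = [-0.2, -0.1, 0.0, 0.1, 0.3, 9.8, 10.1, 10.3]
sigmas = [0.5, 0.5, 0.5, 0.5, 0.5, 0.3, 0.3, 0.6]
modes = find_modes(values, sigmas)
```

The number of modes is not given in advance: it emerges from the data, and measurements with larger uncertainty contribute broader, flatter kernels and therefore pull the modes less.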
Robust regression in the presence of multiple structures is obtained when the robust fusion module is integrated with the pbM-estimator. A number of data subsets are randomly selected based on the density of the data distribution and a probabilistic region growing procedure. The pbM-estimator is applied to each subset to extract the inliers, from which the model parameters and their covariance matrix are estimated. By performing the robust information fusion, the parameters of each structure are extracted and the measurements are segmented. Several computer vision applications are provided to show the effectiveness of our algorithm.

The thesis contains 150 pages.
