Learning a Mahalanobis distance metric for data clustering and classification
Introduction
The distance metric is a key component of many machine learning algorithms. For example, Kmeans and the K-nearest neighbor (KNN) classifier must be supplied with a suitable distance metric, through which neighboring data points can be identified. The commonly used Euclidean distance metric assumes that each feature of a data point is equally important and independent of the others. This assumption may not hold in real applications, especially when dealing with high dimensional data where some features may not be tightly related to the topic of interest. In contrast, a high-quality distance metric should identify important features and discriminate between relevant and irrelevant ones. Supplying such a distance metric is thus highly problem-specific and determines the success or failure of the learning algorithm or the developed system [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13].
There has been considerable research on distance metric learning over the past few years [14]. One family of algorithms is developed with known class labels of training data points. Algorithms in this family include neighboring component analysis [15], large margin nearest neighbor classification [16], large margin component analysis [17], class collapse [18], and other extensions [19], [20]. Success on a variety of problems shows that the learned distance metrics yield substantial improvements over the commonly used Euclidean distance metric [15], [16], [17], [18]. However, class labels constitute strong supervision and cannot be easily obtained in some real-world situations. In contrast, it is more natural to specify which pairs of data points are similar or dissimilar. Such pairwise constraints appear in many applications. For example, in image retrieval the images similar and dissimilar to the query are labeled by the user, and such image pairs can be used to learn a distance metric [21]. Accordingly, another family of distance metric learning algorithms is developed to make use of such pairwise constraints [14], [21], [22], [23], [24], [25], [26], [27], [28], [29]. A pairwise constraint is a kind of side information [22]. One popular form of side information is must-links and cannot-links [22], [30], [31], [32], [33], [34], [35]. A must-link indicates that a pair of data points must be in the same class, while a cannot-link indicates that the two data points must be in different classes. Another popular form is the relative comparison, "A is closer to B than A is to C" [26]. The utility of pairwise constraints has been demonstrated in many applications, where significant improvement of the underlying algorithm can be achieved [21], [22], [23], [24], [25], [26], [27].
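The two forms of side information described above can be encoded very simply. The following sketch is purely illustrative (the data set and index pairs are made up, not taken from the paper), but shows how must-links, cannot-links, and relative comparisons might be represented in practice:

```python
import numpy as np

# A hypothetical toy data set: four points forming two obvious groups.
X = np.array([[0.0, 0.1],
              [0.2, 0.0],
              [5.0, 5.1],
              [5.2, 4.9]])

# Must-links: index pairs that should fall in the same class.
must_links = [(0, 1), (2, 3)]

# Cannot-links: index pairs that should fall in different classes.
cannot_links = [(0, 2), (1, 3)]

# A relative comparison "A is closer to B than A is to C" can likewise
# be stored as an index triple (A, B, C).
relative_comparisons = [(0, 1, 2)]
```

Note that these constraints carry strictly weaker information than class labels: they tell us about pairs, not about which class any individual point belongs to.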
The two families of distance metric learning algorithms have been extended in many aspects. Based on the class labels of training data points, Weinberger and Tesauro proposed to learn a distance metric for kernel regression [36]. Based on labeled training data, Hertz et al. maximized the margin with boosting to obtain distance functions for clustering [37]. Bilenko et al. integrated pairwise constraints (must-links and cannot-links) and metric learning into semi-supervised clustering [38]. Clustering on many data sets shows that the performance of the Kmeans algorithm is thereby substantially improved. Also based on must-links and cannot-links, Davis et al. developed an information theory-based framework [39]. Compared with most existing methods, their framework does not require complex computations such as eigenvalue decomposition or semi-definite programming [15], [16]. Yang et al. presented a Bayesian framework in which a posterior distribution over the distance metric is estimated from labeled pairwise constraints [40]. Kumar et al. used relative comparisons to develop a new clustering algorithm in a semi-supervised setting [41]. Formulating the problem as a linear program, Rosales and Fung proposed to learn a sparse metric with relative comparison constraints; the sparsity of the learned metric helps to reduce the cost of distance computation [42]. In addition, distance metric learning algorithms have also been extended with kernel tricks [11], [21], [43], [44], [45]. A nonlinear adaptive metric learning algorithm has also been developed [46]. Furthermore, online distance metric learning algorithms [39], [47] have been proposed recently for situations where the data points are collected sequentially.
The learned distance metrics have been applied in many real-world applications, including speech processing [48], visual representation [49], word categorization [12], face verification [50], medical image processing [51], video object classification [52], biological data processing [53], image retrieval [21], [54], and so on.
In this paper we focus on learning a Mahalanobis distance metric from must-links and cannot-links. The Mahalanobis distance is a measure between two data points in the space defined by relevant features. Since it accounts for unequal variances as well as correlations between features, it evaluates the distance adequately by assigning different weights or importance factors to the features of data points. Only when the features are uncorrelated is the distance under a Mahalanobis distance metric identical to that under the Euclidean distance metric. In addition, geometrically, a Mahalanobis distance metric can adjust the geometrical distribution of data so that the distance between similar data points is small [22]. Thus it can enhance the performance of clustering or classification algorithms, such as the KNN classifier. Given a suitable Mahalanobis distance metric, these advantages can be exploited to perform specific tasks on a given data set. It is natural to learn the metric from prior knowledge supplied by the user according to her/his own task. One easy way to supply such prior knowledge is to give some instances of similar/dissimilar data pairs (must-links/cannot-links). We hope that a Mahalanobis distance metric can be learned by forcing it to adjust the distances of the given instances, and then be applied to new data.
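The effect of correlated features can be seen numerically. In the sketch below the two points, the covariance matrix, and the choice M = cov⁻¹ are all illustrative (not from the paper); the point is simply that with M equal to the identity the Mahalanobis distance reduces to the Euclidean one, while a non-identity M reweights and couples the features:

```python
import numpy as np

def mahalanobis(x, y, M):
    """d_M(x, y) = sqrt((x - y)^T M (x - y)) for positive semi-definite M."""
    d = x - y
    return np.sqrt(d @ M @ d)

x = np.array([1.0, 1.0])
y = np.array([2.0, 0.0])

# With M = I the Mahalanobis distance equals the Euclidean distance.
assert np.isclose(mahalanobis(x, y, np.eye(2)), np.linalg.norm(x - y))

# A hypothetical covariance with strongly correlated features; taking
# M as its inverse penalizes the direction the correlation disfavors.
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])
M = np.linalg.inv(cov)
print(mahalanobis(x, y, M))  # ≈ 3.162 (sqrt(10)), vs Euclidean sqrt(2)
```

Here the displacement x − y runs against the correlation between the two features, so the Mahalanobis distance is much larger than the Euclidean one; a displacement along the correlated direction would instead be shrunk.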
The basic idea in this paper is to minimize the distances of point pairs in must-links while maximizing those of point pairs in cannot-links. To this end, we formulate this task as a constrained optimization problem. Since the formulated problem cannot be solved analytically, an iterative framework is developed to find the optimum by binary search. A lower bound and an upper bound bracketing the optimum are explicitly estimated and then used to control the initial value, which benefits the initialization of the iterative algorithm. The globally optimal Mahalanobis distance matrix is finally obtained effectively and efficiently; moreover, the computation is fast, with exponential convergence. Comparative experiments on data clustering, interactive natural image segmentation and face pose estimation show the validity of our algorithm.
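The binary-search idea can be illustrated on a generic trace-ratio problem of the kind the paper alludes to (the ratio of cannot-link to must-link distances expressed through scatter matrices). The following is a hedged sketch, not the paper's algorithm: the scatter matrices `S_d` and `S_s`, the crude bounds, and the fixed iteration count are all assumptions, whereas the paper derives explicit bounds on the optimum.

```python
import numpy as np

def trace_ratio_binary_search(S_d, S_s, dim, iters=60):
    """Approximate max over orthonormal W (n x dim) of
       tr(W^T S_d W) / tr(W^T S_s W) by binary search on lambda."""
    def f(lam):
        # f(lam) = max_{W^T W = I} tr(W^T (S_d - lam * S_s) W)
        # = sum of the top-`dim` eigenvalues; it decreases in lam,
        # and its root is the optimal ratio.
        evals = np.linalg.eigvalsh(S_d - lam * S_s)
        return np.sum(np.sort(evals)[-dim:])

    lo, hi = 0.0, 1e6   # crude bracket; real bounds should come from the data
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid    # optimum ratio lies above mid
        else:
            hi = mid    # optimum ratio lies below mid
    return 0.5 * (lo + hi)
```

Because the bracket is halved at every step, the error decreases exponentially in the number of iterations, which is the sense in which such a scheme converges fast; a tight initial bracket (as estimated in the paper) reduces the work further.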
The remainder of this paper is organized as follows. Section 2 will briefly introduce the related work and our method. We address our problem and develop the algorithm in Section 3. The experimental results and applications in data clustering, interactive image segmentation and face pose estimation are reported in Section 4. Section 5 concludes this paper.
Related work and our method
Given two data points $x_i$ and $x_j$, their Mahalanobis distance can be calculated as follows:
$$d_M(x_i, x_j) = \sqrt{(x_i - x_j)^{\mathrm{T}} M (x_i - x_j)},$$
where $M$ is positive semi-definite. Using the eigenvalue decomposition, $M$ can be decomposed into $M = A^{\mathrm{T}} A$. Thus, it is also feasible to learn the matrix $A$. Then, we have
$$d_M(x_i, x_j) = \|A x_i - A x_j\|.$$
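The equivalence between the two forms of the distance is easy to verify numerically. In this sketch a random matrix A stands in for a learned one (purely for illustration); any M constructed as AᵀA is positive semi-definite, and the Mahalanobis distance under M equals the Euclidean distance after mapping the points through A:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))   # illustrative stand-in for a learned A
M = A.T @ A                       # positive semi-definite by construction

x = rng.standard_normal(3)
y = rng.standard_normal(3)
d = x - y

d_mahalanobis = np.sqrt(d @ M @ d)           # sqrt((x-y)^T M (x-y))
d_transformed = np.linalg.norm(A @ x - A @ y)  # ||Ax - Ay||
assert np.isclose(d_mahalanobis, d_transformed)
```

This is why learning A is interchangeable with learning M: A acts as a linear transformation of the input space, and the Mahalanobis metric is just the Euclidean metric in the transformed space.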
Typically, Xing et al. studied the problem of learning a Mahalanobis matrix from must-links and cannot-links [22]. In their framework, the sum of the Mahalanobis distances of
Problem formulation
Suppose we are given a data set $X = \{x_1, \ldots, x_n\}$ and two sets of pairwise constraints: must-links $S = \{(x_i, x_j) \mid x_i \text{ and } x_j \text{ are in the same class}\}$ and cannot-links $D = \{(x_i, x_j) \mid x_i \text{ and } x_j \text{ are in different classes}\}$. Our goal is to learn a Mahalanobis matrix $M$ such that the distances of point pairs in $S$ are as small as possible, while those in $D$ are as large as possible.
According to Eq. (2), we can equivalently choose to optimize the matrix $A$, with $M = A^{\mathrm{T}} A$. To this end, we introduce a transformation:
Experiments
We evaluated our algorithm on several data sets, and compared it with RCA, DCA and Xing's method. We show the experimental results and the applications to data clustering, interactive natural image segmentation and face pose estimation.
Conclusion
In summary, this paper addresses a general problem of learning a Mahalanobis distance metric from side information. It is formulated as a constrained optimization problem, in which the ratio of distances (in terms of ratio of matrix traces) is used as the objective function. An optimization algorithm is proposed to solve this problem, in which a lower bound and an upper bound including the optimum are explicitly estimated and then used to stipulate the initial value for iterations. Experimental
Acknowledgments
This work is supported by projects of the National Natural Science Foundation of China (Grant Nos. 60721003 and 60675009). The anonymous reviewers have helped to improve the quality and presentation of this paper.
References (67)
- et al., Extending the relevant component analysis algorithm for metric learning using both positive and negative equivalence constraints, Pattern Recognition (2006)
- et al., A new LDA based face recognition system which can solve the small sample size problem, Pattern Recognition (2000)
- et al., A direct LDA algorithm for high-dimensional data—with application to face recognition, Pattern Recognition (2001)
- et al., A generalized Foley–Sammon transform based on generalized Fisher discriminant criterion and its application to face recognition, Pattern Recognition Lett. (2003)
- et al., Discriminant adaptive nearest neighbor classification, IEEE Trans. Pattern Anal. Mach. Intell. (1996)
- et al., Learning from user behavior in image retrieval: application of market basket analysis, Int. J. Comput. Vision (2004)
- et al., Learning a semantic space from users relevance feedback for image retrieval, IEEE Trans. Circuits Systems Video Technol. (2003)
- et al., Learning more accurate metrics for self-organizing maps
- C. Domeniconi, D. Gunopulos, Adaptive nearest neighbor classification using support vector machines, in: Advances in...
- et al., Adaptive kernel metric nearest neighbor classification
- Negative pseudo-relevance feedback in content-based video retrieval
- Learning an image manifold for retrieval
- Manifold ranking based image retrieval
- Learning semantics-preserving distance metrics for clustering graphical data
- Formulating context-dependent similarity functions
- Choosing a distance metric for automatic word categorization
- A transductive framework of distance metric learning by spectral dimensionality reduction
- Neighbourhood components analysis
- Distance metric learning for large margin nearest neighbor classification
- Large margin component analysis
- Metric learning by collapsing classes
- Parametric distance metric learning with label information
- Distance metric learning, with application to clustering with side-information
- Learning distance functions using equivalence relations, J. Mach. Learn. Res.
- Distance metric learning with kernels
- Learning sparse metrics via linear programming
- Learning a distance metric from relative comparisons
- Learning nonstructural distance metric by minimum cluster distortions
About the Author—SHIMING XIANG received his B.S. degree from Chongqing Normal University, China in 1993, M.S. degree from Chongqing University, China in 1996, and Ph.D. degree from the Institute of Computing Technology, Chinese Academy of Sciences, China in 2004. From 1996 to 2001, he was a lecturer at Huazhong University of Science and Technology, Wuhan, China. He was a postdoctoral researcher in the Department of Automation, Tsinghua University until 2006. He is currently an Associate Professor in the Institute of Automation, Chinese Academy of Sciences. His interests include computer vision, pattern recognition, machine learning, etc.
About the Author—FEIPING NIE received his B.S. degree from the Department of Computer Science, North China University of Water Conservancy and Electric Power, China in 2000, and M.S. degree from the Department of Computer Science, Lanzhou University, China in 2003. He is currently a Ph.D. candidate in the Department of Automation, Tsinghua University. His research interests focus on machine learning and its applications.
About the Author—CHANGSHUI ZHANG received his B.S. degree from the Department of Mathematics of Peking University, China in 1986, and Ph.D. degree from the Department of Automation, Tsinghua University in 1992. He is currently a Professor in the Department of Automation, Tsinghua University. He is an Associate Editor of the journal Pattern Recognition. His interests include artificial intelligence, image processing, pattern recognition, machine learning, evolutionary computation and complex system analysis, etc.