
Signal Processing

Volume 90, Issue 1, January 2010, Pages 236-248

Large margin nearest local mean classifier

https://doi.org/10.1016/j.sigpro.2009.06.015

Abstract

Distance metric learning and classifier design are two highly challenging tasks in the machine learning community. In this paper we propose a new large margin nearest local mean (LMNLM) scheme that considers them jointly and aims at improving the separability between local parts of different classes. We adopt the ‘local mean vector’ as the basic classification model and, through a linear transformation, introduce large margins between heterogeneous local parts. Moreover, by eigenvalue decomposition, we may also reduce the dimensionality of the data. LMNLM can be formulated as a semidefinite programming (SDP) problem, so it is guaranteed to converge to a global optimum. Experimental results show that LMNLM is a promising algorithm, as it leads to high classification accuracies and low-dimensional representations.

Introduction

Distance metric learning is of fundamental importance to many machine learning tasks, such as supervised classification, unsupervised clustering, semi-supervised learning, image retrieval, etc. [1]. A good distance metric may shed light on the underlying data structure, and thus improved performance can be expected with the learned metric. In this paper we concentrate on supervised classification problems, so if a good metric can be learned effectively, higher classification accuracies will be achieved.

Another important aspect of machine learning is how to design classifiers. As with distance metric learning, a good classifier should be data-dependent, i.e., it should reveal some implicit data structures. Because metric learning and classifier design share this requirement, the two seemingly different problems can be considered jointly. Moreover, although there are well-established classification approaches such as support vector machines (SVMs) [2], [3] and neural networks [4], most of them do not consider the problem of learning a distance metric that is suited to the corresponding classification model. Motivated by the above analysis, in this paper we combine distance metric learning and classifier design into a whole, and this combination is one of the central guidelines of our work.

In view of minimizing the structural risk, a special kind of ‘large margin’ classifier, the support vector machine [2], [3], has achieved general success over the last decade. Recently, several metric learning and/or classifier design works have adopted ideas similar to the ‘large margin’ of SVMs. Xing et al. [5] applied side information to Mahalanobis distance metric learning, aiming to minimize the sum of squared distances between similarly labeled pairs while maintaining a lower bound on the sum of squared distances between dissimilarly labeled ones. Although it can be solved to global optimality, the algorithm of [5] does not involve any slack variables and may incur overfitting. Bachrach et al. [6] introduced the ‘margin’ into feature selection tasks. The algorithm of [6] cannot be formulated as a convex optimization problem and usually converges to local optima. The work in [7], [8], [9], [10] concentrated on different ‘marginal’ metric learning algorithms for k-NN classifiers. These methods employ the k-NN classifier as the classification tool, which utilizes information from the k nearest neighbors and may neglect the discriminative information carried by other useful inputs. Veenman et al. [11] pioneered the combination of feature weighting and the nearest mean (NM) classifier. One obvious limitation of this feature weighting algorithm is that it is only suitable for binary classification problems. Jenssen et al. [12] designed an NM classifier based on nonlinear transformations. Peltonen et al. [13] introduced a probabilistic model that generalizes linear discriminant analysis (LDA) [14], [15] for selecting informative or relevant components. The realization of both [12], [13] requires evaluating the class probability density functions with Parzen windows, and consequently the number of training examples used for the evaluation must be large enough to ensure high accuracy. Therefore, the algorithms of [12], [13] may be unsuitable for applications where the small sample size (SSS) problem occurs. Yang et al. [16] proposed a distance metric learning algorithm that optimizes local compactness and separability within a probabilistic framework. The algorithm of [16] is not convex and often leads to local optima.

Our work in this paper is also motivated by the ‘large margin’ idea. In particular, we propose a large margin nearest local mean (abbreviated ‘LMNLM’) classifier, which introduces large margins between local parts of different target classes and is expected to improve class separability. To reach this goal, we replace the original Euclidean distance metric with a new Mahalanobis one through a linear transformation. Fortunately, we can express our algorithm as a semidefinite programming (SDP) problem, a special kind of convex optimization problem. Owing to its convexity, the proposed LMNLM algorithm avoids the trouble of local minima, since it is guaranteed to converge to a global optimum.
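To make this concrete, the sketch below sets up a generic large-margin Mahalanobis metric learning problem as an SDP with cvxpy. It is only an illustration of the kind of convex program described here, not the paper's exact LMNLM formulation (which is built from local mean vectors and given in Section 2); the `targets`/`impostors` inputs and the unit margin are illustrative assumptions.

```python
# Illustrative sketch only (not the paper's exact LMNLM program): learn a
# Mahalanobis matrix M >= 0 so that each point stays close to a designated
# same-class local mean and is farther from other-class local means by a
# margin of 1, softened with slack variables.
import numpy as np
import cvxpy as cp

def learn_metric(X, targets, impostors, reg=1.0):
    """X: (n, D) data array.
    targets: list of (i, t) pairs -- pull x_i toward the local mean vector t.
    impostors: list of (i, t, o) triples -- x_i's distance to the other-class
        local mean o should exceed its distance to t by a margin of at least 1.
    """
    D = X.shape[1]
    M = cp.Variable((D, D), PSD=True)              # Mahalanobis matrix, M >= 0
    xi = cp.Variable(len(impostors), nonneg=True)  # slack variables

    def sqdist(a, b):
        d = np.asarray(a - b, dtype=float).ravel()
        return cp.quad_form(d, M)                  # (a - b)^T M (a - b), affine in M

    pull = sum(sqdist(X[i], t) for i, t in targets)
    push = [sqdist(X[i], o) - sqdist(X[i], t) >= 1 - xi[k]
            for k, (i, t, o) in enumerate(impostors)]
    prob = cp.Problem(cp.Minimize(pull + reg * cp.sum(xi)), push)
    prob.solve()                                   # SDP: returns a global optimum
    return M.value
```

Because every term is affine in M and the feasible set is the intersection of a PSD cone with affine constraints, the program is convex, which is the property that gives the global-convergence guarantee mentioned above.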

Dimensionality reduction is also quite important in the machine learning community. With its help, three advantages can be expected in pattern recognition tasks. Firstly, we can save memory and reduce system complexity. Secondly, useless and harmful noisy components can be removed, yielding improved classification accuracies. Thirdly, it enables us to deal with data sets that have few samples and high dimensionality, thus weakening the disadvantages caused by the so-called curse-of-dimensionality problem. In this paper we also consider the dimensionality reduction problem and describe how the proposed algorithm leads to lower-dimensional solutions.
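As an illustration of that last point, the sketch below shows one plausible way to realize the eigenvalue-decomposition step mentioned in the abstract: factor a learned Mahalanobis matrix and keep only the directions with non-negligible eigenvalues. The thresholding rule is our assumption, not the paper's exact criterion.

```python
# Sketch (under the assumption above) of eigendecomposition-based reduction:
# factor M = V diag(w) V^T, keep eigen-directions with non-negligible
# eigenvalues, and map x -> L x, so squared Euclidean distance after the map
# approximates the learned Mahalanobis distance.
import numpy as np

def project(X, M, tol=1e-6):
    w, V = np.linalg.eigh(M)                # eigenvalues in ascending order
    keep = w > tol * w.max()                # drop (near-)zero eigen-directions
    L = (V[:, keep] * np.sqrt(w[keep])).T   # (d, D) linear map with d <= D
    return X @ L.T                          # reduced-dimension representation
```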

The rest of this paper is organized as follows. In Section 2, we discuss in detail how the LMNLM classifier is designed and how the dimensionality is reduced. Related work and its relation to ours is discussed in Section 3. In Section 4, we conduct classification experiments on a synthetic data set, benchmark data sets, and a radar high-resolution range profile (HRRP) data set, followed by an analysis of the results. Finally, we give concluding remarks and suggestions for future work in Section 5.

Section snippets

The LMNLM scheme

Suppose that we are given a set of data points $\{x_k \mid x_k \in \mathbb{R}^D\}$ with $k \in \{1,2,\ldots,n\}$, and that $y_k \in \{1,2,\ldots,c\}$ denotes the class label for $x_k$. Let $\{\mu_1,\ldots,\mu_c\}$ be the corresponding mean vectors of the samples in the $c$ classes. Given a testing point $z$, $d^2(z,\mu) = \|z-\mu\|_2^2$ denotes the squared Euclidean distance between $z$ and $\mu$. Based on the above notation, the nearest mean (NM) classifier can be defined as follows: we decide that the testing point $z$ belongs to class $g$ if, for every $h \in \{1,2,\ldots,c\}$, $d^2(z,\mu_g) \le d^2(z,\mu_h)$ is always satisfied.
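The snippet above defines the plain NM rule; the local mean variant that LMNLM builds on replaces each global class mean with a local mean computed from a query's nearest same-class neighbors. Below is a minimal sketch of that decision rule using the plain Euclidean distance (LMNLM would first apply the learned linear transformation); the neighborhood size k is our illustrative parameter.

```python
# Minimal sketch of the nearest local mean decision rule: for a query z, form
# each class's local mean from that class's k nearest training points and
# assign z to the class whose local mean is closest (Euclidean distance here).
import numpy as np

def nearest_local_mean(z, X, y, k=3):
    best_class, best_dist = None, np.inf
    for c in np.unique(y):
        Xc = X[y == c]                                      # samples of class c
        nearest = np.argsort(np.linalg.norm(Xc - z, axis=1))[:k]
        mu_local = Xc[nearest].mean(axis=0)                 # local mean vector
        d = np.sum((z - mu_local) ** 2)                     # squared distance
        if d < best_dist:
            best_class, best_dist = c, d
    return best_class
```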

Related work

In this section, we first review four related algorithms, then discuss how they motivate LMNLM, and finally describe the innovations of LMNLM.

Experiments

In this section we conduct experiments on a synthetic data set, UCI benchmark data sets [21], and a radar HRRP data set to compare our LMNLM scheme with several other algorithms: the MG (multivariate Gaussian) classifier, LDA (linear discriminant analysis), the NKLT (non-probabilistic Karhunen–Loeve transformation) classifier [22], LSVMs (linear SVMs, which use the linear kernel), GSVMs (Gaussian SVMs, which use the Gaussian kernel), LESS, and LMNN. For LDA, we use an NM classifier

Conclusions and future work

Motivated by the large margin idea and the local learning property of some related algorithms, a new large margin nearest local mean classifier, which considers metric learning and classifier design jointly, is proposed in this paper. In LMNLM, we design a new ‘local mean vector’ classification model, and by introducing large margins into this model, improved local class separability is expected. To reach this goal, we transform the initial Euclidean space to the

Acknowledgments

The authors are grateful to the National Laboratory of Radar Signal Processing for providing the radar HRRP data. This work is partially supported by the Program for Cheung Kong Scholars and Innovative Research Team in University (PCSIRT, Grant no. IRT0645) and the NSFC under Grant 60772140.

References (27)

  • F. Gianfelici et al., A non-probabilistic recognizer of stochastic signals based on KLT, Signal Process. (2009)
  • B. Pei et al., Bispectrum based approach to high radar range profile for automatic target recognition, Pattern Recognition (2002)
  • L. Yang, An overview of distance metric learning, Technical Report, School of Computer Science, Carnegie Mellon...
  • N. Cristianini et al., An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods (2000)
  • C.J.C. Burges, A tutorial on support vector machines for pattern recognition, Data Mining and Knowledge Discovery (1998)
  • R.O. Duda et al., Pattern Classification (2001)
  • E. Xing, A. Ng, M. Jordan, S. Russell, Distance metric learning, with application to clustering with side information,...
  • R. Gilad-Bachrach, A. Navot, N. Tishby, Margin based feature selection—theory and algorithms, in: Proceedings of the...
  • K.Q. Weinberger, J. Blitzer, L.K. Saul, Distance metric learning for large margin nearest neighbor classification, in:...
  • J. Goldberger, S. Roweis, G. Hinton, R. Salakhutdinov, Neighbourhood components analysis, in: Proceedings of the NIPS,...
  • A. Globerson, S. Roweis, Metric learning by collapsing classes, in: Proceedings of the NIPS,...
  • L. Torresani, K. Lee, Large margin component analysis, in: Proceedings of the NIPS,...
  • C.J. Veenman et al., LESS: a model-based classifier for sparse subspaces, IEEE Trans. Pattern Anal. Mach. Intell. (2005)
Cited by (20)

    • A new fuzzy k-nearest neighbor classifier based on the Bonferroni mean

      2020, Pattern Recognition Letters
      Citation Excerpt:

      The LM-KNN algorithm first finds the local mean vectors in each class in terms of all k nearest neighbors and then allocates the query sample to the class represented by the local mean vector that has the lowest Euclidean distance from the query sample [5,6]. The robustness and the simplicity of the LM-KNN algorithm have invited researchers to develop a variety of enhanced method variants (see examples in [2,6–10]) and to construct variant-based classification systems [11]. One propellant for the development of new KNN variants has been the observation that the original method has weaknesses.

    • LMAE: A large margin Auto-Encoders for classification

      2017, Signal Processing
      Citation Excerpt:

      In this paper, we propose a Large Margin Auto-Encoders (LMAE) to further enforce the discriminability by enforcing different class samples to be large marginally distributed in hidden feature space. Particularly, we employ a large margin penalizing term that constrains the samples with different class labels to be distanced by a safety margin in the k-nearest neighborhood [29–32]. Fig. 1 shows the deep architecture by stacking multiple layers of LMAE for classification.

    • Designing bag-level multiple-instance feature-weighting algorithms based on the large margin principle

      2016, Information Sciences
      Citation Excerpt:

      In brief, compared with M3IFW, designing class-specific feature-weighting algorithms from the bag level is the major novelty of our work. Many machine learning researchers have shown interest in designing large margin-related algorithms, and these algorithms can be applied to different domains such as supervised learning [10,14,15,48], semi-supervised learning [1,6,37], and unsupervised learning [32,47,52]. Among these algorithms, perhaps SVM [15] is the most popular and is usually treated as state-of-the-art.

    • Hierarchical learning of large-margin metrics for large-scale image classification

      2016, Neurocomputing
      Citation Excerpt:

      In [22], Torresani et al. presented a novel algorithm that simultaneously optimizes the objectives of dimensionality reduction and metric learning based on LMNN. In [31], Chai et al. introduced the local mean vector and proposed a large-margin nearest local mean scheme to improve the separability between local parts of different categories. Lu et al. [20] proposed a neighborhood repulsed metric learning method for kinship verification, where under the learned metric the intra-class samples are pulled as close as possible and inter-class samples lying in a neighborhood are repulsed and pushed away as far as possible.

    • Improved pseudo nearest neighbor classification

      2014, Knowledge-Based Systems
      Citation Excerpt:

      In the LMKNN rule, the local mean vector of k nearest neighbors from each class is employed to classify the query pattern. Since the first introduction of the LMKNN rule, its basic idea has already been successfully borrowed to many new methods [35–41]. Among these algorithms, pseudo nearest neighbor (PNN) rule [35] is another promising KNN-based classifier, which is based on the distance weighted k-nearest neighbor (WKNN) rule [7] and the LMKNN rule.
