
Signal Processing

Volume 90, Issue 1, January 2010, Pages 236-248

Large margin nearest local mean classifier

https://doi.org/10.1016/j.sigpro.2009.06.015

Abstract

Distance metric learning and classifier design are two highly challenging tasks in the machine learning community. In this paper we propose a new large margin nearest local mean (LMNLM) scheme that considers them jointly and aims at improving the separability between local parts of different classes. We adopt the ‘local mean vector’ as the basic classification model and, through a linear transformation, introduce large margins between heterogeneous local parts. Moreover, by eigenvalue decomposition, we may also reduce the dimensionality of the data. LMNLM can be formulated as a semidefinite programming (SDP) problem, so it is guaranteed to converge to a global optimum. Experimental results show that LMNLM is a promising algorithm, as it leads to high classification accuracies and low-dimensional representations.

Introduction

Distance metric learning is of fundamental importance to many machine learning tasks, such as supervised classification, unsupervised clustering, semi-supervised learning, image retrieval, etc. [1]. A good distance metric may shed light on the underlying data structure, and thus improved performance can be expected with the learned metric. In this paper we concentrate on supervised classification problems, so if a good metric can be learned effectively, higher classification accuracies will be achieved.

Another important aspect of machine learning is how to design classifiers. As with distance metric learning, a good classifier should be data-dependent, i.e., it should reveal some implicit data structures. Because metric learning and classifier design share this requirement, the two seemingly different problems can be considered jointly. Moreover, although there are well-established classification approaches such as support vector machines (SVMs) [2], [3] and neural networks [4], most of them do not consider the problem of learning a distance metric that is suited to the corresponding classification model. Motivated by the above analysis, in this paper we combine distance metric learning and classifier design into a whole, and this combination is one of the central guidelines of our work.

In view of minimizing the structural risk, a special kind of ‘large margin’ classifier, the support vector machine [2], [3], has achieved general success over the last decade. Recently, several metric learning and/or classifier design works have adopted ideas similar to the ‘large margin’ of SVMs. Xing et al. [5] applied side information to Mahalanobis distance metric learning, aiming to minimize the sum of squared distances between similarly labeled pairs while maintaining a lower bound on the sum of squared distances between dissimilarly labeled ones. Although it can be solved to global optimality, the algorithm of [5] does not involve any slack variables and may incur overfitting. Bachrach et al. [6] introduced the ‘margin’ into feature selection tasks. The algorithm of [6] cannot be formulated as a convex optimization problem and usually converges to local optima. The work in [7], [8], [9], [10] concentrated on different ‘marginal’ metric learning algorithms for k-NN classifiers. These methods employ the k-NN classifier as the classification tool, which utilizes information from the k nearest neighbors and may neglect the discriminative information carried by other useful inputs. Veenman et al. [11] pioneered the combination of feature weighting and the nearest mean (NM) classifier. One obvious limitation of this feature weighting algorithm is that it is only suitable for binary classification problems. Jenssen et al. [12] designed an NM classifier based on nonlinear transformations. Peltonen et al. [13] introduced a probabilistic model that generalizes linear discriminant analysis (LDA) [14], [15] for selecting informative or relevant components. The realization of both [12], [13] requires evaluating the class probability density functions with Parzen windows, and consequently the number of training examples used for the evaluation must be large enough to ensure high accuracy. Therefore, the algorithms of [12], [13] may be unsuitable for applications where the small sample size (SSS) problem occurs. Yang et al. [16] proposed a distance metric learning algorithm that optimizes local compactness and separability within a probabilistic framework. The algorithm of [16] is not convex and often leads to local optima.

Our work in this paper is also motivated by the ‘large margin’ idea. In particular, we propose a large margin nearest local mean (abbreviated ‘LMNLM’) classifier, which introduces large margins between local parts of different target classes and is expected to improve class separability. To reach this goal, we replace the original Euclidean distance metric with a new Mahalanobis one through a linear transformation. Fortunately, we can express our algorithm as a semidefinite programming (SDP) problem, a special kind of convex optimization problem. Owing to its convexity, the proposed LMNLM algorithm avoids the trouble of local minima, since it is guaranteed to converge to a global optimum.
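To make this concrete, the sketch below sets up a generic large-margin Mahalanobis metric learning problem as an SDP with cvxpy. It is only an illustration of the kind of convex program described here, not the paper's exact LMNLM formulation (which is built from local mean vectors and given in Section 2); the `targets`/`impostors` inputs and the unit margin are illustrative assumptions.

```python
# Illustrative sketch only (not the paper's exact LMNLM program): learn a
# Mahalanobis matrix M >= 0 so that each point stays close to a designated
# same-class local mean and is farther from other-class local means by a
# margin of 1, softened with slack variables.
import numpy as np
import cvxpy as cp

def learn_metric(X, targets, impostors, reg=1.0):
    """X: (n, D) data array.
    targets: list of (i, t) pairs -- pull x_i toward the local mean vector t.
    impostors: list of (i, t, o) triples -- x_i's distance to the other-class
        local mean o should exceed its distance to t by a margin of at least 1.
    """
    D = X.shape[1]
    M = cp.Variable((D, D), PSD=True)              # Mahalanobis matrix, M >= 0
    xi = cp.Variable(len(impostors), nonneg=True)  # slack variables

    def sqdist(a, b):
        d = np.asarray(a - b, dtype=float).ravel()
        return cp.quad_form(d, M)                  # (a - b)^T M (a - b), affine in M

    pull = sum(sqdist(X[i], t) for i, t in targets)
    push = [sqdist(X[i], o) - sqdist(X[i], t) >= 1 - xi[k]
            for k, (i, t, o) in enumerate(impostors)]
    prob = cp.Problem(cp.Minimize(pull + reg * cp.sum(xi)), push)
    prob.solve()                                   # SDP: returns a global optimum
    return M.value
```

Because every term is affine in M and the feasible set is the intersection of a PSD cone with affine constraints, the program is convex, which is the property that gives the global-convergence guarantee mentioned above.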

Dimensionality reduction is also quite important in the machine learning community. With its help, three advantages can be expected in pattern recognition tasks. Firstly, we can save memory and reduce system complexity. Secondly, useless and harmful noisy components can be removed, yielding improved classification accuracies. Thirdly, it enables us to deal with data sets that have few samples and high dimensionality, thus weakening the disadvantages caused by the so-called curse-of-dimensionality problem. In this paper we also consider the dimensionality reduction problem and describe how the proposed algorithm leads to lower-dimensional solutions.
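As an illustration of that last point, the sketch below shows one plausible way to realize the eigenvalue-decomposition step mentioned in the abstract: factor a learned Mahalanobis matrix and keep only the directions with non-negligible eigenvalues. The thresholding rule is our assumption, not the paper's exact criterion.

```python
# Sketch (under the assumption above) of eigendecomposition-based reduction:
# factor M = V diag(w) V^T, keep eigen-directions with non-negligible
# eigenvalues, and map x -> L x, so squared Euclidean distance after the map
# approximates the learned Mahalanobis distance.
import numpy as np

def project(X, M, tol=1e-6):
    w, V = np.linalg.eigh(M)                # eigenvalues in ascending order
    keep = w > tol * w.max()                # drop (near-)zero eigen-directions
    L = (V[:, keep] * np.sqrt(w[keep])).T   # (d, D) linear map with d <= D
    return X @ L.T                          # reduced-dimension representation
```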

The rest of this paper is organized as follows. In Section 2, we discuss in detail how the LMNLM classifier is designed and how the dimensionality is reduced. Related work and its relation to ours is discussed in Section 3. In Section 4, we conduct classification experiments on a synthetic data set, benchmark data sets, and a radar high-resolution range profile (HRRP) data set, followed by an analysis of the results. Finally, we give concluding remarks and suggestions for future work in Section 5.

Section snippets

The LMNLM scheme

Suppose that we are given a set of data points $\{x_k \mid x_k \in \mathbb{R}^D\}$ with $k \in \{1,2,\ldots,n\}$, and that $y_k \in \{1,2,\ldots,c\}$ denotes the class label for $x_k$. Let $\{\mu_1,\ldots,\mu_c\}$ be the corresponding mean vectors of the samples in the $c$ classes. Given a testing point $z$, $d^2(z,\mu) = \|z-\mu\|_2^2$ denotes the squared Euclidean distance between $z$ and $\mu$. Based on the above notation, the nearest mean (NM) classifier can be defined as follows: we decide that the testing point $z$ belongs to class $g$ if, for every $h \in \{1,2,\ldots,c\}$, $d^2(z,\mu_g) \le d^2(z,\mu_h)$ is always satisfied.
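The snippet above defines the plain NM rule; the local mean variant that LMNLM builds on replaces each global class mean with a local mean computed from a query's nearest same-class neighbors. Below is a minimal sketch of that decision rule using the plain Euclidean distance (LMNLM would first apply the learned linear transformation); the neighborhood size k is our illustrative parameter.

```python
# Minimal sketch of the nearest local mean decision rule: for a query z, form
# each class's local mean from that class's k nearest training points and
# assign z to the class whose local mean is closest (Euclidean distance here).
import numpy as np

def nearest_local_mean(z, X, y, k=3):
    best_class, best_dist = None, np.inf
    for c in np.unique(y):
        Xc = X[y == c]                                      # samples of class c
        nearest = np.argsort(np.linalg.norm(Xc - z, axis=1))[:k]
        mu_local = Xc[nearest].mean(axis=0)                 # local mean vector
        d = np.sum((z - mu_local) ** 2)                     # squared distance
        if d < best_dist:
            best_class, best_dist = c, d
    return best_class
```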

Related work

In this section, we first review four related algorithms, then discuss how they motivate LMNLM, and finally describe the innovations of LMNLM.

Experiments

In this section we conduct experiments on a synthetic data set, UCI benchmark data sets [21], and a radar HRRP data set to compare our LMNLM scheme with several other algorithms: the MG (multivariate Gaussian) classifier, LDA (linear discriminant analysis), the NKLT (non-probabilistic Karhunen–Loeve transformation) classifier [22], LSVMs (linear SVMs, which use the linear kernel), GSVMs (Gaussian SVMs, which use the Gaussian kernel), LESS, and LMNN. For LDA, we use an NM classifier

Conclusions and future work

Motivated by the large margin idea and the local learning property of some related algorithms, a new large margin nearest local mean classifier, which considers metric learning and classifier design jointly, is proposed in this paper. In LMNLM, we design a new ‘local mean vector’ classification model, and by introducing large margins into this model, improved local class separability is expected. To reach this goal, we transform the initial Euclidean space to the

Acknowledgments

The authors are grateful to the National Laboratory of Radar Signal Processing for providing the radar HRRP data. This work is partially supported by the Program for Cheung Kong Scholars and Innovative Research Team in University (PCSIRT, Grant no. IRT0645) and the NSFC under Grant 60772140.

References (27)

  • F. Gianfelici et al., A non-probabilistic recognizer of stochastic signals based on KLT, Signal Process. (2009)
  • B. Pei et al., Bispectrum based approach to high radar range profile for automatic target recognition, Pattern Recognition (2002)
  • L. Yang, An overview of distance metric learning, Technical Report, School of Computer Science, Carnegie Mellon...
  • N. Cristianini et al., An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods (2000)
  • C.J.C. Burges, A tutorial on support vector machines for pattern recognition, Data Mining and Knowledge Discovery (1998)
  • R.O. Duda et al., Pattern Classification (2001)
  • E. Xing, A. Ng, M. Jordan, S. Russell, Distance metric learning, with application to clustering with side information,...
  • R. Gilad-Bachrach, A. Navot, N. Tishby, Margin based feature selection—theory and algorithms, in: Proceedings of the...
  • K.Q. Weinberger, J. Blitzer, L.K. Saul, Distance metric learning for large margin nearest neighbor classification, in:...
  • J. Goldberger, S. Roweis, G. Hinton, R. Salakhutdinov, Neighbourhood components analysis, in: Proceedings of the NIPS,...
  • A. Globerson, S. Roweis, Metric learning by collapsing classes, in: Proceedings of the NIPS,...
  • L. Torresani, K. Lee, Large margin component analysis, in: Proceedings of the NIPS,...
  • C.J. Veenman et al., LESS: a model-based classifier for sparse subspaces, IEEE Trans. Pattern Anal. Mach. Intell. (2005)
Cited by (20)

    • A new fuzzy k-nearest neighbor classifier based on the Bonferroni mean

      2020, Pattern Recognition Letters
      Citation Excerpt:

      The LM-KNN algorithm first finds the local mean vectors in each class in terms of all k nearest neighbors and then allocates the query sample to the class represented by the local mean vector that has the lowest Euclidean distance from the query sample [5,6]. The robustness and the simplicity of the LM-KNN algorithm have invited researchers to develop a variety of enhanced method variants (see examples in [2,6–10]) and to construct variant-based classification systems [11]. One propellant for the development of new KNN variants has been the observation that the original method has weaknesses.

    • LMAE: A large margin Auto-Encoders for classification

      2017, Signal Processing
      Citation Excerpt:

      In this paper, we propose a Large Margin Auto-Encoders (LMAE) to further enforce the discriminability by enforcing different class samples to be large marginally distributed in hidden feature space. Particularly, we employ a large margin penalizing term that constrains the samples with different class labels to be distanced by a safety margin in the k-nearest neighborhood [29–32]. Fig. 1 shows the deep architecture by stacking multiple layers of LMAE for classification.

    • Designing bag-level multiple-instance feature-weighting algorithms based on the large margin principle

      2016, Information Sciences
      Citation Excerpt:

      In brief, compared with M3IFW, designing class-specific feature-weighting algorithms from the bag level is the major novelty of our work. Many machine learning researchers have shown interest in designing large margin-related algorithms, and these algorithms can be applied to different domains such as supervised learning [10,14,15,48], semi-supervised learning [1,6,37], and unsupervised learning [32,47,52]. Among these algorithms, perhaps SVM [15] is the most popular and is usually treated as state-of-the-art.

    • Hierarchical learning of large-margin metrics for large-scale image classification

      2016, Neurocomputing
      Citation Excerpt:

      In [22], Torresani et al. presented a novel algorithm that simultaneously optimizes the objectives of dimensionality reduction and metric learning based on LMNN. In [31], Chai et al. introduced the local mean vector and proposed a large-margin nearest local mean scheme to improve the separability between local parts of different categories. Lu et al. [20] proposed a neighborhood repulsed metric learning method for kinship verification, where under the learned metric the intra-class samples are pulled as close as possible and inter-class samples lying in a neighborhood are repulsed and pushed away as far as possible.

    • Improved pseudo nearest neighbor classification

      2014, Knowledge-Based Systems
      Citation Excerpt:

      In the LMKNN rule, the local mean vector of k nearest neighbors from each class is employed to classify the query pattern. Since the first introduction of the LMKNN rule, its basic idea has already been successfully borrowed to many new methods [35–41]. Among these algorithms, pseudo nearest neighbor (PNN) rule [35] is another promising KNN-based classifier, which is based on the distance weighted k-nearest neighbor (WKNN) rule [7] and the LMKNN rule.
