A dynamic framework based on local Zernike moment and motion history image for facial expression recognition
Introduction
In recent years facial expression recognition has become a popular research topic [1], [2], [3]. With the recent advances in robotics, and as robots interact more and more with humans and become a part of human living and work spaces, there is an increasing requirement for robots to understand human emotions via a facial expression recognition system [4]. Facial expression recognition also plays a significant role in Human-Computer Interaction (HCI) [5], where it has helped to create meaningful and responsive interfaces. It has also been widely used in behavioural studies, video games, animation, safety mechanisms in automobiles, etc. [6].
Discriminative and robust features that represent facial expressions are essential for effective recognition, and how to obtain them remains a challenging problem. Recent methods that address this problem can be categorised into global-based and local-based methods. It has been shown that local-based methods (e.g., based on Gabor wavelets using grid points) achieve better performance than global-based ones (e.g., based on eigenfaces, Fisher's discriminant analysis, etc.) [7]. Gabor wavelets perform well due to their locality and orientation selectivity. However, their high computational cost makes them unsuitable for real-time applications. The Local Binary Pattern (LBP) descriptor, which is based on the histogram of local patterns, also achieves promising performance [8].
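As a concrete illustration of the LBP idea mentioned above, the following is a minimal sketch of the plain 3×3 LBP operator and its histogram descriptor (illustrative only; it omits the multi-scale and uniform variants used in the cited work):

```python
import numpy as np

def lbp_image(img):
    """Plain 3x3 LBP: each interior pixel is encoded by thresholding
    its 8 neighbours against the centre value, one bit per neighbour."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    centre = img[1:-1, 1:-1]
    codes = np.zeros((h - 2, w - 2), dtype=int)
    # neighbour offsets in a fixed order; neighbour k contributes bit k
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neigh = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neigh >= centre).astype(int) << bit
    return codes

def lbp_histogram(img, bins=256):
    """Normalised histogram of LBP codes, usable as a texture descriptor."""
    hist, _ = np.histogram(lbp_image(img), bins=bins, range=(0, bins))
    return hist / hist.sum()
```

In practice the face is divided into sub-regions and the per-region histograms are concatenated, which preserves coarse spatial layout.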
Shape as a geometric-based representation is crucial for interpreting facial expressions. However, current state-of-the-art methods focus on only a small subset of possible shape representations, e.g., point-based methods that represent a face using the locations of several discrete points. Noting that image moments can describe simple properties of a shape, e.g., its area (or total intensity), its centre and its orientation, Zernike moments (ZMs) have been used to represent a face and facial expressions in [9], [10]. Zernike moments are rotation-invariant features, which can be used to address in-plane head pose variation. In the field of facial expression recognition, rotation-invariant LBP and uniform LBP [11] have also been used to overcome the rotation problem. In [12], the Quantised Local Zernike Moment (QLZM) is used to describe the neighbourhood of a face sub-region. Local Zernike moments have more discriminative power than other image features, e.g., the local phase-magnitude histogram (H-LZM), the cascaded LZM transformation and the local binary pattern (LBP) [13].
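The rotation invariance mentioned above comes from the fact that an in-plane rotation changes only the phase of a complex Zernike moment, leaving its magnitude |Z_{n,m}| intact. A minimal sketch of computing one moment over the unit disk inscribed in a square patch follows (illustrative only; the QLZM descriptor of [12] adds local computation and quantisation on top of this):

```python
import numpy as np
from math import factorial

def zernike_moment(img, n, m):
    """Complex Zernike moment Z_{n,m} of a square grayscale patch,
    sampled on the unit disk inscribed in the patch."""
    assert abs(m) <= n and (n - abs(m)) % 2 == 0
    N = img.shape[0]
    # map pixel centres to [-1, 1] x [-1, 1]
    coords = (2 * np.arange(N) + 1 - N) / N
    x, y = np.meshgrid(coords, coords)
    rho = np.sqrt(x ** 2 + y ** 2)
    theta = np.arctan2(y, x)
    # radial polynomial R_{n,|m|}(rho)
    R = np.zeros_like(rho)
    for s in range((n - abs(m)) // 2 + 1):
        c = ((-1) ** s * factorial(n - s)
             / (factorial(s)
                * factorial((n + abs(m)) // 2 - s)
                * factorial((n - abs(m)) // 2 - s)))
        R += c * rho ** (n - 2 * s)
    V = R * np.exp(-1j * m * theta)        # conjugate basis function
    inside = rho <= 1.0                    # restrict to the unit disk
    return (n + 1) / np.pi * np.sum(img * V * inside) * (2.0 / N) ** 2
```

Rotating the patch multiplies Z_{n,m} by a unit-magnitude phase factor, so taking |Z_{n,m}| yields a rotation-invariant feature.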
Since a facial expression involves a dynamic process, and the dynamics contain information that represents a facial expression more effectively, it is important to capture such dynamic information so as to recognise facial expressions over an entire video sequence. Recently, there has been increasing effort on modelling the dynamics of a facial expression sequence, but such modelling is still a challenging problem. Thus, in this paper, we focus on analysing the dynamics of facial expression sequences. First, we extend the spatial-domain QLZM descriptor into the spatio-temporal domain, i.e., Motion Change Frequency based QLZM (QLZM_MCF), which enables the representation of the temporal variation of expressions. Second, we apply optical flow to the Motion History Image (MHI) [14], i.e., optical flow based MHI (MHI_OF), to represent spatio-temporal dynamic information (i.e., velocity).
We utilise two types of features: a spatio-temporal shape representation, QLZM_MCF, to enhance the local spatial and dynamic information, and a dynamic appearance representation, MHI_OF. We also introduce an entropy-based method that captures the spatial relationships among different parts of a face by computing the entropy value of each face sub-region. The main contributions of this paper are: (a) QLZM_MCF; (b) MHI_OF; (c) an entropy-based method for MHI_OF to capture the motion information; and (d) a strategy integrating QLZM_MCF and entropy to enhance spatial information.
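The entropy-based cue can be sketched as follows: divide the image into sub-regions and score each by the Shannon entropy of its intensity histogram, so regions with richer variation get larger values. This is an illustrative version; the grid size and bin count here are hypothetical, not the paper's settings:

```python
import numpy as np

def region_entropies(img, grid=(4, 4), bins=16):
    """Shannon entropy (in bits) of the intensity histogram of each
    sub-region on a grid[0] x grid[1] partition of the image."""
    h, w = img.shape
    gh, gw = grid
    ents = np.zeros(grid)
    for i in range(gh):
        for j in range(gw):
            block = img[i * h // gh:(i + 1) * h // gh,
                        j * w // gw:(j + 1) * w // gw]
            hist, _ = np.histogram(block, bins=bins, range=(0, 256))
            p = hist / hist.sum()
            p = p[p > 0]                      # ignore empty bins
            ents[i, j] = -np.sum(p * np.log2(p))
    return ents
```

A flat region scores zero entropy, while a textured or moving region scores high, which is what makes the map usable as a per-region weight.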
The rest of the paper is organised as follows. Previous related work is presented in Section 2. Section 3 presents QLZM_MCF, the method using MHI_OF and entropy, and the integration of the two dynamic features. The framework and the experimental results are presented in Sections 4 and 5, respectively. Finally, Section 6 concludes the paper.
Related work
The two main focuses in current research on facial expression are basic emotion recognition and recognition based on facial action coding system (FACS) action units (AUs). The most widely used facial expression descriptors for recognition and analysis are the six prototypical expressions of Anger, Disgust, Fear, Happiness, Sadness and Surprise [15]. The most widely used facial muscle action descriptors are AUs [1].
Motion history image
MHI can be considered as a two-component temporal template, a vector-valued image where each component of each pixel is some function of the motion at that pixel location. The MHI is computed from an update function Ψ(x, y, t), i.e.,

H_τ(x, y, t) = τ, if Ψ(x, y, t) = 1; max(0, H_τ(x, y, t − 1) − δ), otherwise,

where (x, y) is the spatial coordinate of an image pixel at time t (in terms of image frame number), the duration τ determines the temporal extent of the movement in terms of frames, and δ is the decay parameter.
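The update rule above can be sketched directly (a minimal illustration; here Ψ is simple frame differencing against a threshold, and the parameter values are hypothetical):

```python
import numpy as np

def update_mhi(mhi, frame, prev_frame, tau=15, delta=1, thresh=30):
    """One step of the MHI recurrence: where the update function Psi
    fires (large inter-frame difference) the pixel is set to tau;
    everywhere else the history decays by delta, floored at zero."""
    psi = np.abs(frame.astype(int) - prev_frame.astype(int)) >= thresh
    return np.where(psi, tau, np.maximum(0, mhi - delta))
```

Running this over a sequence leaves a grayscale "fading" trace: recently moving pixels are bright (near τ) and older motion darkens as it decays, which is what encodes the recency of facial motion.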
Facial expression recognition framework
Fig. 5 outlines the proposed framework, which comprises pre-processing, feature extraction and classification. The pre-processing includes facial landmark detection and face alignment, where face alignment is applied to reduce the effects of variation in head pose and scene illumination. We use local evidence aggregated regression [38] to detect facial landmarks in each frame, where the locations of the detected eyes and nose are used for face alignment, including scaling and cropping.
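The scaling-and-cropping step of the alignment can be sketched as below, given the detected eye centres (a simplified nearest-neighbour version that fixes the interocular distance and crops around the eye midpoint; the output size and eye distance are hypothetical values, and in-plane rotation correction is omitted):

```python
import numpy as np

def align_face(img, left_eye, right_eye, out_size=96, eye_dist=32):
    """Crop an out_size x out_size window centred between the eyes,
    scaled so the eyes end up eye_dist pixels apart.
    Eye centres are given as (x, y) pixel coordinates."""
    (lx, ly), (rx, ry) = left_eye, right_eye
    scale = eye_dist / max(np.hypot(rx - lx, ry - ly), 1e-6)
    cx, cy = (lx + rx) / 2.0, (ly + ry) / 2.0
    out = np.zeros((out_size, out_size), dtype=img.dtype)
    for i in range(out_size):
        for j in range(out_size):
            # source pixel that lands at output location (i, j)
            sy = int(round(cy + (i - out_size / 2) / scale))
            sx = int(round(cx + (j - out_size / 2) / scale))
            if 0 <= sy < img.shape[0] and 0 <= sx < img.shape[1]:
                out[i, j] = img[sy, sx]
    return out
```

Fixing the eye geometry this way normalises face scale and position across frames, so subsequent features compare corresponding facial regions.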
Facial expression datasets
We use the Extended Cohn-Kanade dataset (CK+) as it is widely used for evaluating the performance of facial expression recognition methods and thus facilitates comparison of performances. The dataset includes 327 image sequences of six basic expressions (namely Anger, Disgust, Fear, Happiness, Sadness and Surprise) and a non-basic emotion expression (namely Contempt), performed by 118 subjects. Each image sequence has a varying number of frames, and starts with the neutral state and ends with the peak (apex) of the expression.
Conclusion
This paper presents a facial expression recognition framework using enMHI_OF and QLZM_MCF. The framework, which comprises pre-processing and feature extraction followed by 2D PCA and SVM classification, achieves better performance than most state-of-the-art methods on the CK+ and MMI datasets. Our main contributions are threefold. First, we proposed a spatio-temporal feature based on QLZM. Second, we applied optical flow in MHI to obtain the MHI_OF feature, which incorporates velocity information. Third, we introduced an entropy-based method to capture the spatial relationships among different face sub-regions.
Acknowledgements
The authors would like to thank China Scholarship Council / Warwick Joint Scholarship (Grant no. 201206710046) for providing the funds for this research.
Xijian Fan received B.Sc. in Information and Communication Technology from Nanjing University of Posts and Telecommunications, China, and M.Sc. in Computer Information and Science from Hohai University, China, in 2008 and 2012, respectively. He is currently pursuing PhD in Engineering at the University of Warwick, U.K. His research interests include image processing and facial expression recognition.
References (44)
- et al., Face recognition: component-based versus global approaches, Comput. Vis. Image Underst. (2003)
- et al., Local binary patterns for multi-view facial expression recognition, Comput. Vis. Image Underst. (2011)
- et al., Fusion of local normalization and Gabor entropy weighted features for face identification, Pattern Recognit. (2014)
- et al., A spatial-temporal framework based on histogram of gradients and optical flow for facial expression recognition in video sequences, Pattern Recognit. (2015)
- et al., Facial expression recognition in dynamic sequences: an integrated approach, Pattern Recognit. (2014)
- et al., Automatic analysis of facial expressions: the state of the art, IEEE Trans. Pattern Anal. Mach. Intell. (2000)
- M. Pantic, L. Rothkrantz, Toward an affect-sensitive multimodal human-computer interaction, in: Proceedings of the...
- Y. Tian, T. Kanade, J. Cohn, Handbook of face recognition, Springer, 2005 (Chapter 11. Facial expression...
- T. Tojo, Y. Matsusaka, T. Ishii, T. Kobayashi, A conversational robot utilizing facial and body expressions, in:...
- et al., Emotion recognition in human-computer interaction, IEEE Signal Process. Mag. (2001)
- Automatic facial expression analysis: a survey, Pattern Recognit.
- Face recognition with Zernike moments, Syst. Comput. Jpn.
- Face recognition using Zernike and complex Zernike moment features, Pattern Recognit. Image Anal.
- The recognition of human movement using temporal templates, IEEE Trans. Pattern Anal. Mach. Intell.
- Constants across cultures in the face and emotion, J. Pers. Soc. Psychol.
- Facial action recognition for facial expression analysis from static face images, IEEE Trans. Syst. Man Cybern.
- Dynamics of facial expressions: recognition of facial actions and their temporal segments from face profile image sequences, IEEE Trans. Syst. Man Cybern.
Tardi Tjahjadi received B.Sc. in Mechanical Engineering from University College London in 1980, and M.Sc. in Management Sciences in 1981 and Ph.D. in Total Technology in 1984 from UMIST, U.K. He has been an associate professor at Warwick University since 2000 and a reader since 2014. His research interests include image processing and computer vision.