FERS - imitate the human visual system

CHAPTER TWO
LITERATURE SURVEY
2.1 Literature Review
In the literature, there are many holistic methods, such as Eigenfaces and Fisherfaces, which are built on Principal Component Analysis (PCA); the more recent 2D PCA and Linear Discriminant Analysis are also examples of holistic methods. Although these methods have been studied widely, local descriptors have gained attention because of their robustness to illumination and pose variations. Heisele et al. showed the validity of component-based methods and how they outperform holistic methods. Local-feature methods compute descriptors from parts of the face and then gather the information into one descriptor. Among these methods are Local Feature Analysis, Gabor features, Elastic Bunch Graph Matching, and Local Binary Patterns (LBP). LBP was originally designed for texture description and was later applied to face recognition. Because LBP achieved better performance than previous methods, it gained popularity and was studied extensively. Newer methods, such as the Local Ternary Pattern (LTP) and the Local Directional Pattern (LDiP), tried to overcome the shortcomings of LBP. LDiP encodes the directional information in the neighborhood instead of the intensity. Also, Zhang et al. [11], [12] explored the use of higher-order local derivative patterns (LDeP) to produce better results than LBP. Both LDiP and LDeP use information other than intensity to overcome noise and illumination variation problems. However, these methods still suffer under non-monotonic illumination variation, random noise, and changes in pose, age, and expression. Although some methods, like Gradientfaces [13], have high discrimination power under illumination variation, they still have low recognition capability under expression and age variation.
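The basic LBP operator can be made concrete with a short sketch. It computes the 3×3 LBP code of each interior pixel and then a 256-bin histogram of those codes. It assumes a grayscale image given as a list of lists of intensities, and it uses one common clockwise neighbor ordering that starts at the top-left. The function names `lbp_codes` and `lbp_histogram` are illustrative. The variants used in the cited face recognition work (uniform patterns, circular neighborhoods, per-region histograms) are not reproduced here.

```python
from typing import List

# Offsets of the 8 neighbours, clockwise from the top-left pixel (an assumed,
# commonly used ordering); neighbour k contributes bit k of the code.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(img: List[List[int]]) -> List[List[int]]:
    """Basic 3x3 LBP: each neighbour >= centre sets one bit of an 8-bit code."""
    h, w = len(img), len(img[0])
    codes = []
    for y in range(1, h - 1):  # border pixels lack a full 3x3 neighbourhood
        row = []
        for x in range(1, w - 1):
            centre = img[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(NEIGHBOURS):
                if img[y + dy][x + dx] >= centre:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes


def lbp_histogram(img: List[List[int]]) -> List[int]:
    """256-bin histogram of the LBP codes, usable as a texture descriptor."""
    hist = [0] * 256
    for row in lbp_codes(img):
        for code in row:
            hist[code] += 1
    return hist
```

Each bit only records whether a neighbor is at least as bright as the center pixel. As a result, adding a constant brightness offset, or applying any strictly increasing intensity transform, leaves the codes unchanged. This is the monotonic-illumination robustness mentioned above, and it also explains why non-monotonic illumination changes remain a problem.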
Other methods have explored different features, such as infrared [14], near-infrared [15], and phase information, to overcome the illumination problem while maintaining performance under difficult conditions.
2.2 System requirements
The goal of a facial expression recognition system (FERS) is to imitate the human visual system as closely as possible. This is a very challenging task in computer vision because it requires not only efficient image/video analysis techniques but also a well-suited feature vector for the machine learning process. The first principle of an FER system is that it should be effortless and efficient. This is connected with full automation, so that no additional manual effort is required. Such a system should also preferably run in real time, which is especially important in both human-computer interaction and human-robot interaction applications.
Furthermore, the subject should be allowed to act spontaneously while data is being captured for analysis. The system should be designed to avoid limitations on body and head movements, which can also be an important source of information about the displayed emotion. Constraints regarding facial hair, glasses or additional make-up should be reduced to a minimum. Moreover, handling occlusions is a challenge for such a system and should also be taken into account.
Other important properties desired in an FER system are user independence and environment independence. The former means that any user should be able to work with the system regardless of skin color, age, gender or nationality.
The latter is connected with handling complex backgrounds and varying lighting conditions. An additional benefit would be view independence, which is possible in systems based on 3D vision.
2.3 Face detection:
As mentioned before, an FER system consists of three stages: face detection, feature extraction and expression recognition. In the first stage, the system takes an input image and applies image processing techniques to it in order to find the face region. The system can operate on static images, where this procedure is called face localization, or on videos, where it is called face tracking.
The major problems that can be encountered at this stage are different scales and orientations of the face. They are usually caused by subject movements or changes in distance from the camera. Significant body movements can also cause drastic changes in the position of the face in consecutive frames, which makes tracking harder. What is more, a complex background and varying lighting conditions can also confuse tracking. For instance, when there is more than one face in the image, the system should be able to distinguish which one is being tracked. Last but not least, occlusions, which usually appear in spontaneous reactions, need to be handled as well.
These problems have motivated the search for techniques that solve them. Face detection techniques can be divided into two groups: holistic, where the face is treated as a whole unit, and analytic, where the co-occurrence of characteristic facial elements is studied.
2.3.1 Holistic face models:
• Huang and Huang [7] used a Point Distribution Model (PDM), which represents the mean geometry of the human face. First, the Canny edge detector is applied to find two symmetrical vertical edges that estimate the face position, and then the PDM is fitted.
• Pantic and Rothkrantz [8] proposed a system which processes images of frontal and profile face views. Vertical and horizontal histogram analysis is used to find the face boundaries. Then, the face contour is obtained by thresholding the image using HSV color space values.
2.3.2 Analytic face models:
• Kobayashi and Hara [9] used images captured in monochrome mode to find the face brightness distribution. The position of the face is estimated by iris localization.
• Kimura and Yachida's [10] technique processes the input image with an integral projection algorithm to find the positions of the eye and mouth corners using color and edge information. The face is represented with a Potential Net model, which is fitted using the positions of the eyes and mouth.
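Both the histogram analysis of Pantic and Rothkrantz and the integral projection of Kimura and Yachida rely on summing an image along its rows and columns. The Python sketch below shows that idea. It assumes the input is already a map in which face or feature pixels have high values, such as an edge or skin map. The rule that locates the region (keep the indices where a projection reaches a fixed fraction of its peak) is an illustrative choice, not the published procedure, and the function names are hypothetical.

```python
from typing import List, Tuple


def integral_projections(img: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Horizontal projection (one sum per row) and vertical projection (one sum per column)."""
    rows = [sum(r) for r in img]
    cols = [sum(c) for c in zip(*img)]
    return rows, cols


def active_span(profile: List[float], fraction: float = 0.5) -> Tuple[int, int]:
    """First and last index where the profile reaches `fraction` of its peak.

    The fixed fraction-of-peak rule is an illustrative assumption.
    """
    threshold = fraction * max(profile)
    idx = [i for i, v in enumerate(profile) if v >= threshold]
    return idx[0], idx[-1]


def bounding_box(img: List[List[float]], fraction: float = 0.5) -> Tuple[int, int, int, int]:
    """(top, bottom, left, right) of the region with strong response in both projections."""
    rows, cols = integral_projections(img)
    top, bottom = active_span(rows, fraction)
    left, right = active_span(cols, fraction)
    return top, bottom, left, right
```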
All of the above-mentioned systems were designed to process facial images; however, they are not able to detect whether a face is present in the image. Systems that handle arbitrary images are listed below:
• Essa and Pentland [11] created a “face space” by performing Principal Component Analysis on 128 face images, yielding eigenfaces. A face is detected in the image if its distance from the face space is sufficiently small.
• Rowley et al. [12] proposed neural network based face detection. The input image is scanned with a window, and a neural network decides whether each window contains a face.
• Viola and Jones [13] introduced a very efficient object detection algorithm that uses Haar-like features as the object representation and AdaBoost as the machine learning method. This algorithm is widely used in face detection.
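Part of the efficiency of the Viola and Jones detector comes from evaluating Haar-like features on an integral image. With an integral image, the sum over any rectangle costs only four lookups. The sketch below covers just this feature-evaluation step, for a single two-rectangle feature. AdaBoost feature selection and the cascade of classifiers are not shown, and the function names are illustrative.

```python
from typing import List


def integral_image(img: List[List[int]]) -> List[List[int]]:
    """ii[y][x] = sum of img over rows < y and columns < x (zero-padded first row/column)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: List[List[int]], top: int, left: int, height: int, width: int) -> int:
    """Sum over a rectangle using four lookups, independent of its size."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_vertical(ii: List[List[int]], top: int, left: int, height: int, width: int) -> int:
    """Two-rectangle Haar-like feature: left half minus right half (width assumed even)."""
    half = width // 2
    return rect_sum(ii, top, left, height, half) - rect_sum(ii, top, left + half, height, half)
```

Once the integral image is built in a single pass, every feature evaluation takes constant time. This is what makes it practical to scan many windows and scales.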
2.4 Feature extraction
After the face has been located in the image or video frame, it can be analyzed in terms of facial action occurrence. Two types of features are usually used to describe facial expressions: geometric features and appearance features. Geometric features measure the displacements of certain parts of the face, such as the brows or mouth corners, while appearance features describe the change in face texture when a particular action is performed. Apart from the feature type, FER systems can be divided by their input, which can be static images or image sequences.
The task of geometric feature measurement is usually connected with face region analysis, especially finding and tracking crucial points in the face region. Possible problems in the face decomposition task include occlusions and the presence of facial hair or glasses. Furthermore, defining the feature set is difficult, because the features should be descriptive and preferably uncorrelated.
2.4.1 Feature extraction methods:
• Pantic and Rothkrantz [8] selected a set of facial points from frontal and profile face images. The expression is measured by the distance between the positions of those points in the initial image (neutral face) and the peak image (expressive face).
• Essa and Pentland [11] proposed a temporal approach to facial expression analysis. They used multiscale coarse-to-fine Kalman filtering, and the facial motion is represented by spatio-temporal energy templates.
• Black and Yacoob [14] introduced local parametric models of image motion based on optical flow information. The models can describe horizontal and vertical translation, divergence and curl.
• Edwards et al. [15] used the Active Appearance Model (AAM), a statistical model of shape and gray-level information. The relationship between AAM displacement and the image difference is analyzed for expression detection. The proposed system operates on static images.
• Cohn et al. [16] developed a geometric-feature-based system in which the optical flow algorithm is computed only in 13×13 pixel regions surrounding facial landmarks.
• Zeng et al. [17] used data extracted by the 3D face tracker called the Piecewise Bezier Volume Deformation Tracker [33]. The system was designed to recognize spontaneous emotions, so three-dimensional tracking was beneficial.
• Littlewort et al. [18] proposed a system that uses only appearance features to describe facial expressions. Facial texture is measured with Gabor wavelets.
• Shan et al. [19] investigated the Local Binary Pattern method for texture encoding in facial expression description. Two feature extraction methods were proposed: in the first, features are extracted from a fixed set of patches; in the second, from the most discriminative patches found by boosting.
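The geometric approach of measuring how facial points move between a neutral frame and a peak frame (as in the Pantic and Rothkrantz bullet) can be sketched as follows. The landmark names and coordinates are hypothetical. Real systems obtain them from a landmark detector or tracker and usually normalize for face size. Here that normalization is done by dividing by the inter-ocular distance in the neutral frame, which is an assumed convention.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def displacement_features(neutral: Dict[str, Point], peak: Dict[str, Point],
                          left_eye: str = "left_eye",
                          right_eye: str = "right_eye") -> Dict[str, float]:
    """Scale-normalized displacement of each landmark between neutral and peak frames.

    Dividing by the neutral inter-ocular distance (an assumed convention) makes
    the features comparable across faces of different sizes.
    """
    scale = math.dist(neutral[left_eye], neutral[right_eye])
    return {name: math.dist(neutral[name], peak[name]) / scale
            for name in neutral if name in peak}
```

For a smile, for example, the mouth-corner landmarks would show large normalized displacements while the eye landmarks stay near zero. A vector of these values is one possible input to the classifiers discussed next.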
2.5 Expression Recognition:
The last part of the FER system is based on machine learning theory; specifically, it is a classification task. The input to the classifier is a set of features retrieved from the face region in the previous stage, formed to describe the facial expression.
Classification requires supervised training, so the training set should consist of labeled data.
Once the classifier is trained, it can recognize input images by assigning them a particular class label. Facial expressions are most commonly classified both in terms of Action Units, proposed in the Facial Action Coding System (FACS), and in terms of the universal emotions: joy, sadness, anger, surprise, disgust and fear. Many different machine learning techniques can be used for the classification task, namely K-Nearest Neighbors, Artificial Neural Networks, Support Vector Machines, Hidden Markov Models, Expert Systems with rule-based classifiers, Bayesian Networks, and Boosting techniques (AdaBoost, GentleBoost).
Three principal issues in the classification task are choosing a good feature set, an efficient machine learning technique and a diverse training database. The feature set should be composed of features that are discriminative and characteristic of particular expressions. The machine learning technique is usually chosen according to the type of feature set. Finally, the database used as a training set should be large enough and contain varied data. The approaches described in the literature are presented below by category of classification output.
2.5.1 Action Units classification:
• Pantic and Rothkrantz [8] introduced an expert system with a rule-based classifier, which can recognize 31 action units with an accuracy of 89%.
• Cohn et al. [16] performed recognition using discriminant functions. The proposed method can distinguish 8 AUs and 7 AU combinations. Tests were performed on 504 image sequences of 100 subjects, and the system obtained an accuracy of 88%.
2.5.2 Emotions classification:
• Huang and Huang [7] detected motion by analyzing the difference image between the neutral and expression images. A minimum distance classifier is used to recognize the six basic emotions, with a recognition rate of 84.5%.
• Kobayashi and Hara [9] used a 234×50×6 back-propagation neural network to recognize the six basic emotions. The achieved recognition accuracy is 85%.
• Zeng et al. [17] used Support Vector Data Description (SVDD) with kernel whitening to avoid the influence of nonhomogeneous data distributions in the input space. The accuracy of the system is approximately 83%.
• Littlewort et al. [18] introduced a method called AdaSVM, in which the facial expression is represented by Gabor wavelet coefficients. First, AdaBoost is applied to select a subset of the Gabor features. The reduced expression representation is then the input to an SVM classifier. The system achieves 97% accuracy in generalization to novel subjects.
• Pantic and Rothkrantz [8] also implemented, in their expert system, rule-based classification of emotions using the previously recognized action units. For example, happiness is a combination of AU6, AU12, AU16 and AU25. Blended emotions are allowed: the result can be 75% happiness if only AU6, AU12 and AU16 occurred. The accuracy achieved by the system is 91%.
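The rule-based blending in the last bullet can be sketched by scoring each emotion by the fraction of its prototype AUs that were detected. Only the happiness prototype (AU6, AU12, AU16, AU25) comes from the text. The fractional scoring rule is an assumption, chosen because it reproduces the quoted 75% example, and it is not claimed to be the authors' exact formula. Prototypes for other emotions would have to be supplied separately.

```python
from typing import Dict, Optional, Set

# Only the happiness prototype is taken from the text; other emotions would
# need their own AU sets, which are not given here.
PROTOTYPES: Dict[str, Set[int]] = {
    "happiness": {6, 12, 16, 25},
}


def blend_scores(detected_aus: Set[int],
                 prototypes: Optional[Dict[str, Set[int]]] = None) -> Dict[str, float]:
    """Fraction of each emotion's prototype AUs present in the detected set.

    This fractional rule is an illustrative assumption that matches the 75%
    example; several emotions may score above zero (blended emotions).
    """
    rules = PROTOTYPES if prototypes is None else prototypes
    return {emotion: len(aus & detected_aus) / len(aus) for emotion, aus in rules.items()}
```

For instance, `blend_scores({6, 12, 16})` yields a happiness score of 0.75, which corresponds to the 75% blended result described above.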
2.6 Recent advances
Apart from the principal methods used in FER systems, several advances have recently been made in facial expression analysis. Facial expressions are recognized at a higher semantic level: expressions can be classified into categories such as confusion, boredom, agreement, frustration and pain. Examples of such approaches are the fatigue detection proposed by Ji et al. [20] and the pain detection proposed by Littlewort et al. [21]. Additionally, more emphasis is placed on recognizing spontaneous emotions. Some systems are designed to divide emotions into posed and spontaneous categories to recognize whether an emotion is genuine or fake; such functionality was proposed by Valstar et al. [22] for genuine smile detection. Moreover, head motions and body gestures are also studied in order to describe human affective states, especially using three-dimensional tracking. For instance, Gunes et al. [23] examined the significance of body movements in affective state analysis. Some work has also been done on context-dependent interpretation of facial expressions, among others by Fasel et al. [24]. Another improvement in feature extraction can be found in the work by Valstar et al. [25], in which expressions are described by temporal dynamics parameters such as speed, intensity, duration and co-occurrence of facial muscle activations.
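Temporal parameters like those attributed to Valstar et al. [25] can be illustrated on a single facial-action intensity curve sampled at a fixed frame rate. The sketch below is a simplified interpretation for illustration only, not the cited method. The activation threshold, the frame rate and the definition of onset speed (mean rise rate from the first active frame to the peak) are all assumptions, and the function name is hypothetical.

```python
from typing import Dict, List


def temporal_parameters(intensity: List[float], fps: float = 25.0,
                        threshold: float = 0.1) -> Dict[str, float]:
    """Peak intensity, activation duration (s) and onset speed (intensity per s).

    Assumes one activation episode; frames above `threshold` count as active.
    """
    active = [i for i, v in enumerate(intensity) if v > threshold]
    if not active:
        return {"peak": 0.0, "duration": 0.0, "onset_speed": 0.0}
    start, end = active[0], active[-1]
    peak_idx = max(range(start, end + 1), key=lambda i: intensity[i])
    rise_time = (peak_idx - start) / fps
    onset_speed = (intensity[peak_idx] - intensity[start]) / rise_time if rise_time else 0.0
    return {
        "peak": intensity[peak_idx],
        "duration": (end - start + 1) / fps,
        "onset_speed": onset_speed,
    }
```

Such descriptors capture how an action unfolds over time rather than only its appearance at the apex. This is the kind of information that can help separate posed from spontaneous expressions.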
2.7 Applications
A huge amount of information is encoded in facial movements. By observing someone’s face, we can learn about his/her:
• affective state, connected with emotions like fear, anger and joy, and moods such as euphoria or irritation
• cognitive activity (brain activity), which can be perceived as concentration or boredom
• personality traits like sociability, shyness or hostility
• truthfulness, using analysis of micro-expressions to reveal concealed emotions
• psychological state, giving information about disorders that helps in the diagnosis of depression, mania or schizophrenia.
Due to the variety of information visible on the human face, facial expression analysis has applications in many fields of science and everyday life.
Firstly, teachers use facial expression analysis to adjust the difficulty of exercises and the learning pace based on the feedback visible on students’ faces. The virtual tutor for e-learning proposed by Amelsvoort and Krahmer [26] provides students with suitable content and adjusts the complexity of courses or tasks using information obtained from the student’s face.
Another application of FERS is in business, where measuring people’s satisfaction or dissatisfaction is very important. This matters for many marketing techniques in which information is gathered from customers through surveys. Using customers’ facial expressions as an indicator of their satisfaction or dissatisfaction would make it possible to conduct such surveys automatically [3]. Moreover, the prototype Computerized Sales Assistant proposed by Shergill et al. [27] selects suitable marketing and sales methods based on the response deduced from customers’ facial expressions.
Facial behavior is also studied in medicine, not only for the diagnosis of psychological disorders but also to help people with disabilities. An example is the system proposed by Pioggial et al. [28], which helps autistic children improve their social skills by learning how to recognize emotions. Facial expressions can also be used for surveillance purposes, as in the prototype developed by Hazelhoff et al. [29]. This system automatically detects discomfort in newborn babies by recognizing three behavioral states: sleep, awake and cry.
Additionally, facial expression recognition is widely used in human-robot and human-computer interaction. Kazi et al. [30] proposed an Intelligent Robotic Assistant for people with disabilities based on multimodal HCI. Another example of a human-computer interaction system is the one developed by Zhan et al. [31] for automatically updating avatars in multiplayer online games.
TABLE 2.1: LITERATURE REVIEW FOR FACE AND EXPRESSION RECOGNITION
Sr. No | Methods / Database | Result / Conclusion | Limitation | Future Work
--- | --- | --- | --- | ---
1 | Gabor filter + SVM [6] | Gabor filter outperformed other existing techniques; removes variability in lighting and other noise | - | Selecting the best Gabor features will reduce the space complexity of the system
2 | Gabor wavelet + PCA + multi-class SVM; FEEDTUM database [2] | Average performance rate: 81.7% | Misclassification between Sad and Neutral expressions | Implement real-time FER and test different degrees
3 | PCA; ATT, CSU and MPI facial expression databases [3] | ATT database: 85.5%; CSU database: 81.3% | The classification method matters for the recognition rate | -
4 | Multiple edge detection on Gabor features + PCA + SVM; FEEDTUM database [4] | 91.7% for 40 feature vectors | Works with frontal images | Further improvement of the robustness of the method and development of a real-time facial expression system
5 | Gabor wavelet + PCA + LBP; JAFFE database [7] | 90% average recognition rate | The LBP operator is small and cannot capture dominant features | -
6 | PCA + FLDA (Fisher LDA); JAFFE and MUG databases [8] | JAFFE: 94.37%; MUG: 95.24% | Facial images of different classes lead to poor classification | -
7 | Canny edge detection + PCA + ANN; JAFFE database [6] | 85.70% | High computational cost of the learning process | This approach uses an ANN for classification, and the number of hidden nodes is identified by experience
8 | PCA + Euclidean distance; JAFFE database [8] | 96.667% for 60 eigenfaces | A specific distance for each expression class was not calculated using Euclidean distance | More work is needed on the preprocessing step to reach a 100% recognition rate
9 | LBP for feature extraction from video frames [9] | 94.70% | - | Videos with more scope for movement, captured in real surrounding environments, can further extend this work
10 | Local binary pattern (LBP) features + SVM [10] | 91.90%; the proposed unrelated features are verified to be important in the facial recognition framework; the proposed approach achieves better performance than state-of-the-art methods | - | -
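Several table entries, and the Eigenfaces method mentioned in Section 2.1, combine PCA with a Euclidean-distance classifier. The minimal NumPy sketch below shows that combination. Training images are flattened to vectors and the mean face is subtracted. The leading principal components (eigenfaces) are then taken from an SVD, and a probe receives the label of the nearest training projection. The random arrays in the check and the choice of `k` are placeholders, no preprocessing from the cited studies is reproduced, and the function names are illustrative.

```python
import numpy as np


def fit_eigenfaces(images: np.ndarray, k: int):
    """images: (n_samples, h*w) flattened faces. Returns (mean face, k eigenfaces)."""
    mean = images.mean(axis=0)
    centred = images - mean
    # Rows of vt are orthonormal principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return mean, vt[:k]


def project(x: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Coordinates of one face (or a stack of faces) in the k-dimensional face space."""
    return (x - mean) @ components.T


def nearest_label(probe, gallery_proj, gallery_labels, mean, components):
    """Label of the gallery face whose projection is closest in Euclidean distance."""
    p = project(probe, mean, components)
    distances = np.linalg.norm(gallery_proj - p, axis=1)
    return gallery_labels[int(np.argmin(distances))]
```

The SVD of the centered data matrix gives the same principal axes as an eigendecomposition of the covariance matrix. It avoids forming that large (h·w)×(h·w) matrix explicitly.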
2.8 FRVT 2006 and ICE 2006 Large-Scale Experimental Results
This section describes the large-scale experimental results from the Face Recognition Vendor Test (FRVT) 2006 and the Iris Challenge Evaluation (ICE) 2006. FRVT 2006 examined recognition from high-resolution still frontal face images and 3D face images, and measured performance for still frontal face images taken under controlled and uncontrolled illumination. ICE 2006 reported verification performance for both left and right irises. The images in ICE 2006 intentionally represent a broader range of quality than the ICE 2006 sensor would normally acquire, including images that did not pass the quality control software embedded in the sensor. The FRVT 2006 results from controlled still and 3D images document at least an order-of-magnitude improvement in recognition performance over FRVT 2002. FRVT 2006 and ICE 2006 compared recognition performance from high-resolution still frontal face images, 3D face images, and single-iris images; on these data sets, recognition performance was comparable for all three. In an experiment comparing humans and algorithms on matching face identity across changes in illumination in frontal face images, the best-performing algorithms were more accurate than humans on unfamiliar faces.
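Verification performance is commonly summarized as the verification rate (true accept rate) at a fixed false accept rate (FAR). The sketch below shows one way to compute that figure from similarity scores. It assumes that higher scores mean more similar and that a pair is accepted when its score is at least the threshold. The score lists in the check are made up for illustration. This is a simplification, not the evaluation protocol used in FRVT 2006 or ICE 2006, and the function name is hypothetical.

```python
from typing import List


def verification_rate_at_far(genuine: List[float], impostor: List[float], far: float) -> float:
    """Fraction of genuine pairs accepted at the lowest threshold whose FAR <= far."""
    # Acceptance depends only on which scores are >= t, so the observed scores
    # (plus +inf, which accepts nothing) cover every distinct operating point.
    for t in sorted(set(genuine) | set(impostor)) + [float("inf")]:
        if sum(s >= t for s in impostor) / len(impostor) <= far:
            # Acceptances only shrink as t grows, so the first qualifying
            # threshold gives the highest verification rate.
            return sum(s >= t for s in genuine) / len(genuine)
    return 0.0
```

Sweeping `far` over several values traces out the ROC curve on which such evaluations typically report their results.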
2.9 Automatic 3D reconstruction for face recognition
This work proposes an analysis-by-synthesis framework for face recognition under varying pose, illumination and expression. First, an efficient 2D-to-3D integrated face reconstruction approach is introduced to reconstruct a personalized 3D face model from a single frontal face image with neutral expression and normal illumination. Then, realistic virtual faces with different poses, illuminations and expressions are synthesized from the personalized 3D face to characterize the face subspace. Finally, face recognition is conducted based on these representative virtual faces. Compared with other related work, this framework has the following advantages: 1) only a single frontal face is required for face recognition, which avoids burdensome enrollment; 2) the synthesized face samples make it possible to perform recognition under difficult conditions such as complex pose, illumination and expression; and 3) the proposed 2D-to-3D integrated face reconstruction approach is fully automatic and more efficient. Experimental results show that the synthesized virtual faces significantly improve face recognition accuracy under varying pose, illumination and expression.
2.10 A coarse-to-fine curvature analysis-based rotation invariant 3D face landmarking
Automatic 2.5D face landmarking aims at locating facial feature points, such as eye corners and the nose tip, on 2.5D face models, and has many applications ranging from face registration to facial expression recognition. This paper proposes a rotation-invariant 2.5D face landmarking solution based on facial curvature analysis combined with a generic 2.5D face model, and uses a coarse-to-fine strategy for more accurate localization of facial feature points. Evaluated on more than 1600 face models randomly selected from the FRGC dataset and compared with a ground truth from manual 3D face landmarking, the technique achieves 100% correct nose tip localization within 8 mm precision and 100% correct inner eye corner localization within 12 mm precision.
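Curvature analysis on a 2.5D face, treated as a depth map z(x, y), can be illustrated with the standard Gaussian (K) and mean (H) curvature formulas for a height field. Here the derivatives are estimated with NumPy finite differences. The sketch picks the point of maximum Gaussian curvature among cap-like points as a nose-tip candidate. This single-scale rule is an assumption for illustration. It omits the coarse-to-fine strategy, the generic face model and the rotation invariance of the cited method, and the function names are hypothetical.

```python
import numpy as np


def curvatures(z: np.ndarray, spacing: float = 1.0):
    """Gaussian (K) and mean (H) curvature of a height field z[row=y, col=x]."""
    zy, zx = np.gradient(z, spacing)          # first derivatives along rows, columns
    zxy, zxx = np.gradient(zx, spacing)       # derivatives of zx along y and x
    zyy = np.gradient(zy, spacing, axis=0)    # second derivative along y
    g = 1.0 + zx ** 2 + zy ** 2
    K = (zxx * zyy - zxy ** 2) / g ** 2
    H = ((1 + zx ** 2) * zyy - 2 * zx * zy * zxy + (1 + zy ** 2) * zxx) / (2 * g ** 1.5)
    return K, H


def nose_tip_candidate(z: np.ndarray, spacing: float = 1.0):
    """(row, col) of maximal Gaussian curvature among cap-like points (K > 0, H < 0).

    With z increasing towards the camera, a protruding tip has H < 0 in this
    sign convention. Single-scale rule chosen for illustration only.
    """
    K, H = curvatures(z, spacing)
    score = np.where((K > 0) & (H < 0), K, -np.inf)
    return np.unravel_index(int(np.argmax(score)), z.shape)
```

On a real face scan, several regions (nose tip, chin, cheeks) can be cap-shaped. A single curvature maximum is therefore only a starting point, which is one reason the cited work refines its candidates coarse-to-fine against a generic face model.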
