Cultural differences in emotion recognition and expression

Some societal factors could influence our ability to process faces, and facial recognition could in turn predict social intelligence. In 2010, Johnston et al. conducted an experiment in which participants viewed photographs of subjects showing varying levels of smiles and were asked questions such as which of the subjects they “would talk about personal issues with” or “would ask where the bathroom is”. The results suggested that perceivers tended to choose those demonstrating a “smile” type emotion, particularly if issues of trust or cooperation were involved. Participants evaluated subjects who seemed to be enjoying themselves more positively than those who were smiling but were classified as “non-enjoyment” (either a grimace or a “fake” smile) (Johnston et al., 2010). While the extant literature does little to explore societal differences in perception and expression of emotion, Chen’s contribution in 2014 along with Johnston’s in 2010 suggests that the emotions we display affect how we interact with each other in our day-to-day responsibilities in society.

There is a much larger body of research exploring cultural differences in emotion recognition and expression. In 1989, David Matsumoto noted that in the 1972 Ekman study, U.S. participants outperformed Japanese participants. He suggested that some cultures, such as the Japanese, follow social norms that might inhibit the understanding of emotion in cases where understanding it might be disruptive to social harmony. The literature suggests that Matsumoto collaborated with other researchers over the course of the next few years to explore these cultural differences in emotion recognition. One of the earliest significant contributions in this period came in 1992, when Matsumoto & Assar outlined requirements for experiments studying cultural differences in expression. They established that in these studies (1) participants from various cultures must view the same set of stimuli, (2) expressions must meet the criteria for validly and reliably portraying the intended emotion, (3) each poser must appear only once in the set of stimuli, and (4) expressions must include posers of more than one race (Matsumoto & Assar, 1992). In 1997, Michael Biehl, David Matsumoto, and Paul Ekman collaborated with a number of other colleagues to explore differences in level of recognition and ratings of intensity across Hungarian, Japanese, Polish, Sumatran, US, and Vietnamese participants for the 7 core emotions established by Paul Ekman in 1972. The results revealed high agreement across countries in identifying the emotions portrayed in the photos, but cross-national differences in the exact level of agreement for anger, contempt, disgust, fear, sadness and surprise. In 2009, Matsumoto and colleagues tested whether the same cultural differences applied in recognizing spontaneous emotion by taking video frames of Olympic medal winners just after they had won or lost a medal, having them FACS coded into emotions, and then presenting them to observers of different cultures. They found that in the case of spontaneous emotion, observers of different cultures utilize the same facial cues when judging emotions, and the signal value of facial expressions is similar across cultures (Matsumoto et al., 2009).

Hilary Elfenbein and Nalini Ambady began a new branch of the cultural research by suggesting that there is an “in-group advantage” in the understanding of emotion: participants were generally more accurate in recognizing emotions expressed by members of their own culture than in recognizing emotions expressed by members of another. The experiment was replicated across both positive and negative emotions and tested on non-facial nonverbal channels of emotion such as tone of voice and body language (Elfenbein & Ambady, 2002). Joshua Ackerman and his colleagues furthered this research in 2006 by claiming they had found a “cross-race effect” in a study that asked participants to memorize emotional face stimuli and recall them later. Their results suggested that White participants were more likely to remember angry Black faces than angry White faces, and they explained this with a biological response: White participants found Black faces threatening, and remembering them was an evolutionary mechanism. This research was replicated by Eva Krumhuber and Antony Manstead in 2011 (Krumhuber & Manstead, 2011) and again by Steven Young and Kurt Hugenberg in 2012 (Young & Hugenberg, 2012) using the same stimulus set. JD Gwinn and Jamie Barden argued that in replicating the 2006 work by Ackerman, these two studies failed to validate the stimulus set. They noted that the stimuli contained only 4 Black subjects, whose facial expressions were all quite “unusual”. They re-tested the effect of angry expressions on the memory of White and Black faces with newly designed stimuli and found that angry expressions impaired memory for Black faces compared to neutral expressions, which was contrary to the previous findings. They tested both a White and a Black participant sample, finding similar results. They concluded that the cross-race effect was better explained by stereotype-congruency.

All of the literature discussed thus far in exploring biological, societal, and cultural differences in expression and recognition of emotion uses nearly identical research methods: collecting a set of facial expression stimuli founded in Ekman’s 1972 theory of emotion or creating a new set that is then coded using the same FACS developed in 1978, presenting that to a panel of observers while controlling for the variable of interest, asking them a set of questions about the faces presented, and then analyzing the results for significant differences. There is an entirely separate branch of work founded in Ekman’s 1978 FACS research that has sought over time to automate the coding process using machine learning and computer vision. The review suggests that it began in 1992 when Susanne Kaiser and Thomas Wehrle demonstrated a method where small dots were affixed to the faces of participants who were themselves FACS experts expressing various facial emotions. The dot patterns were captured and digitized from the videos using a special algorithm, and an artificial neural network was then used to automatically classify the distances and dot patterns into the separate emotions. In 1997, Curtis Padgett and Garrison Cottrell advanced the neural net classification method by testing three different representation schemes as input to the classifier to compare results (a full face projection, an eye-and-mouth projection, and an eye-and-mouth projection onto random 32×32 patches from the image) (Padgett & Cottrell, 1997). The results suggested that the last of the three schemes achieved 86% generalization on new face images. During the same year, two other significant contributions were made testing alternative feature sets as input to machine learning classifiers: one by Lanitis, Taylor and Cootes, which used measurements of the shapes of key facial features and spatial arrangements to achieve between 70% and 99% accuracy on a normal test set of 200 images (Lanitis, Taylor & Cootes, 1997), and one by Essa and Pentland, which used estimates of facial motion called optical flow extracted from video slides to achieve similar results (Essa & Pentland, 1997). M.S. Barlett and colleagues advanced the research in 1999 by successfully feeding a hybrid feature set of facial features and optical flow estimations into a three-layer artificial neural network to automatically detect the presence of facial action units 1 through 7 in a facial image (out of Ekman’s total of 46 from the 1978 research) (Barlett et al., 1999).

Neural networks remained the method of choice for automatic facial emotion and facial action classification through the 1990s. In 2005, Meulders, De Boeck, Van Mechelen, and Gelman proposed a probabilistic feature analysis to extract the most relevant features to producing an expression, with the goal of identifying a minimal feature set that could more efficiently classify facial emotions. While neural networks remain a popular and effective method for classifying emotion even in recent research (Meng et al., 2016), the literature shows emergence of other methods that can make more efficient classifications with smaller feature sets, like support vector machines and hidden Markov models. These very same methods were used in 2012 by Jiang, Valstar and Pantic to create a fully automatic facial action recognition system (Jiang et al., 2012).

The methods that I will employ in my research are not focused on the automated recognition of emotion in facial expressions. Instead, I will use FACS-coded faces from the Cohn-Kanade database of tagged facial images (Lucey et al., 2010) to measure how our own emotions affect our ability to perceive emotion in others. I would like to contribute to the current body of literature around societal context by asking the key question: does how we feel impact how we perceive others?

Q6 by Dr. Gregg Vesonder

In Kahneman’s book, System 1 is the term used to explain the part of our brain that makes quick, automatic decisions based only on information from the past. In other words, it is a low-energy decision making engine and does not bother to expend any energy making decisions using information that is not already known. System 2 is the part of our brain which is capable of making slow, well-thought out decisions and that often requires an extra expense of energy for critical thinking and incorporating pieces of information that may not be fully known or understood. There is a relationship between the Systems in that, ideally, the two systems work in harmony and when System 1 requires a little more thinking power to make a decision it turns to System 2 for processing. The theory suggests that all illogical decision making comes from cases where this harmony does not exist (Kahneman, 2011).

Kahneman explains that System 1 is easily influenced, impatient, impulsive, and more driven by emotion than System 2. When System 1 is fired up or under load (e.g. from emotions), System 2 tends to fail to override it and performs poorly. In addition, every time we have an emotional experience, we are providing System 1 with more information that it will automatically use to make a quick decision in the future. So, even if System 1 is not under load at the time of decision making, emotional experiences in the past are still influencing the decision making process of our ‘autopilot’ System 1 which, Kahneman writes, makes the majority of our decisions even when we believe we are making rational decisions with System 2. I believe that emotional content and emotional experiences heavily influence our decision making, even if we are not emotional in the moment.

My hypotheses do not presently take this into account, but perhaps asking the participants to think about which faces are exhibiting certain emotions might actually be considered a System 2 task as it requires some level of thinking and careful examination. Based on this, it would be interesting to test if priming a subject with an emotional stimulus (i.e. suppressing System 2) in advance of completing the questionnaire would significantly alter emotion perception and the results.

Q7 by Dr. Gregg Vesonder

Gestalt Principles state that a whole is greater than the sum of its parts, or in other words, that the whole picture tells a different story than any individual piece. The concept of figure and ground explains that we have a perceptual tendency to separate parts or “figures” out from their background based on traits such as shape, color, or size. The focus in any moment is on the figure. The ground is simply the backdrop. Sometimes this is a stable relationship, but sometimes (in an unstable relationship) our attention shifts such that what was formerly the figure is now the ground, and vice versa (Grais, 2017). In the example presented in the question text, a smile might be considered unstable. We may perceive it as “happy” when it is presented in a blank context, or if the individual is sitting on the beach with their family. But we may perceive it as an altogether different emotion if that same smile is on the face of a shooter holding a gun.

Similarly, the Gestalt concept of “Proximity” explains that objects that appear close together appear to form groups. A smile alone may require more thinking to decide whether it is actually a “happy” emotion being shown than a smile among 11 other smiles or a group of people who are smiling in a photo. Context in this way does not have to be environmental with a single figure, but can also include multiple figures that exhibit some similar features.

Gestalt theory also explains, in the concept of “Similarity”, that we tend to group together things that share similarities (e.g. shape, color, size). We have grown to recognize a smile as a smile and a frown as a frown based on being exposed to hundreds if not thousands of past interactions with individuals who have exhibited those facial expressions. When confronted with a new expression, we compare features of that new expression to those from the past and classify it based on similarities. These features may not be as simple as “color” and “shape”, and in fact may be quite complex and comprise over a hundred unique features we cannot describe individually, but the concept holds. In this way, past context can affect present perception of emotion.

In all of these cases, the common theme is that context absolutely affects our perception of emotion. When we test our theories, we should consider the context not only of the stimuli we are asking participants to tag but perhaps even the context of the participant. Do I perceive emotion the same way when I myself am in the comfort of my own home as I do just before leaving the office after a stressful day at work? Even when presented an identical image of a smile, my own context might alter my response.

Q8 by Dr. Gregg Vesonder

Yes. The primary common theme is that each of these items will invoke an emotional response in us. In 2017, Schindler et al. conducted a meta-review of the literature around extant measures of emotional response to stimuli from various domains, ranging from film, music and art to consumer products, architecture, and physical attractiveness, and developed a new assessment tool called the Aesthetic Emotions Scale (AESTHEMOS) designed to measure the stimuli’s perceived aesthetic appeal from any of these domains (Schindler et al., 2017). What they discerned from their literature review is that extant measures of emotion have become very domain-focused because the way we respond emotionally to, say, a landscape, is different than how we might respond to a piece of music (i.e. the collection of emotions invoked are typically different), but both responses are emotional ones. They call these responses “aesthetic emotions”.

While AESTHEMOS focuses on creating a domain-agnostic assessment tool for measuring emotional response to stimuli, one contribution by my research would be to measure how our own emotions affect our perception of stimuli from these various domains, just as we do with faces. For instance, if the literature review conducted by Schindler et al. suggests that different combinations of emotions are invoked by stimuli from different domains, then what does feeling angry do to our perception of the world around us? Are we less likely to enjoy art and music, or will we feel more enjoyment (happiness) from certain types of art and music in those cases because they provide an outlet?

I say: yes, there are common themes in the perception of all of these stimuli in that they all invoke an emotional response in us. And I hypothesize that our state of emotion affects our perception of them, and therefore their effect on us, in different ways depending on the domain that they come from.

Q9 by Dr. Gregg Vesonder

Data Structure

The structure of the raw data collected has 4 main sections in a single flat table containing a total of 137 columns… First there is a unique Session ID (to the user and device) for every submission along with their Age, Gender, whether or not they identify as a Native English Speaker, and their baseline self-rated emotion response (Happy, Sad, Angry, Afraid, Surprised):

Following this there is a long series of columns containing the 8 images that the user was shown for each emotion (since they are presented from a random pool) and whether or not the user flagged that image (i.e. “Tap the faces that look ‘happy’”). We do this for each of the 5 emotions and twice more for “NOT Happy” and “NOT Sad” resulting in a total of 112 columns containing this data:

The third component is the same user’s self-rated emotion responses on a scale of 1-5 after they have been asked to tag all of the faces to see if playing the ‘game’ has had any impact on emotion:

And finally there is a series of time stamps indicating the time that the user submits each task, designed to see if there is variation in response time depending on the emotional responses.

Preprocessing

Before any quantitative analysis, data processing will be applied to calculate some additional features:

(1) For each face tagged (there are 56), we will compare the user’s response to the already-tagged Cohn-Kanade database from which the faces come (Lucey et al., 2010) to see if the user was “correct”. This will generate 56 new features indicating, for each face, whether the user correctly identified the dominant emotion.

(2) For each emotion (there are 5) and each “non” emotion (there are 2) we will tally the total number of responses correct and incorrect, as well as the overall total correct and incorrect. This will generate 16 new features.

(3) For each time stamp, we will calculate the completion time (in seconds) it took each participant to complete the step as well as the total time to completion. This will generate 10 new features.

(4) Each participant will also be placed in an age group: (18 to 24), (25 to 44), (45 to 64), (65 and over) based on those collected by the US Census Bureau.

(5) For each participant, we will create 5 new binary features, each representing a positive or negative flag for feeling each emotion. For example, if a participant responds 1-2 (low) on the ‘sad’ scale, they will be considered “not sad”. If they respond 3-5 (mid-high) on the ‘sad’ scale, they will be considered “sad”.

Our dataset will now have a total of 225 columns. 26 of those features are of interest for statistical analysis (those generated in (2) and (3)) and the remainder will be treated as explanatory variables or preserved for exploratory hypotheses.
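As a minimal sketch of steps (1), (2), (4), and (5), the helper functions below show one way the derived features could be computed. The function names and the rule for what counts as “correct” (a face should be tapped exactly when its database label matches the requested emotion, and the reverse for the “NOT” tasks) are my own illustrative assumptions, not code from the application.

```python
def age_group(age):
    """Step (4): place an age in years into one of the four groups."""
    if age >= 65:
        return "65 and over"
    if age >= 45:
        return "45 to 64"
    if age >= 25:
        return "25 to 44"
    return "18 to 24"  # assumes participants are at least 18


def emotion_flag(rating):
    """Step (5): a 1-5 self-rating of 3 or more counts as feeling the emotion."""
    return rating >= 3


def score_task(shown_faces, tapped, target, truth, negate=False):
    """Steps (1) and (2) for one tagging task.

    shown_faces: IDs of the faces shown; tapped: parallel list of booleans.
    truth: face ID -> database emotion label (hypothetical lookup table).
    negate=True models the "NOT Happy" / "NOT Sad" tasks.
    """
    per_face = []
    for face, was_tapped in zip(shown_faces, tapped):
        should_tap = (truth[face] == target) != negate
        per_face.append(was_tapped == should_tap)
    n_correct = sum(per_face)
    return per_face, n_correct, len(per_face) - n_correct


# Column bookkeeping from the text: 137 raw columns plus the new features
# from steps (1) through (5).
NEW_FEATURES = 56 + 16 + 10 + 1 + 5
TOTAL_COLUMNS = 137 + NEW_FEATURES
```

The last two lines simply confirm the arithmetic behind the 225-column total.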

Qualitative Analysis

Before any quantitative statistical analysis is performed, a qualitative assessment of the data will be conducted. Histograms will be generated for age group, gender, and the native English speaker flag to look for any anomalies or outliers in the distributions that should be removed prior to formal analysis. The way the application is designed should not allow for any missing data. Rows with empty cells or empty responses will nonetheless be removed before statistical analysis, as they indicate a system error or abandonment of the questionnaire; preliminary data collection suggests that such cases should be sparse (<5%).

The distributions of the features calculated in preprocessing will be checked for normality to assess the appropriate statistical test method for comparing between-group responses.

An exploratory data analysis will be conducted to produce summary visualizations of the responses. Visualizations that explain the average number of correct/incorrect responses and average response times by age group, gender, English speakers, and emotional baseline will be created to tell a data story and present the results in summary. The visualizations will also aid in identifying any outliers or particularly interesting patterns.

Quantitative Analysis

The Spearman and Pearson partial correlations and their statistical significance will be calculated between the participants’ emotional responses (i.e. Happiness level) and the number of correct/incorrect responses to each emotion and correct/incorrect responses overall. We will calculate these over all ages and genders, as well as within gender and age groups.
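To make the correlation step concrete, here is a small pure-Python sketch of Pearson, Spearman (Pearson on average ranks), and a first-order partial correlation that controls for one additional variable such as age. The function names are illustrative, and in practice a statistics package would also supply the significance levels; this sketch only computes the coefficients.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (neither may be constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def partial_corr(x, y, z, corr=pearson):
    """Correlation of x and y after controlling for a single variable z."""
    r_xy, r_xz, r_yz = corr(x, y), corr(x, z), corr(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

Passing `corr=spearman` to `partial_corr` gives the rank-based version of the same quantity.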

A Student’s t-test will be conducted to test for statistically significant differences in the number of correct/incorrect responses to ALL emotions for each emotion group (e.g. does the number of correct “happy” responses differ between the “happy” and “not happy” groups?).
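For reference, the statistic behind this comparison can be sketched in a few lines. The function below computes the pooled-variance (equal-variance) Student’s t statistic and its degrees of freedom for two groups; the function name and sample values are illustrative, and the p-value would come from a t-distribution lookup in a statistics package (for example `scipy.stats.ttest_ind`).

```python
import math


def pooled_t(sample_a, sample_b):
    """Two-sample Student's t statistic with pooled variance; returns (t, df)."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = sum(sample_a) / n_a, sum(sample_b) / n_b
    ss_a = sum((v - mean_a) ** 2 for v in sample_a)
    ss_b = sum((v - mean_b) ** 2 for v in sample_b)
    df = n_a + n_b - 2
    pooled_var = (ss_a + ss_b) / df
    t = (mean_a - mean_b) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df


# Hypothetical counts of correct "happy" responses in two emotion groups.
t_stat, dof = pooled_t([6, 7, 8], [4, 5, 6])
```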

A one-way Analysis of Covariance (ANCOVA) test will also be conducted to compare the dependent variable (number of correct responses to each emotion) between emotion groups (i.e. “happy” vs. “not happy”) while including (1) age, (2) gender, and (3) native English speaker as covariates.
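One way to think about the ANCOVA is as a linear regression of the response on a group indicator plus the covariates, where the coefficient on the group indicator is the covariate-adjusted group difference. The sketch below estimates only that coefficient with NumPy least squares; the function name and data are illustrative assumptions, and the F-test and significance level would still come from a full statistics package.

```python
import numpy as np


def adjusted_group_effect(y, group, covariates):
    """Fit y = b0 + b1*group + covariates @ b_rest by ordinary least squares.

    group is a 0/1 indicator (e.g. "not happy" vs. "happy"); covariates is an
    (n, k) array such as age, gender, and the native-speaker flag.
    Returns b1, the group difference adjusted for the covariates.
    """
    y = np.asarray(y, dtype=float)
    design = np.column_stack([
        np.ones(len(y)),
        np.asarray(group, dtype=float),
        np.asarray(covariates, dtype=float),
    ])
    coefficients, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(coefficients[1])


# Synthetic check: scores built as 3 + 2*group + 0.1*age, so the adjusted
# group effect recovered by the fit should be close to 2.
group = [0, 0, 0, 1, 1, 1]
age = [[20], [30], [40], [25], [35], [45]]
scores = [3 + 2 * g + 0.1 * a[0] for g, a in zip(group, age)]
effect = adjusted_group_effect(scores, group, age)
```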

Finally, unsupervised learning techniques will be used to identify clusters of participants with similarities in their emotional responses that are too complex to be obvious to the human eye. K-means clustering with varying values of k will be employed on the participants’ responses to the emotional questionnaire (5 features) and the elbow method will be used to identify the optimal k. DBSCAN (density-based spatial clustering of applications with noise) will also be used to generate the same emotion clusters. Then, ANOVA and ANCOVA tests will be performed once more to compare the number of correct/incorrect responses between these new complex “emotion clusters” that were generated by k-means and DBSCAN, and the results compared.
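The k-means and elbow steps can be illustrated with a short pure-Python sketch. It uses a deterministic farthest-point initialization (my own simplification, so the example is reproducible) and two-dimensional toy ratings rather than the five real questionnaire features. The within-cluster sum of squares (“inertia”) is what the elbow method inspects as k grows.

```python
import math


def farthest_point_init(points, k):
    """Start from the first point, then repeatedly add the point farthest from all centers."""
    centers = [points[0]]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(farthest)
    return centers


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm; returns (centers, inertia)."""
    centers = farthest_point_init(points, k)
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        new_centers = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centers[j]
            for j, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    inertia = sum(min(math.dist(p, c) for c in centers) ** 2 for p in points)
    return centers, inertia


# Toy (happy, sad) self-ratings forming two obvious groups.
ratings = [(1, 1), (1, 2), (2, 1), (2, 2), (5, 5), (5, 4), (4, 5), (4, 4)]
inertias = {k: kmeans(ratings, k)[1] for k in (1, 2, 3)}
```

On this toy data the inertia drops sharply from k=1 to k=2 and only slightly after that, which is the “elbow” the method looks for.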

Each of these quantitative analyses will generate a large number of results, but all will be computed algorithmically so that significance levels and correlations can be compared easily in the end.

Q10 by Dr. Babak Heydari

There are various machine learning methods that are popularly used for classification problems like the one in question. Deciding on the most effective approach is usually a function of things like the size of the sample set, the dimensionality of the feature space, whether or not we believe the data is linearly separable, and any underlying assumptions the method might make about the distribution of the data. A few of the more popular methods are discussed here as well as advantages and disadvantages of each and the reason for final selection.

Logistic Regression – one of the simpler and more traditional approaches, and often a good place to start. Logistic regression fits a linear model to the training data and makes predictions by computing the probability that a dependent variable falls into a specific category as a logistic (sigmoid) function of a linear combination of the independent variables. While one of its advantages is its simplicity, it assumes that the log-odds of the outcome are linear in the features, so it performs best when the classes are close to linearly separable. There are few disadvantages to starting out with Logistic Regression in a new classification problem and then trying more advanced methods from there.
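A minimal sketch of the idea, assuming a single feature and plain gradient descent on the mean log-loss (the function names, learning rate, and toy data are illustrative choices, not part of the analysis plan):

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Gradient descent on the mean log-loss for one feature; returns (weight, bias)."""
    weight, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(weight * x + bias) - y for x, y in zip(xs, ys)]
        weight -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        bias -= lr * sum(errors) / n
    return weight, bias


def predict_proba(x, weight, bias):
    """Probability of class 1: the sigmoid of a linear function of x."""
    return sigmoid(weight * x + bias)


# Toy data: small feature values belong to class 0, large ones to class 1.
w, b = fit_logistic([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])
```

The fitted model outputs a probability, and a class is assigned by thresholding it (commonly at 0.5).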

Naïve Bayes – Based on Bayes’ theorem, which works on conditional probability: the probability that something will happen given that something else has already occurred. Given this, we can calculate the probability of a class from prior knowledge of how often each feature value occurs within that class. The Naïve Bayes classifier also assumes that all of the features in the data set are independent of each other given the class. This can be a disadvantage if learning the relationships between features would provide more accurate classification, since it is unable to do so. However, it is fast, simple, and highly scalable. It also works well with categorical data, even when the data is not linearly separable.
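The independence assumption is easiest to see in code. The sketch below is a small categorical Naïve Bayes with add-one (Laplace) smoothing; the feature values and function names are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    seen_values = defaultdict(set)         # feature index -> distinct values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(label, i)][value] += 1
            seen_values[i].add(value)
    return class_counts, feature_counts, seen_values


def predict_nb(model, row):
    """Pick the class with the highest log prior plus summed log likelihoods."""
    class_counts, feature_counts, seen_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for i, value in enumerate(row):
            # Each feature contributes independently (the "naive" assumption);
            # add-one smoothing avoids zero probabilities for unseen values.
            numerator = feature_counts[(label, i)][value] + 1
            denominator = count + len(seen_values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical categorical features: (mouth shape, brow position).
rows = [("smile", "raised"), ("smile", "neutral"), ("frown", "lowered"), ("frown", "neutral")]
labels = ["happy", "happy", "sad", "sad"]
model = train_nb(rows, labels)
```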

K Nearest Neighbors (KNN) – The KNN algorithm makes a prediction of a class based on the feature similarity of the test data to the existing (training) data. The advantage to this is that it is a non-parametric method, meaning it makes no prior assumptions about the distribution of the data and is therefore very helpful when we have no prior knowledge and need to let the structure of the data speak for itself. It works very well in real-world cases, and because there is no (or very minimal) formal “training” period, it is generally very fast to set up, although each prediction requires comparing the query against the stored training data. However, because it makes the prediction based on the “nearness” of similar items, it requires that we come up with a meaningful measure of distance, which can be a challenge depending on the type of data we are working with. While a larger k makes it fairly robust to individual outliers, it is very sensitive to irrelevant features inappropriately included in the measure of distance.
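As a brief sketch, assuming Euclidean distance and a simple majority vote (both are choices I am making for illustration):

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    by_distance = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Toy two-feature data with two well-separated classes.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["a", "a", "a", "b", "b", "b"]
```

There is no training step beyond storing the data, but every prediction computes a distance to every stored point, which is where the cost lies.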

Support Vector Machines – SVMs separate the data into classes by maximizing the margin between classes; the training points that lie closest to the boundary and define it are called “support vectors”. There are both linear SVMs and non-linear SVMs, which use a kernel function for cases where it is not possible to separate the training data using a hyperplane in the original feature space (in other words, the boundary the SVM creates doesn’t have to be a straight line). The benefit of non-linear SVMs is that we can capture much more complex relationships between classes, but at greater computational expense. Because they do not make any strong underlying assumptions about the data and because of their ability to capture complex relationships, they often provide some of the best classification performance for real world classification problems when simpler methods do not produce acceptable performance.
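For reference, the margin-maximization idea can be written as the standard soft-margin optimization problem. The notation here is my own: x_i are the feature vectors, y_i ∈ {−1, +1} are the class labels, w and b define the separating hyperplane, the ξ_i are slack variables that allow some points to violate the margin, and C controls how heavily violations are penalized.

```latex
\min_{w,\,b,\,\xi}\quad \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{subject to}\quad
y_i\,(w^{\top}x_i + b) \ge 1 - \xi_i,\qquad \xi_i \ge 0,\qquad i = 1,\dots,n.
```

Minimizing \lVert w\rVert is equivalent to maximizing the margin width 2/\lVert w\rVert. A non-linear SVM solves the same problem after replacing inner products with a kernel, for example the radial basis function K(x, x') = \exp(-\gamma\lVert x - x'\rVert^{2}). With more than two classes, several binary problems of this form are typically combined (one-vs-rest or one-vs-one).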

Decision Trees & Random Forests – Decision Trees use a branching methodology to make predictions just as the name would suggest. Each “branch” of the tree represents a decision made based on a prior decision, and a “leaf” node at the end of a branch represents a predicted class. They help make decisions under uncertainty, and also provide a nice visual representation of a decision situation (like deciding between classes). They also work well on categorical or even mixed data since they do not make any assumptions about the data or linearity. However, the accuracy of decisions generally goes down as the dimensionality of the features goes up, and they generally do not work well for high-dimensional data sets. Random Forests generate multiple decision trees with different random samples of the data and then use the “most popular” prediction as the final output.
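The “random samples plus most popular prediction” idea can be sketched as follows. To keep the example short I use one-level trees (decision stumps) on bootstrap samples; a real random forest would grow deeper trees and also sample a random subset of features at each split, so this is a simplification and all names are illustrative.

```python
import random
from collections import Counter


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels):
    """One-level tree: choose the (feature, threshold) split with the fewest errors."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [l for row, l in zip(rows, labels) if row[feature] <= threshold]
            right = [l for row, l in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue
            left_label, right_label = majority(left), majority(right)
            errors = sum(l != left_label for l in left) + sum(l != right_label for l in right)
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, left_label, right_label)
    if best is None:  # no usable split: always predict the majority class
        label = majority(labels)
        return (0, float("inf"), label, label)
    return best[1:]


def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return forest


def predict_forest(forest, row):
    """The forest's answer is the most popular prediction among its trees."""
    return majority([predict_stump(stump, row) for stump in forest])


rows = [(1,), (2,), (3,), (7,), (8,), (9,)]
labels = ["a", "a", "a", "b", "b", "b"]
forest = fit_forest(rows, labels)
```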

Artificial Neural Network & Deep Nets – Finally, artificial neural networks represent an entire branch of research that uses simulations of biological neural networks to make decisions or make predictions using data. The basic anatomy of an ANN consists of an input layer containing the feature set that is being used to make predictions, an output layer which contains one or more “nodes” representing an output of the network (this could be, for example, multiple classes), and a series of hidden layers which transform the input to the output data. Nodes in adjacent layers are connected by weights, and a neuron that reaches its threshold “fires” to the nodes in the next layer (to its right). We compare the output to that in the training set, adjust the weights to reduce error, and then make another guess. An ANN keeps doing this until the error can no longer be decreased. “Deep Learning” networks are simply ANNs with a much higher number of hidden layers. ANNs are very computationally expensive, but work well when the feature space is complex and generalized decisions need to be made by detecting patterns that may or may not be detectable by humans. They have been shown to work very well in computer vision applications, and are popular in facial recognition and emotion detection as seen in the literature. However, due to their complexity and computational intensity I will not be using ANNs in this response.

Selected Model: For this problem, I have ruled out traditional methods like Logistic Regression and Naïve Bayes and the more advanced and computationally intense methods of ANNs and Deep Learning networks. Decision Trees will become too complex with the high dimensionality of continuous variables, and while the KNN approach may also provide good performance, coming up with a meaningful definition of distance may be difficult. For an implementation with relatively good performance and moderate complexity, I will be implementing a linear Support Vector Classifier for this problem.

Major Steps:

(processor.py)

1) First, we use the OpenCV (Open Source Computer Vision) library, which has a pre-trained model that detects the face in an image, to extract just the face from the images in the JAFFE database, and use the ‘glob’ package to sort those files into subdirectories labeled by emotion. We chose three emotions to focus on to reduce the problem space: happy, sad, and angry.
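The sorting part of this step can be sketched without any image libraries. The snippet assumes that JAFFE file names follow the pattern “KA.HA1.29.tiff”, where the second dot-separated field begins with a two-letter expression code; that naming pattern and the function names are assumptions on my part, and the face cropping itself (done with OpenCV in processor.py) is left out.

```python
from pathlib import PurePath

# Assumed JAFFE expression codes for the three emotions used here.
EMOTION_CODES = {"HA": "happy", "SA": "sad", "AN": "angry"}


def emotion_from_filename(path):
    """Return the emotion label encoded in a file name, or None if it is not one of the three."""
    parts = PurePath(path).name.split(".")
    if len(parts) < 2:
        return None
    return EMOTION_CODES.get(parts[1][:2].upper())


def group_by_emotion(paths):
    """Group file paths by emotion, mirroring the per-emotion subdirectories."""
    groups = {emotion: [] for emotion in EMOTION_CODES.values()}
    for path in paths:
        emotion = emotion_from_filename(path)
        if emotion is not None:
            groups[emotion].append(path)
    return groups
```

Files with other expression codes (neutral, surprise, and so on) are simply skipped, which matches the decision to focus on three emotions.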

(classifier.py)

2) Next, we initialize a face detector and landmark predictor class using the open source ‘Dlib’ library, which contains a pre-trained model to extract landmark coordinates from a facial image. These are going to be the features we use to train the SVC to recognize emotion. The pre-trained model has learned to extract the coordinates of 68 unique landmarks on the face. Rather than compute distances to a centroid or anything complex, we will use the raw coordinate values as input features to the SVC.

3) For 10 iterations, we:

a. Pull the images from the emotion directories (happy, sad, angry)

b. Split them into an 80/20 train & test set and append the emotion labels

c. Extract the facial landmarks and store the x and y coordinates in an array

d. Train an SVC classifier on the 80% training data

e. Test the performance of predictions on the 20% test data

f. Append the accuracy to an array

4) Calculate the mean of all of the iteration accuracies to produce a final result.
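The evaluation loop in steps 3 and 4 can be sketched independently of the image pipeline. In the sketch below the classifier is passed in as a pair of functions, and a nearest-centroid classifier stands in for the SVC so that the example runs without external libraries; in the actual pipeline the scikit-learn classifier would take its place. All names and the toy data are illustrative assumptions.

```python
import random
from statistics import mean


def flatten_landmarks(points):
    """Turn [(x0, y0), (x1, y1), ...] into the flat feature list [x0, y0, x1, y1, ...]."""
    return [coordinate for point in points for coordinate in point]


def split_per_class(samples_by_label, train_fraction, rng):
    """Shuffle each emotion's samples and split them into train and test parts."""
    train, test = [], []
    for label, samples in samples_by_label.items():
        shuffled = samples[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)
        train += [(sample, label) for sample in shuffled[:cut]]
        test += [(sample, label) for sample in shuffled[cut:]]
    return train, test


def repeated_holdout(samples_by_label, fit, predict, n_iter=10, train_fraction=0.8, seed=0):
    """Repeat the random 80/20 split n_iter times; return (accuracies, mean accuracy)."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_iter):
        train, test = split_per_class(samples_by_label, train_fraction, rng)
        model = fit(train)
        hits = sum(predict(model, features) == label for features, label in test)
        accuracies.append(hits / len(test))
    return accuracies, mean(accuracies)


def fit_centroids(train):
    """Stand-in classifier (NOT the SVC): store the mean feature vector of each class."""
    sums, counts = {}, {}
    for features, label in train:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in total] for label, total in sums.items()}


def predict_centroid(model, features):
    """Predict the class whose centroid is closest to the feature vector."""
    return min(model, key=lambda label: sum((a - b) ** 2 for a, b in zip(model[label], features)))


# Toy one-dimensional "features" for three clearly separated emotions.
toy = {
    "happy": [[0], [1], [2], [1], [0]],
    "sad": [[10], [11], [12], [11], [10]],
    "angry": [[20], [21], [22], [21], [20]],
}
accuracies, mean_accuracy = repeated_holdout(toy, fit_centroids, predict_centroid)
```

Because each iteration draws a fresh random split, the individual accuracies vary from run to run on real data, which is why the mean over the iterations is reported.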

The mean accuracy on a single run (10 iterations) was 0.878, which is quite good. However, since the sample size is so small, it is worth noting that the iteration accuracies bounce between the same few values. This is because, for each emotion, the test set is only ~6 faces or so. In some cases, we even see 100% accuracy, which is unlikely to hold at scale. In order to improve the model’s performance and ability to generalize, we might try:

– Including additional data. The sample size is pretty small in this example (~30 faces per each emotion) which gives the model less data to use to differentiate.

– Exploring different features. For this example we used the raw coordinates, but we might explore using distance measures, for example, between the facial landmarks.

– Image transformations. For this example we leave the greyscale images as they are, but there are a number of techniques that apply transformations to the images that make differentiating features stand out more (for example, adjusting contrast, or applying filters that reduce the number of pixels to only the principal components of an NxN grid overlaid on the image).
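As one example of the “different features” idea, the sketch below replaces raw coordinates with each landmark’s distance from the face centroid, scaled by the largest such distance. This particular feature definition is my own illustrative choice; its appeal is that the values no longer change when the face is shifted or resized within the image.

```python
import math


def centroid_distance_features(points):
    """Distance of each landmark from the centroid, divided by the largest distance."""
    center_x = sum(x for x, _ in points) / len(points)
    center_y = sum(y for _, y in points) / len(points)
    distances = [math.hypot(x - center_x, y - center_y) for x, y in points]
    scale = max(distances) or 1.0  # avoid dividing by zero if all points coincide
    return [d / scale for d in distances]


# The same three "landmarks", before and after doubling the size and shifting.
base = [(0, 0), (4, 0), (2, 6)]
moved = [(10, 5), (18, 5), (14, 17)]
features_base = centroid_distance_features(base)
features_moved = centroid_distance_features(moved)
```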

SVC performance is typically visualized by projecting the feature space into two dimensions and then visualizing the linear separation. Because of the high dimensionality of the feature space in this problem (x and y coordinates for each of 68 landmarks), projection is quite difficult and the visualizations become meaningless. I am including an output of the model here and a link to the git repository below:

runfile('/Users/jmanfre/dev/python/jaffe/classifier.py', wdir='/Users/jmanfre/dev/python/jaffe')

training SVM 0

accuracy: 1.0

training SVM 1

accuracy: 0.888888888889

training SVM 2

accuracy: 0.944444444444

training SVM 3

accuracy: 1.0

training SVM 4

accuracy: 0.777777777778

training SVM 5

accuracy: 0.722222222222

training SVM 6

accuracy: 0.944444444444

training SVM 7

accuracy: 0.888888888889

training SVM 8

accuracy: 0.722222222222

training SVM 9

accuracy: 0.888888888889

mean accuracy for Linear SVM: 0.877777777778
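The way the accuracies “bounce between the same few values” can be checked with a little arithmetic. Every reported value is a multiple of 1/18, which is consistent with a test set of about 6 faces for each of the 3 emotions; the test-set size of 18 is my inference from the output, not something the script prints.

```python
from fractions import Fraction

# Iteration accuracies copied from the output above.
reported = [1.0, 0.888888888889, 0.944444444444, 1.0, 0.777777777778,
            0.722222222222, 0.944444444444, 0.888888888889, 0.722222222222,
            0.888888888889]

TEST_SET_SIZE = 18  # inferred: roughly 6 test faces x 3 emotions

# Number of correctly classified test faces in each iteration.
correct_counts = [round(accuracy * TEST_SET_SIZE) for accuracy in reported]

# Mean accuracy as an exact fraction of all test predictions across the run.
mean_accuracy = Fraction(sum(correct_counts), TEST_SET_SIZE * len(reported))
```

With only 18 test faces, a single misclassified face changes the accuracy by about 5.6 percentage points, which is why so few distinct values appear.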

Git repository: https://github.com/joemanfredonia/JAFFE-Emotion-Classifier

Packages used (cited in References)

– DLib (for extracting the coordinates of 68 facial features from the images)

– Glob (for file and directory manipulation)

– Numpy (for basic numeric array manipulation)

– OpenCV (for automatically extracting the region of raw images that contain the face)

– Random (for generating a random 80/20 Train/Test split)

– Scikit-learn (for training and testing the Support Vector Classifier)
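A random 80/20 split of the kind listed above can be sketched with the standard library alone. The function name `split_files`, the `seed` parameter, and the file names are hypothetical and used only for illustration; the repository may implement the split differently.

```python
import random


def split_files(items, train_fraction=0.8, seed=None):
    """Shuffle items and split them into (train, test) lists.

    Hypothetical helper: a sketch of a random 80/20 split, not the
    repository's actual code.
    """
    rng = random.Random(seed)  # local generator so the split is reproducible
    shuffled = list(items)     # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Illustrative file names for one emotion with 30 faces.
faces = ["face_%02d.png" % i for i in range(30)]
train, test = split_files(faces, seed=0)
print(len(train), len(test))  # 24 6
```

With about 30 faces per emotion, this leaves about 6 faces per emotion for testing, which matches the small test sets described earlier.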

Projects referenced for framework

– Paul Vangent’s “Emotion Recognition with Python, OpenCV, and a face data set” tutorial http://www.paulvangent.com/2016/04/01/emotion-recognition-with-python-opencv-and-a-face-dataset/

– Paul Vangent’s “Emotion Recognition using Facial Landmarks” tutorial http://www.paulvangent.com/2016/08/05/emotion-recognition-using-facial-landmarks/

Q11 by Dr. Babak Heydari

In the literature review conducted in response to Dr. Mansouri’s question, I explored the evolution of a taxonomy of research branching from a foundation in Paul Ekman’s work in 1972 on emotion expression (Ekman, 1972) and in 1978 on the coding of facial expression through facial action units (FACS) (Ekman, 1978). I discussed that through the 1990s and 2000s there was a significant amount of research done on societal and cultural effects on emotion recognition and expression, as well as an evolution of computational methods used to code for those emotions when expressed. In this response, I will elaborate on other more recent branches of the taxonomy that grew from the same roots in Ekman’s 1972 research.

While there was a heavy focus on the effects of cultural and societal context through the early 2000s, more recently Barrett & Kensinger explored whether context in general is routinely encoded during emotion perception. Their study found that people remember the context more often when asked to label an emotion in a facial expression than when asked to simply judge the expression itself. This suggests that facial action units viewed in isolation might be insufficient for perceiving emotion and that context plays a key role (Barrett & Kensinger, 2010). One year later, in 2011, Barrett, Mesquita & Gendron continued the research by testing various context effects during emotion perception, such as visual scenes, voices, bodies, and other faces, and not just cultural orientation. Their findings suggested that, in general, context is automatically encoded in the perception of emotion and plays a key role in its understanding.

One branch of research began to explore differences in biology and their effects on emotion perception. The first part of that branch focused on age. In 2010, Phills, Scott, Henry, Mowat, and Bell compared the ability to recognize emotion among healthy older adults, those with Alzheimer’s disease, and those with late-life mood disorder. As expected, emotion detection was impaired in those with Alzheimer’s, and it was also slightly impaired in the mood disorder group (Phills, Scott, Henry, Mowat & Bell, 2010). They also found that difficulties with emotion perception predicted quality of life in older adults, indicating that emotion decoding skills play an important role in their well-being and prompting further research on the relationship with age. In 2011, Kellough and Knight reported a positivity bias in older adults and suggested that the effect was related to “time perspective” rather than strictly to age per se (Kellough & Knight, 2011). A systematic meta-analysis in 2014 by Reed, Chan and Mikels supported this finding: their analyses indicated that older adults show a processing bias toward positive over negative information, and that younger adults show the opposite pattern (Reed, Chan & Mikels, 2014). In 2011, Riediger, Voelkle, Ebner & Lindenberger conducted a study that included not just adults but also younger raters to assess the age effect more broadly. Their results also suggested that the age of the poser might affect the raters’ ability to correctly identify the emotion (Riediger, Voelkle, Ebner & Lindenberger, 2011). Folster, Hess & Werheid studied this question specifically in 2014 and concluded that the age of the face does play an important role in facial expression decoding, and that older faces were typically more difficult to decode than younger faces (Folster, Hess & Werheid, 2014).

The second part of the “biology” branch explored gender. Two studies from 2010 stand out. The first, by Collignon et al., was a multisensory study in which participants were asked to categorize fear and disgust expressions presented through facial expressions and also accompanied by audio. They found that women tended to process the multisensory emotions more efficiently than men (Collignon et al., 2010). The second, by Hoffman et al., suggested that women were more accurate than men in recognizing subtle facial displays of emotion, although no significant differences were observed when the facial expressions being identified were labeled as “highly expressive” (Hoffman et al., 2010).

In the late 2000s and early 2010s, the large majority of new literature on facial expression perception seems to focus on its relationship with an assortment of psychological disorders. In 2010, Bourke, Douglas and Porter found evidence in patients with clinical depression of a bias toward sad expressions and away from happy expressions (Bourke, Douglas & Porter, 2010). The same year, Schaefer et al. conducted a similar study for bipolar depressive raters specifically and found evidence of emotional processing abnormalities (Schaefer et al., 2010). Kohler et al. tested various controls for a similar bipolar rater panel to explore whether there were other explanatory factors, and found the same deficit regardless of task type, diagnosis, age of onset/duration of illness, sex, or hospitalization status, suggesting that difficulty with emotion perception is likely a stable deficit in depressive disorders (Kohler et al., 2011). In 2012, Penton-Voak et al. furthered the research by testing the effects of emotion perception training on depressive symptoms and mood in young adults. They found some evidence of increased positive mood at a 2-week follow-up compared to controls, suggesting that modification of emotional perception might lead to an increase in positive affect (Penton-Voak et al., 2012). This sort of finding has seeded further research on how emotion perception training or intervention might be used to aid those suffering from psychological disorders, namely depression. We will discuss this later.

As with depression, there is a large body of work focusing on schizophrenia. In 2010, Chan, Li, Cheung, and Gong noted that there was mixed evidence regarding whether patients with schizophrenia have a general facial emotion perception deficit or only a deficit in specific facial emotion recognition tasks (Chan, Li, Cheung, and Gong, 2010). They conducted a meta-analysis of 28 facial emotion perception studies of patients with schizophrenia that included control tasks, and their findings demonstrated a general “moderate to severe” impairment in the ability to perceive facial emotion in patients with schizophrenia. This seeded a chain of follow-up research. Brown and Cohen in 2010 studied which specific symptoms of schizophrenia seemed to contribute to the deficit. They found that an impaired ability to label emotional faces did not correlate with symptoms, but was generally associated with lower quality of life and disorganization (Brown & Cohen, 2010). The same year, Linden et al. studied the same ability in raters but with a focus on working memory. Their results indicated preserved implicit emotion processing in schizophrenia patients, which contrasts with their impairment in explicit emotion classification (Linden et al., 2010). In 2011, Amminger et al. examined patients at risk for schizophrenia as well as those who were clinically stable with a first-episode diagnosis to test whether an emotion recognition deficit was apparent in people at risk before
