Facial expression of emotion and perception of the Uncanny Valley in virtual characters

https://doi.org/10.1016/j.chb.2010.10.018

Abstract

With technology allowing for increased realism in video games, realistic, human-like characters risk falling into the Uncanny Valley. The Uncanny Valley phenomenon implies that virtual characters approaching full human-likeness will evoke a negative reaction from the viewer, due to aspects of the character’s appearance and behavior differing from the human norm. This study investigates if “uncanniness” is increased for a character with a perceived lack of facial expression in the upper parts of the face. More important, our study also investigates if the magnitude of this increased uncanniness varies depending on which emotion is being communicated. Individual parameters for each facial muscle in a 3D model were controlled for the six emotions: anger, disgust, fear, happiness, sadness and surprise in addition to a neutral expression. The results indicate that even fully and expertly animated characters are rated as more uncanny than humans and that, in virtual characters, a lack of facial expression in the upper parts of the face during speech exaggerates the uncanny by inhibiting effective communication of the perceived emotion, significantly so for fear, sadness, disgust, and surprise but not for anger and happiness. Based on our results, we consider the implications for virtual character design.

Introduction

Jentsch (1906) first introduced the subject of “The Uncanny” into contemporary thought in an essay entitled “On the Psychology of the Uncanny”. The uncanny was described as a mental state where one cannot distinguish between what is real or unreal and which objects are alive or dead. Referring to Jentsch’s essay, Freud (1919) characterized the uncanny as a feeling caused when one cannot detect if an object is animate or inanimate upon encountering objects such as “waxwork figures, ingeniously constructed dolls and automata” (p. 226).

In 1970 (as translated by MacDorman and Minato (2005)), the roboticist Masahiro Mori associated the uncanny with robot design. Mori observed that as a robot’s appearance became more human-like, the robot continued to be perceived as more familiar and likeable to a viewer, until a certain point was reached (between 80% and 85% human-likeness), where the robot was regarded as more strange than familiar. As the robot’s appearance reached a stage of being close to human, but not fully, it evoked a negative affective response from the viewer. Fig. 1 depicts a visualization of Mori’s theory showing familiarity increasing steadily as perceived human-likeness increases, then decreasing sharply, causing a valley-shaped dip.

The unpleasant feelings evoked by the uncanny have been attributed to it being a reminder of one’s own mortality (MacDorman, 2005, Mori, 1970). Kang (2009), however, suggested that the negative impact of the uncanny is related to how much of a threat a character is perceived to be and how much control we have over the potentially threatening or dangerous interaction.

Advances in technology have facilitated increased visual realism in video games, and designers in some game genres are creating near-realistic, human-like characters. Contrary to Mori’s advice, these designers are aiming for the second peak as enhanced realism is believed to improve the player experience and sense of immersion (e.g. Ashcraft, 2008, Plantec, 2008). As characters approach high levels of human-likeness and exhibit human-like motor behavior, aspects of their appearance and behavior are being placed under greater scrutiny by the audience. Factors such as facial expression may appear odd or unnatural and can make a character appear lifeless rather than lifelike. As with robots, highly human-like video game characters may be subject to the Uncanny Valley phenomenon (e.g. Brenton et al., 2005, Gouskos, 2006, MacDorman et al., 2009, Pollick, 2009).

Design guidelines have been authored to advise character designers on how to avoid the Uncanny Valley. Such guidelines have included factors such as facial features and proportion, and level of detail in skin texture (e.g. Green et al., 2008, MacDorman et al., 2009, Seyama and Nagayama, 2007). Hanson (2006) found that changing a character’s features to a more cartoon-like style eliminated the uncanny. Schneider, Wang, and Yang (2007) identified that character designs of a non-human appearance with the ability to emote like a human were regarded more positively. These authors acknowledge that the results from their experiments provide only a partial understanding of what a viewer perceives to be uncanny, based on “inert” (unresponsive) still images. The majority of characters featured in animation and video games do not remain still, and cross-modal factors such as motion, sound, timing and facial animation contribute to the Uncanny Valley (Richards, 2008, Weschler, 2002). When a human engages with an android, behavior that seems natural and appropriate from the android (referred to as “contingent interaction” by Ho, MacDorman, and Pramono (2008, p. 170)) is important for obtaining a positive response to that android (Bartneck et al., 2009, Kanda et al., 2004). Previous authors (such as Green et al., 2008, Hanson, 2006, MacDorman et al., 2009, Schneider et al., 2007) state that, had movement been included as a factor, the results and conclusions drawn from their experiments might have differed.

Previous attempts to recreate an Uncanny Valley shape do not conform to Mori’s (1970) diagram and suggest that it may be too simplistic, with various factors (including dynamic facial expression) influencing how uncanny an object is perceived to be (see e.g. Bartneck et al., 2009, Ho et al., 2008, MacDorman, 2006, Minato et al., 2004, Tinwell and Grimshaw, 2009, Tinwell et al., 2010).

It is well-documented that, in humans and animals, successful recognition of each of the six universally recognized basic emotions (anger, disgust, fear, happiness, sadness and surprise; Ekman, 1992a, Ekman, 1992b) serves a different adaptive (survival or social interaction) function (Darwin, 1965, Ekman, 1979, Ekman, 1992a, Ekman, 1992b). For example, detection of fear and sadness in others may foretell potential harm or distress to self, and humans react instinctively to such emotions to avoid a possible threat. As Blair (2003) states:

Fearful faces have been seen as aversive unconditioned stimuli that rapidly convey information to others that a novel stimulus is aversive and should be avoided (Mineka & Cook, 1993). Similarly, it has been suggested that sad facial expressions also act as aversive unconditioned stimuli discouraging actions that caused the display of sadness in another individual and motivating reparatory behaviors (Blair, 1995, p. 561).

It has been suggested that disgust also serves the adaptive function of evoking a negative, aversive reaction from the viewer; it warns others to be concerned about potential infection or approaching a distasteful object (Blair, 2003). In contrast, displays of anger or embarrassment do not serve to act as unconditioned stimuli for instrumental learning. Instead, they are important signals to modulate current behavioral responses, particularly in social situations involving hierarchy interactions (Blair and Cipolotti, 2000, Keltner and Anderson, 2000).

Ekman and Friesen’s (1978) Facial Action Coding System (FACS) has been integrated within facial animation software to achieve authentic facial expression of emotion in realistic, human-like video game characters. Dyck et al. (2008) conducted a study to investigate whether the facial emotional expressions of a virtual character could be recognized as easily as those produced on human faces. Still images of virtual characters expressing the emotions happiness, sadness, anger, fear, disgust, and neutral were compared to still images of humans expressing the same emotions at the same medium levels of intensity. A comparison of emotion recognition between the two groups indicated that while the emotions expressed were, for the most part, recognized equally well in humans and virtual characters, fear and sadness achieved better recognition rates when presented in the virtual character than when expressed on human faces; disgust was the only emotion that could not achieve an acceptable recognition rate in virtual characters when compared to humans and was mainly confused with anger. Anger and happiness were recognized equally well for both groups. Again, the authors acknowledged that the results of their study might have been different had animated characters been used to assess how emotions can be interpreted with motion (and speech).

There have been no studies investigating whether the type of emotion portrayed by a virtual character influences the level of perceived uncanniness and, if so, which emotions are most significant in exaggerating the uncanny. Considering that anger, fear, sadness and disgust may be considered signals of a threat, harm or distress (Ekman, 1979), it would be reasonable to suggest that these survival-related emotions will be regarded as more uncanny in near human-like (but not quite fully authentic) virtual characters, especially when part of their facial expression is aberrant, blurring the clarity of their depiction. Emotions not associated with threat or distress (i.e. happiness and surprise) may be regarded as less important, formidable or essential for survival and therefore less noticeably strange or uncanny, even when the animation of facial features appears odd or wrong to the viewer.

During speech, nonverbal signals are used to interpret the emotional state of a person. Nonverbal signals are largely conveyed by the upper part of the face, the lower region of the face being constrained by the articulatory processes (Busso and Narayanan, 2006, Ekman, 2004, Ekman, 1979, Ekman and Friesen, 1978, Ekman and Friesen, 1969). For example, a narrowing of the eyes and a shaking fist show that a person is angry. Raised eyebrows typically demonstrate the emotion surprise (Ekman, 2004). Brow lowering and raising are used as “batons” (Ekman, 2004, p. 41) to provide additional emphasis for words or phrases. A lowered brow is associated with negative emotions such as “fear, sadness, … anger” (Ekman, 2004, p. 42) to accentuate a negative word. A raised brow is likely to be associated with more positive emotions such as happiness and surprise to emphasize a more positive word such as “easy, light, good” (Ekman, 2004, p. 42). Communication is enriched by nonverbal signals generated by movement in the forehead and eyelids to display emotional content. As such, it has been recommended that when creating virtual characters, the upper face must be modeled correctly to avoid confusion as to the affective state of the character (Busso & Narayanan, 2006). The results from the previous study by Tinwell et al. (2010) support these findings. Strong relationships were identified between the uncanny in virtual characters and an awareness of a lack of expressivity in the upper face region, the forehead area being of particular significance. The current authors aimed to extend this by investigating whether inadequate movement in the upper facial areas may have differential effects depending on which emotion is being depicted.

On the basis of results from previous studies indicating features that contribute to the uncanny (e.g. Green et al., 2008, MacDorman et al., 2009, Tinwell, 2009, Tinwell and Grimshaw, 2009, Tinwell et al., 2010), the current study examined the phenomenon of the Uncanny Valley using a male human (referred to as the group human) and a male virtual character (“Barney” from the video game Half-Life 2 (Valve, 2008)) who produced synchronized and emotionally congruent utterances. To investigate how an ambiguity of facial expression during speech and how blurring the salience of an emotion may exaggerate the uncanny in virtual characters, the virtual character was used in two experimental conditions: one in which movement above the lower eyelids was disabled (a partially animated condition named lack) and one in which movement was not disabled (a fully animated condition named full).

To establish if the type of emotion conveyed by a character also affects perceived uncanniness for that character, all six basic emotions (anger, fear, disgust, happiness, sadness and surprise) were portrayed (facially and prosodically) by the human and virtual character. The experimental hypotheses were as follows:

H1a. Effect of condition: For all of the six basic emotions of anger, disgust, fear, happiness, sadness and surprise (Ekman, 1992a, Ekman, 1992b): the group human will be rated as most familiar and human-like; the group full (facial animation) will be rated second highest for familiarity and human-likeness; and the group lack (of upper facial animation) will be rated as least familiar and human-like.

H1b. Emotion type: The degree of difference in perceived familiarity and human-likeness between full and lack will vary across the six emotions due to differences in the psychological significance of each emotion type and variations in the importance of upper facial cues in their successful detection. Negative valence emotions (aversive or warning stimuli), that is, anger, fear, sadness and disgust, will attract the lowest ratings of familiarity and human-likeness (i.e. be perceived as most uncanny) in human and in both animated conditions, especially so in lack.

Design

A 3 × 6 repeated measures design was used in the study. The independent variables (IVs) were: (1) the type of character (a) human (a human actor), (b) full (a virtual character with full facial animation), and (c) lack (the virtual character with movement disabled in the upper part of the face); and (2) the type of emotion expressed facially and orally by the human or virtual character (six types; the six basic emotions: anger, fear, disgust, happiness, sadness, and surprise (Ekman, 1992a, Ekman,
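
As a minimal sketch of the 3 × 6 layout described above, the following Python enumerates the 18 character-by-emotion cells that each participant would rate in a repeated measures design. The names `CHARACTER_TYPES`, `EMOTIONS`, `CONDITIONS` and `stimuli_for` are illustrative assumptions and are not taken from the study's materials.

```python
from itertools import product

# Levels of the two independent variables, as named in the design description.
CHARACTER_TYPES = ("human", "full", "lack")
EMOTIONS = ("anger", "fear", "disgust", "happiness", "sadness", "surprise")

# Full crossing of the two factors: 3 character types x 6 emotions = 18 cells.
CONDITIONS = list(product(CHARACTER_TYPES, EMOTIONS))


def stimuli_for(character_type):
    """Return the emotions shown for one character type (hypothetical helper)."""
    return [emotion for ctype, emotion in CONDITIONS if ctype == character_type]


if __name__ == "__main__":
    for character_type, emotion in CONDITIONS:
        print(f"{character_type:>5} x {emotion}")
```

Because the design is within-subjects, every participant contributes a rating to each of these cells, so the character type and emotion effects are compared within the same raters.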

Results

The first hypothesis (H1a) proposed that, for both familiarity and human-likeness, there would be a main effect of character type (human videos versus full animation versus lack of upper face animation). Specifically, regardless of emotion type, human videos would be rated highest on both DV measures followed by those in the full condition with lack videos rated lowest. Table 3 (and Fig. 2, Fig. 3) shows the mean ratings for familiarity and human-likeness associated with each emotion for each

Effect of inadequate upper facial animation on the Uncanny Valley

A primary aim of the current research was to investigate the effect of inadequate facial movement (leading to an ambiguity in the emotional expression being portrayed) on perceived uncanniness in virtual characters. As was predicted, humans were rated the most familiar and human-like, followed by fully animated characters and then partially animated characters with movement disabled in the upper face region. These results may be accounted for in the following ways.

Uncanniness ratings were low

Conclusion

Data from this study imply that animated, high-fidelity, human-like, talking-head, virtual characters are rated by users as uncanny (less familiar and human-like) but significantly more so when movement, and therefore emotional expressivity, is limited in the upper face. More importantly, the magnitude of this increased uncanniness varies depending on which emotion is being communicated. Under these conditions, the emotions fear, sadness, disgust, and surprise evoke a strong sense of the uncanny

Acknowledgments

For this study, thanks are given to Daniel Whitehead for acting as the human stimulus, and to Brennan Tighe and Roy Attwood at the University of Bolton for providing technical support with recording the video footage of the human. Thanks are also due to our colleague John Charlton for his expert advice and opinion.

References (48)

  • Darwin, C. (1965). The expression of the emotions in man and animals.
  • Dyck, M., Winbeck, M., Leiberg, S., Chen, Y., Gur, R. C., et al. (2008). Recognition profile of emotions in natural and...
  • Ekman, P. About brows: Emotional and conversational signals.
  • Ekman, P. (1992). An argument for basic emotions. Cognition and Emotion.
  • Ekman, P. (1992). Are there basic emotions? Psychological Review.
  • Ekman, P. (2003). Emotions revealed: Recognizing faces and feelings to improve communication and emotional life.
  • Ekman, P. Emotional and conversational nonverbal signals.
  • Ekman, I., & Kajastila, R. (2009). Localisation cues affect emotional judgements: Results from a user study on scary...
  • Ekman, P., et al. (1969). The repertoire of nonverbal behavior: Categories, origins, usage, and coding. Semiotica.
  • Ekman, P., et al. (1978). Facial action coding system: A technique for the measurement of facial movement.
  • Ekman, P., et al. (1982). Felt, false and miserable smiles. Journal of Nonverbal Behavior.
  • Epley, N., et al. (2008). When we need a human: Motivational determinants of anthropomorphism. Social Cognition.
  • Freud, S. The uncanny.
  • Gouskos, C. (2006). The depths of the Uncanny Valley. Gamespot. <http://uk.gamespot.com/features/6153667/index.html>...