Emotion recognition studies often find that people gaze at threatening facial expressions more quickly and to a greater extent (Eisenbarth & Alpers, 2011; Wells et al., 2016). Similarly, participants in the current study oriented more to the avatar in arousing conditions. When the avatar was standing close, participants might have felt that their personal space was being invaded. As a self-related cue, the avatar’s direct gaze can heighten the sense of discomfort as well (Ioannou et al., 2014), since participants could have the feeling of being within the attentional spotlight. Although some studies suggested that perceived direct gaze alone was insufficient to elicit arousal (Binetti et al., 2015; Helminen, 2017), this does not seem to be the case in the current research, possibly because the avatar maintained direct gaze throughout the speech delivery. As noted in previous studies, prolonged direct gaze could indicate potential dominance and social competence (Doherty-Sneddon & Phelps, 2005; Hamilton, 2016). Both over-proxemic interpersonal distance and prolonged direct gaze are intimidating to people, and they can hence lead to an increased sense of threat and enhanced attention in interaction.
In addition to detecting threatening stimuli more readily, people also appear to have difficulty disengaging from them (Koster et al., 2004). This may explain the longer direct gaze duration observed in the present study. From an evolutionary perspective, biological preparedness enables individuals to detect and focus on potentially threatening stimuli to increase the chance of survival (Sussman et al., 2016). Driven by enhanced awareness, gaze can be used to direct attention to sources or cues of threat in the environment. In the conversational task, avatars were the major social targets and provided most of the information in the interaction. Most emotion recognition studies have shown that people’s attention is largely devoted to the most diagnostic or salient region of threat-related stimuli (Schurgin et al., 2014). Consistent with this, participants gazed longer at the avatar’s face, especially the eye region, when the sense of threat increased. Eyes are important partly because they can indicate one’s visual attention in space (Kolkmeier, 2015). By looking at the avatar’s eye region, participants could gain information to determine where the threat was located. As the interpersonal distance became over-proxemic, the avatar could be the source of threat to participants. Hence, it would be important for participants to know whether they were the targets of the avatar’s aggressive approach by looking into the avatar’s eyes. In addition, the eye region also largely facilitates face perception (Gilad et al., 2009). In threatening situations, it is crucial for people to gather information efficiently. Therefore, participants would tend to learn more about the avatar’s identity by looking into the avatar’s eyes when the sense of threat increased.
Alternatively, the results can be interpreted in terms of social engagement. Instead of imposing threat, intimate interpersonal distance and perceived direct gaze may promote the sense of social engagement displayed by the avatars. With reference to the Intimacy Equilibrium model (Argyle & Dean, 1965), it was expected that participants might avert their gaze to maintain the appropriate level of intimacy as the avatar intrusively approached. Nevertheless, the results seem to be inconsistent with this. Studies on interpersonal distance often adopt Hall’s model to define comfortable and uncomfortable physical approach, and several of them provide support for the Intimacy Equilibrium model (Bailenson et al., 2003; Ioannou et al., 2014). However, most of the “interactive scenarios” in these studies simply have the experimenter walk towards participants, and/or vice versa. The current research shows that the models may not possess the same level of validity in a conversational setting. Although the distance of the “close” condition in the current study falls into the zone of intimate distance defined in Hall’s model (Bailenson et al., 2001), it may not be as intrusive as expected. Moreover, the inverse relationship between proxemic interpersonal distance and mutual gaze in maintaining appropriate intimacy may not be easily applicable to conversational interaction. One of the major differences between the previous and current settings is the sense of social engagement, as people probably find themselves more socially involved in conversational interaction.
Unlike the previous literature, the conversational setting in the current study creates a scenario in which the avatar and participant engage simultaneously. The threshold of inappropriate intimacy may be higher in such a scenario, and hence the proxemic interpersonal distance may not turn out to be as intrusive as expected. Similar to physical proximity, gazing at an interactant’s face signals intimacy and social engagement in conversational interaction as well (Rossano, 2012). While proxemic interpersonal distance promotes intimacy, the avatar’s direct gaze can indicate that the participant is within the attentional spotlight. Although the literature has noted a tendency for listeners to retain direct gaze despite speakers’ gaze aversion (Hamilton, 2016), the results do not appear to support this. In general, people tend to show direct gaze in interaction to collect information and communicate intimacy (Cummins, 2012), and one’s engagement may foster an equivalent level of engagement from the interactant. When the avatar was showing averted gaze or standing far away, the sense of social connection between avatar and participant may have been reduced. Reciprocity is considered an important social norm in interaction (Qualls & Corbett, 2016). When the avatar demonstrates a high level of social engagement in the interaction, participants may feel a social obligation to show more direct gaze in response.
Compared with interpersonal distance, the effects of perceived gaze on people’s gaze reactions seem to be more specific. It was found that participants gazed more at the avatar’s head when he was standing close, but not when he was showing direct gaze. This is similar to the findings in Kolkmeier’s work (2015). When participants were engaging in conversation with avatars, Kolkmeier measured participants’ gaze direction based on their head orientation and found no significant effect. Approximate gaze direction measurement was acknowledged as a limitation in his work, and Kolkmeier questioned whether meaningful effects of perceived gaze in the conversational setting had been overlooked. The current research employed an eye-tracking technique with high accuracy and addressed this limitation. As discussed, it is suggested that the speaker’s gaze direction does influence the listener’s perceived intimacy or threat. Given the saliency of the eye region in social interaction, this can explain why the effect of the avatar’s gaze is large enough to be observable only when the analysis is limited to participants’ direct gaze duration. It seems that interpersonal distance influenced gaze behaviour to a larger extent than the avatar’s gaze did. Nevertheless, it is also possible that the difference may simply be due to the increased area of the participant’s visual space that was occupied by the avatar’s head in the “close” conditions. Although it is difficult to interpret the differences with precise theoretical implications, the saliency of the eye region in social interaction is clearly demonstrated.
Beyond the eye region, the current study shows that the mouth is also an important cue in conversational interaction compared to other facial areas. Participants gazed more often at the avatar’s mouth when he was standing close or showing direct gaze. This is possibly related to the saliency of the mouth in audio-visual perception of speech, which has been demonstrated in other studies as well (Bailly et al., 2010; Lansing & McConkie, 2003). As shown by the McGurk effect, people integrate visual and audio information unconsciously when watching someone speak (Tiippana, 2014). The McGurk effect is a multisensory illusion first demonstrated by McGurk and MacDonald (1976). They dubbed an articulated consonant onto a video of a speaker articulating another consonant, and found that the perceived sound would become a fusion of the two different “articulated” consonants. Although participants did not gaze significantly more at the avatar’s mouth than at other facial areas in general, the relative importance of the mouth in conversational interaction is shown when the avatar was standing close or showing direct gaze. Nevertheless, it is uncertain whether participants’ increased gaze duration for the mouth is due to enhanced attention or social engagement.
Since participants were involved in a relatively realistic social interaction in this study, the valence value of facial expression was expected to be higher than in previous emotion recognition studies. Although the obtained results appear to be in favour of the prediction derived from emotion recognition studies, the current study surprisingly does not find any significant effects of facial expressions. This is possibly because most of the previous studies involved the presentation of facial expressions alone and did not capture the effects of body expressions, while the present study does. Similar to facial expressions, body language such as postures and movements can convey emotional information indicating one’s affect or intention (Grezes et al., 2007; Rajhans et al., 2016). However, when the emotional values conveyed by face and body are inconsistent, that carried by body expressions may influence emotion recognition. For instance, Righart and colleagues (2007) showed that participants were less able to categorize a fearful face when it was combined with a happy body expression. Avatars in the current study always stood still and straight, and this “neutral” body expression may have weakened the emotional valence of the facial expression. Therefore, despite the non-significant effect, the results do not necessarily imply that facial expressions are unimportant or less important than other factors in conversational interaction.
Regarding the secondary objective, the present research shows that social anxiety does contribute to individual differences in gaze behaviour. Previous studies have shown that socially anxious individuals were more aroused by proxemic interpersonal distance and that they tended to avoid eye contact (Perry et al., 2013; Wieser et al., 2010). Similar to these studies, the current research recruited participants from a non-clinical population. Despite the non-significant difference in average direct gaze duration between the two groups, it is worth noting that a tendency towards shorter gaze duration for all gaze targets was observed among those with HSA. Regardless of interpersonal distance and perceived gaze, social anxiety appears to motivate individuals to avert gaze, at least to a certain extent. The literature has generally explained this kind of avoidant behaviour as a result of increased arousal (Schneier et al., 2011; McTeague et al., 2009). Interestingly, the current study found that the increase in direct gaze duration was larger in the high than in the low social anxiety group when the avatar was standing close. It may be argued that this is due to enhanced attention among participants with HSA in threatening situations. Nevertheless, the analysis of pupil dilation suggests that the effect of proximity on direct gaze duration among participants with HSA is not mediated by arousal. Therefore, it seems that attention enhancement cannot be the sole explanation for it.
Participants in the current study were recruited from a non-clinical population, and therefore even those in the high social anxiety group may not show pronounced gaze aversion despite the tendency to avoid social interaction. Instead, they may adopt a compensatory strategy to maintain normal functioning in accordance with social expectations. As discussed, eye contact is considered a social rule in conversational interaction. When the avatar is standing close, this may pressure participants to display reciprocal intimacy or social engagement by showing direct gaze. Since participants with HSA show relatively less direct gaze in general, they would need to increase their direct gaze substantially to meet the social expectation. In contrast, it was found that increased arousal motivated participants in the high social anxiety group to gaze less at the avatar’s mouth when he was standing close. As shown by the McGurk effect, people spontaneously extract information by looking at a speaker’s mouth (Tiippana, 2014). Participants with HSA may have been more aroused when the avatar was standing close to them since they were more sensitive to others’ intrusion, and possibly regarded proximity as a potential threat. Despite the effortful maintenance of direct gaze, they may pre-attentively look less at the avatar’s mouth as they are not highly engaged in the interaction. Theoretically, social anxiety is related to excessive self-focused attention in interaction (Hofmann, 2007). Participants with HSA possibly spent more cognitive resources on internal monitoring, which led to increased arousal as well as less attention devoted to the interaction.
To date, very few attempts have been made to study gaze behaviour systematically in realistic social interactions. The current research employed an eye-tracking technique in VR, in an exploratory manner, to address the limitations of previous literature. The sense of reality in the experiment is built not only upon the VR technology, but also upon the inclusion of multiple variables. Social interaction is a complex process which involves various factors; however, these factors are rarely taken into account simultaneously in previous literature. Perceived gaze, interpersonal distance and facial expressions are often investigated independently in other studies. Due to the fundamental differences in experimental settings, it is difficult to resolve the contradictory behavioural findings by comparing studies directly. The current research suggests that people may tend to show more direct gaze in conversational interaction when the interactant is standing close, and that gaze behaviour is more likely to be reciprocal. Although participants were recruited from a non-clinical population, it appears that the effects of social anxiety on gaze behaviour are subtle but observable. People with a tendency towards social anxiety can effortfully adopt a compensatory strategy to meet social expectations, but they may be less likely to engage in social interactions spontaneously.
The current research is successful as an exploratory study, but there are several limitations. Firstly, no clear interpretation can be made of the three-way interaction obtained, despite the effort to understand it. It was found that participants would look more at facial areas other than the eyes and mouth when the interpersonal distance was close, but only if the avatar was showing averted gaze. It is suspected that this may be related to the relatively larger surface area of the “face” region; theoretically, however, this does not seem to fit either of the main interpretations. Another critical limitation of this study concerns the degree of caution required before the findings can be accepted at face value. As noted by Kolkmeier (2015), studies have found different reactive behaviours for intimacy regulation, such as body orientation and leaning. The current study has not taken these behaviours into account, and they may have compensated for the effects of gaze avoidance.
Despite the insightful findings, it is uncertain whether increased direct gaze is driven by enhanced attention or by a sense of social obligation. If the former interpretation is correct, then future studies should find increased arousal or better memory in more threatening scenarios. Moreover, some emotion recognition studies have suggested that attention enhancement and gaze avoidance may occur over different time courses. Phase-wise comparison has not been examined here for the sake of simplicity, but it can be done in future studies to attain a more thorough understanding. Although the effects of facial expression seem to be relatively mild in the current research, facial expression is by no means less important than gaze or interpersonal distance in social interaction. In reality, social interactions are complex, and the interplay of different factors is meaningful but yet to be explored. VR provides a means to study social interaction systematically, and it is worth employing it to test theories in realistic scenarios.