The present research aimed to investigate the effects of interpersonal distance, perceived gaze and facial expression on people’s gaze behaviour in social interaction. Along with this primary objective, the influence of social anxiety on individual differences in gaze behaviour was studied as well. There are several main findings. Firstly, participants spent more time on direct gaze when the avatar was standing close or showing direct gaze, while facial expressions did not induce any significant effects. The eye region is known to provide a wealth of information in social interaction (Letourneau & Mitchell, 2011), and this is supported by the current study. Participants oriented their gaze to the avatar’s eye region more often than to the face or mouth. Moreover, the effects of interpersonal distance and the avatar’s gaze appeared to be larger for participants’ gaze targeting the eye region. Regarding the secondary objective, it was found that arousal only motivated participants with HSA to gaze less at the avatar’s mouth.
Previous literature has noted that people find both over-proxemic interpersonal distance and threat-related facial expressions arousing, especially when these cues are accompanied by perceived direct gaze (Ioannou et al., 2014; Schrammel et al., 2009). Nevertheless, behavioural findings on gaze reactions have been contradictory. The present study appears to support the interpretation upheld by emotion recognition studies, namely that threatening social stimuli attract attention. Although it was expected that participants might retain direct gaze despite the avatar’s gaze aversion in a conversational setting, the results did not meet this expectation. Longer direct gaze duration may be related to enhanced attention in threatening situations. Alternatively, participants may show more direct gaze because they feel a social obligation to display reciprocal intimacy.

Emotion recognition studies often find that people gaze at threatening facial expressions faster and more (Eisenbarth & Alpers, 2011; Wells et al., 2016). Similarly, participants in the current study oriented more to the avatar in arousing conditions. When the avatar was standing close, participants might have felt that their personal space was being invaded. As a self-related cue, the avatar’s direct gaze can elevate the sense of discomfort as well (Ioannou et al., 2014), since participants could feel that they were within the attentional spotlight. Although some studies suggested that perceived direct gaze alone was insufficient to elicit arousal (Binetti et al., 2015; Helminen, 2017), this does not seem to be the case in the current research, possibly because the avatar maintained direct gaze throughout the speech delivery. As noted in previous studies, prolonged direct gaze can indicate potential dominance and social competence (Doherty-Sneddon & Phelps, 2005; Hamilton, 2016). Both over-proxemic interpersonal distance and prolonged direct gaze are intimidating, and they can hence lead to an increased sense of threat and enhanced attention in interaction.

In addition to detecting threatening stimuli more readily, people also appear to have difficulty disengaging from them (Koster et al., 2004). This may explain the longer direct gaze duration observed in the present study. From an evolutionary perspective, biological preparedness enables individuals to detect and focus on potentially threatening stimuli to increase the chance of survival (Sussman et al., 2016). Driven by enhanced awareness, gaze can be used to direct attention to sources or cues of threat in the environment. In the conversational task, avatars were the major social targets and provided most of the information in the interaction. Most emotion recognition studies have shown that people’s attention is largely devoted to the most diagnostic or salient region of threat-related stimuli (Schurgin et al., 2014). Consistent with this, participants gazed longer at the avatar’s face, especially the eye region, when the sense of threat increased. Eyes are important partly because they can indicate one’s visual attention in space (Kolkmeier, 2015). By looking at the avatar’s eye region, participants could gain information to determine where the threat was located. As the interpersonal distance became over-proxemic, the avatar could be the source of threat to participants. Hence, it would be important for participants to find out whether they were the targets of the avatar’s aggressive approach by looking into the avatar’s eyes. In addition, the eye region largely facilitates face perception (Gilad et al., 2009). In threatening situations, it is crucial for people to gather information efficiently. Therefore, participants would tend to learn more about the avatar’s identity by looking into the avatar’s eyes when the sense of threat increased.

Alternatively, the results can be interpreted in terms of social engagement. Instead of imposing threat, intimate interpersonal distance and perceived direct gaze may promote the sense of social engagement displayed by the avatars. With reference to the Intimacy Equilibrium model (Argyle & Dean, 1965), it was expected that participants might avert their gaze to maintain the appropriate level of intimacy as the avatar intrusively approached. Nevertheless, the results seem to be inconsistent with this. Studies on interpersonal distance often adopt Hall’s model to define comfortable and uncomfortable physical approach, and several of them provide support for the Intimacy Equilibrium model (Bailenson et al., 2003; Ioannou et al., 2014). However, most of the “interactive scenarios” in these studies simply have the experimenter walking towards participants, and/or vice versa. The current research suggests that the models may not possess the same level of validity in a conversational setting. Although the distance in the “close” condition of the current study falls into the zone of intimate distance defined in Hall’s model (Bailenson et al., 2001), it may not have been as intrusive as expected. Moreover, the inverse relationship between proxemic interpersonal distance and mutual gaze in maintaining appropriate intimacy may not be easily applicable to conversational interaction. One of the major differences between the previous and current settings is the sense of social engagement, as people probably find themselves more socially involved in conversational interaction.

Unlike the previous literature, the conversational setting in the current study creates a scenario in which the avatar and the participant engage simultaneously. The threshold of inappropriate intimacy may be higher in such a scenario, and hence the proxemic interpersonal distance may not turn out to be as intrusive as expected. Similar to physical proximity, gazing at an interactant’s face signals intimacy and social engagement in conversational interaction as well (Rossano, 2012). While proxemic interpersonal distance promotes intimacy, the avatar’s direct gaze can indicate that the participant is within the attentional spotlight. Although the literature has noted a tendency for listeners to retain direct gaze despite speakers’ gaze aversion (Hamilton, 2016), the results do not appear to support this. In general, people tend to show direct gaze in interaction to collect information and communicate intimacy (Cummins, 2012), and one’s engagement may foster an equivalent level of engagement from the interactant. When the avatar was showing averted gaze or standing far away, the sense of social connection between the avatar and the participant may have been reduced. Reciprocity is considered an important social norm in interaction (Qualls & Corbett, 2016). When the avatar demonstrates a high level of social engagement in the interaction, participants may feel a social obligation to show more direct gaze in response.

Compared with interpersonal distance, the effects of perceived gaze on people’s gaze reactions seem to be more specific. It was found that participants gazed more at the avatar’s head when he was standing close, but not when he was showing direct gaze. These results are similar to the findings in Kolkmeier’s work (2015). When participants were engaging in conversation with avatars, Kolkmeier measured participants’ gaze direction based on their head orientation and found no significant effect. Approximate gaze direction measurement was acknowledged as a limitation in his work, and Kolkmeier questioned whether meaningful effects of perceived gaze in a conversational setting had been overlooked. The current research addressed this limitation by employing a high-accuracy eye-tracking technique. As discussed, it is suggested that the speaker’s gaze direction does influence the listener’s perceived intimacy or threat. Given the saliency of the eye region in social interaction, this can explain why the effect of the avatar’s gaze is large enough to be observable only when the analysis is limited to participants’ direct gaze duration. It seems that interpersonal distance influenced gaze behaviour to a larger extent than the avatar’s gaze did. Nevertheless, it is also possible that the difference is simply due to the increased area of the participant’s visual space occupied by the avatar’s head in the “close” conditions. Although it is difficult to interpret the differences with precise theoretical implications, the saliency of the eye region in social interaction is clearly demonstrated.

Besides the eye region, the current study shows that the mouth is also an important cue in conversational interaction compared to other facial areas. Participants gazed more often at the avatar’s mouth when he was standing close or showing direct gaze. This is possibly related to the saliency of the mouth in audio-visual perception of speech, which has been demonstrated in other studies as well (Bailly et al., 2010; Lansing & McConkie, 2003). As shown by the McGurk effect, people integrate visual and audio information unconsciously when watching someone speak (Tiippana, 2014). The McGurk effect is a multisensory illusion first demonstrated by McGurk and MacDonald (1976). They dubbed an articulated consonant onto a video of a speaker articulating another consonant, and found that the perceived sound became a fusion of the two different “articulated” consonants. Although participants did not gaze significantly more at the avatar’s mouth than at other facial areas in general, the relative importance of the mouth in conversational interaction is shown when the avatar was standing close or showing direct gaze. Nevertheless, it is uncertain whether participants’ increased gaze duration for the mouth is due to enhanced attention or social engagement.

