Visual information for guiding real-time action is thought to be processed separately from more abstract perception in the mammalian brain, reflecting an evolutionary specialisation for the control of movement (Goodale
2017). The prevailing characterisation of visual processing identifies a ventral pathway (projecting from primary visual cortex, V1, to the inferior temporal lobe) that is primarily concerned with perception and identification of visual inputs, and a dorsal pathway (projecting from V1 to the posterior parietal lobe) which provides visual information for guiding real-time action (Goodale
2017; Goodale and Milner
1992; Milner and Goodale
1993). Vision-for-action and vision-for-perception pathways are separately susceptible to disruption from brain damage, indicating they are functionally segregated in the normal brain. Naturally, the two pathways interact on some level (Goodale and Cant
2007), but the dorsal pathway maintains a specialisation for the visual control of skilled movement. There is reason to question, however, whether this normal functional separation is maintained in the virtual world.
Cues to depth in the virtual world
The primary reason vision for action may be disrupted in VR is the artificial presentation of depth information (Wann et al.
1995). Several studies have demonstrated impaired distance estimation and a general perception of the virtual world as ‘flatter’, although this effect seems to attenuate in higher-fidelity systems (Interrante et al.
2004,
2006). The dorsal stream relies primarily on binocular information (Mon-Williams et al.
2001), whereas monocular cues to distance (such as texture and perspective) tend to inform perceived distance through the ventral stream. Restricted binocular cues to depth do not preclude execution of visually guided tasks (Carey et al.
1998), but reliance on monocular cues does lead to increased use of the ventral stream for guiding action (Marotta et al.
1998) and, as a result, movement inefficiency (Loftus et al.
2004). The ventral stream is required for pre-planned or delayed movements but utilises different information to guide action. If binocular cues are impaired in VR, as the general perception of ‘flatness’ suggests they might be, actions in the virtual world may be achieved with much greater ventral input than their real-world counterparts.
The primary binocular cues to depth are binocular disparity and vergence. Vergence (the simultaneous horizontal rotation of the eyes to maintain binocular fixation) is an important cue to depth for the dorsal stream (Mon-Williams and Tresilian
1999; Mon-Williams et al.
2001). Perceived depth is constructed using a range of available cues, but Tresilian et al. (
1999) propose that the weight afforded to vergence information decreases when vergence conflicts with other depth cues, exactly the situation in a VE. In the physical world, accommodation (the focusing of the lenses to maintain a clear image over distance) varies synchronously with vergence, but in head-mounted displays this coupling is broken because objects at varying virtual depths are presented on a screen at a fixed depth (~ 5 cm from the eyes) (Eadie et al.
2000). This conflict may reduce the weight afforded to vergence as a cue to depth (Tresilian et al.
1999), leading to less reliable binocular information and a greater reliance on ventral processing (Marotta et al.
1998). Retinal image size also provides an effective cue to depth when object size is known, but a lack of prior experience with virtual objects, and uncertainty about their true sizes, may render this cue uninformative as well. Consequently, general uncertainty about depth information may lead to a greater reliance on ventral mode control in VR.
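The geometry underlying the vergence–accommodation conflict can be made concrete with a short sketch. The interpupillary distance and the headset focal distance below are illustrative assumptions (real HMD optics typically place the focal plane well beyond the physical screen, so 2 m is used here rather than the ~5 cm screen distance):

```python
import math

def vergence_angle_deg(distance_m, ipd_m=0.063):
    """Vergence angle (degrees) needed to fixate a point at distance_m,
    from simple triangulation of the two eyes' lines of sight.
    ipd_m: interpupillary distance; 0.063 m is a typical adult value."""
    return math.degrees(2 * math.atan(ipd_m / (2 * distance_m)))

def accommodation_demand_d(distance_m):
    """Accommodative demand in dioptres (reciprocal of focal distance in metres)."""
    return 1.0 / distance_m

# Assumed focal distance at which the HMD optics pin accommodation.
HMD_FOCAL_DISTANCE_M = 2.0

for virtual_distance in (0.3, 0.6, 1.5, 3.0):
    vergence = vergence_angle_deg(virtual_distance)
    # In the physical world vergence and accommodation track the same
    # distance; in an HMD accommodation stays at the optics' focal plane
    # while vergence follows the rendered object.
    conflict = abs(accommodation_demand_d(virtual_distance)
                   - accommodation_demand_d(HMD_FOCAL_DISTANCE_M))
    print(f"object at {virtual_distance:3.1f} m: vergence {vergence:5.2f} deg, "
          f"accommodation conflict {conflict:4.2f} D")
```

In the physical world the conflict term is zero by construction; in an HMD it grows as rendered objects depart from the optics' focal plane, which is one simple way to quantify the cue conflict described above.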
Initial brain imaging findings have suggested that the normal pattern of dorsal and ventral activation may indeed be disrupted in VR. In the real world, visual information about objects within arm’s reach (peripersonal space) tends to be encoded in the dorsal stream, while far-away objects (extrapersonal space) are processed using the ventral stream (Weiss et al.
2003). This reflects the archetypal dorsal/ventral distinction: nearby objects are potential targets for action, whereas far-away objects merely need to be recognised. To investigate this functional separation, Beck et al. (
2010) asked participants to make spatial judgements about objects presented at near (60 cm) and far (150 cm) locations in virtual space. In contrast to the expected dissociation, fMRI indicated a disordered picture of dorsal and ventral activation, with near objects eliciting a high degree of ventral processing and far objects eliciting some dorsal activation. As discussed, visually guided motor skills can still be performed adequately under ventral mode control (Loftus et al.
2004), but this finding raises the concern that visually guided actions in VR may operate through fundamentally different mechanisms from those used in the real world.
An additional concern for the execution of visually guided motor skills in VR is the dearth of haptic information, which may also have negative effects on the user experience (Berger et al.
2018). Haptic feedback is derived from the active experience of touch, but hand-held controllers in common VR systems do not change their tactile properties, other than providing vibrations to signal contact between virtual hands (or tools) and other surfaces. This kind of haptic information, however, remains unlike real-world feedback for most movements. Specialised feedback devices are currently being developed, such as haptic gloves and the Teslasuit full-body suit, but extensive haptic feedback from exoskeleton-based systems remains expensive and impractical. There is reason to believe this general lack of haptic information may further push users into a ventral mode of processing, as has been observed for basic reaching and grasping movements (Goodale et al.
1994).
Terminal tactile feedback from target objects, which is absent in VR, is necessary for normal, real-time reaching and grasping. Reaching to a virtual target (e.g. a mirror reflection or imagined target object) with no end-point tactile feedback has disruptive effects on grasp kinematics (e.g. the normally tight scaling of in-flight grip aperture to object size), indicative of a switch from real-time visual control (dorsal mode) to one dependent on cognitive supervision (ventral mode) (Goodale et al.
1994; Whitwell et al.
2015). A recent investigation by Wijeyaratnam et al. (
2019) showed that when reaching to a target in a virtual environment (where the hand was represented by a cursor and no end-point feedback was present) movement kinematics were indicative of offline (i.e. ventral) control and impaired online corrective processes, even though visual feedback was available.
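The kinematic marker mentioned above, the scaling of grip aperture to object size, can be summarised by a single least-squares slope. This is a minimal sketch: the aperture values are hypothetical, chosen only to show the contrast between a tight (dorsal-like) and a flattened (pantomime-like) scaling pattern:

```python
def scaling_slope(object_sizes_mm, peak_apertures_mm):
    """Ordinary least-squares slope of peak grip aperture against object size."""
    n = len(object_sizes_mm)
    mean_x = sum(object_sizes_mm) / n
    mean_y = sum(peak_apertures_mm) / n
    num = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(object_sizes_mm, peak_apertures_mm))
    den = sum((x - mean_x) ** 2 for x in object_sizes_mm)
    return num / den

sizes = [20, 40, 60, 80]           # object widths (mm)
natural_grasp = [45, 63, 82, 100]  # hypothetical apertures: tight scaling
pantomimed_grasp = [52, 60, 66, 73]  # hypothetical apertures: flattened scaling

print(f"natural slope:    {scaling_slope(sizes, natural_grasp):.2f}")
print(f"pantomimed slope: {scaling_slope(sizes, pantomimed_grasp):.2f}")
```

A slope near 1 indicates that grip aperture tracks object size closely, as in real-time dorsal control; a flattened slope is the signature of the cognitively supervised, ventral mode of control attributed to pantomimed reaches.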
Such pantomimed reaching movements—those made to imagined, remembered or virtual targets which provide no endpoint feedback—are informative for understanding how the lack of haptic information may impact actions in VR. Pantomimed reaches to a target are made more slowly, reach a lower peak velocity and have lower movement amplitude due to inefficient ventral mode control (Goodale et al.
1994; Whitwell et al.
2015). Movements in VR are effectively pantomimed, as the virtual environment provides no endpoint feedback, and they are accordingly also slower and more exaggerated (Whitwell and Buckingham
2013). Taken together, the artificial presentation of visual depth cues, the peculiarities of haptic feedback, and the general uncertainty created by impoverished sensory information seem likely to elicit a more ventral mode of control in VR than in the real world. If visually guided skills in VR do indeed rely on ventral mode control, even in part, skills learned or performed using these altered perceptual inputs may not be representative of their real-world counterparts.