The present study replicated evidence that reach-to-grasp performance with full binocular vision is faster (e.g., increased PV; shorter MD, LVP, GCT, GAT), more accurate (e.g., better grip-to-object size scaling at peak and contact; fewer errors and corrections) and more precise (e.g., less variability in initial digit positioning in the object’s depth plane) than when viewing is restricted to one eye. It also replicated evidence that performance is generally slower (with significantly altered sub-action timing patterns), less accurate and less precise when vision is available only for planning the up-coming actions. Only the earliest movement parameters (PV, tPV, tPG) and the directness of the reach path (HPL, mis-reaches) were hardly affected by the NVF condition and thus seemingly under exclusive feedforward control. These results confirm that our current subjects exhibited the binocular advantages and use of visual feedback for most aspects of prehension performance typically reported for normal adults.
Against these important pre-conditions, there were four main findings related to the major study aims. First, two key features of the reach, its PV and LVP duration, retained reduced, but significant, advantages from binocular viewing for planning hand transport, whereas those associated with virtually every aspect of the grasp were eliminated when binocular vision was absent after movement onset. The two exceptions to this were the tPG and GCT (Table
4) which temporally overlap the tPV and LVP, respectively, and were correlated with these transport components (Figs.
8,
9). Second, the losses of binocular advantage were unrelated to the distance of the goal object, whereas its size sometimes mattered. In particular, the larger (and heavier) object was associated with elimination of the normal binocular benefits for efficient reach–grasp coupling and grip application, with the smaller (less stable) object linked to the loss of normal binocular grasping accuracy at grip pre-shaping and initial object contact. Third, prolonged grip application times were responsible for altering the overall movement pattern in the NVF condition (Fig.
3) and for eliminating the normal binocular advantage for faster movement times. Fourth, durations of the early, middle/pre-contact and final/post-contact movement periods were uncorrelated with FV available, supporting the notion that there are differences in their modes of control. But they were all correlated with each other with NVF available, consistent with being outcomes of a single feedforward mechanism derived from the stored memorial representation of the up-coming tasks generated during the 1-s previews.
Some binocular advantages for planning hand transport
The first finding suggests that additional sources of binocular information available during the task previews were sufficient to enhance dynamic aspects of the reach. A general (main) effect of binocular viewing was also revealed by some improvements in its spatial aspects (i.e., fewer mis-reaches, shorter hand paths; Table
3), implying a further advantage over monocular vision for programming the hand trajectory. One potential source of these binocular planning advantages could be ocular vergence-derived cues to the target’s absolute egocentric distance, a key determinant of the transport kinematics. Saccade and vergence latencies following target presentation are reported to be around 200 ms (Yang et al.
2002) and so are short enough that our subjects should have had ample opportunity to bi-fixate the goal objects in the 1 s planning time they were allotted. Moreover, vergence-related distance information has been shown to systematically influence the PV, LVP duration and amplitude of binocularly programmed reaches in normal adults (Mon-Williams and Dijkerman
1999; Melmoth et al.
2007); that is, some of the very measures of reaching performance for which our subjects exhibited some binocular planning advantages. This is consistent with our second finding that differences in the target’s distance, while affecting several reach and grasp parameters, were not a factor in eliminating any of our subjects’ normal binocular advantages in the NVF condition.
It was, nonetheless, surprising that any residual binocular advantage for reducing the duration of the final reach LVP (and the co-varying GCT) was observed when NVF was available, since these normal benefits have been widely attributed (e.g., Watt and Bradshaw
2000; Loftus et al.
2004; Melmoth et al.
2007,
2009; Anderson and Bingham
2010) to online processing (e.g., ‘nulling’) of horizontal disparities signifying the receding space between the hand/digit-tips and the goal object. One possibility is that vergence-specified target distance estimates—possibly with contributions from vertical image size disparities in the two eyes (Rogers and Bradshaw
1993)—were reliable enough to partly override the loss of online disparity cues normally used in the final approach. This would accord with the idea that data required for reach programming need only to be accurate enough to aid hand transport to the target while braking early enough to avoid colliding heavily with it (e.g., Loftus et al.
2004; Melmoth and Grant
2006). It would also align with evidence that people who cannot process horizontal disparities, because they lack stereovision, seem to adopt a strategic trade-off in which they dispense with spending time estimating hand–target depth relations during the LVP and GCT in favour of using non-visual, haptic feedback to correct their grip when subsequently contacting the goal object (Melmoth et al.
2009).
Indeed, binocular NVF compared to FV trials were associated with an especially marked (~ threefold) increase in corrections to the reach velocity during the braking period, so that the normal advantage of binocular over monocular vision for reducing the need for these was lost in the absence of online vision (Table
3). One interpretation of these adjustments is that they represent a strategic undershooting of the target, deliberately produced for safety reasons to prevent the programmed reaches colliding hard with the unseen targets. Another relates to observations by Wolpert et al. (
1995) that when subjects reach in the dark in the absence of a target, they slightly, but consistently, over-estimate the distance that their hand has actually travelled, indicating a systematic bias in predictive reach control. Since it is likely that subjects will do this in the presence of a target too, this would require them to generate an extra acceleration/deceleration in their end-phase reach so as to make contact with it. Either way, the similar frequency of these corrections across views only in the NVF condition represents one of the few indicators in our data of an equivalence between binocular and monocular reach planning.
Little or no binocular advantage for planning the grasp
By contrast, there were multiple equivalences between using binocular or monocular vision for grasp planning in the NVF condition, supporting previous reports that the normal binocular advantage for enhancing most aspects of grip timing, accuracy and precision derives from online disparity processing. Importantly, our data now indicate that this may apply to formation of the PGA at hand pre-shaping, contrary to the common assumption (e.g., Melmoth and Grant
2006) that the normal advantage for this grasp parameter arises from exploiting additional disparity cues to the target’s solid 3D properties at the programming stage. Yet the width of the PGA was nearly identical when our subjects formed their grasp for both small and large targets regardless of whether this extra information was present during the task preview. Instead, it was only when binocular vision was available online that an advantage for improved PGA sizing occurred, selectively related to the smaller of the two objects (Fig.
6). This latter observation is not unusual, as we (Melmoth and Grant
2006) and others (e.g., Servos et al.
1992; Watt and Bradshaw
2000; Keefe and Watt
2009; Keefe et al.
2011) have previously found that monocular viewing is associated with a relative PGA ‘over-sizing’ for smaller (e.g., ≤ 40 mm wide) targets. The effect is typically ascribed to uncertainty in judging an object’s true size when planning monocular grasps, with selective over-sizing for smaller/less stable targets a precautionary strategy designed to ensure their successful capture without knocking them over. But our data suggest that the addition of disparity cues to target solidity during grip planning does little to improve confidence in these judgements.
This would be consistent with evidence, some of which we previously overlooked, that early online vision of the target is critical for PGA formation. More specifically, it has been shown that abruptly increasing the apparent size of an object at the moment of movement onset after subjects have planned their grasp for a smaller target results in gradual widening of the evolving grip to re-scale the PGA to the new target’s dimensions (Paulignan et al.
1997; Karok and Newport
2010), with wider/safer PGAs also gradually produced when vision is suddenly occluded during the earliest (acceleration) phase of the reach (Fukui and Inui
2006,
2013). Chen and Saunders (
2016) have further shown that changing the size of the object-to-be-grasped by introducing a brief mask just after the peak reach velocity results in accurate corrections to the grip at contact appropriate for the dimensions of the new target. One possibility is that online disparity processing early in the movement is involved in comparing the evolving grip aperture with the target’s dimensions to improve PGA scaling, whereas afterwards it is more involved in comparing relative 3D positions of the digit tips and their pre-selected contact points on the object in the hand–target approach period, when our correlation analyses showed that adjustments to the closing aperture can enhance end-point grip accuracy and precision.
Both this specific conclusion regarding the PGA and our more general one regarding the very limited role of binocular vision in grasp planning, however, require some qualification. First, although the data shown (Fig.
6) support that conclusion and are similar to those obtained by Watt and Bradshaw (
2003), we should note that the relevant three-way interaction did not quite meet the criterion of statistical significance. Second, Keefe et al. (
2011) previously found that better PGA scaling for smaller targets was reduced under binocular compared to monocular NVF conditions, although not as markedly as we did. Their study involved targets defined only by stereo/disparity- or by texture/perspective-cues in a virtual reality set-up, with observers allowed to grasp real, presentation-matched, objects at the end of the movements to provide veridical haptic feedback. But it could be that their subjects inevitably placed a greater weighting on the disparity information present within the limited subset of available cues during binocular grasp planning than did ours, who were operating in a more natural environment, richer in alternative sources of monocular 3D information. We found an overall correlation between shorter movement times and wider PGAs in this condition. This relationship suggests a speed–accuracy trade-off (e.g., Wing et al.
1986; McIntosh et al.
2018), whereby the faster-moving participants—perhaps in an effort to grasp the more challenging object before their memorial representation of it had substantially degraded—may have built an extra safety margin into their PGA which contributed to the more marked effect we observed. Consistent with this possibility, post hoc analyses revealed a significant correlation between shorter movement durations and wider peak grips when our subjects binocularly planned to grasp the smaller (Spearman’s
\(\rho\) = 0.54, p = 0.014), but not the larger (\(\rho\) = 0.09, p = 0.7), object in the absence of visual feedback. We do acknowledge, though, that in other studies more comparable to ours (e.g., Jakobson and Goodale
1991; Whitwell et al.
2008; Keefe and Watt
2009; Hesse and Franz
2010) binocular PGA scaling was not so affected under NVF conditions.
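For readers wishing to run a comparable post hoc check on their own data, the rank correlation reported above can be sketched as follows. This is a minimal illustration only, not the analysis code used in the study: the function names (`average_ranks`, `spearman_rho`) and the example values are hypothetical, and the sketch computes the coefficient alone, without the accompanying significance test.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the last position holding the same value as position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical per-subject values, for illustration only (not study data):
# movement durations (ms) and peak grip apertures (mm) for one object.
durations_ms = [812, 905, 760, 990, 870, 1020]
peak_grip_mm = [71.5, 68.0, 74.2, 66.1, 70.3, 65.4]
rho = spearman_rho(durations_ms, peak_grip_mm)
```

The sign of the coefficient depends on how the two variables are coded, so the direction of any duration and grip-width relationship should be checked against the raw values before it is interpreted.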
In this context, we only used the same two objects as targets which the subjects picked up at the end of their movements. This may have provided them with familiarity-based information derived from haptic feedback and from retinal image size cues which have sometimes (Marotta and Goodale
2001; Keefe and Watt
2009)—although not always (McIntosh and Lashley
2008; Borchers et al.
2011)—been suggested to be more beneficial for calibrating monocular compared to binocular grasps. The NVF trial blocks also always followed the FV blocks providing further opportunities for short-term associative learning of the specific object presentations to influence performance in the absence of online vision. An important new finding in these regards was that the altered overall movement patterns occurring in the NVF condition across all three views (Fig.
2) resulted mainly from a longer proportion of time spent in contact with the objects during their manipulation. The relevant dependent measure, the grip application time, corresponds to the period during which the thumb and finger secure the target and generate the grip and load forces needed to lift it. This period is known to increase with target weight (Weir et al.
1991) and is considered to be under predictive control as such learned representations of an object’s material properties are reported to play an increasing role in planning the scaling of these forces in advance of repetitive lifts (Johansson and Westling
1988) with purely visual analyses of the object’s likely size–weight relationship correspondingly subordinated as it becomes more familiar (Mon-Williams and Murray
2000). The fact that our subjects’ grip application times increased across all views in the NVF trial blocks is, therefore, opposite to the effect expected of a strong contribution of familiarity-based object knowledge in planning its lift. However, we also found that the selective advantage of binocular vision for reducing the time spent in contact with the larger/heavier object was completely lost when it was not available to guide the grasp (Fig.
5). This effect occurred, at least in part, because the relative increase in the GAT for this object was much smaller in the monocular (~ 75 ms) compared to binocular (~ 130 ms) NVF conditions, which is in line with the possibility that familiarity may have been more useful for grasp planning with one eye.
As in Weir et al. (
1991) and in our previous work (Melmoth and Grant
2006), we observed two main types of object contact ‘error’ in the grip profiles obtained from our current subjects. One involved no change at all in the size of the grip aperture once contact had been established, but with an unusually long time spent before executing the lift. This indicates that although their digits were initially well placed on the object, subjects appeared to require confirmation of the grip’s stability via haptic feedback before picking it up. In fact, this type of accurate, but prolonged, grip application occurred less commonly across all views in the NVF compared to FV conditions (not shown). The other involved a corrective re-opening and closing of the digits, indicating that their initial contact with the object was inaccurate and that haptic information was being used in a feedforward–feedback fashion to shift them into more secure positions. The occurrence of this type of post-contact grip adjustment increased significantly in the NVF condition, and most markedly after planning the grasp binocularly (Table
4). These observations extend our arguments above by suggesting that the loss of binocular advantage for the GAT was due to inaccuracies and inconsistencies in end-point thumb and/or finger contacts with the objects, particularly in their depth plane (Table
5), with resolution of these difficulties mediated by greater dependence on haptic digit–object interactions during the contact period. Melmoth et al. (
2009) found that adults with selectively reduced or negative disparity processing capabilities exhibit a similar set of deficits in end-point grasping accuracy and precision—including prolonged grip application times on heavier objects and frequent post-contact grip adjustments—with closed-loop binocular viewing. As implied above, the non-availability of online stereo/disparity information, therefore, most likely accounted for the pattern of end-point binocular grasping deficits in the current NVF condition, even though the timing of the grip aperture closure period was less affected.
We conclude that binocular viewing during prehension planning is associated with some slight improvements, over monocular vision, in the feedforward/predictive programming of faster velocity (cf. Jackson et al.
1997) and straighter reaches with faster hand–target approach times, whereas it provides no obvious benefits for grasping, including PGA formation. Such dissociations, even if contrary to our original thinking, are to be expected since proficient performance of different phases of the transport and grip components of prehension is generally considered to depend on analysis of different types of visual information by anatomically and functionally distinct superior parietal–dorsal premotor (dorsomedial) and intraparietal–ventral premotor (dorsolateral) cortical networks, respectively (Rizzolatti and Matelli
2003; Grafton
2010). Perhaps not coincidentally, given our findings, one of the dorsomedial network areas of superior parieto-occipital cortex (SPOC) appears to be primarily—if not, exclusively—concerned with the automatic encoding of target information needed for planning the reach (Pisella et al.
2000; Gallivan et al.
2009; Lindner et al.
2010; Vesia et al.
2010; Glover et al.
2012) and is a selective processing site of near-space vergence-derived signals (Quinlan and Culham
2007).
It is less clear whether any areas in the dorsolateral grasp circuit are selectively involved in only its pre-movement planning (Glover et al.
2012). This would include the anterior intraparietal (AIP) area, known for some time to be necessary for both deciding on and preshaping the optimal grip for different types of graspable objects (Gallese et al.
1994; Binkofski et al.
1998; Murata et al.
2000; Begliomini et al.
2007) and to be active when subjects precision grasp 3D objects in the absence of online vision (Culham et al.
2003). In fact, evidence from various sources (Toni et al.
2007; Grafton
2010; Verhagen et al.
2008,
2012; Begliomini et al.
2014) suggests that AIP can rapidly formulate grasp plans weighted to meet the spatial accuracy demands of the task and based on integrating whatever monocular pictorial and/or binocular disparity cues seem to be most informative about the target object along with any prior knowledge of its properties. But it then quickly switches roles to dynamically control the grasp online to ensure that the ultimate action goal is successfully achieved. Our present and other data (e.g., Servos et al.
1992; Bradshaw and Elliot
2003; Loftus et al.
2004; Melmoth and Grant
2006; Lee et al.
2008; Anderson and Bingham
2010) converge on the conclusion that it is in this latter role that binocular vision usually makes its most significant contributions to the proficiency of prehension movements.
The preceding arguments have followed those of our original work (Melmoth and Grant
2006) in adhering to a commonly accepted conceptualization of prehension as requiring multi-factorial control of near-sequential reach–grasp–manipulate components (Jeannerod
1984), for which we have provided some support (Fig.
8). An alternative framework suggests that it more simply involves independent control of the thumb and index finger in aiming and guiding them to opposing contact points on the goal-object (Smeets and Brenner
1999). Evidence shows that, usually, either the thumb or the finger leads the way to make an initial soft landing at its pre-selected site on the target (Haggard and Wing 1997; Mon-Williams and McIntosh
2000; Melmoth and Grant
2012; Cavina-Pratesi and Hesse
2013; Grant
2015; Voudouris et al.
2016,
2018), the programming of which would likely benefit from binocular vergence cues to the absolute distance that the given digit needs to travel, as they do for single finger aiming-in-depth (Melmoth et al.
The framework further suggests that the PGA is merely an emergent property of each digit’s independent trajectories, rather than a specifically controlled grasp parameter, with the thumb–finger separation scaling for target size because the movement of each digit needs to incorporate a margin for clearing the object’s opposing sides. Online binocular disparity processing could provide advantages for ensuring such clearances occur so that unintended collisions with the object are avoided and that the digits then approach their contact sites along the pre-selected opposition axis through the target, as the framework specifically contends (Smeets and Brenner
1999; Verheij et al.
2014). These re-formulations of our conclusions are important, because they indicate that our main findings are compatible with key precepts of this alternative model.