In this study, we investigated the behavioral and electrophysiological signatures of encoding the production of two distinct speech modes, loud speech (LS) and whispered speech (WS), relative to standard speech (SS). Since the same procedure and material were used for both speech modes, we can broadly comment on the discrepancies and similarities across experiments. In both experiments, behavioral results demonstrated longer initialization times for the non-standard speech mode, a result driven by participants who started the experiment with a block in the non-standard condition. This intriguing result may be interpreted as a “novelty bias”, as speakers are not accustomed to speaking loudly or whispering over such a long period of time. Participants may thus have been less familiar with the task, and this behavioral encoding cost would need further investigation with a different experimental design. The current behavioral results nevertheless converge with previous studies that used different paradigms and showed a cost of encoding non-standard speech modes (Zhang & Hansen, 2007; Bourqui et al., submitted). Some authors have proposed that different speech modes, with their specific phonatory and articulatory features, would involve unique encoding processes in comparison to standard speech (Scott, 2022; Zhang & Hansen, 2007). Comparing the electrophysiological results across the present two experiments suggests that speech modes cannot be grouped as a whole entity encoded at the motor programming stage (i.e., the last encoding process preceding articulation), as suggested in the FL model (Van der Merwe, 2020).
Indeed, the EEG/ERP results do not converge: LS entailed marked electrophysiological modulations, whereas WS electrophysiological activity was very close to that of SS, a null result that has been reported previously (Sikdar et al., 2017). In particular, LS electrophysiological activity differed in several time periods that seem to extend beyond the programming time-window, one close to and one quite distant from the vocal onset. Our interpretation is that only the significant difference in strength of the last microstate preceding the vocal onset (map C in Fig. 2) could be considered the “increase in neuromotor drive” proposed by Whitfield (2021). The present results thus corroborate previous propositions by providing neuroimaging data indicating that speaking loudly entails changes in temporal dynamics and an increase in brain activation during motor encoding. Additionally, they replicate the finding from Sikdar et al. (2017) that WS and SS are similar at the electrophysiological level. In this particular case, the microstate results invalidate the idea that an additional mechanism is responsible for producing whispered utterances, as proposed by Tsunoda et al. (2011). Indeed, the same microstate maps, that is, the same encoding processes, were found for both WS and SS. Moreover, the dynamics of brain activation underlying these processes did not differ across conditions. In light of the electrophysiological data, whispering cannot be distinguished from speaking normally, and the literature should perhaps adopt a more nuanced approach to understanding and characterizing this mode.
Moreover, given that the time-window of ERP modulations for LS seems to encompass a large portion of the time-window associated with motor speech encoding, likely planning and programming in the FL model, the present results can also be related to the neurocomputational framework of speech production and acquisition from Guenther (2016), the Directions Into Velocities of Articulators (DIVA) model. Indeed, although this framework so far makes no claims about the temporal dynamics of brain activation (Tourville & Guenther, 2011), the present findings challenge our understanding of the feedforward control system. If one assumes that the speech sound map (SSM; see Guenther & Vladusich, 2012) corresponds to motor planning in the FL model (i.e., where motor plans are retrieved), while the Articulatory Map corresponds to the motor programming stage (i.e., where spatiotemporal and force dimensions are specified), our outcomes suggest that LS could be encoded throughout the process of activating the cells in the SSM and transmitting the motor targets to the Initiation Map and the Articulatory Map. For future studies, investigating speech modes thus seems to provide an interesting window onto the intricate interplay between the functional units of the feedforward control system.