Research report
Modulation of neural responses to speech by directing attention to voices or verbal content

https://doi.org/10.1016/S0926-6410(03)00079-X

Abstract

Using functional neuroimaging, we compared the cortical response to auditory sentences in two recognition tasks that targeted either the speaker’s voice or the verbal content. The right anterior superior temporal sulcus responded during the voice task but not during the verbal content task; this response was therefore specifically related to the analysis of nonverbal features of speech. The dissociation between verbal and nonverbal analysis was only partial, however: left middle temporal regions previously implicated in semantic processing responded in both tasks, indicating that implicit semantic processing occurred even when the task directed attention to nonverbal input analysis. The verbal task yielded greater bilateral activation in the fusiform/lingual region, presumably reflecting an implicit translation of auditory sentences into visual representations. This result confirms the participation of visual cortical regions in the verbal analysis of speech [15], [16].

Introduction

Speech processing implicitly requires processing of the human voice. Voices, however, are not only a vehicle for language but also convey nonverbal information such as the speaker’s gender, identity, and emotional state, which can be perceived independently of verbal information. Accordingly, neuropsychological data point to a partial neuroanatomical dissociation between vocal and verbal processing: lesions of the right temporo-parietal cortex have been observed to impair recognition of a speaker’s voice while speech comprehension is preserved [43], [44], [45].

Previous functional neuroimaging studies found both superior temporal sulci (STS) to be more active when listening to human voices than when processing other sounds [1]. These voice-responsive areas along both STS displayed a strong preference for natural speech stimuli over their scrambled counterparts as well as over non-speech vocal stimuli [2]. Among these regions, however, only the right anterior STS activated significantly more for non-speech vocalizations than for a scrambled version of them [2]. These findings suggest that the right anterior STS processes nonverbal, voice-related components of speech independently of their low-level physical stimulus properties and of the verbal content they express. This conclusion is, however, constrained in two ways. Firstly, if an area responds better to verbal than to nonverbal vocalizations, as in the aforementioned studies, this points to an interaction of verbal and nonverbal feature processing rather than to exclusive processing of nonverbal information conveyed by voice stimuli. The conclusion that the right anterior STS is functionally specific for voices would only be justified if this area were at the same time shown to be insensitive to verbal information. Secondly, scrambled vocal stimuli control only to a certain extent for sensory input features: in the comparison of vocal and scrambled stimuli, voice-specific effects are potentially confounded by processing of the fine-grained acoustic structure of vocal stimuli.

Previous functional neuroimaging studies of speech analysis have shown activation not only of the left temporal region but also of visually responsive areas [15]. The location of these visual activations presumably reflects individual strategies for translating auditory information into specific visual representations [16]. Although speech-related activations in visual areas have always occurred in response to meaningful stimuli, it remains unclear whether they are implicitly driven by speech sounds or specifically related to the vocal or verbal components of speech analysis.

In the present study, we addressed the following issues: (1) Do specific and distinct brain regions analyze verbal and vocal components of speech, respectively? (2) Does visual cortex participate in vocal or verbal processing of speech?

We therefore attempted, in particular, to minimize the overlap between verbal and vocal components of speech processing and to avoid potential confounds related to sensory input structure. Attentional modulation of neural activity is a means to that end, and we hence employed tasks that, while dealing with identical stimulus material, selectively targeted different features of speech, i.e. the voice and the verbal content, respectively. Our assumption was that, depending on the feature in the focus of the task, activity would increase in those regions that are recruited by the corresponding stimulus feature in a sensory experiment. Such a top-down approach has successfully confirmed cortical functional specialization in other sensory modalities [6], [21], [30] (reviewed in [10], [23]). Two tasks were performed on identical sets of spoken sentences and emphasized either vocal or verbal processing; they were further controlled by an analogous task involving speech-envelope noises.
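To make the contrast logic concrete, the following minimal sketch illustrates how task-dependent modulation would be assessed by contrasting each speech task against the matched noise-control task within a region of interest. All region labels, condition names, and response values in the sketch are hypothetical placeholders, not results from this study.

```python
# Illustrative sketch of the attentional-modulation contrast logic.
# All numbers and labels are hypothetical placeholders, not study data.

# Mean response estimates (arbitrary units) for one hypothetical region of
# interest, one value per condition: voice task, sentence (verbal) task,
# and the matched noise-control task.
roi_response = {"voice": 1.8, "sentence": 0.9, "noise": 0.7}

# Each speech task is contrasted against the noise-control task, so that
# activation reflects speech processing rather than acoustic input per se.
voice_vs_noise = roi_response["voice"] - roi_response["noise"]
sentence_vs_noise = roi_response["sentence"] - roi_response["noise"]

# A region specifically recruited by the attended feature should respond more
# strongly when the task targets that feature than when it targets the other.
task_modulation = voice_vs_noise - sentence_vs_noise

print(f"voice - noise:    {voice_vs_noise:+.2f}")
print(f"sentence - noise: {sentence_vs_noise:+.2f}")
print(f"task modulation (voice > sentence): {task_modulation:+.2f}")
```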


Subjects

Fourteen volunteers participated in the study (eight women, six men; aged 20–51 years; all gave written informed consent). They all had normal hearing and no history of neurological disease. All were right-handed as determined by a modified version of the Edinburgh Inventory of handedness [31], which included the following questions: “With which hand do you (1) write, (2) draw, (3) throw, (4) use a pair of scissors, (5) use a toothbrush, (6) use a knife (without fork), (7) light a match, (8) open a jar? (9)

Behavioral results

The responses given by key-presses revealed a recognition rate of 96.76% in the sentence task (errors: 47% false negatives, 53% false positives), 86.79% in the voice task (53% false negatives, 47% false positives) and 92.25% in the noise task (32% false negatives, 68% false positives). The high performance indicates that the stimuli were clearly audible despite the noise produced by the scanner.
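As an illustration of how such figures relate to the underlying responses, the short sketch below computes a recognition rate and the false-negative/false-positive breakdown of the errors from key-press counts. The counts are hypothetical placeholders chosen for illustration only, not the study’s data.

```python
# Hypothetical key-press counts for one task; placeholders, not study data.
hits = 90                  # targets correctly recognized
correct_rejections = 100   # non-targets correctly rejected
misses = 4                 # targets not recognized (false negatives)
false_alarms = 3           # non-targets wrongly accepted (false positives)

total = hits + correct_rejections + misses + false_alarms
recognition_rate = (hits + correct_rejections) / total

# The percentages reported in parentheses describe how the errors split
# between false negatives (misses) and false positives (false alarms).
errors = misses + false_alarms
false_negative_share = misses / errors
false_positive_share = false_alarms / errors

print(f"recognition rate: {recognition_rate:.2%}")
print(f"errors: {false_negative_share:.0%} false negative, "
      f"{false_positive_share:.0%} false positive")
```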

Recognition rates during the voice task were significantly lower than in the semantic task [t

Discussion

We sought to dissociate verbal from nonverbal aspects of speech processing by directing the task to the voice or to the verbal content of sentences. Contrasting each task against an equivalent task performed on speech-envelope noises allowed us to determine regions involved in natural speech processing. Both the sentence and the voice recognition tasks activated auditory language areas, i.e. the bilateral middle and superior temporal gyri (BA 21/22). In agreement with previous observations, we did not find

Conclusion

This study goes beyond previous findings on the neural processing of human voices in two respects: (1) it shows voice-specific responses that cannot be attributed to the processing of acoustic features of voices, and (2) it shows that the right anterior STS is specifically involved in voice processing without detectably contributing to verbal processing. We additionally confirm the participation of visual cortical regions in the verbal analysis of speech.

Acknowledgments

ALG is supported by the Alexander von Humboldt Foundation, EE and AK by the Volkswagen Foundation.

References (49)

  • P. Rama et al., Working memory of identification of emotional vocal expressions: an fMRI study, Neuroimage (2001)
  • D. Schmidt et al., Brain systems engaged in encoding and retrieval of word-pair associates independent of their imagery content or presentation modalities, Neuropsychologia (2002)
  • P. Sterzer et al., Neural correlates of spontaneous direction reversals in ambiguous apparent visual motion, Neuroimage (2002)
  • H.M. Sussman, A neuronal model of vowel normalization and representation, Brain Lang. (1986)
  • S.L. Thompson-Schill et al., A neural basis for category and modality specificity of semantic knowledge, Neuropsychologia (1999)
  • D.R. Van Lancker et al., Impairment of voice and face recognition in patients with hemispheric damage, Brain Cogn. (1982)
  • D.R. Van Lancker et al., Phonagnosia: a dissociation between familiar and unfamiliar voices, Cortex (1988)
  • P. Belin et al., Voice-selective areas in human auditory cortex, Nature (2000)
  • J.R. Binder et al., Human temporal lobe activation by speech and nonspeech sounds, Cereb. Cortex (2000)
  • R.L. Buckner et al., Functional anatomic studies of memory retrieval for auditory words and visual pictures, J. Neurosci. (1996)
  • L. Cohen et al., Language-specific tuning of visual cortex? Functional properties of the visual word form area, Brain (2002)
  • M. Corbetta et al., Attentional modulation of neural processing of shape, color and velocity in humans, Science (1990)
  • L. Cornette et al., The neural substrate of orientation short-term memory and resistance to distractor items, Eur. J. Neurosci. (2002)
  • J.F. Demonet et al., The anatomy of phonological and semantic processing in normal subjects, Brain (1992)