Introduction
Spasmodic dysphonia (SD) is a severe voice disorder characterised by involuntary disruption of phonation [1]. The aetiology of SD was originally thought to be psychoneurotic, but it has since become clear that the cause is neurological. Injection of botulinum toxin A is now the preferred therapy [1].
There are three general types of SD: adductor, abductor and mixed adductor/abductor. The most common is adductor spasmodic dysphonia (AdSD), which is characterised by strained, strangled and effortful phonation in which words are frequently cut off or difficult to initiate. Irregular, intermittent increases in adductor muscle activity coincide with phonatory offsets [1].
We define AdSD as “a focal laryngeal dystonia resulting in a strained voice quality with spastic breaks”, which, according to Rees et al. [2], improves with botulinum toxin A injections.
A reliable methodology is essential for evidence-based voice assessment and therapy. The European Laryngological Society protocol for voice assessment comprises five approaches: (1) perception, using the GRBAS scale (perceptual evaluation of voice quality); (2) visualisation, using videostroboscopy; (3) acoustic analysis, using the Multi-Dimensional Voice Program (MDVP) from Kay Elemetrics; (4) aerodynamic measurements; and (5) self-assessment. However, this protocol was designed to assess “ordinary” laryngeal dysphonia, such as that caused by vocal nodules. Consequently, it is not applicable to some categories of severe voice pathology, such as substitution voices or SD, because of the strong irregularity of the signal [3].
Several scales have been developed for perceptual evaluation, but only a few have been tested for reliability. Comparing three perceptual scales (the Buffalo Voice Profile, the Vocal Profile Analysis Scheme (VPA) and GRBAS), Webb et al. [4] found that only the GRBAS scale was reliable across all parameters, with the exception of “strain”. Dejonckere et al. [5] demonstrated that the parameters asthenicity, strain and instability yield low inter-rater agreement values (0.69 for breathiness and 0.65 for asthenicity/strain) and concluded that only the G, R and B parameters are of clinical importance. This is consistent with the German perceptual rating using the RBH scale (R, Rauigkeit [roughness]; B, Behauchtheit [breathiness]; H, Heiserkeit [hoarseness]) [6].
In substitution voicing (SV), the GRBAS scale appears not to be applicable because of the extreme severity of the voice pathology (SV is often scored as G3) and the inability of the scale to describe variations in quality or minor differences. GRBAS is also inadequate for scoring fluency problems, which are one of the main characteristics of spasmodic dysphonia [3].
Moerman et al. [3] developed a rating scale, the IINFVo, which is appropriate for extremely deviant voices such as substitution voices. It consists of five parameters, “Overall Impression”, “Impression of Intelligibility”, “Noise”, “Fluency” and “Voicing”, each scored on a visual analogue scale (VAS) from 0 to 10. The IINFVo was found to be reliable when scored by professionals [7].
For acoustic analysis, the program traditionally used, MDVP (Kay Elemetrics), relies on the peaks of the signal to determine the fundamental frequency (F0) and only works reliably if the acoustic signal (1) contains little or no noise and (2) shows a certain degree of periodicity [8]. Although the evaluation of running speech better reflects everyday voice use, MDVP can only thoroughly analyse sustained vowels. SD cannot be reliably analysed by MDVP because F0 is difficult to determine, the signal is strongly aperiodic and voice breaks are frequent. Voice onset problems and voiced/unvoiced variations are an even greater challenge for acoustic analysis.
Moerman et al. [3] used the Auditory Model based Pitch Extractor (AMPEX), developed by Van Immerseel and Martens [9], to analyse substitution voices. They showed that it robustly differentiates (1) normal speech, (2) one-vocal-cord speech, (3) tracheo-oesophageal speech and (4) oesophageal speech. AMPEX was used to analyse various speech types (Table 1): “sustained vowels”, “syllables p2, b2, p3 and b3”, “text (running speech)” and “count (digit strings)”. For its analysis, the auditory model produces a 27-dimensional feature vector every 10 ms. Each vector consists of 23 spectral parameters, an energy value, a V/U (voiced/unvoiced) value, an F0 value and an F0 evidence value. AMPEX is also able to extract F0 in the presence of background noise (Table 2).
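To make the frame layout described above concrete, the following is a minimal Python sketch of one AMPEX output frame. Only the composition (23 spectral parameters, an energy value, a V/U value, F0 and F0 evidence, one vector every 10 ms) is taken from the description above; the class name `AmpexFrame`, the field names and the ordering of values in the flat vector are hypothetical and do not represent AMPEX's actual output format.

```python
from dataclasses import dataclass
from typing import List

FRAME_SHIFT_MS = 10   # one feature vector every 10 ms
N_SPECTRAL = 23       # spectral parameters per frame


@dataclass
class AmpexFrame:
    """Hypothetical container for one 27-dimensional AMPEX feature vector."""
    spectral: List[float]   # 23 spectral parameters
    energy: float           # frame energy
    vuv: float              # voiced/unvoiced value
    f0: float               # fundamental frequency estimate
    f0_evidence: float      # evidence (reliability) of the F0 estimate

    def __post_init__(self) -> None:
        if len(self.spectral) != N_SPECTRAL:
            raise ValueError(f"expected {N_SPECTRAL} spectral values, got {len(self.spectral)}")

    def to_vector(self) -> List[float]:
        """Flatten to 23 + 1 + 1 + 1 + 1 = 27 values (the ordering is an assumption)."""
        return list(self.spectral) + [self.energy, self.vuv, self.f0, self.f0_evidence]

    @classmethod
    def from_vector(cls, vec: List[float]) -> "AmpexFrame":
        """Rebuild a frame from a flat 27-value vector in the assumed ordering."""
        if len(vec) != N_SPECTRAL + 4:
            raise ValueError(f"expected {N_SPECTRAL + 4} values, got {len(vec)}")
        return cls(list(vec[:N_SPECTRAL]), *vec[N_SPECTRAL:])


def frame_start_ms(index: int) -> int:
    """Start time of frame `index`, given the fixed 10 ms frame shift."""
    return index * FRAME_SHIFT_MS
```

With a 10 ms frame shift, a 2 s recording corresponds to roughly 200 such vectors, which is the unit on which the frame-based parameters in Table 2 are defined.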
Table 1
Content of the voice recordings of AdSD patients consisting of vowels, syllables (VCV and CVCVCV) and running speech (text)
| Task | Content |
|---|---|
| Vowels | Sustained vowels /a/, /i/, /u/ |
| p2 | VCV utterances /apa/, /ipi/, /upu/ |
| b2 | VCV utterances /aba/, /ibi/, /ubu/ |
| p3 | CVCVCV utterances /papapa/, /pipipi/, /pupupu/ |
| b3 | CVCVCV utterances /bababa/, /bibibi/, /bububu/ |
| Text | Phonetically rich German text (opening of “Der Nordwind und die Sonne”, The North Wind and the Sun): Einst stritten sich Nordwind und Sonne, wer von ihnen beiden wohl der Stärkere wäre, als ein Wanderer, der in einen warmen Mantel gehüllt war, des Weges kam. Sie wurden einig, dass derjenige für den Stärkeren gelten sollte, der den Wanderer zwingen würde, seinen Mantel abzulegen. |
Table 2
Parameters of the acoustic analysis by AMPEX for the voice quality of AdSD patients
| Parameter | Description |
|---|---|
| PVF | Percentage of voiced frames |
| PVS | Percentage of voiced speech frames |
| AVE | Mean voicing evidence of voiced frames |
| Jit | Evidence-weighted F0 variation in voiced frames |
| Jc | Evidence-weighted F0 variation in reliable voiced frames |
| PUVF | Percentage of unreliable voiced frames |
| VL90 | 90th percentile of the voicing length distribution |
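Several of the frame-based parameters in Table 2 can be illustrated with a short sketch. The exact AMPEX definitions are not given here, so the code rests on simplifying assumptions, all hypothetical: each frame carries a boolean voiced decision, a voicing evidence value and a flag marking its F0 as unreliable; PVF is the share of all frames that are voiced; AVE is the mean evidence over voiced frames; PUVF is the share of voiced frames with an unreliable F0; and VL90 is the nearest-rank 90th percentile of the lengths, in frames, of uninterrupted voiced runs. PVS, Jit and Jc are omitted because they depend on details (speech-frame detection, evidence weighting) not described in the text.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Frame:
    voiced: bool          # voiced/unvoiced decision for this 10 ms frame
    evidence: float       # voicing evidence (hypothetical scale)
    f0_unreliable: bool   # True if the F0 estimate is flagged as unreliable


def voiced_run_lengths(frames: Sequence[Frame]) -> List[int]:
    """Lengths (in frames) of maximal runs of consecutive voiced frames."""
    runs, current = [], 0
    for fr in frames:
        if fr.voiced:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def percentile_nearest_rank(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile (one of several common percentile definitions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def frame_parameters(frames: Sequence[Frame]) -> Dict[str, float]:
    """PVF, AVE, PUVF and VL90 under the assumed definitions above."""
    voiced = [fr for fr in frames if fr.voiced]
    runs = voiced_run_lengths(frames)
    return {
        "PVF": 100.0 * len(voiced) / len(frames) if frames else 0.0,
        "AVE": sum(fr.evidence for fr in voiced) / len(voiced) if voiced else 0.0,
        "PUVF": 100.0 * sum(fr.f0_unreliable for fr in voiced) / len(voiced) if voiced else 0.0,
        "VL90": percentile_nearest_rank(runs, 90) if runs else 0.0,
    }
```

Under these assumptions, a speaker who cannot interrupt or sustain voicing where the utterance requires it would show up as shifts in PVF and in the distribution of voiced run lengths summarised by VL90.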
Our investigation focuses on whether AMPEX could be a robust assessment tool for analysing AdSD.
Based on the basic protocol of Dejonckere (following the guidelines elaborated by the Committee on Phoniatrics of the European Laryngological Society, ELS) and on the findings of Moerman, we evaluate (1) the acoustic analysis of sustained vowels and running speech with AMPEX and (2) the perceptual evaluation with the IINFVo in AdSD [5, 10]. In addition, we asked patients to assess their own voice by answering the question: “Rate the quality of your voice on a scale from 0% (worst voice quality ever experienced) to 100% (best voice quality ever experienced).”
Visual evaluation of glottic function and aerodynamic measurements, although of clinical interest, were omitted because our focus was on alternative methods specifically aimed at acoustic analysis and perceptual evaluation. The aim of this study is to determine whether these alternative tools (IINFVo and AMPEX) are suitable for describing voice quality in AdSD.
Discussion
The aim of this study was to determine whether two alternative tools (IINFVo and AMPEX) are suitable for analysing the severe voice pathology of spasmodic dysphonia. The relative rarity of SD, and hence the limited number of cases, should be taken into consideration when interpreting the statistics.
IINFVo
Inter-rater consistency among the professional judges showed high correlations (mean values for the three judges ranging from 0.827 to 0.950), except for “Noise” (0.177). Kendall’s tau likewise showed significant values, ranging from 0.726 to 0.823, again except for “Noise” (0.233).
Intra-rater consistency, measured by having one professional judge rate the samples again 3 months later, was also significant for all IINFVo parameters, ranging from 0.863 to 0.943, except for “Noise” (0.224).
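The agreement statistics above can be sketched for a single pair of score series (two judges, or one judge at two time points). The text does not state which correlation coefficient underlies the consistency values, so Pearson's r is used here as an assumption; Kendall's tau is computed in its tau-b form, which corrects for tied scores. The helper names are illustrative only; in practice a statistics package would normally be used.

```python
import math
from itertools import combinations
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two series of VAS scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b: (C - D) / sqrt((n0 - n1) * (n0 - n2)).

    C and D count concordant and discordant pairs; n1 and n2 count the
    pairs tied in x and in y, respectively; n0 is the total number of pairs.
    """
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0:
            ties_x += 1
        if dy == 0:
            ties_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n0 = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / math.sqrt((n0 - ties_x) * (n0 - ties_y))
```

For the intra-rater case, `x` would hold the judge's first ratings and `y` the ratings given 3 months later, computed separately for each IINFVo parameter.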
Table 5 shows that the mean MAD values of the professional judges for the parameters “Impression”, “Intelligibility”, “Fluency” and “Voicing” lie between 0.56 and 1.42 on the 0–10 scale. This suggests that these parameters are valuable for professionals evaluating SD voices.
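The text does not define MAD further. Assuming it denotes the mean absolute difference between the scores that two judges assign to the same samples, averaged over all pairs of judges, it could be computed per IINFVo parameter as in the sketch below. The function names and the averaging over judge pairs are assumptions made for illustration.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, Sequence, Tuple


def pairwise_mad(scores_by_judge: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], float]:
    """Mean absolute difference (in 0-10 VAS units) for every pair of judges.

    Assumes each judge rated the same samples in the same order.
    """
    result = {}
    for a, b in combinations(sorted(scores_by_judge), 2):
        xs, ys = scores_by_judge[a], scores_by_judge[b]
        if len(xs) != len(ys):
            raise ValueError("judges must rate the same samples")
        result[(a, b)] = mean(abs(x - y) for x, y in zip(xs, ys))
    return result


def mean_mad(scores_by_judge: Dict[str, Sequence[float]]) -> float:
    """Average of the pairwise MAD values: one summary figure per parameter."""
    return mean(pairwise_mad(scores_by_judge).values())
```

Because the result is expressed in scale units, a value of about 1 means that two judges typically differ by roughly one point on the 0–10 VAS for that parameter.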
In further investigations, the parameter “Noise” may need to be redefined for patients with AdSD, and rater training should focus on this specific problem.
AMPEX
When the data for AdSD are compared with those for normal voices and for substitution voices (tracheo-oesophageal and oesophageal voices, voices with one vocal cord), as shown in Tables 9 and 10, one parameter stands out as highly deviant: AVE, the average voicing evidence. The AVE for AdSD was only 0.8 for “Vowels” and 0.7 for “Text”, compared with 6.8 (“Vowels”) and 5.1 (“Text”) for normal voices. Even the severe voice pathologies (tracheo-oesophageal and oesophageal voices, voices with one vocal cord) showed significantly higher values than AdSD voices: 2.8–5.1 for “Vowels” and 2.5–3.8 for “Text”. This could reflect the specific difficulty AdSD patients experience in producing voiced and unvoiced sounds where they are required.
For syllables p3, the mean percentage of frames with an unreliable F0 (PUVF) differs little between normal speakers and AdSD patients: 8.9 and 12.3, respectively (Table 11). For syllables b3, however, normal speakers show a mean value of only 3.0, whereas AdSD patients show nearly the same mean value as for syllables p3 (12.9). This points to instability of F0 in AdSD for syllables loaded with voiced consonants.
The mean VL90 of AdSD patients for syllables p3 is higher (55.2) than that of normal speakers (21.0), whereas for syllables b3 the two are very similar (AdSD 60.5, normal speakers 57). This suggests that speakers with AdSD may be unable to control the transitions between voiced and unvoiced segments accurately.
For the percentage of voiced frames (PVF) in syllables loaded with unvoiced consonants (p3), the mean value for normal speakers (40.2) is about twice that for AdSD patients (21.0), whereas the mean values for syllables loaded with voiced consonants (b3) are nearly identical (65.2 and 65.6). This indicates that syllables with voiceless consonants pose more problems for voicing.
This would support the suggestion by Roy et al. [12] of a certain task specificity. On the basis of clinical observations, Roy et al. raise the question of whether sentences loaded with voiced consonants are more difficult, i.e. whether such sentences provoke more frequent and severe spasms and voice breaks [12]. Our findings suggest instead that the alternation between voiced and unvoiced segments in syllables with unvoiced consonants is the greater challenge in AdSD. Task specificity is therefore plausible, but it is more complex than assumed and needs further investigation.
IINFVo and AMPEX
The results in Table 8 show significant correlations, ranging from 0.608 to 0.818, between all parameters of the IINFVo and AMPEX except “Noise”. This indicates that both tools are able to measure specific dimensions of spasmodic voicing. The two assessment methods appear complementary, providing valuable information without being mutually redundant.
Conclusion
This study suggests that multidimensional voice assessment, consisting of objective acoustic analysis with AMPEX and perceptual evaluation with the IINFVo, could be a robust tool for assessing severe voice pathology such as spasmodic dysphonia. In follow-up studies, these tools may be used to identify markers for differentiating and diagnosing AdSD.
Clinical observations by Roy et al. suggest that patients with AdSD have more difficulty speaking sentences loaded with voiced consonants [12]. Our findings indicate that speakers with AdSD may be unable to control the transitions between voiced and unvoiced segments. They support the possible presence of task specificity, but also illustrate its complexity and the need for further investigation.
Acknowledgments
We thank the University Hospital Hamburg, Germany, for recording patients with spasmodic dysphonia and the Department of Phoniatrics and Paediatric Audiology, University Hospital Münster, Germany, for scoring the samples with IINFVo.