Reliability of Clinician-Based (GRBAS and CAPE-V) and Patient-Based (V-RQOL and IPVI) Documentation of Voice Disorders

doi:10.1016/j.jvoice.2006.05.001

Journal of Voice

Volume 21, Issue 5, September 2007, Pages 576-590

https://doi.org/10.1016/j.jvoice.2006.05.001

Summary

This study examined the reliability of two methods clinicians use to document voice quality and compared two methods for documenting patients' perceptions of voice quality. It involved a prospective reliability study and a retrospective chart review. The reliability of two clinician-based voice assessment protocols (Grade, Roughness, Breathiness, Asthenia, Strain [GRBAS] and Consensus Auditory Perceptual Evaluation–Voice [CAPE-V]) was evaluated. The two protocols were then compared after use in voice assessments of 42 males and 61 females performed by a certified speech-language pathologist specializing in the assessment of voice disorders. In addition, two patient-based scales (Voice Related Quality of Life, or V-RQOL, and Iowa Patient's Voice Index, or IPVI) obtained from the same patients were compared with each other and with the clinician-based scales. Reliability of clinicians' ratings of overall severity of dysphonia using the GRBAS and CAPE-V scales was very good (r > 0.80). Agreement between V-RQOL Total scores and IPVI ratings of patients' perceptions of the impact of dysphonia was less strong (Spearman's r = −0.76). Agreement between patient-based and clinician-based scales was relatively weak. Clinicians' perceptions of dysphonia appeared to be reliable and unaffected by the rating tool, as indicated by the high level of agreement between the two rating systems when they were used together. The CAPE-V system appeared to be more sensitive than the GRBAS system to small differences within and among patients. The V-RQOL and IPVI approaches to documenting patients' perceptions of dysphonia agreed less well, possibly because of differences in patients' dependence on voice and in their interpretation of the rating tool items. The differences between clinician-based and patient-based data support the conclusion that clinicians and patients experience and consider dysphonia very differently.

Introduction

Perceptual assessment is the foundation of voice assessment and fundamental to studies of treatment outcomes for surgical and behavioral approaches to management of voice disorders.1, 2, 3, 4, 5, 6, 7, 8, 9, 10 Approaches to documenting perceived voice qualities have evolved from descriptive approaches to more concise coding systems.11, 12 Many are designed for well-trained, experienced voice professionals while others are intended for untrained patient use.

A system for voice professionals based on a multidimensional analysis of voice qualities evolved from the work of several researchers.13, 14 Known as GRBAS (Grade, Roughness, Breathiness, Asthenia, Strain), this ordinal system was popularized after being described by Hirano.¹⁴ Concerns regarding the reliability of such systems prompted considerable discussion. Bassich and Ludlow¹⁵ reported that four inexperienced raters required 8 hours of training before reaching 80% agreement in their ratings of normal and pathological voices. De Bodt et al¹⁶ reported that, based on kappa statistics, test-retest reliability of the Grade (G) parameter ranged from fair to good, whereas that of the other parameters ranged from fair to moderate. They also found that rater experience had an important effect on ratings. Gerratt et al¹⁷ suggested that explicit anchors are needed to maximize the reliability of perceptual assessment of voice quality.
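The agreement statistics mentioned above can be illustrated with a short Python sketch. It computes simple percent agreement (the kind of index implied by an 80% agreement criterion) and unweighted Cohen's kappa (a chance-corrected index of the kind De Bodt et al reported) for two sessions of GRBAS Grade ratings. The 0 to 3 coding of Grade, the function names, and the sample ratings are illustrative assumptions; the cited studies may have used weighted kappa or other variants not shown here.

```python
from collections import Counter
from typing import Sequence


def percent_agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Proportion of voice samples given the identical score in both sessions."""
    if len(a) != len(b) or not a:
        raise ValueError("rating lists must be non-empty and equal in length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Unweighted Cohen's kappa: observed agreement corrected for chance."""
    p_observed = percent_agreement(a, b)
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: probability that both sessions independently
    # assign the same category, summed over all categories used.
    p_expected = sum(count_a[c] * count_b[c] for c in set(a) | set(b)) / (n * n)
    if p_expected == 1.0:
        return 1.0  # degenerate case: every rating falls in one category
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical Grade ratings (0 = normal ... 3 = severe) from two sessions.
test = [0, 1, 2, 3, 1, 2, 2, 0, 3, 1]
retest = [0, 1, 2, 2, 1, 2, 3, 0, 3, 1]
print(percent_agreement(test, retest), round(cohens_kappa(test, retest), 2))
```

For these hypothetical ratings, percent agreement is 0.80 while kappa is about 0.73, showing how chance correction lowers an apparently high raw agreement when only four categories are available.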

Kreiman et al¹² further suggested that scaling systems relying primarily on ordinal or equal-appearing interval scales may have limited potential for reliability. They proposed that a visual analog scaling procedure could address several limitations of other approaches. This perspective was incorporated into a new scaling tool developed by a group of clinical speech-language pathologists and voice scientists specializing in perceptual assessment of voice at the Consensus Conference for Perceptual Measure of Voice Quality, sponsored by the American Speech-Language-Hearing Association Special Interest Division 3 for Voice and Voice Disorders, June 10–11, 2002. The tool was called CAPE-V (Consensus Auditory Perceptual Evaluation of Voice) and used a form of visual analog scaling supplemented by other descriptors. Instructions for using CAPE-V and a rating form are available online through the American Speech-Language-Hearing Association's Division 3 for Voice and Voice Disorders at http://www.asha.org/about/membership-certification/divs/div_3.htm.
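As a minimal sketch of how a visual analog rating becomes a number, the Python function below converts the measured distance of a rater's mark from the left end of the line into a 0 to 100 score. It assumes a 100 mm line by default (an assumption, since the text above does not give the line length) and a linear mapping; the function name is hypothetical, and the sketch omits the descriptors CAPE-V adds to the scale.

```python
def vas_score(mark_mm: float, line_length_mm: float = 100.0) -> float:
    """Convert a mark's distance (mm) from the left, 'normal' end of a
    visual analog line into a 0-100 score using a linear mapping."""
    if line_length_mm <= 0:
        raise ValueError("line length must be positive")
    if not 0.0 <= mark_mm <= line_length_mm:
        raise ValueError("mark must lie on the line")
    return 100.0 * mark_mm / line_length_mm


print(vas_score(37.5))                       # mark 37.5 mm along a 100 mm line
print(vas_score(45.0, line_length_mm=90.0))  # same proportion on a shorter line
```

Here vas_score(37.5) returns 37.5, and a mark 45 mm along a 90 mm line returns 50.0. Because the score is continuous, two raters can differ by a fraction of a point, which is the property contrasted with the 4-point ordinal scale in the discussion that follows.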

Some controversy clearly remains. The concerns expressed by Kreiman et al¹² motivated Wuyts et al¹⁸ to compare the original GRBAS scale with a visual analog version of the GRBAS scale that they designed. They asked 29 raters to evaluate the pathologic voices of 14 individuals. Contrary to the findings of Kreiman et al,¹² the authors reported that the original 4-point GRBAS scale yielded higher interrater agreement than the visual analog version.

Another approach to documenting voice disorders arose from concerns of healthcare professionals and insurance providers that characterization of the presence and severity of disease processes requires the input of the individuals affected by the disease. In the area of voice, Smith et al¹⁹ found that patients' ratings of impairment due to voice disorders were similar in range and severity to those that were due to more severe medical diseases. Several “patient-perception” techniques have been described in the literature. Jacobson et al²⁰ described a 30-item version of an original 80-item questionnaire they called the “Voice Handicap Index.” Hogikyan and Sethuraman²¹ reported that their 10-item “Voice-Related Quality of Life” or V-RQOL instrument was a valid index of quality of life impairments due to voice disorders.
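To make the questionnaire-based approach concrete, here is a hedged Python sketch of how a 10-item instrument such as V-RQOL can be reduced to a single total score. It assumes the commonly described V-RQOL scoring, in which each item is rated from 1 (not a problem) to 5 (problem is as bad as it can be) and the raw sum is linearly rescaled so that 100 is the best and 0 the worst possible score. That transformation is not stated in this article, and the function name is hypothetical.

```python
from typing import Sequence


def vrqol_standard_score(item_scores: Sequence[int]) -> float:
    """Rescale ten V-RQOL item ratings (assumed 1-5 each) to a 0-100 total.

    Assumed transformation: 100 - (raw - 10) / 40 * 100, so a raw sum of 10
    (no problem on any item) maps to 100 and a raw sum of 50 maps to 0.
    """
    if len(item_scores) != 10:
        raise ValueError("expected 10 item ratings")
    if any(not 1 <= s <= 5 for s in item_scores):
        raise ValueError("each item rating must be between 1 and 5")
    raw = sum(item_scores)
    return 100.0 - (raw - 10) / 40 * 100.0


print(vrqol_standard_score([3] * 10))  # raw sum 30 on the assumed scoring
```

Ten ratings of 3 give a raw sum of 30 and a standard score of 50.0. Note that on this assumed scoring a higher score means better voice-related quality of life, so its correlation with an index on which higher means worse would be negative.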

Although these approaches have been used in clinics and described in the literature, little is known about how reliable they are or how they compare and relate to one another. The purpose of this research was to examine the reliability of the two clinician-based rating systems when they were used simultaneously. The two clinician-based scales were used simultaneously to test the possibility, suggested in the literature, that the structure of the rating systems (4-point GRBAS vs 100-point CAPE-V) might affect reliability. If scale structure has little impact, we would expect reliability for both scales to be very similar. However, rating variability of 1 point on a 4-point scale represents a difference of 25%, whereas rating variability of 1 point on a 100-point scale represents only a 1% difference. For this reason alone, the reliability of the two scales could differ substantially, even when they are used simultaneously.
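The scale-resolution argument reduces to simple arithmetic. Following the text's convention of expressing a one-point difference relative to the number of points on the scale, with $k$ denoting that number:

$$
\Delta_k = \frac{1}{k}, \qquad \Delta_4 = \frac{1}{4} = 25\%, \qquad \Delta_{100} = \frac{1}{100} = 1\%, \qquad \frac{\Delta_4}{\Delta_{100}} = 25
$$

On this convention, a one-point disagreement on the GRBAS scale is 25 times larger, relative to the scale, than a one-point disagreement on the CAPE-V scale.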

An additional purpose was to compare and contrast two clinician-based and two patient-based approaches to documenting voice quality and the effects of voice disorders on patients' quality of life. If clinicians' perceptions of voice quality are similar to patients' perceptions, it may be assumed that ratings of voice quality are not influenced by the personal experience of producing the voice. If, however, the experience of production colors the patient's percept, we would expect clinicians' ratings of quality to differ from those of patients. Also, if the two patient-based rating scales, which attempt to capture how voice quality affects the patient's life, yield similar results, we may choose the one that is shorter, simpler, and less time consuming.


Methods

This research was reviewed and approved by the Institutional Review Board of the University of Iowa. Four tools for documenting perceptual judgments of dysphonia were used in the clinical assessment of voice disorders. Two of these (GRBAS and CAPE-V) were clinician-based approaches to characterizing perceptual aspects of voice disorders. The other two (V-RQOL and IPVI) were patient-based approaches to characterizing the patient's perception of the presence, severity, and impact of voice disorders.

Intrarater reliability

Spearman's correlation coefficients calculated to estimate reliability of the clinician-based severity of dysphonia comparisons (GRBAS Grade and CAPE-V Severity) are presented in Appendix A. As stated previously, the voice sample set was selected to ensure a balanced representation of samples based on the original examining clinician's “Grade” ratings of dysphonia. Reliability of the ratings of the other clinician-based perceptual parameters (roughness, breathiness, etc.) was less meaningful
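The rank-based reliability estimate described above can be sketched in Python. The function computes Spearman's rho as Pearson's correlation on ranks, giving tied values the mean of the ranks they span; ties matter here because a 4-point Grade scale assigns the same value to many samples. The function names and sample data are hypothetical, and this is an illustration of the statistic, not the authors' analysis code.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson's r computed on tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined when one variable is constant")
    return cov / (sx * sy)


# Hypothetical paired ratings of the same five samples.
grade = [0, 1, 1, 2, 3]            # GRBAS Grade, 0-3
severity = [5, 22, 30, 48, 80]     # CAPE-V Overall Severity, 0-100
print(round(spearman_rho(grade, severity), 3))
```

For these hypothetical pairs, the two tied Grade ratings of 1 share rank 2.5 and rho comes out at about 0.975. The same function applies unchanged to a clinician's original versus repeated ratings of one parameter.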

Discussion

As patients with disordered voices have received growing and more careful attention from speech-language pathologists and otolaryngologists, so have the tools that are designed to assist in characterizing the nature of those disorders. The importance of considering both the clinician's and the patient's perceptions has only recently been recognized. When a new tool becomes available, it is the clinician's responsibility to understand its strengths and weaknesses relative to tried and true

Acknowledgments

The authors thank Gail Kempster, PhD, for her insights and suggestions during development of this manuscript.

References (21)

J.A. Kitch et al.
Performance effects on the voices of 10 choral tenors: acoustic and perceptual findings
J Voice
(1996)
T. Bhuta et al.
Perceptual evaluation of voice quality and its correlation with acoustic measurements
J Voice
(2004)
I.V. Bele
Reliability in perceptual analysis of voice quality
J Voice
(2005)
M.S. De Bodt et al.
Test-retest study of GRBAS scale: influence of experience and professional background on perceptual rating of voice quality
J Voice
(1997)
F.L. Wuyts et al.
Is the reliability of a visual analog scale higher than an ordinal scale? An experiment with the GRBAS scale for the perceptual evaluation of dysphonia
J Voice
(1999)
N.D. Hogikyan et al.
Validation of an instrument to measure voice-related quality of life (V-RQOL)
J Voice
(1999)
T. Shipp
Some acoustic and perceptual factors in acute-laryngitic hoarseness
J Speech Hear Disord
(1965)
F. Darley et al.
Differential diagnosis patterns of dysarthria
J Speech Hear Res
(1969)
H. Takahashi et al.
Some perceptual dimensions and acoustical correlates of pathologic voices
Acta Otolaryngol Suppl
(1976)
B. Hammarberg et al.
Perceptual and acoustic correlates of abnormal voice qualities
Acta Otolaryngol
(1980)



Presented at the 34th Annual Meeting of the Voice Foundation of America, June 2005.

Supported by the Department of Otolaryngology-Head and Neck Surgery and the Department of Speech Pathology & Audiology, University of Iowa.
