The effect of varying the number of response alternatives in rating scales: Experimental evidence from intra-individual effects

Maydeu-Olivares, Alberto; Kramp, Uwe; García-Forero, Carlos; Gallardo-Pujol, David; Coffman, Donna

doi:10.3758/BRM.41.2.295

Published: May 2009

Volume 41, pages 295–308, (2009)

Behavior Research Methods

Alberto Maydeu-Olivares¹,
Uwe Kramp²,
Carlos García-Forero¹,
David Gallardo-Pujol¹ &
Donna Coffman³

Abstract

Despite a hundred years of questionnaire testing, no consensus has been reached on the optimal number of response alternatives in rating scales. Differences in prior research may have been due to the use of various psychometric models (classical test theory, item factor analysis, and item response theory) and different performance criteria (reliability, convergent/discriminant validity, and internal structure of the questionnaire). Furthermore, previous empirical studies on this issue have tackled the experimental design from a between-subjects perspective, thus ignoring intra-individual effects. In contrast with this approach, we propose a within-subjects experimental design and a comprehensive statistical methodology using structural equation models for studying all of these aspects simultaneously, therefore increasing statistical power. To illustrate the method, two personality questionnaires were examined using a repeated measures design. Results indicated that as the number of response alternatives increased, (1) internal consistency increased, (2) there was no effect on convergent validity, and (3) goodness of fit worsened. Finally, the article assesses the practical consequences of this research for the design of future personality questionnaires.
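As a rough illustration of the internal-consistency criterion mentioned in the abstract, the sketch below computes coefficient alpha (Cronbach, 1951) for the same simulated respondents scored with different numbers of response alternatives. It is not the authors' structural equation methodology. The simulated one-factor data, the loading of 0.7, the equally spaced thresholds, the category counts (2, 3, 5), and the helper names `cronbach_alpha` and `discretize` are all assumptions made for illustration.

```python
import numpy as np


def cronbach_alpha(scores):
    """Coefficient alpha for an (n_respondents, n_items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    n_items = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)      # variance of each item
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of the sum score
    return n_items / (n_items - 1) * (1.0 - item_variances.sum() / total_variance)


def discretize(latent, n_categories):
    """Cut continuous responses into ordered categories 0..n_categories-1.

    Thresholds are equally spaced between -1.5 and 1.5, an arbitrary choice
    made only for this illustration.
    """
    thresholds = np.linspace(-1.5, 1.5, n_categories + 1)[1:-1]
    return np.digitize(latent, thresholds)


# Hypothetical data: one common trait, 10 items, 500 respondents.
rng = np.random.default_rng(0)
n_respondents, n_items, loading = 500, 10, 0.7
trait = rng.normal(size=(n_respondents, 1))
noise = rng.normal(size=(n_respondents, n_items))
latent = loading * trait + np.sqrt(1 - loading**2) * noise

# The same latent responses are rescored with 2, 3, and 5 alternatives,
# loosely mimicking a within-subjects comparison.
alphas = {k: cronbach_alpha(discretize(latent, k)) for k in (2, 3, 5)}
for k, alpha in alphas.items():
    print(f"{k} categories: alpha = {alpha:.3f}")
```

Because every format is derived from the same simulated respondents, differences in alpha across formats reflect only the coarseness of the response scale, which is the intuition behind a within-subjects comparison.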

Author information

Authors and Affiliations

Faculty of Psychology, University of Barcelona, Passeig de la Vall d’Hebron, 171, 08035, Barcelona, Spain
Alberto Maydeu-Olivares, Carlos García-Forero & David Gallardo-Pujol
University of Chile, Santiago, Chile
Uwe Kramp
Pennsylvania State University, University Park, Pennsylvania
Donna Coffman

Corresponding author

Correspondence to Alberto Maydeu-Olivares.

Additional information

This research was supported in part by Grant SEJ2006-08204/PSIC from the Spanish Ministry of Education to A.M.-O.

About this article

Cite this article

Maydeu-Olivares, A., Kramp, U., García-Forero, C. et al. The effect of varying the number of response alternatives in rating scales: Experimental evidence from intra-individual effects. Behavior Research Methods 41, 295–308 (2009). https://doi.org/10.3758/BRM.41.2.295

Received: 28 May 2008
Accepted: 06 October 2008
Issue Date: May 2009
DOI: https://doi.org/10.3758/BRM.41.2.295
