Abstract
Despite a hundred years of questionnaire testing, no consensus has been reached on the optimal number of response alternatives in rating scales. Discrepancies among prior findings may be due to the use of different psychometric models (classical test theory, item factor analysis, and item response theory) and different performance criteria (reliability, convergent/discriminant validity, and internal structure of the questionnaire). Furthermore, previous empirical studies on this issue have used between-subjects designs, thus ignoring intra-individual effects. In contrast, we propose a within-subjects experimental design and a comprehensive statistical methodology based on structural equation models that examines all of these aspects simultaneously and thereby increases statistical power. To illustrate the method, two personality questionnaires were examined using a repeated measures design. Results indicated that as the number of response alternatives increased, (1) internal consistency increased, (2) convergent validity was unaffected, and (3) goodness of fit worsened. Finally, the article assesses the practical consequences of this research for the design of future personality questionnaires.
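As a point of reference for result (1), internal consistency is commonly summarized with coefficient alpha. The following is a minimal sketch of that computation, not the authors' code: the function name `coefficient_alpha` and the small response table are hypothetical, and the abstract describes a structural equation modeling approach rather than this direct calculation.

```python
from statistics import variance


def coefficient_alpha(scores):
    """Coefficient alpha for a respondents-by-items table of scores.

    scores: list of rows, one row per respondent, one column per item.
    """
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(n_items)]
    # Sample variance of the total (sum) score.
    total_var = variance([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data (illustration only): 5 respondents, 3 items, 5-point scale.
responses = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 4],
]
alpha = coefficient_alpha(responses)
```

Applying the same function to the same items administered with different numbers of response alternatives would give one simple way to compare internal consistency across formats.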
Additional information
This research was supported in part by Grant SEJ2006-08204/PSIC from the Spanish Ministry of Education to A.M.-O.
Cite this article
Maydeu-Olivares, A., Kramp, U., García-Forero, C. et al. The effect of varying the number of response alternatives in rating scales: Experimental evidence from intra-individual effects. Behavior Research Methods 41, 295–308 (2009). https://doi.org/10.3758/BRM.41.2.295
DOI: https://doi.org/10.3758/BRM.41.2.295