Introduction
Health-related quality of life (HRQOL) is a multidimensional concept that can be described as the degree to which a medical condition or its treatment affects a person’s usual or expected physical, emotional and social well-being. The factors that contribute to quality of life (QoL) vary according to personal preferences. For many, however, having enough visual ability to do the things they want to do is a high priority. Quality of vision is an integral part of HRQOL, and the impact of ophthalmic diseases on QoL has been documented in a series of studies [1‐3]. Patients often do not perceive the same benefit as that recorded by objective measures such as visual acuity and visual field testing, because objective measurements do not capture patients’ perceptions of their own disease. Numerous instruments that evaluate patients’ subjective perceptions of QoL have been developed. Although generic instruments can effectively assess HRQOL in persons with nonocular conditions, they usually cannot fully capture HRQOL in those with visual impairment [4‐7]. Measuring vision-specific QoL gives a wider view of the effect of the disease, or of its treatment, on a patient’s life.
Many specific questionnaires for patients with visual impairment have been developed and offered to ophthalmologists over the past twenty years [8, 9]. The National Eye Institute Visual Function Questionnaire (NEI VFQ-25) was originally developed by the National Eye Institute, mainly for English-speaking North American populations [10]. It is a shorter version of the previously developed 51-item questionnaire [11]. The NEI VFQ-25 assesses eleven dimensions of visual function and has been proposed as a means of assessing the efficacy of treatment for different ocular conditions [12]. Developed in the USA, it has been translated into a number of languages, including Italian, French, German, Spanish, Turkish, Chinese, Japanese, Greek and Portuguese [13‐20]. To our knowledge, no vision-targeted health status questionnaire has been translated into Serbian, and none has been developed in Serbian. Therefore, we decided to translate the NEI VFQ-25 into Serbian and to assess its psychometric characteristics.
Methods
The NEI VFQ-25 has 25 items that measure vision-targeted HRQOL and are grouped into 12 subscales: general health (GH, one item); general vision (GV, one item); ocular pain (OP, two items); difficulty with near-vision activities (NV, three items); difficulty with distance-vision activities (DV, three items); limitation of social functioning because of vision (SF, two items); mental health problems because of vision (MH, four items); role limitations because of vision (RL, two items); dependency on others because of vision (DP, three items); driving difficulties (DR, two items); difficulty with color vision (CV, one item); and difficulty with peripheral vision (PV, one item). Each subscale score is converted to a score between 0 and 100, and a higher score indicates better vision-specific HRQOL. The composite NEI VFQ-25 score is the mean score of all items except for the general health item. There are 12 optional items, presented in Appendix one of the questionnaire. An investigator may choose to add these items to a specific subscale if that subscale represents the dimension of vision-targeted HRQOL considered most important for the condition under study.
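The scoring rule described above can be illustrated with a short sketch. It assumes a simple linear rescaling of each response onto 0 to 100 and a per-item flag for reverse-coded items; the function names, the item keys (q1, q2, q4, q19) and the toy responses are hypothetical and are not taken from the NEI VFQ-25 scoring manual.

```python
from statistics import mean

def rescale_item(raw, n_levels, reverse=False):
    """Linearly map a 1..n_levels response onto 0..100; None marks a missing answer."""
    if raw is None:
        return None
    score = (raw - 1) * 100 / (n_levels - 1)
    return 100 - score if reverse else score

def subscale_scores(items, subscale_map):
    """Average the answered (non-missing) items within each subscale."""
    out = {}
    for name, keys in subscale_map.items():
        vals = [items[k] for k in keys if items.get(k) is not None]
        out[name] = mean(vals) if vals else None
    return out

def composite_score(items, subscale_map, exclude=("GH",)):
    """Mean of all answered items outside the excluded subscales, following the text."""
    vals = [items[k]
            for name, keys in subscale_map.items() if name not in exclude
            for k in keys if items.get(k) is not None]
    return mean(vals) if vals else None

# Hypothetical toy data: three subscales, one missing answer (q19).
subscale_map = {"GH": ["q1"], "GV": ["q2"], "OP": ["q4", "q19"]}
items = {
    "q1": rescale_item(3, 5, reverse=True),
    "q2": rescale_item(2, 5, reverse=True),
    "q4": rescale_item(5, 5),
    "q19": rescale_item(None, 5),
}
scores = subscale_scores(items, subscale_map)
composite = composite_score(items, subscale_map)
```

In this sketch a missing answer is simply left out of the subscale mean, which is one common convention; a study may prefer a stricter rule for subscales with many missing items.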
Development of the Serbian version
The Serbian version of the NEI VFQ-25 was translated in accordance with internationally adopted standard methods [21], including forward translation, back-translation, examination of the translation quality and adjudication by bilingual speakers, and a pilot test on ten patients who visited the outpatient service of our clinic for a check-up. The results of the pilot test indicated that the instrument was well accepted, as it took little time to complete (about 10 min) and all items were easy to understand. The pilot test also served as cognitive debriefing, and adapting the questionnaire to the experience of Serbian patients required slight modification of only two questions. Thus, item ‘13’ (How much difficulty do you have visiting people at their homes, at parties, or in restaurants?) was translated as: (How much difficulty do you have visiting people at their homes, gatherings or restaurants?). Because golf is not popular in Serbia, golf was replaced with riding a bicycle in item A7. This study was performed in accordance with the Declaration of Helsinki. The Ethical Committee of the Faculty of Medicine, University of Belgrade reviewed and approved the study. All participants provided signed informed consent before enrolment.
Study design and population
The study was conducted between December 2013 and July 2014 at the Eye Clinic of the Military Medical Academy, Belgrade, and 105 patients were included. To assess the reliability and validity of the translated NEI VFQ-25, we used a sample of four patient groups: patients with cataract (C), age-related macular degeneration (ARMD), glaucoma (G) and diabetic retinopathy (DR). All surveys were administered by two trained physicians using a face-to-face interview method. The following instruments were used: the Serbian version of the NEI VFQ-25, the questionnaire with 12 optional items related to different aspects of vision-specific HRQOL, and the SF-36 health survey questionnaire. The SF-36 was chosen because it is one of the most widely used measures in health services research and has already been translated into Serbian and validated [22]. This questionnaire includes 8 subscales: general health, physical function, physical role activities, usual emotional role activities, mental health, social function, vitality, and bodily pain. Each subscale is scored on a 0 to 100 scale, in which 100 indicates the best possible score and zero indicates the worst function.
Eligibility criteria included an age of 40 years or older, presenting visual acuity (VA) of 0.6 or worse in the better eye, Serbian speaking, no cognitive or hearing impairment, no motion impairment, and no history of laser or incisional eye surgery within 3 months. All patients underwent a complete ophthalmologic examination, including best corrected VA testing, slit-lamp biomicroscopy, dilated fundus examination, and Goldmann applanation tonometry. All glaucoma patients exhibited glaucomatous disc cupping, and their visual fields were examined with the G2 program of the Octopus 101 Perimeter System (HAAG-STREIT AG, Koeniz-Berne, Switzerland). Glaucoma patients with any ocular pathology other than mild nuclear sclerosis were excluded. Patients with ARMD had at least one of the following features consistent with ARMD: geographical atrophy in the macula, a pigment epithelial detachment or choroidal neovascularization. Patients with late sequelae of ARMD, such as scarring in the macula, were included in the study, and pseudophakia was not considered an exclusion criterion for ARMD patients. The pattern of cataract was noted as nuclear, subcapsular, or cortical. The severity of age-related cataracts was graded with the Lens Opacities Classification System III (slit lamp, standard testing conditions) [23]. Cataract patients with any other ocular pathology were excluded. Grading protocols for DR were modifications of the Early Treatment Diabetic Retinopathy Study adaptation of the modified Airlie House classification of DR [24]. Diabetic retinopathy was classified as 1: nonproliferative DR (NPDR), mild, moderate, or severe; or 2: proliferative DR (PDR). Fundus fluorescein angiography was performed in diabetic patients who had macular involvement.
Statistical analysis
The statistical analysis consisted of reliability and validity analyses, which were performed with SPSS version 21.0 for Windows (SPSS Inc., Chicago, IL).
Descriptive analysis and item analysis
The item analysis was performed using the data from the different subject groups. The percentage of missing values was examined for each item. We also examined whether each item’s distribution of responses was strongly skewed (large ceiling effect or floor effect).
Reliability
Cross-sectional data from the four patient groups were used to quantify reliability. Cronbach’s alpha coefficient was used to assess internal consistency for each subscale [25]. The item-total score correlations were explored by Spearman’s correlation analysis. According to the general guidelines suggested by Colton, correlations ranging from 0.00 to 0.25 indicate little or no relationship; those from 0.25 to 0.50 suggest a fair degree of relationship; values of 0.50–0.75 are moderate to good; and values above 0.75 are considered good to excellent [26]. To assess test–retest reliability, intraclass correlation coefficients were used. The test–retest data were obtained from clinically stable patients with age-related cataracts, in surveys performed 2 weeks apart. The time interval was recommended by Streiner and Norman [27, 28].
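As an illustration of the internal-consistency measure named above, Cronbach’s alpha for a subscale with k items can be computed from the item variances and the variance of the summed score. The sketch below is not the SPSS procedure used in the study; it assumes complete data (no missing responses), and the function name and toy scores are hypothetical.

```python
from statistics import variance

def cronbach_alpha(rows):
    """rows: one list of k item scores per respondent, with no missing values.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of the total score)
    """
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical two-item subscale answered by three respondents.
identical_items = [[0, 0], [50, 50], [100, 100]]   # items agree perfectly
noisy_items = [[1, 1], [2, 3], [3, 2]]             # items agree only partly

alpha_identical = cronbach_alpha(identical_items)
alpha_noisy = cronbach_alpha(noisy_items)
```

With perfectly agreeing items alpha reaches its maximum of 1, while the partly agreeing toy items give a value of about 0.67, below the 0.7 level often used as a rule of thumb.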
Validity
Multi-trait analysis was used to evaluate convergent and discriminant validity according to Campbell and Fiske [29]. Each item was hypothesized to belong to only one multi-item subscale, and correlations between the score on that item and the scores on all the subscales were computed. An item is said to have ‘passed’ the test of convergent validity if the correlation between its score and the score on the subscale to which it belongs is 0.4 or higher. An item is said to have ‘passed’ the test of discriminant validity if the correlation between its score and the score on its own subscale is greater than its correlations with the scores on all the subscales to which it does not belong. To assess concurrent validity, correlations between scores on the NEI VFQ-25 and scores on the SF-36 subscales were computed. We hypothesized that the NEI VFQ-25 ‘Mental health’, ‘Social functioning’ and ‘Dependency’ scores would be associated more strongly with the SF-36 subscale scores that measured similar domains. Clinical validity was examined by correlating clinical measurements (visual acuity (VA) and visual field deficit) with the scores of all subscales. We computed the correlations between subscale scores and VA with best correction in the better and worse eye, and deficits in visual fields as measured by the Octopus perimeter in the better and worse eye. Finally, we used factor analysis to assess the unidimensionality of the scale, in preparation for computing a composite score. Factor analysis was done using 11 subscales, with the maximum-likelihood solution and varimax rotation. The ‘Driving’ subscale was not included because 73.3 % of the responses on this subscale were missing.
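The two pass/fail rules of the multi-trait analysis can be sketched as follows. This is an illustration under stated assumptions: Pearson correlation is used, the item-to-own-subscale correlation is not corrected for overlap, and the function names, subscale labels and scores are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def multitrait_check(item, own, subscales, threshold=0.4):
    """Return (convergent_pass, discriminant_pass) for one item.

    convergent:   correlation with the item's own subscale is >= threshold
    discriminant: that correlation exceeds the correlation with every other subscale
    """
    r_own = pearson(item, subscales[own])
    r_other = [pearson(item, s) for name, s in subscales.items() if name != own]
    return r_own >= threshold, all(r_own > r for r in r_other)

# Hypothetical scores of five respondents.
item = [0, 25, 50, 75, 100]
subscales = {
    "NV": [10, 20, 55, 70, 95],   # tracks the item closely
    "OP": [50, 100, 0, 75, 25],   # unrelated pattern
}
result_correct = multitrait_check(item, "NV", subscales)
result_wrong = multitrait_check(item, "OP", subscales)
```

Assigning the toy item to its matching subscale passes both tests, whereas assigning it to the unrelated subscale fails both.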
Rasch analysis
Alongside the traditional methods, the psychometric properties of the Serbian NEI VFQ-25 were also evaluated by Rasch analysis. The purposes of Rasch analysis are to maximize the homogeneity of the trait and to allow greater reduction of redundancy at no sacrifice of measurement information by decreasing items and/or scoring levels to yield a more valid and simple measure. Rasch analysis consists of the following components: category threshold order, person separation, unidimensionality, targeting, and differential item functioning (DIF). Winsteps (version 3.90) was used to perform Rasch analysis using the Andrich rating scale model [30]. Numerical responses for each item were recoded so that one was assigned as the lowest possible response and five as the highest. The ranking of response categories was reversed when necessary so that higher scores always represented higher levels of visual functioning.
Category Threshold Order
The first step was to examine the ordering of the response category thresholds. Disordering of categories occurs when categories are underused, are unclearly defined, or when the number of categories exceeds the number of levels that participants can distinguish. Disordered thresholds can be a cause of item misfit. Therefore, when thresholds were disordered, adjacent categories were combined until the thresholds were ordered; this was done before further analyses were carried out.
Person separation
Person separation is a measure of the questionnaire’s precision and can be used to estimate how many groups, or strata, of person ability can be discriminated. A person separation reliability of 0.8 was taken as the minimum acceptable value in this study; it implies that three strata can be discriminated, whereas a reliability coefficient of 0.9 indicates four strata. The person separation index is the ratio of the error-adjusted standard deviation of the person measures in the sample to the average error in estimating these measures. A person separation index of ≥2.0 represents the minimum acceptable level of separation.
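The link between separation reliability, the separation index and the number of strata can be checked with a small calculation. The sketch assumes the relations commonly used in Rasch measurement, G = sqrt(R / (1 - R)) for the separation index and (4G + 1) / 3 for the number of strata; the function names are hypothetical and the code is not part of the Winsteps analysis.

```python
from math import sqrt

def separation_index(reliability):
    """Person separation index G implied by a separation reliability R (0 <= R < 1)."""
    return sqrt(reliability / (1 - reliability))

def strata(reliability):
    """Number of statistically distinct ability levels, (4G + 1) / 3."""
    g = separation_index(reliability)
    return (4 * g + 1) / 3

g_min = separation_index(0.8)     # about 2.0, the minimum acceptable separation
strata_min = strata(0.8)          # about 3 strata
strata_high = strata(0.9)         # about 4.3, i.e. four full strata
```

Under these assumptions a reliability of 0.8 corresponds to a separation index of 2.0 and three strata, and a reliability of 0.9 to four strata, which agrees with the thresholds stated above.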
Unidimensionality
Unidimensionality refers to whether the questionnaire measures a single underlying construct. Dimensionality is assessed using item-fit statistics (mean square statistics) and principal component analysis (PCA) of the residuals (the differences between observed and expected responses). There are two types of fit statistics, infit and outfit. The infit statistic is more sensitive to the pattern of responses to person-targeted items and less sensitive to outliers, and is therefore considered more informative. The instrument was evaluated using the parameters proposed by Pesudovs et al. [6‐31]. Fit statistics between 0.7 and 1.3 are considered acceptable [30], though a more lenient criterion of 0.5 to 1.5 is also considered useful for measurement [32]. Data are considered unidimensional if most of the variance (>60 %) is explained by the principal component and if the contrasts to the principal component do not explain a significant share of the residual variance. The unexplained variance attributed to a contrast should be less than two eigenvalue units.
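To make the fit criteria concrete, the sketch below computes infit and outfit mean squares for one item from observed responses, model-expected responses and model variances, and then applies the two ranges quoted above. It assumes the usual definitions (outfit as the mean of squared standardized residuals, infit as the information-weighted version); the expected values and variances would come from a fitted Rasch model and are hypothetical here, as are the function names.

```python
def mean_square_fit(observed, expected, variance):
    """Return (infit, outfit) mean squares for one item.

    observed: responses of each person to the item
    expected: model-expected response for each person
    variance: model variance of each response
    """
    sq_resid = [(x - e) ** 2 for x, e in zip(observed, expected)]
    # Outfit: unweighted mean of squared standardized residuals.
    outfit = sum(r / w for r, w in zip(sq_resid, variance)) / len(sq_resid)
    # Infit: squared residuals weighted by the model variance (information).
    infit = sum(sq_resid) / sum(variance)
    return infit, outfit

def classify_fit(mnsq):
    """Apply the 0.7-1.3 and 0.5-1.5 ranges described in the text."""
    if 0.7 <= mnsq <= 1.3:
        return "acceptable"
    if 0.5 <= mnsq <= 1.5:
        return "useful for measurement"
    return "misfit"

# Hypothetical dichotomous responses with model expectation 0.5 for every person.
infit, outfit = mean_square_fit([1, 0, 1, 1], [0.5] * 4, [0.25] * 4)
labels = [classify_fit(v) for v in (infit, 1.4, 1.6)]
```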
Targeting
Targeting refers to how well the difficulty of the items in the scale matches the abilities of the persons in the sample. It can be evaluated by visually inspecting person-item maps and by measuring the difference between the person and item mean values. A difference between means of more than 1 logit indicates notable mistargeting.
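The numerical part of the targeting check reduces to a comparison of two means. The following sketch assumes person and item measures are already expressed in logits; the function names and the example measures are hypothetical.

```python
from statistics import mean

def targeting_gap(person_measures, item_measures):
    """Difference between mean person ability and mean item difficulty, in logits."""
    return mean(person_measures) - mean(item_measures)

def is_mistargeted(person_measures, item_measures, limit=1.0):
    """Flag notable mistargeting when the gap exceeds the limit (1 logit in the text)."""
    return abs(targeting_gap(person_measures, item_measures)) > limit

# Hypothetical measures in logits.
items_logit = [-0.5, 0.0, 0.5]
able_sample = [1.0, 2.0, 3.0]      # persons far more able than the items are difficult
matched_sample = [0.0, 0.5, 1.0]   # persons close to the item difficulties

flag_able = is_mistargeted(able_sample, items_logit)
flag_matched = is_mistargeted(matched_sample, items_logit)
```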
Differential Item Functioning (DIF)
DIF analysis was carried out to assess whether the items function similarly for persons at the same level of ability regardless of their characteristics. For DIF testing, the respondents were stratified by sex, age (≤70 years and >70), systemic comorbidity (present/absent) and better eye visual acuity (≤0.4 and >0.4). DIF was considered absent if the difference was less than 0.5 logits, minimal if it ranged from 0.5 to 1.0 logits and notable if it was greater than 1.0 logit [33]. The 12 subscales were analyzed separately using the same procedures and criteria for reliability and validity that were used for the overall questionnaire. However, four subscales (general health, general vision, color vision, and peripheral vision) contain only one item each and do not fulfill the criteria for Rasch analysis. The person separation reliability was used to evaluate the appropriateness of use of the subscales.
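The DIF thresholds above can be written as a simple classification of the difference between the item measures estimated in two subgroups. The sketch assumes that these subgroup-specific item measures (in logits) are already available from the Rasch software; the function names, item labels and values are hypothetical.

```python
def classify_dif(measure_a, measure_b):
    """Classify DIF from the absolute difference in item measure between two groups."""
    diff = abs(measure_a - measure_b)
    if diff < 0.5:
        return "absent"
    if diff <= 1.0:
        return "minimal"
    return "notable"

def dif_report(item_measures):
    """item_measures: {item: (measure in group A, measure in group B)} in logits."""
    return {item: classify_dif(a, b) for item, (a, b) in item_measures.items()}

# Hypothetical item measures for two strata, e.g. age <=70 versus >70 years.
report = dif_report({
    "item_5": (0.1, 0.3),
    "item_9": (-0.5, 0.25),
    "item_14": (1.0, -0.4),
})
```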
Discussion
Traditional clinical measures of vision may fail to assess many aspects of visual disability that individuals identify as important for their daily functioning and well-being [2, 4]. Many specific questionnaires for patients with visual impairment have been developed and offered to ophthalmologists over the past twenty years [8]. To date, no questionnaire measuring vision-related QoL has been developed in Serbian, and none of the existing vision-related QoL questionnaires has been translated into Serbian and validated. Keeping in mind the growing interest in vision-related QoL among medical professionals in Serbia, we decided to translate the NEI VFQ-25 into Serbian and validate it.
The primary objective of our study was to evaluate the reliability and validity of the NEI VFQ-25 in native Serbian populations with a series of the most common ophthalmic diseases. Proper adaptation of the instrument to the Serbian population required slight modification of some items. Following a suggestion made during the cognitive debriefing sessions, item 13, “visiting with people in their homes, at parties, or in restaurants”, was changed to “visiting with people in their homes, at gatherings, or in restaurants”. In item A7, which concerns sports, playing golf was changed to riding a bicycle. Minor modifications of some items during the translation and validation of the NEI VFQ-25 were also considered necessary in other populations [16‐19]. As in the original validation studies in other populations, relatively high missing rates were encountered in the ‘Driving’ subscale. In our study, a relatively high missing rate (32.4 %) was also found for item 14, which relates to ‘Distance vision’ (going out to see movies, plays, or sports events). One possible explanation is the poor economic situation in our country. However, the missing rates of the other items were lower than those encountered during the translation and validation of the same instrument in other populations [16‐19]. High ceiling percentages were encountered in some items (i.e. ‘Color vision: difficulty matching clothes’, ‘Mental health: Amount true: embarrassment’) and moderate skewing of the data was detected. The reliability of the Serbian version of the NEI VFQ-25 was tested by internal consistency (IC) and item-scale correlations. Cronbach’s alpha values, as a measure of the IC of the scale, were satisfactory in almost all of the subscales and for the overall index. The lowest value of Cronbach’s alpha was detected in the ‘Social functioning’ subscale (0.643). After inclusion of the optional items for this subscale, Cronbach’s alpha was higher than 0.7. The subscales of the Serbian version of the NEI VFQ-25 presented variable but adequate internal consistencies, indicating high reliability of the instrument in the population studied. High test–retest reproducibility is a critical characteristic for a questionnaire to be used in follow-up studies. A correlation coefficient greater than 0.80 for two administrations of a scale one to two weeks apart suggests adequate stability [30]. In our study the intraclass correlation coefficients ranged from 0.808 to 0.986, so all subscales exceeded 0.8, indicating good test–retest reliability. Regarding the construct validation of the questionnaire, none of the items failed either the convergent or the discriminant tests. Similar findings were observed in other studies [17, 19]. The ability of the questionnaire to reflect different levels of VA loss also indicated satisfactory clinical validity. Strong correlations were detected between the best corrected visual acuity (BCVA) of the subjects and all subscales except ‘General health’ and ‘Ocular pain’. Similar correlations between VA and NEI VFQ-25 subscales were detected by previous investigators during validation of the instrument in other languages [16‐20]. We also tested the validity of our version by comparing its subscales with SF-36 scales of similar content. The ‘General vision’ subscale showed high correlation with the physical component of the SF-36. The ‘Mental health’ and ‘Dependency’ subscales showed high correlation with almost all subscales of the SF-36. Other NEI VFQ-25 subscales were moderately correlated with similar SF-36 subscales, except ‘Driving’, which was not correlated with any of them. This may be due to the high rate of missing responses in the ‘Driving’ subscale.
Factor analysis indicated that most of the subscales that are influenced by central and peripheral vision correlated with the first factor, while the ‘Color vision’, ‘Ocular pain’, ‘Social functioning’ and ‘Dependency’ subscales were included in the second factor. These results are consistent with those of previous studies, in which most subscales of the NEI VFQ-25 belonged to the same underlying dimension, connected especially with central vision [18, 20].
Besides traditional methods, Rasch analysis was also applied to assess the psychometric properties of the NEI VFQ-25. Rasch analysis focuses on the person and item level rather than the test level. As opposed to traditional psychometrics, Rasch analysis provides detailed information on rating scales, items, persons, and other factors such as rater severity [35]. Rasch analysis revealed a substantial weakness of the questionnaire that should be taken into consideration when interpreting the results.
Items belonging to the ‘General health’ and ‘Driving’ subscales, one ‘Distance activities’ item (going out to movies/plays/sports events) and one ‘Mental health’ item (embarrassment) did not fit the overall scale. Similar results were reported by other authors [19, 31, 33]. A high percentage of missing values for the ‘Driving’ subscale was also found in different populations [31, 33, 34]. The categories of two rating scales (Difficulty Scale and Agreement Scale) had to be collapsed to a four-category response scale (6 items), which is in agreement with some previous studies [36, 37]. There are also studies in which categories had to be collapsed to a dichotomous scale [33]. Rasch analysis in our study revealed multidimensionality of the NEI VFQ-25 questionnaire. This result is consistent with findings in earlier studies [31, 33, 34]. The problem with multidimensionality is that the use of a composite score requires that only a single construct is being measured. The results of our principal component analysis indicated that five items loaded positively onto the first contrast and belonged to the ‘Role difficulties’ (three items) and ‘Mental health’ (two items) subscales. Similar results were found in the study published by Marella et al. [33] and the study of Pesudovs et al. [31], in which several of the items loaded positively onto the first contrast and belonged to the ‘Role difficulties’, ‘Mental health’ and ‘Dependency’ subscales. Examination of targeting showed that most of the items cover people with low and moderate visual ability and that most of the uncovered range represents persons with high visual ability. This finding nevertheless indicates that the instrument is suitable for medical applications, in which it should measure persons with disability more precisely than healthy people. The NEI VFQ-25 was designed to have 12 subscales, but only three (Role difficulties, Near activities and Driving) met the criteria for valid measurement in our study. Bearing in mind that only a small percentage of the total study population answered the driving items, we have to be careful in drawing conclusions. Authors who revealed multidimensionality of the NEI VFQ questionnaire by PCA suggested that the NEI VFQ was an instrument with two scales, ‘Visual functioning’ and ‘Socioemotional’ [31, 33, 34]. In line with this finding, we also constructed a visual functioning scale and a socioemotional scale. Our results were similar to previously reported findings [31, 33]. The psychometric characteristics of the visual functioning scale were slightly better than those of the socioemotional scale. Targeting was suboptimal in both scales. Similar results were found by other authors and indicated that the reengineered versions were not perfect [31, 33, 34]. However, one of the most important tasks in designing a questionnaire is to ensure that it measures only a single underlying construct. This is where Rasch analysis plays a critical role, and it has been shown to offer higher precision in evaluating the quality of patient-reported outcomes. Because developing slightly different versions of the same questionnaire can be confusing and may make comparison between studies in different populations difficult, there is a need for valid scales of the English version of the NEI VFQ. Khadka, McAlinden and Pesudovs [38] carried out a systematic review of all the available ophthalmic patient-reported outcome (PRO) questionnaires to assess the quality of the following psychometric characteristics: content development, performance of the response scale, dimensionality, measurement precision, validity, reliability, targeting, differential item functioning, and responsiveness. The aim of this review was to inform researchers and clinicians on the choice of the highest quality PRO instrument suitable for their purpose. They recommended six revised scales (Long form visual function scale and Long form socio-emotional scale derived from NEI VFQ-39 and NEI VFQ-25, and Short form visual function scale and Short form socio-emotional scale) and four valid subscales of the NEI VFQ (Near vision, Distance vision, Role difficulties and General Health).
Nevertheless, certain limitations of our study have to be considered. First, we used a cross-sectional survey to collect data and were not able to determine long-term change in QoL associated with visual impairment. Second, our study included common ophthalmic diseases, and it is unclear whether these findings are applicable to patients with diseases other than cataract, diabetic retinopathy, ARMD and glaucoma. Furthermore, a sample of persons with these ophthalmic conditions may not represent the full clinical spectrum of each disease. Finally, we did not investigate whether the mode of questionnaire administration (e.g. self-administered versus face-to-face interview) may influence the results.
In conclusion, the results of our study indicate that, according to traditional psychometric methods, the Serbian version of the NEI VFQ-25 is a valid and reliable instrument for the assessment of vision-specific QoL in the native population. However, Rasch analysis indicates substantial weaknesses of the questionnaire, particularly with respect to dimensionality. Therefore, a total score derived from all items seems to be unsuitable and is an issue of concern. Measuring both the Visual functioning and the Socioemotional constructs should be considered. Despite previous results indicating multidimensionality and some deficiencies in psychometric properties, the NEI VFQ-25 is still widely used as an outcome measure across a large number of ophthalmologic conditions. This is to some extent reasonable because it represents vision-related quality of life. On the other hand, improving the psychometric properties of instruments is important and enables researchers to be more precise and accurate in measuring outcomes. Further research should be performed to improve the measurement properties of the Serbian version of the NEI VFQ-25.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
BK contributed to conception and design, acquired data, analyzed and interpreted data and drafted the article. MV and MR gave final approval of the version to be published. GT and JJ performed the statistical analysis and critically revised the manuscript. JDK critically revised the manuscript for important intellectual content. MS collected, analyzed and interpreted data. AG contributed to conception and design and provided critical revision of the intellectual content of the manuscript. All authors provided final approval of the version to be published.