Introduction

Glaucoma is one of the leading causes of irreversible blindness, with nearly 70 million people suffering from this chronic ophthalmologic condition worldwide.1 Ninety percent of all cases are primary open-angle glaucoma (POAG).2 POAG is often called ‘the silent thief of sight’, as in the early disease phase, typically no symptoms are experienced.2, 3 Patients may experience progressive worsening of their vision, initially peripherally (ie, vision outside the center of gaze), but eventually involving the central vision.4

Objective endpoints of vision loss, such as the measurement of visual acuity and visual field, may fall short in capturing the impact of glaucoma on the patient's daily life.5 Reduced vision is a debilitating condition that substantially affects a patient's ability to perform activities that depend on peripheral vision or perception of contrast, such as driving, performing household tasks and reading, and may also have a great impact on a person's quality of life (QoL).6 The patient's perspective is therefore important for fully understanding the impact of glaucoma and its treatment on functioning and well-being, and should be better integrated into clinical practice and research evaluations, because some treatment effects are known only to the patient and are not detectable or interpretable by the health care provider.

The US Food and Drug Administration (FDA) recently recommended the term ‘patient-reported outcomes (PRO's)’ as an umbrella term covering a broad range of health data reported by the patient.7, 8 PRO self-report questionnaires have been developed to assess several aspects of the patients’ health status, for example, the patients’ perception of side effects, the functional impact of illness, the impact of illness on QoL, treatment satisfaction and adherence.9 Although generic PRO instruments capture a broad range of health status aspects, allowing comparisons among different diseases, they do not capture the patient's perception of specific aspects of a disease or health problem, such as glaucoma. Disease-specific instruments are more sensitive to small changes in the condition-specific health status and, if well developed and validated, may help to capture and interpret clinical outcomes of glaucoma or its treatment comprehensively. It is also likely that they are more acceptable to the patient than generic instruments, because of their clear relevance to the patient's situation.10 PRO's are therefore a unique indicator of the disease's impact on a patient's life and are essential for evaluating treatment efficacy or side effects. Hence, instruments measuring PRO's may provide essential disease and treatment information, and their results can be considered a key element in treatment decision making and research.9

However, it may be challenging for clinicians or researchers to evaluate which PRO's are most appropriate for their intended clinical evaluation or research project. Clinicians may benefit from a clear and comprehensive overview of the quality of existing glaucoma-specific PRO's. The aim of this systematic review is therefore to summarize the literature in view of PRO instruments for glaucoma and to provide guidance on how specific outcomes are best assessed based on published evidence about their content and validity.

Materials and methods

Search strategy

The databases PubMed, CINAHL, PsycINFO and Embase were systematically searched (from 01-01-1980 to 31-12-2010) for relevant articles using the following search string: (glaucoma OR ocular hypertens* OR visual impairment OR vision impairment) AND (adheren* OR nonadheren* OR non-adheren* OR complian* OR noncomplian* OR non-complian* OR persistenc* OR impression OR well-being OR mobility OR utility OR preference OR ADL OR symptom* OR activities of daily living OR satisfaction OR pain OR performance status OR disability OR functional status OR quality of life OR health status OR patient based OR self-report OR patient report OR patient related OR patient-reported outcome OR PRO OR score OR questionnaire OR scale OR measure OR instrument) AND (valid* OR reliable OR reliability OR psychomet* OR test–retest OR acceptability OR reproducibility OR sensitivity OR effect size OR responsive*). Next, the reference lists of the selected publications were hand searched for additional relevant articles. Finally, the names of the instruments described in the selected publications as well as the names of the first authors were used as separate search terms.
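The structure of the search string above (terms combined with OR inside each concept group, and the three groups combined with AND) can be sketched as follows. This is an illustration only: the helper name build_query is hypothetical, the term lists are abbreviated, and each database has its own truncation and field syntax that is not modelled here.

```python
# Hypothetical helper (not from the review): assembles a Boolean search string.
def build_query(groups):
    """OR the terms inside each group, then AND the groups together."""
    return " AND ".join("(" + " OR ".join(terms) + ")" for terms in groups)

# Concept 1: the condition of interest (complete list from the search string).
condition_terms = ["glaucoma", "ocular hypertens*", "visual impairment", "vision impairment"]
# Concepts 2 and 3 are abbreviated here for readability; the full lists are in the text.
outcome_terms = ["adheren*", "quality of life", "questionnaire"]
psychometric_terms = ["valid*", "reliability", "responsive*"]

query = build_query([condition_terms, outcome_terms, psychometric_terms])
print(query)
```

Keeping each concept in its own list makes it easy to see which block a term belongs to and to rerun the search with a term added or removed.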

Study selection criteria

Inclusion and exclusion criteria

Full-text papers written in English were included if they described the developmental process and/or psychometric properties of a glaucoma-specific self-report instrument capturing a PRO. Vision-specific instruments, developed for a broad range of ophthalmic conditions including glaucoma, and generic instruments adapted for use in glaucoma patients specifically, were also selected for review. Additionally, if glaucoma instruments were further validated in other eye disease populations (eg, cataract), validation data relevant to the field of glaucoma (ie, testing of unidimensionality) were integrated in this review.

Papers were excluded if: (1) the instruments were only used in studies, without reporting information on their development or validation; (2) the instruments were developed to assess the need for or the effect of vision-related rehabilitation services; (3) the instruments were specifically developed for children; (4) existing PRO-instruments were translated to another language or adapted for a specific population; (5) the instruments were developed for use in a specific minority population (eg, the population of a developing country); (6) only a subset of items of an instrument was further validated; or (7) a specific scoring algorithm or item response theory was tested in an already existing PRO-instrument without further validating the tool.

Study selection

The researcher scrutinized the titles and abstracts of all identified citations (see Figure 1). The full text was obtained of any article that was deemed potentially eligible by the reviewer. The full text of all retrieved papers was then evaluated for eligibility based on the previously mentioned inclusion and exclusion criteria. It should be noted that, for the purpose of this review, we took into consideration the most recent paper addressing the development or validity of a selected instrument or its revised version. Additional information on its development and validity described in previously published papers on the selected tool was nevertheless integrated.

Figure 1 Flow from electronic database searches to final inclusion of eligible studies.

Data extraction, classification and evaluation

The retrieved instruments were classified according to the conceptual framework of Acquadro et al,9 distinguishing PRO's on the perceptions of symptoms, functional impact of illness, utility and preference measures for treatment options, QoL and well-being, patient and treatment satisfaction and adherence.

Next, all the selected and classified PRO's were assessed on their quality using both the FDA-guidelines11 and the framework of Pesudovs et al.12 The FDA-criteria were applied, as they were specifically developed for PRO-instruments used for supporting medication labeling claims. The framework of Pesudovs et al was specifically created to determine if existing instruments are adequate for their intended use in the intended target population.12 These quality criteria emphasize the importance of both the developmental history and the psychometric characteristics of PRO's and put forward the more modern methods of scoring and validation, namely Rasch-analysis.13 This validation technique is gaining more and more attention in the ophthalmic literature and transforms ordinal scores into interval scores to strengthen the instruments’ content and validity. More specifically, all PRO instruments were evaluated against the following criteria: (1) were the purpose of the instrument and its target population well defined; (2) were adequate steps taken in defining the content of the instrument, the rating scale and the scoring system; and (3) does the instrument perform well in terms of validity and reliability.12 There are several existing guidelines and published standards for evaluating and judging these psychometric properties of PRO-instruments,14, 15, 16, 17, 18, 19 but ideally good PRO-instruments require scientific evidence concerning construct validity, criterion validity, responsiveness and reliability.20, 21 According to the FDA-guidelines, validity, reliability and responsiveness testing should be repeated when a PRO instrument is modified: (a) to measure another concept; (b) for use in a different population or condition; (c) in its item content or instrument format; or (d) in its mode of administration, culture or language of application.11 A more detailed overview of the definitions of the above-mentioned quality criteria, including the required psychometric characteristics, is given in Table 1.

Table 1 Quality assessment tool for evaluation of PRO-instruments12

Results

The search strategy yielded a total of 53 articles addressing 27 PRO's (Figure 1). In all, 18 instruments were specifically developed for patients with glaucoma, and 9 instruments for use in patients with diverse ophthalmologic conditions, including glaucoma. Three major categories of PRO's were distinguished, more specifically PRO's addressing functional status related to vision (n=7); overall QoL (n=11) and other factors related to disease and treatment (eg, symptoms, side effects, adherence, satisfaction, self-efficacy) (n=9).

PRO's addressing functional status related to vision

In all, 7 out of the 27 retrieved instruments were developed to assess functional status (Table 2 ). Functional status refers to the person's ability to undertake activities designed to meet basic needs, fulfill life roles, and maintain health and well-being.22

Table 2 Patient-reported outcomes addressing functional status related to vision (see Table for an explanation of the criteria and their rating)

All the selected functional status instruments contain a set of items referring to visual activities, which have to be rated by the patient as being difficult or problematic. The Visual Activities Questionnaire23 differs slightly from the other six instruments, as it contains more vision-related items (eg, reading small print), whereas the others focus more on important mobility situations in daily life (eg, walking at night). Only two of the selected instruments were validated according to modern validation standards, namely Rasch-analysis.

Table 2 shows that the instruments assessing functional status demonstrated poor quality with regard to their developmental process, as none of the questionnaires used a conceptual framework or comprehensive consultation with patients to guide their item generation process. The selection of items for the final questionnaire was mostly not or only partially supported by adequate statistical techniques, such as factor analysis or Rasch-analysis, and only one of the rating scales was statistically justified (ie, Glaucoma Symptom Identifier (GSI)). In view of validation, only the Independent Mobility Questionnaire (IMQ) and GSI were tested using appropriate Rasch-analysis, demonstrating convincing validity evidence of a Rasch-scaled glaucoma measure.24, 25, 26 The GQL-15, validated according to the more classical standards, demonstrated satisfactory validity and reliability evidence. Yet, the GQL-15, as opposed to what it intends to measure, does not assess QoL. QoL is a multi-dimensional concept, yet the GQL-15 only contains items representing visual activities, which is only one dimension of QoL.27, 28 The same is true for the GSI, which purports to measure the impact of glaucoma symptoms on QoL, yet mostly contains items related to visual activities as well.26

PRO's addressing QoL

QoL is a multi-dimensional concept29 referring to the degree of overall life satisfaction that is positively or negatively influenced by individuals’ perception of certain aspects of life important to them (Table 3 ), including matters both related and unrelated to health.30, 31

Table 3 Patient-reported outcomes addressing quality of life (see Table for an explanation of the criteria and their rating)

The literature search yielded 11 QoL-instruments developed for patients either with a visual impairment including glaucoma (n=3) or for patients suffering from glaucoma specifically (n=8). Four QoL-instruments were analyzed or revised using Rasch-analysis and seven instruments according to the classical validation techniques.

The process of instrument development was very extensive in most of the classically tested QoL-PRO's with almost all instruments using the patients’ input in view of item generation except for the GHPI32, 33 and the QoL–VFQ.34 The latter instruments should therefore not be considered to measure QoL in glaucoma patients because of their poor quality regarding their item content. The majority of remaining questionnaires demonstrated acceptable to good item selection procedures and were tested on unidimensionality except for the original and widely used NEI-VFQ 51 items35, 36 and 25 items37 questionnaires, the GUI38 and the NHVQoL.39, 40 Yet, this latter criterion is fundamental to test if all items tap the same underlying construct in order to be able to calculate valid (sub) scale scores. Therefore, Marella et al41 tested the NEI-VFQ-25 on its dimensionality using Rasch-analysis, which resulted in two factors (ie, visual functioning and socioemotional traits) and hence could not confirm the 12 domains from the original form. These analyses resulted in statistically ordered response scale-thresholds and validity evidence. From the NEI-VFQ 51, a selection of 27 items was used to further validate the questionnaire, resulting in a unidimensional 17 item questionnaire with statistically tested rating scales. Yet, only limited validity evidence was provided.42

Two types of instruments can be distinguished based on the scoring systems within our selected tools. Seven questionnaires have to be rated on a simple ordinal scale, while two instruments are preference based measures (ie, VisQoL43, 44 and GUI38) where all patients are asked to choose between different health situations (eg, perfect health vs worst possible health/death). However, none of them demonstrated statistically justified response scales. As a glaucoma-specific QoL-tool, the Glaucoma QoL Questionnaire (Glau-QoL)45 demonstrated good developmental characteristics (except statistical evidence for the rating scale) and strong validity evidence, while the VisQoL43, 44 demonstrated high quality scores as a measurement tool across diverse ophthalmic conditions, yet additional Rasch-analysis may be needed to strengthen the validity evidence of both instruments.

The revised VCM146 and IVI47 based on Rasch-analysis both showed an extensive and high quality development process and a statistically justified rating scale, yet item-person targeting was poor. This means that the selected items were suboptimal for the intended population, possibly requiring addition and/or removal of items.

PRO's addressing other aspects

Nine instruments were developed to assess either topical treatment or disease-related factors, yet only one of them was tested using modern test theories (ie, Rasch-analysis) (Table 4 ). Five instruments assess frequency of and perceived distress related to side effects, satisfaction with eye drop treatment, adherence to eye drop treatment or a combination of these aspects. Two instruments focus on symptoms of glaucoma, while two other tools were developed to assess self-efficacy and outcome expectation.

Table 4 Patient-reported outcomes addressing side effects, satisfaction and adherence with eye drop treatment and symptoms of glaucoma, self-efficacy and glaucoma outcome expectations (see Table for an explanation of the criteria and their rating)

Both the Treatment Satisfaction Survey for Intra Ocular Pressure (TSS-IOP) and COMTOL assess side effects and satisfaction with glaucoma treatment, yet the COMTOL-questionnaire was only validated in patients treated with pilocarpine or timolol, meaning that not all the instrument domains could be adequately psychometrically evaluated. Yet, if studies aim to compare different eye drops, the TSS-IOP should be chosen as it shows acceptable reliability and good validity across all eye drop classes. Except for side effects, the content of both instruments differs, as the TSS-IOP addresses satisfaction and bothersomeness with factors related to eye drops (eg, eye drop effectiveness) and the COMTOL questions activity limitations (ie, driving) because of eye drops as well as the impact of side effects and activity limitations on QoL. Compared with the COMTOL, the TSS-IOP demonstrated both a higher quality developmental process in view of identifying and selecting items and showed better validity evidence.48, 49, 50 The Glaucoma Satisfaction Questionnaire (Glausat) was created to primarily assess patient satisfaction with eye drop treatment. Besides containing items addressing side effects and general treatment satisfaction, the Glausat also contains items describing the ‘ease of use’, ‘efficacy’, ‘expectations and beliefs about treatment’, ‘impact on HRQoL’, ‘medical care’ and ‘general satisfaction’. Its developmental strategy was satisfactory, yet validation evidence is limited requiring further improvement of the instrument.51

The adherence instrument of Schwartz et al (2009) and the EDSQ questionnaire primarily focused on adherence with eye drop treatment, and the developmental process of both was theory driven. Only the EDSQ used patient input to generate the items in order to strengthen its content. Yet, both instruments demonstrated significant pitfalls in view of validity. First, the adherence questionnaire of Schwartz et al (2009) showed an inadequate item selection process, a non-statistically justified rating scale and provided poor validity evidence.52, 53 Second, the EDSQ, although well developed, was not able to significantly discriminate between patients with different adherence-profiles. Further adaptations and validation of both instruments seem necessary, preferably by means of modern psychometric techniques.

Both the Symptom Impact Glaucoma (SIG) and Glaucoma Symptom Scale (GSS) address visual as well as non-visual symptoms, referring to problems related to the disease process (eg, difficulties with seeing in the dark) and problems directly caused by the topical treatment (eg, red eyes), respectively. The SIG was tested using conventional validity tests, while the GSS underwent both classical and modern validity testing. The SIG adheres more closely to the quality criteria for instrument development than the GSS, as it is based on a conceptual framework and patients were involved during the item generation process. However, the item selection and rating scales were not statistically justified for either instrument. The SIG demonstrated poor validity evidence based on classical tests, and the Rasch-analysis of the GSS revealed poor item-person targeting in a sample of glaucoma patients, requiring further adjustment of the instrument.32, 33, 47, 54 Neither instrument seems adequate for assessing the presence and bothersomeness of glaucoma symptoms according to the predefined quality criteria.

Sleath et al55 developed two scales, more specifically one focusing on glaucoma medication self-efficacy and one addressing glaucoma outcome expectations. Self-efficacy refers to the confidence in using the eye drops (eg, overcoming barriers, carrying out specific tasks required to use eye drops correctly). Outcome expectations, on the other hand, concern whether an individual believes that a certain behavior (eg, taking eye drops) will have a positive impact on a health condition (eg, glaucoma). Both tools were developed based on already existing questionnaires with limited involvement of patients. The item selection procedure was based on floor and ceiling effects and principal component analysis, yet response scales were not statistically tested for disordered thresholds, given that the investigators chose the classical approach of validation. Validity evidence is not convincing, with only limited evidence on convergent validity for the self-efficacy scale.55

Discussion

Objective measures such as visual field defects and visual acuity only provide limited information about the impact of glaucoma and its treatment on the patient's daily life. Therefore, integrating the patients’ perspective by using PRO-instruments is gaining more and more attention in clinical studies as well as in clinical practice. In clinical studies for instance, it is no longer sufficient to demonstrate that a new drug is significantly more effective than another drug based on traditional medical endpoints. Other treatment effects reported by patients, such as side effects and tolerability of eye drops, and the impact of a specific eye drop treatment on QoL, are important to capture and should therefore, according to regulatory agencies such as the FDA, be assessed in a structured and consistent way. Consequently, PRO's will be increasingly used as relevant endpoint measures as they are: (1) unique indicators of disease impact, (2) essential for evaluating treatment efficacy, (3) useful for interpreting clinical outcomes and (4) a key element in treatment decision making, which should be based on a combination of objective and patient-reported subjective parameters.9

In order to facilitate the choice for PRO instruments with a high quality to be included in future clinical trials, the primary aim of this review was to provide an overview of all existing PRO-instruments developed for glaucoma specifically or for a broad range of visual impairments including glaucoma, as well as to scrutinize their developmental process and psychometric properties by rating these characteristics based on the FDA-guidelines and quality criteria outlined by Pesudovs et al.12 To our knowledge, this is the first literature review addressing all PRO's available for glaucoma patients.

This review demonstrates that PRO instruments exist covering all categories of PRO's as described in the framework of Acquadro et al.9 Yet, the largest group of PRO-instruments was developed to assess QoL (n=11), compared with seven instruments with a focus on functional status and nine instruments assessing treatment and disease-related factors (ie, side effects, treatment satisfaction, symptoms, adherence). The latter category seems to be less well addressed, given that this category covers a broad set of PRO's.

This review revealed that the vision-related literature, and the glaucoma literature in particular, contains PRO-instruments with different levels of quality, which should therefore be selected and used with caution. An evaluation of these instruments based on a comprehensive framework of quality criteria shows that not all retrieved PRO-instruments have been developed or validated following one of the available validation guidelines. According to this framework, PRO-instruments should meet the following important criteria: (1) a clear description of its aim and intended use; (2) a conceptual framework or definition of the concept of interest, relevant to the study population; (3) comprehensive consulting with patients and a literature review in view of generating items, and adequate statistical techniques to support item selection; (4) evidence for unidimensionality using appropriate statistics; (5) statistically justified response scales and subsequent scoring system and (6) evidence on validity, reliability and responsiveness.

Of the 27 instruments found, only a few partially fulfill these quality criteria. Overall, the tools for assessing functional status demonstrated poor quality in both their development and their validation process. Yet, further adaptation and testing could improve instruments with potential, such as the GQL-15, IMQ and the GSI.24, 25, 26, 27, 28 Within the QoL-measures, both the Glau-QoL and VisQoL had an extensive and theory based development process, but were generated and validated according to classical techniques.43, 44, 45 Applying additional Rasch-analysis could strengthen their content and validity. The NEI-VFQ, which is a widely used QoL-instrument, was initially never tested on its dimensionality,35, 36, 37 which is a major flaw in its development. Other investigators convincingly demonstrated by using modern psychometric techniques that the original tool should be adapted and revalidated.41, 56 The TSS-IOP emerges as the highest quality instrument to assess side effects across different topical treatments,48, 49 yet might be improved as well using Rasch-analysis. For assessing adherence with eye drop treatment, both the adherence questionnaire of Schwartz et al and the EDSQ should be improved, given that both intend to predict nonadherence, but discrimination between adherent and nonadherent patients remains difficult.52, 53 The two scales developed by Sleath et al55 promise to measure self-efficacy and outcome expectations respectively, yet more validity evidence should be provided first to support this claim.

Where do most existing PRO instruments show weaknesses and which pitfalls should future instrument developers avoid?

First, using a conceptual framework, derived from patient input in qualitative studies, as a starting point during the instrument developmental process appears to be the exception in the glaucoma-related literature, given that <50% of the instrument developers used one. This confirms previous research of Ferrans57 on QoL-PRO's in cancer patients showing that most of the instruments in the literature do not use a theoretical approach. Nevertheless, developing and using an appropriate and clearly defined conceptual definition/framework is important in order to know what concept to measure and how to measure it. In a conceptual framework, the interrelationships between items within a domain and of domains within a PRO-concept are depicted in a way that the concept of interest can be operationalized and appropriate psychometric analysis can be performed.11, 58 It should provide the rationale for, and specification of, the PRO-outcomes of interest (eg, side effects) in the population of interest (eg, glaucoma patients undergoing eye drop treatment) for a particular decision (eg, choice of appropriate eye drop treatment). Hence, not using a framework can cause difficulties with (1) grouping and scoring of items into domains, (2) the analysis and (3) the interpretation of PRO-scores, because it is unclear what is being assessed.58

Second, many investigators only use expert opinion and/or a literature review to generate a preliminary list of items, yet the crucial factor to ensure a good breadth of relevance, which is the perspective of patients, was neglected in 11 instruments. According to the FDA11 and the applied quality criteria,12 PRO-instrument item-generation is incomplete without patient involvement (eg, patient interviews or focus groups) and should incorporate the input of a wide range of patients with the condition of interest to represent appropriate variations in severity and in population characteristics (eg, age, gender).11

Third, only a limited number of investigators used an appropriate item reduction strategy on a pilot questionnaire using statistical techniques such as factor analysis or Rasch-analysis (60%). Yet, this approach is needed to determine if all items tap the underlying construct being measured. Items discriminating poorly (ie, large floor and ceiling effect), items with large percentages of missing data, unreliable and invalid items need to be discarded. Using these techniques will improve item quality, measurement precision and item-person targeting (ie, targeting of item-difficulty to person-ability).12
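As a minimal sketch of one of the item-level checks mentioned above, the share of respondents at the lowest and highest response category of an item can be computed as follows. The function name and the response data are invented for illustration, and no cut-off for a "large" floor or ceiling effect is implied.

```python
def floor_ceiling(scores, min_score, max_score):
    """Return the percentage of respondents at the lowest and at the highest possible score."""
    n = len(scores)
    floor_pct = 100 * sum(s == min_score for s in scores) / n
    ceiling_pct = 100 * sum(s == max_score for s in scores) / n
    return floor_pct, ceiling_pct

# Invented responses of 10 patients to a single item rated from 1 to 5.
responses = [1, 1, 1, 1, 1, 1, 2, 1, 3, 1]
floor_pct, ceiling_pct = floor_ceiling(responses, min_score=1, max_score=5)
print(f"floor: {floor_pct:.0f}%, ceiling: {ceiling_pct:.0f}%")
```

An item on which almost everyone selects the same extreme category, as in this invented example, contributes little to distinguishing between patients and is a candidate for revision or removal.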

Fourth, many instruments contain several dimensions, each covering a set of items. Too often, those dimensions are created based on the opinion of experts or patients, without performing any analyses to test for unidimensionality (ie, factor analysis, Rasch-analysis or Cronbach's alpha) within a scale or subscale. This is necessary to demonstrate that all the included items fit within a single underlying construct, so that valid subscale scores can be calculated.
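Of the statistics named above, Cronbach's alpha is the simplest to compute by hand. The sketch below uses the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of the total score), on invented data with complete responses. Note that alpha summarises internal consistency; it is usually read alongside factor analysis or Rasch fit statistics rather than as a stand-alone test of unidimensionality.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for complete data.

    rows: one list of item scores per respondent, all of equal length.
    """
    k = len(rows[0])  # number of items
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Invented scores of 5 respondents on a 3-item subscale.
data = [
    [1, 2, 2],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 4, 4],
]
alpha = cronbach_alpha(data)
print(round(alpha, 3))
```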

Fifth, traditional summary scoring still remains the most popular scoring system in the ophthalmologic literature, yet most of the instruments did not statistically justify their rating scales and scoring systems (n=20). Summing items assumes that all questions have equal importance. Response categories are likewise often assumed to have equal values, with uniform increments from one category to the next (eg, the distance between response options 1 and 2 is the same as the distance between response options 3 and 4).12 Rasch-analysis can be used to test these assumptions and to detect redundant categories and disordered thresholds. Differently calibrated response categories can help to provide a more valid scale, compared with a ‘one size fits all’ scoring approach.

Sixth, further validation of PRO-instruments is of course only meaningful if the item content of the new tool was obtained by following the adequate steps as mentioned above. If not, instrument-developers should first optimize their instrument before obtaining evidence on its performance in terms of validity, reliability and responsiveness. Most of the instruments selected for this review clearly did not follow the ideal developmental process as described in the framework of Pesudovs et al.12 Most of the investigators started to validate their instrument without first improving the content of the tool. From that perspective, modern psychometric testing may help to improve the content of the instrument and may help to provide stronger reliability and validity evidence. More specifically, Rasch-analysis, a modern psychometric statistical technique, transforms the ordinal raw score into a linear interval scale, permitting the use of parametric statistical techniques. This approach improves the accuracy of scoring and removes noise from the measure, which in turn improves sensitivity to change and correlation with other variables. Additionally, the instruments’ validity can be assessed by analyzing the fit of items to the overall construct and the item-person targeting (ie, targeting of item-difficulty to person-ability).47 Therefore, instruments that are only tested using the conventional techniques are not necessarily invalid, yet could still be improved using Rasch-analysis. However, this review revealed that most authors did not try to improve the quality of their instrument, even if the results from validity and reliability tests showed unsatisfactory evidence.
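The ideas behind the Rasch transformation and item-person targeting can be sketched with the dichotomous Rasch model, in which the probability of endorsing an item depends only on the difference between person ability and item difficulty, both expressed in logits. This is a simplified illustration and not the procedure used in the reviewed studies: the instruments discussed here use polytomous rating scales, real Rasch software estimates the parameters iteratively, and all function names and values below are hypothetical.

```python
import math

def rasch_probability(ability, difficulty):
    """Dichotomous Rasch model: P(endorse) given ability and difficulty in logits."""
    return 1 / (1 + math.exp(-(ability - difficulty)))

def raw_to_logit(raw_score, max_score):
    """Log-odds of the raw proportion; a crude first approximation of a person measure.

    Not defined for extreme scores (raw_score equal to 0 or max_score).
    """
    p = raw_score / max_score
    return math.log(p / (1 - p))

def targeting_gap(person_abilities, item_difficulties):
    """Mean person ability minus mean item difficulty, in logits."""
    mean_person = sum(person_abilities) / len(person_abilities)
    mean_item = sum(item_difficulties) / len(item_difficulties)
    return mean_person - mean_item

# Equal steps in the raw score are not equal steps on the logit scale.
mid_step = raw_to_logit(6, 10) - raw_to_logit(5, 10)
high_step = raw_to_logit(9, 10) - raw_to_logit(8, 10)
print(round(mid_step, 2), round(high_step, 2))

# Invented person and item estimates: a gap far from 0 suggests poor targeting.
gap = targeting_gap([1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
print(gap)
```

The comparison of the two steps shows why raw ordinal scores should not be treated as interval data: a one-point gain near the top of the scale corresponds to a larger change on the logit scale than a one-point gain in the middle.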

Seventh, most papers reported only a limited amount of information related to the practical use of the instrument, more specifically concerning: (1) instructions for users describing how to complete and to score a questionnaire and how to interpret the results; (2) the burden of questionnaire administration, such as the duration of questionnaire administration (only mentioned for six instruments), the font size, the presence of new instructions for each item and the formatting; and (3) understandability and readability of the questionnaire tested in the patient population of interest. This information would help researchers and clinicians to select the best instrument for its intended goal.

Hence, there is still a lot of room for improvement of the quality of existing instruments and newly developed PRO instruments should learn from the drawbacks of others. The quality criteria outlined in the framework of Pesudovs et al12 can certainly help investigators in these efforts.

Recommendations

Following the conceptual framework of Acquadro et al,9 the glaucoma PRO-literature covers all classes of PRO's, yet most of the instruments only adhere to a limited extent to the predefined quality criteria. Ideally, researchers should start from a conceptual framework and should most importantly use the patients’ perspective in item generation, for example, by organizing focus groups and in-depth interviews. This review clearly shows that this aspect has to improve in future studies and should also be more clearly reported in future papers. The same is true for item-reduction techniques and scoring systems of instruments, which should both be statistically justified. Psychometric testing is limited in some of the PRO's, yet it seems that modern test theories are gaining more and more attention in the vision-related literature to optimize instruments in terms of item-content and to provide stronger validity evidence.

Other future directions in instrument development could be glaucoma-specific ‘item banking’, referring to Rasch-analysis on all items extracted from several existing questionnaires measuring the same construct. In this approach, all items are calibrated onto a single scale and can be selected manually or by a computer algorithm to target the ability of the patients under test.59

This review adds to the state of the art literature, as it is the first overview of all PRO's available for glaucoma patients, wherein their quality is rated following the FDA-guidelines11 and the comprehensive framework developed by Pesudovs et al.12 Therefore, this overview could serve as a guidance instrument for ophthalmologists and researchers, who plan to use them in pharmaceutical studies or during clinical practice.