Introduction
The proper use of patient-reported outcome (PRO) measurement in clinical settings has become increasingly important for obtaining more comprehensive information to guide clinical decision-making, treatment planning, and clinical management [1]. Traditionally, PRO data are collected through face-to-face interviews or patient self-report on paper-based questionnaires, both of which are labor intensive and time consuming. With the emergence of computer technology, electronic methods of data collection (e.g., touch-screen response or interactive voice response) are becoming more popular and viable alternatives to conventional surveys carried out in clinical practice [1–3].
Electronically administered questionnaires allow data to be entered automatically into a database in real time, after which the score is immediately calculated; thus, data coding errors and the workload of health professionals are reduced [3, 4]. The time patients need to complete an electronically administered questionnaire, such as an electronic patient-reported outcome (ePRO) questionnaire, is also reduced in routine clinical practice [3, 5]. In addition, increased use of ePRO questionnaires in clinical assessments may promote the integration of PRO and clinical information. Once patients have completed the ePRO questionnaires during their clinic visits, their item responses are automatically scored and summarized for potential clinical use with other patient-related information. The integrated results are readily available in easily interpretable reports that the clinician and patient can view together during the clinical encounter. The process can therefore enhance the efficiency and quality of healthcare and patient-physician communication [3, 6, 7]. Nonetheless, before shifting from paper-and-pencil data collection to ePROs, the equivalence of the ePRO version and its original paper-and-pencil version should be thoroughly evaluated, and patient preference and acceptance should be examined to demonstrate feasibility [6, 8]. Some studies have examined and validated the measurement equivalence of paper-and-pencil and touch-screen computer versions; the results showed that the data collected from paper- and computer-administered PROs were very similar and that the touch-screen version was well accepted by most subjects [6–10].
Prostate cancer is a common disease among men in many Western countries and developed Asian countries. The standardized incidence in Taiwan (adjusted by the 2000 world population) increased from 1.86 per 100,000 men in 1979 to 28.77 per 100,000 men in 2010 [11]. Moreover, long-term survival results have shown that health-related quality of life (HRQOL) has become an important outcome measure in different clinical settings [12]. The European Organization for Research and Treatment of Cancer (EORTC) Quality of Life Study Group developed the EORTC QLQ-PR25, a 25-item questionnaire designed for use among patients with localized and metastatic prostate cancer; it is a commonly used tool for assessing HRQOL in these patients. It includes four domains that assess urinary symptoms, bowel symptoms, treatment-related symptoms, and sexual activity and functioning. The results of the international field validation were published in 2008 [13]. The paper-and-pencil versions of the EORTC QLQ-PR25 have satisfactory reliability and validity [12–14]. In this study, we used the Taiwan Chinese version of the EORTC QLQ-PR25 questionnaire published by Chie et al. in 2010 [12]. In our previous study, this version was also shown to be reliable and valid for assessing HRQOL using the modern test theory approach [15]. To the best of our knowledge, the psychometric properties and feasibility of the touch-screen version of this questionnaire for prostate cancer patients have not been well established, and no data have been reported in Taiwan. Therefore, in this study we sought to assess the measurement equivalence and feasibility of the paper-and-pencil and touch-screen versions of the EORTC QLQ-PR25 in patients with prostate cancer in Taiwan.
Discussion
The measurement equivalence between paper-based and touch-screen versions of questionnaires has been demonstrated previously in various diseases [4, 7, 18, 32], but to the best of our knowledge no data on the EORTC QLQ-PR25 in Taiwan have been reported. Our results showed that the percentages of global agreement in all EORTC QLQ-PR25 domains were > 85%. All equivalence tests were significant, indicating measurement equivalence, except for the Sexual functioning domain. The results of measurement equivalence were confirmed using the modern test theory approach: only one of 24 items exhibited differential item functioning (DIF) between the two modes. The overall rates of acceptance of and preference for the touch-screen mode were quite high.
To develop a computerized version of the EORTC QLQ-PR25 for use in this study, we completed the required user’s agreement, which covers use of the questionnaire and any change to it, such as computerization, and holds the user responsible for the quality of the measurement [33]. Under this agreement, permission was granted to use the paper form of the EORTC QLQ-PR25 and to migrate it to a tablet format for research purposes. Moreover, the U.S. Food and Drug Administration (FDA) released a PRO guidance in 2009, which suggested that, for any change to an instrument, such as a change from paper to electronic format, a small randomized study is needed to provide evidence confirming the new instrument’s adequacy [34].
In our randomized cross-over design, subjects were randomized into one of two sequences: paper then touch-screen, or touch-screen then paper. Each subject therefore served as his own control. Cross-over trials rely on within-participant comparisons, whereas parallel designs rely on between-participant comparisons. The influence of confounding covariates and the majority of between-patient variation can be eliminated using a cross-over design [34, 35]. In this study, the mixed model analysis showed no mode-by-order interaction effect, indicating that no carry-over effect was present. Moreover, when we refitted the model with main effects only after removing the interaction term, no order effect was found. These results indicate that the randomized cross-over design was methodologically appropriate for our study. A cross-over design may also require fewer patients to attain the same level of statistical power and precision. In addition, this design permits head-to-head comparisons, and subjects who receive multiple treatments can express preferences for or against particular treatments [35, 36].
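The role of the two sequences can be illustrated with a minimal sketch of the classical contrasts for a two-sequence, two-period cross-over. This is not the mixed model used in this study; it is a simpler stand-in, and the function name `crossover_contrasts` and the toy scores are hypothetical. Within-subject differences separate the mode effect from the order effect, while subject totals compared between sequences give a crude carry-over (mode-by-order) contrast.

```python
from statistics import mean

def crossover_contrasts(paper_first, touch_first):
    """Point estimates for a 2x2 cross-over (illustrative, not the study's mixed model).

    paper_first: list of (period1_paper, period2_touch) scores per subject
    touch_first: list of (period1_touch, period2_paper) scores per subject
    """
    # Within-subject differences (period 1 minus period 2) remove each
    # subject's own baseline level.
    d_pt = mean(p1 - p2 for p1, p2 in paper_first)
    d_tp = mean(p1 - p2 for p1, p2 in touch_first)
    # Subject totals: a difference between sequences hints at carry-over.
    s_pt = mean(p1 + p2 for p1, p2 in paper_first)
    s_tp = mean(p1 + p2 for p1, p2 in touch_first)
    return {
        "mode_effect": (d_pt - d_tp) / 2,   # paper minus touch-screen
        "order_effect": (d_pt + d_tp) / 2,  # first minus second administration
        "carry_over": s_pt - s_tp,          # difference in mean subject totals
    }

# Hypothetical scores: touch-screen reads 2 points higher, no order effect.
paper_first = [(10, 12), (20, 22)]
touch_first = [(12, 10), (22, 20)]
result = crossover_contrasts(paper_first, touch_first)
```

In practice each contrast would be accompanied by a standard error and a test; the sketch shows only how the design lets the three effects be separated.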
In this study, we used intraclass correlation coefficients (ICCs), i.e., Pearson correlation coefficients when there were only two evaluations within a subject, to estimate the between-mode (touch-screen to paper; paper to touch-screen) test-retest reliability, which was 0.40-0.84 for each item, 0.45-0.78 for each domain, and 0.80 for the total score. These values were consistently lower than the within-mode (paper-paper) test-retest reliability of the EORTC QLQ-PR25 questionnaire reported in another study (0.61-0.93 for each item and 0.85 for the total score) [37]. The between-mode test-retest results may reflect the limitations of the original instrument rather than those of the data collection mode.
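As a minimal sketch of how such between-mode coefficients can be computed from paired scores, the following Python code calculates a Pearson correlation and a one-way random-effects ICC for two evaluations per subject. The ICC form is an assumption, since the form used in this study is not specified here, and the function names and toy scores are hypothetical. The two coefficients tend to be close when both modes have similar means and variances; a systematic shift between modes lowers the ICC but not the Pearson coefficient.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two paired score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def icc_oneway(x, y):
    """One-way random-effects ICC for two evaluations per subject (assumed form)."""
    n, k = len(x), 2
    subj_means = [(a + b) / 2 for a, b in zip(x, y)]
    grand = sum(subj_means) / n
    # Between-subject and within-subject mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand) ** 2 for m in subj_means) / (n - 1)
    ms_within = sum((a - m) ** 2 + (b - m) ** 2
                    for a, b, m in zip(x, y, subj_means)) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical scores: the touch-screen score is always 1 point higher.
paper = [1, 2, 3, 4]
touch = [2, 3, 4, 5]
r = pearson_r(paper, touch)      # perfect linear association
icc = icc_oneway(paper, touch)   # penalized by the constant shift
```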
The equivalence testing used in the reliability analysis was performed with two one-sided t-tests, which tested whether the paired differences fell within a preset equivalence limit [38]. All equivalence tests were significant, indicating good equivalence, except for the Sexual functioning domain. It should be noted that of the six Sexual activity and functioning items in the EORTC QLQ-PR25, four items (SX52–SX55) are conditional and apply only to sexually active respondents. In this study, approximately half of the patients reported not being sexually active and did not respond to those four items, resulting in fewer responses to the conditional items in this domain. Moreover, quite a few responses in this domain were missing or showed variability, owing to participants’ embarrassment or reluctance to share details about their private sexual life in this study population [39, 40]. These factors may have resulted in less precise measurement in the Sexual functioning domain.
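The logic of the two one-sided tests can be sketched as follows. This is an illustration under stated assumptions, not the analysis code of this study: the function name `tost_paired`, the equivalence margins, and the toy scores are hypothetical, and a normal approximation replaces the t distribution so that the example needs only the Python standard library.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def tost_paired(x, y, margin, alpha=0.05):
    """Two one-sided tests on paired differences (normal approximation, assumed margin)."""
    diffs = [a - b for a, b in zip(x, y)]
    se = stdev(diffs) / sqrt(len(diffs))
    phi = NormalDist().cdf
    # First null hypothesis: mean difference <= -margin.
    p_lower = 1 - phi((mean(diffs) + margin) / se)
    # Second null hypothesis: mean difference >= +margin.
    p_upper = phi((mean(diffs) - margin) / se)
    # Equivalence is claimed only if both null hypotheses are rejected.
    p_value = max(p_lower, p_upper)
    return p_value, p_value < alpha

# Hypothetical domain scores for eight patients in each mode.
paper = [50, 60, 55, 70, 65, 58, 62, 66]
touch = [51, 59, 56, 69, 66, 57, 63, 65]
p_wide, equivalent_wide = tost_paired(paper, touch, margin=5.0)
p_narrow, equivalent_narrow = tost_paired(paper, touch, margin=0.1)
```

With the same data, a generous margin leads to a conclusion of equivalence while a very narrow one does not, which shows how strongly the result depends on the preset limit and on the precision of the paired differences.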
Moreover, DIF analysis was used to examine the difference in the difficulty parameters between the two modes of administration (paper-and-pencil vs. touch-screen). Applying DIF analysis in this study allowed a thorough evaluation of measurement equivalence. We first estimated the difficulty calibration of each item for each mode separately using the rating scale model to avoid the problem of correlated data. DIF analysis was then applied to assess the equivalence of item difficulties between the two modes. Although correlated data remained a potential problem, because the same patients responded to the same survey twice under the cross-over design, we mainly compared the difficulty calibration measures, which are less sensitive to the correlation induced by the cross-over design. For example, when the two groups were compared in parallel, the correlated data problem was mitigated to some extent by the different sequences of administration: half the patients completed the questionnaire in the touch-screen mode first, followed by the paper mode, and the other half completed the paper mode first, followed by the touch-screen mode. In addition, the influence of confounding covariates and much of the between-patient variation can be eliminated using a cross-over design, which can also increase the efficiency of estimation and testing [35, 41]. DIF analysis also benefits from this type of design.
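The comparison step of such a DIF analysis can be sketched as follows, assuming the item difficulties and their standard errors have already been estimated separately for each mode by dedicated Rasch software. The sketch does not fit the rating scale model; the function name `dif_contrast`, the item labels, and all numeric values are hypothetical, and a normal approximation is used in place of the t-test.

```python
from math import sqrt
from statistics import NormalDist

def dif_contrast(diff_paper, se_paper, diff_touch, se_touch, alpha=0.05):
    """Compare one item's difficulty (logits) between two separate calibrations."""
    contrast = diff_touch - diff_paper
    # Standard error of the contrast, treating the two calibrations as independent.
    z = contrast / sqrt(se_paper ** 2 + se_touch ** 2)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return contrast, z, p < alpha

# Hypothetical calibrations: item -> (paper difficulty, SE, touch difficulty, SE)
items = {
    "item_a": (0.10, 0.12, 0.15, 0.12),
    "item_b": (-0.40, 0.12, 0.30, 0.12),
}
flagged = [name for name, vals in items.items() if dif_contrast(*vals)[2]]
```

Here only the second hypothetical item would be flagged, because its difficulty differs between modes by far more than the combined standard error.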
Furthermore, two possible reasons may explain the inconsistency between the DIF analysis and the equivalence/agreement analysis. First, the two approaches are conceptually different: DIF analysis tests the scale measurement properties of each individual item between the two modes, whereas equivalence analysis examines the difference in patients’ mean scores between the two modes. Second, the equivalence test as applied in this study began with a null hypothesis that the two mean values were not equivalent and attempted to demonstrate that they were equivalent within a practical, preset limit. This logic is the opposite of that of the independent t-test used in the DIF analysis, whose null hypothesis is that there is no difference. As a result, the Sexual functioning items showed more limited equivalence/agreement, whereas the DIF test failed to identify a difference.
In this study, touch-screen administration required approximately 30% more time to complete than the paper questionnaire (20.5 min vs. 14.7 min). We speculate that this difference arose because the touch-screen/paper group was given the touch-screen version first and most respondents (82%) had no experience using a computer. Furthermore, these patients may have spent more time because this was their first exposure to the EORTC QLQ-PR25 questionnaire; thus, in the touch-screen/paper group the touch-screen mode took longer than the paper mode (20.5 min vs. 14.7 min; p < 0.001). However, when the touch-screen mode was used in the second assessment, the time taken to complete the questionnaire decreased significantly (p = 0.002) (Table 1). We anticipate that the time to complete the questionnaire via touch-screen will decrease as patients become used to the system. In addition, the results can be analyzed in real time to facilitate clinical diagnosis and improve patient-physician relationships. A previous study reported that the transfer of assessment data to a computer may differentially influence the responses of older patients [42]. Previous studies of patients aged from 48.1 to 65.0 years showed that the touch-screen mode was a reliable and user-friendly method of assessing quality of life [43–46]. In our study, 97% of the patients reported that the touch-screen version was user-friendly, and approximately 67% preferred the touch-screen version to the paper-and-pencil version, despite the older average age and lack of previous computer experience in our study population. Moreover, most patients (92%) reported that the touch-screen was easy to use, consistent with the results of Pouwer et al. [43], who showed that the touch-screen questionnaire was easier for patients to complete even if they had rarely or never used a computer.
Patients often expect the physician to know the results of a questionnaire shortly after they have completed it; however, PROs assessed using the paper-and-pencil mode cannot be transferred to the physician’s clinic in real time. Various studies have confirmed that incorporating routine standardized HRQOL assessments in clinical oncology practice can facilitate the communication and discussion of HRQOL issues and can increase physicians’ awareness of their patients’ quality of life. Computer-based measurements are well accepted by patients, who generally consider ePROs to be useful tools with which to inform their doctor about their problems [47–49].
A limitation of this study was that the washout period between the two modes of administration was only 120 minutes, which might not have been long enough to completely eliminate carry-over effects. It is possible that some residual memory or carry-over effects from the first administration were still present when the patients were asked the same set of questions during the second administration. As a result, the level of agreement between the two administrations may have been inflated [35, 41]. A longer interval, however, would require patients to spend more time waiting, which might discourage them from completing the second administration. Asking patients to return on another day would likely greatly reduce their willingness to participate. Most importantly, with a longer interval, the patients’ condition may change and result in different answers, thereby affecting the consistency of the responses. Therefore, after careful consideration, we chose an interval of 120 minutes. Moreover, once patients had finished the first questionnaire, they were taken to the education room to watch health education videos, which helped to “dilute” the memory of the first administration. Thus, we believe that the memory effect was likely lessened by the 120-minute interval between tests and the use of health education videos.
Competing interests
All authors declare that they have no competing interests.
Authors’ contributions
YJ Chang and CL Peng performed the statistical analyses and drafted the manuscript. WM Liang and CH Chang designed the study, wrote the protocol, and revised the manuscript. HC Wu was the coordinator of this research and conducted the field work. HC Lin, JY Wang, and TC Li participated in the design of the study, wrote the protocol, and supervised the execution of the study. YC Yeh was responsible for data collection and interpretation. All authors contributed to and approved the final manuscript.