Background
More than 2 million U.S. troops have been deployed in recent conflicts in Iraq and Afghanistan (Operation Enduring Freedom/Operation Iraqi Freedom [OEF/OIF]). The toll of these wars is high, with 31,800 troops wounded (as of May 2010) [1] and 790,000 expected to seek disability benefits for service-related health problems [2]. Returning service members have been reported to face a wide range of problems in returning to community life, including psychological problems, mild traumatic brain injury, marital and financial difficulty, problems with alcohol or substance abuse, and motor vehicle accidents [2-5].
A recent survey found that more than half (52%) of OEF/OIF Veterans had problems controlling anger, 49% reported that their participation in community activities had been impacted, and 42% reported problems getting along with an intimate partner [6]. A quarter of returning Veterans reported problems in employment and almost as many (20%) reported legal problems [6].
It is a Department of Veterans Affairs (VA) priority to help these OEF/OIF Veterans return to full participation in community life roles. Thus, measurement of community reintegration is needed to track Veteran health and social functioning and to assess the impact of treatment and policy. The Community Reintegration of Service Members (CRIS) is a new measure of community reintegration, developed with VA funding, that measures participation in life roles as defined by the International Classification of Functioning, Disability and Health (ICF) [7].
Items on the CRIS cover the 9 aspects, called chapters, of the taxonomy of Activities and Participation described by the ICF: (1) Learning and Applying Knowledge, (2) General Tasks and Demands, (3) Communication, (4) Mobility, (5) Self-care, (6) Domestic Life, (7) Interpersonal Relationships, (8) Major Life Areas, and (9) Community, Social and Civic Life. The CRIS's three scales measure three dimensions: (1) objective and (2) subjective aspects of participation, as well as (3) satisfaction with participation. Items from the CRIS measure are shown in Additional File 1, Appendix A. The Extent of Participation scale asks the respondent to indicate how often he or she experiences or participates in specific activities. The Perceived Limitations in Participation scale asks the respondent to indicate his or her perceived limitations in participation. Lastly, the Satisfaction with Participation scale asks the respondent to indicate the degree of satisfaction with different aspects of participation. In designing the CRIS fixed form scales, we included only those items that demonstrated intraclass correlation coefficients (ICCs) > 0.6 in our pilot same-mode test-retest reliability studies [7].
Previous research showed that the three fixed form CRIS scales demonstrated strong reliability, conceptual integrity and construct validity [7,8]. These findings suggest that the CRIS measure possesses strong psychometric properties and support its use as a standardized assessment measure for the monitoring of community reintegration outcomes of Veterans and wounded warriors from recent conflicts.
All testing of the CRIS measure prior to this study used in-person survey administration. However, administration of the CRIS measure by telephone would expand its utility by lowering the cost and decreasing the burden of administration [9], ultimately increasing the likelihood of the measure's adoption. Telephone surveys do not require travel, are not affected by geographic distribution of subjects, and are easily monitored for quality. Thus, they may be a more economical means of conducting interviews [10]. That said, we were concerned, based on the prior literature, that telephone and in-person administration might yield varying results due to: (a) the CRIS's complex response format, which could be confusing when administered by telephone [11]; (b) the cognitive demands of completing the survey by telephone [12-14]; and (c) greater potential for social desirability bias in in-person interviews [15,16]. Previous studies have reported an ordering effect in repeat administration of quality of life measures using telephone versus mail administration [17] and telephone versus web administration [18], and recommend that mixing of questionnaire modes be avoided when gathering certain types of data [17,19]. Thus, we examined potential ordering effects in our analyses.
No prior studies have examined the effect of interview mode, or the effect of mode ordering, on the responses of subjects to questions related to their community reintegration. Thus, the overall purpose of this study was to test the equivalence of modes of survey administration of the CRIS measure. Specifically, we examined concurrent criterion validity of the telephone administration of the CRIS, examined whether patient responses to the CRIS measure varied by mode of survey administration (telephone or in-person), and examined whether order of survey mode administration (telephone or in-person) was associated with differences in score means and variances. We hypothesized that 1) CRIS scores derived from the telephone administration would be equivalent to those derived through in-person administration and 2) order of survey mode administration would not influence CRIS scores.
Discussion
This study tested the comparability of telephone and in-person modes of administration of a new measure of community reintegration for Veterans, called the CRIS. We found, based upon ICCs ranging from 0.85 to 0.92, that summary scores for the three CRIS subscales were largely comparable between modes. The cut-point for acceptable reliability coefficients varies by field of study, with different values acceptable for different applications. Generally speaking, ICCs above 0.85 are considered acceptable to make decisions about individuals [21]. Nunnally recommends a minimum reliability of 0.70 for use of a scale in research and 0.90 for use in clinical practice [22]. As a point of reference, only two of the widely used scales of the SF-36 have reliabilities above 0.90 [23].
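As an illustration of how an ICC of this kind can be computed from paired scores, the following is a minimal sketch in Python. It assumes the two-way random-effects, absolute-agreement, single-measure form, written ICC(2,1) in the Shrout and Fleiss notation. The form of ICC used in this study is not restated here, so this choice, the function name icc_2_1, and the example scores are illustrative assumptions rather than the study's analysis code or data.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    scores: list of rows, one row per respondent, each row holding one
    score per administration mode (e.g. [in_person, telephone]).
    """
    n = len(scores)        # number of respondents
    k = len(scores[0])     # number of modes (columns)
    grand = sum(sum(row) for row in scores) / (n * k)

    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares (one observation per cell)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-respondent mean square
    msc = ss_cols / (k - 1)               # between-mode mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical summary scores for five respondents: [in_person, telephone]
example_scores = [[52, 50], [61, 63], [47, 45], [58, 59], [40, 44]]
print(round(icc_2_1(example_scores), 2))
```

Because this form penalizes systematic shifts between modes as well as random disagreement, a value near 1 would indicate that telephone and in-person scores agree in absolute terms, not merely that they rank respondents similarly.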
To confirm that our sample size of 102 persons was adequate, we conducted post-hoc power calculations. For the reliability analysis, we estimated that we achieved power of 80% to detect an ICC of 0.9 under the alternative hypothesis (the approximate value of the CRIS subscale ICCs) when the ICC under the null hypothesis is 0.81, using an F-test with alpha = 0.05 and two samples of 50 persons each [25].
We found that 141/151 (93%) of items had ICCs of 0.5 or above, indicating moderate reliability at the item level. However, we did note that 10 of 151 CRIS items (< 7%) had ICCs below 0.5, indicating potential non-equivalence of telephone and in-person administration modes for these items. These items included ones about working, risk taking, and multitasking. These findings should be interpreted cautiously because confidence intervals for the ICC estimates in the current study were wide, and the upper bound of the confidence limits for all items exceeded 0.5. Three items with ICC point values below 0.5 were questions about participation in work or work situations. We believe that these items had very wide confidence intervals because of the low percentage of respondents who were working (37%) and the correspondingly smaller number of subjects who answered each of these questions.
The CRIS scales utilize a complex response format consisting of 7-point Likert-like response scales. There are multiple types of response scales in the measure, each with differing categories of responses (see Additional File 2, Appendix B for response scales). Prior research on telephone versus in-person administration reports both advantages and disadvantages of each mode as well as equivalence between modes. De Vaus suggests that in-person interviews may be preferable for surveys of complex questions with multiple response categories because telephone respondents may have difficulty remembering multiple categories when they answer questions with a large number of response categories [11]. While telephone respondents may have response cards mailed to them in advance of an interview, for practical purposes this is less than optimal because it requires advance planning and assumes that respondents refer to the cards appropriately during the interview. Because of this, we did not mail response cards in this study. In contrast, in-person respondents have a visual aid, in the form of the response scale displayed in front of them as they answer each item, as well as an interviewer who can respond to facial expressions suggesting confusion and who can point to the appropriate response display while explaining the item.
Telephone respondents have been reported to be less patient with interviews and to avoid conversation that may lengthen the interview [12]. Some data suggest that telephone interviews are generally completed more quickly than equivalent in-person interviews [13]. Telephone respondents are in an uncontrolled environment and may be distracted during interviews by things around them, or they may be multitasking at home by watching TV, cooking, or even interacting with others while responding to the interviewer. Thus, they may be less likely to exert the mental effort to answer questions carefully [13]. A respondent answering a long survey may lose motivation, become fatigued and/or lose focus and be unable to sustain the mental effort needed to carefully consider and answer survey questions [14]. When these things occur, the respondent may be more likely to respond in a manner that they believe would seem acceptable or reasonable to the interviewer. Non-verbal cues provided through face-to-face interviewing could potentially enhance the motivation of subjects, keeping them more engaged and thus more likely to respond carefully. Furthermore, the more controlled environment of a face-to-face interview can minimize distractions. While we had no way to monitor a telephone respondent's behavior (i.e., potential distractions from multitasking), our results suggest that the potential effect on survey responses was negligible.
While in-person respondents may be motivated by the development of greater rapport and enhanced task performance [15], the presence of an interviewer may create other biases. Face-to-face interviews may be more biased due to respondents' desire to express socially acceptable characteristics, and may be influenced by the gender and other observable characteristics of the interviewer [11]. Previous research suggests that social desirability bias is more likely to occur when questions relate to sensitive topics such as sexuality, drug use and risk-taking behavior, topics that are included in the CRIS [16].
Greater physical distance between the respondent and the interviewer may provide a greater sense of safety and lead to responses that are more candid. Thus, one would expect that face-to-face interviews would diminish social distance and lead to greater social desirability bias in survey responses, because the respondent is observed directly by the interviewer and can respond to the interviewer's non-verbal signs of approval or disapproval in the form of facial expression or body language. This is supported by reports suggesting that the greater anonymity associated with telephone surveys yields more candid reports of risky or socially disapproved behavior [25,26]. However, other researchers have reported the opposite effect, indicating that respondents to in-person interviews were more likely to report vulnerabilities, such as disability, than respondents to telephone interviews [13,27]. It is possible that social desirability bias related to sensitive behavior might affect several of the CRIS items, particularly those related to risky behavior and frequency of sexual activities [16].
While it is possible that the lower ICC values of the items related to risk-taking behavior and driving safety that we observed in this study might be attributable to social desirability bias, we do not believe that this was the case. If social desirability were a factor, we might expect that subjects would report higher functioning (i.e., higher scores) during the in-person interview than during the telephone interview. We would also have expected to find a lower ICC value for the item related to frequency of sexual relations. Our examination of the raw data shows that the mean of the responses to the question, "How often did you engage in risky behavior?" was lower for the in-person administration (mean = 6.1, sd = 1.6) than for the telephone administration (mean = 6.5, sd = 1.2). The means of the responses to the item, "Others expressed distress while being a passenger in my car," were nearly identical: 5.6 (sd = 1.5) for the in-person administration and 5.6 (sd = 1.4) for the telephone administration. Neither of these differences was statistically significant. Thus, we believe that the lower ICCs resulted from the wide confidence intervals around the point estimates, rather than from differences between modes of administration.
There were five additional items with ICCs below 0.5. Because these items related to multitasking, remembering what was read, keeping track of daily tasks and activities, and limitations in volunteer work, we would not have expected them to be particularly affected by social desirability bias. Examination of the raw data (not shown) shows nearly identical mean scores for the groups, suggesting that the lower ICC values were not a substantial concern and reflected a lack of precision around the estimates in this sample. Additional research is necessary to confirm this finding.
Our study design limits inferences about whether potential differences in item responses between modes were attributable to the mode of survey administration or to the actual test-retest reliability of the item. Test-retest reliability is not an inherent property of a measurement instrument, but can vary by population [28]. However, prior research using repeat administration of the in-person CRIS in a very similar sample showed that all items had ICCs of > 0.6 [7]. Further research testing equivalence of mode of administration is needed to confirm our current findings.
Linda Resnik, PT, PhD is a Research Health Scientist at the Providence VA Medical Center and Associate Professor (Research) in the Department of Community Health, Brown University, Providence, RI
Melissa A. Clark, PhD is Associate Professor, Department of Community Health and Obstetrics and Gynecology, Brown University
Matthew Borgia, BS is a graduate student in the Department of Biostatistics, Brown University
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
LR obtained funding for this study, conceptualized the design, oversaw the project, oversaw the analyses, and took the lead in writing the manuscript. MC assisted in conceptualizing the study design and interpreting the analytical results, and participated in writing and review of the manuscript. MB participated in data cleaning, data analysis, interpretation of results, and writing and review of the manuscript. All authors read and approved the final manuscript.