Background
Surveys represent one of the most efficient and inexpensive research methods available to collect representative, high-quality data from large numbers of research participants. They therefore frequently serve as the backbone used to define the scope and magnitude of many potential public health problems. In the United States, for example, large national surveys have been used to estimate what proved at the time to be surprisingly high levels of mental illness within the general population [1], physical violence within families [2], and sexual assault among women [3]. Even the United States Census, which serves as the basis of apportioning Congressional representatives and taxes to each state, is survey-based. Typically, survey data are either collected by interviewers using face-to-face or telephone communication with the participant or via the participant’s own self-report.
Regardless of the topic studied and how the information is collected, scientifically correct, survey-based prevalence estimates require that research participants be representative of the population from which they are drawn, that participants actually answer the questions that are asked of them, and that they answer those questions honestly. On average, research participants disclose sensitive and personal information, such as mental health symptoms, drug misuse, and history of sexual assault, more frequently when responding to self-administered questionnaires than when taking part in face-to-face or telephone interviews [4-7]. Studies suggest that disclosure of sensitive information on self-administered questionnaires is enhanced further when participants respond anonymously instead of confidentially [5, 8-11]. This implies that anonymous, self-administered surveys may be the optimal method for accurately cataloging information about certain public health problems, such as the prevalence of physical or sexual abuse or of mental health symptoms.
Although by no means proven, most survey researchers take the stance that methods that generate higher prevalence estimates for stigmatizing or sensitive information are probably more accurate than methods that generate lower estimates. This stance, however, rests upon a rather unlikely assumption that all people carry the same propensity to participate in survey research. Particularly when a survey topic is sensitive, survey respondents tend to differ substantially from non-respondents [12]. Therefore, three mechanisms might explain why anonymous surveys generate higher prevalence estimates of stigmatizing or sensitive information compared to non-anonymous surveys: 1) propensity to participate in research is in fact equal across all members of a sampling frame, and anonymous methods promote more honest self-disclosure among the participants with stigmatizing experiences; 2) sampling frame members with stigmatizing experiences are more reluctant than others to participate in surveys, but anonymous methods reduce this inherent reluctance (under-selection is reduced); 3) anonymous methods disproportionately increase the propensity of people with stigmatizing experiences to participate in the survey relative to those without such experiences (over-selection is induced). The first two mechanisms reduce bias; the last introduces bias. Without information about non-respondents’ characteristics relative to respondents’, however, one cannot determine which possibility is correct. Unfortunately, under typical anonymous conditions, such information is unavailable.
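The three mechanisms can be made concrete with a small calculation. The sketch below is illustrative only: the function name, the 20% true prevalence, and every response and disclosure probability are hypothetical values chosen for the example, not figures from this study. It computes the share of respondents expected to report a stigmatizing experience, given the true prevalence, the response propensities of people with and without the experience, and the probability that a respondent with the experience discloses it.

```python
def observed_prevalence(true_prev, resp_exposed, resp_unexposed, disclosure):
    """Expected share of respondents who report the experience.

    true_prev      -- true prevalence in the sampling frame (hypothetical)
    resp_exposed   -- response propensity among those with the experience
    resp_unexposed -- response propensity among those without it
    disclosure     -- probability a respondent with the experience reports it
    """
    exposed_respondents = true_prev * resp_exposed
    unexposed_respondents = (1 - true_prev) * resp_unexposed
    reporting = exposed_respondents * disclosure
    return reporting / (exposed_respondents + unexposed_respondents)


TRUE_PREV = 0.20  # assumed for illustration only

# Mechanism 1: equal propensity to respond; anonymity raises disclosure.
m1_non_anonymous = observed_prevalence(TRUE_PREV, 0.5, 0.5, disclosure=0.5)
m1_anonymous = observed_prevalence(TRUE_PREV, 0.5, 0.5, disclosure=1.0)

# Mechanism 2: people with the experience are under-selected; anonymity
# removes that reluctance (full disclosure assumed in both conditions).
m2_non_anonymous = observed_prevalence(TRUE_PREV, 0.25, 0.5, disclosure=1.0)
m2_anonymous = observed_prevalence(TRUE_PREV, 0.5, 0.5, disclosure=1.0)

# Mechanism 3: anonymity over-selects people with the experience.
m3_anonymous = observed_prevalence(TRUE_PREV, 0.8, 0.4, disclosure=1.0)

for label, value in [
    ("mechanism 1, non-anonymous", m1_non_anonymous),
    ("mechanism 1, anonymous", m1_anonymous),
    ("mechanism 2, non-anonymous", m2_non_anonymous),
    ("mechanism 2, anonymous", m2_anonymous),
    ("mechanism 3, anonymous", m3_anonymous),
]:
    print(f"{label}: {value:.3f}")
```

Under these assumed values, the anonymous estimate rises toward the true prevalence under the first two mechanisms and overshoots it under the third. In all three cases the anonymous estimate is the higher one, which is why a higher estimate alone cannot identify the mechanism at work.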
Anonymous surveys carry other drawbacks relative to confidential surveys. For example, unlike confidential survey methods, anonymous survey responses cannot be linked to administrative or other non-survey data, thus limiting anonymous data’s richness and utility. Also, unless creative methods are employed, researchers often cannot track or send follow-up mailings to non-respondents of anonymous surveys, thus obtaining inferior response rates (e.g., [13]). While low response rates do not necessarily imply poor data quality, the risk of non-response bias does increase as response rates fall.
Two methods to bypass the tracking limitation in anonymous surveys have been described. In one, participants return a completed survey and a separately mailed postcard. Only the postcard contains a unique identifier, which is used to track respondents [14-16]. However, this method increases respondent burden, which can reduce response rates. Furthermore, participants may find it confusing and hence return only one item (e.g., the survey or the postcard, but not both). Receipt of equal numbers of postcards and surveys does not necessarily mean the same people returned both. Even when both are returned by the same person, the survey may be received considerably earlier than the postcard. The participant may therefore be subjected to additional mailings until the postcard is received, which may be annoying, and the researcher may incur unnecessary mailing expenses. Finally, unbeknownst to the researcher, some respondents may return more than one survey, leading to the overweighting of those individuals’ responses.
A second approach uses tracking envelopes, which reduces respondent burden, circumvents the problem of postcards and surveys returning at different times, and avoids analyzing multiple responses from a single participant [17]. In this approach, the envelope contains a unique identifier, but the survey does not. The two are returned together but separated immediately upon opening. Received surveys are then intermixed in some random fashion to avoid any possibility of linking them back to their original envelopes. If one participant returns more than one envelope-survey pair, all but the first are discarded. Until the envelope and survey are separated, however, the survey is not truly anonymous. Participants must rely on the researcher’s integrity to maintain anonymity, and they may be less willing to disclose sensitive information relative to the postcard tracking method, where privacy is absolute. Each approach has pros and cons, but their effects on response rates, survey completeness, and disclosure of sensitive information have never been directly compared.
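The envelope-handling steps described above (record the envelope identifier, keep only the first survey per identifier, separate survey from envelope, and intermix the surveys) can be sketched in code. This is a minimal illustration of the logic, not the procedure or software used in the cited work; in practice the separation is physical. The function name, the identifiers, and the sample answers are all hypothetical.

```python
import random


def separate_and_shuffle(returns, rng=None):
    """Split returned (envelope_id, survey) pairs into tracking and survey data.

    returns -- iterable of (envelope_id, survey) pairs in order of receipt
    rng     -- optional random.Random instance (hypothetical parameter,
               included so the shuffle can be seeded if desired)

    Returns the set of envelope IDs that responded (used only to stop
    follow-up mailings) and a shuffled list of surveys carrying no ID.
    """
    rng = rng or random.Random()
    responded_ids = set()
    surveys = []
    for envelope_id, survey in returns:
        if envelope_id in responded_ids:
            continue  # later pair from the same participant: discard
        responded_ids.add(envelope_id)
        surveys.append(survey)  # the identifier is not stored with the survey
    rng.shuffle(surveys)  # break any link between arrival order and ID
    return responded_ids, surveys


# Hypothetical returns: participant E1 mailed back two envelope-survey pairs.
returns = [
    ("E1", {"q1": "yes"}),
    ("E2", {"q1": "no"}),
    ("E1", {"q1": "no"}),
]
responded_ids, surveys = separate_and_shuffle(returns)
print(sorted(responded_ids), len(surveys))
```

The sketch also makes the method's weak point visible: inside the loop, identifier and survey are briefly held together, which corresponds to the interval before the envelope is opened, when the survey is not yet anonymous.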
In the present paper we address these issues using a novel technique we developed, the pre-merged questionnaire, which allows comparisons between respondents and non-respondents even under anonymous survey conditions. The study involved a potentially sensitive, self-administered questionnaire asking about several traumatic experiences, including sexual assault during military service. The population of interest was male US Gulf War I era veterans with possible posttraumatic stress disorder (PTSD) who had previously applied for Department of Veterans Affairs (VA) disability benefits. We had reason to believe that sexual assault experiences were particularly common in this population [18]. However, we also feared that traditional rape myth beliefs [19], which may be especially strongly held by military service members socialized into a masculinized subculture, might either deter male sexual assault survivors’ participation in the research or impede their disclosure of such experiences.
Using 3 levels of increasing privacy tied to the tracking methods described above, we hypothesized that response rate and participant representativeness, the number of sensitive questions actually answered by participants, and the proportion of participants disclosing potentially sensitive information would increase in a dose-response manner from the lowest to highest privacy condition. Because higher incentives consistently improve survey response [20], we also tested the impact of two incentives, $10 versus $20, on survey response. We expected that the response rate, number of sensitive questions answered, and proportion of participants disclosing sensitive information would be higher among those receiving the $20 incentive than among those receiving the $10 incentive.
Discussion
In this randomized controlled trial, more survey privacy was not associated with statistically significantly higher response rates compared to less privacy, nor did tracking/privacy condition affect the proportion of respondents who actually answered our sensitive questions. Instead, each tracking/privacy condition attracted its own unique pool of respondents, which in turn may have influenced our group-specific estimates of sexual abuse, childhood physical abuse, combat, and mental health problems—despite the fact that all participants originated from the same sampling frame. Estimates of sexual abuse, for example, were more than 2 times higher in the Anonymized-Envelope condition than in the other two conditions.
As expected, the higher incentive resulted in a substantially higher response rate than the lower incentive, but there was no association between incentive and the proportion answering our sensitive questions. As with the tracking/privacy manipulation, each incentive appeared to attract its own unique pool of respondents. Compared to the smaller incentive, the larger incentive attracted younger respondents who worked for pay and were less likely to say they were receiving disability benefits. Statistically, prevalence estimates for potentially sensitive or stigmatizing material did not differ significantly by incentive, despite some numerically large differences. For example, more than half of respondents randomized to the $10 incentive screened positive for depression, compared to about a third of respondents in the $20 arm.
According to leverage-salience theory [29], individuals attend to different criteria when deciding to return a survey and, further, assign to each criterion different weights and importance. These are known as “leverages”. In the present study, each tracking/privacy and incentive condition appeared to trigger a different set of leverages, so that unique subpopulations selectively participated in each of the study’s arms. When considering sensitive material, therefore, one cannot assume that the survey method generating the highest estimate is most accurate.
Since Anonymous-Postcard respondents did not differ significantly from non-respondents on available measures, one might be tempted to conclude that this tracking/privacy method generated the most representative sample of respondents and hence the most accurate prevalence estimates. If so, one would also have to conclude that the Anonymized-Envelope approach over-recruited sexual abuse survivors. History of sexual abuse was 13.6% among Anonymous-Postcard respondents and 33.3% among Anonymized-Envelope respondents. However, we have shown elsewhere that, even when using Anonymized-Envelopes, survey respondents underreport their military sexual assault experiences by a factor of three [30]. This suggests that, compared to the Anonymous-Postcard method, the Anonymized-Envelope method either reduces under-selection of veterans with sexual abuse histories or promotes more “honest” reporting among those who have such histories, or both. It may do so, however, at the expense of either over-excluding veterans with a history of childhood physical abuse or discouraging “honest” reporting of childhood abuse. In the present study the Anonymized-Envelope method generated a substantially lower, albeit not statistically significantly different, estimate of childhood physical abuse (59.4%) than the Anonymous-Postcard method (75.8%).
In general, tracking/privacy condition and incentive level appeared to affect respondent representativeness independently, with incentives’ principal impact being the recruitment of younger and healthier participants. These findings may be reassuring to Human Studies oversight boards, who might otherwise worry that large incentives coerce the sickest and most vulnerable into survey research participation. Halpern et al. [31] have shown that higher payment levels do not override research participants’ risk perceptions when considering whether to enroll in clinical trials and, furthermore, that poorer, presumably more vulnerable participants are actually less sensitive to higher incentive levels than are wealthier participants. Similar findings have been reported for those deciding whether to respond to a survey [32].
The present study offers proof-of-concept for pre-merged questionnaires’ utility. However, pre-merged questionnaires will prove most powerful when they incorporate administrative information that is highly related to the survey’s topic (e.g., sexual abuse, childhood abuse) instead of basic demographic information. Because we did not have such information for the present study, we cannot say whether our differing estimates for these sensitive data across the three tracking/privacy conditions were a function of reducing or inflating selection biases, a function of enhancing or impeding “honest” reporting, or both. Future research will be needed to explore these issues further. It may well be that different tracking/privacy methods will prove best for different sensitive topics.
We used a computerized system to manage the tracking and administrative data interface in the present study, but the pre-merged questionnaire concept could easily be applied to manual methods. For example, in a study using up to three survey mailings per subject, one could pre-print 3 stickers per subject, file them under each subject’s name, and then throw away any remaining stickers once the subject’s postcard or envelope ID was returned. By study’s end, only non-respondents’ stickers would remain.
Pre-merged questionnaires carry important limitations. Researchers must be selective in what data they encode to keep the sticker from becoming uniquely identifying. If too much information is included, participants might become identifiable based on their unique combination of administrative data. We dichotomized age and service branch in the present study for this reason. Pre-merged questionnaires also cannot capitalize on new information. Health care visits occurring after a survey is mailed cannot be linked into a dataset, for example. Nonetheless, the technique offers an advance over usual anonymous methods, particularly in its ability to assess for non-response bias, and it could easily be applied to other sensitive topics.
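The identifiability concern can be checked before any stickers are printed by counting how many sampling-frame members share each combination of encoded values. The sketch below is one possible way to do this and is not the procedure used in the study. The records, the age cutoff of 45, the Army-versus-other grouping, and the function names are all hypothetical; the study states only that age and service branch were dichotomized.

```python
from collections import Counter


def smallest_cell(records, fields):
    """Size of the rarest combination of the given fields across records.

    A value of 1 means at least one person is uniquely identified by
    the encoded data.
    """
    counts = Counter(tuple(record[f] for f in fields) for record in records)
    return min(counts.values())


def coarsen(record):
    """Dichotomize age and branch (cutoff and grouping are assumptions)."""
    return {
        "age_group": "under_45" if record["age"] < 45 else "45_plus",
        "branch_group": "Army" if record["branch"] == "Army" else "Other",
    }


# Hypothetical sampling frame, invented for illustration.
frame = [
    {"age": 34, "branch": "Army"},
    {"age": 36, "branch": "Army"},
    {"age": 33, "branch": "Army"},
    {"age": 52, "branch": "Navy"},
    {"age": 58, "branch": "Air Force"},
    {"age": 55, "branch": "Marines"},
]

raw_min = smallest_cell(frame, ["age", "branch"])
coarse_min = smallest_cell(
    [coarsen(r) for r in frame], ["age_group", "branch_group"]
)
print("smallest cell, raw values:", raw_min)
print("smallest cell, dichotomized:", coarse_min)
```

In this invented frame every raw age-and-branch combination is unique, whereas the dichotomized values place each person in a group of three. The trade-off is the one noted above: coarser categories protect anonymity but carry less information for assessing non-response bias.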
This study’s strengths include its randomized, controlled design and demonstration of a unique technique to overcome what has historically been an important limitation of anonymous methods, namely an inability to evaluate non-response bias. We also compared two tracking methods that can be used in anonymous surveys. Limitations include its relatively small and unique sample. Since we did not have access to verifying information, we cannot say how honestly participants reported their experiences. Findings’ generalizability to other sensitive topics, to non-veterans, or to women is also uncertain. The study was powered to examine main effects of incentives on response rates, and we may have made Type II errors when examining secondary outcomes, effects of the different tracking/privacy conditions, and potential interactions. When findings appeared suggestive, however, we described them in the text. We also made multiple comparisons, which may have inflated our Type I error rate.
Conclusion
We anticipated that greater privacy and larger incentives would be associated with higher response rate, better participant representativeness, more survey completeness, and greater disclosure of potentially sensitive information. Results showed no association between privacy and response rate or survey completeness, supported the association between greater privacy and participant representativeness, and yielded mixed effects for the disclosure of sensitive information. A larger incentive was associated with higher response rate and better participant representativeness but not with survey completeness. In the intermediate privacy arm, the lower incentive, not the higher, was associated with reporting more PTSD symptoms. Otherwise, we found no statistically significant associations between incentive and disclosing potentially sensitive information.
Having shown that different tracking/privacy conditions yielded different estimates of sensitive information, we cannot, unfortunately, tell which estimate was most accurate. Traditionally, higher disclosure rates of sensitive or stigmatizing information have been interpreted as being more accurate than lower rates, but our data suggest that apparently different disclosure rates may simply be a function of the subpopulations successfully recruited into a survey. This possibility needs greater investigation. Pre-merged questionnaires bypassed many of the limitations historically associated with anonymous survey methods and could be used to explore non-response issues in future research.
Competing interests
The authors declare they have no competing interests.
Authors’ contributions
MM obtained funding; designed the study; oversaw data collection, analysis, interpretation; and drafted the manuscript. MAP also assisted in obtaining funding. MAP, AKB, ABS, SN, and JPG contributed to data collection, analysis, and interpretation of data. MRP contributed to analysis and interpretation of data. MAP, AKB, ABS, SN, JPG, MRP read and approved the final manuscript.