Background
In large state-of-the-art public health surveys, such as UNICEF's Multiple Indicator Cluster Surveys [1], USAID's Demographic and Health Surveys [2], and the US Centers for Disease Control and Prevention's (CDC) National Immunization Survey [3] and National Health Interview Survey [4], much effort and expense are dedicated to maximizing representativeness and validity of results. This is done by incorporating strategies to reduce the impact of various sources of error, such as sampling error, measurement error, nonresponse error, and noncoverage error. However, in smaller field surveys worldwide, limited time and financial resources, poor accessibility of households, and lack of available transportation and staff may limit the extent to which these strategies can be implemented. While these surveys often provide practical information used as management tools for evaluating and targeting health services, in extreme cases survey error could result in severe bias, proliferation of misinformation, and suboptimal public health response.
Previously published research and discussion have focused on sampling error and the sampling methods used in field surveys [5-12]; however, several other issues are frequently overlooked. In this study we examine additional sources of survey error and the potential impact of three "shortcuts" that are sometimes taken when conducting household surveys:
• Use of only a single or inaccurate source of information for critical outcomes, which may cause measurement error
• Not revisiting households that are unavailable for interview at the initial visit, which can contribute to nonresponse error, and
• Excluding areas that are difficult to access or are far from primary population centers, which can result in noncoverage error.
To demonstrate these points, we used data from a vaccination coverage survey of young children conducted in the US Commonwealth of the Northern Mariana Islands (CNMI) in July 2005. In this survey, three sources of vaccination history information were used, multiple follow-up attempts were made to interview household members, and all inhabited areas were included in the sampling frame. We examine the impact of these efforts and, when possible, estimate the added time that was required to incorporate each of them into the survey. We also illustrate how a simple sensitivity analysis can be used to estimate potential bias in surveys that exclude portions of a population. Finally, we discuss the factors that might influence the magnitude of the impact of each of these "shortcuts" in various settings.
All surveys are subject to sources of error [13], and understanding how error may bias survey results is crucial for drawing appropriate conclusions and response. The objectives of this report are to increase awareness of potential sources of survey error, to encourage survey researchers to implement strategies that minimize these sources of error, and to remind data users of the need to critically assess the extent to which survey results may be biased.
Methods
Setting and survey description
CNMI is located in the Western Pacific Ocean between Australia and Japan, and is composed of a chain of 15 islands that extends 400 nautical miles. The population of CNMI is approximately 70,000 people, with an annual birth cohort of approximately 1,300 children on its three inhabited islands (1,130 on Saipan, 85 on each of Rota and Tinian) [14]. Rota and Tinian, located 73 and five miles from Saipan, respectively, are accessible via daily commuter flights from Saipan. Due to their relatively small populations and remoteness, many health services are provided to residents of these islands through periodic outreach activities.
The primary objective of this survey was to estimate the percentages of children aged one, two, and six years who had received the standard vaccination series recommended for their ages. Among children aged one year (12–23 months), we evaluated receipt of vaccines recommended by age 12 months: three doses of diphtheria-pertussis-tetanus vaccine (DPT), three doses of inactivated poliovirus vaccine (IPV), two doses of hepatitis B vaccine, and three doses of Haemophilus influenzae type b vaccine (Hib). Among children aged two years (24–35 months), we evaluated receipt of vaccines recommended by age 24 months: four doses of DPT, three doses of IPV, one dose of measles-mumps-rubella vaccine (MMR), three doses of hepatitis B vaccine, and three doses of Hib. For children aged six years, we evaluated receipt of vaccine doses required for school entry: five doses of DPT, four doses of IPV, two doses of MMR, and three doses of hepatitis B vaccine.
On Saipan, the largest of the three islands, a population-based cluster survey was conducted. District population estimates were obtained from the most recent census, conducted in 2000. Thirty clusters were selected by systematic random sampling with probability of selection proportional to estimated size of district populations. Next, households were chosen by systematic random sampling in the selected clusters. A household was defined as a group of persons who live and eat together; children who usually resided in one household but were temporarily staying in another were included in the household where they usually resided. A household was considered eligible for the survey if there was at least one child aged one, two, or six years currently residing there. A target sample size of 22 children per age group per cluster was set so that estimated vaccination coverage would be within 7% of true coverage with α = .05, assuming a design effect of two and an expected response rate of 90%. The sampling interval (i) was determined for each cluster by dividing the estimated number of age-eligible children by the target sample size. Interviewers proceeded in a serpentine manner throughout all inhabited areas of each selected cluster and visited every ith household to determine eligibility. All eligible children in selected households were included. Interviewers continued within a cluster until the entire cluster was canvassed, regardless of the number of households visited or children surveyed. On Rota and Tinian every household was visited due to small population size.
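The cluster selection and household interval described above can be sketched as follows. This is a minimal illustration, not the survey team's actual procedure: the function names, the district names and sizes, and the choice to round the interval down (with a minimum of 1) are all assumptions made for the example.

```python
import bisect
import itertools
import random


def pps_systematic_sample(district_sizes, n_clusters, seed=None):
    """Pick clusters by systematic sampling with probability proportional to size.

    district_sizes: dict mapping a (hypothetical) district name to its
    estimated population. A large district can be selected more than once.
    """
    rng = random.Random(seed)
    names = list(district_sizes)
    # Cumulative population totals, in listing order.
    cumulative = list(itertools.accumulate(district_sizes[n] for n in names))
    step = cumulative[-1] / n_clusters      # interval on the cumulative scale
    start = rng.random() * step             # random start in [0, step)
    selected = []
    for k in range(n_clusters):
        idx = bisect.bisect_right(cumulative, start + k * step)
        # Clamp guards against floating-point overshoot on the last target.
        selected.append(names[min(idx, len(names) - 1)])
    return selected


def household_interval(estimated_eligible_children, target_sample_size):
    """Sampling interval i: estimated eligible children / target sample size.

    Rounding down (never below 1) is an assumption of this sketch.
    """
    return max(1, estimated_eligible_children // target_sample_size)


# Hypothetical district sizes, for illustration only.
districts = {"District A": 100, "District B": 300, "District C": 600}
clusters = pps_systematic_sample(districts, n_clusters=10, seed=1)
```

Because the selection step is taken along the cumulative population, a district's expected number of clusters is proportional to its estimated size, which is the property the design relies on.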
Interviews were conducted by nine teams of two health workers. Before conducting the interview, interviewers explained the purpose of the study and the respondent was given the opportunity to ask questions about the survey. Written informed consent from a parent or legal guardian was requested for participation in the household survey and to obtain vaccination records from the electronic registry and the health department. This study was approved by CDC's institutional review board, and additional details have been reported elsewhere [15].
Using multiple sources of vaccination history information
Vaccination histories were obtained from three separate sources of information. First, household-retained vaccination cards were reviewed and transcribed during the household interviews. Second, health records were obtained from the public health department for study children by matching the child's name, date of birth, and hospital identification number; vaccination histories were abstracted from these records. Third, vaccination histories for study children were also obtained from the computerized vaccination registry, which was implemented in 1989. In cases where the vaccination cards, health records, and vaccination registry were not identical, vaccinations recorded in the three sources were combined. We evaluated vaccination coverage based on each source of data independently, as well as vaccination coverage when all sources of information were combined. We also examined agreement of sources, assessing the number of independent sources that reported each child as completely vaccinated.
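One way to picture the combination of sources is as a union of dated dose records per vaccine. The sketch below is illustrative only: the data layout, the function names, and the rule that two records with the same vaccine and date count as one dose are assumptions, and the example histories are invented. The required dose counts for children aged one year are those listed in the survey description.

```python
from datetime import date

# Doses recommended by age 12 months, as listed in the survey description.
REQUIRED_AGE_ONE = {"DPT": 3, "IPV": 3, "HepB": 2, "Hib": 3}


def combine_sources(*sources):
    """Union dose dates across sources.

    Each source maps a vaccine name to a set of administration dates.
    Assumption: records with the same vaccine and date are the same dose.
    """
    combined = {}
    for source in sources:
        for vaccine, dates in source.items():
            combined.setdefault(vaccine, set()).update(dates)
    return combined


def is_complete(history, required):
    """True if the history holds at least the required doses of every vaccine."""
    return all(len(history.get(v, ())) >= n for v, n in required.items())


# Invented example: the card is missing third doses that the registry holds.
d1, d2, d3 = date(2004, 3, 1), date(2004, 5, 1), date(2004, 7, 1)
card = {"DPT": {d1, d2}, "IPV": {d1, d2, d3}, "HepB": {d1, d2}, "Hib": {d1, d2}}
registry = {"DPT": {d2, d3}, "Hib": {d2, d3}}

card_only = is_complete(card, REQUIRED_AGE_ONE)
all_sources = is_complete(combine_sources(card, registry), REQUIRED_AGE_ONE)
```

Matching on exact dates means that a dose recorded with different dates in two sources would be counted twice, so a real analysis would need an explicit rule for near-duplicate dates.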
Revisiting households not available at the initial visit
If no adult respondent was home at the time of the initial visit, interviewers returned to the household on evenings or weekends to conduct the interview. At least two such follow-up visits were made for each household. We evaluated eligibility ascertainment rates, eligibility rates, and vaccination coverage rates that would have been achieved if the study had been conducted with and without these follow-up visits.
Including areas that are difficult to access
We estimated vaccination coverage rates for each of the three inhabited islands of CNMI. Rota and Tinian are difficult to access due to their remoteness from the main island of Saipan. To determine the effect of including these difficult-to-access areas, we compared vaccination coverage based on results from Saipan alone with results obtained by estimating vaccination coverage as a weighted average of all three islands.
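The comparison can be illustrated with a population-weighted average. In this sketch the weights are the annual birth cohort sizes given in the setting description; the coverage values are placeholders rather than survey results, and the function name is hypothetical. The survey's own estimates used design-based weights in SUDAAN, so this is a simplification.

```python
def weighted_coverage(coverage_by_island, population_by_island):
    """Population-weighted average of island-level coverage proportions."""
    if set(coverage_by_island) != set(population_by_island):
        raise ValueError("coverage and population must list the same islands")
    total = sum(population_by_island.values())
    return sum(
        coverage_by_island[island] * population_by_island[island]
        for island in coverage_by_island
    ) / total


# Birth cohort sizes from the setting description, used here as weights.
birth_cohort = {"Saipan": 1130, "Rota": 85, "Tinian": 85}

# Placeholder coverage proportions, NOT survey results.
coverage = {"Saipan": 0.80, "Rota": 0.60, "Tinian": 0.95}

all_islands = weighted_coverage(coverage, birth_cohort)
saipan_only = coverage["Saipan"]
difference = all_islands - saipan_only
```

Because Saipan carries most of the weight, the two remote islands shift the overall estimate only when their coverage differs from Saipan's in the same direction.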
Statistical analysis
Data entry and cleaning were conducted in Epi Info 2002 version 3.3.2 (Centers for Disease Control and Prevention, GA) and SAS version 8.0 (SAS Institute, NC). Analyses were conducted in SUDAAN version 9.0.0 (Research Triangle Institute, NC).
Bivariate analyses were used to estimate vaccination coverage and 95% confidence intervals for children identified with and without follow-up, for each island and for CNMI overall, and by source of vaccination information. All analyses account for the sampling design and were weighted to adjust for differences in probability of selection.
In a survey with excluded populations, a simple sensitivity analysis can be conducted to determine the likely magnitude of potential survey bias. We conducted a sensitivity analysis of potential bias due to exclusion of children in households for which eligibility was not determined. We assumed that children in households with unknown eligibility were twice as likely to be unvaccinated (lower bound) and half as likely to be unvaccinated (upper bound) as those included. By combining these estimates with those for children who were included, we calculated lower and upper bounds of the likely results accounting for the potential bias. For illustrative purposes, we conducted similar sensitivity analyses based on children accessed at initial visits to households, and those living on the main island of Saipan. We then compared these ranges with the observed coverage for the entire sample.
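The bounds described above can be computed in a few lines. This is a sketch under stated assumptions: the function and parameter names are hypothetical, the input values are illustrative rather than survey results, and the probability of nonvaccination among excluded children is capped at 1 when doubling would exceed it.

```python
def sensitivity_bounds(observed_coverage, fraction_included,
                       low_ratio=0.5, high_ratio=2.0):
    """Lower and upper bounds on overall coverage when some children are excluded.

    observed_coverage: coverage proportion among included children.
    fraction_included: share of the target population that was included.
    Excluded children are assumed to be `high_ratio` times (lower bound) or
    `low_ratio` times (upper bound) as likely to be unvaccinated.
    """
    unvaccinated_included = 1.0 - observed_coverage

    def overall_coverage(ratio):
        # Cap at 1.0: a probability cannot exceed certainty.
        unvaccinated_excluded = min(1.0, ratio * unvaccinated_included)
        overall_unvaccinated = (
            fraction_included * unvaccinated_included
            + (1.0 - fraction_included) * unvaccinated_excluded
        )
        return 1.0 - overall_unvaccinated

    return overall_coverage(high_ratio), overall_coverage(low_ratio)


# Illustrative inputs only: 80% observed coverage, 94% of children included.
lower, upper = sensitivity_bounds(observed_coverage=0.80, fraction_included=0.94)
```

The width of the interval grows with the excluded share and with the observed proportion unvaccinated, so the same assumptions produce wider bounds where exclusion is larger or coverage is lower.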
Discussion
Public health surveys are widely used by national, state, and local health departments as a basis for programmatic and policy decisions: to evaluate services, to target additional services, and to assess the success of these services in improving the health of the population. To be effective, surveys must produce reliable results that can be generalized to the population of interest. Ensuring methodological rigor is critical to reduce the potential for bias and thus provide a complete and accurate picture of the current situation, reveal strengths and weaknesses of health programs, and enable data-driven solutions. Several factors must be carefully considered when designing a public health survey: 1) the goals and intended uses of the study, 2) statistical properties of the methods, and 3) the relative feasibility of the options, given time and resource constraints, technical expertise, and geographic conditions. A successful survey must balance these factors to ensure that feasibility is maximized while methods are rigorous enough to produce results that are sufficiently precise and free of bias to be used for key program and policy decisions. A poorly conducted survey can produce erroneous results and may prompt inappropriate public health action. Overestimating the health problem or misidentifying risk groups can waste limited resources that could be used more efficiently elsewhere, while underestimating the problem can lead to a false sense of security and a lack of the action needed to protect the population. Equally problematic, findings from a poorly conducted survey will be difficult to defend and may be disregarded, resulting in a lack of political will to take action to correct the identified problems.
In this study, use of multiple sources of data had a substantial impact on the results. The discrepancy of results based on the different sources of data is striking, and emphasizes the importance of identifying reliable sources of information. Furthermore, estimates that would have been obtained with any one of the three sources would have been substantially lower than those achieved by combining the sources, and would have called for a different public health response. Completeness and accuracy of data are critical to the validity of any survey, whether the data represent respondent opinions, written records, physical measurements, or laboratory tests. While medical records may be the most accurate source for vaccination history information, they were nevertheless incomplete in this study. Furthermore, household surveys are conducted to ensure a population-based sample, and thus avoid inherent biases of sampling from medical records. Therefore, most vaccination coverage surveys rely on household-retained vaccination cards to obtain vaccination histories. While approximately 70% of one- and two-year-old children in CNMI had vaccination cards, vaccination coverage based on these cards alone was less than half of that obtained by combining cards with the two other sources of information. Abstracting vaccination information from health department records increased total person-time to complete the survey by 6%; adding data from the electronic registry did not increase survey time.
We found that conducting follow-up visits to households not available at the initial visit did not substantially affect the outcome of interest; however, it improved our ability to determine household eligibility and substantially increased response rates. In addition, households whose eligibility was determined after follow-up were twice as likely to be eligible as those determined at the initial visit. It may not be possible to infer characteristics of the nonrespondents; high response rates help to minimize the size of this population and thus ensure the representativeness of those included in the survey. As a result, high response rates confer greater credibility and generalizability of the survey results. In addition to revisiting households, response rates can often be improved by promoting awareness of the survey, using well-designed questionnaires that minimize respondent burden, and providing adequate training to interviewers regarding the survey goals and interviewing techniques.
Excluding populations that are difficult to access is often tempting, due to high travel cost and time, as well as safety concerns in some settings. However, excluding portions of the population can lead to biased results if the characteristics of the excluded population differ from those included. In the United States, the federal Office of Management and Budget (OMB) has developed standards and guidelines for statistical surveys conducted by government agencies, recommending that the sampling frame cover at least 95% of the target population [16]. In the current study, exclusion of the two remote islands would have yielded survey coverage of 88% of the population, increasing the likelihood for biased results. Despite the opportunity for bias, excluding these islands actually had little effect on the overall estimates for CNMI, due in part to the divergent results on the two remote islands. Nevertheless, including these difficult-to-access areas provided important information for public health response: outreach activities appear to be working well on Tinian, while substantial problems were revealed that need to be addressed on Rota. Including Rota and Tinian increased the total person-time to complete the survey by 29%.
The effects of each of the three "shortcuts" evaluated in this study may vary depending on the primary outcomes of interest, population subgroup, and local situations. For example, in a setting with more complete health history documentation, or in a survey relying on physical measurements or laboratory tests, a single source of information may be sufficient. Estimating the effect of incomplete information or measurement error can be difficult. In surveys limited to one source of data, effort should be made to investigate and describe the level of accuracy and completeness of that source through objective measures and expert opinion.
The potential effect of excluding a portion of the population, either due to nonresponse or noncoverage, could be substantial, depending on the size of the population excluded relative to that included, and the difference in outcome characteristics between those included and those excluded. Thus, while low response rates and low coverage rates do not necessarily result in bias, they are an important indicator of the potential for bias. The US OMB suggests that if >15% of the population is excluded, or if overall response rates are <80%, a noncoverage or nonresponse bias analysis should be conducted [16]. Ideally, this would involve observing or measuring some characteristics of nonrespondents to better model their likely survey responses. However, if such information is not available, a simple sensitivity analysis, such as presented here, can be conducted to determine the limits of the bias introduced. Assuming that the likelihood of nonvaccination among children who would have been excluded from the survey was half to twice as much as for those included yielded ranges that, for the most part, contained the actual observed coverage estimates. However, assumptions regarding the excluded population should be made with care. For example, if outreach services are believed to be poor, lower bounds could be established that assume that none of the difficult-to-access children had been vaccinated.
This study was subject to several limitations. First, combining multiple sources of information could be problematic. In this survey, if vaccination doses were counted twice due to mistakes made in recording of dates, combining sources could have erroneously increased the total reported number of vaccination doses. However, this concern is limited to the few children who were not considered completely vaccinated in any single source independently, but required vaccinations recorded in multiple sources to be combined (2%, 5%, and 9% among children aged one, two, and six years, respectively). Second, even with three sources of data, there may have been incomplete information. For example, fully-vaccinated children who recently moved to CNMI may not have had accurate health or registry records in CNMI. Third, as with any survey, selected children may not have been representative of all children in CNMI. Survey weights based on probability of selection were used to ensure that results from surveyed children were representative of all children. However, some bias may remain due to missed households or differential nonresponse; bias could be further reduced through use of nonresponse-adjusted and poststratified weighting schemes. Nevertheless, interviewers were able to determine household-reported eligibility for 94% of households, and >99% of those with eligible children agreed to participate, minimizing potential bias. Finally, we were not able to evaluate person-days required to conduct follow-up visits, as this information was not documented during the survey.
Competing interests
The author(s) declare that they have no competing interests.
Authors' contributions
EL supervised the study, developed the study design, participated in acquisition of data, conducted the statistical analysis, and drafted the manuscript. SS participated in acquisition of data and assisted with the study design and interpretation of the data. KS assisted with the study design, statistical analysis, and interpretation of data. MS and MM participated in acquisition of data. All authors participated in critical revision of the manuscript and approved the final manuscript.