Background
Chronic diseases exact a toll on the population, yet most national surveillance systems that measure prevalence lack the geographic detail that public health officials need to allocate health services effectively, especially in rural areas. Health officials must depend on data from the National Health Interview Survey (NHIS), the National Health and Nutrition Examination Survey (NHANES), and the Behavioral Risk Factor Surveillance System (BRFSS) to calculate the nationwide prevalence of chronic illnesses [
1], although the limitations of these surveys for measuring minority populations are well-known [
2,
3]. Due to the nature of survey design, statistics cannot be derived for rural areas, although data for selected metropolitan areas have been made available [
4]. Given survey data limitations, the population of nearly two-thirds of US counties is excluded from the sample population. Shifting disease surveillance to the county level with county data has the potential to create a surveillance system that more accurately characterizes the public health burden of chronic illnesses, identifies high-concentration areas, improves health care resource targeting, and advances disease prevention and control at a localized level.
The achievement of national health goals is directly tied to the ability to target intervention strategies to people residing in specific geographic areas [
5]. The higher the data resolution, the more effectively resources can be allocated. Most adults diagnosed with heart disease, high blood pressure, and diabetes report taking prescription medication for their illnesses (heart disease: 81% in 1987 and 77% in 2001; high blood pressure: 94% in 1987 and 97% in 2001 [6]; diabetes: 83% in 1987 and 93% in 2001 [6], 85% in 2001 [7], or 83% in 2001-2002 [8], depending on the survey). Reliable prescription data at the sub-national level could therefore be a valid proxy measure for the prevalence of these chronic illnesses. We tested the viability of using data on prescriptions filled as a proxy measure for illness prevalence rates by comparing prescriptions-filled rates with state-level BRFSS data, using population estimates to supplement survey estimates.
As a point of clarification, cancer, the second-leading cause of death in the US, was not one of the selected chronic conditions in the prescriptions-filled dataset. Most cancer drugs are used in hospitals, clinics, and physician offices and thus would not reflect the residence of individuals with cancer, a central feature of this research.
The Dartmouth Atlas of Health Care in the United States is the most comprehensive study of smaller area geographic variation in diseases, but it was carried out at the level of Hospital Service Area and Hospital Referral Regions [
9‐
11]. A study similar to the Dartmouth Atlas examined prescription drug use in Michigan but did not address major chronic disease prescription medications in particular [
12]. Despite significant regional variations found in a study of Medicare data for a group of male Hispanics experiencing renal failure, data constraints mean that geography typically gets introduced into research models only as a consideration for rural versus urban populations [
13]. Even the rural-urban distinction yields important insights for medically underserved populations. Rural African Americans are less likely to control their diabetes and hypertension than their urban counterparts, and American Indians have significant regional variations in risk and prevalence of diabetes [
14‐
16]. Recognizing spatial differences in diabetes treatment, the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) is funding research on regional variations in health outcomes among diabetic minorities [
17], and the Agency for Healthcare Research and Quality is funding research on geographic patterns in recurrent strokes [
18].
Some state-specific studies have been produced, although they differ from the depth of analysis that we propose. For example, the Prescription Drug Atlas [
19] describes the geographic distribution of a number of drug classes, based on a convenience sample of Express Scripts plan customers. The Centers for Disease Control and Prevention (CDC) results from the 1988 and 1989 BRFSS revealed that states with high diabetes rates were concentrated east of the Mississippi River, with none of the highest-rate states in the West [
20,
21]. More recently, CDC researchers decomposed the BRFSS to 100 metropolitan statistical areas (MSA) to examine the prevalence of Type 2 diabetes [
22]. Researchers from the State Center for Health Statistics in North Carolina used results from their 10 most populous counties and then aggregated the remaining 90 counties into three regions [
23]. A recent review of the Indicators for Chronic Disease Surveillance noted that the current system is restricted to state and national levels of geography for surveillance, confirming this lack of county-level specificity [
24].
Our population-level methodology overcomes some problems found in state-level studies. As Geiss et al noted, "Data from population-based studies are generally considered more reliable than data from selected groups within the population because the latter may not represent the community with respect to factors such as age and health status" [
25]. The full dimensions of the relationship between diagnosis and prescription drug treatment are unknown, but a significant proportion of the diagnosed population can be identified via prescription drug use.
We mapped chronic illness prevalence at a finer geographic scale than MSA or hospital region using prescription data. Statistical analysis using health data mapping enables social and medical scientists to more accurately identify and display areas with high and low disease prevalence rates. This methodology cuts across nominal data categories to potentially reveal cross-sectional and longitudinal geographic patterns and clusters that are otherwise masked. Spatially based chronic disease prevalence data could also be used to address the critical issue of racial and ethnic disparities in health and health care at the county level [
26,
27].
Methods
We used four datasets: one to select the prescription drugs of interest; a second, of prescriptions filled at the county level, as a proxy measure for chronic diseases; a third, a national medical care survey, to perform an age truncation; and a fourth to calculate correlations as a means of validating the prescription data measure.
IMS Health, Inc., collected prescription drug data from nearly 30,000 suppliers covering 225,000 sites (e.g., drug manufacturers, wholesalers, retailers, pharmacies, mail order, long-term care facilities, and hospitals). A fuller description of IMS Health's products was previously published [
28]. To determine the prescription drug classes appropriate for major chronic diseases, we used IMS Health's National Disease and Therapeutic Index (NDTI), a database derived from an ongoing office-based physician panel providing national-level estimates of disease and treatment patterns for office-based physicians. (We chose not to use an expert pharmacy panel to avoid training or practice bias.) The data in NDTI captured all medications associated with a patient visit for a particular treatment. The leading therapeutic classes (also known as Uniform System of Classification or USC) used for the three diseases were identified from this dataset.
The USC classes chosen for diabetes were: (USC 39211) Sulfonylureas, (USC 39220) Biguanides, and (USC 39230) Insulin sensitizers. (Specifically, the drug types included: animal insulins, human insulins, human insulin analogues, sulfonylureas, meglitinides, amino acid derivatives, biguanides, insulin sensitizers, alpha-glucosidase inhibitors, diabetes therapy combinations, and diabetic accessories.)
The classes chosen for heart disease were: (USC 31100) Renin Angiotensin Systemic Antagonist, (USC 31400) Beta and Alpha blockers, and (USC 32000) Cholesterol reducers and Lipotropics. (Specifically, these included Angiotensin-Converting Enzyme (ACE) inhibitors (along with diuretics and other), angiotensin II type I receptor antagonist (alone and in combination), peripheral vasodilators, calcium blockers, beta blockers, alpha-blockers, beta/alpha blockers (with diuretics), alpha blockers (alone and in combination), central acting agent (alone and in combination), antihypertensive (other), HMG-CoA reductase inhibitor (3-hydroxy-3-methylglutaryl coenzyme A reductase), bile acid sequestrants, fibric acid derivative, cholesterol absorption inhibitor, cholesterol red combination, lipotropics, and antihyperlipidemic agent (other).)
The classes chosen for cerebrovascular disease were: (USC 11110) Anticoagulants, (USC 11200) Antiplatelets, and (USC 20200) Seizure disorders. (Specifically, the drug types included: anticoagulant (oral), unfractionated heparins, fractionated heparins, heparins for flushing, injected anticoagulants (other), antiplatelets (oral and injected), fibrinolytic, Vitamin K & related (oral and injected), hemo mod other (injected, oral, topical), l-dopa, antiparkinson (other), movement disorders (other), seizure disorders, anti-ALS, Alzheimer-type dementia, and neurological disorders (other).)
Based on the results from the NDTI, we purchased monthly, county-level prescriptions-filled data for 1999-2003 for the chosen drug classes in IMS Health's Xponent database. Total US prescription sales were determined from IMS Health's independently sourced Drug Distribution Data and obtained from pharmaceutical manufacturers, drug wholesalers, and chain warehouses, reported at the individual outlet level. This count included 72% of all prescriptions filled at the individual retail level (February 2006), a rate that has been stable since 1998. To estimate the remaining 28%, IMS weights retail data to generate estimates representing total dispensed prescription volume at the national, sub-national, and prescriber level. IMS used volume data and pharmacy distance measures to determine applicable weights for sample pharmacies to estimate the dispensed prescription volume for each nonsample pharmacy. Weights were derived through an IMS Health proprietary, patented geo-spatial methodology (personal communication, Stuchlak W: Senior Principal, IMS Management Consulting, IMS Health, Plymouth Meeting, PA; March 31, 2006).
According to IMS Health, retail pharmacies account for 67% of total national prescription sales for therapeutic categories used in the treatment of chronic illnesses. About 23% of sales occur via mail, 8% occur in clinics, long-term care, prisons, universities, and nonfederal hospitals, and 2% occur within federal facilities (e.g., Veterans Administration) [
29].
The prescription dataset does not contain patient demographics. To place the prescription data on an equal footing with the BRFSS, which surveys only adults, we determined the percentage of those under 18 who receive diabetes medications (the medication most likely among the three to be taken by children). We used the National Ambulatory Medical Care Survey (NAMCS), a national probability sample survey of visits to office-based physicians conducted by the National Center for Health Statistics [
30]. In addition to patient demographics, NAMCS collects data on the therapeutic class of drug prescribed. We aggregated the 2000-2002 files to the Census-region level. The denominator, the state resident population aged 18 and older, was derived from the US Census Bureau's inter-censal population estimates. For a full explanation of the methodology, see "Methodology for the State and County Total Resident Population Estimates (Vintage 2009): April 1, 2000 to July 1, 2009" [
31]. Among patients under age 20 (the closest age cut-point), the usage of diabetes medications was always below 1%, indicating that children could be dropped from the base population.
A rolling 12-month average was calculated to smooth the rates and to account for multiple-month prescription fills. The refined rate of prescriptions-filled calculation is:
We did not attempt to adjust for three-month orders because they were not widely available in 1999-2003. Beyond our rolling 12-month calculation, we cannot fully account for those who fill their prescriptions for only part of the year. As an example, in 2003, prescriptions in the retail channel, which were overwhelmingly 30-day supplies at the time, accounted for 86% of all prescriptions for the Angiotensin Receptor Blocker (ARB) drug class. Mail order prescriptions, which are typically for 90 days, accounted for only 10% of ARB prescriptions, and long-term care accounted for the remaining 4% of ARB prescriptions.
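As an illustration only, the rolling 12-month smoothing described above can be sketched in Python. The function name, the per-1,000-adults scaling, and the use of a single fixed adult population as the denominator are assumptions made for this sketch; they are not the formula used in the study, which is not reproduced in this text.

```python
from collections import deque


def rolling_12_month_rate(monthly_rx, adult_population, window=12):
    """Rolling mean of monthly prescriptions filled, per 1,000 adults.

    monthly_rx: monthly prescription counts for one county, oldest first.
    adult_population: age 18+ resident population (held constant here,
        which is a simplifying assumption of this sketch).
    Returns one rate for each complete window of `window` months.
    """
    if adult_population <= 0:
        raise ValueError("adult_population must be positive")
    rates = []
    recent = deque(maxlen=window)  # keeps only the latest `window` months
    for count in monthly_rx:
        recent.append(count)
        if len(recent) == window:
            mean_monthly = sum(recent) / window
            rates.append(1000.0 * mean_monthly / adult_population)
    return rates


if __name__ == "__main__":
    # Hypothetical county: 100 fills per month for a year, then a spike.
    counts = [100] * 12 + [220]
    print(rolling_12_month_rate(counts, adult_population=10_000))
```

Averaging over 12 months dampens single-month spikes, which is one way a smoothed series might absorb multiple-month fills; a 90-day mail-order fill would still be counted as one prescription under this sketch.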
BRFSS, a state-level monthly telephone survey of adults about behaviors associated with health risks and the incidence of medical conditions, provided our reference point [
32]. BRFSS data have been routinely used to create state-level estimates of chronic illness, health risks, and national prevalence rates, and more recently, for metropolitan and micropolitan area prevalence rates; however, due to the survey design, the BRFSS continues to undersample rural residents. We used the following three survey questions from the 1999-2003 BRFSS: "Have you ever been told by a doctor, nurse or other health professional that you had: (i) diabetes, (ii) coronary heart disease or (iii) high blood pressure?"
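The validation step, comparing state-level prescriptions-filled rates with BRFSS self-reported prevalence, can be illustrated with a short Python sketch. The text does not state which correlation coefficient was used, so the Pearson coefficient here is an assumption, and the function name and sample values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical state-level values, for illustration only.
    rx_rate_per_1000 = [41.0, 55.5, 38.2, 60.1, 47.3]
    brfss_prevalence_pct = [5.9, 7.4, 5.5, 8.0, 6.6]
    print(round(pearson_r(rx_rate_per_1000, brfss_prevalence_pct), 3))
```

In practice each pair would be one state in one year, with the prescriptions-filled rate and the BRFSS estimate for the matching survey question.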
An important shortcoming of existing survey data is that they are not spatial in nature. According to Stan Openshaw, an early leader in exploratory data visualization: "People DIE each year because no one BOTHERS to properly analyze DISEASE and DEATH data for unusual localized concentrations" [[
33], emphasis in original]. Maps are often used to display health information, but typically there is an inadequate effort to empirically identify spatial patterns or apply spatial statistics to test hypotheses. We mapped and compared state-level results from BRFSS and IMS data; IMS data were also mapped at the county level. After visually assessing maps presented here, we conducted a statistical test of spatial autocorrelation to test whether prescriptions-filled rates were equally likely to occur at any location, using the Global Moran's I [
34].
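For readers unfamiliar with the statistic, a minimal Python sketch of the Global Moran's I computation follows. The binary adjacency weights, the function name, and the toy values are assumptions for illustration; the weighting scheme and significance testing used in the study are not described in this text, and inference (for example, by permutation) is omitted here.

```python
def global_morans_i(values, weights):
    """Global Moran's I for n areal units.

    values: list of n rates (e.g., prescriptions-filled rates by county).
    weights: n x n list of lists; weights[i][j] > 0 when units i and j
        are treated as neighbors (the diagonal should be zero).
    """
    n = len(values)
    if n == 0 or len(weights) != n or any(len(row) != n for row in weights):
        raise ValueError("weights must be an n x n matrix matching values")
    mean = sum(values) / n
    dev = [v - mean for v in values]  # deviations from the overall mean
    total_weight = sum(sum(row) for row in weights)
    variance_sum = sum(d * d for d in dev)
    if total_weight == 0 or variance_sum == 0:
        raise ValueError("Moran's I is undefined for these inputs")
    # Weighted sum of cross-products of deviations between neighbors.
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    return (n / total_weight) * (cross / variance_sum)


if __name__ == "__main__":
    # Four hypothetical counties in a row; each borders the next one.
    chain = [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ]
    print(global_morans_i([1, 1, 5, 5], chain))  # similar neighbors: positive
    print(global_morans_i([1, 5, 1, 5], chain))  # alternating: negative
```

Values near zero are consistent with rates that are distributed without regard to location, positive values indicate clustering of similar rates, and negative values indicate that neighboring units tend to differ.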
Discussion
Based on the overall correlation results, we argue that prescription rates have the potential to be a useful and informative proxy for disease-specific diagnosed prevalence at both the state and county levels. Although we did not begin with a priori correlation goals, the results are sufficiently encouraging to warrant further investigation. With refinement of the methodology (see Future Directions below), we envision a powerful tool for health care planners, especially in rural areas. We did have an a priori expectation to see spatial autocorrelation (ie, geographic clustering) in rural areas because of the widespread belief that rural areas are at a disadvantage with respect to access to health care. However, this was clearly not the result, suggesting greater access to prescription drugs in rural areas than was originally thought. Finding no spatial autocorrelation at the national level is informative. But equally useful and not yet analyzed is spatial autocorrelation within states or regions that could prove enlightening to state health officials. Also unexamined are the geographic changes over time at the state, regional, and national levels. This methodology shows promise, especially for those individuals who are responsible for addressing the allocation of health resources and reducing health inequalities. This methodology's greatest potential is in providing a measure of diagnosed prevalence for rural counties, areas not sufficiently sampled in national surveys.
Limitations
The progression from illness to prescription treatment is a series of steps, and an individual can abandon the progression at any point. Briefly, these steps are: (a) the patient recognizes a medical need, (b) the patient decides to seek medical care, (c) the patient has access to medical care (overcoming physical, temporal, financial, and social constraints), (d) the patient is diagnosed, (e) the appropriate treatment includes a prescription drug, (f) the patient has access to prescription drugs (overcoming physical, temporal, financial, and social constraints), and (g) the patient fills the prescription and refills it regularly, having taken it as prescribed. However, to have been diagnosed, a patient must complete the first five of these steps. Thus, using prescription drug data to estimate prevalence is risky only in that the patient must have access to prescriptions and take them as prescribed, refilling regularly (not sharing or skipping doses).
The single biggest limitation of this methodology is that data on prescription drugs filled are limited to diagnosed and treated disease. Several studies have supplemented the diagnosed diabetes data in NHANES ("... have you ever been told by a doctor or health professional that you have diabetes or sugar diabetes?") with blood samples drawn from the respondents (a fasting plasma glucose level of >126 mg/dL was coded as having diabetes). In a study of 1992-2002 NHANES data, Cowie et al found that some 30% of the nation's crude prevalence of total diabetes was undiagnosed [
49], based on the additional blood tests. This finding was echoed in a study using 2003-2006 NHANES data by Danaei et al, [
50] who found that 32% of total national diabetes is undiagnosed. A slightly higher percentage, about 40%, was found by Cowie et al using 2005-2006 NHANES data and blood tests [
51]. As a methodological extension, Danaei et al applied their NHANES analysis methodology to state-level BRFSS data (2003-2007). They were able to report undiagnosed diabetes at the state level by age, sex, race, and insurance status. While this is an advance in terms of smaller geography (state vs. national), the BRFSS data are limited to state or large metropolitan areas. Conversely, since the prescription dataset does not contain demographic data, a further extension of Danaei et al's methodology is not feasible for county-level analysis. Researchers have also used the family history data in NHANES to estimate undiagnosed and pre-diabetes; however, this methodology is limited to national results [
52‐
55].
The BRFSS survey data also have their own limitations in that individuals must have a landline telephone to participate; must be randomly chosen to participate; must answer the surveyor's call; must agree to participate if they do answer the phone; and must respond accurately to health status questions (ie, no faulty memory, avoidance of questions, etc.).
Measuring prescription-fill rates as a proxy for the prevalence of specific chronic illnesses is a crude methodology, fraught with possible disconnects, as the list above suggests. Nonetheless, we have accomplished an important validation exercise. Beyond what is listed above, we are aware of the coverage limitations of this prescription dataset, specifically the selection of the basket of drugs used to treat each disease. We worked with IMS Health to create a basket of drugs that represents the best practices for the time period studied. IMS Health is the leading firm in the collection, analysis, and dissemination of prescription drug data in the US and is aware of prescribing practices and trends in prescription drugs. Because matching drugs to illnesses depends as much on knowledge of industry practice as on current medical guidelines, we chose at the outset of the project to have IMS Health select the drug classes. Based on the objectives of the research and the specific disease states it planned to address, IMS recommended specific categories of drugs to include in the data extracted and provided. Though treatment practices change, most evidence indicates that they change slowly and should not have a dramatic effect on what we have presented here. Finally, when flows of individual drugs are examined, altering one or two in the calculations would have no effect on the overall conclusion of this manuscript, which is that this is a new and potentially valuable, albeit imperfect, methodology to measure population health.
These issues need to be addressed in future refinement of this methodology. Indeed, many drugs are used to prevent or slow the onset of chronic illness. One future solution might be to link prescription-filled data with a survey of physicians' prescribing practices (e.g., of your patients for whom you prescribe heart disease medications, what percentage are preventative versus treatment for existing disease?).
The BRFSS and the IMS Health data measure slightly different things. Limitations of the BRFSS measure include recall bias, as well as lack of data on whether the respondent still had the disease or whether they were treating it with medications. The IMS data measure those who are treating their disease with medication due to disease progression, in conjunction with the ability to pay for care and treatment.
The correlations in 1999 were consistently lower than in later years. The base population in 1999 was a population estimate (as opposed to a census count, which occurred in 2000), which may partially account for the differing magnitude between 1999 and subsequent years. The percentage change in population between 1999 and 2000 was as high as 12% in some counties, whereas in year-to-year comparisons (2000-2003), the state population differed by no more than 4%.
The national estimates from surveys were higher than prescriptions-filled rates due to incremental losses in the base population. There are several reasons prescriptions might not be counted. First, there are people outside the medical system, whether they excluded themselves prior to diagnosis or at the point of the doctor's office (those who were not diagnosed or not treated) or possibly at the point of the pharmacy (individuals who did not fill their prescriptions). A second category of exclusion was the distribution of drugs used in a clinical setting, such as a hospital (e.g., emergency room or surgery), clinic (e.g., chemotherapy and radiation) or the doctor's office (e.g., samples), although this would account for only a small percentage of chronic disease drugs, as opposed to other drug categories. The third category of exclusion from the prescriptions-filled dataset is nonparticipation in the prescription tracking program by the pharmacy or by the patient (e.g., mail-order purchases or purchases made outside the US); however, IMS Health accounts for this exclusion through estimation of the total prescriptions for the retail channel.
Additionally, not all prescriptions were filled in the patient's county of residence, although no test has yet been made of what percentage of prescriptions were filled outside the county of residence [
56]. Finally, there is the issue of off-label usage; we assume people who fill chronic disease prescriptions have that disease, as off-label usage is not perceived to be an issue with these particular classes of drugs.
Variations between self-reported prevalence rates of BRFSS and those currently in drug therapy for those diseases remain unaccounted for and are likely due to the BRFSS method of oversampling densely populated metropolitan areas (with the IMS data capturing all prescriptions) and variation between the rate of self-reported prevalence versus drug therapies. In other studies, the differences between treatment patterns [
57], prescribing practices, and dispensing practices [
58] have all been noted in state-level studies [
59]. Finally, this methodology calls for further exploration and refinement. We paired drug classes with appropriate BRFSS questions, but those choices can be challenged and should be tested.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
REC developed the concept, collected portions of the data, participated in the analysis, and prepared the initial and subsequent drafts. JSC conceived the concept, participated in the analysis, and substantially revised the manuscript drafts. WLJ refined the concept, participated in the analysis, led the spatial investigation, and revised manuscript drafts. TB interpreted the data and spatial analysis and provided critical commentary on subsequent drafts. RT devised measurement methodology and data sources and provided substantial methodological comments on the drafts. LGP refined measurement methodology and data sources and provided substantial methodological comments on the drafts. AGC refined the concept and measurement, participated in the analysis, and contributed substantial input on the implications. All authors reviewed and approved the final manuscript.