Background
The concept of low-value care, defined as services that provide no benefit to patients or can even cause harm [1], has received much attention in recent years in Western countries. Reducing the use of low-value care is expected to contribute to cost containment and greater efficiency in health care [2-4]. It reduces medical spending without harming health outcomes and may stimulate a reallocation of resources to high-value services [3]. In this way, measuring low-value care for which non-effectiveness is proven provides information on a specific type of inefficiency, i.e. spending with no benefit, which can be used alongside other, more indirect, types of efficiency analysis such as traditional cost-effectiveness studies or analyses of practice variation.
Internationally, several initiatives have been launched to reduce low-value service utilization, among them the Choosing Wisely (CW) campaign in the US. Similar initiatives have originated in 12 other countries including the United Kingdom, Canada, Australia and the Netherlands [3, 5]. In the CW campaign, participating specialty societies produce lists of recommendations that are to be discussed in the doctor’s office, for example, ‘don't order diagnostic tests at regular intervals (such as every day), but rather in response to specific clinical questions’ [6]. Ideally, these lists of recommendations would meet the CW criteria: 1) each of the services is within the specialty’s purview, 2) each of the services is frequently used or costly, 3) each recommendation is based on sufficient evidence, and 4) the process for developing the recommendation list is documented and is made available to the public if requested [7]. In general, the recommendations aim to increase awareness among both doctors and patients [4] and subsequently influence the decision whether or not to use a specific service.
Besides these rather generic recommendations, studies have tried to assess the prevalence of, and geographic or practice variation in, low-value care utilization (e.g. [8-11]) using direct measures of low-value care. The aim of the direct measures differs from the aim of recommendations. Whereas recommendations aim to create awareness among physicians and patients, low-value care measures may be used more widely, for example in payer-provider contracts [12, 13] and for monitoring low-value care initiatives [3, 14].
To meet these aims, low-value care measures need to be methodologically sound [15, 16]. Otherwise, using these measures might lead to misinterpretation, underuse of indicated services, patient selection or damage to the patient-physician relationship [17]. To date, only one study [18] has reviewed the state of low-value care measurement by performing a scan of the published and grey literature. It identified 37 specified measures and 123 services that may be developed into measures, covering mainly diagnostic or therapeutic areas. Furthermore, another study [19] identified a set of low-value services and demonstrated significant variation in their utilization between hospital referral regions in the US.
Still, major knowledge gaps exist in the literature on measuring low-value care. First, there is a lack of knowledge regarding the validity of current low-value care measures [15, 16]. As Baker et al. [14] pointed out earlier, low-value care measures must at least be rigorously evidence-based. In addition, they must be able to detect variation between providers, regions or countries, reflect actual cases of the concept of interest, be supported by correlations to other measures indicating the same concept, and not be subject to substantive systemic bias (i.e. importance, coding or criterion validity, construct validity and risk adjustment) [20]. Therefore, specific standards for how to develop and assess low-value care measures should be developed [14, 17]. Second, it is unclear whether current low-value care measures cover the whole continuum of care. This is important because it has been argued that low-value care use is present in all sectors along the care continuum [14, 21]. However, the low-value service recommendations from the CW initiative cover mainly specialist care in the cure sector [7].
In this study, we aimed to start filling these gaps by performing a systematic review of the recent scientific literature on low-value care measurement. Our objective was twofold: first, to assess the scope of low-value care recommendations and measures in the literature by categorizing them according to health care function (such as curative care, long-term care and rehabilitation); second, to assess the quality of the measures by 1) analysing their development process and the evidence that underlies the measures and 2) analysing the evidence regarding the validity of a selection of the included measures.
Methods
Study design and search strategy
A systematic review of the literature was performed, focusing on English-language articles published between January 2010 and January 2015. As recommended by Cochrane [22], we performed our search in multiple databases including EMBASE, Medline, SciSearch, BIOSIS Previews and Global Health. We developed a search strategy to identify articles matching a variation of the following search terms: 1) initiatives, design, measuring, indicators, instrument, identifying, index; 2) waste, overuse, overutilization, misuse, low-value; and 3) health care, cure, care, prevention. Additional file 1 gives a detailed description of the search strategy.
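As an illustration of how the three groups of search terms could be combined, the following minimal sketch builds a Boolean query string. The structure (OR within a group, AND between groups) and the function name `build_query` are assumptions made for illustration; the exact database syntax used is described in Additional file 1 and is not reproduced here.

```python
# Term groups as listed in the text. The Boolean structure below
# (OR within a group, AND across groups) is an assumption, not the
# authors' documented database syntax.
TERM_GROUPS = [
    ["initiatives", "design", "measuring", "indicators",
     "instrument", "identifying", "index"],
    ["waste", "overuse", "overutilization", "misuse", "low-value"],
    ["health care", "cure", "care", "prevention"],
]


def build_query(groups):
    """Join terms with OR inside each group and AND between groups."""
    def quote(term):
        # Quote multi-word or hyphenated terms so they are searched as phrases.
        return f'"{term}"' if (" " in term or "-" in term) else term

    clauses = ["(" + " OR ".join(quote(t) for t in group) + ")"
               for group in groups]
    return " AND ".join(clauses)


query = build_query(TERM_GROUPS)
print(query)
```

A query of this shape returns only records that match at least one term from each of the three groups, which is one plausible reading of "matching a variation of the following search terms".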
Article selection
Two researchers (EFdV & RJPH) independently reviewed the relevance of the articles by screening titles and abstracts. As recommended by Cochrane [22], we included articles from peer-reviewed journals only. The full text was retrieved when both researchers considered the paper relevant. Articles were eligible for review when they met the following predefined criteria: 1) the low-value service recommendation or measure in the paper matched the definition ‘services that provide no benefit to patients or may even cause harm’ [1]; 2) the low-value service recommendation or measure was described using clinical details such as diagnosis, patient population and treatment. We removed duplicate articles, replies or commentaries, and theoretical or discussion articles that did not present any low-value service recommendations or measures. Any disagreement between the reviewers was resolved by discussion and consensus.
We extracted general characteristics of the articles (i.e. name of first author, year of publication, country, aim of the paper, methods) and the measures (i.e. the name of the measure, the numerator, the denominator, exclusion criteria and direction). In addition, we retrieved the original source or reference of the measure.
Recommendations versus measures
The literature search yielded both recommendations and measures for low-value care. We considered a description of low-value care a ‘measure’ when at least a numerator and a denominator were specified. We identified the scope of both recommendations and measures, while the quality assessment was performed for the measures only.
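The distinction between a recommendation and a measure can be sketched as a small data structure. The fields mirror the characteristics extracted from the articles (name, numerator, denominator, exclusion criteria, direction); the class name, the helper functions and the example item with its counts are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LowValueCareItem:
    """One low-value care description taken from an article."""
    name: str
    numerator: Optional[str] = None       # who received the low-value service
    denominator: Optional[str] = None     # population at risk of receiving it
    exclusion_criteria: Optional[str] = None
    direction: Optional[str] = None       # e.g. "lower is better"


def is_measure(item: LowValueCareItem) -> bool:
    """A description counts as a measure only if both parts are specified."""
    return bool(item.numerator) and bool(item.denominator)


def rate(numerator_count: int, denominator_count: int) -> float:
    """Share of the denominator population receiving the service."""
    if denominator_count <= 0:
        raise ValueError("denominator must be positive")
    return numerator_count / denominator_count


# Hypothetical example: names and counts are invented for illustration.
recommendation = LowValueCareItem(name="Service X for condition Y")
measure = LowValueCareItem(
    name="Service X for condition Y",
    numerator="patients with condition Y who received service X",
    denominator="all patients with condition Y",
    direction="lower is better",
)

print(is_measure(recommendation), is_measure(measure))
print(rate(120, 1000))
```

Under this sketch, a recommendation without a specified numerator and denominator would enter the scope analysis but not the quality assessment.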
Categorizing low-value care recommendations and measures by function in health care
All recommendations and measures were categorized using the Classification of Health Care Functions (ICHA-HC) as defined by the Organization for Economic Co-operation and Development (OECD), the World Health Organization (WHO) and Eurostat [23]. The ICHA-HC provides a framework to classify services according to their purpose or function and is commonly used to compare medical services internationally. It covers the entire continuum of the health system, i.e. curative care, rehabilitative care, long-term care and preventive care. We subcategorized curative care into general (i.e. primary) care and specialized care. General care involves basic care such as routine examinations, basic maternity care, routine diagnosis and follow-up, prescriptions and vaccinations (unless they are covered under a preventive program) [23]. Specialized care involves more complex technology and is often a subdivision of the basic fields (e.g. neurosurgery or allergology) [23]. In addition, the measures were categorized according to the non-functional categories of ancillary services (i.e. laboratory, imaging, transport) and medical goods (i.e. pharmaceuticals and therapeutic appliances).
Assessing the quality of low-value care measures
We assessed the quality of the measures by 1) analysing their development process and the level of evidence underlying the measures, and 2) analysing the validity of a selection of the measures.
Development process and level of evidence
We distinguished two groups: A) articles that translated low-value service recommendations into low-value care measures, and B) articles that used measures previously developed by institutions. For both groups we reviewed how the measures were developed.
For group A, we searched for evidence underlying the recommendations. We categorized each measure based on the evidence, distinguishing three levels of evidence: 1) a combination of evidence from the literature (trial or review), guidelines and CW, United States Preventive Services Task Force (USPSTF) or National Institute for Health and Care Excellence (NICE) recommendations, 2) evidence from the literature (trial or review) or guidelines, and 3) evidence not found. As the criteria for developing CW recommendations do not prescribe the level of evidence required [7], we labelled measures supported by CW, USPSTF or NICE evidence only as ‘unknown’. We valued the first level highest and the third level lowest.
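The grading rule for group A can be written down as a small decision function. This is a sketch of one possible reading of the rule, not the authors' actual procedure: it assumes that level 1 requires literature evidence, a guideline and a CW, USPSTF or NICE recommendation together, and that any literature or guideline evidence short of that combination counts as level 2. The function and argument names are hypothetical.

```python
RECOMMENDATION_BODIES = {"CW", "USPSTF", "NICE"}


def evidence_level(literature: bool, guideline: bool,
                   recommendation_sources=()) -> str:
    """Grade a group A measure under the assumed reading of the rule.

    literature: a trial or review supports the measure
    guideline: a clinical guideline supports the measure
    recommendation_sources: bodies whose recommendation supports it
    """
    has_recommendation = bool(RECOMMENDATION_BODIES & set(recommendation_sources))
    if literature and guideline and has_recommendation:
        return "level 1"   # combination of all sources (assumed reading)
    if literature or guideline:
        return "level 2"   # literature or guideline evidence
    if has_recommendation:
        return "unknown"   # CW, USPSTF or NICE evidence only
    return "level 3"       # evidence not found


print(evidence_level(True, True, {"CW"}))     # level 1
print(evidence_level(True, False))            # level 2
print(evidence_level(False, False, {"NICE"})) # unknown
print(evidence_level(False, False))           # level 3
```

Writing the rule this way makes explicit that ‘unknown’ is a separate label from level 3: a measure backed only by a recommendation list has a source, but the strength of that source cannot be judged.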
For group B, we distinguished the same levels of evidence. However, here we specifically searched for elements of a quality label indicating the soundness of the measure. A National Quality Forum (NQF) endorsement corresponds with the qualification of ‘minor or no evidence gaps’ [20]. Measures with such a qualification have the strongest evidence base regarding importance, face validity, criterion validity, construct validity and risk adjustment [20]. Therefore, NQF-endorsed measures were valued highest. The Agency for Healthcare Research and Quality (AHRQ) and the Centers for Medicare and Medicaid Services’ (CMS) Quality provide information on the level of evidence by specifying the literature underpinning the measure. Therefore, measures from these sources were valued second best.
For both groups, our assessment was limited to the evidence provided in the reviewed article and the first document retrieved by reference tracking.
Validity
We selected a subset of five unique measures in order to gain insight into the quality of the low-value care measures. Ideally, we would have extensively assessed the validity of each measure. However, for 115 measures this was beyond the scope of this review. Therefore, we chose the five unique measures that appeared most frequently in the reviewed articles, assuming that more information on validity would be available for these measures. For these five measures, we searched for evidence regarding the measures’ validity by reviewing the original source and by reference tracking. In addition, we performed a PubMed search using key words from the name of the measures (i.e. diagnosis and procedure) and “low-value” or “overuse”, augmented with “validity”. Specifically, we searched for studies that aimed to assess the validity of the selected low-value care measures. In doing so, we distinguished between the most commonly used types of validity (as seen in e.g. [20, 24, 25]): face validity, coding/criterion validity (i.e. reflecting actual cases of low-value care) and construct validity (i.e. supported by correlations to other measures indicating low-value care) [20]. Face validity refers to the empirical or clinical rationale of the measure, and therefore we used the information from Table 2 for this criterion.
Discussion
To the best of our knowledge, this is the first systematic literature review identifying, categorizing and assessing the scope and quality of low-value care measures. We obtained 115 low-value care measures from the literature. Out of these 115 measures, 87 focused on the cure sector (primary and specialized care), 25 on secondary prevention and 3 on long-term care. Most measures (n = 62) originated from low-value care recommendations, while 53 were previously developed by institutions such as the National Quality Forum. Three measures were assigned the highest level of evidence, as they were underpinned by both guidelines and literature evidence. For the other measures, such a level of evidence was not apparent. We do not conclude that these measures are invalid, because validity tests may not have been performed at all. Nevertheless, at the very least, evidence is lacking. Our search yielded no information on coding/criterion validity and construct validity for the included subset of measures in this emerging field. Despite this, most measures are currently used in practice.
Low-value care measures have received increased attention and are now used for monitoring purposes, alignment of financial incentives [13, 29] and, in the foreseeable future, in shared saving programs such as the Alternative Quality Contract (AQC) [30]. In this manner, low-value care measurement may incentivize providers and insurers to shift resources from low-value services to high-value services [31]. Our findings show that more attention is needed for the evidential underpinning and quality of these measures. Otherwise, the lack of transparency and evidence will reduce acceptance of low-value care measures by their users. Additionally, using measures of low quality might lead to negative consequences including underuse of indicated services, cost-shifting, damage to the patient-physician relationship, provider dissatisfaction, adverse health effects, or patient selection [17].
Our review showed that more than half of the low-value care measures originated from low-value service recommendations (i.e. CW, NICE, USPSTF). This implies that the empirical evidence of many low-value care measures is based on the evidence supporting the underlying low-value service recommendations. However, the criteria for the development of recommendation lists remain rather vague in the CW initiative, as well as in other similar campaigns [7]. Therefore, more transparency regarding the evidential underpinning of the recommendations is needed. Apart from the importance of evidence underlying both low-value service recommendations and measures, one should be aware that the aim of low-value service recommendations differs from the aim of low-value care measures. The aim of CW recommendations is patient and physician awareness, while the aim of low-value care measures may be to inform decisions on several levels. Consequently, the requirements for the quality and development of recommendations and measures vary accordingly.
We found that most current low-value care measures are concentrated in the cure sector, even though it has been argued that low-value services are provided and used along the entire continuum of care [21]. For example, we found only four low-value care recommendations (that could possibly be transformed into low-value care measures) in rehabilitative care and none in the health promotion domain. This is probably the result of most measures originating from the CW initiative, which has its origin in the cure sector. While we acknowledge the emerging state of the field of research, we emphasize that similar consensus-based efforts are needed to stimulate the development of measures in other settings to broaden the scope and impact of the low-value care concept.
Given the potential impact of using low-value care measures, it is essential that guidelines for developing them be created by combined efforts of the involved parties: physicians, citizens, government and insurers [17, 32]. We do not suggest creating an evidence base for each health care intervention demonstrating all circumstances in which it is not effective. This would prove an infeasible exercise. Expert judgement by the clinician will always remain necessary to some degree. Therefore, other types of information, e.g. from studies on practice variation in procedure rates or cost-effectiveness studies, will remain necessary to identify inefficiencies in healthcare, especially when high-quality low-value care measures are not available. We do propose using expert opinion from initiatives such as Choosing Wisely as a starting point for monitoring low-value care. These qualitative information sources can be complemented with new scientific insights. For example, the insight that certain genes predict the development of breast cancer can be used to prevent a considerable amount of low-value care utilization. Still, as soon as we start measuring and monitoring low-value care in such areas, it will be of particular interest to fully specify and define all measurement information, such as exclusion criteria, direction and evidence supporting the measure, and to make this publicly available. Furthermore, low-value care measures should be extensively tested regarding their level of evidence and validity before they are implemented in practice, and this applies in particular to the measures that are already in use. Recently, articles have started studying aspects that are closely related to validity. For example, Schwartz et al. [2] found that sensitivity and specificity depend strongly on the definition of the measures. Notwithstanding the efforts already made, we stress the importance of studying the validity of the measures specifically. Another area of research would be to further standardize low-value care measures, which ideally would result in alignment of the low-value care metrics and in determining specifically for which subgroup or population a service is of low value [2, 33]. Moreover, the guidelines should take into account any differences between countries in terms of the availability and provision of healthcare services that are likely to occur due to cultural or economic differences.
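The dependence of sensitivity and specificity on the definition of a measure can be made concrete with a minimal sketch. All counts below are invented for illustration and do not come from Schwartz et al. or from any reviewed study; the two "definitions" are hypothetical broad and narrow versions of the same measure, compared against a reference judgement of whether each service was truly of low value.

```python
def sensitivity_specificity(tp: int, fp: int, fn: int, tn: int):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp: low-value services flagged by the measure
    fn: low-value services the measure missed
    fp: appropriate services wrongly flagged as low-value
    tn: appropriate services correctly left unflagged
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one true low-value and one appropriate case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical counts: 100 truly low-value and 200 appropriate services.
broad = sensitivity_specificity(tp=90, fp=40, fn=10, tn=160)
narrow = sensitivity_specificity(tp=60, fp=5, fn=40, tn=195)

print("broad definition: ", broad)
print("narrow definition:", narrow)
```

In this invented example the broad definition captures more of the truly low-value services but also flags more appropriate care, while the narrow definition does the reverse. Which trade-off is acceptable depends on the intended use of the measure, for instance monitoring versus payment.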
Another important issue that needs further attention is the data requirements. Measuring low-value care utilization requires information on services provided to patients in combination with diagnosis and possibly additional patient characteristics. It is not clear to what extent current data sources can provide this information [2, 3], since rather detailed data need to be registered and data sources, such as claims data and detailed (hospital) registration data, need to be linked in order to retrieve the necessary information.
Limitations
Our study has two main limitations. First, we did not evaluate the quality of each individual measure. Ideally, we would have extensively assessed the validity of each measure. Performing this task for 115 measures was, however, beyond the scope of this review. Nonetheless, we made a first attempt at assessing the validity of the five measures that appeared most often in the literature and highlighted several important general quality issues. Second, we did not include grey literature in our search. Therefore, we may have missed relevant measures. Nevertheless, for the purpose of our review, namely to systematically map the state of affairs of low-value care measurement, we are confident that the publications we did use provided sufficient evidence.
Acknowledgments
We are grateful to Margje H. Haverkamp, PhD (M.H.Haverkamp@lumc.nl) for helpful comments on an earlier version of this manuscript.