The process of ‘mapping’ onto generic preference-based outcome measures is increasingly used to generate health utilities for application within health economic evaluations [1]. Mapping involves the development and use of an algorithm (or algorithms) to predict the primary outputs of generic preference-based outcome measures, i.e. health utility values, using data on other indicators or measures of health. The source predictive measure may be a non-preference-based indicator or measure of health outcome or, more exceptionally, a preference-based outcome measure that is not preferred by the local health technology assessment agency. The algorithm(s) can subsequently be applied to data from clinical trials, observational studies or economic models containing the source predictive measure(s) to predict health utility values in contexts where the target generic preference-based measure is absent. The predicted health utility values can then be analysed using standard methods for individual-level data (e.g. within a trial-based economic evaluation), or summarised for each health state within a decision-analytic model.
Over recent years there has been a rapid increase in the publication of studies that use mapping techniques to predict health utility values, and databases of published studies in this field are beginning to emerge [2]. Some authors [3] and agencies [4] concerned with technology appraisals have issued technical guides for the conduct of mapping research. However, guidance for the reporting of mapping studies is currently lacking. In keeping with health-related research more broadly [5], mapping studies should be reported fully and transparently to allow readers to assess the relative merits of the investigation [6]. Moreover, regulatory and reimbursement decisions for new technologies that are informed by misleading findings from mapping studies may carry significant opportunity costs. These considerations led to the development of the MAPS (MApping onto Preference-based measures reporting Standards) reporting statement, which we summarise in this paper.
The aim of the MAPS reporting statement is to provide recommendations, in the form of a checklist of essential items, which authors should consider when reporting a mapping study. It is anticipated that the checklist will promote complete and transparent reporting by researchers. The focus, therefore, is on promoting the quality of reporting of mapping studies, rather than the quality of their conduct, although it is possible that the reporting statement will also indirectly enhance the methodological rigour of the research [7]. The MAPS reporting statement is primarily targeted at researchers developing mapping algorithms, the funders of the research, and peer reviewers and editors involved in the manuscript review process for mapping studies [5, 6]. In the reporting statement, the term ‘mapping’ is used to cover all approaches that predict the outputs of generic preference-based outcome measures using data on other indicators or measures of health, and encompasses related terms used by some researchers, such as ‘cross-walking’ or ‘transfer to utility’ [1, 8]. Similarly, the term ‘algorithm’ is used in its broadest sense to encompass both statistical associations and more complex series of operations.
The development of the MAPS statement
The development of the MAPS reporting statement was informed by recently published guidance for health research reporting guidelines [5] and was broadly modelled on other recent reporting guideline developments [9–14]. A working group comprising six health economists (SP, ORA, HD, LL, MO, AG) and one Delphi methodologist (RF) was formed following a request from an academic journal to develop a reporting statement for mapping studies. One of the working group members (HD) had previously conducted a systematic review of studies mapping from clinical or health-related quality of life measures onto the EQ-5D [2]. Drawing on the search terms from this systematic review, and on other relevant articles and reports already in our possession, we conducted a broad search for reporting guidelines for mapping studies. This search confirmed that no reporting guidance had previously been published. The working group members therefore developed a preliminary de novo list of 29 reporting items and accompanying explanations. Following further review by the working group members, this was distilled into a list of 25 reporting items and accompanying explanations.
Members of the working group identified 62 possible candidates for a Delphi panel from a pool of active researchers and stakeholders in this field. The candidates included individuals from academic and consultancy settings with considerable experience in mapping research, representatives from health technology assessment agencies that routinely appraise evidence informed by mapping studies, and biomedical journal editors. Health economists from the MAPS working group were included in the Delphi panel. A total of 48 of the 62 (77.4 %) individuals agreed to participate in a Delphi survey aimed at developing a minimum set of standard reporting requirements for mapping studies with an accompanying reporting checklist.
The Delphi panellists were sent a personalised link to a Web-based survey, which had been piloted by members of the working group. Non-responders were sent up to two reminders, after 14 and 21 days. The panellists were anonymous to each other throughout the study, and their identities were known to only one member of the working group. The panellists were invited to rate the importance of each of the 25 candidate reporting items identified by the working group on a 9-point rating scale (1, “not important”, to 9, “extremely important”); describe their confidence in their ratings (“not confident”, “somewhat confident” or “very confident”); comment on the candidate items and their explanations; suggest additional items for consideration by the panellists in subsequent rounds; and provide any other general comments. The candidate reporting items were ordered within six sections: (i) title and abstract; (ii) introduction; (iii) methods; (iv) results; (v) discussion; and (vi) other. The panellists also provided information about their geographical area of work, gender, and primary and additional work environments. The data were imported into Stata (version 13; StataCorp, College Station, TX) for analysis.
A modified version of the Research ANd Development (RAND)/University of California Los Angeles (UCLA) appropriateness method was used to analyse the round one responses [15]. For each item $i$ being rated, this involved calculating the median score, the inter-percentile range (IPR) between the 30th and 70th percentiles, and the inter-percentile range adjusted for symmetry (IPRAS). The IPRAS includes a correction factor for asymmetric ratings, and panel disagreement was judged to be present for item $i$ if $\mathrm{IPR}_i > \mathrm{IPRAS}_i$ [15]. We modified the RAND/UCLA approach by asking panellists about ‘importance’ rather than ‘appropriateness’ per se. Importance was assessed using the classic RAND/UCLA categories, based simply on whether the median rating fell between 1 and 3 (unimportant), 4 and 6 (neither unimportant nor important), or 7 and 9 (important) [15].
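The item-level calculations can be sketched as follows. This is a minimal Python illustration, not the authors' Stata analysis. It assumes the IPRAS definition from the RAND/UCLA Appropriateness Method User's Manual, $\mathrm{IPRAS}_i = \mathrm{IPR}_r + \mathrm{AI}_i \times \mathrm{CFA}$, where $\mathrm{IPR}_r = 2.35$ is the IPR required for disagreement under perfect symmetry, $\mathrm{CFA} = 1.5$ is the correction factor for asymmetry, and the asymmetry index $\mathrm{AI}_i$ is the absolute distance between 5 (the centre of the 9-point scale) and the midpoint of the 30th–70th percentile range. The percentile definition (linear interpolation), the binning of half-point medians and the function name `rate_item` are further assumptions.

```python
import statistics

IPR_R = 2.35  # IPR required for disagreement under perfect symmetry (assumed from the RAND/UCLA manual)
CFA = 1.5     # correction factor for asymmetry (assumed from the RAND/UCLA manual)


def percentile(sorted_scores, p):
    """Percentile by linear interpolation between order statistics (one of several definitions)."""
    pos = p * (len(sorted_scores) - 1) / 100
    lower = int(pos)
    upper = min(lower + 1, len(sorted_scores) - 1)
    return sorted_scores[lower] + (pos - lower) * (sorted_scores[upper] - sorted_scores[lower])


def rate_item(scores):
    """Summarise one item's 1-9 ratings with RAND/UCLA-style importance and disagreement labels."""
    s = sorted(scores)
    median = statistics.median(s)
    p30, p70 = percentile(s, 30), percentile(s, 70)
    ipr = p70 - p30
    asymmetry_index = abs(5 - (p30 + p70) / 2)  # distance of the IPR midpoint from the scale centre
    ipras = IPR_R + asymmetry_index * CFA
    if median >= 7:
        importance = "important"
    elif median <= 3:
        importance = "unimportant"
    else:
        # Also catches half-point medians such as 3.5 or 6.5 (an assumption).
        importance = "neither unimportant nor important"
    return {"median": median, "ipr": ipr, "ipras": ipras,
            "importance": importance, "disagreement": ipr > ipras}


print(rate_item([8, 7, 9, 8, 7, 8, 9, 6, 8, 7]))  # hypothetical consensus ratings
print(rate_item([1, 1, 1, 1, 1, 9, 9, 9, 9, 9]))  # hypothetical polarised ratings
```

The asymmetry correction widens the tolerated IPR when ratings cluster near one end of the scale, so tightly grouped high ratings are not flagged as disagreement, whereas a panel split between the extremes is.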
The results of round one of the Delphi survey were reviewed at a face-to-face meeting of the working group. A total of 46 of the 48 (95.8 %) individuals who agreed to participate completed round one of the survey. Of the 25 items, 24 were rated as important, with one item (“Source of Funding”) rated as neither unimportant nor important. There was no evidence of disagreement in the ratings of any item according to the RAND/UCLA method, and these findings did not change when the responses of the MAPS working group members were excluded. Based on the qualitative feedback received in round one, the items describing “Modelling Approaches” and “Repeated Measurements” were merged, as were the items describing “Model Diagnostics” and “Model Plausibility”. In addition, the wording of several recommendations and their explanations was amended in the light of qualitative feedback from the panellists.
Panellists participating in round one were invited to participate in a second round of the Delphi survey and were provided with a summary of the revisions made following round one, including a document in which revisions to each of the recommendations and explanations were displayed as tracked changes. They also received group outputs (mean scores and their standard deviations, median scores and their IPRs, histograms and RAND/UCLA labels of importance and agreement level) summarising the round one results (with disaggregated outputs for the merged items), and could view their own round one scores for each item (with disaggregated scores for the merged items). Panellists were offered the opportunity to revise their rating of the importance of each item and were informed that their round one rating would otherwise hold; for the merged items, new ratings were solicited. Panellists were also invited to provide further comments on each item or any other information that might be helpful to the group. Non-responders to the second round of the Delphi survey were sent up to two reminders, after 14 and 21 days. The round two data were analysed using the same methods as in round one.
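The round two rating rule (a panellist's round one rating stands unless revised, whereas merged items require a fresh rating) can be expressed as a small helper. This is a hypothetical sketch for illustration only; the function name and the use of `None` for a missing rating are assumptions rather than part of the published method.

```python
from typing import Optional


def round_two_rating(round_one: Optional[int], revision: Optional[int], merged: bool) -> Optional[int]:
    """Return the rating carried into the round two analysis for one panellist and item.

    round_one: the panellist's round one rating (1-9), or None if unavailable.
    revision: a rating given in round two, or None if the panellist did not revise it.
    merged: True for items formed by merging two round one items.
    """
    for rating in (round_one, revision):
        if rating is not None and not 1 <= rating <= 9:
            raise ValueError(f"rating {rating} is outside the 1-9 scale")
    if merged:
        # Merged items were re-rated from scratch, so no round one rating carries forward.
        return revision
    # Otherwise the round one rating holds unless the panellist revised it.
    return revision if revision is not None else round_one
```

For example, `round_two_rating(7, None, merged=False)` keeps the round one rating of 7, while `round_two_rating(6, None, merged=True)` yields no rating for a merged item the panellist did not re-rate.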
The results of the second round of the Delphi survey were reviewed at a face-to-face meeting of the working group. A total of 39 of the 46 (84.8 %) panellists who completed round one also completed round two. All 23 items included in the second round (reduced from 25 by the two mergers) were rated as important, with no evidence of disagreement in the ratings of any item according to the RAND/UCLA method. Qualitative feedback from the round two panellists led to minor modifications to the wording of a small number of recommendations and their explanations. These modifications were fed back to the round two respondents, who were given a final opportunity to comment on the readability of the final set of recommendations and explanations. Based on these methods, a final consensus list of 23 reporting items was developed.