Background
Methods
Clinical routine data sources: secondary care:
◦ The Health and Social Care Information Centre (HSCIC)
◦ The NHS Wales Informatics Service (NWIS)
◦ The NHS National Services Scotland; Information Services Division (ISD)

Clinical routine data sources: primary care:
◦ The Clinical Practice Research Datalink (CPRD)
◦ ResearchOne
◦ QResearch
◦ The Health Improvement Network (THIN) database
◦ North West eHealth (NWEH)

Non-clinical routine data sources:
◦ The Office for National Statistics (ONS)
◦ HM Revenue and Customs (HMRC)
◦ The Department for Work and Pensions (DWP)
◦ The Driver and Vehicle Licensing Agency (DVLA)

‘Linked’ routine data sources:
◦ The Secure Anonymised Information Linkage (SAIL) databank
◦ The Administrative Data Research Network (ADRN)
Results
Clinical routine data sources: secondary care
The Health and Social Care Information Centre (HSCIC)
Data access for clinical research: The Data Access Request Service provides a method of access to a number of routinely collected datasets for England. Hospital Episode Statistics (HES) provides clinical, health and socioeconomic data for all secondary-care attendances in England. Datasets include Accident and Emergency, Admitted Patient, Outpatient, Adult Critical Care, Maternity and selected Patient Reported Outcome Measures.
Previous experience in clinical research: |
The NHS Wales Informatics Service (NWIS)
Data access for clinical research: Data access can be facilitated through The Public Health Wales Observatory. The Patient Episode Database for Wales (PEDW) provides clinical, health and socioeconomic data for all secondary-care attendances in Wales and is broadly comparable to the Admitted Patient HES dataset, with data regarding elective and emergency admissions and maternity care recorded. Additional datasets of relevance to this study include the Emergency Department and Outpatient Datasets.
Previous experience in clinical research: PEDW data have been accessed for retrospective analyses; for example, analysis of the incidence of obstetric complication rates [25] |
The NHS National Services Scotland; Information Services Division (ISD)
Data access for clinical research: The electronic Data Research and Innovation Service (eDRIS) provides a method of access to ISD datasets including Outpatient, General Acute/Inpatient, Emergency Department, Unscheduled Care, GP Out of Hours and The Prescribing Information System. Clinical, health and socioeconomic data are recorded and datasets are largely comparable to HSCIC HES.
Previous experience in clinical research: ISD data have been accessed for retrospective linkage studies; for example, analysis of the incidence of gastrointestinal bleeding and complications including mortality [27] |
Clinical routine data sources: primary care
The Clinical Practice Research Datalink (CPRD)
Data access for clinical research: CPRD is a governmental research service jointly funded by the NHS National Institute for Health Research and the Medicines and Healthcare products Regulatory Agency. Following approval by the Independent Scientific Advisory Committee, CPRD provides access to de-identified primary-care clinical, health and socioeconomic data for a geographically representative 13 million patients in England for health care research.
Previous experience in clinical research: CPRD data have been used in retrospective studies for estimating health care resource use, prescription medicines and clinical outcomes [22]. Gulliford conducted two cluster-randomised trials using CPRD: one aimed to reduce inappropriate antibiotic prescribing for acute respiratory infection; the other aimed to increase physician adherence with secondary prevention interventions after first stroke [8] |
ResearchOne [29]
Data access for clinical research: ResearchOne is a collaboration between The University of Leeds and The Phoenix Partnership (TTP), developers of the SystmOne clinical database and IT system. De-identified clinical, health and socioeconomic data are available from primary, secondary and out-of-hours care settings for approximately 26 million patients in the UK.
Previous experience in clinical research: |
QResearch [31]
Data access for clinical research: QResearch is a collaboration between The University of Nottingham and the developers of the EMIS IT systems. De-identified clinical, health and socioeconomic data are available for approximately 18 million patients in the UK.
Previous experience in clinical research: QResearch data have been used to measure clinical outcomes in case-control and cohort studies [32] |
The Health Improvement Network (THIN) database
Data access for clinical research: THIN is a collaboration between IMS Health and In Practice Systems, developers of the IT software Vision. De-identified clinical, health and socioeconomic data are available for approximately 11.1 million patients in the UK.
Previous experience in clinical research: THIN data have been accessed to measure clinical outcomes in cohort and case-control studies [34] |
North West eHealth (NWEH)
Data access for clinical research: NWEH is a collaboration between The University of Manchester, Salford Royal Foundation Trust and Salford Clinical Commissioning Group. NWEH has developed the methodology and governance framework to implement the Salford Integrated Record, an integrated primary- and secondary-care electronic medical record, into research as part of the Salford Lung Study [14]. The infrastructure permits access to secondary-care electronic medical records accessed through the HSCIC Secondary Uses Service. With participant and GP practice enrolment and consent, the Apollo [36] and Graphnet [37] data-extraction tools are employed to extract participant primary-care electronic medical records that can then be linked to data regarding secondary care. North West eHealth is unique in that data are not de-identified and, therefore, participant consent is required. Furthermore, GP practice enrolment and consent are required to permit the installation of third-party software on their systems and subsequent extraction of data.
Previous experience in clinical research: NWEH offers a number of primary-care research tools including a randomised controlled trial (RCT) recruitment feasibility assessment, but does not currently routinely provide a bespoke primary-care data-extraction service for research. However, the methodology for this process has been demonstrated [14] |
Non-clinical routine data sources
The Office for National Statistics (ONS)
Data access for clinical research: The ONS records individual-level mortality data and aggregate economic and societal statistics that may inform clinical and health economic analyses. Mortality data can be requested through application to the HSCIC DARS. Aggregate data can be accessed via services provided by ONS such as NOMIS [39] and Data for Neighbourhoods and Regeneration [40]. The smallest reported level is the Lower Layer Super Output Area (LSOA) consisting of a population of 1000–3000.
Previous experience in clinical research: ONS mortality data have been accessed to measure mortality in retrospective and prospective studies [23] |
HM Revenue and Customs (HMRC)
Data access for clinical research: HMRC is the UK’s national tax authority and is responsible for taxation, including National Insurance and student loan repayments, and for the administration of tax credits, child benefit and statutory sick and maternity pay. Individual-level data on employment and tax contributions are recorded and likely to inform health and socioeconomic analyses. The HMRC Datalab provides a means to access de-identified, aggregate HMRC data for research. Once ‘approved researcher’ status has been gained, an application must benefit the listed functions of HMRC.
Previous experience in clinical research: There was no evidence of individual-level HMRC data being accessed for clinical research in a scoping search performed in MEDLINE via OVID |
The Department for Work and Pensions (DWP)
Data access for clinical research: The DWP is responsible for welfare including the provision of state pensions, benefits and child maintenance. Individual-level data regarding employment and welfare are likely to inform health and socioeconomic analyses and de-identified, aggregate data are available for social research.
Previous experience in clinical research: There was no evidence of individual-level DWP data being accessed for clinical research in a scoping search performed in MEDLINE via OVID |
The Driver and Vehicle Licensing Agency (DVLA)
Data access for clinical research: The DVLA is responsible for the licensing of drivers and vehicles in the UK and issuing, reviewing and maintaining guidance regarding driving licence status in the context of medical diagnoses. The legal requirement for driving licence holders to inform the DVLA of the occurrence of seizures and, subsequently, to regain normal driving privileges after a specified period of seizure freedom raises the possibility of DVLA providing an accurate data source to inform the clinical outcome measures in epilepsy research.
Previous experience in clinical research: The DVLA publishes limited de-identified, aggregate datasets for research, usually involving driving restrictions. There was no evidence of individual-level DVLA data being accessed for clinical research in a scoping search performed in MEDLINE via OVID |
‘Linked’ routine data sources
The Secure Anonymised Information Linkage (SAIL) Databank is an initiative developed by Swansea University and funded by the Welsh Government. SAIL provides a method of access to individual-level, routinely recorded, de-identified electronic data for patients across Wales to support research [12]. Access to clinical datasets provided by NWIS is complemented with numerous non-clinical administrative datasets including births, deaths and demographic data. Following the scoping process a formal application is submitted to the Information Governance Review Panel before access to data is granted. SAIL data have been accessed to measure clinical outcomes in retrospective research [13].
The Administrative Data Research Network (ADRN) is a UK-wide partnership between universities, government departments, national statistics authorities, funders and researchers, funded by the Economic and Social Research Council. ADRN provides a method of access to a number of non-clinical administrative routine datasets including employment, socioeconomic, crime and education data [11] in addition to clinical datasets detailed previously such as those recorded by HSCIC. Following development of a project proposal a formal application is reviewed by the Approvals Panel before access to data is granted.
Challenges and feasibility of access
Routine data source | Summary of key application milestones | Cost structure |
---|---|---|
The Health and Social Care Information Centre (HSCIC) |
August 2015: first request to review Participant Information Sheet (PIS) and Consent Form. Sent by enquiries desk to the Data Access Request Service (DARS)
4 November 2015: second request to review PIS and Consent Form. Sent by enquiries desk to Data Access and Information Sharing Team (DAIS)
23 November 2015: no feedback yet received. PIS and Consent Form discussed with a member of the DARS team in person at a HSCIC engagement event. Informed that a full, formal application would be required in order for HSCIC to provide feedback on the PIS and Consent Form. This was completed and submitted on 26 November
7 December 2015: response regarding PIS and Consent Form. Informative teleconference with a member of the DARS team
22 December 2015: response from the DAIS team in response to the second request on 4 November 2015. Teleconference provided feedback, in agreement with that received from the DARS team on 7 December
29 February 2016: as directed by HSCIC, submission of a new formal application using the existing application process
18 April 2016: formal acknowledgment of submission. Requested to submit the application via the DARS Online Portal
22 April 2016: formal application submitted via DARS Online Portal
24 May 2016: Data Access Advisory Group (DAAG) review. Caveats to be addressed before approval
26 May 2016: caveats addressed, application updated and re-submitted
13 July 2016: DAAG approved. Hospital Episode Statistics (HES) data available for download | Standard cost recovery structure applied:
£1000 new application
£900 release fee
£500 3-year agreement
£300 per dataset per year
|
The Secure Anonymised Information Linkage Databank (SAIL) |
22 April 2015: first contact regarding application process and association with the Administrative Data Research Network (ADRN)
June 2015: informative teleconference regarding the SAIL application process and scoping procedure
7 July 2015: protocol regarding methods specific to SAIL submitted
August 2015: request to review PIS and Consent Form. Sent to information governance officer for review
September 2015: feedback on PIS and Consent Form from information governance officer. Scoping document issued by SAIL
January 2016: final review of PIS/Consent Form requested following revisions required for the other data sources
February 2016: submission of full, formal application
March 2016: feedback received following internal review with amendments suggested
April 2016: application re-submitted for formal Information Governance Review Panel (IGRP) review, outcome pending | Standard cost recovery structure applied:
£500 base cost
£291 data transfer to SAIL
£1455 individual-level data processing
£500 data transfer
|
The Clinical Practice Research Datalink (CPRD) |
November 2014: first contact regarding feasibility of the study, response received broadly confirming feasibility
August 2015: following protocol development, further contact regarding feasibility. Informed by CPRD that the Confidentiality Advisory Group and ethical approvals with HSCIC need to be updated to permit identifiable, linked data release and the timelines to resolve these are unclear. Furthermore, informed that compliance with HSCIC’s governance framework needs to be approved. No further contact as the issues with linked data release, cost and population coverage make CPRD not feasible for inclusion in this study | Standard cost recovery structure applied:
£7500 CPRD GOLD for <1000 patients
£4250 linked HES inpatient
£850 linked HES outpatient
£3000–5000 extraction, specification, assurance
|
QResearch, ResearchOne and The Health Improvement Network (THIN) Database |
September 2015: all organisations contacted. Confirmed that data are de-identified only, with no facility to re-identify patients as would be needed for this study. Data sources are, therefore, not feasible for inclusion in this study | N/A |
North West eHealth |
October 2015: first contact, the service is not routinely offered but feasibility of the process broadly confirmed
November 2015: correspondence via email to request review of the protocol, PIS and Consent Form, confirm the methodology and determine provisional costings. Further discussion during a face-to-face meeting at NWEH
December 2015: discussion with the third party, Apollo Medical Software Solutions, regarding the development of the data query to permit the extraction of data. Response received confirming the structure of the existing data query can be used for GP practices in Salford already holding a data-sharing agreement with NWEH, but a bespoke query would be required for this study
January 2016: final review of PIS/Consent Form requested and received
May 2016: < participants consented to inclusion in the study are registered in eligible GP practices; therefore, accessing data through NWEH is not feasible for this study | Bespoke NWEH costing:
£11027 data handling
£1575 data check
£1326 project manager
Apollo Medical costing:
£7200 data query development
CK Aspire costing:
£6800 GP recruitment
|
The Driver and Vehicle Licensing Agency (DVLA) |
October 2014: multiple attempts at contact to discuss the feasibility of the study, including telephone calls and email correspondence. No response received
February 2015: following discussion with a member of a DVLA expert committee, the DVLA medical advisor was contacted. The study was discussed with the DVLA data-sharing team and the response indicated that the DVLA would not have the capacity to assist with the study and the data-security requirements are ‘over and above the NHS or university’ | N/A |
The Department for Work and Pensions (DWP) and HM Revenue and Customs (HMRC) |
November 2014: first contact regarding feasibility of accessing DWP and HMRC data for this study. Request transferred to the DWP External Data Sharing and Advice Centre
December 2014: External Data Sharing Advice Centre responded. Data access directly with the DWP or HMRC would not be possible and my request should be redirected to ADRN | N/A |
The Administrative Data Research Network (ADRN) |
December 2014: first contact regarding feasibility for this study. No response received
February 2015: further contact regarding feasibility of the study. General information provided via email
March 2015: informative teleconference to discuss the study. ADRN confirmed that the study is eligible for their service and they can request access to the DWP/HMRC linked to clinical datasets, such as HES, provided by HSCIC. They agreed to contact the relevant data sources to determine the feasibility
April 2015: further teleconference, no significant progress
May 2015: further teleconference, HMRC have declined participation, the DWP remains pending. I am informed that if the DWP does not permit access to its data I cannot apply through ADRN solely for clinical datasets and independent applications must be submitted to the relevant organisations such as HSCIC
July 2015: informed that the DWP have not been forthcoming but negotiations are on-going and they are unlikely to have a confirmed response until September. No further feedback received | N/A |
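
The itemised cost structures in the table above can be totalled with simple arithmetic. The following is a minimal sketch in Python, not the costing method used by any of the data sources. It assumes that the listed line items are simply additive and that the HSCIC one-off fees (new application, release fee, 3-year agreement) are each charged once. The function names and the `n_datasets` and `n_years` parameters are hypothetical and are not figures reported for this study.

```python
# Itemised costs in GBP, transcribed from the cost-structure column above.
SAIL_ITEMS = {
    "base cost": 500,
    "data transfer to SAIL": 291,
    "individual-level data processing": 1455,
    "data transfer": 500,
}

NWEH_ITEMS = {
    "NWEH data handling": 11027,
    "NWEH data check": 1575,
    "NWEH project manager": 1326,
    "Apollo Medical data query development": 7200,
    "CK Aspire GP recruitment": 6800,
}


def total_cost(items):
    """Sum a dictionary of itemised costs (assumes items are additive)."""
    return sum(items.values())


def hscic_cost(n_datasets, n_years):
    """Estimate an HSCIC total for a hypothetical number of datasets and years.

    Assumption: the application, release and agreement fees are charged once,
    and the 300 GBP fee applies to each dataset for each year.
    """
    one_off_fees = 1000 + 900 + 500
    return one_off_fees + 300 * n_datasets * n_years


if __name__ == "__main__":
    print("SAIL total (GBP):", total_cost(SAIL_ITEMS))
    print("NWEH total (GBP):", total_cost(NWEH_ITEMS))
    # Hypothetical scenario: 2 datasets held for 3 years.
    print("HSCIC example (GBP):", hscic_cost(n_datasets=2, n_years=3))
```

Under these assumptions the fixed SAIL items sum to £2746 and the NWEH, Apollo Medical and CK Aspire items to £27928, which illustrates how widely the costs of access vary between sources.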
Clinical routine data sources
Non-clinical routine data sources
Discussion
Conclusions
Recommendations
General
|
Routinely recorded data are being used to measure randomised controlled trial (RCT) outcomes, yet the agreement, additional benefits and cost-efficiency of such data compared with data recorded through standard RCT methods are unknown
Further research should be performed to assess the agreement, additional benefits and cost-efficiency of accessing routinely recorded data to measure RCT outcomes compared to data collected through standard RCT methods
|
The costs required for data access from routine data sources vary widely, although all reportedly operate on a cost recovery, not-for-profit basis
Costs should be standardised and rationalised between routine data sources
|
The time lag before data are available in routine data sources represents a significant limitation to the access of routinely recorded data for prospective research, including RCTs
The infrastructure and procedures should be developed to reduce the time lag seen in routinely recorded data sources
|
The requirement for linkage between sources of routinely recorded data has been observed and improvements are on-going; for example, with the establishment of the Administrative Data Research Network (ADRN)
A standardised set of identifying variables could be recorded by all (clinical and non-clinical) data sources to improve the accuracy of data linkage, similar to a Core Outcome Set for clinical trials [44] |
The public mistrust in the sharing and linking of routinely recorded data will hamper future efforts to develop routinely recorded databases, despite the likely benefits to individual patients and the population
Further research and public engagement should be undertaken to define the issues of most importance to the public and develop strategies to address these
|
Clinical routine data sources
|
There are numerous requirements to meet before applying, and criteria to fulfil on submission of an application, yet the guidance and support available during development of an application remain limited
Formalise and improve access to guidance and review of study materials during the ‘pre-application stage’
|
There is national coverage of routinely recorded secondary-care data, yet primary-care coverage remains patchy, based on geographical area or GP IT system
Develop the primary-care data sources to provide national coverage, either through collaboration of existing sources and data linkage or development of national data sources, such as the General Practice Extraction Service
|
Non-clinical routine data sources
|
Access to non-clinical data sources to inform clinical research was not possible during this study, despite the significant potential to inform Health Technology Assessment and the increasing importance of such assessments in a health care system where resources are increasingly limited
To assist with Health Technology Assessment, and particularly the analysis of health economic outcomes, urgent research is required to consider facilitating access to individual-level, identifiable data from non-clinical sources. This would include:
1. Research regarding the public perception and acceptability of using their personal economic data for clinical research
2. Internal review within non-clinical sources, such as the DWP and HMRC, to assess the feasibility and limitations of permitting access to data for clinical research
3. Formalisation of the approval processes through the independent party, the ADRN, for access to non-clinical administrative data; currently, following internal approval, the ADRN negotiates access to administrative data on a project-by-project basis
|