Background
Recruitment to clinical trials in primary care can be challenging [1]. Recent papers in Trials have reported a variety of strategies to improve trial recruitment [2‐4]. Drawing on the expanding field of health informatics, we report on a strategy to identify potential trial participants using routinely collected anonymised data that complements other approaches to this question [5]. Virtually all general practices in the UK hold patient medical records in electronic format. This level of computerisation is in line with the NHS 1998 Information for Health Strategy's goal of full implementation of person-based Electronic Health Records (EHRs) at the primary care level by 2005. Routinely collected data are recorded in both narrative and structured formats. In the structured format, data are represented as codes. The coding system adopted by the Department of Health for general practice is the Read Terminology [6], although plans are underway to migrate to Systematised Nomenclature of Medicine - Clinical Terms (SNOMED CT) [6], which has been selected as the standard terminology scheme for the NHS Care Records Service and for the National Programme for IT and will eventually replace the current Clinical (Read) codes. Large volumes of routinely collected data held in electronic format are becoming increasingly available. Improvements in data quality, together with technological advances and growing expertise in retrieving, transporting, storing, linking and analysing these data, are leading to Health Informatics emerging as a field rich with potential for research [7].
Most randomised studies in general practice use conventional methods for patient selection, recruitment and data collection. One conventional method is General Practitioner (GP) referral to research trials. Doctors normally recruit when patients present themselves at appointments. However, a limitation of this method is that no referral will be made for patients who do not attend their appointments, so little can be said about the generalisability of the data. Other recruitment strategies include manual searches through patient records, or database searches using diagnostic criteria, to select patients who are then sent participant information sheets. Further strategies include use of multi-media, such as the internet [5], newsletters and mail shots. All searches for potential participants that involve disclosure of identifiable information (without patients' consent) are undertaken by the patients' direct healthcare team. However, if there is no other practicable alternative to conducting the research, an application can be made to the National Information Governance Board (NIGB). NIGB oversees applications for the common law duty of confidentiality to be set aside in specific circumstances, in accordance with Section 251 of the NHS Act [8].
Many trials in primary care fail to achieve satisfactory levels of recruitment. Difficulties in reaching target recruitment numbers within fixed timeframes have been observed as common problems [9]. A number of barriers to clinician participation have been identified, including time constraints, lack of staff and training, and concern about the impact on the doctor-patient relationship [10, 11]. In addition, barriers to GP referrals in depression trials have included the unsuitability of the content and style of depression consultations and the perceived intrusiveness of introducing research into a complex consultation [12]. It seems that the demands on patients and clinicians need to be kept to a minimum [10].
Routine data may overcome some of these issues. They may eliminate the need for doctors to identify suitable patients when they attend the practice. A significant advantage is that larger numbers of suitable patients can be identified by this method in a shorter period of time, thus maximising recruitment and minimising costs. However, it should be noted that routine data require validation, which needs to be factored into resource and economic planning.
The Health Information Research Unit (HIRU) [13], based in the School of Medicine, Swansea University, has been formed to harness the potential of routinely collected data. HIRU has established the Secure Anonymised Information Linkage (SAIL) databank, a large repository of anonymised person-level data provided by an expanding group of Data Providers [14]. So far, around 700 million records pertaining to health and social care events have been loaded into the SAIL databank. HIRU, in conjunction with Health Solution Wales, UK (HSW), has developed a robust anonymisation system to ensure confidentiality whilst making the data available for research [15, 16].
The purpose of this study was to construct a methodology for identifying potential participants for a trial using the routinely collected data stored in the SAIL databank and to determine whether the methodology could correctly identify potential participants for a clinical trial. The trial identified for this project is the FolATED study, a pragmatic randomised controlled trial of folate augmentation of antidepressant therapy in the treatment of depression. It is currently being conducted in Wales, UK [17].
Aim
To determine whether anonymised routine data can be used to accurately identify the numbers of eligible patients suitable for recruitment to an existing randomised controlled trial (RCT).
Objectives
- To construct an algorithm to identify suitable participants for a clinical trial using routinely collected, anonymised primary care data stored in the SAIL databank.
- To carry out a validation exercise to establish whether the algorithm could correctly identify potential participants.
Discussion
The algorithm for identifying suitable participants for the FolATED study appears to be valid, based on the clinical judgment of the raters. The sensitivity and specificity results suggested a high degree of accuracy (>= 80%) for the algorithm. Although some minor methodological issues were encountered, we have demonstrated that it is possible to identify anonymous potential trial participants using routinely collected primary care data.
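As an illustration of the two accuracy measures mentioned above, the following minimal Python sketch computes sensitivity and specificity by comparing an algorithm's eligibility flags with the raters' judgments, which are treated here as the reference standard. The function name and the example values are hypothetical and chosen only for illustration; they are not the study's data or its analysis code.

```python
def sensitivity_specificity(algorithm_flags, rater_flags):
    """Return (sensitivity, specificity) of the algorithm against the raters.

    Both arguments are equal-length sequences of booleans, one entry per
    patient record: True means 'judged eligible'.
    """
    pairs = list(zip(algorithm_flags, rater_flags))
    tp = sum(1 for a, r in pairs if a and r)          # both say eligible
    fn = sum(1 for a, r in pairs if not a and r)      # algorithm missed an eligible patient
    tn = sum(1 for a, r in pairs if not a and not r)  # both say ineligible
    fp = sum(1 for a, r in pairs if a and not r)      # algorithm flagged an ineligible patient
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one eligible and one ineligible record")
    return tp / (tp + fn), tn / (tn + fp)


# Illustrative values only (NOT study data): ten hypothetical records.
algorithm = [True, True, True, True, False, False, False, False, False, True]
raters = [True, True, True, True, True, False, False, False, False, False]

sens, spec = sensitivity_specificity(algorithm, raters)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Sensitivity is the proportion of rater-confirmed eligible patients that the algorithm also flags, and specificity is the proportion of rater-confirmed ineligible patients that it correctly leaves out.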
Limitations of the proposed method
A system based on anonymised data cannot be applied directly to recruitment, because the data housed in SAIL can never be deanonymised.
The method we are exploring is therefore a two-part process. The first part is to create, test and validate an algorithm that identifies suitable participants using the anonymised data in SAIL. The second is to make this algorithm available on a live, practice-based computer facility (such as Audit+ [26]), whereby a physician can run the query themselves and generate a list of suitable named participants within the practice with minimal time or effort. This should reduce GPs' workload, with the potential to maximise recruitment. The method ensures confidentiality of personal data as the identification and recruitment process remains within the practices. This process itself requires validation. Furthermore, missing or implausible data values in the electronic records cannot be corrected, as it is not possible to identify the patient.
In addition, the researcher needed to seek clinical expertise to identify appropriate Read codes. For example, medical advice was sought as to whether to include Read codes relating to post viral depression and pre-senile dementia with depression in the algorithm.
Limitations of routinely collected data
There are also a number of general limitations to the use of routinely collected data. The accuracy of proxy measures needs to be evaluated. Lack of linkage between diagnosis and therapy makes the use of proxy measures unreliable. This issue is not limited to this methodology but applies to live database searches too. In this study, recent antidepressant therapy was used as a proxy measure for depression to try to capture patients who were currently depressed, as the diagnosis of an ongoing condition may not be recorded as frequently as the treatment prescribed. The use of antidepressants as a proxy measure for depression is unreliable because disorders cannot be linked to specific interventions, i.e. drugs [27]. An attempt to counter this was made by selecting people who had a diagnosis of moderate to severe depression in their medical history; however, there was no way of knowing whether their current antidepressant therapy was related to that diagnosis. Antidepressant therapy may have been prescribed for other conditions, such as anxiety disorders, attention deficit disorder or dementia. It would be useful if there were a standardised 'problem number' field in all primary care data entry systems that linked the prescription to the diagnosis. The Meditel system has this field [28].
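The proxy logic described above (a diagnosis of moderate to severe depression anywhere in the history, combined with a recent antidepressant prescription) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the SQL script used in the study: the code values are hypothetical placeholders rather than real Read codes, and the 90-day window is an assumed definition of "recent".

```python
from datetime import date, timedelta

# Hypothetical placeholder codes; real Read codes would need clinical review.
DEPRESSION_CODES = {"DEP_MODERATE", "DEP_SEVERE"}
ANTIDEPRESSANT_CODES = {"AD_DRUG_A", "AD_DRUG_B"}
RECENT_WINDOW = timedelta(days=90)  # assumed meaning of "recent" therapy


def is_candidate(events, today):
    """Flag a patient as a potential participant.

    `events` is a list of (code, event_date) tuples for one anonymised patient.
    The patient is flagged when the history contains a qualifying depression
    diagnosis AND an antidepressant prescription within the recent window.
    """
    has_diagnosis = any(code in DEPRESSION_CODES for code, _ in events)
    has_recent_prescription = any(
        code in ANTIDEPRESSANT_CODES
        and timedelta(0) <= today - when <= RECENT_WINDOW
        for code, when in events
    )
    return has_diagnosis and has_recent_prescription


# Illustrative records only (NOT study data).
today = date(2010, 1, 1)
diagnosed_and_treated = [("DEP_MODERATE", date(2008, 5, 1)), ("AD_DRUG_A", date(2009, 12, 1))]
treated_only = [("AD_DRUG_B", date(2009, 12, 15))]
old_prescription = [("DEP_SEVERE", date(2007, 3, 1)), ("AD_DRUG_A", date(2008, 1, 1))]

print(is_candidate(diagnosed_and_treated, today))
print(is_candidate(treated_only, today))
print(is_candidate(old_prescription, today))
```

Note that the sketch shares the limitation discussed above: nothing in the record links the prescription to the diagnosis, so a flagged patient may be receiving the antidepressant for another condition.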
A particular challenge is establishing the end date of an episode of depression and whether or not the patient is in remission. The codes that might assist in identifying this, such as depression resolved, medication stopped and medication changed, may be infrequently employed and therefore cannot be relied upon as accurate measures in themselves.
Routinely collected data are captured for administrative reasons rather than for research purposes. To be fit for research purposes, the validity, accuracy and completeness of the routine data themselves need to be considered. Although studies have reported that routinely collected diagnostic data held on general practice information systems are accurate and reliable for research purposes [29‐31], there is always room for initiatives to standardise systems and to improve data quality in primary care [32].
The purpose of this study was to use anonymised data to model a new method of identifying suitable participants from routinely collected data, one that would make it easier for practices to identify potential subjects for a clinical trial and consequently reduce their workload, whilst potentially maximising recruitment and reducing costs. In the future we will seek to test this algorithm on clinical data sets within primary care settings. The algorithm created in this study successfully identified suitable anonymous participants for the trial within the SAIL environment. However, the data within SAIL can never be deanonymised. The next phase is therefore a pilot project to translate the algorithm from anonymised SAIL data to live clinical systems, where the individual physician can generate a list of potential identifiable participants with minimal time and effort. The method ensures confidentiality of personal data as the identification and recruitment process remains within the practices.
Conclusions
Routinely collected, digitally stored clinical data from primary care can be used as a means of selecting anonymous possible participants for a trial of folate augmentation of antidepressant therapy. Future work is required to run this algorithm on patient-identifiable systems within the primary care practice setting and then to compare this method with the traditional non-electronic method of participant identification for recruitment, in terms of numbers recruited, time, cost and reliability.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
KL and HH conceived the study and participated in its design and coordination and drafted the manuscript. RL contributed to the design. CB provided technical assistance in writing the SQL script. JM & KL carried out the analyses, performed the statistical analysis and drafted the manuscript. JC and PC were the independent clinicians who rated the eligibility of the patients identified in SAIL. All authors read and approved the final manuscript.