Engaging and co-creating
Whilst people affected by a given problem tend to have sensible ideas about how to fix it, initiatives to target underserved groups (e.g. those living in remote areas) are rarely developed with meaningful input from service users themselves [27, 28]. Instead, managers sit down to discuss potential issues and solutions on behalf of the underserved groups, and then implement service modifications without further consultation. This is partly because it can be time-consuming and expensive to seek non-tokenistic input from others – especially from those at the margins of society [27]. However, this needs to change. Community engagement and empowerment is one of the core tenets of Primary Health Care [29] and all governments have committed to deliver health systems that place greater decision-making power in the hands of the people [9, 29].
A model for continuous equity-driven service improvement should meaningfully engage with representatives of the groups found to be facing the highest barriers. Ultimately it is these service users who have the best understanding of why they cannot access care or achieve good outcomes, and they are likely to have practical ideas for how the service could be modified to better serve their population.
Service leaders need scientifically robust yet rapid and affordable methods for eliciting barriers and co-designing solutions; however, current engagement exercises tend to cluster at two opposing poles: expensive, bespoke, in-depth qualitative research that takes many months to plan and execute on one hand, and zero or tokenistic engagement on the other. The first approach provides robust findings at a very high cost to service providers; the second is affordable but does not produce usable intelligence. Somewhere between the two lies a minimum viable product: the cheapest and fastest possible approach that delivers meaningful data based on genuine engagement.
Industry tends to use focus groups and telephone surveys for rapid market research, but we are not aware of any rapid pragmatic research methods being routinely used in health service improvement; for instance, the recent King’s Fund workshop on ‘improving services by listening to patient voices’ did not showcase any qualitative methods that could be conducted in fewer than six months [30]. This is a strategic barrier to co-production [31]. Our work to develop rapid yet robust methods represents a step forward, but our approach is still being tested. The IM-SEEN model stipulates that ideas for service improvements should come from engagement with affected communities, but does not dictate the exact methods, as different contexts require different approaches.
Checking whether ‘service improvements’ actually improve services
Once potential solutions have been identified it is vital that they are rigorously evaluated. This should entail checking whether any changes made to the service lead to changes in outcomes – positive or negative – as well as understanding the effect size and distribution among different groups. Specifically, it is important to check that access and outcomes improve for all groups, ideally with the greatest gains observed among groups with the greatest need.
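To illustrate the kind of stratified check this implies, the sketch below compares before/after access rates by group; the group names and all figures are invented for the example, and a real evaluation would add confidence intervals around each change.

```python
# Hypothetical illustration: compare access rates before and after a
# service change, stratified by group, to see where the gains fall.
# All data below are invented for the example.

def access_rate(accessed, eligible):
    """Proportion of eligible service users who accessed care."""
    return accessed / eligible

# (group, accessed_before, eligible_before, accessed_after, eligible_after)
data = [
    ("urban",  720, 1000, 760, 1000),
    ("remote", 310, 1000, 420, 1000),  # the higher-need group
]

for group, a0, n0, a1, n1 in data:
    before = access_rate(a0, n0)
    after = access_rate(a1, n1)
    print(f"{group}: {before:.1%} -> {after:.1%} "
          f"(absolute change {after - before:+.1%})")
```

In this invented scenario both groups gain, with the largest gain in the higher-need group – the pattern the text describes as ideal.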
Despite widespread lip service to ‘continuous improvement’, in our experience, service modifications designed to boost equity are often conducted as one-off initiatives. Furthermore, efforts to reduce inequalities tend to be poorly evaluated [10]. This is surprising given the rise and rise of Plan Do Study Act cycles [32–34]. Whilst the core ‘PDSA’ model is based on the scientific approach of formulating a hypothesis, collecting data to test the hypothesis, analysing and interpreting results, and making inferences to iterate the hypothesis [35], most quality improvement initiatives fail to quantify change appropriately and it is rare to find truly iterative examples where services have progressed through more than one or two revolutions of the cycle [36, 37].
Even when a service does routinely gather high-quality data and test hypothesis-driven innovations, the process tends to be limited by an overdependence on crude before–after testing or interviews with a handful of service users (which can offer valuable information about how and why an intervention works but tells us nothing about the mean effect size). We need to be sure that any observed changes in outcomes are driven by service modifications. More than that, we need to ask whether it is ethical to modify services without recourse to robust means of evaluating impact – especially where unintended consequences could lead to harm or a deterioration in service quality or equity.
The most robust means of evaluating whether service innovations, reconfigurations, amendments, adaptations, and other ‘improvements’ actually confer benefit is to conduct randomised controlled trials [38]. However, RCTs are generally expensive, require specialist statistical support, and can take years to run, rendering them unfeasible for most settings [39]. When resources are available, the high price tag exerts a strong pressure to reserve this tool for service amendments that have a high ‘pre-test’ probability of success. This means that the least promising service modifications are systematically subjected to the weakest levels of methodological scrutiny, potentially squandering resources, incurring opportunity costs, and even exposing users to harm.
The rising use of RCTs in industry – often referred to as ‘A/B testing’ – has spawned a wave of low-cost, automated approaches to running real-time pragmatic trials in order to optimise services with high-quality empirical data. The ‘test everything with controlled experiments’ approach was born of the observation that tiny service changes sometimes have large impacts on important outcomes, and that most large, expensive reforms based on promising ideas fail to deliver the intended change [40]. Allied work from non-health areas of continuous improvement has demonstrated that multiple small improvements can lead to large overall gains – strengthening the case for multiple rapid tests of multiple service modifications [41, 42]. This mature and powerful ‘test everything’ approach is being used to optimise search engines, improve web page click-throughs, and drive profit margins [43–45] but has not yet made the transition to health service improvement.
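To make the analogy concrete, the sketch below compares two service variants in the simplest A/B form; the counts are invented, and the two-proportion z-test stands in for whatever analysis a real trial would pre-specify.

```python
# Minimal A/B comparison of two service variants using a
# two-proportion z-test. All counts are invented for illustration.
from math import sqrt, erf

def two_proportion_z(success_a, n_a, success_b, n_b):
    """Return (z statistic, two-sided p-value) for H0: p_a == p_b."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Variant A: existing appointment reminder; variant B: reworded reminder.
# Outcome: number of patients who attended their appointment.
z, p = two_proportion_z(success_a=400, n_a=1000, success_b=460, n_b=1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The same comparison could be rerun automatically as new attendance data accrue, which is what makes the industry approach cheap once the pipeline exists.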
As health programmes increasingly digitise patient flow, opportunities are emerging to embed prospective randomisation and statistical testing into administrative software [46]. ‘Built-in’ testing would lower the barriers to routinely evaluating service modifications with RCTs, and by making such trials easier to run it would vastly improve safety, helping managers to reliably differentiate between effective and ineffective amendments. The automation of randomisation, allocation, and statistical analysis works best when algorithms can be directly embedded into clinical software, as this eliminates the delays associated with human factors.
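A minimal sketch of what ‘built-in’ allocation could look like inside booking software follows; the hashing scheme and all names are our own illustration, not a description of any existing system.

```python
# Sketch: deterministic, auditable 1:1 allocation embedded at the point
# where a patient record is created. Hashing the patient ID gives a
# stable arm assignment without storing a separate randomisation list.
# The scheme and all names are illustrative, not from any real system.
import hashlib

def allocate_arm(patient_id: str, trial_name: str) -> str:
    """Assign a patient to 'control' or 'variant' for a named trial."""
    digest = hashlib.sha256(f"{trial_name}:{patient_id}".encode()).digest()
    return "variant" if digest[0] % 2 else "control"

# The same patient always receives the same arm for a given trial,
# so allocation can be recomputed and audited at analysis time.
print(allocate_arm("patient-0001", "reminder-wording-2024"))
```

Because allocation is a pure function of the patient and trial identifiers, no human is involved at the moment of assignment, which is the property the paragraph above argues for.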
Even automated RCTs take time and specialist expertise to set up, and these costs mean that programmes will have fewer resources to deploy for service delivery. The time taken to design a trial and obtain ethical approval can also delay the implementation of potential service improvements. These costs and delays must be weighed against the fact that introducing interventions without robust evaluation can lead to the unknowing delivery of ineffective or harmful interventions. Nevertheless, given the work, time and costs involved in setting up a platform trial, this approach will deliver the greatest cost-benefit if used to continually assess a large number of interventions over a long period of time.
Changes and interventions that are found to be effective at improving outcomes and reducing inequalities should be taken to scale across entire services. In summary, there is a need to develop embedded RCT code that can run resource-light trials, providing robust evidence on whether well-intentioned service modifications are helping or harming.