
01.12.2013 | Technical advance | Issue 1/2013 | Open Access

BMC Medical Research Methodology 1/2013

Direct risk standardisation: a new method for comparing casemix adjusted event rates using complex models

Jon Nicholl, Richard M Jacques, Michael J Campbell
Important notes

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2288-13-133) contains supplementary material, which is available to authorized users.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

JN conceived the idea, analysed the DAVROS data, and drafted the manuscript. RJ helped conceive the idea, analysed the HES data, and helped revise the manuscript. MJC helped conceive the idea, and revise the manuscript. All authors read and approved the final manuscript.
Abbreviations
CMF: Comparative mortality figure
DRS: Direct risk standardisation
DSR: Directly standardised rate
HES: Hospital episode statistics
SHMI: Summary hospital mortality index
SMR: Standardised mortality ratio


In all branches of the health and social sciences, and especially in public health and health services research, we need to be able to compare outcomes of groups of patients or people with different exposures in order to understand the impact of the exposures. These exposures include different interventions and services, as well as different environments.
Comparison of outcomes can be difficult because of differences in the characteristics of the patients or populations being exposed in different ways. The distribution of these characteristics is known as the casemix, and when the casemix is associated with the outcomes, comparisons of outcomes are confounded by any differences in casemix. In this case comparisons are sometimes made by calculating a measure of the event rate in each exposure group being compared which is standardised for casemix. When the number of groups being compared is not too large this can be done by including a term for the effect of each exposure in the casemix adjustment model. However, when many groups are being compared this may not be possible and standardisation is carried out. There are numerous methods that can be used to standardise for casemix [1] but the most frequently used are direct and indirect standardisation [2, 3].
First, we rehearse some well-known problems with both direct and indirect standardisation, and then we propose a new approach which overcomes the problem with direct standardisation. We have termed this new approach Direct Risk Standardisation (DRS). To discuss and illustrate these issues, we have used the example of comparing hospital mortality, and throughout this paper we refer to the populations or exposure groups being compared as ‘centres’, and people as ‘patients’, but the methods are quite general.

Direct standardisation

In direct standardisation, event rates are calculated in each centre for every combination of the casemix variables, and these casemix specific event rates are then combined using a set of weights which is the same for all the centres. One problem with this method is that it cannot be used when any of the casemix variables are continuous unless they are first grouped into categories. A second, more serious problem is that some casemix combinations in some centres may have no patients or people. This may be for a structural reason (e.g. gynaecological conditions in men), an organisational reason (e.g. the hospital doesn’t treat children) or a random reason (e.g. for some uncommon conditions there may be no cases in some hospitals in some years). Directly standardised comparisons between centres with different numbers or patterns of empty casemix cells (i.e. due to random or organisational rather than structural reasons) are no longer fair [4]. For example, suppose there are just three casemix groups (e.g. children, adults, elderly) and two hospitals being compared, one of which (hospital 2) admits no children and, for similar patients treated by both hospitals, has death rates 20% higher than the other, as illustrated in Table 1.
Table 1
Example of effect on direct standardisation of missing casemix combinations
Age specific death rate per 100 admissions
Hospital | Children | Adults | Elderly | Directly standardised rates
Hospital 1 | 20 | 10 | 20 | (0.25×20) + (0.5×10) + (0.25×20) = 15
Hospital 2 | - | 12 | 24 | (0.5×12) + (0.25×24) = 12
Now suppose the weights used to combine the hospital mortality rates are the national proportions of child, adult, and elderly patients, which are, say, 25%, 50%, and 25%. Then the directly standardised rates suggest that Hospital 2 is 20% better than Hospital 1. This has arisen because the totals of the effective weights used for the two hospitals are different.
A partial solution to this problem is to recalculate the weights for each centre so that they always sum to 1.0. In the example in Table 1 the weights used for hospital 2 only sum to 0.75. Dividing the weights for Hospital 2 by 0.75 so that the weights used again sum to 1.0 makes the directly standardised rate in Hospital 2 equal to 16 deaths per 100 admissions indicating that hospital 2 is about 7% worse than hospital 1. As this example illustrates, recalculating the weights won’t completely resolve the problem if the missing weights apply to cells which have high or low event rates, and the method would still have the disadvantage of not being usable with continuous covariates.
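The arithmetic of this example can be checked with a short sketch. The rates and weights are those of Table 1; the function and variable names are illustrative only, and an empty casemix cell is represented here by `None`.

```python
# National casemix weights (child, adult, elderly proportions) from the text.
WEIGHTS = {"children": 0.25, "adults": 0.50, "elderly": 0.25}

# Age specific death rates per 100 admissions; None marks an empty cell.
RATES = {
    "Hospital 1": {"children": 20.0, "adults": 10.0, "elderly": 20.0},
    "Hospital 2": {"children": None, "adults": 12.0, "elderly": 24.0},
}


def direct_standardised_rate(rates, weights, renormalise=False):
    """Weighted sum of cell rates, skipping empty cells.

    With renormalise=True the weights actually used are rescaled
    so that they sum to 1.0.
    """
    used = {g: w for g, w in weights.items() if rates[g] is not None}
    total = sum(w * rates[g] for g, w in used.items())
    if renormalise:
        total /= sum(used.values())
    return total


for name, rates in RATES.items():
    raw = direct_standardised_rate(rates, WEIGHTS)
    adjusted = direct_standardised_rate(rates, WEIGHTS, renormalise=True)
    print(f"{name}: unadjusted {raw:.1f}, renormalised {adjusted:.1f}")
```

Without renormalisation Hospital 2 appears better (12 vs 15); with renormalisation it appears about 7% worse (16 vs 15), although its rates are 20% higher in every group that both hospitals treat.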

Indirect standardisation

In indirect standardisation a set of standard casemix specific event rates is ‘weighted’ by the local population casemix. In effect this calculates the number of events expected in the local population if the standard event rates had applied. The indirectly standardised rate is usually presented as the ratio of the observed number of events to the expected number. When the events are deaths this is known as the Standardised Mortality Ratio (SMR) and we use this term for all standardised event ratios.
A simple way to calculate an SMR is to use a logistic regression model with the casemix as covariates to estimate the probability of death in each patient across all comparators together. These probabilities are summed over all the patients in each comparator to derive the expected number of deaths in that comparator. The two methods - using locally weighted standard casemix specific event rates or logistic regression - will give the same results if the standard casemix specific event rates are derived from the pooled data and the logistic regression includes all possible interactions rather than just main effects say [5]. However, the logistic regression method has the advantages of being able to use continuous covariates and being able to simplify large complex casemix models.
The problems of indirect standardisation have previously been reported [2, 6]–[10]. Briefly, since the set of weights reflects the local population casemix they are different for each centre and so the SMRs aren’t strictly comparable between centres [8]. When the casemix is very different the SMRs may not be comparable at all [7].
The problem of non-comparability is illustrated in Table 2, which shows a simple example of two hospitals with identical casemix specific mortality rates but different casemix. Though the performance of the two hospitals is identical, the hospital with the larger proportion of high risk patients (40% vs 30%) has a lower SMR (105 vs 112).
Table 2
Example of non-comparability of SMRs
Casemix group | National standard death rate | Hospital A: Death rate | Hospital A: Casemix proportions | Hospital B: Death rate | Hospital B: Casemix proportions
1. High risk | 0.9 | 0.8 | 0.4 | 0.8 | 0.3
2. Low risk | 0.1 | 0.2 | 0.6 | 0.2 | 0.7
Hospital A: O = (0.4×0.8) + (0.6×0.2); E = (0.4×0.9) + (0.6×0.1); SMR = 105
Hospital B: O = (0.3×0.8) + (0.7×0.2); E = (0.3×0.9) + (0.7×0.1); SMR = 112
Thus standardisation for casemix indirectly via SMRs cannot yield fair comparisons, but the correct way (direct standardisation) also may not work because of different patterns of empty casemix combinations and is not possible with continuous covariates.
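The Table 2 example can be reproduced in a few lines. This is only a sketch of the arithmetic: the rates and proportions are those appearing in the O and E expressions of Table 2, and the function name `smr` is illustrative.

```python
STANDARD_RATE = {"high": 0.9, "low": 0.1}   # national standard death rates
HOSPITAL_RATE = {"high": 0.8, "low": 0.2}   # identical in both hospitals
CASEMIX = {
    "A": {"high": 0.4, "low": 0.6},
    "B": {"high": 0.3, "low": 0.7},
}


def smr(casemix, local_rate, standard_rate):
    """Observed over expected deaths per patient, scaled to 100."""
    observed = sum(casemix[g] * local_rate[g] for g in casemix)
    expected = sum(casemix[g] * standard_rate[g] for g in casemix)
    return 100.0 * observed / expected


for hospital, mix in CASEMIX.items():
    print(hospital, round(smr(mix, HOSPITAL_RATE, STANDARD_RATE)))
```

The two hospitals share the same casemix specific death rates, so the difference in SMR comes entirely from the different casemix proportions used as weights.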
This paper explores an alternative solution to the calculation of comparable standardised rates.


Calculate event rates in risk groups rather than casemix groups

One possible approach to calculating directly standardised rates when there are empty casemix combinations is to differentiate between casemix standardisation and risk adjustment. The reason for the non-comparability of crude event rates is usually said to be because of differences in casemix in the comparators. However, non-comparability actually follows from differences in the risk distribution in the comparators. If different casemixes gave the same risk distribution, crude unadjusted comparisons would still be fair. For example, if older patients admitted to hospital for elective procedures have the same risk of mortality as younger emergency patients, then unadjusted comparisons of mortality between two hospitals one of which had a majority of older elective patients and the other a majority of younger emergencies could still be fair.
It follows from this that a solution to the difficulty of calculating directly standardised rates in the presence of empty casemix combinations is to convert the complex multidimensional casemix to a simple one-dimensional risk distribution and then directly standardise across the risk distribution. The risk is calculated using a standard logistic regression modelling approach using the casemix variables. This model is the same as would be used in indirect standardisation and can use continuous covariates as well as fixed factors. The model is fitted to the whole dataset aggregated across the centres (e.g. the institutions, populations, or years) to be compared. Estimates of the predicted risk for each case in the aggregated dataset are obtained and, using these, each person is assigned to a risk category. Observed event rates within each risk category are then calculated for each centre, and these event rates are weighted and combined using a standard set of weights. In order to make comparisons easy, an index similar to the Standardised Mortality Ratio (SMR), the Comparative Mortality Figure (CMF), can be calculated by dividing the standardised rate by the overall rate (see the Appendix in Additional file 1) [3].

Calculating directly risk standardised rates

The first step is to estimate the risk of an event for each person in the whole population (ie aggregated across all comparators) which is usually done using a logistic regression model. How should this model be specified? This is the same problem for all methods of standardisation. In conventional direct casemix standardisation, the casemix variables must be chosen and any continuous covariates have to be converted into categorical factors. For indirect standardisation a logistic regression model using the casemix variables has to be specified in order to estimate the expected numbers of events from the predicted risks. Misspecification of the model is likely to lead to invalid comparisons between centres for all methods of standardisation, including our proposed DRS method. However, for the purposes of this study, which compares different methods of standardisation rather than different models for standardisation, we have simply used the same models for each of the methods in order to ensure comparability.
The second step is to assign each case to a risk category. These risk categories are defined using the whole aggregated dataset. There are several options for defining the risk categories and choosing the weights for standardisation (see Table 3). It is important to choose risk categories that do not leave some centres with no patients in some categories, since this would defeat the point of the proposed risk adjustment method. For example, choosing risk categories of equal width (such as a risk from 0.0–0.1, 0.1–0.2, 0.2–0.3, etc.) may mean that there are no patients in some of the lowest or highest risk groups in some centres, and this method will not usually work. Choosing groups with equal numbers of patients in each group may mean that there are no events in some risk categories if the risk over the whole population is small, such as with in-hospital mortality which is typically about 5%. Choosing groups with equal numbers of observed or predicted events will usually ensure that there are some events and therefore some patients in each risk category in every centre unless there are some centres with very few cases.
Table 3
Methods of calculating risk categories and weights
Creating categories of risk | Weights for combining risk category specific event rates for each centre
Equal width: 0.0-0.1, 0.1-0.2, 0.2-0.3, etc. |
Equal numbers of patients in each | Proportion of all patients in each category
Equal numbers of observed deaths in each | Proportion of all observed deaths in each category
Equal numbers of predicted deaths in each | Proportion of all predicted deaths in each category
We found that choosing categories with equal numbers of observed events is simpler than choosing categories with equal numbers of predicted events and gives similar results, so this is our preferred method. Of course all patients with the same casemix fall into the same risk category, and so it may not be possible to create categories with exactly the same number of events, rather this is a guiding principle for choosing categories.
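One way to define risk categories with approximately equal numbers of observed events is sketched below. It is an illustration rather than the authors' own procedure: the function names and the toy data are hypothetical, and the rule for placing cut points (cut once the running event count reaches the next multiple of the target, but never between cases with the same predicted risk) is an assumption.

```python
from bisect import bisect_left


def risk_cut_points(risks, events, n_categories):
    """Upper risk boundaries giving roughly equal observed events per category.

    Cases with the same predicted risk are never split across categories.
    """
    target = sum(events) / n_categories
    order = sorted(range(len(risks)), key=lambda i: risks[i])
    cuts, running = [], 0
    for pos, i in enumerate(order[:-1]):
        running += events[i]
        next_risk = risks[order[pos + 1]]
        if (len(cuts) < n_categories - 1
                and next_risk > risks[i]
                and running >= target * (len(cuts) + 1)):
            cuts.append(risks[i])
    return cuts


def assign_category(risk, cuts):
    """Index of the first category whose upper boundary is >= risk."""
    return bisect_left(cuts, risk)


# Hypothetical predicted risks and observed outcomes (1 = event).
risks = [0.02, 0.02, 0.05, 0.05, 0.10, 0.10, 0.30, 0.30, 0.60, 0.60]
events = [0, 0, 0, 1, 1, 1, 1, 0, 1, 1]

cuts = risk_cut_points(risks, events, n_categories=3)
categories = [assign_category(r, cuts) for r in risks]
print(cuts, categories)
```

In this toy example the three categories contain 3, 1 and 2 events, which shows why exact equality is not generally achievable when cases share the same risk.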
The number of categories to use is also a matter of choice. The method only works exactly (ie centres with identical casemix specific risks have identical DRS rates) if all the patients grouped into the same risk category have the same risk. So the more risk groups that are used the more exact the method becomes. However, the more categories that are used the more chance that there will be some centres with some risk categories with no cases. Our results suggest that about 10 categories should be used if possible (see below), although up to 20 could be used for very large datasets.
With regard to the weights for combining the risk category specific event rates, the natural weights are the proportion of patients in each category in the whole population aggregated across comparators since this means that the standardised event rate for the whole population is the same as the observed event rate. Furthermore, with these weights the CMF for a centre is just the DRS rate for that centre divided by the observed event rate for the whole population in all centres combined.
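The DRS rate and CMF described above can be sketched as follows, assuming each case has already been assigned to a risk category. The function names and the toy counts are hypothetical; the counts are chosen so that two centres have identical category specific death rates (0.2 and 0.8) but different casemix, in the spirit of Table 2.

```python
from collections import Counter


def drs_and_cmf(records):
    """records: list of (centre, risk_category, died) with died in {0, 1}.

    Returns {centre: (drs_rate, cmf)}, weighting each category by its
    share of all patients in the aggregated data.
    """
    n_cat = Counter()    # patients per category, all centres combined
    n_cell = Counter()   # patients per (centre, category)
    d_cell = Counter()   # deaths per (centre, category)
    for centre, cat, died in records:
        n_cat[cat] += 1
        n_cell[centre, cat] += 1
        d_cell[centre, cat] += died
    n_total = sum(n_cat.values())
    overall_rate = sum(d_cell.values()) / n_total
    result = {}
    for centre in sorted({c for c, _ in n_cell}):
        drs = 0.0
        for cat, n_k in n_cat.items():
            if n_cell[centre, cat] == 0:
                # An empty risk category means the method cannot be applied.
                raise ValueError(f"centre {centre} has no cases in category {cat}")
            drs += (n_k / n_total) * d_cell[centre, cat] / n_cell[centre, cat]
        result[centre] = (drs, drs / overall_rate)
    return result


def expand(centre, cat, n, deaths):
    """Turn cell counts into one record per patient."""
    return [(centre, cat, 1)] * deaths + [(centre, cat, 0)] * (n - deaths)


records = (expand("A", 0, 60, 12) + expand("A", 1, 40, 32)
           + expand("B", 0, 70, 14) + expand("B", 1, 30, 24))
results = drs_and_cmf(records)
for centre, (drs, cmf) in results.items():
    print(centre, round(drs, 3), round(cmf, 3))
```

Because both centres are weighted by the same aggregate proportions, they receive the same DRS rate and a CMF of 1, whereas their SMRs against an external standard could differ.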


Data and methods

We have explored this proposal using two datasets.
First we have used Hospital Episode Statistics (HES) data for approximately 6 million admissions to 146 general and acute NHS hospitals in England during 2007/8 linked to mortality files. The events that we have used are deaths 30 days post admission. The estimation of the probabilities of death has been carried out using the architecture of the standard Summary Hospital Mortality Index (SHMI) model using clinical code on admission, age, sex, mode of admission and co-morbidities (using the Charlson index treated as a categorical variable) [11]. The SHMI model has been fitted to the aggregate data for all hospitals together using standard logistic regression, and the predicted probabilities of 30-day mortality were extracted to estimate risks and calculate expected numbers of events.
The second dataset we have used is the data on 18,668 adult emergency medical admissions from 9 centres in the UK and overseas collected for the DAVROS project which has developed models for casemix adjustment [12]. We have used the model including age, ICD10, and history of malignancy together with categorical values for six physiological measurements. We have used age as a continuous covariate to illustrate the method. Cases with any missing data have been deleted, and this model has been fitted to the aggregate data for all 9 centres using standard logistic regression and the predicted probabilities of 7-day mortality extracted.
We have calculated an SMR for each hospital in the HES data and each centre in the DAVROS data using the ratio of the observed number of deaths to the sum of the predicted probabilities from the models.
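As a minimal sketch of this calculation, the SMR for each centre is the number of observed deaths divided by the sum of the model's predicted probabilities. The predicted risks are taken as given here (no model is fitted), and the function name and toy records are hypothetical.

```python
def smr_by_centre(records):
    """records: iterable of (centre, predicted_risk, died); returns {centre: SMR}."""
    observed, expected = {}, {}
    for centre, risk, died in records:
        observed[centre] = observed.get(centre, 0) + died
        expected[centre] = expected.get(centre, 0.0) + risk
    return {c: observed[c] / expected[c] for c in observed}


# Hypothetical cases: (centre, predicted probability of death, died).
toy_records = [
    ("X", 0.1, 0), ("X", 0.2, 0), ("X", 0.7, 1),
    ("Y", 0.5, 1), ("Y", 0.5, 1),
]
smrs = smr_by_centre(toy_records)
print(smrs)
```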
In both cases we have calculated the DRS rate using approximately equal numbers of observed deaths in the aggregate data to define 20 risk categories for the HES data and 10 for the DAVROS data. We have weighted the centre-specific mortality rates in the risk categories by the proportion of patients in the risk category in the aggregate data. We have calculated the DRS CMF by dividing the DRS rate by the overall population mortality rate in the aggregate data (that is the total number of deaths divided by the total population).
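In symbols, the calculation described in this paragraph can be written as below. The notation is assumed for illustration and is not taken from the Appendix: $d_{ck}$ and $n_{ck}$ are the deaths and patients in centre $c$ and risk category $k$, $N_k = \sum_c n_{ck}$, $N = \sum_k N_k$, and $D$ is the total number of deaths in the aggregate data.

```latex
w_k = \frac{N_k}{N}, \qquad
\mathrm{DRS}_c = \sum_{k=1}^{K} w_k \, \frac{d_{ck}}{n_{ck}}, \qquad
\mathrm{CMF}_c = \frac{\mathrm{DRS}_c}{D / N}
```

With these weights, applying the same formula to the aggregate data gives $\sum_k (N_k/N)(D_k/N_k) = D/N$, where $D_k$ is the number of deaths in category $k$, so the standardised rate for the whole population equals its observed rate.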
Standard errors for the SMR and the DRS CMF for the DAVROS data have been calculated using the formulae in the Appendix (see Additional file 1) and by a simple bootstrap taking 1000 samples with replacement from each centre, calculating the SMRs and DRS CMFs for each sample, and calculating their standard deviation from the average value in the bootstrap samples. Standard errors for the estimates for the 146 hospitals in the HES data are not shown.
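A simple bootstrap of this kind might look like the sketch below, shown for the SMR of a single centre. It makes assumptions not stated in the text: the predicted risks are held fixed rather than refitting the model in each sample, the data are simulated, and the function name is hypothetical.

```python
import random
import statistics


def bootstrap_se_smr(cases, n_boot=1000, seed=0):
    """cases: list of (predicted_risk, died) for one centre.

    Resamples cases with replacement and returns the standard deviation
    of the resampled SMRs. The risk model is not refitted.
    """
    rng = random.Random(seed)
    smrs = []
    for _ in range(n_boot):
        sample = rng.choices(cases, k=len(cases))
        expected = sum(risk for risk, _ in sample)
        observed = sum(died for _, died in sample)
        smrs.append(observed / expected)
    return statistics.stdev(smrs)


# Simulated centre: 300 cases whose outcomes follow their predicted risks.
sim = random.Random(42)
cases = []
for _ in range(300):
    risk = sim.uniform(0.05, 0.40)
    cases.append((risk, 1 if sim.random() < risk else 0))

print(bootstrap_se_smr(cases))
```

The same resampling loop could compute a DRS CMF in each sample instead of an SMR, provided every risk category is still populated in the resampled centre.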
We have also re-calculated the DRS CMFs for the 146 hospitals in the HES data using between 5 and 25 risk categories in order to examine the reliability of the method with different numbers of categories. We have calculated the rank correlations between the values of DRS CMFs calculated using different numbers of categories.
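A rank correlation between two sets of CMFs can be computed as sketched below. Spearman's coefficient is used here as an assumption, since the text does not say which rank correlation was calculated, and the CMF values are hypothetical.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Pearson correlation of the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical CMFs for five hospitals using 5 and 25 risk categories.
cmf_5 = [0.92, 1.05, 0.98, 1.10, 1.01]
cmf_25 = [0.93, 1.04, 1.00, 1.12, 0.99]
print(spearman(cmf_5, cmf_25))
```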
For the HES data we have calculated the weights that a conventional directly standardised rate (DSR) would use (that is, the proportion of all the patients in the whole dataset falling into each possible casemix combination). In calculating the DSR for a particular hospital, if there are no patients in a casemix combination the weight for that combination is not used. So for each hospital we have summed the weights that have actually been used in calculating the DSR for that hospital. We have not done this for the DAVROS data as age has been treated as a continuous variable and a conventional DSR cannot be calculated.
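The sum of the weights actually used for each hospital can be sketched as follows. The function name and the toy records are hypothetical; each record is simply a hospital paired with a label for the patient's casemix combination.

```python
from collections import Counter


def used_weight_totals(records):
    """records: iterable of (hospital, casemix_combination), one per patient.

    The weight of a combination is its share of all patients; a hospital
    only uses the weights of combinations in which it has patients.
    """
    records = list(records)
    overall = Counter(cell for _, cell in records)
    n_total = len(records)
    cells_by_hospital = {}
    for hospital, cell in records:
        cells_by_hospital.setdefault(hospital, set()).add(cell)
    return {
        hospital: sum(overall[cell] for cell in cells) / n_total
        for hospital, cells in cells_by_hospital.items()
    }


toy = ([("H1", "a")] * 2 + [("H1", "b")] * 3 + [("H1", "c")] * 1
       + [("H2", "b")] * 2 + [("H2", "c")] * 2)
totals = used_weight_totals(toy)
print(totals)
```

Here hospital H2 has no patients in combination "a", which carries 20% of the aggregate weight, so its used weights sum to only 0.8.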


HES data

Figure 1 shows the sum of the weights actually used in the conventional direct casemix standardisation for each of the 146 hospitals. In every hospital there are some casemix combinations with no patients, so the weights actually used do not sum to 1.0 in any hospital and are often less than 0.8. In response to this problem, the directly casemix standardised CMFs used here for comparing with the SMRs and directly risk standardised CMFs have been calculated by adjusting the weights so that they sum to 1.0 in each hospital.
Figures 2 and 3 compare the different methods of standardisation of hospital mortality rates in the HES data. Figure 2 shows that there is a difference between the results for conventional direct casemix standardisation and our proposed direct risk standardisation which could have an impact on the assessment of performance. For example, two of the hospitals in the worst eight for poor mortality performance using conventional direct casemix standardisation are not in the worst 40 using our proposed method. However, Figure 3 shows that the new method very closely replicates the SHMI which is an SMR.
Figure 4 shows the correlation between the DRS CMFs for the 146 hospitals in the HES data calculated using between 5 and 25 risk categories. It will be seen that there is some discrepancy between the DRS CMFs calculated using 5 categories and the most reliable estimate using 25 categories, with a rank correlation of 0.980. However, when 10 categories are used the correlation increases to 0.998 indicating that in this dataset 10 categories were sufficient to calculate a reliable standardised rate.


DAVROS data

Table 4 shows the SMRs and the CMF calculated using the proposed DRS method for the nine centres in the DAVROS data. Again it will be seen that the CMF calculated from the directly risk standardised rate and the SMR are very similar. Table 4 also shows the standard errors (SEs) calculated from the observed data using the formulae given in the Appendix, and also calculated by the bootstrap method. It will be seen that the standard errors of the SMR and the CMF are also very similar.
Table 4
Observed values, and standard errors (SEs) and bootstrapped standard errors, for the SMR and CMF for nine centres in the DAVROS data
SMR: Observed value | SE (theoretical) | SE (bootstrap)
CMF: Observed value | SE (theoretical) | SE (bootstrap)


We have illustrated a new method for direct standardisation of event rates that
  • Is as easy to calculate as the SMR
  • Creates an index, the CMF, which is similar to the SMR, or a standardised rate
  • Can be calculated using continuous covariates
  • Unlike the SMR, can be used to compare populations, centres or time periods fairly
  • Has an easily estimated standard error that is similar to that of the SMR
The method converts the complex multi-dimensional casemix to a single dimensional risk distribution, and this can be seen as a development of the methods proposed by Hollis [13]. She proposed that W scores, which are similar to SMRs but are the difference between observed and expected events rather than their ratio, should be calculated in a few risk categories and then combined using a standard set of weights in order to make fair comparisons between centres with different casemix. Glance [7] proposed the same approach for calculating SMRs in risk categories and then combining these to enable fair comparisons of SMRs between centres. However, rather than calculating SMRs or W scores in risk categories, it is simpler to calculate the actual event rates in each category and then combine them as we have proposed.
We can’t overcome the problem of non-comparability of SMRs or W scores by using direct casemix standardisation of the event rates because of the problem of different patterns of empty casemix groups in different centres occurring for random or organisational reasons. It could be argued that no comparisons should be made between institutions with different patterns of organisational zeros because comparisons between types of institution, such as women’s hospitals, children’s hospitals, mental health hospitals, independent treatment centres only doing elective cases, and general hospitals, are not meaningful [14]. So the real problem is the occurrence of random zeros, and in some cases this could be solved by increasing the size of the dataset, eg taking two years of data, or collapsing the casemix categories, eg taking 10 year age bands rather than 5 year bands. However, in the sorts of models we have been considering with tens of thousands of casemix combinations this may not solve the problem. Furthermore, there would still be the need to omit or categorise continuous covariates.
We have suggested that one approach to get around the problem of empty casemix combinations in conventional direct casemix standardisation might be to re-calculate the weights actually used in each centre so that they always sum to one. Unfortunately this is only a partial solution since it now means that each centre could be using a different set of weights and so, in exactly the same way as for indirect standardisation, comparisons between centres are not fair.
The Directly Risk Standardised CMF is not exact (in the sense of guaranteeing that two centres with identical casemix event rates have identical CMFs) unless all the cases in each of the risk categories have exactly the same risk as each other. This will not usually be true. However, the inaccuracy is related to the number of categories used because with more categories all the cases in a category are more likely to have the same risk as each other. We calculated the effect of using different numbers of categories in the HES data and found negligible differences between using 10 and 25 categories. We therefore suggest that typically about 10 categories should be used. However, if there are some centres with very few cases then this may lead back to the problem that in these centres there may be some risk categories with no cases and direct standardisation methods, including the DRS method, will not work. In this case it may be necessary to use fewer categories. In her example for comparing trauma centres Hollis [13] uses 6 categories for standardising W scores. An alternative would be to omit small centres with empty categories from comparisons since with very few events their standardised rates may be too unreliable for robust comparisons anyway. The alternative of reverting to indirect standardisation using SMRs is not recommended unless the casemix of the centres to be compared has been shown to be similar since studies have shown that if this is not true then there may be substantial biases in the comparison of SMRs [7, 15].
In our examples, comparing hospitals with similar casemix and large samples, we found very little difference between the SMRs and the directly risk standardised CMFs. This has been found before [10] though the authors of that study also showed that when casemix differs between hospitals, SMRs vary between hospitals providing the same quality of care. They concluded that direct standardisation was theoretically preferable, but “practically impossible when multiple predictors are included in the casemix adjustment model”. We have shown that, on the contrary, it is possible using risk standardisation.
The example we have used suggests the standard errors of the SMR and Directly Risk Standardised CMF are similar. It remains to be determined whether this is generally true. The bootstrap seems to suggest that the theoretical formulae overestimate the standard error, but the standard errors for the SMR and CMF are similar with both the theoretical values and the bootstrap ones.
The fact that we found little difference between the DRS CMFs and the SMRs, and between their standard errors, points to an important limitation of our study. We do not know to what extent these findings depend on the particular examples we have chosen. We do know that as the casemix differences between centres being compared increase, the biases in SMRs increase, and hence the likely discrepancy from the DRS CMF. But we haven’t quantified these biases, and we don’t know what characteristics determine the relative standard errors of the two methods. Hence we don’t know in what circumstances indirect standardisation should be rejected in favour of the DRS CMF. A large simulation study comparing all methods of standardisation, reflecting real life data with missing values, and looking at outcomes such as the detection of outliers would be necessary.


In conclusion, it should be reiterated that all methods of standardisation require specification of a ‘risk’ model, and the choice of this model is probably more important than the method of standardisation. Nevertheless, for a given model it is important that the best method of standardisation should be used. Since direct standardisation using the DRS method is as straightforward as using the SMR and overcomes the problem of the non-comparability of SMRs, it should be preferred when the centres being compared may have different casemix profiles and tables of comparative performance using standardised measures are being constructed.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Additional file 1: Appendix. It contains technical formulae for the calculation of the DRS rates, SMR, and their standard errors. (DOCX 18 KB)