Medical Methodology
The full methodology will be described in a separate clinical publication. Briefly, the laser-Doppler imager records skin blood flow as a flux, measured in perfusion units (PU), for each of typically thousands of pixels covering a wound area. The mean flux, with which we shall be largely concerned, is the average flux across these pixels.
Burn wounds were assessed only once by laser-Doppler imager (moorLDI, Moor Instruments, UK) between 2 and 5 days post burn, and the burns were photographed at this time and also at 14 and 21 days post burn to record the burn at time of LDI and its subsequent healing.
Assessments were made at 14 and 21 days post burn because these times are important for clinical decisions on the burn treatment that is likely to result in least scarring: surgical (skin graft) or conservative (dressings only). A burn wound to most adults that heals within 21 days will do so with minimal scarring: 'Second degree burns that heal within 3 weeks are unlikely to leave scars' [11], ch. 5, p. 70. Exceptions to this include infants, where the risk of infection is higher, and patients of ethnic origins predisposed to hypertrophic scarring. For these groups, surgical management is frequently performed on wounds that are expected to take longer than 14 days to heal [8].
Healing was assessed by clinicians and was defined as a wound with a continuous covering of epithelial cells. The boundary of healed wound at 14 days and 21 days post burn was assessed from the photographs taken at these times. Areas healed and not healed at 14 and 21 days were mapped back on to the original laser Doppler images for later analysis. This was necessary because a wound is usually heterogeneous, with parts healing at different times. These areas were then redrawn on a grey-scale photo image that was pixel-position identical with the flux image (obtained simultaneously with the flux image); flux values within the corresponding regions of interest were then exported for analysis.
Exclusions were made where burn wound boundary selections would have had a significant effect on the result, for example a boundary at a steep flux gradient. Prior to analysis, the data used here were screened for clinical factors such as wound infection, tattoos, drugs and concomitant patient sickness; and technical factors such as patient movement, edge effects and reflections.
The day of the LDI scan was determined by clinical factors, staff availability and the study protocol: the protocol restricted the day of scan to within 2 to 5 days post burn based on previous observations by others on the reliability of the LD technique.
Besides time to healing, the usual demographic variables, age and gender, were recorded, together with skin pigmentation, which could affect recovery time. The medical history variables relevant to burns studies were total burned surface area, burn site on the body, and burn cause. Treatment and procedural variables were burn center and day of LDI scan.
The clinical decision to perform surgery inevitably censored the healing time in some cases. Some patients went to surgery for some of their burns before day 14 or between days 14 and 21. If the surgery was before day 14, the wound was excluded; if it was after day 14, we recorded a healing time of > 14 days (as for a patient who did not attend at day 21), unless there was further biopsy evidence. Where possible, biopsies were taken from wounds at surgery, and some of these results for healing time have been included in the current analysis. Where histology found a full thickness wound, the wound was classified as healing time > 21 days, since such wounds are known to take longer than 21 days to heal; where histology found the wound to be superficial dermal (at the wound edge), it was classified as healing time < 14 days. Wounds found to be deep dermal were not included because these could heal before or after 21 days and even before 14 days if adjacent tissue had very high flux.
Exploratory Data Analysis
In total, data on 768 burn areas were available, but of these only 310 had mean flux measured, of which 299 were complete, the other 11 having a missing outcome assessment because the patient did not attend one follow-up visit or because of surgery. The analysis here is focused on the 310 burn areas for which flux measurements were available. These areas came from 100 patients.
Table 1 summarises the univariate statistics for the 310 cases for which flux data were available. Note that for one of the five burn centers, there were no cases with flux measured. Median age was 32 years, minimum 1 and maximum 88 years, and the %TBSA (total burned surface area) had median 6%, minimum 1% and maximum 68%. A larger sample of 581 burn areas, where flux measurements were not always available, was used to boost the statistics for an exploratory analysis of the relationships among the observational variables.
Table 1
Data summary, showing breakdowns of burn areas from 100 patients.
Variable | Category | Number of burn areas |
Healing | < 14 days | 190 |
| 14–21 days | 47 |
| > 21 days | 62 |
| censored | 11 |
Burn center | 1 | 4 |
| 2 | 218 |
| 3 | 37 |
| 4 | 51 |
Scan day | 2 | 60 |
| 3 | 206 |
| 4 | 37 |
| 5 | 7 |
Burn site | limb | 127 |
| extremity | 88 |
| torso | 72 |
| face | 23 |
Gender | male | 182 |
| female | 128 |
Race | white | 294 |
| black | 16 |
Burn cause | scald | 167 |
| flame | 108 |
| chemical | 10 |
| flash | 12 |
| electrical | 2 |
| contact | 11 |
There are naturally many correlations among these variables. Demographic mix and %TBSA varied across the burn centers. Similarly, mean age varied with burn site.
There are some interesting gender differences. The type of burn site varies a little by gender, and this is statistically significant as shown by a chi-squared test ($\chi^2[3] = 11.7$, $p = 0.008$), with men relatively more likely to be burnt on the face, and women on the torso. Burn cause varies considerably between the genders (table 2). It is mainly men who experience flash, chemical, electrical and contact burns, presumably because of gendered employment. This difference is very significant ($\chi^2[5] = 75.6$, $p < 0.001$).
Table 2
Gender differences in burn statistics.
Variable | Male | Female |
Mean Age | 33.4 | 32.0 |
Mean %TBSA | 9.2 | 8.5 |
Burn cause | | |
Scald | 49.7% | 50.3% |
Flame | 61.1% | 38.9% |
Chemical | 100.0% | 0.0% |
Flash | 100.0% | 0.0% |
Electrical | 100.0% | 0.0% |
Contact | 81.8% | 18.2% |
Healing time | | |
< 14 days healing | 69% | 31% |
14–21 days healing | 47% | 53% |
> 21 days healing | 39% | 61% |
The age and %TBSA distributions are similar. It is also clear that healing time is longer for females than for males. There is however the problem of confounding, for example the slower female healing time could be at least partly the result of gender-specific burn sites and burn causes. This issue will be addressed in the survival-time modeling.
The slower female healing time is an important fact, and we briefly summarise previous findings in this area. There are a few studies on gender and morbidity, with mortality being the more frequent focus. The effect of gender, in these studies, is considered with other patient and wound variables: e.g. age, race, %TBSA and wound type, inhalation injuries and biochemical markers. Large %TBSA burns are included in the patient groups and these studies indicate that gender does influence outcome. Length of hospital stay and duration in intensive care have been used to assess morbidity. For adults, length of stay was greater for women [12] but in children, duration in the intensive care unit was found to be greater for boys [13]. The child gender difference was also found for mortality, higher for boys than for girls [14]. Adult mortality has been found to be greater in women by two-fold [15, 16], and there is debate over the age group at highest risk [17, 18].
The findings of the current investigation of the effect of gender on healing time are therefore consistent with previous work. However, it must be stressed that there are no studies looking at the residual effect of gender once the laser-Doppler measurement is known, which is the main focus of this study.
Statistical Methodology
There were 433 burn areas for which clinicians' predictions of healing time using the LDI were available. Of these, a subset of 310 wounds was appropriate for computing average laser-Doppler flux. The analysis here focuses on the use of flux as a predictor rather than the clinician predictions, mainly because this shows the performance of the technology when quite divorced from clinical judgement. Also, the mean flux measurement data are probably better suited to answering questions about the role of covariates. Proportional-odds (PO) ordinal logistic regression is the most popular method of analyzing data such as these, where a dependent variable $Y$ that takes ordered integer values is modeled as a function of a vector $x$ of covariates. An introductory account is given in [19] and general descriptions in [20-22]. We seek to predict a dependent variable (healing time) that is ordered, and model the logits of the cumulative probabilities of healing in under 14 days or in under 21 days as a linear function of the covariates. In the 'parallel lines' model usually fitted, the logit of the $k$th (of 2) cumulative probabilities $P_1$ and $P_2$ is the function

$$\operatorname{logit}(P_k) = \log\frac{P_k}{1 - P_k} = \alpha_k + \beta^{T} x, \tag{1}$$

where only the intercept $\alpha_k$ is a function of $k$, and the vector of coefficients $\beta$ of the covariates $x$ is not. The probabilities of the three healing times, $p_1$, $p_2$ and $p_3$, are given by $p_1 = P_1$, $p_2 = P_2 - P_1$ and $p_3 = 1 - p_1 - p_2 = 1 - P_2$.
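As a minimal sketch of how the parallel-lines model turns a linear predictor into the three healing-time probabilities, the following Python function evaluates $P_1$ and $P_2$ and differences them. The function names and the numerical values of the intercepts, coefficients and covariates are hypothetical illustrations, not fitted values from this study.

```python
import math

def expit(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def po_probabilities(alphas, beta, x):
    """Return (p1, p2, p3): healing in < 14, 14-21 and > 21 days.

    alphas : (alpha_1, alpha_2), the two intercepts, alpha_1 <= alpha_2
    beta   : coefficients shared by both cumulative logits
    x      : covariate values, same length as beta
    """
    a1, a2 = alphas
    if a1 > a2:
        raise ValueError("intercepts must satisfy alpha_1 <= alpha_2")
    eta = sum(b * xi for b, xi in zip(beta, x))  # beta^T x
    P1 = expit(a1 + eta)  # cumulative: healed in under 14 days
    P2 = expit(a2 + eta)  # cumulative: healed in under 21 days
    return P1, P2 - P1, 1.0 - P2

# Hypothetical inputs: a mean flux of 200 PU and a 0/1 gender indicator.
p1, p2, p3 = po_probabilities(alphas=(-1.0, 0.5), beta=(0.01, -0.8), x=(200.0, 1.0))
```

Because the two logits share $\beta$, the ordering $\alpha_1 \le \alpha_2$ is all that is needed to keep $p_2$ non-negative.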
Surgeons occasionally desire to predict probabilities of healing over different time intervals than the ones used here. We suggest that this could be attempted by regarding $\alpha$ as a function of time $t$, for example $\alpha = \gamma\log(t/t_0)$. Then we can determine $\gamma$ and $t_0$ from $\alpha_1 = \gamma\log(14/t_0)$, $\alpha_2 = \gamma\log(21/t_0)$. With this parameterization, probabilities of healing over two or three different intervals could be found. This model corresponds to a log-logistic distribution of healing time for fixed covariates.
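The two equations for $\alpha_1$ and $\alpha_2$ can be solved in closed form: subtracting them gives $\gamma = (\alpha_2 - \alpha_1)/\log(21/14)$, and then $t_0 = 14\exp(-\alpha_1/\gamma)$. The sketch below implements this under the suggested parameterization; the function names and the example intercepts are hypothetical.

```python
import math

def fit_time_intercept(alpha1, alpha2, t1=14.0, t2=21.0):
    """Solve alpha_k = gamma * log(t_k / t0) for (gamma, t0)."""
    gamma = (alpha2 - alpha1) / math.log(t2 / t1)
    if gamma == 0.0:
        raise ValueError("alpha1 and alpha2 must differ")
    t0 = t1 * math.exp(-alpha1 / gamma)
    return gamma, t0

def alpha_at(t, gamma, t0):
    """Interpolated intercept for an arbitrary time t (in days)."""
    return gamma * math.log(t / t0)

def prob_healed_by(t, gamma, t0, eta):
    """Cumulative probability of healing before day t, where eta = beta^T x."""
    z = alpha_at(t, gamma, t0) + eta
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical intercepts, used only to show the calls.
gamma, t0 = fit_time_intercept(alpha1=-0.5, alpha2=0.7)
p_by_day_10 = prob_healed_by(10.0, gamma, t0, eta=0.0)
```

Any value extrapolated far outside the 14 to 21 day range rests entirely on the assumed functional form for $\alpha$.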
In a later section of this paper, on palette derivation (a discriminant problem), we follow the methodology of [23, 24], which dispenses entirely with the need to choose an ordinal model. However, for assessing the significance of covariates using likelihood-based inference, a model is required. Other models besides PO are also available ([25-27]), and there is no overriding reason to choose the PO model. Hence two other models were also fitted, to see whether the PO model could be improved on: the probit and continuation ratio (CR) models [26, 28]. The latter model is applicable where, as here, the three data groups are periods. As will be seen, neither of these models fitted better than the PO model. Other models, such as the multinomial model, are useful only when the categories are unordered, and are less efficient than ordinal models when used with ordinal data [29].
The covariates used in this study, besides the LD flux measurement, are those demographic variables that are usually important in medicine, age and gender, plus those known to be important in predicting healing time, such as %TBSA and burn site, and some that might possibly be thought to be relevant, such as skin pigmentation and burn cause. It is thought that these last two are probably not important. For example, natural skin pigmentation does not affect LDI flux from debrided wounds because the pigment is removed with the epidermis.
The proper selection of predictor variables for use in a model raises some statistical problems (e.g., [28], ch. 4). In particular, modeling choices made by the statistician after viewing the data are often not reflected in the final p-values and confidence intervals produced. Results then appear unduly significant or accurate, and traditional approaches, such as stepwise regression, can give misleading results. This problem looms large when sample sizes are small and there are many variables. Here, fortunately, we have the opposite situation.
The data posed some problems that necessitated a purpose-written computer program. An example is the existence of a few censored cases, where patients did not attend a hospital visit. In these cases, it was known only that healing took place in under 21 days or after more than 14 days. Where surgery was performed, burn severity information was obtained from biopsy results: less than 14 days was assumed for superficial dermal wounds; more than 21 days was assumed for full thickness wounds. Fitting the model by maximum-likelihood estimation meant that these cases could be included. A more serious problem was the existence of multiple burn areas on the same individual. The 310 burn areas occurred on only 100 patients. Although demographic variables that could influence healing time such as age and gender were known, there might be a further 'frailty' that varied between individuals, so that healing times would not even be conditionally independent [30]. Obesity and smoking, for example, affect health, but were not tested here. To model frailty, we add a hypothetical frailty variable, and each individual has a random value of it. A normal distribution for the variable is the simplest choice, and as frailty turned out to be a small effect, no more elaborate modeling was done.
Taking the mean contribution from this variable as zero and the variance as $\sigma^2$ gave a likelihood function
(2)
where the $j$th burn area on the $i$th of $N$ individuals has healing time $Y_j$ and covariate vector $x_j$. The assumption of a zero mean is nugatory, as a nonzero mean would be absorbed into the constants $\alpha_1$ and $\alpha_2$. For censored data, where, for example, healing is only known to occur in under 21 days, the probability in (2) is the probability of this observed event, $p_1 + p_2$ or $P_2$.
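For concreteness, a likelihood of the kind described can be written as below. This is a sketch consistent with the stated definitions (a normal frailty with mean zero and variance $\sigma^2$, shared by all burn areas on one individual) and not necessarily the authors' exact notation: the symbol $u$ for the frailty value, the notation $j \in i$ for the burn areas on individual $i$, and the assumption that the frailty enters additively in the linear predictor are all introduced here.

```latex
\mathcal{L}(\alpha_1, \alpha_2, \beta, \sigma)
  = \prod_{i=1}^{N} \int_{-\infty}^{\infty}
      \Bigl[\, \prod_{j \in i} \Pr(Y_j \mid x_j, u) \Bigr]
      \frac{1}{\sigma\sqrt{2\pi}}
      \exp\!\Bigl(-\frac{u^{2}}{2\sigma^{2}}\Bigr)\, du,
\qquad
\operatorname{logit} P_k(x_j, u) = \alpha_k + \beta^{T} x_j + u .
```

Conditional on $u$, the burn areas on one individual are treated as independent, which is why the inner product sits inside the integral.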
The integration could be done by the Gauss-Hermite method as recommended by [31]; in fact adaptive integration using the Numerical Algorithms Group (NAG) integration subroutine D01AMF was used here. The website [32] gives full details of the NAG routines mentioned here.
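To illustrate the Gauss-Hermite alternative mentioned above (not the adaptive NAG routine actually used), the following sketch integrates a normal frailty out of one individual's contribution to the likelihood. It assumes the frailty is added to the linear predictor of the proportional-odds model, codes the three healing categories as 0, 1 and 2, and ignores censored cases; the function names and argument layout are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def individual_likelihood(y, X, alphas, beta, sigma, n_nodes=20):
    """Marginal likelihood of one individual's burn areas.

    y      : healing categories, 0 (< 14 d), 1 (14-21 d), 2 (> 21 d)
    X      : covariates, shape (n_areas, n_covariates)
    alphas : (alpha_1, alpha_2)
    sigma  : standard deviation of the normal frailty
    """
    nodes, weights = hermgauss(n_nodes)      # rule for weight exp(-t^2)
    u = np.sqrt(2.0) * sigma * nodes         # frailty values at the nodes
    eta = np.asarray(X, dtype=float) @ np.asarray(beta, dtype=float)
    z1 = alphas[0] + eta[:, None] + u[None, :]
    z2 = alphas[1] + eta[:, None] + u[None, :]
    P1, P2 = expit(z1), expit(z2)
    probs = np.stack([P1, P2 - P1, 1.0 - P2])        # (3, n_areas, n_nodes)
    y = np.asarray(y)
    per_area = probs[y, np.arange(len(y)), :]        # (n_areas, n_nodes)
    # Areas are independent given the frailty: multiply, then integrate.
    return float(np.sum(weights * per_area.prod(axis=0)) / np.sqrt(np.pi))
```

The change of variable $u = \sqrt{2}\,\sigma t$ maps the normal density onto the $\exp(-t^2)$ weight of the quadrature rule, which is where the factor $1/\sqrt{\pi}$ comes from. The full log-likelihood would be the sum of the logarithms of these contributions over individuals.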
The log-likelihood function ℓ = ln ℒ was maximised using the NAG function minimisers E04UCF and E04CCF [32]. The former uses a sequential quadratic programming method and is relatively fast, and the latter uses the Nelder-Mead method and is slow but robust. The best of 20 random restarts from the current function maximum was used, to help ensure that the global maximum of the log-likelihood function had been found.
The asymptotic covariance matrix for fitted model parameters is taken as the inverse of the Hessian matrix

$$H_{ij} = -\left.\frac{\partial^2 \ell}{\partial\beta_i\,\partial\beta_j}\right|_{\beta = \hat{\beta}},$$

where the $\beta_i$ are model parameters, and $\hat{\beta}_i$ are the maximum-likelihood estimates. To obtain a more accurate estimate of parameter errors for small samples, the likelihood was maximised for 1000 bootstrapped samples, in which individuals were sampled from the dataset with replacement; see e.g. [33]. Bootstrap resamples that did not allow all parameters to be estimated were rejected. This can happen if, for example, the resample does not contain any examples of a particular burn cause.
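Resampling individuals rather than burn areas keeps all the burn areas of a drawn patient together, which preserves the within-patient dependence. The sketch below shows one way this could be done, together with a check of the kind used to reject resamples that lack a category. The record layout, the function names, and the relabelling of repeated draws as distinct individuals are assumptions made for illustration.

```python
import random
from collections import defaultdict

def bootstrap_by_patient(records, rng):
    """Resample patients with replacement, keeping each patient's areas together.

    records : list of (patient_id, row) pairs, one per burn area
    rng     : a random.Random instance
    Returns a list of (new_id, row); a patient drawn twice appears under
    two different new_id values, i.e. as two separate individuals.
    """
    by_patient = defaultdict(list)
    for patient_id, row in records:
        by_patient[patient_id].append(row)
    ids = list(by_patient)
    resample = []
    for new_id in range(len(ids)):
        drawn = rng.choice(ids)
        resample.extend((new_id, row) for row in by_patient[drawn])
    return resample

def has_all_levels(records, level_of, levels):
    """True if every category level (e.g. every burn cause) is present."""
    return set(levels) <= {level_of(row) for _, row in records}
```

A resample for which `has_all_levels` is false would be discarded and redrawn before refitting the model.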
Estimates of the model's predictive ability were found by taking the predicted healing time as that with the highest probability. The resulting percentage of correct assignments is however liable to be over-optimistic, because the model is evaluated in-sample. A recommended method of correcting for this is to calculate the 'optimism' of an in-sample estimate as the mean of a large number of differences $D_i$ between estimates from a bootstrapped dataset evaluated on the bootstrapped dataset and on the original dataset. Finally, the mean optimism $\bar{D}$ is subtracted from the sample estimate (e.g. [28], section 5.2.5, [33]). This procedure is better than simply splitting the sample into two, one for model fitting and one for validation [34].

The predictive ability of a linear model may be measured by the coefficient of determination, $R^2$. For instance, [35] and others proposed a pseudo $R^2$ for general models given by

$$R^2 = 1 - \exp\{-2n^{-1}[\ell(\hat{\beta}) - \ell(0)]\},$$

where $n$ is the sample size and $\beta = 0$ denotes the 'null' model. Nagelkerke (1991) [36] proposed a correction, since the maximum value of $R^2$ attainable is less than 1. The correction required dividing $R^2$ by its maximum value of $1 - \exp\{2n^{-1}\ell(0)\}$, and we use this corrected value of $R^2$.
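Both corrections described above reduce to a few lines of arithmetic once the log-likelihoods and bootstrap estimates are available. The following sketch computes the pseudo $R^2$, its Nagelkerke-corrected version, and an optimism-corrected accuracy; the function names and the numbers in the usage lines are hypothetical.

```python
import math

def pseudo_r2(ll_model, ll_null, n):
    """Pseudo R^2 = 1 - exp{-(2/n) [l(beta_hat) - l(0)]}."""
    return 1.0 - math.exp(-2.0 * (ll_model - ll_null) / n)

def nagelkerke_r2(ll_model, ll_null, n):
    """Pseudo R^2 divided by its maximum attainable value."""
    r2_max = 1.0 - math.exp(2.0 * ll_null / n)
    return pseudo_r2(ll_model, ll_null, n) / r2_max

def optimism_corrected(apparent, boot_on_boot, boot_on_orig):
    """Subtract the mean optimism from the in-sample estimate.

    boot_on_boot[i] : estimate from the i-th bootstrap fit, evaluated on
                      the bootstrap sample it was fitted to
    boot_on_orig[i] : the same fit evaluated on the original data
    """
    diffs = [b - o for b, o in zip(boot_on_boot, boot_on_orig)]
    return apparent - sum(diffs) / len(diffs)

# Hypothetical values, used only to show the calls.
r2 = nagelkerke_r2(ll_model=-100.0, ll_null=-120.0, n=100)
accuracy = optimism_corrected(0.80, [0.82, 0.84], [0.78, 0.80])
```

For a discrete outcome the log-likelihood cannot exceed zero, so a model with $\ell(\hat{\beta}) = 0$ attains a corrected $R^2$ of exactly 1.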
Significance tests that a parameter $\beta_i$ is zero would classically be done using the Wald statistic, i.e. using the estimated standard error of the parameter estimate (see e.g. [28], section 9.2.2). Alternatively, the increase in log-likelihood $\Delta\ell$ on 'floating' the parameter can be used as a test statistic, when under $H_0$ we have that $2\Delta\ell \sim \chi^2[1]$. A further alternative is to carry out a Wald test using the bootstrapped error estimate. However, an exact test can be obtained by retaining $\Delta\ell$ as a test statistic, but obtaining its reference distribution under $H_0$ by permuting the relevant variable $x_i$ among the cases. The p-value of the test is the proportion of permutations for which the computed $\Delta\ell$ exceeds the value for the original sample (see e.g. [37] and [33], chapter 15). Variables such as gender must of course be permuted among individuals rather than among burn areas. The logic is that under $H_0$ gender is irrelevant, and so the permuted datasets generate the reference distribution for the test statistic. Note that this is not the more elaborate procedure used by [38].
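The permutation procedure can be sketched generically as follows. The labels are held one per individual, so that shuffling them permutes gender among patients and every burn area on a patient receives the same permuted label. The `statistic` argument stands in for the refit that yields $\Delta\ell$; it, the function name and the data layout are hypothetical.

```python
import random

def permutation_p_value(patient_labels, statistic, n_perm=1000, seed=0):
    """Proportion of label permutations whose statistic exceeds the observed one.

    patient_labels : dict mapping patient_id -> label (e.g. gender)
    statistic      : callable taking such a dict and returning the test
                     statistic (in the text, the log-likelihood gain)
    """
    rng = random.Random(seed)
    ids = list(patient_labels)
    labels = [patient_labels[i] for i in ids]
    observed = statistic(patient_labels)
    exceed = 0
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)          # permute among individuals
        if statistic(dict(zip(ids, shuffled))) > observed:
            exceed += 1
    return exceed / n_perm
```

With 1000 permutations the smallest non-zero p-value that can be reported is 0.001, so a returned value of 0 should be read as $p < 0.001$.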
Since the number of distinguishable permutations (combinations) is very large, a random sample of 1000 was generated to compute the p-value. A random permutation of n labels held in an array was achieved by swapping each array element in turn with a random array element.
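One standard way of generating a uniformly random permutation by swapping is the Fisher-Yates shuffle, in which each position is swapped with a randomly chosen position from the part of the array not yet fixed. The sketch below shows that variant; whether the routine used in this study restricted the swap range in this way is not stated, so this is an illustration, not a description of the original program.

```python
import random

def fisher_yates_shuffle(labels, rng):
    """Return a uniformly random permutation of labels (input is not modified)."""
    a = list(labels)
    for i in range(len(a) - 1, 0, -1):
        j = rng.randint(0, i)      # inclusive bounds: 0 <= j <= i
        a[i], a[j] = a[j], a[i]
    return a

# Hypothetical gender labels, one per patient.
permuted = fisher_yates_shuffle(["M", "M", "F", "F", "M"], random.Random(42))
```

Drawing the swap partner from the whole array at every step instead would still scramble the labels, but would not give every permutation equal probability.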
The main advantage of a permutation test over a bootstrap hypothesis test is that the permutation test correctly generates the reference distribution of the test statistic under $H_0$, whereas the bootstrap hypothesis test can only generate the distribution of the test statistic under $H_1$. Also, the p-value obtained is exact, and not an asymptotic approximation. Permutation tests have a simple rationale; under $H_0$, gender labels are irrelevant to healing, and so we can obtain an equivalent dataset by shuffling them. The only drawback here is the extra computing time needed; a separate set of permutations must be carried out for each variable or related set of variables in turn.
In the analyses done here, variable selection was not a problem: given the laser-Doppler flux, only gender was significant. This contrasts with many clinical studies, where many covariates are significant, and the choice of best model is not easy.