Pseudo values
Considering the previous example, the transformation model can be written as g(S(t|T,A,G))=g(S
0(t))+α T+β A+γ G.
A possibility to estimate transformation models, using standard available software, is through pseudo-values [
32]. The pseudo value is defined for each subject
i at any time
t and is given by
(2)
where n is the sample size, is the survival probability based on the Kaplan-Meier estimator using the whole sample and is the survival probability obtained by deleting the i subject from the sample. When no censoring is present in the data, the pseudo values for subject i at time t is simply 1 if the subject is alive at t, while it is 0 if the event happened by t. Suppose to have an exposed male, 40 years old, which dies after 30 months of follow-up. The pseudo values computed at 12, 24 and 36 months are equal to 0, 0 and 1 respectively. The times at which the pseudo-values are computed are called pseudo-times.
When censoring is present in the data, pseudo-values are still defined for each subject (even those censored) and for each time, but the values may also be less than 0 or greater than 1 (See [
33]; page 5310–11 for further details on the properties of pseudo-values).
In general, to allow inference on the entire survival curve, M (greater than 5) pseudo times are used, considering, for example, the quantiles of the unique failure time distribution. As M pseudo values are computed for each subject, an augmented data set is created with M observations for each subject.
The pseudo-values are then used as responses in a regression model for longitudinal data, where time is a covariate. As no explicit likelihood is available for pseudo-values, generalized estimating equations (GEE), [
34], are used accounting for the correlation of the pseudo-values within each subject. The cluster robust variance-covariance is used for hypothesis testing using Wald tests. In general an independence working variance-covariance matrix can conveniently be used in the estimation process [
32].
In order to model
g(
S
0(
t)), the transformed baseline survival function, the standard procedure is to insert in the regression model indicator functions for each pseudo-time. If all event times would be used to compute the pseudo-values, the insertion of indicator functions would result in a non parametric representation of the (transformed) baseline survival, as in the Cox model. In general only a small number of pseudo-times are used obtaining a parametric baseline representation. As an alternative, spline functions can be inserted in the regression model, as did [
35] in a non-pseudo-values framework.
Considering for simplicity only two spline bases, the regression model of the example can be written as follows:
(3)
where
B
1(
t) and
B
2(
t) represent the first and second spline bases for time t. For example, if a restricted cubic spline basis is used with three knots at
k
1,
k
2,
k
3, then
B
1(
t)=
t and
, where, for example,
is equal to (
t−
k
1)
3 if
t>
k
1, otherwise is 0. Knots are chosen at quantiles of the failure time distribution. In the case of 3 knots the quantiles commonly suggested are 0.1, 0.5 and 0.9, [
36]. To choose the complexity of the spline the QIC, [
37], an information criterion proposed for generalized estimating equations, can be used. A less formal strategy is the graphical comparison between the Kaplan-Meier marginal survival probability and the marginal probability obtained from the transformation model without covariates. Such a procedure will be used in the examples.
The first part of the model, ϕ
0+ϕ
1
B
1(t)+ϕ
2
B
2(t), provides a parametric representation of the (transformed) baseline survival function, g(S
0(t)), during follow-up time.
The coefficients
α,
β and
γ represent the covariate effects expressed as differences in the Survival probability, transformed by
g associated with a unit increase in the covariates. Let us consider such an issue in detail. When
g is the logit link function, a proportional odds model is estimated. Accordingly,
α,
β and
γ represent the logarithm of the ratio of the odds of surviving associated with the change of one unit in the covariates. Such an effect is constant through follow-up times. The exponentiation of the parameter estimates represent therefore the ratio of the odds of surviving. Similarly, the logarithmic link produces a proportional risks model and the
e x p(
α),
e x p(
β) and
e x p(
γ) represent the ratio of the survival probabilities (Relative Risks,
RR). The identity link produces a constant survival difference model:
α,
β and
γ represent the adjusted differences in survival probabilities (risk differences,
RD). A constant difference model through follow-up is often not practical as a model such that at the beginning of the follow-up the survival curves start at 1 and then, eventually, become different. However, it is to be noted that the first pseudo-time is never placed at time 0, but later on the follow-up time scale. In Figure
1 an example of the model based
RD estimate with pointwise confidence intervals, constant through time, is reported in the right panel. The constant model estimated
RD can be used to obtain a constant estimate of
NNT by inversion. In the case of treatment T:
. The value of 1 indicates the largest possible effect of
NNT, while in correspondence of no covariate effect (
RD=0) the
NNT value is ±
∞. The largest possible harmful effect is −1. Positive and negative values of
NNT represent the expected number of patients needed to be treated for one additional patient to benefit and to be harmed, respectively.
In the case of the log-log link, g=l o g(−l o g(•)), e x p(α), e x p(β) and e x p(γ) are the ratio between cumulative hazard functions associated with the change of one unit in the covariates. This ratio is equal to that of hazard functions, only in the proportional hazard case.
The method allows to estimate the measures of effect also for continuous covariates. For example, the evaluation of a biomarker effect measured on a continuous scale, without cutoffs, is still possible with this methodology.
The use of different link functions to obtain a particular measure of effect, is an established technique in binomial regression, where the use of non-canonical links, such as the logarithm, allows to obtain adjusted measures of impact different from the odds ratio, [
1,
38]. Wacholder, [
39], is an excellent reference for deepen such aspects in the framework of logistic regression.
When there is evidence for time dependent effects of the covariates, the interaction between the covariates and the spline bases
B
1(
t) and
B
2(
t) are included in model (3).
In such a case, the estimated g-transformed survival probability differences change during follow-up time. In order to show the effect, varying in time, of a dichotomous covariate, for example treatment T, it is useful to adopt a graphical display, where the time is put on the horizontal axis while the function e x p(α+γ
1
B
1(t)+γ
2
B
2(t)) is on the vertical axis (exponentiation is not used with the identity link; R D(t)=α+γ
1
B
1(t)+γ
2
B
2(t)). In this case the estimated N N T(t) is naturally varying through follow-up time and again obtain by inversion: R D(t)−1.
For a continuous covariate, such as Age, A in the example, it is possible to use a surface plot, where Age and time are on the x and y axis, while the z axis reports the covariate effect with respect to a reference value. It would also be possible to model Age effect with spline bases. In this case, the interaction between Age and time is obtained through tensor product spline bases of Age and time.
When a large number of pseudo-times is used, spline functions allow to model parsimoniously the baseline risk compared to indicator functions. This is particularly important for the modelling of time-dependent effects in connection to the different link functions. In principle when a covariate effect is constant using a specific link, it should be time-varying with the other links. No statistical evidence against a constant covariate effect for more than one link may only be due to lack of power. The problem can also be exacerbated by some multiple testing issue. Time-dependent effects selection depends therefore on the link transformation used. As a consequence, the adjusted effect of a covariate may be constant using a link function, but time-dependent using a different link.
Moreover, the fitted values of the different models selected for the different link functions are generally different, being equal only if the models are saturated. Traditionally, the strategy used in the application of transformation models such as (3) was to select the best fitting
g transform, i.e., the transform where covariate effects are constant through time, see [
40,
41] as examples. The approach considered here is different. The interest is in using the
g-transform which is the most informative for the clinical or biological counterpart. Generally the best fitting link function and the one selected by the researcher are not the same. Time dependent effects should therefore be expected in the model.
Pointwise confidence intervals
Approximate pointwise 95
% confidence intervals are calculated from model results as in standard GLM/GEE modelling. The computation is easy when covariate effects are constant on the
g-transformed scale. The cluster-robust variance-covariance matrix must be used. Using model (3) as example, the 95% CI for treatment, on the transformed
g scale, will be
where is the estimated cluster-robust standard error for the model parameter α. When g is the log, logit or l o g−l o g link function, the 95% CI for the treatment effect (respectively an RR, OR or HR) is [ e x p.(l
lower
),e x p(l
upper
)]. With the identity link, the 95% confidence intervals is [ l
lower
,l
upper
], without additional transformations, and the corresponding interval for NNT is [ 1/l
upper
,1/l
lower
].
A Clarification is necessary for the confidence interval of the NNT.
When the estimated constant
RD is not statistically significant the confidence interval of
RD includes 0. The limits of the confidence interval are one positive and the other negative. The resulting confidence interval for
NNT should include infinity (
∞), [
42]:
With time-varying covariate effects, the variance of the sum of a linear combination of different parameter estimates must be computed for the each of the follow-up times. For example, for treatment
T, the variance of interest, at a specific time
t, is that of the linear combination
α+
γ
1
B
1(
t)+
γ
2
B
2(
t). Written in matrix terms the variance at time
t is given by:
where V(•) stands for the cluster-robust variance, while C o v(•,•) stands for the cluster-robust covariance of two random variables. When the variances at the different times are calculated, the pointwise 95% CI can be computed as before.
Software implementation
The approach to censored data regression based on pseudo values was applied to regression models for the cumulative incidence functions in competing risks and for multi-state modeling [
43], for the restricted mean [
44] and for the survival function at a fixed point in time [
45]. Implementation details and software can be found in Klein et al., [
32], and Andersen and Perme, [
46].
Software is available to compute pseudo values (macro
%pseudosurv
in
SAS
and function
pseudosurv
in
R
package
pseudo
[
32]) Standard GEE tools available in
SAS
or
R
can be used for regression. In
SAS
the
proc
genmod
allows to change link functions using the instructions
FWDLINK
and
INVLINK
. In
R
, the package
geepack
can be conveniently used, see [
32] for details.
As an example of the
R
software implementation, the identity link is used:
where the variable
pseudo
contains the pseudo values and the variable
tpseudo
the pseudo-times according to the software reported in [
32]. The
R
function
rcs
of the package
rms
, [
47], is used to compute restricted cubic spline bases. Each subject is represented by multiple rows in the data, one for each pseudo time. The records for each subject are identified by means of the variable
id
which is used to estimate the robust standard error by the
geese
function. Using the identity link function, the estimated coefficients can be interpreted as the adjusted
R D(
t) estimates.