A new person-time definition of follow-up rate (PTFR)
In this paper, we propose a new person-time follow-up rate (PTFR), which is essentially the observed person-time divided by the person-time assuming no dropouts. Specifically, we define the follow-up rate \( {\eta}_{PTFR} \) as:
$$ {\eta}_{PTFR}=\frac{{\mathrm{PT}}_{\mathrm{observed}}}{{\mathrm{PT}}_{\mathrm{no}\hbox{-} \mathrm{dropout}}}=\frac{\sum \limits_{i=1}^N\min \left({T}_i,{C}_i,\tau \right)}{\sum \limits_{i=1}^N\min \left({T}_i,\tau \right)}\ast 100\% $$
(3)
where \( {\mathrm{PT}}_{\mathrm{no}\hbox{-} \mathrm{dropout}} \) is the total person-time that would have been observed in the study if there were no dropouts. The denominator thus represents the hypothetical situation of no dropout, with each subject contributing the time to event \( {T}_i \) or the time to the end of the study, whichever came first. Note that the calculation of \( {\eta}_{PTFR} \) requires that the time to event \( {T}_i \) is known for all participants, whether they dropped out or not.
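As a minimal sketch of Eq. (3), the Python snippet below computes the PTFR for a made-up cohort in which every event time is assumed to be known. The function name `ptfr`, the toy data, and the use of infinity to encode "never dropped out" or "no event" are illustrative assumptions, not part of the original method description.

```python
def ptfr(event_times, censor_times, tau):
    """Person-time follow-up rate of Eq. (3), in percent.

    event_times  -- T_i, assumed known for every subject (hypothetical setting)
    censor_times -- C_i, dropout times; float('inf') encodes "never dropped out"
    tau          -- length of the study
    """
    if len(event_times) != len(censor_times):
        raise ValueError("event_times and censor_times must have equal length")
    # Numerator: person-time actually observed.
    pt_observed = sum(min(t, c, tau) for t, c in zip(event_times, censor_times))
    # Denominator: person-time that would be observed with no dropouts.
    pt_no_dropout = sum(min(t, tau) for t in event_times)
    return 100.0 * pt_observed / pt_no_dropout


# Hypothetical cohort of four subjects followed for tau = 10 years.
INF = float("inf")
T = [4.0, INF, 12.0, 6.0]   # event times
C = [INF, 5.0, INF, 3.0]    # dropout times
print(round(ptfr(T, C, tau=10.0), 1))  # 73.3 (= 22 / 30)
```

In this toy cohort the fourth subject drops out at year 3 but would have had the event at year 6, so only 3 of the 6 years that subject could have contributed are observed.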
It can be shown that \( {\eta}_{CCI} \) underestimates \( {\eta}_{PTFR} \) since
$$ {\eta}_{PTFR}-{\eta}_{CCI}=\frac{\sum \limits_{i=1}^NI\left({C}_i\le {W}_i\right)\left(\tau -{W}_i\right)\sum \limits_{i=1}^N\min \left({\mathrm{C}}_i,{W}_i\right)}{\left(\sum \limits_{i=1}^NI\left({C}_i\le {W}_i\right)\tau +I\left({C}_i>{W}_i\right){\mathrm{W}}_i\right)\sum \limits_{i=1}^N{W}_i}\ge 0 $$
as \( {W}_i \) follows the distribution of \( {T}_i \) truncated at τ. Using the example in Fig. 1, if none of the dropouts became events during the study, \( {\eta}_{PT} \) = 62.3% for scenario (A) and \( {\eta}_{PT} \) = 92.4% for scenario (B), and \( {\eta}_{PTFR}={\eta}_{CCI} \); however, if 5 of the dropouts became events shortly after they dropped out, then \( {\eta}_{PTFR} \) = 65.3% > \( {\eta}_{CCI} \).
Because the event times for dropouts are not observed, the PTFR cannot be calculated directly. We therefore propose two estimation methods.
We first consider an observational cohort study design that involves repeated serial assessments of participants at fixed time intervals of equal length (e.g., annual or semi-annual clinical visits). In addition to the baseline visit at \( {t}_0=0 \), we denote the pre-specified visit times as \( \left({t}_1,{t}_2,\dots, {t}_K\right) \), where \( {t}_K=\tau \) is the end of the follow-up. It is then assumed that, on average, events and censoring occur midway through each interval, consistent with standard practice in life-table analysis [24]. Therefore, the numerator of Eq. (3), i.e., the actual person-time of follow-up, is estimated to be
$$ P{\widehat{T}}_{observed}=\sum \limits_{k=1}^K\left({N}_{k-1}-\frac{N_{E_k}+{N}_{C_k}}{2}\right) $$
where \( {N}_{k-1} \) is the number of subjects at risk at the beginning of interval k (i.e., at time \( {t}_{k-1} \)), \( {N}_k={N}_{k-1}-{N}_{E_k}-{N}_{C_k} \), and \( {N}_{E_k} \) and \( {N}_{C_k} \) are the numbers of events and dropouts that occurred during interval k, respectively.
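The life-table estimate of the observed person-time can be sketched in a few lines of Python. The function name and the counts below are hypothetical, and the result is expressed in units of one visit interval (multiply by the interval length to obtain years).

```python
def life_table_person_time(n_at_start, events, dropouts):
    """Sum over intervals of N_{k-1} - (N_Ek + N_Ck) / 2.

    n_at_start -- N_0, number of subjects at baseline
    events     -- [N_E1, ..., N_EK], events per interval
    dropouts   -- [N_C1, ..., N_CK], dropouts per interval
    Returns person-time in units of one visit interval.
    """
    if len(events) != len(dropouts):
        raise ValueError("events and dropouts must have equal length")
    total = 0.0
    at_risk = n_at_start
    for n_e, n_c in zip(events, dropouts):
        # Subjects leaving during the interval are credited half an interval.
        total += at_risk - (n_e + n_c) / 2.0
        # N_k = N_{k-1} - N_Ek - N_Ck
        at_risk -= n_e + n_c
    return total


# Hypothetical study: 100 subjects, three intervals.
print(life_table_person_time(100, events=[5, 4, 3], dropouts=[10, 8, 6]))  # 240.0
```

Here the three intervals contribute 92.5, 79 and 68.5 interval-units respectively.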
While \( {\mathrm{PT}}_{\mathrm{observed}} \) can be easily calculated by summing the observed follow-up time of all participants during the study, calculation of the denominator \( {\mathrm{PT}}_{\mathrm{no}\hbox{-} \mathrm{dropout}} \) in the definition of \( {\eta}_{PTFR} \) requires knowledge of the actual time to the outcome event for each participant, if it happened during the study, regardless of whether or not the participant dropped out. This information is typically not available in a real-world study. In an earlier effort to address this problem, Chen, Wei and Huang used the known event rate for the population from which the cohort was derived to calculate “the maximum person-year”, which in our nomenclature is \( {\mathrm{PT}}_{\mathrm{no}\hbox{-} \mathrm{dropout}} \) [15]. However, it is often difficult to specify the population from which a cohort is derived [25], and the event rate is rarely known except for certain general endpoints, such as all-cause mortality. Therefore, this approach is not applicable to most studies.
To estimate \( {\mathrm{PT}}_{\mathrm{no}\hbox{-} \mathrm{dropout}} \), we propose estimating the event rate from the observed data. The survival function and the conditional probability of developing the event of interest are estimated using the nonparametric maximum likelihood estimator (NPMLE) proposed by Turnbull [26], which is the equivalent of a Kaplan-Meier survival curve but appropriate for interval-censored observations. To use this approach, each subject's follow-up time needs to be described by an interval: if a subject experiences an event between the (k-1)th and kth visit, then that individual’s time to event is described by the interval \( \left({t}_{k-1},{t}_k\right) \); if a subject dropped out between the (k-1)th and kth visit, then that individual’s event time is described by the interval \( \left({t}_{k-1},{t}_{K+1}\right) \), where \( {t}_{K+1} \) is some large number, such as 100 years (a theoretical time limit that in essence indicates that the person who dropped out will eventually develop the event, assuming there are no competing risks); if a subject was free of events until the end of the study \( {t}_K \), then that individual is given the interval \( \left({t}_K,{t}_{K+1}\right) \). The Interval package in R [27, 28] can be readily applied to estimate the survival curve and the conditional probability of developing the event of interest during each interval.
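The interval coding described above can be sketched as follows. This Python snippet only builds the (left, right) intervals that would be handed to an NPMLE routine; it does not implement Turnbull's estimator. The status labels, the function name, and the constant `LARGE` are hypothetical choices made for illustration.

```python
LARGE = 100.0  # stand-in for t_{K+1}, e.g. 100 years as suggested in the text


def event_interval(visit_times, status, k=None):
    """Return the (left, right) interval bracketing one subject's event time.

    visit_times -- [t_0, t_1, ..., t_K], the scheduled visit times
    status      -- "event":    event found between visits k-1 and k
                   "dropout":  subject dropped out between visits k-1 and k
                   "complete": subject was event-free at the final visit t_K
    k           -- visit index (1..K); ignored when status is "complete"
    """
    if status == "complete":
        return (visit_times[-1], LARGE)
    if k is None or not 1 <= k < len(visit_times):
        raise ValueError("k must be a visit index between 1 and K")
    if status == "event":
        return (visit_times[k - 1], visit_times[k])
    if status == "dropout":
        return (visit_times[k - 1], LARGE)
    raise ValueError(f"unknown status: {status!r}")


visits = [0.0, 1.0, 2.0, 3.0]                 # annual visits, K = 3
print(event_interval(visits, "event", k=2))    # (1.0, 2.0)
print(event_interval(visits, "dropout", k=3))  # (2.0, 100.0)
print(event_interval(visits, "complete"))      # (3.0, 100.0)
```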
Next, the expected number of events in \( \left({t}_{k-1},{t}_k\right) \) is estimated to be \( {N}_{k-1}^{\ast }{\widehat{P}}_k \), where \( {N}_{k-1}^{\ast } \) is the number of subjects who would remain in the study at time \( {t}_{k-1} \) if there were no loss to follow-up, \( {\widehat{P}}_k \) is the conditional probability of an event during the kth interval estimated by the NPMLE method for k = 1,…,K, and \( {N}_0^{\ast }=N \). The number of subjects who would remain in the study at the beginning of interval k + 1 if there were no loss to follow-up is then \( {N}_k^{\ast }={N}_{k-1}^{\ast }-{N}_{k-1}^{\ast }{\widehat{P}}_k \). The expected person-time if there were no dropouts is therefore estimated to be
$$ \widehat{\mathrm{P}}{\mathrm{T}}_{\mathrm{no}\hbox{-} \mathrm{dropout}}=\sum \limits_{k=1}^K\left({N}_{k-1}^{\ast }-\frac{N_{k-1}^{\ast }{\widehat{P}}_k}{2}\right). $$
The Person-time follow-up rate is then estimated to be
$$ {\eta}_{FPT}=\frac{{\mathrm{P}\mathrm{T}}_{\mathrm{observed}}}{\widehat{\mathrm{P}}{\mathrm{T}}_{\mathrm{no}\hbox{-} \mathrm{dropout}}} $$
(4)
This method relies on the assumption of independent censoring, that is, the event rate among the dropouts is the same as that in the general population.
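To make the recursion for \( {N}_k^{\ast } \) and Eq. (4) concrete, the Python sketch below takes the conditional event probabilities as given. In practice these would come from the NPMLE fit; the values used here are invented, and the function names are hypothetical. Person-time is in units of one visit interval, and the ratio is returned as a proportion (multiply by 100 for a percentage).

```python
def expected_no_dropout_person_time(n, cond_probs):
    """Expected person-time with no dropouts.

    n          -- N, number of subjects at baseline (N*_0)
    cond_probs -- [P_1, ..., P_K], estimated conditional probability of an
                  event in each interval (assumed to come from the NPMLE)
    """
    total = 0.0
    remaining = float(n)                           # N*_{k-1}
    for p in cond_probs:
        expected_events = remaining * p            # N*_{k-1} * P_k
        total += remaining - expected_events / 2.0 # events occur mid-interval
        remaining -= expected_events               # N*_k
    return total


def eta_fpt(pt_observed, n, cond_probs):
    """Ratio of observed to expected no-dropout person-time, as in Eq. (4)."""
    return pt_observed / expected_no_dropout_person_time(n, cond_probs)


# Hypothetical inputs: 100 subjects, two intervals, 10% conditional risk each.
print(round(expected_no_dropout_person_time(100, [0.1, 0.1]), 2))  # 180.5
print(round(eta_fpt(150.0, 100, [0.1, 0.1]), 3))                   # 0.831
```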
While a prospective epidemiological cohort study may intend to follow participants at serial intervals of approximately equal length (e.g., annual or semi-annual visits), not every participant returns for each visit or does so at the planned time. This leads to varying lengths of time between visits, which can sometimes be quite extensive. Clinic-based cohort studies that involve ad hoc patient follow-up (e.g., cohorts defined retrospectively from hospital EMR) often result in irregular schedules of clinical visits, with clustering that does not occur at random (e.g., visits motivated by symptoms or an abnormal laboratory test result). To assess the follow-up rate for such data, we extended the approach proposed above to address irregular intervals between visits.
For cohorts involving intermittent and ad hoc follow-up, let \( \left({t}_{1_i},{t}_{2_i},\dots, {t}_{K_i}\right) \) be the visit times for the ith person, where visit \( {K}_i \) is either (a) the last visit in the study for the ith person, or (b) the visit at which the ith person was diagnosed with the event. For (a), we use the time to the last visit as an estimate of the person’s censoring time, i.e., \( {\widehat{C}}_i=\widehat{\min}\left({T}_i,{C}_i\right)={t}_{K_i} \), and for (b), we estimate that the event occurred at the midpoint of the interval, i.e., \( {\widehat{T}}_i=\widehat{\min}\left({T}_i,{C}_i\right)=\frac{t_{K_i-1}+{t}_{K_i}}{2} \). The actual person-time of follow-up by a specified time, say \( {t}_K \), is then estimated by summing the observed follow-up times across subjects, i.e.,
$$ P{\widehat{T}}_{observed}=\sum \limits_{i=1}^N\left[I\left(\min \left({T}_i,{C}_i\right)<{t}_K\right)\widehat{\min}\left({T}_i,{C}_i\right)+I\left(\min \left({T}_i,{C}_i\right)\ge {t}_K\right){t}_K\right]. $$
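A rough Python sketch of this estimate for irregular visit schedules is given below. It assumes that the indicator functions are evaluated at the estimated follow-up time, so that each subject contributes the smaller of that estimate and \( {t}_K \). The data layout, the function names, and the visit times are hypothetical.

```python
def estimated_follow_up_time(visit_times, had_event):
    """Estimated min(T_i, C_i) for one subject.

    visit_times -- sorted visit times, baseline first; the last entry is t_{K_i}
    had_event   -- True if the event was diagnosed at the last listed visit
    """
    if had_event:
        if len(visit_times) < 2:
            raise ValueError("an event needs a preceding visit to bracket it")
        # Event assumed to occur midway between the last two visits.
        return (visit_times[-2] + visit_times[-1]) / 2.0
    # Otherwise the subject is censored at the last visit.
    return visit_times[-1]


def observed_person_time(cohort, t_k):
    """Sum of estimated follow-up times, each truncated at t_k."""
    return sum(min(estimated_follow_up_time(v, e), t_k) for v, e in cohort)


# Hypothetical cohort with irregular visits (times in years).
cohort = [
    ([0.0, 1.2, 2.5], True),        # event between 1.2 and 2.5 -> 1.85
    ([0.0, 0.9, 2.1, 4.0], False),  # last seen at 4.0, truncated at t_K = 3
    ([0.0, 1.0], False),            # last seen at 1.0
]
print(round(observed_person_time(cohort, t_k=3.0), 2))  # 5.85
```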
To estimate \( {\mathrm{PT}}_{\mathrm{no}\hbox{-} \mathrm{dropout}} \), if the ith person developed the event at his/her last visit, the interval event time is \( \left({t}_{K_i-1},{t}_{K_i}\right) \), and if the person had not developed the event at his/her last visit, the interval event time is \( \left({t}_{K_i},E\right) \), where again E represents some large number. The NPMLE method can then be applied to estimate \( {\mathrm{PT}}_{\mathrm{no}\hbox{-} \mathrm{dropout}} \).
As mentioned above, the use of observed data to estimate the event rate relies on the assumption that loss to follow-up is not informative, i.e., the event rate among those who remained in the study is the same as that among those who dropped out, so that the event rate estimates obtained from the observed data apply to the unobserved. However, if the subjects who were lost to follow-up are at a different risk of recurrence than those who remained in the study, the estimates of event rates are biased. For example, if the subjects who were lost to follow-up had a higher risk of the event, then the event risk is underestimated using the observed data, and the follow-up rate will be underestimated using the person-time approach because \( {\mathrm{PT}}_{\mathrm{no}\hbox{-} \mathrm{dropout}} \) is overestimated. Conversely, if the subjects who were lost to follow-up had a lower risk of the event, then the event risk is overestimated and the follow-up rate will consequently be overestimated using the person-time approach. We therefore propose calculating a lower bound for the person-time follow-up rate by assuming that none of those who dropped out developed the event of interest during the time interval examined. In this case, \( {\mathrm{PT}}_{\mathrm{no}\hbox{-} \mathrm{dropout}} \) reaches its highest possible value, leading to a lower bound for the follow-up rate. Note that in this case \( {\mathrm{PT}}_{\mathrm{no}\hbox{-} \mathrm{dropout}}={\mathrm{PY}}_{\mathrm{potential}} \), so that \( \min {\eta}_{PTFR}={\eta}_{CCI} \). The lower bound of the follow-up rate is important because it provides a conservative estimate: an overestimated follow-up rate can lead to over-optimism about the quality of the follow-up.
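The lower bound can be sketched in Python by crediting every dropout with the full study length τ in the denominator, while events and completers keep their observed time. This is an illustrative reading of the assumption stated above, with hypothetical names and data. `dropped_out` is taken to be true only for subjects lost before both the event and the end of the study.

```python
def ptfr_lower_bound(observed_times, dropped_out, tau):
    """Lower bound of the PTFR, in percent.

    observed_times -- observed follow-up time per subject (to event, dropout,
                      or end of study, whichever came first)
    dropped_out    -- True if the subject was lost before event and before tau
    tau            -- length of the study
    """
    if len(observed_times) != len(dropped_out):
        raise ValueError("inputs must have equal length")
    pt_observed = sum(min(t, tau) for t in observed_times)
    # Dropouts are assumed event-free, so each could have contributed tau.
    pt_max = sum(
        tau if lost else min(t, tau)
        for t, lost in zip(observed_times, dropped_out)
    )
    return 100.0 * pt_observed / pt_max


# Hypothetical cohort, tau = 10: event at 4, dropout at 5, completer, dropout at 3.
times = [4.0, 5.0, 10.0, 3.0]
lost = [False, True, False, True]
print(round(ptfr_lower_bound(times, lost, tau=10.0), 1))  # 64.7 (= 22 / 34)
```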
A simplified method to estimate the person-time follow-up rate (SPT)
Estimating the event rate in order to calculate the PTFR can be difficult, especially for a non-statistician. Therefore, we also explore a simplified alternative method that allows quick estimation of \( {\eta}_{PTFR} \) without having to estimate the event rate. Our proposed Simplified Person-Time method is a hybrid that includes aspects of the Percentage Method and the Person-Time Method. Specifically, as in the Percentage Method, individuals who developed the event of interest during the study are treated the same as individuals who were followed until the end of the study, i.e., they are treated as having contributed complete follow-up since they have already provided complete data regarding the factors associated with becoming a case. Furthermore, as in the Person-Time Method, dropouts contribute partial follow-up time in the numerator.
A simple alternative method to calculate the follow-up rate is therefore
$$ {\eta}_{SPT}=\frac{\sum \limits_{i=1}^N\left[I\left({C}_i<\min \left({T}_i,\tau \right)\right){C}_i+I\left({C}_i>\min \left({T}_i,\tau \right)\right)\tau \right]}{N\tau}\ast 100\%. $$
(5)
Therefore, in Fig. 1, \( {\eta}_{SPT} \) = 66.7% for scenario (A) and \( {\eta}_{SPT} \) = 93.3% for scenario (B), which are remarkably close to, but slightly overestimate, \( {\eta}_{PTFR} \). The slight overestimation arises because events are given the full length of follow-up in this method. It can be shown that
$$ {\eta}_{PTFR}-{\eta}_{SPT}\le \frac{\sum \limits_{i=1}^NI\left({C}_i>{W}_i\right)\left({W}_i-\tau \right)}{\sum \limits_{i=1}^N{W}_i}\le 0. $$
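The SPT calculation of Eq. (5) needs only each subject's observed time and whether he or she dropped out, as the following Python sketch shows. The function name and data are hypothetical. Dropouts contribute their observed time, and everyone else (events and completers) is credited with the full study length τ.

```python
def eta_spt(observed_times, dropped_out, tau):
    """Simplified person-time follow-up rate of Eq. (5), in percent.

    observed_times -- observed follow-up time per subject
    dropped_out    -- True if the subject was lost before event and before tau
    tau            -- length of the study
    """
    if len(observed_times) != len(dropped_out):
        raise ValueError("inputs must have equal length")
    n = len(observed_times)
    credited = sum(
        min(t, tau) if lost else tau   # events and completers get full tau
        for t, lost in zip(observed_times, dropped_out)
    )
    return 100.0 * credited / (n * tau)


# Hypothetical cohort, tau = 10: dropouts at 2 and 7, one event at 4,
# and two subjects followed to the end of the study.
obs = [2.0, 4.0, 10.0, 7.0, 10.0]
lost_flags = [True, False, False, True, False]
print(round(eta_spt(obs, lost_flags, tau=10.0), 1))  # 78.0 (= 39 / 50)
```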
Figure 1 also indicates that \( {\eta}_{CCI} \) and \( {\eta}_{SPT} \) together provide close bounds for \( {\eta}_{PTFR} \). In fact, the outcome events can be viewed as a competing risk to loss to follow-up, and we can therefore use methods from the competing risks framework to compute the cumulative rate of loss to follow-up [29, 30] and then obtain the subdistribution reverse KM curve.
To revisit the reverse KM survival time, we instead assign the events full follow-up time, so that the rate of follow-up over time is no longer affected by the number and timing of the events. In Fig. 2, both scenarios (A) and (B) share the same curve of follow-up rate over time after addressing the competing risk of events. It can be shown mathematically that the area under the curve of this new follow-up rate over time, divided by τ, is \( {\eta}_{SPT} \).
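The area-under-the-curve identity can be checked numerically on a toy example. The Python sketch below assumes that the follow-up rate at time t is simply the share of subjects not yet lost to follow-up, with events and completers never counted as lost (encoded as an infinite dropout time). This is an illustrative simplification with hypothetical names and data, not the authors' R implementation.

```python
def follow_up_curve_auc(dropout_times, tau):
    """Area under the step curve F(t) = share of subjects not yet lost, on [0, tau].

    dropout_times -- C_i for dropouts; float('inf') for events and completers,
                     who are treated as having full follow-up
    """
    n = len(dropout_times)
    # The curve only changes at dropout times, so integrate piecewise.
    cuts = sorted({0.0, tau, *[c for c in dropout_times if c < tau]})
    area = 0.0
    for left, right in zip(cuts, cuts[1:]):
        still_followed = sum(1 for c in dropout_times if c > left)
        area += (right - left) * still_followed / n
    return area


# Hypothetical cohort, tau = 10: dropouts at years 2 and 7, three others never lost.
INF = float("inf")
drops = [2.0, INF, INF, 7.0, INF]
print(round(100.0 * follow_up_curve_auc(drops, 10.0) / 10.0, 1))  # 78.0
```

For this cohort, Eq. (5) gives (2 + 7 + 3 × 10) / (5 × 10) = 78%, which matches the area under the curve divided by τ.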
An R program for computing each method is provided in Additional file 1.