Background
Injuries constitute an important cause of morbidity and mortality worldwide, in developing as well as developed countries, and in 2013 injuries accounted for 10% of the global burden of disease [1]. Although injury rates appear to decline overall, patterns vary widely depending on age, sex, region and time [1]. To explore associations with underlying causative factors it is essential that injury rates are quantified in different populations. Reliable risk estimates are also needed to develop suitable preventive measures and to establish efficient procedures for handling injuries in health care systems.
In large parts of the world it is difficult to carry out prospective studies of injury incidence in well-defined populations. Epidemiological investigations of injuries must frequently rely on data collected retrospectively in surveys dealing with incidents in particular time intervals in the past. It is well known that memory decay affects data collected in this manner when the time intervals extend over more than a few months [2‐7]. It is largely unknown, however, how memory recall depends on the actual time span between data collection and the relevant period in the past. It is also largely uncertain whether memory decay differs substantially between populations or between groups defined by demographic and social factors [5‐7].
Most retrospective studies of memory decay have compared apparent injury rates found by subdividing the range for the time between data collection and injury into a certain number of intervals. In several cases rather few and wide intervals have been compared [2, 7, 8]. This may have been necessary because of the structure of the available data, but in principle such procedures represent suboptimal use of information. An alternative approach is to describe memory recall considering a specific mathematical model of the relationship with time between injury and data collection [3, 9‐11]. Such models can be fitted to the data by general regression techniques to obtain a more detailed description of the memory decay process. Until now, however, these techniques have not incorporated an assessment of the relationships with other factors affecting injury rates.
The present study will explore these issues by modelling the magnitude of memory decay as a function of the amount of time before information is collected, considering retrospective data from a household survey of injuries carried out in Khartoum State, Sudan. The primary objective is to demonstrate how a relatively simple mathematical relationship between memory decay and time can be established by standard epidemiological procedures. The purpose is also to show how such techniques can be used to investigate whether the relationship depends on demographic and social factors or on injury cause. Finally, implications for the overall estimation of injury rates are considered. The data set has previously been used to explore associations between injury rates and potential risk factors in a Sudanese context [12], to study socioeconomic implications of injuries [13] and to examine use of health services by injured people [14].
Discussion
This paper has shown how a simple parametric statistical model may be used for assessing the effect of relevant factors on memory loss, depending on the time since an injury occurred. In the data from Khartoum State a basic exponential model, corresponding to a constant rate of memory loss [22], did not suffice for describing the overall relationship. Both in the complete data set and in the lower socioeconomic tertile in particular, a model involving an exponential function with a quadratic expression in time was needed, although basic exponential models were still adequate in the other tertiles. The general model was also used for exploring memory decay after injuries due to specific causes, suggesting that relationships with time may differ, with a slower memory decay after road traffic injuries.
In some studies of memory decay after injuries, another external data source has been available, providing information about all or nearly all injuries that could potentially be reported [2, 8, 23‐25]. The statistical analysis can then proceed in a different manner, with direct estimation of reporting probabilities. In studies based on comparison of rates at different times, various mathematical expressions have been introduced to model the relationship between memory decay and time since injury in particular groups [3, 9‐11, 26]. These models correspond to the last part of our formulation (1), representing the contribution to the rate of reported injuries made by the time variable. The fundamental assumption that injuries reported from the period immediately before data collection reflect the true state was formulated explicitly by Massey and Gonzalez as early as 1976 [3], and this assumption underlies the arguments used in various subsequent papers on memory recall [4, 9]. In some studies the mathematical relationship has been fitted separately to subsets of the data representing particular injury categories [3, 9, 11, 27]. Yet, to our knowledge, no one has previously considered a joint statistical model for the rate of observed injuries, combining one term representing the true injury rate in the first time period depending on overall risk factors and a second term describing memory decay moving further back in time.
Although Eq. (1) has a very simple structure, the interpretation of the first term \( \lambda_0(x) \) as the true injury rate requires that the exponential second term is equal to unity for time m = 1, i.e. t = 0. Which factors should be included in modelling the first term depends on what can realistically be assumed to influence true injury rates. It seems reasonable to investigate whether the coefficients \( \beta_j \) in the second term depend on the same factors, although these coefficients have a completely different interpretation, relating to memory decay rather than true injury incidence. Thus the analysis of a particular data set may lead to the conclusion that a certain factor affects memory decay but not true injury rates. The situation is rather unusual in that non-hierarchical statistical models, involving effect modification (or interaction) relating to a particular factor but not main effects of the same factor, may be relevant. In this connection it is essential that main effects are defined with respect to time t = 0, considering true injury rates, but these are the quantities of primary interest anyhow.
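The structure described in this paragraph can be illustrated with a minimal sketch. It assumes that Eq. (1) has the form \( \lambda_0(x)\exp\left(\beta_1 t+\beta_2 t^2+\dots\right) \) with t = m − 1, which is how the model is characterised in this Discussion; the function name and all numerical values are hypothetical and are not estimates from the Khartoum data.

```python
import math

def reported_rate(true_rate, betas, month):
    """Rate of reported injuries in recall month `month` (1 = most recent).

    Assumed form: true_rate * exp(beta_1 * t + beta_2 * t**2 + ...),
    with t = month - 1, so the exponential term equals 1 in month 1.
    """
    t = month - 1
    exponent = sum(b * t ** (j + 1) for j, b in enumerate(betas))
    return true_rate * math.exp(exponent)

# Hypothetical values, for illustration only.
true_rate = 100.0       # lambda_0(x), e.g. injuries per 1000 person-years
betas = [-0.5, 0.02]    # beta_1, beta_2: quadratic expression in t

rates = [reported_rate(true_rate, betas, m) for m in range(1, 7)]

# Effect modification without a main effect: two groups share the same
# true rate but differ in their decay coefficients (again hypothetical).
slow_decay = reported_rate(true_rate, [-0.2], 4)
fast_decay = reported_rate(true_rate, [-0.6], 4)
```

In month 1 the function returns the true rate unchanged, which is the property that allows the first term to be read as the true injury rate. The last two lines show the non-hierarchical situation: the groups coincide at t = 0 and diverge only further back in time.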
The functional form of mathematical relationships considered in work on memory recall after injuries has largely been inspired by the models introduced by Massey and Gonzalez [3] for studying overall injury rates in the USA on the basis of national surveys. Let μ(t) be the true rate of reported injuries at time t before the data collection, assuming for the moment that dependence on other risk factors is not explicitly taken into account. Moreover, let
$$ \overline{\mu}(t)=\frac{1}{t}{\int}_0^t\mu (w) dw $$
(2)
be the true average rate of reported injuries for the entire time interval of length t, moving back in time from the data collection. Massey and Gonzalez [3] considered three alternative models describing the relationship between time and the average rate:
$$ \overline{\mu}(t)=\alpha \exp \left(-\beta {t}^2\right), $$
(3)
$$ \overline{\mu}(t)=\alpha \exp \left(-\beta t\right), $$
(4)
$$ \overline{\mu}(t)=\alpha +\beta t+\gamma {t}^2 $$
(5)
Similar relationships were considered in a more recent study of overall injury rates in the USA [11], supplemented by linear and cubic analogues to the quadratic relationship in Eq. (5). In a study of occupational injuries in the USA [10], the rate μ(t) itself was assumed to follow a relationship with time given by the right hand side of one of the Eqs. (3), (4) and (5) (or a simple linear analogue to (5)). Finally, a linear relationship was considered in another study of occupational injuries in the USA [9].
A clear distinction has not always been made in the literature between the rate μ(t) at any moment and the average rate \( \overline{\mu}(t) \) for the whole time interval of length t. The focus in some papers [3, 11] on the average rate appears quite reasonable when the main purpose is to compare the bias introduced by underreporting of injuries considering data from cumulative intervals of different length. This has the disadvantage, however, that corresponding crude estimates of the average rate from various overlapping intervals are not independent [3]. As in other epidemiological models dealing with changes over time, it seems more natural to formulate the basic mathematical relationships in terms of the rate itself. Furthermore, adoption of expressions such as those appearing on the right hand sides of Eqs. (3) or (4) for either μ(t) or \( \overline{\mu}(t) \) is not consistent with similar expressions being valid for the other quantity, as may be seen from Eq. (2). If a polynomial is used as on the right hand side of Eq. (5), the corresponding expression for the other quantity is still a polynomial of the same degree, but the relative magnitude of the coefficients is different.
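A short worked derivation may make this inconsistency concrete. It uses only the definition in Eq. (2) and is offered as an illustration, not as part of the original analysis. Multiplying Eq. (2) by t and differentiating with respect to t gives

$$ \mu (t)=\frac{d}{dt}\left[t\,\overline{\mu}(t)\right]=\overline{\mu}(t)+t\,{\overline{\mu}}^{\prime }(t). $$

If the average rate follows the exponential form in Eq. (4), the rate itself therefore becomes

$$ \mu (t)=\alpha \left(1-\beta t\right)\exp \left(-\beta t\right), $$

which is not of exponential form. If, on the other hand, the rate itself is the quadratic polynomial \( \mu (t)=\alpha +\beta t+\gamma {t}^2 \), then Eq. (2) gives

$$ \overline{\mu}(t)=\alpha +\frac{\beta }{2}t+\frac{\gamma }{3}{t}^2, $$

which is again quadratic, but with the linear and quadratic coefficients rescaled by 1/2 and 1/3.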
Replacing the average rate on the left hand sides of Eqs. (3) and (4) with the rate μ(t), the right hand sides constitute particular cases of the corresponding relationship with time given by our formulation (1). Expression (4) then represents standard exponential memory decay, but if a more complex model is needed, it seems natural to consider general polynomials in the exponent as in (1), not just a single second degree term in time as in (3). The expression on the right hand side of (3) imposes the condition on the relationship with time that the first derivative vanishes when t = 0 and the second derivative is negative for small time values, properties which cannot be taken for granted. Pure polynomial models as given by the right hand side of Eq. (5) have the general disadvantage that they may lead to predicted negative values of the injury rates. With relatively little memory decay over the time span considered [9], this is not necessarily a major problem, but in a study such as ours it would be.
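The point about negative predictions can be shown with a small numerical sketch. The coefficients below are hypothetical, chosen only to mimic a steep decline over the first few months, and are not fitted to any data set.

```python
import math

def exponential_rate(t, alpha, beta):
    """Exponential form as in Eq. (4); positive whenever alpha > 0."""
    return alpha * math.exp(-beta * t)

def quadratic_rate(t, alpha, beta, gamma):
    """Pure polynomial form as on the right hand side of Eq. (5)."""
    return alpha + beta * t + gamma * t ** 2

# Hypothetical coefficients, for illustration only.
months_back = range(0, 12)
exp_values = [exponential_rate(t, 100.0, 0.5) for t in months_back]
poly_values = [quadratic_rate(t, 100.0, -30.0, 1.5) for t in months_back]

# Time points at which the polynomial predicts a negative injury rate.
negative_months = [t for t, v in zip(months_back, poly_values) if v < 0]
```

With these values the polynomial falls below zero from t = 5 onwards, whereas the exponential form remains positive over the whole range. This is the reason an exponential of a polynomial is a safer choice when decay is steep.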
In the present data set, the rate of reported injuries declined substantially moving back more than a few months from the time of interview. It appears that the great majority of the injuries occurring at least 5 months before the interview were not reported. This is consistent with recall studies of injuries carried out in other African countries such as Ghana [5] and Tanzania [6] or in the overall Sudanese population [7]. However, the particular problems relating to injuries reported to occur almost a year before data collection appear to be considerably more pronounced in the current study. A weak tendency to higher rates for recall periods approaching 12 months was seen in Tanzania [6] but not in Ghana [5]. Although some degree of general imprecise specification of injury dates may have affected our observations, the high rate of reported injuries in month 12 seems to require a different explanation. It is likely that forward telescoping, the tendency to report incidents as if they had occurred more recently than they actually did [28], may have affected injuries occurring before the 12 month period covered here. If this is correct, a major part of the injuries assigned to month 12 do not really belong to the time range of our study. The observation that the apparent elevated rates in month 12 were not strongly associated with other factors lends some support to this idea.
This study found no major differences in memory decay between demographic groups except among socioeconomic categories. Few studies have shown definite differences of this kind. With a classification indicating whether injuries were reported to occur earlier than 4 weeks before data collection or not, the study at the national level in Sudan [7] found less memory decay with increasing age. In the study from Tanzania [6] more memory loss over time was found in rural areas, but no essential difference was seen between urban areas in the narrow sense and periurban areas, a comparison more similar to the one carried out in the present study. No major differences were found in Ghana [5]. In recall studies in the general population of the USA [3] and in occupational studies [9, 10] faster memory decay has been indicated for certain groups of young people.
The recall data from the lower socioeconomic tertile in this study gave quite a different impression from data in the middle and upper tertiles. It is reasonable to consider the faster memory decay in the lower tertile in connection with the higher rate of injuries reported in this group in month 1. It is certainly possible that this group experienced more memory loss in the first few months after an injury and that the true injury rate was elevated. A simple explanation could be that low socioeconomic status was associated with more instability in society, affecting both true injury risk and survey response. Another possibility is that members of this group had different ideas about incidents that should be reported. If some respondents tended to include more very minor recent incidents, such cases might be more easily ignored after a couple of months. A third possibility is that forward telescoping was a more serious problem for short time intervals in this category. No major contrast in recall was found between groups defined by a wealth index in the national study in Sudan [7].
Comparing separate causes, the national Sudanese study [7] found least memory decay for injuries caused by falls. This is consistent with the relatively slow memory decay seen for falls in the present study. However, most differences in memory decay according to causes were not very large in the national study, and road traffic crashes did not display an obviously different pattern from other causes. In our study, memories of recent road traffic injuries may have been subject to backward telescoping [28], leading to relatively few injuries being reported very close to data collection. Most other studies did not consider memory decay according to causes in a similar manner, although one national study in the USA [3] found the slowest rate of memory decay for injuries caused by moving motor vehicles, with falls also exhibiting relatively slow decay. Several studies [4‐6, 9] have shown much slower memory decay for serious injuries. In the present study, classification according to severity was based on days of normal activity lost, and for many injuries this information was not yet available at data collection, so a similar classification according to recall was not feasible.
The decline in crude rates with longer recall periods in our study clearly illustrates the problem of using overall estimates based on values collected over long periods. A 3 month period is often considered reasonable for obtaining fairly reliable estimates [4‐6] but even in that case our crude estimate was 22% lower than the estimate for a 1 month period. On the other hand, sampling errors increase with very short recall periods, and potential forward telescoping in intervals close to data collection may be another reason why extremely short recall periods should be avoided. To compensate for the sampling error associated with estimates from a short period before data collection only, overall rate estimates have in some recall studies [9, 10, 26] been based on predicted values from a regression model. This value is computed at the lower end of the relevant range, representing a time value close to data collection. Such procedures are likely to introduce more stability in estimates, in particular if memory decay is relatively modest. In our data, the predicted overall rates found by this procedure were of the same magnitude as the crude estimates.
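The trade-off between recall period and bias can be sketched numerically. The example below assumes, purely for illustration, a basic exponential decay in monthly reported rates and equal person-time in each recall month; the decay coefficient and true rate are hypothetical, and the study itself found that a quadratic exponent was needed in the complete data set.

```python
import math

def monthly_reported_rate(month, true_rate, beta):
    """Reported rate in recall month `month` (1 = most recent)."""
    t = month - 1
    return true_rate * math.exp(-beta * t)

def crude_rate(recall_months, true_rate, beta):
    """Average reported rate over the most recent `recall_months` months,
    assuming equal person-time in every month."""
    rates = [monthly_reported_rate(m, true_rate, beta)
             for m in range(1, recall_months + 1)]
    return sum(rates) / recall_months

# Hypothetical values, for illustration only.
true_rate = 100.0
beta = 0.2

# Relative shortfall of the crude estimate compared with the true rate.
shortfall = {k: 1 - crude_rate(k, true_rate, beta) / true_rate
             for k in (1, 3, 6, 12)}
```

Under these assumptions the shortfall is zero for a 1 month recall period and grows steadily as the period is extended. A model-based estimate evaluated at t = 0 would instead return the true rate while still drawing on data from all months.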
However, the relative rates among socioeconomic tertiles were affected by the adjustment carried out in the model-based approach. Use of separate quadratic relationships in the distinct socioeconomic groups led to a larger relative risk estimate for the lower vs. the middle tertile, indicating that use of the correct model with effect modification by socioeconomic status may be important. For factors not influencing memory decay to any appreciable extent, our model-based relative rate estimates were rather similar to those found in previous analyses of the same data in the large urban stratum [12], ignoring memory decay. Other studies investigating the effect on relative rate estimates of taking memory decay into account [9, 10] also found a considerably smaller change in relative than in absolute rates.
In a survey-based study such as the present one, the response from any particular participant may be subject to errors of several kinds [28]. Some injuries may not be reported for various reasons, some incidents reported may not have occurred, and the injury time recorded may be biased in either direction. The data set considered in this study, with its background from Khartoum State, offers unique possibilities, but it is not extremely large, especially when attention is confined to population subgroups. Thus it was not practicable to carry out complete analyses of memory decay for separate causes within socioeconomic categories. In interpreting the decline in apparent rates as an expression of memory decay, the important assumption is made that true underlying injury rates are nearly uniform over time. This seems reasonable when the environment is not subject to major seasonal fluctuations.