The 45 and Up study
The Sax Institute’s 45 and Up Study is a population-based sample from the state of New South Wales (NSW), Australia. Extensive demographic and social characteristics, personal health behaviour and general health-related data on individuals are collected. This provides researchers with reliable information on a wide range of exposures and outcomes of public health. The 45 and Up Study as a research resource will also give government the tools for evidence-based policy making to support healthy ageing.
Prospective participants were randomly sampled from the Department of Human Services (formerly Medicare Australia) enrolment database, which provides near-complete coverage of the population. The study oversampled individuals from rural areas and those aged 80 years and over. Participants consented to regular follow-up and linkage of their survey data to a range of health databases. Recruitment commenced in February 2006 and the full cohort of 267,157 participants was reached by December 2009. The response rate to the 45 and Up Study was about 18% and participants included about 11% of the NSW population aged 45 years and over. A detailed description of the 45 and Up Study can be found in [25].
The first follow-up of participants began in 2012, when 41,440 participants in the 45 and Up Study were invited to respond. Of these, 27,036 returned the follow-up questionnaire, a response rate of 65.2%. After excluding individuals with missing values for baseline covariates, 32,037 individuals were included in this analysis, of whom 21,750 responded to the follow-up questionnaire.
Data collection and variables
The 45 and Up Study baseline and follow-up questionnaires collect demographic data such as age, postcode of residence, education, country of birth and type of housing; lifestyle factors, including physical functional capacity, self-rated health and social support; and marital status, employment status and household income. To explore the impact of nonresponse on measures of association, we focus on an outcome related to dwelling-type change between baseline and follow-up. The 45 and Up Study questionnaires ask respondents to describe their dwelling type as belonging to one of eight categories: house, flat/unit/apartment, house on farm, retirement village/self-care unit, nursing home, hostel for the aged, mobile home and other. Due to low counts in some of these categories, they were combined into groups: house with house on farm; retirement village with nursing home and hostel for the aged; and mobile home with other. Similarly, the outer regional, remote and very remote Accessibility/Remoteness Index of Australia (ARIA) categories were combined. Physical functional limitation was assessed using a subscale of the RAND 36-Item Health Survey, Version 1.0. The subscale was scored as recommended in ‘Scoring Instructions for MOS 36-Item Short Form Survey Instrument (SF-36)’ [26]. Social connectedness was assessed using the Duke Social Support Index (DSSI) subscale and scored as recommended in [27]. Following Phongsavan et al. [28], because the distribution of the social connectedness scores was positively skewed, this variable was transformed into quartiles.
In this paper, the outcome of interest was change in dwelling type between surveys, which was assessed by comparing responses to the relevant questions in the baseline and follow-up surveys. To gain a better understanding of one particular type of housing transition, we focus on the case where the binary outcome represents transition into a retirement village, nursing home or hostel for the aged, limited to those 45 and Up participants who were not in these categories at baseline. For ease of exposition, we subsequently refer to the outcome of interest as “transition to residential aged care”, or sometimes simply “transition”. It is important to explore the demographic, socio-economic and health factors associated with this transition, as the findings provide useful insights into relocation behaviour as people age and have implications for ageing and housing policy and for aged care provision.
Statistical analyses
We will consider two statistical methods to adjust for nonresponse: inverse probability weighting using a propensity score and a Bayesian selection model. Both of these methods require a formulation of the model which predicts the probability of responding, given a set of observed covariates. As discussed below, the Bayesian selection model also allows for the possibility that missingness might depend on the unobserved response variable. We first conduct univariate chi-squared tests of association, to identify significant differences between responders and nonresponders in terms of demographic characteristics (age, gender, education qualification, country of birth, area remoteness), wellbeing (self-rated health, level of mobility, social support), household income, carer status, marital status and dwelling type at baseline.
The propensity score, as defined in Little [29], is the conditional probability that an individual responds, given a set of covariates. Following convention, we use a multivariable logistic regression model with response to the follow-up as the outcome variable to estimate propensity scores. Variables with significant univariate associations with response status were included in a multivariable logistic regression. Variables with p-values > 0.05 were removed from the model in a stepwise fashion. The likelihood ratio test and model comparison using Akaike’s Information Criterion were used for variable selection. The final propensity score model comprises the variables listed in Table 1. These also formed the basis of the modelling for the nonresponse probability in the selection model described below.
Table 1
Characteristics of 45 and Up participants according to response to follow-up survey
Characteristic | Transitions among responders, n (% of responders) | Responders, n (% of total) | Total, n | Odds ratio for response (95% CI) |
Gender |
Male | 157 (1.7) | 9232 (66.5) | 13888 | Ref |
Female | 190 (1.7) | 11498 (68.0) | 16911 | 1.11 (1.05–1.17) |
Age (yrs) |
45–54 | 6 (0.1) | 6577 (66.0) | 9960 | Ref |
55–64 | 59 (0.7) | 8696 (70.4) | 12350 | 1.22 (1.15–1.30) |
65–74 | 132 (3.4) | 3920 (67.5) | 5805 | 1.12 (1.02–1.23) |
75+ | 150 (9.8) | 1537 (57.3) | 2684 | 0.81 (0.72–0.91) |
Highest qualification |
None | 34 (2.4) | 1421 (52.1) | 2725 | Ref |
Year 10 | 82 (2.1) | 3950 (62.8) | 6294 | 1.25 (1.14–1.37) |
Year 12 | 29 (1.4) | 2019 (63.7) | 3171 | 1.45 (1.30–1.62) |
Trade | 35 (1.9) | 1855 (62.7) | 2957 | 1.34 (1.20–1.50) |
Cert./diploma | 86 (1.7) | 5042 (70.8) | 7125 | 1.78 (1.62–1.96) |
Tertiary | 81 (1.3) | 6443 (75.6) | 8527 | 2.29 (2.07–2.53) |
Area remoteness |
Major cities | 192 (1.8) | 10885 (66.1) | 16460 | Ref |
Inner regional | 135 (1.7) | 7831 (69.0) | 11352 | 1.06 (1.00–1.12) |
Outer regional/remote/very remote | 20 (1.0) | 2014 (67.4) | 2987 | 1.06 (0.97–1.16) |
Country of birth |
Australia | 261 (1.6) | 16237 (69.1) | 23492 | Ref |
NW Europe | 65 (2.5) | 2652 (68.8) | 3857 | 0.97 (0.89–1.04) |
S & E Europe | 5 (1.4) | 356 (48.8) | 730 | 0.67 (0.56–0.79) |
Middle East | 0 (0.0) | 106 (42.6) | 249 | 0.50 (0.38–0.66) |
SE Asia | 4 (2.0) | 204 (44.9) | 454 | 0.47 (0.38–0.58) |
NE Asia | 0 (0.0) | 168 (46.9) | 358 | 0.56 (0.44–0.70) |
S & Central Asia | 0 (0.0) | 102 (50.2) | 203 | 0.50 (0.37–0.67) |
America | 1 (0.4) | 223 (60.4) | 369 | 0.62 (0.50–0.78) |
Sub Saharan Africa | 4 (2.3) | 177 (58.8) | 301 | 0.57 (0.45–0.72) |
Oceania | 7 (1.4) | 491 (64.0) | 767 | 0.76 (0.65–0.89) |
Speak a language other than English at home |
No | 329 (1.7) | 19453 (68.9) | 28234 | Ref |
Yes | 18 (1.4) | 1277 (49.8) | 2565 | 0.68 (0.61–0.76) |
Marital status |
Single | 19 (1.9) | 1013 (63.2) | 1604 | Ref |
Married/de facto | 252 (1.5) | 16368 (68.5) | 23908 | 1.13 (1.01–1.27) |
Widowed/divorced/separated | 76 (2.3) | 3349 (63.3) | 5287 | 1.01 (0.89–1.14) |
Work status |
FT/self-employed | 21 (0.2) | 8996 (68.5) | 13126 | Ref |
PT | 20 (0.6) | 3148 (71.2) | 4422 | 1.18 (1.08–1.27) |
Fully retired | 276 (4.5) | 6162 (66.7) | 9243 | 1.34 (1.23–1.46) |
Partially retired | 13 (1.5) | 865 (74.3) | 1164 | 1.35 (1.17–1.55) |
Disabled/sick | 3 (0.7) | 442 (51.3) | 862 | 1.07 (0.91–1.26) |
Unemployed/look after home/study/unpaid | 14 (1.3) | 1117 (56.4) | 1982 | 0.86 (0.77–0.96) |
Income category |
< $20,000 | 101 (3.4) | 2951 (58.7) | 5027 | Ref |
$20,000–$40,000 | 97 (2.6) | 3696 (67.1) | 5512 | 1.11 (1.02–1.21) |
$40,000–$70,000 | 61 (1.3) | 4598 (70.6) | 6517 | 1.23 (1.12–1.34) |
> $70,000 | 34 (0.5) | 6890 (73.3) | 9394 | 1.25 (1.13–1.38) |
Prefer not to answer | 54 (2.1) | 2595 (59.7) | 4349 | 0.84 (0.77–0.92) |
Dwelling type |
House/house on farm | 272 (1.5) | 18419 (68.0) | 27086 | Ref |
Flat/unit/apart. | 64 (3.2) | 2030 (62.8) | 3231 | 0.92 (0.85–1.00) |
Mobile home/other | 11 (3.9) | 281 (58.3) | 482 | 0.86 (0.71–1.04) |
Carer status |
No | 296 (1.6) | 18427 (67.8) | 27178 | Ref |
Yes | 51 (2.2) | 2303 (63.6) | 3621 | 0.88 (0.82–0.95) |
Self-rated health |
Excellent | 37 (1.0) | 3893 (74.6) | 5219 | Ref |
Very good | 116 (1.4) | 8420 (70.9) | 11876 | 0.88 (0.82–0.95) |
Good | 133 (2.1) | 6436 (64.5) | 9984 | 0.74 (0.68–0.80) |
Fair | 52 (3.0) | 1729 (54.5) | 3170 | 0.58 (0.52–0.65) |
Poor | 9 (3.6) | 252 (45.8) | 550 | 0.53 (0.43–0.65) |
Functional limitation (fl) |
No fl | 163 (1.0) | 15710 (69.8) | 22510 | Ref |
Slight fl | 95 (3.4) | 2802 (65.8) | 4261 | 1.04 (0.96–1.12) |
Moderate fl | 35 (3.1) | 1119 (59.2) | 1891 | 0.94 (0.85–1.05) |
Significant fl | 35 (5.1) | 686 (54.1) | 1267 | 0.81 (0.71–0.92) |
Severe fl | 19 (4.6) | 413 (47.5) | 870 | 0.72 (0.61–0.84) |
DSSI in quartiles |
1 | 179 (2.1) | 8383 (69.6) | 12039 | Ref |
2 | 62 (1.2) | 5063 (68.2) | 7428 | 0.98 (0.92–1.05) |
3 | 50 (1.4) | 3611 (66.3) | 5447 | 0.96 (0.90–1.03) |
4 | 56 (1.5) | 3673 (62.4) | 5885 | 0.89 (0.83–0.96) |
The estimated probability of responding, or propensity score, derived from the multivariable logistic regression model described above was used to obtain a probability weight for each individual. For responders, this weight is simply the inverse of the propensity score; the approach is known as inverse probability weighting (IPW) [6]. The goal of this method is to weight individuals with lower propensities for response more heavily than those with higher propensities. In effect, each responder represents both themselves and nonresponders with similar characteristics, which compensates for the missing responses. The IPW approach is valid under a missing at random (MAR) assumption. That is, the probability of responding to the follow-up questionnaire is independent of the outcome, conditional on the set of observed covariates used to compute the weights. This is a strong assumption: it asserts that, given the observed covariates, those who do not respond behave in similar ways to those who do respond. This assumption is impossible to verify in practice without collecting data on the nonresponders.
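The weighting step can be illustrated with a minimal sketch. It is not the SAS code used in the study: the data are simulated, the covariates and the coefficient vector `true_theta` are hypothetical, and the logistic regression is fitted with a hand-written Newton-Raphson routine (`fit_logistic`, an illustrative helper) so that the example depends only on NumPy.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Fit a logistic regression by Newton-Raphson; X must contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)               # score vector
        hess = X.T @ (X * w[:, None])      # observed information matrix
        beta = beta + np.linalg.solve(hess, grad)
    return beta

rng = np.random.default_rng(0)
n = 5000
# Hypothetical baseline covariates: intercept, a binary indicator, a continuous score
X = np.column_stack([np.ones(n), rng.integers(0, 2, n), rng.normal(size=n)])
true_theta = np.array([0.7, 0.3, -0.5])    # assumed values, for simulation only
responded = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true_theta)))

# Propensity score: estimated probability of responding given the covariates
theta_hat = fit_logistic(X, responded)
propensity = 1.0 / (1.0 + np.exp(-X @ theta_hat))

# IPW: responders get weight 1 / propensity; nonresponders drop out of the analysis
ipw = np.where(responded == 1, 1.0 / propensity, 0.0)
```

Under this setup the weights of the responders sum to roughly the full sample size, which is the sense in which each responder also stands in for similar nonresponders.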
To model the outcome variable, transition to a residential aged care facility, univariate chi-squared analysis was first used to identify statistically significant associations with each variable described above. Variables significantly associated with the transition were included in a multivariable logistic regression model to further test associations. Possible first-order interaction terms between the following variables were also considered: sex with income, marital status, work status and age group; and age group with physical function, health status, country of birth and language spoken at home. Interaction terms were first added to the main effects model one at a time and those with a p-value > 0.05 were dropped. The interaction terms with significant p-values were then added sequentially and, after inclusion of the main effects and other interaction terms, those with a p-value > 0.05 were dropped to obtain the final model. This complete case analysis was repeated with survey commands that allowed responders to be weighted according to their propensity-score-derived weights (“complete case with IPW”). Note that other variable selection methods based on shrinkage, such as the Least Absolute Shrinkage and Selection Operator (LASSO), could also be applied. The complete case analysis and the complete case analysis with IPW were performed using SAS version 9.3.
If the MAR assumption is violated, IPW adjustment may not remove all nonresponse bias. The missingness mechanism is then “not missing at random” (NMAR), also called informative missingness, in which the probability of a missing value depends on the value of the variable that is missing. In this case, the missing data mechanism must be specified by the researcher and incorporated into the model in order to obtain unbiased parameter estimates. However, the available data contain no information about what would be an appropriate model for the missing data, and statistical inference is very sensitive to the choice of such a model. This makes sensitivity analysis essential for investigating possible violations of the MAR assumption and for exploring the robustness of the study conclusions to increasingly extreme departures from the MAR mechanism [24, 30–32].
In this paper, we adopt a selection model approach for NMAR, which consists of two sub-models: one specifies the relationship between the covariates and the outcome of interest and the other represents the missing data process, which is dependent not only on observed covariates, but also the outcome. More specifically, we assume a standard logistic regression for the transition to residential aged care:
$$ \mathrm{logit}\left( P\left({y}_i=1\right)\right)={b}_0+{\displaystyle \sum_{j=1}^k}{b}_j{x}_{j i}, $$
(1)
where $y_i$ is the outcome and $x_{ji}$ is the $j$th baseline covariate for subject $i$. Potential covariates for the outcome, as well as those that may be predictors of nonresponse, are detailed in Additional file 1. We then specify a logistic model for missingness as follows:
$$ \mathrm{logit}\left( P\left({m}_i=1\right)\right)={\theta}_0+{\displaystyle \sum_{s=1}^l}{\theta}_s{x}_{s i}+\lambda {y}_i, $$
(2)
where $m_i$ is a nonresponse indicator taking the value 1 if the $i$th individual did not respond to the follow-up questionnaire and 0 otherwise. Other viable modelling frameworks for analysing data with informative missingness include pattern mixture models [33] and shared parameter models [34].
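To make the structure of equations (1) and (2) concrete, the following sketch evaluates the observed-data log-likelihood they imply. Responders contribute the joint probability of their observed outcome and of responding; for nonresponders the unobserved outcome is summed out. The function name `selection_loglik`, the argument names and the simulated data are illustrative assumptions. WinBUGS would typically treat the missing outcomes as latent quantities to be sampled, which targets the same marginal likelihood.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def selection_loglik(b, theta, lam, X, Z, y, m):
    """Observed-data log-likelihood of the selection model.

    X: covariates of the outcome model (1); Z: covariates of the missingness model (2).
    y: outcome (any value, e.g. NaN, where m == 1); m: 1 = nonresponder, 0 = responder.
    """
    p_y = expit(X @ b)                    # P(y = 1 | x)
    p_m_y0 = expit(Z @ theta)             # P(m = 1 | x, y = 0)
    p_m_y1 = expit(Z @ theta + lam)       # P(m = 1 | x, y = 1)
    y_obs = np.where(m == 0, y, 0.0)      # placeholder where y is unobserved
    # Responders: P(y_i | x) * P(m_i = 0 | x, y_i)
    ll_resp = np.where(y_obs == 1,
                       np.log(p_y * (1 - p_m_y1)),
                       np.log((1 - p_y) * (1 - p_m_y0)))
    # Nonresponders: sum over both possible values of the missing outcome
    ll_nonresp = np.log(p_y * p_m_y1 + (1 - p_y) * p_m_y0)
    return np.sum(np.where(m == 0, ll_resp, ll_nonresp))

# Small simulated illustration (all values hypothetical)
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
b = np.array([-1.0, 0.5])
theta = np.array([-0.5, 0.2])
y = rng.binomial(1, expit(X @ b)).astype(float)
m = rng.binomial(1, expit(X @ theta))
y[m == 1] = np.nan                        # outcomes of nonresponders are unobserved
ll_mar = selection_loglik(b, theta, 0.0, X, X, y, m)
```

With `lam = 0.0` the nonresponder term reduces to the probability of nonresponse alone, so the likelihood factorises into an outcome part for responders and a missingness part for everyone.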
In the above selection model we assume a linear relationship between the logit of the probability of nonresponse and the outcome. Different values of the parameter $\lambda$ represent different assumptions about how strongly the likelihood of nonresponse depends on the outcome. When $\lambda = 0$, we have the MAR case, where the probability of nonresponse depends only on observed covariates. This case corresponds exactly to the logistic regression model used to construct the IPW weights, which makes the selection model an appealing choice: it is closely related to the propensity score method, but the model parameters are estimated jointly rather than in a two-stage process. More generally, the parameter $\lambda$ is interpreted as the log odds ratio of nonresponse for those who had a dwelling-type change, conditional on all other covariates included in the model. We assume that $\lambda$ is nonnegative, that is, the likelihood of nonresponse is higher for those who had a dwelling-type change. This is a plausible assumption, since a change of dwelling is often associated with family events such as marriage or birth and with work transitions [35]. Thus, those who change dwelling type, like those who undergo other life-course changes, are more difficult to track in a longitudinal study, making them less likely to respond to the follow-up survey [36]. In implementing the selection model, we repeat the analysis for a range of values of $\lambda$ and examine the sensitivity of the estimated regression coefficients in the outcome equation across these values. The values we set for $\lambda$ are (0, 1, 2, 3). These values imply that the odds ratio of nonresponse for individuals with a dwelling-type change (here, transition into aged care facilities) ranges from 1 to about 20, since $e^{0} = 1$ and $e^{3} \approx 20$ [36, 37]. Note that in practice, one could also assign a mildly informative prior distribution to $\lambda$ and estimate it jointly with the other model parameters.
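The mapping from the sensitivity parameter to an odds ratio is simply exponentiation, as the short sketch below shows. The baseline nonresponse probability `p0` is a hypothetical value chosen for illustration and is not taken from the study; `shifted_probability` is an illustrative helper.

```python
import math

def shifted_probability(p0, lam):
    """Nonresponse probability after adding lam to the logit of a baseline probability p0."""
    logit = math.log(p0 / (1.0 - p0)) + lam
    return 1.0 / (1.0 + math.exp(-logit))

lambdas = [0, 1, 2, 3]

# Odds ratio of nonresponse implied by each value of the sensitivity parameter
odds_ratios = {lam: math.exp(lam) for lam in lambdas}

# Effect on a hypothetical baseline nonresponse probability (assumed, not from the study)
p0 = 0.35
shifted = {lam: shifted_probability(p0, lam) for lam in lambdas}

for lam in lambdas:
    print(f"lambda={lam}: OR={odds_ratios[lam]:.2f}, P(nonresponse)={shifted[lam]:.3f}")
```

Because the shift is applied on the logit scale, the same odds ratio produces a different change in probability depending on the baseline.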
A fully Bayesian modelling approach using Markov chain Monte Carlo (MCMC) was used for the selection model, as the Bayesian approach has been shown to provide a flexible way to incorporate different assumptions about the missing data mechanism and to enable coherent model estimation [24, 38, 39]. We ran the selection model in the WinBUGS software [38, 40] for 15,000 iterations, including 5000 for burn-in. Vague $N(0, 1000)$ prior distributions were assigned to the intercept parameters $b_0$ and $\theta_0$ and to all coefficients $b_j$ and $\theta_s$ in equations (1) and (2). Visual inspection of trace plots and autocorrelation plots of the MCMC iterations was satisfactory, suggesting that all runs achieved convergence.
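The sensitivity analysis can be sketched outside WinBUGS as follows. This is an illustration under several assumptions rather than the study's implementation: a random-walk Metropolis sampler is swapped in for the WinBUGS sampler, $N(0, 1000)$ is read as a normal prior with variance 1000, the same covariates are used in both sub-models, the data are simulated, and shorter chains than the 15,000 iterations reported above are used to keep the example quick. The names `log_posterior` and `metropolis` are illustrative.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_posterior(params, lam, X, y, m, prior_var=1000.0):
    """Log posterior of (b, theta) for a fixed lam, with missing outcomes summed out."""
    k = X.shape[1]
    b, theta = params[:k], params[k:]
    p_y = expit(X @ b)                   # P(y = 1 | x), equation (1)
    p_m_y0 = expit(X @ theta)            # P(m = 1 | x, y = 0), equation (2)
    p_m_y1 = expit(X @ theta + lam)      # P(m = 1 | x, y = 1), equation (2)
    y_obs = np.where(m == 0, y, 0.0)     # placeholder where y is unobserved
    ll_resp = np.where(y_obs == 1,
                       np.log(p_y * (1 - p_m_y1)),
                       np.log((1 - p_y) * (1 - p_m_y0)))
    ll_nonresp = np.log(p_y * p_m_y1 + (1 - p_y) * p_m_y0)
    log_lik = np.sum(np.where(m == 0, ll_resp, ll_nonresp))
    log_prior = -0.5 * np.sum(params ** 2) / prior_var  # independent N(0, prior_var)
    return log_lik + log_prior

def metropolis(lam, X, y, m, n_iter=4000, burn_in=1000, step=0.05, seed=1):
    """Random-walk Metropolis; returns post-burn-in draws and the acceptance rate."""
    rng = np.random.default_rng(seed)
    current = np.zeros(2 * X.shape[1])
    current_lp = log_posterior(current, lam, X, y, m)
    draws = np.empty((n_iter, current.size))
    accepted = 0
    for t in range(n_iter):
        proposal = current + step * rng.normal(size=current.size)
        proposal_lp = log_posterior(proposal, lam, X, y, m)
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
            accepted += 1
        draws[t] = current
    return draws[burn_in:], accepted / n_iter

# Simulated data (hypothetical): nonresponse is more likely when y = 1
rng = np.random.default_rng(42)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y_full = rng.binomial(1, expit(X @ np.array([-1.0, 1.0]))).astype(float)
m = rng.binomial(1, expit(X @ np.array([-0.5, 0.3]) + 1.0 * y_full))
y = np.where(m == 0, y_full, np.nan)

# Repeat the analysis for each fixed value of the sensitivity parameter
results = {}
for lam in (0.0, 1.0, 2.0, 3.0):
    draws, acc_rate = metropolis(lam, X, y, m)
    results[lam] = (draws.mean(axis=0), acc_rate)  # posterior means of (b0, b1, theta0, theta1)
```

Comparing the posterior means in `results` across the four values shows how the outcome-model coefficients move as the assumed dependence of nonresponse on the outcome strengthens. In a real analysis, trace and autocorrelation plots of `draws` would be inspected before the summaries are trusted.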