Background
Permanent nursing home is a common term for a staffed residence for individuals who are unable to take care of themselves due to, for example, immobility and severe health problems [1]. In a number of countries, including Denmark, permanent nursing home residents are the most frail and ill of the elderly [1‐3], with a high prevalence of multimorbidity, cognitive impairment and functional limitation [3]. Hence, permanent nursing home placement may be a determining factor or an outcome in epidemiological studies [4‐7]. Residency at a permanent nursing home may also be an important confounder to consider, and nursing home admission is already used as a confounding variable in many research studies in public health and medicine [5, 8]. One study used it to determine place of death [4], another investigated the association between subjective memory complaints and nursing home placement [5], and a third examined factors contributing to mortality among nursing home residents [7]. Gonzales-Colaco et al. were interested in cognitive decline after nursing home admission [9]. Further, reviews have addressed nursing home residence as a relevant outcome or proxy [8, 10, 11].
Denmark is well known for its comprehensive administrative registers, which can be linked to other registers and biobanks through a personal identification number [12]. Based on several assumptions, the administratively collected data are often used to derive other variables, which may have high face validity but often lack proper validation. In Denmark, the national authority for statistical data, Statistics Denmark (StatD), identifies individuals living in permanent nursing homes by one of two indirect methods: 1) for municipalities reporting data about home care, persons living in nursing homes are identified as individuals who have received home care in a nursing home (and not in their own home); 2) for non-reporting municipalities, persons living in nursing homes are identified by matching the address of the individual against addresses expected to be nursing homes (based on an algorithm).
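The two indirect methods can be illustrated with a minimal sketch. It is not StatD's actual implementation: the record fields (`id`, `municipality`, `home_care_place`, `address`), the function name and the example data are all hypothetical and serve only to show the branching logic described above.

```python
def flag_nursing_home_residents(persons, reporting_municipalities,
                                expected_nursing_home_addresses):
    """Return the ids of persons flagged as nursing home residents.

    Hypothetical sketch of the two indirect methods; all field names
    are assumptions made for illustration.
    """
    flagged = set()
    for person in persons:
        if person["municipality"] in reporting_municipalities:
            # Method 1: home care was delivered in a nursing home,
            # not in the person's own home.
            if person.get("home_care_place") == "nursing_home":
                flagged.add(person["id"])
        elif person["address"] in expected_nursing_home_addresses:
            # Method 2: the person's address matches an address
            # expected to be a nursing home.
            flagged.add(person["id"])
    return flagged


# Invented example data.
persons = [
    {"id": 1, "municipality": "A", "home_care_place": "nursing_home",
     "address": "Elm Street 1"},
    {"id": 2, "municipality": "A", "home_care_place": "own_home",
     "address": "Oak Road 2"},
    {"id": 3, "municipality": "B", "address": "Care Lane 5"},
    {"id": 4, "municipality": "B", "address": "Birch Way 9"},
]
flagged = flag_nursing_home_residents(
    persons,
    reporting_municipalities={"A"},
    expected_nursing_home_addresses={"Care Lane 5"},
)
```

In this invented example, person 1 is flagged by method 1 and person 3 by method 2.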
For epidemiological studies, it is important to know the validity of such indirect methods. To our knowledge, no validation studies have yet been published. Hence, we aimed to validate StatD’s register of permanent nursing home residency, using municipal administrative data on permanent nursing home residents as the gold standard.
Discussion
The validation of StatD’s permanent nursing home register showed that 85% of the individuals identified by the municipalities as residents of nursing homes (the gold standard) were labeled as such by StatD. The PPV was 0.53, i.e. only 53% of those identified by StatD were registered as nursing home residents in the dedicated records of the municipalities. The accuracy of StatD’s permanent nursing home algorithm, in terms of sensitivity and PPV, was lower in municipalities where address lists of nursing homes were created by an algorithm (method 2) than in municipalities that had provided information about home care for individuals (method 1). The difference was, however, rather small. We found high variability between municipalities, which might be explained by the fact that municipalities have different procedures for registering nursing home residents, and perhaps some of the methods used have problems in the interface to StatD.
Sensitivity, which is independent of the prevalence in the population studied, may be useful for validating a register, but PPV is more useful at the individual level. PPV indicates how accurate the register is for a specific individual found to be a nursing home resident, and it depends on the prevalence of residency. We did not report specificity or negative predictive value (NPV) in this study because most inhabitants of a municipality do not live in a nursing home. Specificity and NPV are therefore close to 100% and are not relevant as quality indicators of StatD’s register.
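The relationship between these measures can be illustrated with a minimal sketch. The function and class names are hypothetical, and the counts are invented: they were chosen only to mirror the magnitude of the sensitivity and PPV reported above and are not the study data. The sketch assumes both sets of identifiers are non-empty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationResult:
    sensitivity: float
    ppv: float
    specificity: float
    npv: float


def validate_register(flagged_ids, gold_ids, population_size):
    """Compare a register against a gold standard on person identifiers."""
    flagged, gold = set(flagged_ids), set(gold_ids)
    tp = len(flagged & gold)   # flagged and truly resident
    fp = len(flagged - gold)   # flagged but not resident
    fn = len(gold - flagged)   # resident but not flagged
    tn = population_size - tp - fp - fn
    return ValidationResult(
        sensitivity=tp / (tp + fn),
        ppv=tp / (tp + fp),
        specificity=tn / (tn + fp),
        npv=tn / (tn + fn),
    )


# Invented municipality of 10,000 inhabitants: 100 true residents,
# 160 persons flagged by the register, 85 of them correctly.
gold = set(range(100))
flagged = set(range(15, 175))
result = validate_register(flagged, gold, population_size=10_000)
```

With these invented counts, sensitivity is 0.85 and PPV is about 0.53, while specificity and NPV both exceed 0.99. Because the true negatives dominate the population, the latter two measures say little about the quality of the register.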
One potential limitation of our study is that our gold standard is itself a “register” that has not yet been validated. However, administrative data such as our municipal gold standard data are generally considered valid [7, 8]. The municipalities have a strong incentive to register all nursing home residents, both to collect payment from citizens and to receive reimbursement from the government for individuals living in permanent nursing homes. Registers that involve reimbursement are often considered useful for research purposes [13]. We therefore hypothesize that the gold standard used in this study is trustworthy. Further, the best available administrative registers often have to be used as the gold standard when validating other registers, as for example in the study by Guldberg et al., where the Danish Urogynecological Database was validated with the Danish National Patient Registry as the gold standard [14].
An additional limitation could be that we only examined permanent nursing home residents on a specific day (the 1st of January) and therefore the prevalence may not be representative of any other day of the year. However, the scope of our paper was not to examine the prevalence but to validate StatD’s nursing home register using dedicated administrative municipality records on individual nursing home residents as gold standard.
Danish nursing homes were established before the law on general homes for the elderly was implemented, and after 1987 a distinction emerged between nursing residences and general dwellings for the elderly. Nursing dwellings replaced nursing homes; however, the two do not differ in the care or nursing provided. Individuals are admitted to a nursing home or a nursing dwelling depending on availability [15]. In this paper, both nursing homes and nursing dwellings are therefore labeled as permanent nursing homes.
Several factors may explain the misclassification found in StatD’s register. The gold standard applied in our study is based on residency data from the municipalities, which are responsible for permanent nursing homes and use the data for billing citizens. Such data are likely to identify residency on an exact day and can be used to calculate the prevalence of persons dwelling in nursing homes.
The StatD methods may misclassify for several reasons. The method based on provision of home care may include short-term nursing home rehabilitation units and not solely permanent nursing home residents. Furthermore, a misclassification of the place where care is provided has few direct consequences for municipalities and citizens and may therefore not be corrected. The method based on combining addresses can misclassify residency because of possible errors in one or both addresses. The list of nursing home addresses based on the StatD algorithm may also produce some misclassification in situations where the elderly keep their original home address when moving into a nursing home facility [16].
We have found no other studies that have tried to validate address-based data on nursing home residence status. In a study validating the Danish National Patient Registry, the register was found to be a valuable tool for epidemiological research, but only when its strengths and limitations are considered [17]. The same national register was validated by Mason et al., who found low completeness, which, without precaution, could lead to bias [18]. Our results are in line with these studies, since we found a rather low sensitivity and PPV. Other national Danish registers have been validated as well. For example, Uggerby et al. investigated the validity of the schizophrenia diagnosis in the Danish Psychiatric Central Research Register, which they found to be well suited for research [19]. Lykke Petri et al. validated specific data in the Danish Gynecological Cancer Database and found it sufficient for quality monitoring [20]. Another study validated variables in the National Clinical Thyroid Cancer Database, also finding it reliable for research at a national level [21].
Denmark has some of the most comprehensive registers in the world, and many are hosted and maintained by StatD. However, changes in the organization and provision of health services can affect some registers and their completeness. Furthermore, different registers use different definitions of variables, which makes the ability to link across registers very important. Moreover, changes in the codes used for registration and in coding practices may also affect the validity of StatD’s permanent nursing home register [22, 23].
Permanent nursing home residence is already used as a confounding variable in many research studies [5, 8, 9]. In observational studies, administrative data can be used as a confounding variable or as a proxy for frailty, which is difficult to measure in other registers [24]. Previous studies using this algorithm to identify permanent nursing home residents may have overestimated residency, meaning that the true impact of what they examined may have been smaller than reported. For example, if one wanted to investigate whether a specific diagnosis led to permanent nursing home residency, the association might have been overestimated. Further, if the data used in a study contain subgroups between which the accuracy differs (like the variability between municipalities in our study), the estimates of effect when nursing home placement is used as an outcome could be biased. It would therefore be necessary to adjust for municipality. Additionally, estimates of regional differences may be biased. Consequently, a valid algorithm for nursing home status is important in epidemiological surveys.
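How accuracy can differ between subgroups may be checked with a minimal sketch that computes sensitivity and PPV separately for each municipality. The function name, the record layout (municipality, flagged by the register, resident according to the gold standard) and the counts are hypothetical and are not the study data.

```python
from collections import defaultdict


def accuracy_by_municipality(records):
    """Compute sensitivity and PPV per municipality.

    Each record is a tuple (municipality, flagged, gold), where `flagged`
    is the register's classification and `gold` the gold standard.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for municipality, flagged, gold in records:
        c = counts[municipality]
        if flagged and gold:
            c["tp"] += 1
        elif flagged:
            c["fp"] += 1
        elif gold:
            c["fn"] += 1
    result = {}
    for municipality, c in counts.items():
        n_gold = c["tp"] + c["fn"]
        n_flagged = c["tp"] + c["fp"]
        result[municipality] = {
            # None marks a measure that is undefined for this municipality.
            "sensitivity": c["tp"] / n_gold if n_gold else None,
            "ppv": c["tp"] / n_flagged if n_flagged else None,
        }
    return result


# Invented records for two municipalities with different accuracy.
records = (
    [("A", True, True)] * 9 + [("A", False, True)] * 1 + [("A", True, False)] * 3
    + [("B", True, True)] * 6 + [("B", False, True)] * 4 + [("B", True, False)] * 10
)
by_municipality = accuracy_by_municipality(records)
```

In this invented example, municipality A has a sensitivity of 0.90 and a PPV of 0.75, whereas municipality B has 0.60 and 0.375. A pooled estimate would hide this difference, which is why stratifying or adjusting by municipality matters.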
Implications for research: Given the low PPV found in our results, we conclude that validation studies are important for the accuracy of register-based studies, even in countries like Denmark that have comprehensive registers.
As an implication for future use, we suggest caution in the interpretation of StatD’s nursing home variable, especially since its accuracy varies greatly between municipalities.
The validity of the variable can be improved by a direct use of the municipalities’ registers by StatD,