Estimating the number of people eligible for health service use
Introduction
Utrecht is a Dutch city with 230,000 inhabitants. In this city, a converted city bus offers free breakfasts, late suppers and blankets to homeless people. This facility, called the Tussenbus, is open when other day and night facilities for homeless people are closed. On average, the Tussenbus receives 50–60 visitors in the early morning and 30–40 visitors at night. However, the total number of potential clients is unknown, and so it is also unknown how successful the Tussenbus is in reaching homeless people. The objective of this study is to estimate the total number of potential clients and to compare it with the number actually reached. The City Council of Utrecht required this information for its decision to continue or discontinue financing the Tussenbus.
We present this study as a case because it illustrates concisely what can be achieved with a statistical technique called truncated Poisson (tP) modeling. Like other capture–recapture (CRC) techniques (Dewit & Rush, 1996), tP-models can be used to estimate the unknown size of a hidden population such as homeless people, criminals, prostitutes, and drug addicts. This can be done in the absence of a sampling frame, or when a community-based survey would be too costly because the population of interest is small relative to the larger population (e.g. the number of HIV infected people within the general population). CRC and tP-models are, of course, not restricted to homeless people, but can also be useful in fields like criminology (e.g. estimating the number of criminals in a community) and epidemiology (e.g. estimating the number of HIV infected people). Information on the size of a population can be useful for a number of reasons, e.g. unmet needs assessment, allocation of resources, agenda setting, and health service performance evaluation. However, most CRC-techniques require two or more samples and come with a series of assumptions that are difficult to meet, whereas tP-models are computationally easy, use only one sample, and are based on relatively few assumptions, most of which can be checked or handled data-analytically. These features of the tP-model will be illustrated and explained in this paper.
There is a whole array of CRC-techniques to estimate the unknown size of a population; see Pollock (1991) and Seber (1982, 1986, 1992) for general reviews, Wickens (1993) for a review with applications to drug-using populations, and Hook and Regal (1995) for epidemiological applications. Usually, CRC-techniques require two or more partially overlapping, preferably independent registers of the population of interest. Other estimators need data from only a single register (see Wilson & Collins, 1992, for a review). These estimators are commonly known as tP-models. We will employ one tP-estimator proposed by Anne Chao in 1987 and another derived by Daniel Zelterman in 1988. Since these estimators make use of only a single register, they are particularly useful for estimating the size of the clientele of a single service provider.
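For orientation, the two estimators are commonly written in terms of the observed frequency counts as shown below. The notation is introduced here as an assumption and is not taken from the text above: n is the number of distinct individuals observed, f_1 the number observed exactly once, and f_2 the number observed exactly twice.

```latex
\hat{N}_{\mathrm{Chao}} = n + \frac{f_1^{2}}{2 f_2},
\qquad
\hat{N}_{\mathrm{Zelterman}} = \frac{n}{1 - \exp\!\left(-\,2 f_2 / f_1\right)}
```

Both expressions depend only on n, f_1 and f_2, so the counts of people seen three or more times enter only through n.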
To our knowledge, tP-estimators have not been used to estimate the size of a homeless population (Darcy and Jones, 1975; Fisher et al., 1994; Koegel et al., 1996; Shaw et al., 1996), but their undemanding data requirements and relatively relaxed assumptions warrant much stronger interest. In Section 2, we describe how the data were collected and how the analysis was carried out. Results are then given and discussed with respect to the underlying assumptions of the tP-model. The paper concludes with some notes on lessons learned in this and related studies.
Method
During the data-collection week, the number of visits made by each visitor was tallied. To ensure that every visit by each individual was counted, all visitors were approached individually, at a convenient moment, and asked to give their date of birth and the first two letters of their surname. In addition, the sex of each visitor was recorded. From the tally, a frequency distribution was obtained of the number of people who visited 1, 2, …, K times, with the
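The tallying step described in this section can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name, the record layout (date of birth, first two letters of surname, sex) and the sample records are all hypothetical.

```python
from collections import Counter


def frequency_of_visits(visit_log):
    """Return {k: number of individuals seen exactly k times}."""
    # One log entry per visit; identical keys are treated as the same person.
    visits_per_person = Counter(visit_log)
    # Count how many people share each visit total.
    return dict(Counter(visits_per_person.values()))


# Hypothetical records: (date of birth, first two letters of surname, sex)
visit_log = [
    ("1961-03-14", "JA", "M"),
    ("1961-03-14", "JA", "M"),
    ("1975-11-02", "DE", "F"),
    ("1980-07-23", "VA", "M"),
    ("1980-07-23", "VA", "M"),
    ("1980-07-23", "VA", "M"),
]

freq = frequency_of_visits(visit_log)
distinct_visitors = sum(freq.values())
```

A key this coarse can merge two different people who happen to share a birth date and surname prefix, which would understate the number of one-time visitors.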
Results
During the data-collection week, we tallied 162 different visitors to the Tussenbus. Of these, 63 were seen once, 20 twice, and the remainder three times or more. Under Chao's model, the estimated total size of the clientele was 261 (95% confidence interval 213 to 356). Under Zelterman's model, we obtained est(N)=345 (95% CI 278–455). The confidence intervals show overlap and therefore we conclude that both estimators are in
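The two point estimates can be reproduced from the reported counts with the commonly cited closed forms of the estimators: Chao's n + f1²/(2·f2) and Zelterman's n / (1 − exp(−2·f2/f1)). The sketch below assumes those forms; the function and variable names are illustrative, and the confidence intervals are not recomputed here.

```python
import math


def chao_estimate(n, f1, f2):
    """Chao's estimator: n + f1^2 / (2 * f2)."""
    return n + f1 ** 2 / (2 * f2)


def zelterman_estimate(n, f1, f2):
    """Zelterman's estimator: n / (1 - exp(-2 * f2 / f1))."""
    rate = 2 * f2 / f1  # estimated Poisson rate from the first two counts
    return n / (1 - math.exp(-rate))


# Counts reported above: 162 distinct visitors, 63 seen once, 20 seen twice.
n, f1, f2 = 162, 63, 20

chao = chao_estimate(n, f1, f2)
zelterman = zelterman_estimate(n, f1, f2)

# Share of the estimated clientele actually observed during the week.
coverage_chao = n / chao
coverage_zelterman = n / zelterman

print(round(chao), round(zelterman))  # prints: 261 345
```

On these figures the 162 observed visitors correspond to roughly 62% of the estimated clientele under Chao's estimate and roughly 47% under Zelterman's.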
References (18)
- Dewit & Rush (1996). Assessing the need for substance abuse services: A critical review of needs assessment models. Evaluation and Program Planning.
- Zelterman (1988). Robust estimation in truncated discrete distributions with application to capture–recapture experiments. Journal of Statistical Planning and Inference.
- Bustami, R., Van der Heijden, P., Van Houwelingen, H., Engbersen, G. (2001). Point and interval estimation of the...
- Chao (1987). Estimating the population size for capture–recapture data with unequal catchability. Biometrics.
- Chao (1989). Estimating population size for sparse data in capture–recapture experiments. Biometrics.
- Darcy & Jones (1975). The size of the homeless men population of Sydney. Australian Journal of Social Issues.
- Fisher et al. (1994). Estimating the numbers of homeless and homeless mentally ill people in north east Westminster by using capture–recapture analysis. British Medical Journal.
- et al. (2001). A comparison of different methods for estimating the prevalence of problematic drug misuse in Great Britain. Addiction.
- Hook & Regal (1995). Capture–recapture methods in epidemiology: Methods and limitations. Epidemiologic Reviews.