Data sources and management
This study was confined to deaths of people aged 5 years and older, among whom HIV deaths are common and often misclassified. DR data from 1996-2009 were obtained from the Bureau of Policy and Strategy database, Ministry of Public Health. The 2005 VA study was conducted by the SPICE project and included a sample of 9,644 deaths (3,316 in hospital and 6,328 outside hospital) from 28 selected districts in nine provinces of four regions, of which 9,495 were deaths of persons aged 5 years and older.
Table 1 summarizes the cause groups based on VA counts. The chapter-block classifications of ICD-10 codes [19], consisting of blocks categorized mainly by human organs, were used to create 21 major cause groups for deaths at ages 5 years and older based on the distribution of VA-assessed deaths. For statistical accuracy, groups with small counts (mainly less than 200) were combined into larger groups using medical considerations (apart from septicemia, which received special attention due to over-reporting). The proportion of all deaths represented by these categories varied from 0.8% for septicemia to 11.3% for stroke, with HIV deaths (ICD-10 codes B20-24) accounting for 5.4%, as shown in Table 1.
Table 1
Cause groups based on VA counts
Cause group (ICD-10 codes) | Deaths | % |
1: TB (A15-19) | 195 | 2.1 |
2: Septicemia (A40-41) | 77 | 0.8 |
3: HIV (B20-24) | 512 | 5.4 |
4: Other Infectious (A, B)- | 219 | 2.3 |
5: Liver Cancer (C22) | 500 | 5.3 |
6: Lung Cancer+ (C30-39) | 320 | 3.4 |
7: Other Digestive Cancer (C15-26-) | 290 | 3.1 |
8: Other Cancer (C-, D0-48) | 697 | 7.3 |
9: Endocrine (E) | 647 | 6.8 |
10: Mental, Nervous (F, G) | 223 | 2.3 |
11: Ischemic (I20-25) | 617 | 6.5 |
12: Stroke (I60-69) | 1,076 | 11.3 |
13: Other CVD (I-) | 540 | 5.7 |
14: Respiratory (J) | 801 | 8.4 |
15: Digestive (K) | 489 | 5.2 |
16: Genitourinary (N) | 412 | 4.3 |
17: Ill-defined (R) | 501 | 5.3 |
18: Transport Accident (V) | 536 | 5.6 |
19: Other Injury (W, X0-59) | 327 | 3.4 |
20: Suicide (X60-84) | 158 | 1.7 |
21: All other | 358 | 3.8 |
Total | 9,495 | 100.0 |
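As an arithmetic check on Table 1, the following Python sketch recomputes each group's percentage share from the VA counts. The counts are copied from the table; the dictionary keys and the function name `percent_shares` are illustrative choices, not part of the original analysis.

```python
# VA-assessed deaths at ages 5+ per cause group, copied from Table 1.
va_counts = {
    "TB": 195, "Septicemia": 77, "HIV": 512, "Other Infectious": 219,
    "Liver Cancer": 500, "Lung Cancer+": 320, "Other Digestive Cancer": 290,
    "Other Cancer": 697, "Endocrine": 647, "Mental, Nervous": 223,
    "Ischemic": 617, "Stroke": 1076, "Other CVD": 540, "Respiratory": 801,
    "Digestive": 489, "Genitourinary": 412, "Ill-defined": 501,
    "Transport Accident": 536, "Other Injury": 327, "Suicide": 158,
    "All other": 358,
}


def percent_shares(counts):
    """Return each group's share of the total in percent, to one decimal."""
    total = sum(counts.values())
    return {group: round(100.0 * n / total, 1) for group, n in counts.items()}


total_deaths = sum(va_counts.values())
shares = percent_shares(va_counts)
```

The recomputed total and shares can then be compared against the last two columns of Table 1.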
Misclassification was not random. Logistic regression was used to correct it, with sex, age, and spatial variables as predictors. For efficiency, the predictors were optimally grouped to obtain sufficient sample size for relatively homogeneous risk groups. Nine provinces were included in the VA study (Bangkok, Nakhon Nayok, Suphan Buri, Ubon Ratchathani, Loei, Phayao, Chiang Rai, Chumphon, and Songkhla). The effects of age for males and females were considered separately (see Results). Sex and age were grouped together into 14 levels (two sexes by seven age groups in years: 5-19, 20-29, 30-39, 40-49, 50-59, 60-69, and 70+).
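The sex-age grouping described above can be sketched in Python as follows. The age bands are taken from the text; the label format and the function name `sex_age_group` are assumptions made for illustration.

```python
# Seven age bands in years, as listed in the text; None marks the open-ended band.
AGE_BANDS = [(5, 19), (20, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, None)]
SEXES = ("male", "female")


def sex_age_group(sex, age):
    """Map a sex and an age in years to one of the 14 combined levels."""
    if sex not in SEXES:
        raise ValueError(f"unknown sex: {sex!r}")
    for low, high in AGE_BANDS:
        if age >= low and (high is None or age <= high):
            band = f"{low}+" if high is None else f"{low}-{high}"
            return f"{sex} {band}"
    # The study was confined to deaths at ages 5 years and older.
    raise ValueError("age below 5 years is outside the study population")


# Enumerate all levels by probing each band at its lower bound.
ALL_SEX_AGE_GROUPS = [sex_age_group(s, low) for s in SEXES for low, _ in AGE_BANDS]
```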
Similarly, misclassification of cause of death was considered separately for deaths in and outside hospitals. Reported causes of death and location were grouped into 18 levels, formed by combining two levels of location (in and outside hospital) with nine major causes of death (HIV, respiratory, septicemia, tuberculosis (TB), other infectious, mental and nervous system, digestive, ill-defined, and the remainder, which were aggregated into a single group).
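A minimal sketch of the cause-location grouping, assuming that each DR record carries a reported cause label and an in-hospital flag. The cause names follow the text; the label strings, the "remainder" name, and the function name `cause_location_level` are hypothetical.

```python
# The eight causes named individually in the text; everything else is pooled.
NAMED_CAUSES = (
    "HIV", "respiratory", "septicemia", "TB", "other infectious",
    "mental and nervous system", "digestive", "ill-defined",
)
REMAINDER = "remainder"
LOCATIONS = ("in hospital", "outside hospital")


def cause_location_level(reported_cause, in_hospital):
    """Combine reported cause (nine levels) with place of death (two levels)."""
    cause = reported_cause if reported_cause in NAMED_CAUSES else REMAINDER
    location = LOCATIONS[0] if in_hospital else LOCATIONS[1]
    return f"{cause}, {location}"


ALL_CAUSE_LOCATION_LEVELS = [
    cause_location_level(cause, flag)
    for cause in NAMED_CAUSES + (REMAINDER,)
    for flag in (True, False)
]
```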
Through logistic regression [20]-[22] we estimated the logit of the probability $P$ that a person died from HIV as a linear function of the determinant factors. The simple logistic regression model with simple cross-referencing is formulated as
$$\operatorname{logit}(P_i) = \ln\left(\frac{P_i}{1 - P_i}\right) = \mu + \alpha_i \tag{A}$$
where $P_i$ is the probability of death due to HIV, $\mu$ is a constant, and $\alpha_i$ is the only parameter, specifying DR cause-location group $i$. The simple cross-referencing model (A) was compared with the full model (B), which is an additive linear function of all the determinant factors and can be expressed as
$$\operatorname{logit}(P_{ijk}) = \ln\left(\frac{P_{ijk}}{1 - P_{ijk}}\right) = \mu + \alpha_i + \beta_j + \gamma_k \tag{B}$$
where $P_{ijk}$ is the probability of death due to HIV and $\alpha_i$, $\beta_j$ and $\gamma_k$ are individual parameters specifying DR cause-location group $i$, sex-age group $j$ and province $k$, respectively.
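To make the additive structure of the full model concrete, the sketch below converts a linear predictor built from μ, α, β and γ into a probability via the inverse logit. All coefficient values are hypothetical and chosen only to keep the arithmetic transparent; summing predicted probabilities over records to obtain an expected number of HIV deaths is an assumption about how such estimates are typically formed, not a statement of the authors' exact procedure.

```python
import math


def inv_logit(x):
    """Inverse of the logit: map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def hiv_probability(mu, alpha_i, beta_j, gamma_k):
    """Full additive model: logit(P) = mu + alpha_i + beta_j + gamma_k."""
    return inv_logit(mu + alpha_i + beta_j + gamma_k)


def expected_hiv_deaths(records, mu):
    """Sum predicted probabilities; each record is (alpha_i, beta_j, gamma_k)."""
    return sum(hiv_probability(mu, a, b, g) for a, b, g in records)


# Hypothetical coefficients: the linear predictor is exactly 0, so P = 0.5.
p_example = hiv_probability(mu=-3.0, alpha_i=2.5, beta_j=1.0, gamma_k=-0.5)

# Two hypothetical records sharing those coefficients.
expected_example = expected_hiv_deaths([(2.5, 1.0, -0.5)] * 2, mu=-3.0)
```

Setting β and γ to zero in `hiv_probability` gives the simple cross-referencing model, in which the probability depends on the DR cause-location group alone.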
We used "sum contrasts" developed by Tongkumchum and McNeil [23] and Kongchouy and Sampantarak [24] instead of conventional "treatment contrasts" where the first level is left out from the model to be the reference. This method allows us to compute the estimate and the 95% confidence interval of deaths for each of the covariate levels in the VA and the DR datasets.
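The difference between the two coding schemes can be illustrated with the textbook sum-to-zero coding, sketched below in Python. This is the standard form of sum contrasts and may differ in detail from the variant in the cited references; the function names and the example effect values are hypothetical.

```python
def treatment_contrast_row(level_index, n_levels):
    """Treatment coding: the first level is the reference (all zeros)."""
    return [1 if c == level_index - 1 else 0 for c in range(n_levels - 1)]


def sum_contrast_row(level_index, n_levels):
    """Sum-to-zero coding: the last level is coded -1 in every column."""
    if level_index == n_levels - 1:
        return [-1] * (n_levels - 1)
    return [1 if c == level_index else 0 for c in range(n_levels - 1)]


def all_level_effects(fitted_effects):
    """Recover the effect of the last level as minus the sum of the others."""
    return list(fitted_effects) + [-sum(fitted_effects)]


# Hypothetical fitted effects for the first two levels of a three-level factor.
effects = all_level_effects([0.4, -0.1])
```

Under sum-to-zero coding every level receives an effect measured against the overall mean, so no level has to serve as an arbitrary reference.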
To assess the accuracy of model prediction, the Receiver Operating Characteristic (ROC) curve from logistic regression was drawn based on a concept described by Chongsuvivatwong [22] and Fan et al. [25]. Area under the ROC curve (AUC) measures the performance of a model and represents model accuracy [26],[27]. A cut-off point in the curve, where the predicted number of HIV deaths equals the observed value in the VA dataset (512 cases), was used to report sensitivity and specificity of the model. These were compared with results from the simple cross-referencing method.
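The following Python sketch illustrates the evaluation steps on toy data: the AUC computed as the probability that a randomly chosen HIV death receives a higher predicted probability than a randomly chosen non-HIV death, and a cut-off chosen so that the number of predicted HIV deaths equals the number observed. The scores and labels are invented for illustration, and flagging the k highest-scoring records is one plausible reading of the cut-off rule, not necessarily the authors' implementation.

```python
def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def classify_top_k(scores, k):
    """Flag the k highest-scoring records (ties broken by input order)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    flagged = set(order[:k])
    return [1 if i in flagged else 0 for i in range(len(scores))]


def sensitivity_specificity(predicted, labels):
    """Return (sensitivity, specificity) for binary predictions."""
    tp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 1)
    tn = sum(1 for p, y in zip(predicted, labels) if p == 0 and y == 0)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return tp / n_pos, tn / n_neg


# Toy predicted probabilities and true HIV status (1 = HIV death).
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0]

toy_auc = auc(toy_scores, toy_labels)
# Match the number of predicted positives to the observed count (3 here).
toy_predicted = classify_top_k(toy_scores, k=sum(toy_labels))
toy_sens, toy_spec = sensitivity_specificity(toy_predicted, toy_labels)
```

In the study itself the observed count used for the cut-off was 512 HIV deaths; the toy data use 3.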
Estimation of HIV mortality
For the nine study provinces, fitting the complete logistic regression model to the 2005 VA dataset yielded nine province coefficients, 14 sex-age group coefficients, and 18 DR cause-location coefficients, together with the estimate of HIV deaths and its 95% confidence interval.
For the remaining 67 provinces, we used a simple and easily implemented spatial "triangulation method" [28],[29] to interpolate province coefficients. This was preferred to the "kriging" method because it uses fewer points than kriging, and there were insufficient sample provinces (only nine) to provide the basis for kriging [30].
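Triangles were drawn linking the nine provinces in the 2005 VA study. The values of province coefficients in each triangle were assigned as an average of coefficients from nearby provinces in the model. For each triangle, values $a$, $b$ and $c$ were obtained by solving three equations using linear algebra based on latitude and longitude as follows.
$$\beta_{P_1} = a + b\,\mathrm{lat}_{P_1} + c\,\mathrm{lon}_{P_1} \tag{C}$$
$$\beta_{P_2} = a + b\,\mathrm{lat}_{P_2} + c\,\mathrm{lon}_{P_2} \tag{D}$$
$$\beta_{P_3} = a + b\,\mathrm{lat}_{P_3} + c\,\mathrm{lon}_{P_3} \tag{E}$$
(Note: P = Province, β = coefficient; $P_1$, $P_2$ and $P_3$ denote the three provinces at the vertices of the triangle)
The coefficient for any province $j$ within a triangle could then be given by
$$\beta_j = a + b\,\mathrm{lat}_j + c\,\mathrm{lon}_j \tag{F}$$
Coefficients for provinces outside triangles were obtained similarly by extrapolation from nearby provinces. Province coefficients for all provinces were thus obtained and the magnitude of HIV deaths estimated.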
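The triangulation step can be sketched in Python as a plane fitted through three vertex provinces, assuming the plane has the form coefficient = a + b × latitude + c × longitude. The 3 × 3 system is solved here with Cramer's rule to keep the example dependency-free. The coordinates and coefficient values below are hypothetical and do not correspond to any real province.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(vertices):
    """Solve for (a, b, c) from three (lat, lon, coefficient) vertices."""
    matrix = [[1.0, lat, lon] for lat, lon, _ in vertices]
    rhs = [coef for _, _, coef in vertices]
    d = det3(matrix)
    if abs(d) < 1e-12:
        raise ValueError("vertices are collinear; no unique plane exists")
    solution = []
    for col in range(3):
        # Cramer's rule: replace one column with the right-hand side.
        replaced = [row[:] for row in matrix]
        for r in range(3):
            replaced[r][col] = rhs[r]
        solution.append(det3(replaced) / d)
    return tuple(solution)


def interpolate(plane, lat, lon):
    """Evaluate the fitted plane at a province's latitude and longitude."""
    a, b, c = plane
    return a + b * lat + c * lon


# Hypothetical vertices: (latitude, longitude, fitted province coefficient).
triangle = [(13.0, 100.0, 0.2), (15.0, 100.0, 0.6), (13.0, 104.0, -0.2)]
plane = fit_plane(triangle)
inside_value = interpolate(plane, 14.0, 102.0)
```

Evaluating the same plane at a point outside the triangle gives the corresponding extrapolated value.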
R program version 2.15.2 [31] was used for all statistical analysis and graphical displays.