Population and sero-survey
Details of the study have been described elsewhere [16]. Briefly, after conducting a census in December 1994, the village was divided into 12 geographic sectors. Every 15th house was then systematically sampled for participation in a sero-survey. Of 172 sampled houses, the heads of 168 (98%) agreed to participate. Between January and June 1995, after obtaining written informed consent, we conducted a sero-survey and administered a questionnaire to the study participants, soliciting medical and behavioral risk factors for liver disease. History of schistosomiasis was obtained by interview, in which participants were asked whether they had ever received a diagnosis of schistosomiasis. Regardless of their histories, all participants were further queried about past treatment for schistosomiasis and whether the treatment was with oral drugs, injections or both. Approximately 7 ml of blood was then obtained by venipuncture from each participant. Persons under 10 years of age were excluded from the sero-survey because earlier work in Kalama demonstrated HCV infection to be rare in this age group [12]. Baseline socio-demographic information was collected at the time of the census. Household socio-economic status was classified as high, medium or low according to a locally developed scale.
Details of the laboratory methods used to detect anti-HCV antibodies have been described elsewhere [16]. Briefly, serum from the blood samples was screened for anti-HCV antibodies by EIA (Abbott Laboratories). All EIA-reactive sera were evaluated with either the second or third generation recombinant immunoblot assay (Chiron Corporation) as a confirmatory test. Sera positive by either immunoblot assay were considered HCV positive (that is, positive for anti-HCV antibodies).
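The sampling step and the case definition described above can be sketched in a few lines of Python. This is a minimal illustration rather than the study's actual procedure: the function names, the starting offset, and the toy list of 60 houses are all hypothetical, and the text does not state how the first house was chosen.

```python
def systematic_sample(houses, step=15, start=14):
    """Return every `step`-th house, beginning at index `start` (assumed offset)."""
    return houses[start::step]


def is_hcv_positive(eia_reactive, immunoblot_positive):
    """Case definition: an EIA-reactive serum confirmed by an immunoblot assay."""
    return bool(eia_reactive and immunoblot_positive)


# Toy census list of 60 house identifiers (hypothetical, not study data).
census = list(range(1, 61))
sampled = systematic_sample(census)
```

With a step of 15, a list of 60 houses yields 4 sampled houses; the same rule applied to the full census would yield the 172 houses reported above only if the village had roughly 15 times that many houses, which the text does not state.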
Statistical methods
For baseline comparisons of HCV-positive and HCV-negative individuals, we used the chi-square test (or Fisher's exact test when data were sparse) for categorical variables. For continuous variables, we used the t-test to compare the two groups when the data were normally distributed, and the Mann-Whitney U test when the assumption of normality was not met.
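The choice between the chi-square test and Fisher's exact test for a 2x2 table can be sketched as follows. This is an illustrative, standard-library-only sketch, not the code used in the study. It assumes that "sparse" means any expected cell count below 5 (a common rule of thumb; the text gives no threshold) and that the chi-square test is applied without a continuity correction. All function names are hypothetical.

```python
from math import comb, erfc, sqrt


def expected_counts(table):
    """Expected cell counts of a 2x2 table under independence."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    return [[r * k / n for k in cols] for r in rows]


def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) and 1-df p-value."""
    expected = expected_counts(table)
    stat = sum(
        (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(2)
        for j in range(2)
    )
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, erfc(sqrt(stat / 2))


def fisher_exact_2x2(table):
    """Two-sided Fisher p-value: sum over tables no more likely than the observed one."""
    (a, b), (c, d) = table
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    support = range(max(0, c1 - r2), min(r1, c1) + 1)
    return sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-9))


def compare_categorical(table, min_expected=5):
    """Use Fisher's exact test when any expected count is below `min_expected`."""
    sparse = any(e < min_expected for row in expected_counts(table) for e in row)
    if sparse:
        return "fisher", fisher_exact_2x2(table)
    return "chi-square", chi_square_2x2(table)[1]
```

For example, `compare_categorical([[3, 1], [1, 3]])` falls back to Fisher's exact test because every expected count is 2, whereas a table with expected counts of 25 per cell is handled by the chi-square test.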
To test for intra- and inter-class clustering of HCV infections, we used a modified formulation of the Generalized Estimating Equations (GEE). The original formulation of GEE, also referred to as GEE1 [17], is used in the analysis of correlated data in which clustering is treated as a nuisance factor that needs to be taken into account at the analysis stage. This approach uses a working correlation matrix to model the correlation between the repeated or related observations for an individual or a cluster while simultaneously adjusting for covariates in a generalized linear model. With the new formulation, also referred to as GEE2 or Alternating Logistic Regression (ALR), clustering can be treated as a parameter of interest on which we can perform hypothesis testing [18-20]. Instead of specifying a correlation matrix, this approach computes an OR for each cluster to detect clustering among related observations. Since the outcome of our study is binary (HCV positive or negative), we describe the approach for binary outcomes.
For a family of size $n$, if $Y_j$ is the HCV status (0 = negative, 1 = positive) of the $j$th individual, we define
$$\log\left[\frac{\Pr(Y_j = 1)}{\Pr(Y_j = 0)}\right] = \beta_0 + \beta_1 x_{1j} + \beta_2 x_{2j} + \cdots + \beta_m x_{mj} \qquad (1)$$
where the x's are m covariates for the jth individual. For most analyses, the x's are variables that describe the baseline characteristics or are confounding variables. These variables could be specific to the jth individual (such as age and gender) or could be common for the entire family (e.g. race or socio-economic status of the family). The β's are the regression parameters to be estimated. We next define the odds ratio (OR) between the jth and kth members of the family as
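The displayed formula for equation (2) does not survive in the text. A reconstruction, assuming it is the standard pairwise odds ratio used in the ALR literature [18-20], would be:

```latex
\mathrm{OR}_{jk} =
  \frac{\Pr(Y_j = 1,\, Y_k = 1)\,\Pr(Y_j = 0,\, Y_k = 0)}
       {\Pr(Y_j = 1,\, Y_k = 0)\,\Pr(Y_j = 0,\, Y_k = 1)}
  \qquad (2)
```

Under this assumed form, the numerator collects the two concordant outcomes for the pair (both positive, both negative) and the denominator the two discordant ones, which agrees with the interpretation of the OR given in the text that follows.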
The odds ratio is a commonly used measure of association for categorical variables and takes values between zero and infinity. An odds ratio equal to one suggests the lack of an association between individuals. In equation (2), an OR = 1 for each of the $\binom{n}{2}$ pairs of members within a house (the number of ways in which 2 members can be selected out of $n$) indicates the absence of clustering of disease within the family. On the other hand, an odds ratio greater than one indicates clustering of disease within families. It should be noted that the OR defined in (2) is not the same OR that one conventionally uses to measure the association between exposure and disease in case-control or observational studies. The OR in (2) describes the odds of disease or no disease for another member in the same cluster, given that a member is affected or unaffected. We henceforth refer to this OR as the 'OR for clustering'. Such ORs are commonly used to describe familial aggregation of disease in genetic studies [21]. The OR can be modeled for subclasses or sub-clusters within the main cluster as
$$\log \mathrm{OR}_{jk} = \gamma_z \qquad (3)$$
where $z$ indicates the class membership of the $(j, k)$ pair within the house. For example, if we consider adults within a house to be members of class 'a' and children to be members of class 'c', then $\log(\mathrm{OR}_{jk}) = \gamma_{aa}$, $\gamma_{cc}$, or $\gamma_{ac}$ depending on whether the $(j, k)$ pair consists of two adults, two children, or an adult and a child. From equation (3) we thus obtain two intra-class ORs, $e^{\gamma_{aa}}$ and $e^{\gamma_{cc}}$, for adults and children respectively, and one inter-class OR, $e^{\gamma_{ac}}$, between adults and children. A significant $\gamma_{aa}$ or $\gamma_{cc}$ suggests clustering within adults or within children, whereas a significant $\gamma_{ac}$ indicates a correlation between adults and children. When there is a single class ($z = 1$), the equation models a constant log odds ratio across all the clusters. Estimates of the β's and γ's are obtained by simultaneously estimating the parameters of equations (1) and (3), thereby adjusting the ORs for the covariates introduced in (1). Further details on the formulation, estimation methods, features of the model, and properties of the estimators have been published elsewhere [18-20].
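The class structure of equation (3) can be illustrated with a short Python sketch that enumerates the pairs within one house and labels each as adult-adult ('aa'), child-child ('cc'), or adult-child ('ac'). This is a hedged illustration only: the ages, the log odds ratios in `gamma`, and the convention that a person at or above the cutoff counts as an adult are all hypothetical and are not taken from the study.

```python
from itertools import combinations
from math import comb, exp


def pair_class(age_j, age_k, cutoff=15):
    """Label a pair 'aa', 'cc', or 'ac' (age >= cutoff is treated as adult, by assumption)."""
    labels = sorted("a" if age >= cutoff else "c" for age in (age_j, age_k))
    return "".join(labels)


def house_pair_classes(ages, cutoff=15):
    """Class label for each of the C(n, 2) unordered pairs within a house."""
    return [pair_class(a, b, cutoff) for a, b in combinations(ages, 2)]


# Hypothetical house with three adults and two children.
ages = [42, 38, 16, 12, 10]
classes = house_pair_classes(ages)

# Hypothetical log odds ratios, one per class; exponentiating gives the
# two intra-class ORs ('aa', 'cc') and the inter-class OR ('ac').
gamma = {"aa": 0.9, "cc": 0.1, "ac": 0.4}
odds_ratios = {z: exp(g) for z, g in gamma.items()}
```

A house of five members contributes `comb(5, 2)` pairs, each of which is assigned exactly one of the three γ parameters.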
To model high- and low-risk sub-groups within houses, we chose an age cutoff of 15 years, based on an examination of the age-specific prevalence in our study and on reports stating that parenteral treatment of schistosomiasis was stopped between 1982 and 1986 with the introduction of praziquantel, an oral drug used to treat the disease [22]. Further, children under 5 years of age were not eligible to receive injections during the campaigns. Since children born in 1981 would have been 14 years of age at the time of our 1995 HCV study, it can be safely assumed that they were not present at the time of the schistosomiasis campaigns. This age cutoff has also been used in other studies [13].
We conducted our analysis using the ALR implementation of GEE in SAS V8.02 (SAS Institute Inc., Cary, NC). Because we assume that parenteral treatment for schistosomiasis during the campaigns occurred at around the same time for all members of a house, the ordering of individuals within the house (cluster) is not relevant for modeling the correlation between its members. Hence, to model the OR for clustering of schistosomiasis treatment in the house, an exchangeable log odds ratio was assumed, which implies a constant OR for clustering across all houses. To estimate the degree of clustering of HCV infections, a history of parenteral treatment of schistosomiasis in the house was treated as a blocking factor: separate log odds ratios were estimated for each block, under the assumption that the log odds ratio is constant within each block.
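The idea of an exchangeable OR estimated separately within blocks can be illustrated with a crude, unadjusted calculation in Python. This sketch is not ALR and is not the SAS analysis described above. It pools all within-house pairs in a block and forms a simple moment estimate of the pairwise OR, without covariate adjustment or standard errors. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def pooled_pairwise_or(houses):
    """Crude exchangeable pairwise OR from all within-house pairs.

    `houses` is a list of lists of 0/1 HCV statuses. Counting each unordered
    pair in both orders gives a symmetric 2x2 table, so the estimate is
    4 * n11 * n00 / nd**2, where nd is the number of discordant pairs.
    """
    n11 = n00 = nd = 0
    for statuses in houses:
        for yj, yk in combinations(statuses, 2):
            if yj == 1 and yk == 1:
                n11 += 1
            elif yj == 0 and yk == 0:
                n00 += 1
            else:
                nd += 1
    if nd == 0:
        # No discordant pairs: the estimate is unbounded or undefined.
        return float("inf") if n11 and n00 else float("nan")
    return 4 * n11 * n00 / nd ** 2


def or_by_block(houses, blocks):
    """Estimate one pooled OR per block (e.g. treatment history in the house)."""
    grouped = {}
    for statuses, block in zip(houses, blocks):
        grouped.setdefault(block, []).append(statuses)
    return {block: pooled_pairwise_or(hs) for block, hs in grouped.items()}


# Toy data (hypothetical): HCV status per member, one list per house, and a
# flag for any history of parenteral schistosomiasis treatment in the house.
houses = [[1, 1, 0], [1, 1], [0, 0], [0, 0, 0], [0, 0, 1], [1, 0], [0, 0], [1, 1]]
blocks = [True, True, True, True, False, False, False, False]
result = or_by_block(houses, blocks)
```

An estimate well above one in a block would point toward clustering in that block. A formal comparison would require the model-based estimates and standard errors that ALR provides.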