Variables and inclusion criteria of participants
We included men who self-reported their MMC status and reported being sexually experienced. The main exposure of interest was MMC status, i.e. whether or not a participant had undergone MMC. Those who reported being uncircumcised or traditionally circumcised (which involves partial removal of the foreskin), or who did not know their circumcision status, were classified as not having MMC. The two primary outcomes of interest in our analysis were the HIV test result (+ve = 1, −ve = 0) and the HSV-2 test result (+ve = 1, −ve = 0). Venous blood samples were tested for HIV antibodies and antigens using a fourth-generation HIV enzyme immunoassay, the bioMérieux Vironostika Uniform II Antigen/Antibody Microelisa system (bioMérieux, Marcy-l’Étoile, France). The HIV 1/2 Combi Roche Elecsys assay (Roche Diagnostics, Penzberg, Germany) and the HIV-1 Western Blot Bio-Rad assay (Bio-Rad Laboratories, Redmond, WA, USA) were used to confirm positive samples. For measurement of HSV-2, serum was tested for HSV-2 antibodies via ELISA (HerpeSelect, Focus Diagnostics, Cypress, CA, USA). HIV and HSV-2 status were laboratory-derived and hence not susceptible to self-report bias.
Covariates selected included age (in years), marital status (married, widowed/divorced/separated, single), education (no education, primary/incomplete high school, completed high school, degree/diploma) and whether the respondent drinks alcohol. Sexual behaviours included the number of lifetime sexual partners (1, 2–5, 6+, refused to report), condom usage (always/sometimes, never) and whether the respondent had sex in the last 12 months. At the household level, we assessed total household monthly income (categorized) and whether the household receives a social support grant. These variables are epidemiologically plausible or possible confounders of the relationship between MMC status and HIV/HSV-2 outcomes.
There is substantial evidence in the literature that sexually transmitted infections, including HIV and HSV-2, are associated with other STIs [18-21]. Thus, we further analyzed test results for the following STIs: Chlamydia trachomatis, Trichomonas vaginalis, syphilis, Neisseria gonorrhoeae, Mycoplasma genitalium and hepatitis B.
From the original 3547 male participants, we removed 692 participants who reported never having had sex. We further excluded participants who had missing values for MMC status (n = 5). Our analytic sample consisted of 2850 male participants.
Statistical analysis
We contrasted the marginally adjusted prevalence of each outcome (HIV and HSV-2) under the two levels of the MMC exposure, expressed as a prevalence ratio. In other words, for each of the two outcomes we compared the prevalence that would be observed if the men were medically circumcised with the prevalence if they were not. All contrasts were adjusted for important confounders: age, marital status, educational level, number of lifetime sexual partners, condom usage, sexual activity in the last 12 months, alcohol use, coinfection with other STIs, household monthly income and household receipt of a social support grant. We estimated MMC associations with HIV and HSV-2 using targeted maximum likelihood estimation (TMLE), full matching on the propensity score (PSFM) and inverse probability of treatment weighting (IPTW).
Traditional circumcision is probably not equivalent from an STI risk perspective to being uncircumcised. However, in our analytic sample, the traditionally circumcised men were a small minority (4.5%) and had the same HIV and HSV-2 prevalence as the uncircumcised men. We conducted a sensitivity analysis to further understand the effect of including traditionally circumcised men in our uncircumcised group on our STI outcomes models (see Appendix Table 1 and Figure 1, Supplementary file 1). We dropped the traditionally circumcised men from our analytic sample and reran all our outcome analyses using the same methods described above for the main analyses.
We conducted a subgroup analysis to provide additional information and to assess heterogeneity in the MMC association with HIV and HSV-2 (Table 3). The subgroup analysis was conducted on two subgroups of respondents: those who were coinfected with any STI other than the primary STI outcome, and those who were not. This secondary analysis was needed because coinfections with other STIs might mediate the associations observed for the primary outcomes. For the subgroup analysis, we used the same methods as described for the primary analyses.
Targeted maximum likelihood estimation
Let T, Y denote the exposure (or treatment) indicator and observed outcome (MMC status and STI outcome, respectively, in this context), and let W be a vector including the identified confounders for the effect of T on Y.
The implementation of TMLE is straightforward. We first fitted an initial logistic regression of the STI outcome Y, given the MMC status and covariates, \( {\mathrm{Q}}_0\left(T,W\right)={E}_0\left(Y\mid T,W\right) \). The estimate \( {\mathrm{Q}}_n^0\left({T}_i,{W}_i\right) \) and the predictions \( {\mathrm{Q}}_n^0\left(1,{W}_i\right) \) and \( {\mathrm{Q}}_n^0\left(0,{W}_i\right) \) were estimated with Super Learner. Super Learner is an ensemble learner of a pre-specified library of algorithms with parameters. It uses cross-validation to adaptively create an optimally weighted combination of estimates from candidate algorithms [22]. Optimality was defined based on each ensemble learner fit using 10-fold cross-validation, thereby reducing the chance of overfitting.
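The cross-validation idea behind Super Learner can be illustrated with a small Python sketch. It is a deliberate simplification: it implements a "discrete" variant that selects the single candidate with the lowest cross-validated log-loss, rather than the optimally weighted combination described above. The two candidate learners, the toy records and all function names are hypothetical and do not come from the R packages used in the analysis.

```python
import math

def simple_mean(train):
    """Candidate 1: predict the overall outcome prevalence for everyone."""
    p = sum(y for _, y in train) / len(train)
    return lambda t: p

def group_mean(train):
    """Candidate 2: predict the outcome prevalence within each exposure group."""
    overall = sum(y for _, y in train) / len(train)
    means = {}
    for g in (0, 1):
        ys = [y for t, y in train if t == g]
        means[g] = sum(ys) / len(ys) if ys else overall  # fall back if a group is empty
    return lambda t: means[t]

def cv_log_loss(data, learner, k=10):
    """Average negative log-likelihood over k validation folds.

    Folds are assigned by record index to keep the sketch deterministic;
    a real analysis would shuffle or stratify the folds.
    """
    total = 0.0
    for fold in range(k):
        train = [d for i, d in enumerate(data) if i % k != fold]
        valid = [d for i, d in enumerate(data) if i % k == fold]
        predict = learner(train)
        for t, y in valid:
            p = min(max(predict(t), 1e-6), 1 - 1e-6)  # clip to avoid log(0)
            total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)

def discrete_super_learner(data, library, k=10):
    """Return the name of the candidate with the smallest cross-validated risk."""
    risks = {name: cv_log_loss(data, fit, k) for name, fit in library.items()}
    return min(risks, key=risks.get), risks

# Hypothetical (exposure, outcome) records: the outcome is rarer among the exposed.
data = [(1, 1)] * 2 + [(1, 0)] * 18 + [(0, 1)] * 14 + [(0, 0)] * 6
library = {"simple_mean": simple_mean, "group_mean": group_mean}
best, risks = discrete_super_learner(data, library)
```

Because the toy outcome depends on the exposure, the group-specific candidate attains the lower cross-validated risk and is selected. A full Super Learner would instead estimate weights for all candidates from the same cross-validated predictions.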
These estimates \( {\mathrm{Q}}_n^0\left({T}_i,{W}_i\right) \), \( {\mathrm{Q}}_n^0\left(1,{W}_i\right) \) and \( {\mathrm{Q}}_n^0\left(0,{W}_i\right) \) form additional columns in our data matrix. We then plugged our estimates \( {\mathrm{Q}}_n^0\left(1,{W}_i\right) \) and \( {\mathrm{Q}}_n^0\left(0,{W}_i\right) \), averaged over the n participants, into our substitution estimator of the parameter of interest, the prevalence (or risk) ratio, to obtain an untargeted estimate:
$$ {\psi}_{MLE,n}=\frac{\frac{1}{n}\sum_{i=1}^n{\mathrm{Q}}_n^0\left(1,{W}_i\right)}{\frac{1}{n}\sum_{i=1}^n{\mathrm{Q}}_n^0\left(0,{W}_i\right)} $$
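As a minimal illustration of this plug-in step, the Python sketch below takes the two columns of predicted outcome probabilities and returns the ratio of their sample means. The function name and the four predicted values are hypothetical and are used only to show the arithmetic.

```python
from statistics import fmean

def plug_in_prevalence_ratio(q1, q0):
    """Ratio of the mean predicted outcome under T = 1 to that under T = 0."""
    if len(q1) != len(q0) or not q1:
        raise ValueError("q1 and q0 must be non-empty and of equal length")
    return fmean(q1) / fmean(q0)

# Hypothetical predictions for four participants
q1_hat = [0.10, 0.20, 0.30, 0.20]  # initial predictions with T set to 1
q0_hat = [0.20, 0.40, 0.50, 0.50]  # initial predictions with T set to 0
psi_mle = plug_in_prevalence_ratio(q1_hat, q0_hat)
```

Here the mean predicted prevalence is 0.2 under exposure and 0.4 without it, so the untargeted prevalence ratio is 0.5.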
We next estimated the conditional distribution of MMC given covariates W, \( {g}_0=P\left(T\mid W\right) \), with Super Learner, using the same set of algorithms. The predictions \( {g}_n\left(1\mid {W}_i\right) \) and \( {g}_n\left(0\mid {W}_i\right) \) were added to our data matrix. Initial estimates of \( {\mathrm{Q}}_0\left(T,W\right) \) were then updated along a path defined by a fluctuation parameter, incorporating additional information from the propensity score function to reduce residual confounding in the estimate of \( {\mathrm{Q}}_0\left(T,W\right) \). This updating involves two steps. Firstly, \( {g}_n \) was used in a clever covariate \( {H}_n^{\ast}\left(T,W\right) \) to define a parametric working model to fluctuate the initial estimate of \( {\mathrm{Q}}_0\left(T,W\right) \):
$$ {H}_n^{\ast}\left(\mathrm{T},\mathrm{W}\right)=\left(\frac{I\left(T=1\right)}{g_n\left(1|W\right)}-\frac{I\left(T=0\right)}{g_n\left(0|W\right)}\right) $$
For each individual, setting \( T=1 \) and \( T=0 \) in turn, the clever covariates are calculated as \( {H}_n^{\ast}\left(1,{W}_i\right)=\frac{1}{g_n\left(1\mid {W}_i\right)} \) and \( {H}_n^{\ast}\left(0,{W}_i\right)=\frac{-1}{g_n\left(0\mid {W}_i\right)} \), respectively. These two columns are added to the data matrix and then combined to form a column \( {H}_n^{\ast}\left({T}_i,{W}_i\right) \), which takes the value \( {H}_n^{\ast}\left(1,{W}_i\right) \) for individuals with \( {T}_i=1 \) and \( {H}_n^{\ast}\left(0,{W}_i\right) \) for individuals with \( {T}_i=0 \).
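The three clever-covariate columns can be sketched in a few lines of Python. The function name and the example propensity scores are hypothetical; the input `g1` stands for the estimated probability of MMC given covariates for each participant, so that the probability of no MMC is `1 - g1`.

```python
def clever_covariates(t, g1):
    """Return the columns H(1, W_i), H(0, W_i) and H(T_i, W_i).

    t  : list of 0/1 exposure indicators
    g1 : list of estimated P(T = 1 | W_i), each strictly between 0 and 1
    """
    if any(not 0.0 < g < 1.0 for g in g1):
        raise ValueError("propensity scores must lie strictly between 0 and 1")
    h1 = [1.0 / g for g in g1]            # covariate with T set to 1
    h0 = [-1.0 / (1.0 - g) for g in g1]   # covariate with T set to 0
    h_obs = [a if ti == 1 else b for ti, a, b in zip(t, h1, h0)]
    return h1, h0, h_obs

# Hypothetical example: one exposed and one unexposed participant
h1, h0, h_obs = clever_covariates([1, 0], [0.25, 0.5])
```

For the exposed participant the observed covariate is 1/0.25 = 4, and for the unexposed participant it is -1/0.5 = -2. Propensity scores close to 0 or 1 produce very large covariate values, which is why implementations commonly bound them.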
In the second and final step, we estimated the fluctuation parameter \( {\varepsilon}_n \) by fitting an intercept-free logistic regression of Y on \( {H}_n^{\ast}\left(T,W\right) \) with the logit of \( {\mathrm{Q}}_n^0\left(T,W\right) \) as an offset (fixed quantity), where \( {\varepsilon}_n \) is the resulting coefficient of the clever covariate \( {H}_n^{\ast}\left(T,W\right) \). We next updated the estimate \( {\mathrm{Q}}_n^0 \) into a new estimate \( {\mathrm{Q}}_n^1 \) of \( {\mathrm{Q}}_0\left(T,W\right) \):
$$ \mathrm{Logit}\ {\mathrm{Q}}_n^1\left(\mathrm{T},\mathrm{W}\right)=\mathrm{Logit}\ {\mathrm{Q}}_n^0\left(\mathrm{T},\mathrm{W}\right)+{\varepsilon}_n\kern0.5em {\mathrm{H}}_n^{\ast}\left(\mathrm{T},\mathrm{W}\right) $$
We calculated
$$ \mathrm{Logit}\ {\mathrm{Q}}_n^1\left(1,\mathrm{W}\right)=\mathrm{Logit}\ {\mathrm{Q}}_n^0\left(1,\mathrm{W}\right)+{\varepsilon}_n\kern0.5em {\mathrm{H}}_n^{\ast}\left(1,\mathrm{W}\right) $$
for all individuals, and then
$$ \mathrm{Logit}\ {\mathrm{Q}}_n^1\left(0,\mathrm{W}\right)=\mathrm{Logit}\ {\mathrm{Q}}_n^0\left(0,\mathrm{W}\right)+{\varepsilon}_n\kern0.5em {\mathrm{H}}_n^{\ast}\left(0,\mathrm{W}\right) $$
for all individuals, and added the columns \( {\mathrm{Q}}_n^1\left(1,{W}_i\right) \) and \( {\mathrm{Q}}_n^1\left(0,{W}_i\right) \) to our data matrix.
The updated estimates \( {\mathrm{Q}}_n^1\left(1,W\right) \) and \( {\mathrm{Q}}_n^1\left(0,W\right) \) were then averaged over the n participants and used to compute the targeted estimator:
$$ {\psi}_{TMLE,n}=\frac{\frac{1}{n}\sum_{i=1}^n{\mathrm{Q}}_n^1\left(1,{W}_i\right)}{\frac{1}{n}\sum_{i=1}^n{\mathrm{Q}}_n^1\left(0,{W}_i\right)} $$
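The targeting step as a whole can be sketched in Python under simplifying assumptions. The sketch takes the initial predictions and propensity scores as given (in the analysis these come from Super Learner), estimates the fluctuation parameter with a hand-written Newton-Raphson routine standing in for the intercept-free logistic regression with offset, and then forms the ratio of the mean updated predictions. All names and data are hypothetical, and this is not the code of the tmle package.

```python
import math
from statistics import fmean

def logit(p):
    return math.log(p / (1.0 - p))

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def fit_epsilon(y, q_obs, h_obs, max_iter=100, tol=1e-12):
    """Coefficient of H in a logistic regression of Y on H with no
    intercept and logit(Q) as a fixed offset, found by Newton-Raphson."""
    eps = 0.0
    for _ in range(max_iter):
        p = [expit(logit(q) + eps * h) for q, h in zip(q_obs, h_obs)]
        score = sum(h * (yi - pi) for h, yi, pi in zip(h_obs, y, p))
        info = sum(h * h * pi * (1.0 - pi) for h, pi in zip(h_obs, p))
        step = score / info
        eps += step
        if abs(step) < tol:
            break
    return eps

def tmle_prevalence_ratio(y, t, q1, q0, g1):
    """Update the initial predictions q1, q0 and return the targeted ratio."""
    h1 = [1.0 / g for g in g1]            # clever covariate with T set to 1
    h0 = [-1.0 / (1.0 - g) for g in g1]   # clever covariate with T set to 0
    q_obs = [a if ti == 1 else b for ti, a, b in zip(t, q1, q0)]
    h_obs = [a if ti == 1 else b for ti, a, b in zip(t, h1, h0)]
    eps = fit_epsilon(y, q_obs, h_obs)
    q1_star = [expit(logit(q) + eps * h) for q, h in zip(q1, h1)]
    q0_star = [expit(logit(q) + eps * h) for q, h in zip(q0, h0)]
    return eps, q1_star, q0_star, fmean(q1_star) / fmean(q0_star)

# Hypothetical data: 4 exposed (1 case) and 4 unexposed (2 cases).
t = [1, 1, 1, 1, 0, 0, 0, 0]
y = [1, 0, 0, 0, 1, 1, 0, 0]
q1 = [0.375] * 8   # a poor initial fit that ignores the exposure
q0 = [0.375] * 8
g1 = [0.5] * 8     # constant propensity score
eps, q1_star, q0_star, psi = tmle_prevalence_ratio(y, t, q1, q0, g1)
```

In this toy example the initial fit predicts the same prevalence for both groups, so the untargeted ratio is 1. The fluctuation gives a negative coefficient and moves the targeted ratio below 1, in the direction of the observed data. When the initial predictions already match the observed group prevalences, the estimated coefficient is zero and the update leaves them unchanged.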
Our Super Learner library algorithms included generalized linear model (GLM), least absolute shrinkage and selection operator (LASSO) regularized GLM, generalized additive models, random forests, neural networks, k–nearest-neighbours, and the simple mean.
Full matching on the propensity score
Propensity score full matching (PSFM) is a synthesis of stratification on the propensity score (strata of the two exposure groups) and optimal pair-matching, which forms pairs of subjects from each of the two exposure groups such that the average within-stratum difference in the propensity score is minimized [23]. The stratification imposed by PSFM ensures that the following weighting system for estimating average treatment effects can be applied in each stratum for a subject i [24]:
$$ {\omega}_i={T}_i\ P\left({T}_i=1\right)\frac{\left(t+u\right)}{t}+\left(1-{T}_i\right)\left(1-P\left({T}_i=1\right)\right)\frac{\left(t+u\right)}{u} $$
where t and u denote the number of exposed and unexposed subjects in a given stratum, and \( P\left({T}_i=1\right) \) is the marginal probability of treatment in the overall sample. Each subject is thus weighted by the inverse of the within-stratum proportion of subjects sharing their exposure status, scaled by the marginal probability of that exposure status.
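A short Python sketch shows how stratum-based weights of this kind can be computed once the matched strata are known. It assumes the usual average-treatment-effect form, in which an exposed subject receives the marginal probability of exposure times the stratum size divided by the number of exposed subjects in the stratum, and an unexposed subject receives the analogous quantity for the unexposed. The function name, the stratum labels and the data are hypothetical, and forming the strata themselves (the matching step) is not shown.

```python
def full_matching_weights(t, stratum):
    """Stratum-based weights for the average treatment effect.

    t       : list of 0/1 exposure indicators
    stratum : list of stratum labels from a full matching (assumed given)
    Every stratum is assumed to contain at least one exposed and one
    unexposed subject, as full matching guarantees.
    """
    p_treat = sum(t) / len(t)  # marginal probability of exposure
    counts = {}
    for ti, s in zip(t, stratum):
        n_t, n_u = counts.get(s, (0, 0))
        counts[s] = (n_t + ti, n_u + (1 - ti))
    weights = []
    for ti, s in zip(t, stratum):
        n_t, n_u = counts[s]
        size = n_t + n_u
        if ti == 1:
            weights.append(p_treat * size / n_t)
        else:
            weights.append((1.0 - p_treat) * size / n_u)
    return weights

# Hypothetical strata: "a" has 1 exposed and 2 unexposed, "b" has 2 and 1.
t = [1, 0, 0, 1, 1, 0]
stratum = ["a", "a", "a", "b", "b", "b"]
w = full_matching_weights(t, stratum)
```

In stratum "a" the single exposed subject stands in for the whole stratum and receives weight 0.5 × 3/1 = 1.5, while each of the two unexposed subjects receives 0.5 × 3/2 = 0.75. The weights within each exposure group sum to that group's size.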
Inverse probability of treatment weighting
We defined the propensity score as the conditional probability that a participant was exposed (or treated), given the covariates, \( e=P\left(T=1\mid W\right) \), estimated using logistic regression. For IPTW, each weight is the inverse of the probability of receiving the treatment that the subject actually received. To obtain the IPT weights for estimating the average treatment effects, exposed (or treated) subjects are assigned a weight equal to the reciprocal of the propensity score, while the unexposed (or control) subjects are assigned a weight equal to the reciprocal of one minus the propensity score; each weight is then stabilized by the marginal probability of the exposure received [25]:
$$ {\omega}_i=\frac{{T}_i\ P\left({T}_i=1\right)}{e_i}+\frac{\left(1-{T}_i\right)\ P\left({T}_i=0\right)}{1-{e}_i} $$
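The stabilized weights can be computed directly from the exposure indicator and the estimated propensity score, as in the following Python sketch. The function name and the example values are hypothetical, and the propensity scores are taken as given rather than estimated.

```python
def stabilized_iptw(t, e):
    """Stabilized inverse probability of treatment weights.

    t : list of 0/1 exposure indicators
    e : list of estimated propensity scores P(T = 1 | W_i)
    """
    if any(not 0.0 < ei < 1.0 for ei in e):
        raise ValueError("propensity scores must lie strictly between 0 and 1")
    p_treat = sum(t) / len(t)  # marginal probability of exposure
    return [
        p_treat / ei if ti == 1 else (1.0 - p_treat) / (1.0 - ei)
        for ti, ei in zip(t, e)
    ]

# Hypothetical example with two exposed and two unexposed participants
t = [1, 1, 0, 0]
e = [0.8, 0.4, 0.5, 0.2]
w = stabilized_iptw(t, e)
```

With a marginal exposure probability of 0.5, the exposed participant with propensity score 0.4 receives weight 0.5/0.4 = 1.25, and the unexposed participant with propensity score 0.2 receives 0.5/0.8 = 0.625.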
For each of the two propensity score methods, these induced weights are then incorporated in a weighted univariate log-binomial regression model, which involves regressing the STI outcome on the MMC status to estimate the prevalence ratios. As suggested by Dugoff and colleagues [26], we also included the survey weight as an additional covariate in the propensity score model.
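When the only regressor is a binary exposure, the weighted log-binomial model is saturated, so its exponentiated slope equals the ratio of the weighted outcome prevalences in the two exposure groups. The Python sketch below uses that identity to obtain the point estimate without fitting a regression. It is an illustration only: the function name and data are hypothetical, and it does not produce the standard errors that a survey-weighted model fit would.

```python
def weighted_prevalence_ratio(y, t, w):
    """Ratio of weighted outcome prevalence, exposed versus unexposed."""
    def weighted_prevalence(group):
        num = sum(wi * yi for yi, ti, wi in zip(y, t, w) if ti == group)
        den = sum(wi for ti, wi in zip(t, w) if ti == group)
        return num / den
    return weighted_prevalence(1) / weighted_prevalence(0)

# Hypothetical outcomes, exposures and weights for six participants
y = [1, 0, 1, 1, 0, 0]
t = [1, 1, 1, 0, 0, 0]
w = [1.0, 1.0, 2.0, 1.0, 1.0, 2.0]
pr = weighted_prevalence_ratio(y, t, w)
```

In this toy example the weighted prevalence is 3/4 among the exposed and 1/4 among the unexposed, giving a prevalence ratio of 3.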
Analyses accounted for the survey design by incorporating the survey weights in the final estimation. Only a few variables had missing values; these are shown in Table 1. In the multivariable analyses, ‘missing’ was made a separate category for the variable capturing the number of partners; for education level, missing values (n = 1) were excluded. TMLE was implemented using the tmle package [27] in R version 4.0.0. PSFM and IPTW were implemented using the R packages MatchIt [28] and WeightIt [29], respectively.
Table 1
Descriptive characteristics of the male participants (n = 2850) in the HIPSS study according to their MMC status
Age (years) | 26.2 ± 7.5 | 31.8 ± 8.7 |
Marital status |
currently married | 52 (6.3%) | 187 (11.1%) |
widowed/divorced/separated | 26 (3.2%) | 108 (4.1%) |
Single | 752 (90.4%) | 1725 (84.8%) |
Education (missing = 1) |
primary school/incomplete high school | 292 (37.3%) | 1074 (56.2%) |
completed high school | 428 (50.9%) | 783 (36.9%) |
has a degree/diploma | 91 (10.5%) | 91 (4.1%) |
No education | 19 (1.3%) | 71 (2.7%) |
Condom used with partner |
always used condom | 242 (31.2%) | 402 (21.1%) |
sometimes used condom | 420 (47.4%) | 1134 (52.4%) |
never used condom | 168 (21.5%) | 484 (26.5%) |
Number of lifetime sex partners |
1 | 141 (5.4%) | 278 (9.9%) |
2 | 126 (4.1%) | 214 (8.1%) |
3–5 | 263 (34.2%) | 592 (30.4%) |
6+ | 204 (25.3%) | 552 (30.9%) |
Refused to respond | 96 (8.0%) | 384 (13.2%) |
Total monthly household income |
No income | 105 (10.9%) | 322 (12.9%) |
≤ R2500 | 358 (40.8%) | 1044 (48.2%) |
R2500 - R6000 | 205 (27.6%) | 373 (23.7%) |
> R6000 | 93 (13.9%) | 114 (7.9%) |
No response | 69 (6.7%) | 167 (7.3%) |
Household receives social support grant | 389 (60.6%) | 801 (55.7%) |
Had sex in the last 12 months | 703 (24.9%) | 1650 (59.2%) |
Drinks alcohol | 378 (46.0%) | 969 (48.0%) |
Neisseria gonorrhoeae (+ve) | 13 (1.5%) | 46 (2.0%) |
Chlamydia trachomatis (+ve) | 70 (8.8%) | 99 (4.2%) |
Syphilis (+ve) | 17 (1.9%) | 66 (3.2%) |
Mycoplasma genitalium (+ve) | 32 (4.0%) | 154 (7.6%) |
Trichomonas vaginalis (+ve) | 24 (2.7%) | 121 (5.4%) |
Hepatitis B (+ve) | 22 (2.5%) | 127 (6.3%) |
HSV-2 (+ve) | 324 (36.4%) | 1205 (59.8%) |