Study selection
Articles were screened and selected based on the following criteria: (A) Studies reported NPS prevalence (dichotomous data) and/or NPS severity (continuous data) for females and males separately. We included papers that referred to sex differences as well as papers that referred to gender differences. Furthermore, sex differences had to be reported for either overall NPS burden or specific symptoms and not for clusters of NPS, because of their limited comparability. (B) Clinical diagnosis of AD dementia based on either the Diagnostic and Statistical Manual of Mental Disorders (DSM) or International Classification of Diseases (ICD) classification systems or conventional consensus criteria [21, 22]. (C) NPS were assessed using a validated instrument such as the Neuropsychiatric Inventory (NPI) [23] or established using well-defined diagnostic criteria, e.g., depression in AD [24]. (D) Studies had to report sufficient information needed to perform a meta-analysis (e.g., means, standard deviations, frequency tables, and/or odds ratios [OR]). (E) Studies had a cross-sectional observational design. In case of longitudinal data, only baseline data were used. Articles containing small selectively sampled populations were excluded, e.g., sex- and age-matched samples. In cases in which the same cohort of patients was used in different studies, only the study with the largest N was selected.
Two independent reviewers (W.S.E., M.P.) screened titles and abstracts, and subsequently inspected full texts for eligibility. Discrepancies were discussed, and consensus was reached (with E.v.d.B.).
Data synthesis and statistical analysis
For this meta-analysis, we studied sex differences in NPS for studies reporting on NPS prevalence and NPS severity. We examined sex differences in studies that reported the prevalence of any NPS, total scores of NPS measures (e.g., NPI total score), and the prevalence and/or severity for specific NPS analogous to the twelve NPI domains: delusions, hallucinations, agitation/aggression, depressive symptoms, anxiety, euphoria, apathy, disinhibition, irritability, aberrant motor behavior, nighttime behaviors, and eating behaviors [23]. In addition, psychotic symptoms were studied separately, since studies used criteria for psychosis in AD [27], the psychosis domain score of the Behavioural Pathology in Alzheimer’s Disease (BEHAVE-AD) Scale [28], or the NPI domains of hallucinations and delusions combined [23]. Note that instruments such as the NPI assess neuropsychiatric symptoms, while diagnostic criteria such as psychosis in AD or DSM diagnosis of a major depressive episode capture neuropsychiatric syndromes. In our analyses, these assessment methods were initially combined and denoted as symptoms. Next, meta-regression analyses were used to examine the differences in the outcomes between studies that used questionnaires (symptoms) and studies that used diagnostic criteria (syndromes).
For the studies that reported on NPS prevalence, ORs were calculated from the 2 × 2 frequency tables using the following formula: \(\mathrm{OR}=\frac{\mathrm{NPS}_{\mathrm{females}}/\text{non-NPS}_{\mathrm{females}}}{\mathrm{NPS}_{\mathrm{males}}/\text{non-NPS}_{\mathrm{males}}}\). An OR = 1 indicates that there is no sex difference in NPS, whereas an OR > 1 suggests that female sex is associated with higher odds of having NPS and an OR < 1 suggests that male sex is associated with higher odds of having NPS. For the studies that reported on NPS severity, means and standard deviations were converted into Hedges’ g using the following formula: \(g=\frac{M_1-M_2}{\mathrm{SD}_{\mathrm{pooled}}}\), where \(\mathrm{SD}_{\mathrm{pooled}}\) was calculated based on the following formula: \(\mathrm{SD}_{\mathrm{pooled}}=\sqrt{\frac{\mathrm{SD}_1^2+\mathrm{SD}_2^2}{2}}\). If studies did not report the means and standard deviations, reported effect sizes were converted to Hedges’ g using conventional formulas [29]. A positive effect size indicates more severe NPS for women compared to men.
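The two effect-size formulas above can be illustrated with a minimal Python sketch. The function names and the example counts and scores are hypothetical and chosen only for illustration; the code follows the formulas exactly as written in the text and does not handle tables with zero cells.

```python
import math


def odds_ratio(nps_f, non_nps_f, nps_m, non_nps_m):
    """Odds ratio (females vs. males) from the four cells of a 2 x 2 table."""
    return (nps_f / non_nps_f) / (nps_m / non_nps_m)


def pooled_sd(sd1, sd2):
    """Pooled standard deviation as defined in the text."""
    return math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)


def standardized_mean_difference(m1, sd1, m2, sd2):
    """(M1 - M2) / SD_pooled, following the formula given in the text."""
    return (m1 - m2) / pooled_sd(sd1, sd2)


# Hypothetical counts: 30 of 100 females and 20 of 100 males with NPS.
or_example = odds_ratio(30, 70, 20, 80)

# Hypothetical severity scores: group 1 (females) vs. group 2 (males).
g_example = standardized_mean_difference(12.0, 4.0, 10.0, 4.0)

print(f"OR = {or_example:.3f}, g = {g_example:.3f}")
```

With these made-up numbers the OR is above 1 and the effect size is positive, which under the conventions above would both point toward more NPS in females.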
Heterogeneity was assessed with the I² statistic and tested using Cochran’s Q-test [30]. The I² statistic is an appraisal of the consistency of the effect sizes: > 25% suggests low, > 50% suggests moderate, and > 75% suggests high inconsistency across studies. In case of a significant Q statistic and moderate or high inconsistency across studies, we conducted outlier and influential study diagnostics. A study was considered influential if at least one of the following was true: the DFFITS value was > 3√(p/(k − p)), where p is the number of model coefficients and k is the number of studies; the lower tail area of a chi-square distribution with p degrees of freedom cut off by the Cook’s distance was > 50%; the hat value was > 3(p/k); and/or the DFBETAS value was > 1 [31]. If influential cases were identified, leave-one-out meta-analyses were conducted to examine how individual studies affected the summary statistics. Based on these analyses and visual examination of the forest plots, we excluded one study from the meta-analysis for studies reporting on the prevalence of any NPS, one study from the meta-analysis on psychotic symptoms prevalence, one study from the meta-analysis on irritability prevalence, one study from the meta-analysis on agitation prevalence, and one study from the meta-analysis on aberrant motor behavior prevalence (see Additional file 1: eTable 8). For meta-analyses on NPS severity, one study was identified as an outlier in the meta-analyses on the total scores of NPS measures, agitation, aberrant motor behavior, anxiety, apathy, delusions, depressive symptoms, disinhibition, euphoria, and hallucinations (see Additional file 1: eTable 8).
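As a rough illustration of the heterogeneity statistics and the influence cutoffs described above, the following Python sketch computes Cochran’s Q, a Q-based I², and the DFFITS and hat-value thresholds. It is a simplified sketch under stated assumptions: I² is computed here as (Q − df)/Q, whereas meta-analysis software may derive it from the estimated between-study variance; the function names and the example data are hypothetical; and the Cook’s distance criterion, which requires a chi-square distribution function, is not shown.

```python
import math


def cochran_q_i2(effects, variances):
    """Return (Q, df, I2 in percent) using inverse-variance weights."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # Q-based I2, truncated at zero (an assumption of this sketch).
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2


def influence_cutoffs(k, p=1):
    """Cutoffs from the text for k studies and p model coefficients."""
    if k <= p:
        raise ValueError("k must exceed p")
    return {
        "dffits": 3 * math.sqrt(p / (k - p)),
        "hat": 3 * p / k,
        "dfbetas": 1.0,
    }


# Hypothetical effect sizes and sampling variances for three studies.
q, df, i2 = cochran_q_i2([0.0, 1.0, 2.0], [0.25, 0.25, 0.25])
print(f"Q = {q:.2f} on {df} df, I2 = {i2:.1f}%")
print(influence_cutoffs(k=10, p=1))
```

For an intercept-only model (p = 1), the thresholds depend only on the number of studies, so they tighten as more studies enter a meta-analysis.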
The following meta-regression and subgroup analyses were selected a priori: study setting (community-based vs. clinic sample), clinical relevance (neuropsychiatric symptoms vs. a clinically relevant cutoff score or clinical criteria for NPS syndrome), method of NPS assessment (proxy vs. self-reported), NPI vs. non-NPI measures, mean age of patients, mean years of education of patients, mean Mini-Mental State Examination (MMSE) score, mean disease duration in years, percentage of APOE-ε4 carriers, and study quality (poor/fair/good). In addition, we ran subgroup analyses for studies reporting significant sex differences in age, MMSE score, proportion of APOE-ε4 carriers, and/or disease duration compared to studies that did not find sex differences in these characteristics. We tested whether the heterogeneity across studies was explained by these moderators using omnibus Wald-type tests. Meta-regression analyses included the studies that had been identified as outliers and were conducted only if a minimum of six studies was available for continuous moderators and at least four studies were available for each subgroup of categorical moderators [32].
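The minimum-study rule for moderator analyses can be expressed as a small check. This is a hypothetical helper written only to make the rule concrete; the function names, the requirement of at least two subgroups, and the example counts are assumptions of the sketch and not part of the reported analysis.

```python
def continuous_moderator_eligible(n_studies, minimum=6):
    """A continuous moderator needs at least `minimum` studies overall."""
    return n_studies >= minimum


def categorical_moderator_eligible(subgroup_counts, minimum_per_subgroup=4):
    """Every subgroup of a categorical moderator needs enough studies.

    `subgroup_counts` maps a subgroup label to its number of studies.
    Requiring at least two subgroups is an assumption of this sketch.
    """
    return len(subgroup_counts) >= 2 and all(
        n >= minimum_per_subgroup for n in subgroup_counts.values()
    )


# Hypothetical counts for illustration only.
print(continuous_moderator_eligible(7))
print(categorical_moderator_eligible({"community": 5, "clinic": 3}))
```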
Funnel plot asymmetry was evaluated as an indication of publication bias. Begg’s rank tests and Egger’s regression tests were used to test for funnel plot asymmetry. If either of these tests was indicative of funnel plot asymmetry, the trim-and-fill method was used to estimate the number of missing studies and to recompute the summary statistics based on complete data [33].
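To make the idea behind a regression test for funnel plot asymmetry concrete, the sketch below implements the classic formulation of Egger’s test: an ordinary least-squares regression of the standardized effect (effect divided by its standard error) on precision (the inverse of the standard error), where an intercept far from zero suggests asymmetry. The text does not specify which variant was used, so this formulation, the function name, and the example data are assumptions, and the result may differ from the test as run in the analysis software. No p-value is computed.

```python
import math


def egger_regression(effects, standard_errors):
    """OLS of effect/SE on 1/SE; returns (intercept, slope, se_intercept)."""
    n = len(effects)
    if n < 3:
        raise ValueError("at least three studies are needed")
    snd = [y / se for y, se in zip(effects, standard_errors)]
    precision = [1.0 / se for se in standard_errors]
    x_bar = sum(precision) / n
    y_bar = sum(snd) / n
    sxx = sum((x - x_bar) ** 2 for x in precision)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(precision, snd))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance and the standard error of the intercept.
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(precision, snd))
    se_intercept = math.sqrt(sse / (n - 2) * (1.0 / n + x_bar ** 2 / sxx))
    return intercept, slope, se_intercept


# Hypothetical data with a built-in small-study effect: effect = 0.5 + 1.0 * SE.
ses = [0.1, 0.2, 0.25, 0.5]
effects = [0.5 + se for se in ses]
intercept, slope, se_intercept = egger_regression(effects, ses)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

In this constructed example the intercept recovers the small-study effect and the slope recovers the underlying effect, which is the pattern the test is designed to detect.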
In order to aggregate studies that reported multiple independent outcomes, we used multilevel meta-analyses including a random factor for study. Multilevel meta-analyses were conducted for 18 outcomes across the 17 studies that reported the severity of depressive symptoms. Because substantial heterogeneity between studies was expected, random-effects models were applied for all analyses. All analyses were conducted using the metafor package in R v4.0 [34].