Background
Depression is a major health problem affecting pregnant women in low resource settings [1, 2], where the reported prevalence of antenatal depression is high (10.7 to 47%) [1–4]. Antenatal depression can lead to poor uptake of antenatal care and adverse birth outcomes [3], and it is a risk factor for postnatal depression [5]. Routine screening for antenatal depression is essential for early identification of pregnant women with depressive symptoms [6], and routine antenatal contacts with health providers are opportune times for assessing, preventing and treating depression during pregnancy [7].
There are, however, challenges in these settings. Many women may be ashamed to speak about depression because of a cultural expectation that pregnancy is a happy time. In addition, these settings are understaffed, lack consultation rooms and have heavy workloads, with high ratios of pregnant women to midwives. Midwives commonly have limited consultation time to explore depressive symptoms or risk factors and often lack guidelines or tools for assessing the psychosocial status of pregnant women [8]. In such settings, screening instruments for the early detection of depression must correctly identify both individuals who are cases and those who are not [9]. Suitable instruments must therefore demonstrate both high sensitivity and high specificity [9].
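As an illustration of these two properties, the following minimal sketch computes sensitivity and specificity from the four cells of a 2x2 table that compares a screening instrument with a reference standard. The function name and the counts are hypothetical and are not taken from any study in this review.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from 2x2 counts against a reference standard.

    tp: cases the instrument flags        fn: cases the instrument misses
    tn: non-cases the instrument clears   fp: non-cases the instrument flags
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one non-case")
    sensitivity = tp / (tp + fn)  # proportion of true cases correctly identified
    specificity = tn / (tn + fp)  # proportion of non-cases correctly identified
    return sensitivity, specificity


# Hypothetical counts, for illustration only (not data from an included study)
se, sp = sensitivity_specificity(tp=40, fn=10, tn=160, fp=40)
print(f"Se = {se:.2f}, Sp = {sp:.2f}")
```

An instrument with high sensitivity but low specificity would refer many non-depressed women for further assessment, which matters where staff time is limited.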
Many validation studies of depression screening tools have been conducted in high income countries (HICs), whose cultures and socio-economic contexts differ from those of low resource settings. Because the performance of screening tools may vary across populations and settings [10], a systematic review of instruments for depression screening in antenatal care in low resource settings was conducted, with the aim of identifying a tool suitable for recommendation for use in antenatal services in these settings.
Methods
The Standards for the Reporting of Diagnostic Accuracy Studies (STARD) guidelines were used to conduct the review [10].
Search process
A limited search of the Cumulative Index of Nursing and Allied Health Literature (CINAHL) and MEDLINE was undertaken to identify relevant keywords contained in the title, abstract, and subject descriptors. Search terms and synonyms were then identified for use in searching different databases for screening studies conducted in antenatal clinics in low resource settings. Low resource settings refer to settings where health care systems do not meet the minimum standards set by the World Health Organisation (WHO) or any other quasi-governmental organisation [11]. In this review, low resource settings were defined as health care settings synonymous with those found in low income and lower middle income countries as defined by the World Bank [12], and some health care settings in upper middle income countries (UMICs), such as South Africa, where disparities in public health infrastructure, supplies or human resources [13] are found. Because some articles from low resource settings are not indexed to indicate that they report on health outcomes or disparities for under-served populations in low resource settings [14], the term ‘low resource settings’ was not included in the search terms but was applied manually at the article review stage. Date limits were set from 2000 to 2015, in anticipation that searching a wide period would yield many relevant studies with recent evidence. Detailed search terms are supplied in Table 1.
Table 1. Search terms by database
Database | Search terms |
ScienceDirect | ALL (“screening instruments” OR “screening tools” OR “screening scale”) and ALL (depression AND antenatal). |
ScienceDirect | ALL (“screening instruments” OR “screening tools” OR “screening scale”) and ALL (depression AND pregnancy OR prenatal) AND LIMIT-TO (topics, “woman, patient, depression, depression scale, pregnancy, mental health, depressive symptom, health care, maternal, adolescent, health”). |
ScienceDirect | ALL (EPDS or CESD-10 or HSCL or K-6 or K-10 or SRQ or PHQ or GHQ) and ALL (depression AND antenatal) AND LIMIT-TO(topics, “woman, pregnancy, obstet gynecol, depression scale, depression, health, patient, maternal, depressive symptom, mental health”). |
ScienceDirect | ALL (“screening instruments” OR “screening tools” OR “screening scale”) and ALL (depression or “depressive disorder” AND antenatal or prenatal) |
CINAHL | TI screening AND TI depression AND TI pregnancy |
CINAHL | screening AND depression AND pregnancy AND LIMIT-TO (research article) |
CINAHL | screening tools AND depression AND antenatal |
CINAHL | epds validity AND depression AND antenatal |
CINAHL | TI Edinburgh postnatal depression scale OR TI Hopkins symptom checklist OR TI self-report questionnaire OR TI center for epidemiological studies depression scale OR TI patient health questionnaire OR TI general health questionnaire OR TI beck depression inventory OR TI whooley questions AND TI antenatal AND LIMIT-TO (research article) |
MEDLINE | TX depression AND TX screening tools AND pregnant women |
MEDLINE | TI screening test AND TI antenatal depression |
MEDLINE | TX depression AND TX screening AND TX pregnant women |
MEDLINE | TI prenatal depression AND TI screening |
PubMed | ((((“screening instruments”) OR “screening tools”) OR “screening scales”) AND depression) AND antenatal |
PubMed | ((screening[Title]) AND depression[Title]) AND antenatal[Title] |
PubMed | (((screening[Title]) AND depression[Title]) AND pregnancy[Title]) |
SABINET | (alltext:(depression AND screening)^20 AND alltext:(antenatal)^20) |
SABINET | (alltext:(depressive AND disorder AND screening)^20 AND alltext:(pregnant AND women)^20) |
PsycARTICLES | depression AND screening AND pregnancy |
The following databases were searched: ScienceDirect, CINAHL, MEDLINE, PubMed, SABINET and PsycARTICLES. Results were imported into EndNote. Reference lists of key articles were hand searched to identify further relevant articles. Manual searches of indexes and “grey” literature databases were not carried out. The preliminary searches were conducted between August and September 2015, and the final search was done on 4 September 2015.
Review process, selection and data extraction
After the initial search, duplicates and irrelevant records (conference and congress items, editorials, commentaries, reviews, news items and old articles) were removed from the EndNote database and the search data were exported to Excel. Articles for review were then selected in three phases.
Abstract and title screening
In this phase, the reviewers scanned the identified titles and abstracts independently and indicated in the Excel database which articles were relevant. Where the abstract did not provide enough information or the reviewers were unsure, the full text articles were reviewed and agreement reached between the reviewers on the inclusion or exclusion of the article. A kappa statistic was calculated to assess the level of agreement for eligibility for inclusion at this stage.
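The text does not state which kappa statistic was used. As a minimal sketch, assuming Cohen's kappa for two reviewers making include/exclude decisions, agreement beyond chance could be computed as follows. The function name and the decisions shown are hypothetical.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical decisions on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items on which the two raters agree
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions, summed over categories
    categories = set(rater_a) | set(rater_b)
    expected = sum((rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical screening decisions for ten abstracts (illustration only)
reviewer_1 = ["include", "include", "exclude", "exclude", "include",
              "exclude", "exclude", "exclude", "include", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude", "include",
              "exclude", "include", "exclude", "include", "exclude"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.2f}")
```

Kappa corrects raw percentage agreement for the agreement expected by chance, which is why it is preferred over simple agreement when most abstracts are excluded.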
Screening based on PICOS criteria
The second phase of selection consisted of a review of articles by applying and extracting the PICOS criteria: Participants (P) (pregnant women at any stage of pregnancy attending antenatal care), Index test (I) (screening instrument), Comparator test (C) (gold standard: psychiatric assessment), Outcome measures (O) (psychometric properties of the screening instrument) and Study setting (S) (low resource settings). In this phase, articles from HICs were excluded. Full text articles from UMICs were reviewed and included if the study was conducted in a public health setting located in a low resource setting and the disparities in public health infrastructure, supplies or human resources in the services were adequately described.
Article review
In the third phase, the full texts of the articles were reviewed for the reported validity of one or a combination of depression screening instruments (sensitivity, specificity, area under the curve [AUC]) and for whether a gold standard was used. The articles were independently examined by the reviewers to confirm inclusion. The gold standard was defined as a formal diagnostic psychiatric assessment of depression, the most accurate test for detecting the presence or absence of depression [15]. Psychiatric diagnostic assessment of depression included the use of the Structured Clinical Interview for DSM-IV (SCID), the Mini-International Neuropsychiatric Interview (MINI), the Composite International Diagnostic Interview (CIDI), the International Classification of Diseases version 10 (ICD-10) or the Diagnostic and Statistical Manual of Mental Disorders version 4 (DSM-IV) by a psychiatrist to assign a diagnosis. The MINI and SCID are compatible with DSM-IV and have sensitivity and specificity above the minimum acceptable level (.8/.8) for structured interviews used as gold standards [16]. Instruments routinely used for depression screening, such as the Edinburgh Postnatal Depression Scale (EPDS), and other nonconventional psychiatric assessment instruments were not considered gold standards.
Eligibility for full article review, assessment of study characteristics, and relevant data extraction was conducted using a review tool in Excel that included the PICOS criteria and the confirmation of the presence of psychometrics and a gold standard. For each eligible study the reviewers extracted information concerning: author, country of study, sample, gold standard, screening instrument, Area under the Curve (AUC), sensitivity (Se) and specificity (Sp). All results were subject to double data entry.
Assessment of methodological rigour
The Quality Assessment of Diagnostic Accuracy Studies (QUADAS) tool [17] was used by both reviewers to assess the psychometric quality of the final selected articles. The QUADAS has 14 items, each with three possible responses: ‘Yes’, ‘No’ and ‘Unclear’. In the QUADAS, the target condition was depression during pregnancy, the index test was a screening instrument used to screen for depression, and the reference standard was the gold standard against which the index test was validated. The QUADAS items measure the variability of study samples (items 1–2), methodological rigour and bias (items 3–7, 10–12 and 14), and the quality of reporting methodology (items 8, 9 and 13). The scoring of QUADAS is not standardised [18], but in this review studies were categorised as ‘excellent’ (11 to 14 items), ‘good’ (9 to 10 items), ‘adequate’ (6 to 8 items), ‘poor’ (4 to 5 items) or ‘unacceptable’ (0 to 3 items) based on the number of items answered ‘Yes’ [17].
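The categorisation rule described above can be sketched as a small function that counts ‘Yes’ responses across the 14 items and maps the count to the bands used in this review. The function name and the example responses are hypothetical.

```python
def quadas_category(responses):
    """Map 14 QUADAS responses ('Yes', 'No', 'Unclear') to the review's quality band."""
    allowed = {"Yes", "No", "Unclear"}
    if len(responses) != 14 or any(r not in allowed for r in responses):
        raise ValueError("expected 14 responses, each 'Yes', 'No' or 'Unclear'")
    yes_count = sum(r == "Yes" for r in responses)  # only 'Yes' answers count
    if yes_count >= 11:
        return "excellent"      # 11 to 14 items
    if yes_count >= 9:
        return "good"           # 9 to 10 items
    if yes_count >= 6:
        return "adequate"       # 6 to 8 items
    if yes_count >= 4:
        return "poor"           # 4 to 5 items
    return "unacceptable"       # 0 to 3 items


# Hypothetical appraisal of one study (illustration only)
example = ["Yes"] * 10 + ["No"] * 2 + ["Unclear"] * 2
print(quadas_category(example))
```

Note that ‘Unclear’ and ‘No’ are treated identically under this rule, so incomplete reporting lowers a study's category in the same way as a methodological flaw.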
Analysis
Psychometric data for the screening instruments were extracted and presented descriptively so that they could be compared in a between-study literature analysis [19]. A meta-analysis was conducted in RevMan by pooling sensitivity and specificity data, both for individual instruments and for all instruments combined, to show the pooled ability of the screening instruments to identify depression. Confidence intervals (95%) for the sensitivity and specificity of the screening instruments were calculated.
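The text does not specify how the confidence intervals or pooled estimates were computed. The sketch below is therefore an assumption rather than a description of the software's method: it uses the Wilson score interval for a proportion and pools studies by simply summing their 2x2 counts. All function names and counts are hypothetical.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95% coverage)."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need 0 <= successes <= total and total > 0")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half_width, centre + half_width


def pooled_sensitivity(studies):
    """Pool sensitivity by summing true positives and false negatives across studies.

    This simple count pooling is an illustrative assumption; it ignores
    between-study heterogeneity.
    """
    tp = sum(s["tp"] for s in studies)
    fn = sum(s["fn"] for s in studies)
    return tp / (tp + fn), wilson_interval(tp, tp + fn)


# Hypothetical counts for two studies of one instrument (illustration only)
studies = [{"tp": 40, "fn": 10}, {"tp": 30, "fn": 20}]
estimate, (lower, upper) = pooled_sensitivity(studies)
print(f"pooled Se = {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Pooled specificity follows the same pattern, with true negatives and false positives in place of true positives and false negatives.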
Discussion
An instrument considered for routine screening should be inexpensive and easy to administer, cause minimal discomfort, and have high reliability and validity in distinguishing between cases and non-cases of a condition [51]. In this review, screening instruments with a pooled sensitivity/specificity balance >85% were considered ideal for distinguishing between depressed and non-depressed women. The EPDS met the criteria for both brevity and validity in this review, in line with two earlier systematic reviews [21, 24], which found high sensitivity, high specificity and the highest level of accuracy (AUC = .965). Although the K-10 had the best pooled sensitivity (Se = .91), the EPDS had the best pooled specificity (Sp = .81). The BDI had a good sensitivity/specificity balance (Se = .85 and Sp = .76), but the EPDS had a better balance, with higher specificity (important in screening out non-cases) and adequate sensitivity (Se = .80).
A second finding of this review is that local language versions of seven depression screening instruments (BDI, CES-D-20, EPDS, HAM-D, HSCL-25, K-10 and SRQ) had acceptable sensitivity or specificity and level of accuracy in antenatal clinics in low resource settings. However, none of these instruments was specifically designed to measure antenatal depression in low resource settings, and their sensitivity and specificity varied across studies. The included studies differed significantly in methodology, population sampled, gestation period, type of instrument used and gold standard, which indicated clinical heterogeneity among them. Nevertheless, forest plots showed that distinct subgroups of studies that used similar participants and instruments were homogeneous. It should be borne in mind, however, that this method of identifying heterogeneity has limited power to detect bias when studies are few [52].
It is documented that HIV prevalence in a population may influence the prevalence and severity of depression [3]. However, in this review, the instruments with the highest sensitivity (EPDS and K-10; Se = 1.0) were validated in a general population of pregnant women, while the lowest sensitivity of the EPDS (Se = .69) was found both in a general population of pregnant women and in a sample comprising HIV positive and HIV negative pregnant women. The pooled sensitivity of the EPDS for a subgroup of adult, non-HIV positive pregnant women (Se = .80) was higher than that for HIV positive women (Se = .78). Nonetheless, the extent to which the HIV status of pregnant women influenced the validity of the screening instruments cannot be clearly ascertained from this review.
In Mexico, the sensitivity of the EPDS among pregnant teenagers was 0.05 lower than among pregnant adults [36, 37]. This may suggest that the population sampled influences the validity of a screening instrument. Studies have found that instruments may have different levels of sensitivity and specificity when applied to women at different stages of pregnancy. In this review, the EPDS had both its highest sensitivity (Se = 1.0) [4] and its lowest sensitivity (Se = .69) [34] among women in the third trimester, and the BDI had different sensitivity values among women in the second trimester in Brazil [1, 39]. However, because of inconsistencies in the completeness of reporting in the original studies, it was not possible in this review to establish whether screening instruments have different levels of sensitivity and specificity at different stages of pregnancy.
Lastly, while systematic reviews are widely recognised as an efficient, reliable and comprehensive source of evidence for decision-making, few systematic reviews have considered effects on health equity [14]. In light of this, the reviewers’ recommendations were focused on the appropriate end-users (antenatal services in low resource settings), and it is recognised that the findings are context-specific [14]. In this context, the EPDS emerged as the most suitable instrument for screening for antenatal depression in low resource settings where time and other resources are limited. This performance of the EPDS in low resource settings is important because it supports the existing evidence from HICs, which cannot always be applied effectively in low resource settings [53]. As such, this emic evidence will supplement the existing etic evidence to bring transformational health changes in antenatal care in low resource settings [13], which have heavy workloads, insufficient staff, poor funding and a lack of medicines and supplies [11].
Strengths and limitations
One of the key strengths of the review is that it provides specific evidence on screening tools used in antenatal services in low resource settings. It may serve as an efficient, reliable and comprehensive source of evidence for decision-makers in low resource settings [14], since most existing evidence is generated in HICs and may not be applicable in low resource settings. A limitation of this review is that the language restrictions and date limits may have caused some relevant articles to be missed.
Acknowledgements
We acknowledge all colleagues who offered guidance and technical support during development of the manuscript.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.