Background
In late 2013 and early 2014, Ukraine underwent a period of rapidly escalating and widespread political discontent, resulting in the Ukrainian revolution of 2014 (also known as the Euromaidan Revolution) [1] and a prolonged, ongoing conflict in the eastern regions of the country [2, 3]. Since the conflict began, more than 10,000 people have been killed and more than 25,000 people have been wounded, including many veterans of the “Anti-Terrorist Operation” (ATO), the Ukrainian government’s military effort to defeat pro-Russian separatists from eastern Ukraine [4]. The conflict has severely disrupted social and economic life in eastern Ukraine, particularly in the oblasts (i.e., administrative regions) of Donetsk and Luhansk (the Donbass). More than 1.7 million Ukrainians (4% of the total population) have been internally displaced [5]. Reviews of mental health problems among diverse conflict-affected populations, including internally displaced persons (IDPs) and military veterans, reveal that displacement and exposure to violence are consistently associated with elevated psychiatric symptomatology, especially related to depressive, anxiety, and stress-related disorders [6, 7].
In 2015, the United States Agency for International Development’s (USAID) Victims of Torture Program began supporting mental health and psychosocial research and program activities to increase access to effective and appropriate services for the conflict-affected population in Ukraine. Johns Hopkins University (JHU) and its primary implementation partner, the National University of Kyiv-Mohyla Academy (NaUKMA), were chosen to lead this effort, which includes the identification and treatment of mental health problems among IDPs, veterans, and others affected by the conflict.
To enhance local capacity to identify conflict-affected individuals with mental health problems, we aimed to develop a set of reliable and valid instruments for self-report of common mental health problems among adult Ukrainian IDPs and ATO veterans. Although some structured assessments of mental health problems have been used in Ukraine (e.g., Composite International Diagnostic Interview) [8, 9], to our knowledge there has been no local validation of self-report measures for these problems in any Ukrainian population. The validity study described herein entailed a process of instrument adaptation and testing and was part of a methodology for the design, implementation, monitoring, and evaluation (DIME) of community-based services to address mental health needs [10]. In addition to the standard DIME process, we utilized Item Response Theory (IRT) methods to shorten and refine the instruments to make them more pragmatic for use in both research and clinical practice.
Results
Participant characteristics
A summary of participants’ demographic characteristics is provided in Table 1. In total, 153 participants (109 in Zaporizhia, 44 in Kyiv) were interviewed. The sample included adult IDPs (55%) and veterans (42%). The status of five (3%) participants is unknown. There were slightly more male than female participants (54% vs. 46%). The majority were married (56%) or single (20%). Overall, the sample was highly educated; over half of participants (58%) had received at least a university degree. There were no statistically significant demographic differences between the re-interview sample and the single interview sample.
Table 1
Sample characteristics
Characteristic | n | % |
Mean age in years (SD) | 39 (11) | |
Site |
Kyiv | 44 | 29 |
Zaporizhia | 109 | 71 |
Male | 83 | 54 |
Marital status |
Single | 31 | 20 |
Married | 86 | 56 |
Widowed | 9 | 6 |
Divorced | 27 | 18 |
Education |
Primary | 4 | 3 |
High School | 18 | 12 |
Vocational | 42 | 27 |
University | 83 | 54 |
Post-university | 6 | 4 |
Status |
IDP | 84 | 55 |
Veteran | 64 | 42 |
Non-disclosed | 5 | 3 |
Participants’ reports of exposure to traumatic events are shown in Table 2. Overall, we found high levels of exposure to traumatic events in our sample. Most participants had experienced combat exposure (84%). Other common exposures were losing contact with loved ones and fearing for their safety (58%), physical assault (46%), and forced displacement (46%). Many participants also reported witnessing life-threatening illness/injury (44%).
Table 2
Frequencies and percentages of participants’ reports of lifetime exposure to potentially traumatic events
Combat or exposure to war zone | 128 | 84 | 57 | 37 | 26 | 17 |
Lost contact with loved ones and fear for their safety | 88 | 58 | 10 | 7 | 23 | 15 |
Physical assault | 71 | 46 | 30 | 20 | 21 | 14 |
Forced displacement | 71 | 46 | 38 | 25 | 29 | 19 |
Fire or explosion | 62 | 41 | 5 | 3 | 19 | 12 |
Sudden loss of possessions to the point of poverty | 61 | 40 | 4 | 3 | 13 | 9 |
Transportation accident | 53 | 35 | 35 | 23 | 9 | 6 |
Assault with a weapon | 52 | 34 | 21 | 14 | 48 | 32 |
Severe human suffering | 48 | 32 | 30 | 20 | 28 | 18 |
Life-threatening illness/injury | 43 | 28 | 67 | 44 | 31 | 20 |
Starvation or fear of starvation | 42 | 27 | 49 | 32 | 45 | 29 |
Serious accident during work/home/recreational activity | 21 | 14 | 36 | 23 | 58 | 38 |
Other unwanted/uncomfortable sexual experience | 17 | 11 | 10 | 7 | 7 | 5 |
Exposure to toxic substance | 16 | 10 | 15 | 10 | 10 | 7 |
Serious injury, harm, or death you caused to someone else | 16 | 10 | 19 | 12 | 11 | 7 |
Sexual assault | 15 | 10 | 25 | 16 | 20 | 13 |
Natural disaster | 13 | 9 | 36 | 24 | 17 | 11 |
Captivity | 3 | 2 | 28 | 18 | 10 | 7 |
Sudden, violent death | – | – | 12 | 8 | 34 | 22 |
Sudden accidental death | – | – | 68 | 44 | 17 | 11 |
Any other very stressful event/experience | 105 | 69 | 41 | 27 | 32 | 21 |
Reliability of the MHAI instrument
Table 3 presents the results for internal consistency and test-retest reliability for the full MHAI scales and the shortened scales based on the IRT analysis. All Cronbach’s alpha (α) values for the full and shortened scales were acceptable, as evidenced by values greater than α = 0.70. On the full MHAI scales, there was little or no difference between the baseline and re-interview samples for PTS (α = 0.97 vs. 0.97), anxiety (α = 0.90 vs. 0.89), depression (α = 0.94 vs. 0.93), and alcohol use (α = 0.86 vs. 0.87). The baseline sample had somewhat higher alpha values than the re-interview sample for the local functioning scale (α = 0.92 vs. 0.78) and the WHODAS scale (α = 0.95 vs. 0.80). Results were similar for long and short scales for depression (α = 0.94 vs. 0.89), PTS (α = 0.97 vs. 0.91), and anxiety (α = 0.90 vs. 0.82).
Table 3
Reliability results of mental health symptom and functioning scales
Measure | Depression | PTS | Anxiety | Alcohol use | Local functioning | WHODAS |
Internal consistency (α) |
Baseline sample (n = 153) | 0.94 | 0.97 | 0.90 | 0.86 | 0.92 | 0.95 |
Re-interview sample (n = 30) | 0.93 | 0.97 | 0.89 | 0.87 | 0.78 | 0.80 |
IRT-based analysis |
Shortened scale | 0.89 | 0.91 | 0.82 | – | – | – |
Test-retest (ρ) |
Re-interview sample | 0.84 | 0.87 | 0.80 | 0.91 | 0.85 | 0.90 |
IRT-based analysis |
Shortened scale | 0.87 | 0.87 | 0.80 | – | – | – |
Test-retest reliability scores of the full MHAI scales for post-traumatic stress (r = 0.87), depression (r = 0.84), and anxiety (r = 0.80) were good, while reliability for alcohol use (r = 0.91), as measured on the ASSIST 3.0, was excellent. Test-retest reliability for functioning was excellent for the WHODAS (r = 0.90) and moderately high for the local functioning scale (r = 0.85). Nearly identical test-retest reliabilities were produced in the short scales for depression (r = 0.84 vs. 0.87), traumatic stress (r = 0.87 vs. 0.87), and anxiety (r = 0.80 vs. 0.80).
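The two reliability statistics reported above can be illustrated with a minimal sketch. It assumes the standard formula for Cronbach's alpha and a Pearson correlation for test-retest reliability; the function names and the small score matrix are hypothetical and are not taken from the study data or the authors' analysis code.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for a (respondents x items) matrix of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items in the scale
    item_vars = items.var(axis=0, ddof=1)       # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed score
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)


def retest_correlation(first, second):
    """Pearson correlation between scale scores from two administrations."""
    return float(np.corrcoef(first, second)[0, 1])


# Hypothetical data: 5 respondents answering 4 items scored 0-3.
scores = np.array([
    [0, 1, 0, 1],
    [1, 1, 2, 1],
    [2, 2, 2, 3],
    [3, 2, 3, 3],
    [1, 0, 1, 1],
])
alpha = cronbach_alpha(scores)

# Hypothetical scale scores for the same respondents at baseline and re-interview.
baseline = scores.mean(axis=1)
reinterview = baseline + np.array([0.1, -0.1, 0.0, 0.2, -0.2])
stability = retest_correlation(baseline, reinterview)
```

Values of alpha above 0.70, the threshold used in this section, would be read as acceptable internal consistency.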
Convergent validity
For the full MHAI scales, we observed a very high correlation between depression and PTS (r = 0.94) as well as high correlations between depression and anxiety (r = 0.84) and between anxiety and PTS (r = 0.79). The correlations between alcohol use and the mental health problem scales were low (depression: r = 0.18; PTS: r = 0.25; anxiety: r = 0.11). For the short scales resulting from the IRT analysis, we also observed a very high correlation between depression and PTS (r = 0.94). The correlation was acceptable between PTS and anxiety (r = 0.70) and moderate between depression and anxiety (r = 0.67). Compared to the full scales, the correlations between alcohol use and the mental health problem scales were higher for the short scales (depression: r = 0.31; PTS: r = 0.28; anxiety: r = 0.33).
For the full MHAI functioning scales, we found moderate-to-strong correlations between the WHODAS and depression (r = 0.69), PTS (r = 0.70), and anxiety (r = 0.51). The correlation between the WHODAS and alcohol use scale was low (r = 0.18). The local functioning scale correlations with depression (r = 0.71), post-traumatic stress (r = 0.76), anxiety (r = 0.52), and alcohol use (r = 0.18) were similar to the correlations between these and the WHODAS. For the short scales resulting from the IRT analysis, we found moderate-to-high correlations between the WHODAS and depression (r = 0.66), PTS (r = 0.69), and anxiety (r = 0.51). We also found relatively strong correlations between the local functioning scale and depression (r = 0.78), PTS (r = 0.77), and anxiety (r = 0.69).
In the full MHAI, we found moderate correlations between suicidal ideation and the mental health and functioning scales (range: r = 0.28–0.62). The correlations between the independent item about difficulty doing usual activities at home/work and the mental health scales were moderate-to-high (range: r = 0.61–0.76), except for alcohol use (r = 0.08). For the short scales resulting from the IRT analysis, we found a similar pattern of correlations between suicidal ideation and the shortened mental health scales as well as between the independent item and the mental health scales, except for alcohol use, for which we noted a substantial improvement (r = 0.34).
Criterion validity
Table 4 presents the SCID diagnostic results. The majority of our sample (n = 85; 57%) met the modified SCID diagnostic criteria for at least one disorder: Major Depressive Disorder (21%), Post-traumatic Stress Disorder (47%), Generalized Anxiety Disorder (1%), Adjustment Disorder (7%), Alcohol Abuse (7%), and Alcohol Dependence (4%). In general, comorbidities were low, except for comorbidity of MDD and PTSD (14%).
Table 4
Group differences on MHAI scale scores by SCID diagnosis
SCID diagnosis | n | Mean (SD) | t (df) |
Major Depression |
Diagnosis + | 32 | 1.34 (0.08) | 5.49 (48)*** |
Diagnosis - | 118 | 0.82 (0.04) |
Post-traumatic Stress Disorder |
Diagnosis + | 70 | 1.11 (0.06) | 4.25 (143)*** |
Diagnosis - | 80 | 0.78 (0.05) |
Alcohol Abuse |
Diagnosis + | 10 | 25.9 (2.57) | 7.46 (10)*** |
Diagnosis - | 140 | 6.14 (0.64) |
Alcohol Dependence |
Diagnosis + | 6 | 27.3 (2.97) | 6.75 (5)*** |
Diagnosis - | 144 | 6.62 (0.67) |
Concurrent criterion validity was assessed by comparing scale scores between SCID-diagnosed cases and non-cases for MDD, PTSD, Alcohol Abuse, and Alcohol Dependence. Group difference tests (Table 4) indicated highly significant differences between cases and non-cases: MDD (M = 1.34, SD = 0.08 vs. M = 0.82, SD = 0.04, t(48) = 5.49, p < .001), PTSD (M = 1.11, SD = 0.06 vs. M = 0.78, SD = 0.05, t(143) = 4.25, p < .001), alcohol abuse (M = 25.9, SD = 2.57 vs. M = 6.14, SD = 0.64, t(10) = 7.46, p < .001), and alcohol dependence (M = 27.3, SD = 2.97 vs. M = 6.62, SD = 0.67, t(5) = 6.75, p < .001). The distributions of scale scores for depression and post-traumatic stress by the associated SCID diagnoses are shown in Additional file 1: Appendix B.
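As a rough check on the group comparisons in Table 4, the test statistic can be recomputed from the summary values alone. The sketch below rests on two assumptions that this section does not state: that the tests were unequal-variance (Welch) t-tests, and that the values in parentheses behave as standard errors of the group means, although the text labels them SD. The function name is hypothetical.

```python
import math


def welch_from_summary(mean1, se1, n1, mean2, se2, n2):
    """Welch t statistic and Satterthwaite df from group means and standard errors.

    Assumption: se1 and se2 are standard errors of the means (s / sqrt(n)).
    """
    var_sum = se1 ** 2 + se2 ** 2
    t_stat = (mean1 - mean2) / math.sqrt(var_sum)
    df = var_sum ** 2 / (se1 ** 4 / (n1 - 1) + se2 ** 4 / (n2 - 1))
    return t_stat, df


# Alcohol abuse row of Table 4: 10 SCID-positive vs. 140 SCID-negative participants.
t_stat, df = welch_from_summary(25.9, 2.57, 10, 6.14, 0.64, 140)
# Under the stated assumptions, t_stat is about 7.46 and df about 10.1.
```

Under these assumptions the alcohol abuse row closely matches the reported t = 7.46 with 10 degrees of freedom; the other rows agree only approximately, as would be expected when the inputs are rounded.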
Table 5 presents empirical cut-points, based on the Liu method of maximizing the product of sensitivity and specificity, and test characteristics for the long and short versions of the MHAI scales. AUC statistics indicated sufficient or good differentiation for each of the four disorders. Overall accuracy estimates suggested fair-to-good average percentages of accurate classification by a given scale.
Table 5
Empirical cut-points and test characteristics of the long vs. short versions of the MHAI scales
Scale | AUC (SE) [95% CI] | Cut-point (SE) [95% CI] | Sensitivity | Specificity | Accuracy |
Post-traumatic stress |
Long | 0.66 (0.04) [0.59, 0.73] | 0.915 (0.08) [0.75, 1.08] | 0.66 | 0.66 | 0.66 |
Short | 0.68 (0.04) [0.60, 0.75] | 1.042 (0.07) [0.91, 1.17] | 0.64 | 0.71 | 0.68 |
Depression |
Long | 0.75 (0.03) [0.68, 0.81] | 0.960 (0.12) [0.72, 1.20] | 0.84 | 0.65 | 0.69 |
Short | 0.78 (0.04) [0.71, 0.86] | 1.065 (0.11) [0.84, 1.28] | 0.81 | 0.75 | 0.76 |
Alcohol abuse |
Long | 0.89 (0.03) [0.83, 0.95] | 9.500 (6.27) [−2.78, 21.78] | 1.00 | 0.78 | 0.80 |
Short | 0.87 (0.05) [0.77, 0.96] | 7.500 (0.74) [6.06, 8.94] | 0.90 | 0.84 | 0.84 |
Alcohol dependence |
Long | 0.93 (0.02) [0.88, 0.97] | 14.500 (4.72) [5.24, 23.76] | 1.00 | 0.86 | 0.87 |
Short | 0.91 (0.03) [0.86, 0.96] | 7.500 (0.93) [5.66, 9.33] | 1.00 | 0.82 | 0.83 |
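The cut-point selection summarized in Table 5 can be sketched as follows. This is a simplified illustration of the Liu criterion (choose the threshold that maximizes sensitivity times specificity), not the authors' analysis code; the function name and the toy scores are hypothetical, and candidate thresholds are taken at observed scores rather than at midpoints between them.

```python
import numpy as np


def liu_cutpoint(scores, is_case):
    """Return the threshold that maximizes sensitivity * specificity.

    A score at or above the threshold is classified as a positive screen.
    """
    scores = np.asarray(scores, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    best = None
    for cut in np.unique(scores):
        predicted = scores >= cut
        sensitivity = float(np.mean(predicted[is_case]))      # true-positive rate
        specificity = float(np.mean(~predicted[~is_case]))    # true-negative rate
        product = sensitivity * specificity
        if best is None or product > best["product"]:
            best = {
                "cutpoint": float(cut),
                "sensitivity": sensitivity,
                "specificity": specificity,
                "accuracy": float(np.mean(predicted == is_case)),
                "product": product,
            }
    return best


# Hypothetical mean item scores and diagnostic status (1 = case) for 9 people.
scores = [0.2, 0.4, 0.5, 0.6, 0.7, 0.9, 1.0, 1.2, 1.5]
is_case = [0, 0, 0, 0, 1, 0, 1, 1, 1]
result = liu_cutpoint(scores, is_case)
```

Overall accuracy is the prevalence-weighted average of sensitivity and specificity. For the long depression scale, with 32 of 150 participants SCID-positive (Table 4), (32 × 0.84 + 118 × 0.65) / 150 ≈ 0.69, which agrees with the accuracy reported in Table 5.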
Instrument refinement using item response theory
Based on our item inclusion criteria, we selected 8 MHAI items for depression, 12 for PTS (5 overlap with depression items, 1 with anxiety, and 6 are unique to PTS), 4 for anxiety, and 8 for impaired function for our shortened instrument. Discrimination parameters for the depression items ranged from a = 1.5 for the item “feeling tired, low in energy or slowed down” to a = 3.1 for the item “feeling sad.” Difficulty parameters ranged from b1 = −1.4 for the item “feeling tired or fatigued” to b3 = 3.7 for the item “psychomotor agitation or slowing.” For the post-traumatic stress items, discrimination parameters ranged from a = 1.7 for “avoiding thoughts or memories of the event” to a = 3.0 for “feeling that no one understands.” Location parameters ranged from b1 = −1.5 for “feeling upset when reminded of the traumatic event” to b3 = 2.9 for “trembling or shaking.” For the anxiety items, discrimination parameters ranged from a = 2.14 for “trembling or shaking” to a = 2.9 for “feeling tense or keyed up,” with location parameters ranging from b1 = −1.0 for “feeling tense or keyed up” to b3 = 3.3 for “trembling or shaking.” Finally, for the functioning items, selected from the WHODAS and local function scales, discrimination parameters ranged from a = 1.9 for “doing hobbies” to a = 3.2 for “conversing with others”; location parameters ranged from b1 = −0.2 for “doing hobbies” to b3 = 3.10 for “helping others.”
Test information curves of the shorter scales indicated sufficient and comparable reliability across the latent trait spectrum compared to the longer scales. Validity and reliability results for the shortened scales were comparable to those for the longer scales. The IRT approach yielded comparable or slightly higher AUCs for the shorter scales compared to the longer ones, indicating that selecting fewer, but high performing, items tended to increase diagnostic accuracy.
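To make the discrimination (a) and location (b1 to b3) parameters concrete, the sketch below computes response-category probabilities for a single four-category item. It assumes a graded response model, which the a and b1 to b3 parameterization suggests but which this section does not name. The discrimination a = 3.1 is the value reported for “feeling sad”; the three thresholds are hypothetical, since the section does not report them for that item.

```python
import numpy as np


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under a graded response model.

    theta: latent trait value(s); a: discrimination;
    thresholds: increasing locations b1..bK for K + 1 ordered categories.
    Returns an array of shape (len(theta), K + 1) whose rows sum to 1.
    """
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    b = np.asarray(thresholds, dtype=float)
    # Cumulative curves: P(response >= k | theta) for k = 1..K.
    cumulative = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b[None, :])))
    upper = np.hstack([np.ones((theta.size, 1)), cumulative])
    lower = np.hstack([cumulative, np.zeros((theta.size, 1))])
    # Each category probability is the difference of adjacent cumulative curves.
    return upper - lower


# a = 3.1 as reported for "feeling sad"; thresholds are hypothetical.
probs = grm_category_probs([-1.0, 0.0, 2.0], a=3.1, thresholds=[-0.5, 1.0, 2.5])
```

A larger a makes the category probabilities change more sharply around each threshold, which is why highly discriminating items were favored when shortening the scales.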
Discussion
This paper described the adaptation and psychometric testing of a set of scales of mental health and alcohol use problems in a sample of approximately 150 conflict-affected Ukrainians, namely IDPs and military veterans. Using a systematic approach, including the incorporation of locally relevant items based on a prior qualitative study we conducted in the same population, we generated a brief, reliable, and valid measure of three mental health problems (depression, post-traumatic stress, and anxiety) and alcohol use problems. The measure, which for convenience we refer to as the Mental Health Assessment Inventory (MHAI), can be used among male and female conflict-affected adults in Ukraine.
Psychometric testing entailed evaluation of internal consistency reliability and test-retest reliability as well as both convergent construct validity and concurrent criterion validity. Criterion validity was evaluated through the use of a standardized clinical diagnostic tool, the Structured Clinical Interview for DSM-IV-Research Version (SCID). We created a more pragmatic yet psychometrically robust version of the validated measure based on Item Response Theory (IRT) analyses. These analyses identified key symptom and function items that, taken together, increased our diagnostic accuracy while shortening the time it takes to complete the assessment.
Approximately half of the participants met diagnostic criteria for Major Depressive Disorder, Post-traumatic Stress Disorder, Alcohol Abuse, and/or Alcohol Dependence. In comparing SCID-defined cases to non-cases for each disorder, we found significant differences (p < .001) on each of the scale scores for depression, post-traumatic stress, and alcohol use problems, providing evidence of concurrent validity for the corresponding scales in the MHAI. The empirical estimates of diagnostic accuracy for the MHAI scales provided some additional evidence of their validity. Diagnostic accuracy was moderate for the post-traumatic stress scale and fairly strong for the depression and alcohol use scales. The short scales can be used and still achieve the same (or better) classification accuracy as the long scales. The empirical cut-points we used maximized sensitivity and specificity. Given the lack of psychometric research from the region with which to compare our findings, these results need to be interpreted with caution. Modifications to the cut-off score, such as by lowering it, may be appropriate if screening high-need individuals into mental health services is the ultimate goal, as connecting such individuals to care may counterbalance a higher false-positive rate. We echo others in highlighting the need for more research to calibrate screening instruments like these in studies of mental health in conflict-affected populations [44].
Regarding reliability, overall we found very good estimates of internal consistency and test-retest reliability in our measures for symptoms of depression, post-traumatic stress, anxiety, and alcohol use. Cronbach’s alpha values for internal consistency reliability were consistently above 0.80, and the IRT-based analyses revealed the shortened versions of these scales were comparable to or, in some cases, better than the full versions of the scales in the MHAI. The coefficients for test-retest reliability were also consistently above 0.80, and we found comparable results in the IRT-based analyses, suggesting either the short or long versions of the scale can produce consistent results.
IRT analysis is a complement to, rather than a substitute for, standard reliability and validity analyses based on classical test theory (CTT). We elected to use IRT, in addition to CTT, because it can describe more finely the error typical of individual scale items written to tap into unobservable constructs, such as depression, post-traumatic stress, and anxiety [11]. In psychiatric research, it is becoming increasingly recognized that IRT can help instrument developers identify the particular scale items that best discriminate among individuals with regard to the intensity with which they experience the latent construct (e.g., depression) [45]. This recognition has extended to mental health research on conflict-affected populations. For example, Betancourt and colleagues used IRT to refine a dimensional scale of psychosocial adjustment in Ugandan youth living in IDP camps [46], and Haroz and colleagues used IRT to compare the performance of the Hopkins Symptom Checklist 15-item (HSCL-15) depression scale across eight countries [47].
Surprisingly, we did not find high correlations between the mental health symptom scales and alcohol use or between the functioning scales and alcohol use. This is in contrast to other studies of displaced and veteran populations, both within and outside the region, which have found alcohol use to be highly correlated with mental health problems and functioning [48–50]. It is possible the scale for assessing alcohol use problems (ASSIST 3.0) was not sufficiently sensitive to differentiate problematic from non-problematic use.
Although we found good evidence of concurrent criterion validity comparing SCID-defined cases to non-cases on the MHAI sub-scales for depression, post-traumatic stress, and alcohol use, very few participants met the diagnostic criteria for anxiety disorder, so we were not able to assess criterion validity for the anxiety scale of the MHAI. The small number of anxiety diagnoses may have resulted from our sampling strategy, whereby we purposefully asked recruiters to refer people based on presentation of symptoms related to depression, post-traumatic stress, and alcohol misuse. Alternatively, the SCID criteria for PTSD or Adjustment Disorder may have better accounted for the symptomatology in this population than generalized anxiety. Cases of anxiety may have, therefore, been captured in other diagnostic categories prioritized in the SCID assessment. We note the SCID has not been widely used in eastern Europe, and we found no prior research testing its use in Ukraine. Notably, the SCID-5, which corresponds to the latest DSM criteria, has not yet been translated into a Ukrainian or Russian language version; we acknowledge the use of its predecessor as a study limitation.
This study had several other limitations. We sampled adult individuals from only two urban areas, although there are veterans and displaced individuals and families scattered across the country. The study sample size is relatively small, although it is similar to those of other psychometric studies conducted by our group [24, 51] as well as other groups [37, 52] in different populations in low- and middle-income countries. We also note that our a priori sample size calculation indicated that 45 participants in each group under study provided sufficient power to detect medium differences on symptom scores between the groups.
Our study was strengthened by working in partnership with local mental health experts, and due to the availability of a mental health workforce in Ukraine we were able to employ Ukrainian mental health professionals to use a standardized diagnostic tool (SCID) for evaluating the validity of the MHAI. We took care to ensure the quantitative assessments reflected the findings of our prior qualitative research involving IDPs, veterans of the conflict, non-IDP Ukrainian citizens, and mental health care workers. While much research on conflict-affected populations (pertinently) focuses on symptoms of depression and trauma [53, 54], we also attended to alcohol use.
Conclusion
Accurate mental health research and appropriate service delivery require reliable, valid, and useful measurement tools. The literature repeatedly calls attention to the high need for validated measures for both epidemiologic and clinical purposes. These kinds of measures are frequently lacking for conflict-affected populations, owing to the difficulty and cost of local adaptation and testing. The methods and procedures used in this study (and based on research described elsewhere [10, 11, 13]) were designed for relatively rapid investigations among conflict-affected populations.
To our knowledge, this is the first validity study of instruments to assess for mental health and alcohol use problems among Ukrainians affected by the current conflict. The resulting instrument is being used to facilitate enrollment screening and symptom tracking in a psychotherapeutic intervention for adult Ukrainian IDPs, veterans, and family members of veterans and will also be made freely available to other researchers and clinical workers in Ukraine. This study also demonstrated how IRT can produce shortened versions of measures that retain comparable—and, in some instances, improved—psychometric properties compared with the longer versions. We suggest that measurement methods based on IRT, in addition to those based on classical test theory, should become a standard practice in validity studies of common psychiatric and behavioral conditions.