Study participants
All women aged ≥18 years, with or without POP symptoms, who were willing to participate in the study and who visited the Gynecology Outpatient Clinic of the University of Gondar Hospital between December 2017 and March 2018 were eligible for inclusion. However, women were excluded if they had a psychiatric problem, could not speak or understand Amharic, had undergone previous POP surgery, had a known or suspected pregnancy, were postpartum (first 6 weeks following childbirth), had a palpable pelvic mass (uterine, ovarian, colorectal or bladder) or had a history of acute symptoms of urinary tract infection.
Study participants were identified by one member of the research team (MA) before undergoing symptom screening and pelvic examination. Symptoms of POP were assessed (MA) using two questions [4, 37]: Do you have a feeling of bulging/pressure or something coming down through the vagina? Do you have a visible mass protruding from the vagina? Participants who had experienced one or both of these problems in the past year were considered to have symptoms of POP and were defined as symptomatic.
Data collection
Patients were recruited consecutively, and a two-stage strategy was used to collect data. First, a face-to-face interview was conducted by two female midwifery nurses using the translated P-QoL at the outpatient visit (baseline data). These data collectors were not involved in pre-testing. After completing the questionnaire, all women were asked to volunteer for a pelvic examination. One research team member (TG), blinded to the questionnaire score, performed the pelvic examination using the simplified Pelvic Organ Prolapse Quantification (S-POPQ) staging system [38], under the supervision of the research team gynaecologist (MA). The examination was done after the woman had emptied her bladder. After receiving an explanation of the procedure, the participant was asked to lie on an examination couch in the lithotomy position. A disarticulated Graves speculum was inserted into the vagina. The posterior vaginal wall was first retracted to observe the descent of the anterior vaginal wall and the degree of protrusion relative to the hymenal ring during straining or coughing. The anterior vaginal wall was then retracted to observe descent of the posterior vaginal wall during straining. In accordance with the method, no measuring device was used; the examiner estimated the degree of descent by observing the points on the anterior and posterior vaginal segments that represent the respective walls. The descent of each point relative to the hymenal ring during Valsalva or cough was recorded as the stage for each of the three areas examined (anterior, posterior and apical/cervix), and the final stage was the highest of the three. Accordingly, women were assigned an S-POPQ stage as follows: stage 0, no prolapse; stage 1, the leading point of the vaginal wall or cervix remains at least 1 cm above the hymenal ring; stage 2, the leading point descends to the introitus, defined as the area extending from 1 cm above to 1 cm below the hymenal ring; stage 3, the leading point descends >1 cm outside the hymenal ring but does not form a complete vaginal vault eversion or procidentia uteri; and stage 4, complete vaginal vault eversion or procidentia uteri [38].
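The staging rule above (stage each compartment from the position of its leading point, then take the highest stage) can be sketched as follows. This is a minimal illustration under assumptions that are not part of the original method: since S-POPQ uses no measuring device, the examiner's visual estimate is encoded as a hypothetical position in centimetres relative to the hymenal ring (negative above, positive below), `None` marks a compartment without descent, a leading point exactly 1 cm above the ring is treated as stage 1 following the "at least 1 cm above" wording, and complete vault eversion or procidentia is passed as a separate flag because it is a clinical judgement rather than a position. The function names are illustrative.

```python
from typing import Dict, Optional


def compartment_stage(position_cm: Optional[float]) -> int:
    """Stage a single compartment (anterior, posterior or apical/cervix).

    position_cm is a hypothetical numeric encoding of the examiner's visual
    estimate of the leading point relative to the hymenal ring:
    negative = above (inside), positive = below (outside), None = no prolapse.
    Stage 4 (complete eversion/procidentia) is decided by the caller.
    """
    if position_cm is None:
        return 0  # no prolapse
    if position_cm <= -1:
        return 1  # leading point at least 1 cm above the hymenal ring
    if position_cm <= 1:
        return 2  # introitus: from 1 cm above to 1 cm below the ring
    return 3  # more than 1 cm outside the ring, short of complete eversion


def spopq_stage(positions: Dict[str, Optional[float]],
                complete_eversion: bool = False) -> int:
    """Overall S-POPQ stage: the highest stage across the examined compartments."""
    if complete_eversion:
        return 4  # complete vaginal vault eversion or procidentia uteri
    return max(compartment_stage(p) for p in positions.values())


# Anterior wall at the introitus, posterior wall above it, cervix not descended.
print(spopq_stage({"anterior": 0.5, "posterior": -2.0, "apical": None}))  # 2
```

Taking the maximum over compartments mirrors the rule that the final stage is the highest of the three areas examined.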
To measure test–retest reliability, a randomly selected subset of patients (n = 70) was asked to complete the questionnaire again 2 weeks later. Random selection was used to maximize the probability that the retested patients were representative of the study sample. The follow-up assessment was conducted through face-to-face interviews by the same data collectors who collected the baseline data. Stability was evaluated by the same data collectors using the Patient Global Impression of Change (PGIC) scale [39]. The PGIC evaluates overall health status as perceived by the patient on a seven-point single-item scale ranging from ‘very much worse’ to ‘very much improved’. For descriptive purposes, patients were classified into three categories according to their PGIC response since the baseline visit: disease deterioration (very much worse, much worse and minimally worse), stable disease (no change) or disease improvement (very much improved, much improved and minimally improved). Women who rated “no change” on the PGIC scale were considered stable [40]. The PGIC has been implemented and/or validated in clinical studies of patients with urogenital prolapse [41]. The PGIC questionnaire was translated from English to Amharic without back-translation before use, and in this study women were considered stable if they rated “no change or almost the same” on the scale.
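The PGIC grouping above amounts to a lookup from the seven response labels to three categories. The sketch below assumes responses are stored as English label strings and treats the translated “no change or almost the same” wording as an alias for “no change”; the data layout, patient IDs and helper names are hypothetical.

```python
from typing import Iterable, List, Tuple

# Seven PGIC responses grouped into the three descriptive categories.
PGIC_CATEGORY = {
    "very much worse": "deterioration",
    "much worse": "deterioration",
    "minimally worse": "deterioration",
    "no change": "stable",
    "no change or almost the same": "stable",  # assumed alias for the translated label
    "minimally improved": "improvement",
    "much improved": "improvement",
    "very much improved": "improvement",
}


def pgic_category(response: str) -> str:
    """Map a PGIC response label to deterioration / stable / improvement."""
    key = response.strip().lower()
    if key not in PGIC_CATEGORY:
        raise ValueError(f"unrecognised PGIC response: {response!r}")
    return PGIC_CATEGORY[key]


def stable_patients(responses: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the IDs of retested patients who rated their condition as unchanged."""
    return [pid for pid, resp in responses if pgic_category(resp) == "stable"]


retest = [("P01", "No change"), ("P02", "Much improved"),
          ("P03", "no change or almost the same")]
print(stable_patients(retest))  # ['P01', 'P03']
```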
Statistical analysis
Sociodemographic characteristics and selected clinical background information were summarized using descriptive statistics. Responses were checked for completeness, and partly completed questionnaires were removed prior to analysis. When necessary, items were recoded and transformed [18]. Semantic, idiomatic, experiential and conceptual equivalence were evaluated through content validity, face validity and acceptability, whereas measurement equivalence was evaluated through test–retest reliability, internal consistency, and construct and criterion validity, based on the COSMIN recommendations [42]. The significance level was set at 0.05.
Content validity, that is, whether the domains of the P-QoL cover all relevant domains of HRQoL, was evaluated. Questionnaires that demonstrate content validity should have few missing responses, use the full range of scores with little skew, and show few ceiling (best possible score) or floor (poorest possible score) effects.
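The distributional checks listed above (missing responses, range, skew, and ceiling and floor effects) can be computed per domain as in the sketch below. It assumes each domain score is a number, or `None` for a missing response. Because the direction of the scale (whether the lowest or highest score is best) is not restated here, the best and worst possible scores are passed in explicitly, and no cut-off for an "excessive" ceiling or floor effect is assumed. The function names are illustrative.

```python
from statistics import fmean
from typing import Optional, Sequence


def skewness(values: Sequence[float]) -> float:
    """Fisher-Pearson (population) coefficient of skewness."""
    m = fmean(values)
    m2 = fmean([(v - m) ** 2 for v in values])
    m3 = fmean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def distribution_summary(scores: Sequence[Optional[float]],
                         best: float, worst: float) -> dict:
    """Missing-data percentage, ceiling/floor percentages, range and skew of a domain."""
    answered = [s for s in scores if s is not None]
    if not answered:
        raise ValueError("no non-missing scores")
    n = len(answered)
    return {
        "missing_pct": 100 * (len(scores) - n) / len(scores),
        "ceiling_pct": 100 * sum(s == best for s in answered) / n,  # best possible score
        "floor_pct": 100 * sum(s == worst for s in answered) / n,   # poorest possible score
        "min": min(answered),
        "max": max(answered),
        "skewness": skewness(answered),
    }


# Hypothetical domain scored 0-100, assuming 0 is the best possible score.
print(distribution_summary([0, 0, 50, 100, None], best=0, worst=100))
```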
Face validity, the extent to which a questionnaire is a logical measure of what it intends to measure in the opinion of experts and patients [43], was evaluated by the expert committee throughout the adaptation process and in the pre-test, through qualitative analysis of the comments provided. The experts were asked to comment on the plausibility and comprehensiveness of the questions and to rate their relevance on a scale ranging from 1 to 4 (very relevant to irrelevant). Expert agreement on relevance was calculated using the content validity index (CVI), and agreement ≥80% was considered acceptable [36]. Moreover, acceptability, the extent to which an instrument is acceptable to participants, was evaluated using the estimated time required to complete the questionnaire, the percentage of fully completed questionnaires, the percentage of difficult or distressing items, and the level of missing data [44].
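The CVI calculation can be sketched as follows. The sketch assumes the common convention of dichotomizing a 4-point relevance scale: ratings of 1 or 2 (the "relevant" end of the 1 = very relevant to 4 = irrelevant scale) count as agreement, the item-level CVI (I-CVI) is the proportion of experts in agreement, and the scale-level value is the average of the I-CVIs (S-CVI/Ave). Only the 0.80 threshold comes from the text above; the rating data and function names are hypothetical.

```python
from statistics import fmean
from typing import Dict, List, Sequence, Tuple

# Assumption: on the 1 (very relevant) to 4 (irrelevant) scale, ratings of 1 or 2 count as "relevant".
RELEVANT = {1, 2}


def item_cvi(ratings: Sequence[int]) -> float:
    """I-CVI: proportion of experts who rated the item as relevant."""
    return sum(r in RELEVANT for r in ratings) / len(ratings)


def cvi_report(ratings_by_item: Dict[str, Sequence[int]],
               threshold: float = 0.80) -> Tuple[Dict[str, float], float, List[str]]:
    """Return the I-CVI of each item, the S-CVI/Ave, and items below the threshold."""
    i_cvis = {item: item_cvi(r) for item, r in ratings_by_item.items()}
    s_cvi_ave = fmean(i_cvis.values())
    below = [item for item, value in i_cvis.items() if value < threshold]
    return i_cvis, s_cvi_ave, below


# Hypothetical ratings from five experts for two items.
print(cvi_report({"Q1": [1, 1, 2, 1, 1], "Q2": [1, 2, 3, 4, 1]}))
```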
Reliability was assessed using agreement and consistency indices. Cronbach’s alpha was computed to assess the internal consistency of the P-QoL subscales and items, and values ≥0.7 were considered adequate [30]. We further analyzed item-to-subscale and item-to-total correlations to evaluate the fit of each item within its subscale and the total score. Item-total correlations ≥0.5 and inter-item correlations ≥0.3 were considered acceptable [45]. We hypothesized that the individual items or indicators of a scale should all measure the same construct and thus be highly inter-correlated. The intraclass correlation coefficient (ICC 2,1; two observation time points of one item) was calculated to evaluate the reproducibility of the results under constant conditions, using a single-rating, absolute-agreement, two-way mixed-effects model. We assumed that item scores from the two test administrations would agree, and ICC values ≥0.7 were considered satisfactory [42].
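The reliability statistics described above can be computed from first principles as in the sketch below, which assumes complete data stored as a list of rows (one row per respondent, with one column per item, or one column per occasion for the ICC). The ICC uses the single-measure, absolute-agreement formula from a two-way ANOVA; for absolute agreement this computational formula is the same whether the occasion effect is treated as random or fixed (mixed). Item-total correlations are computed in their corrected form (each item against the sum of the remaining items), which is one common choice and not necessarily the one used in the study. Function names are illustrative.

```python
import math
from statistics import fmean, variance
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]  # rows = respondents; columns = items or occasions


def cronbach_alpha(rows: Matrix) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = len(rows[0])
    items = list(zip(*rows))
    totals = [sum(r) for r in rows]
    return k / (k - 1) * (1 - sum(variance(col) for col in items) / variance(totals))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corrected_item_total(rows: Matrix) -> List[float]:
    """Correlation of each item with the sum of the other items."""
    out = []
    for j in range(len(rows[0])):
        item = [r[j] for r in rows]
        rest = [sum(r) - r[j] for r in rows]
        out.append(pearson(item, rest))
    return out


def icc_a1(rows: Matrix) -> float:
    """Single-measure, absolute-agreement ICC (rows = patients, columns = test/retest)."""
    n, k = len(rows), len(rows[0])
    grand = fmean([v for r in rows for v in r])
    row_means = [fmean(r) for r in rows]
    col_means = [fmean(c) for c in zip(*rows)]
    ss_total = sum((v - grand) ** 2 for r in rows for v in r)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between occasions
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


# A constant +1 shift at retest keeps the ranking but lowers absolute agreement.
print(round(icc_a1([[1, 2], [2, 3], [3, 4], [4, 5]]), 3))  # 0.769
```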
Construct validity was evaluated through factorial validity (exploratory and confirmatory factor analysis, together with convergent and discriminant validity) and known-group validity (hypothesis testing) [46].
Exploratory factor analysis (EFA) is a data-driven method, whereas confirmatory factor analysis (CFA) is a theory-driven method. The choice between EFA and CFA should therefore be made carefully according to the aim of a study, and aimless application of both EFA and CFA to the same dataset should be avoided [47]. The latent variable structure of a dataset can be explored with EFA. CFA, on the other hand, requires an a priori hypothesis or previous “theory”, as it is a hypothesis-testing method that tests whether the obtained dataset fits a model [47]. Thus, we first used CFA to investigate whether the 9-factor structure could be replicated in the new dataset (model fit of the data obtained from 212 participants). CFA with maximum likelihood estimation was used for validation [48, 49]. The following goodness-of-fit indices were used to assess the model: the Tucker–Lewis Index (TLI; >0.90 acceptable, >0.95 excellent), the Comparative Fit Index (CFI; >0.90 acceptable, >0.95 excellent), the Root Mean Square Error of Approximation (RMSEA; <0.08 acceptable, <0.05 excellent) and the Standardized Root Mean Square Residual (SRMR; <0.08 acceptable) [50].
Second, after performing CFA, we performed EFA to extract a more suitable factor structure from the same dataset [48, 51]. Because our sample data violated the assumption of multivariate normality, EFA was performed using the principal axis factoring (PAF) extraction method [48, 49]. Extracted factors were rotated using oblique (promax) rotation [52]. Oblique rotation was chosen because dimensions of health were expected to be associated [53]. Before conducting EFA, factorability was evaluated using Bartlett’s test of sphericity (p < 0.05) [54] and the Kaiser–Meyer–Olkin (KMO > 0.5) measure of sampling adequacy [55]. The number of meaningful factors to retain was guided by the scree plot test (factors above the break or elbow), Kaiser’s criterion (eigenvalue ≥1), interpretability, and the cumulative variance explained (>40%) [56]. Items of the P-QoL were retained if they had a primary factor loading >0.4 and secondary factor loadings <0.3 [51]. Items that did not meet these criteria were removed one at a time, and the EFA was repeated until all remaining items met the retention criteria. The reliability of the items in each factor was examined using Cronbach’s alpha, and a value ≥0.7 was deemed reliable [30].
We also evaluated convergent and discriminant validity for the extracted factors. Factor-based convergent validity, the degree to which items within a single factor are highly correlated, was measured by composite reliability (CR ≥0.7) and average variance extracted (AVE ≥0.5) [57]; CR > AVE was additionally required to establish convergent validity [58]. Factor-based discriminant validity, the extent to which factors are distinct and uncorrelated, was assessed by comparing the AVE, maximum shared squared variance (MSV), average shared squared variance (ASV) and the square root of the AVE [59]. Discriminant validity was supported if AVE > MSV and AVE > ASV, and if the square root of the AVE of a given factor was greater than its inter-construct correlations [57]. Model validity measures were computed using the Master Validity Tool plugin for AMOS [60].
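The convergent and discriminant validity criteria above can be reproduced from standardized loadings and the factor correlation matrix, as in the sketch below. It uses the usual formulas CR = (Σλ)² / ((Σλ)² + Σ(1 − λ²)) and AVE = mean(λ²), with MSV and ASV taken as the maximum and mean squared correlation between a factor and the other factors. The input layout and function name are hypothetical, at least two factors are assumed, and this is not the Master Validity Tool itself, whose exact calculations may differ.

```python
import math
from typing import Dict, List, Sequence


def validity_indices(loadings: Dict[str, Sequence[float]],
                     factor_corr: List[List[float]]) -> Dict[str, dict]:
    """CR, AVE, MSV, ASV and the convergent/discriminant checks for each factor.

    loadings: standardized loadings per factor (factor order = key order).
    factor_corr: factor correlation matrix in the same order (>= 2 factors).
    """
    names = list(loadings)
    results = {}
    for i, name in enumerate(names):
        lam = loadings[name]
        s = sum(lam)
        error = sum(1 - x ** 2 for x in lam)          # standardized error variances
        cr = s ** 2 / (s ** 2 + error)
        ave = sum(x ** 2 for x in lam) / len(lam)
        others = [factor_corr[i][j] for j in range(len(names)) if j != i]
        squared = [r ** 2 for r in others]
        msv, asv = max(squared), sum(squared) / len(squared)
        results[name] = {
            "CR": cr, "AVE": ave, "MSV": msv, "ASV": asv,
            "convergent": cr >= 0.7 and ave >= 0.5 and cr > ave,
            # Fornell-Larcker: sqrt(AVE) must exceed every inter-construct correlation.
            "discriminant": (ave > msv and ave > asv
                             and math.sqrt(ave) > max(abs(r) for r in others)),
        }
    return results


# Two hypothetical factors, three items each, correlated at 0.3.
res = validity_indices({"F1": [0.8, 0.8, 0.8], "F2": [0.8, 0.8, 0.8]},
                       [[1.0, 0.3], [0.3, 1.0]])
print(res["F1"]["convergent"], res["F1"]["discriminant"])  # True True
```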
Known-group validity was evaluated by comparing the distribution of median P-QoL factor scores according to the symptom status of participants. POP symptoms are associated with poor HRQoL [18, 28]. We therefore tested the hypothesis that women with symptoms suggestive of POP would have poorer HRQoL than women without such symptoms. Participants were divided into two groups based on symptom status (symptomatic vs. asymptomatic), and the median P-QoL scores of the two groups were compared using the Mann–Whitney U test because the P-QoL scores were not normally distributed.
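The known-group comparison can be sketched as a Mann–Whitney U test using average ranks for ties and the large-sample normal approximation (with tie correction, without continuity correction). This is an illustrative implementation, not the Stata routine used in the study; statistical packages may apply a continuity correction or exact p-values for small samples, so their p-values can differ slightly. The example scores are made up.

```python
import math
from collections import Counter
from statistics import median
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney(group_a: Sequence[float], group_b: Sequence[float]) -> Dict[str, float]:
    """Group medians, U statistic and two-sided normal-approximation p-value."""
    n1, n2 = len(group_a), len(group_b)
    pooled = list(group_a) + list(group_b)
    ranks = average_ranks(pooled)
    u_a = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    ties = sum(t ** 3 - t for t in Counter(pooled).values())   # tie correction term
    sigma = math.sqrt(max(0.0, n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))))
    z = (u_a - n1 * n2 / 2) / sigma if sigma > 0 else 0.0
    return {
        "median_a": median(group_a),
        "median_b": median(group_b),
        "U": min(u_a, n1 * n2 - u_a),
        "z": z,
        "p_two_sided": math.erfc(abs(z) / math.sqrt(2)),
    }


# Hypothetical domain scores for symptomatic vs. asymptomatic women.
print(mann_whitney([55, 70, 80, 65], [20, 30, 25, 40]))
```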
Criterion validity, how well the questionnaire correlates with an existing gold standard, was assessed by comparing P-QoL factor scores with the objective vaginal examination findings using the S-POPQ system [18]. Spearman’s correlation coefficient (SCC) was used to quantify the magnitude of the correlation. The size of the correlation coefficients was interpreted as follows: 0.81–1.00 excellent, 0.61–0.80 very good, 0.41–0.60 good, 0.21–0.40 sufficient, and 0.00–0.20 poor [61]. We hypothesized that P-QoL scores would correlate with S-POPQ stage, with women at higher S-POPQ stages having poorer HRQoL.
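The criterion-validity analysis can be sketched as a Spearman correlation (the Pearson correlation of average ranks, which handles the many ties expected for an ordinal S-POPQ stage) followed by the interpretation bands stated above. In this sketch the bands are applied to the absolute value of the coefficient, and values falling between the published band limits (for example 0.205) are placed in the next band up; both choices, and the example data, are assumptions.

```python
import math
from statistics import fmean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = fmean(rx), fmean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Upper limits of the interpretation bands, applied to |rho|.
BANDS = [(0.20, "poor"), (0.40, "sufficient"), (0.60, "good"),
         (0.80, "very good"), (1.00, "excellent")]


def interpret(rho: float) -> str:
    size = abs(rho)
    for upper, label in BANDS:
        if size <= upper:
            return label
    return "excellent"  # guards against |rho| slightly above 1 from rounding


# Hypothetical S-POPQ stages and domain scores for six women.
stages = [0, 1, 1, 2, 3, 2]
scores = [10, 25, 20, 45, 70, 40]
rho = spearman(stages, scores)
print(round(rho, 2), interpret(rho))
```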
We used the Analysis of Moment Structures (AMOS; version 23, Chicago, IL) for CFA, the Statistical Package for the Social Sciences (SPSS; version 20, IBM Corp., Armonk, NY) for EFA, and Stata version 14 (StataCorp, College Station, TX, USA) for all other analyses.