Test-retest reliability of IPAQ-SF for MPA, VPA and MVPA was good. With respect to validity, however, comparison of IPAQ-SF estimates of MPA, VPA and MVPA with the reference method SWA showed limited agreement. Physical activity level was under-reported using IPAQ-SF for the total group, in contrast to most self-report questionnaires used both in the general population and in pregnancy [
29,
39]. Interestingly, our results suggested that physically active pregnant women tended to under-report, while inactive pregnant women tended to over-report their physical activity level using IPAQ-SF. This indicates that self-reported estimation of physical activity varies by physical activity level.
Reliability
We found a somewhat higher ICC (0.81–0.84) for IPAQ-SF than in previous studies investigating test-retest reliability of physical activity questionnaires, where median reliability coefficients varied from 0.62 to 0.76 [
40]. Furthermore, the present study demonstrated higher test-retest reliability compared to the initial test-retest of IPAQ-SF conducted among adults in 12 countries (pooled Spearman
ρ 0.76) [
28,
40], as well as similar reliability compared to another pregnancy-specific self-reported questionnaire (0.78–0.83) [
24] and interview-based questionnaires (0.81–0.84) [
38,
41]. Reliability was highest for VPA (0.84), which may be explained by the often planned nature of these activities, making them easier to recall.
To achieve level one evidence for reliability, it is suggested that the time frame between the two questionnaire administrations be short enough that physical activity level does not change, yet long enough to prevent recall [
14]. The mean time frame of 2.5 weeks, appropriate sample size (>50) and analysis (ICC), as well as good correlation (>0.70) for MPA, VPA and MVPA in the present study support achievement of level one evidence of reliability, according to the criteria raised by Van Poppel et al. [
14]. Though the present study lacks a measure of responsiveness, the high correlation coefficients reflect good consistency, which may give IPAQ-SF some value in repeated measures and in monitoring change in physical activity level over time, as well as in comparing physical activity levels before, during and after pregnancy.
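The test-retest ICC discussed above can be illustrated with a short calculation. The following is a minimal sketch only: it assumes a two-way random-effects, absolute-agreement, single-measure ICC, often written ICC(2,1), which may differ from the ICC model used in the present study. The function name `icc_2_1` and the example minutes are hypothetical and are not study data.

```python
from statistics import mean


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    scores: one row per participant, one column per administration
    (e.g. [test, retest]). The choice of ICC model is an assumption.
    """
    n = len(scores)       # number of participants
    k = len(scores[0])    # number of administrations
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]

    # Mean squares from the two-way ANOVA decomposition
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (scores[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical MVPA minutes/week at test and retest for five women
example = [[120, 130], [60, 45], [200, 210], [0, 20], [90, 90]]
print(round(icc_2_1(example), 2))
```

Because this form of the ICC penalises systematic shifts between the two administrations as well as random disagreement, it is stricter than a plain correlation between test and retest.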
Validity
Correlation coefficients for MPA (
τ = 0.08,
p = 0.536), VPA (
τ = 0.39,
p = 0.002) and MVPA (
τ = 0.14,
p = 0.280) in the present study were in accordance with correlation coefficients reported in a review comprising 23 previous studies using IPAQ-SF (−0.09 to 0.38 for MPA, −0.18 to 0.47 for VPA and 0.15 for MVPA) [
29]. In addition, the present results are in line with other pregnancy-specific physical activity questionnaires that have been validated against a physical activity monitor, with correlation coefficients between 0.08–0.59 for MPA, VPA and total physical activity for self-reported questionnaires [
23‐
25], and between 0.06–0.59 for MPA, VPA, MVPA and total activity for interview-administered questionnaires [
38,
41]. In a systematic review of measurement properties of physical activity questionnaires in adults, Van Poppel et al. (2010) suggested a correlation cut-off point of >0.50 as sufficient for level one evidence of validity when compared to an accelerometer [
14]. Few physical activity questionnaires for pregnant women report correlation values >0.50. To our knowledge, a self-reported questionnaire validated by Haakstad et al. [
23] and an interview-administered questionnaire validated by Schmidt et al. [
38] are the only two reporting correlation values >0.50, and only for VPA and total activity/sports-exercise, respectively. However, comparisons with other studies should be made with caution, as studies differ in methodology, including measurement methods, statistical analysis and cut-off points [
14,
41]. These differences, together with assessment in different trimesters in pregnancy studies, might partly explain the variation in results between studies. According to a systematic review comprising 148 studies, under- and over-reporting of physical activity level varies widely, ranging from −100% to 4024%, with an average over-report of 138% [
39]. Our finding that physical activity categorized as MPA and MVPA was under-reported by >52% is similar to results reported in a Swedish IPAQ-SF validation study using the MTI Actigraph as criterion measure, where MPA and VPA were under-reported by 49% and 31%, respectively, among the female participants (
n = 98) [
42]. Another validation study of a pregnancy-specific questionnaire conducted in Norway, using the ActiReg® system as criterion measure, also found that MPA was under-reported, although not to the same extent as in the present study (only 16%, MPA
τ = 0.15,
p = 0.183) [
23].
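The percentages of under- and over-reporting compared above can be read as relative differences between the questionnaire and the monitor. The following is a minimal sketch, assuming that misreporting is expressed relative to the objectively measured value; the cited studies may define it differently. The function name `percent_misreport` and the example minutes are hypothetical and are not study data.

```python
def percent_misreport(self_report_min, monitor_min):
    """Relative difference (%) of self-report against the monitor.

    Negative values mean under-reporting, positive values over-reporting.
    Using the monitor value as the denominator is an assumption.
    """
    if monitor_min <= 0:
        raise ValueError("monitor value must be positive to form a ratio")
    return 100.0 * (self_report_min - monitor_min) / monitor_min


# Hypothetical weekly MPA minutes: questionnaire vs. monitor
print(percent_misreport(60, 125))   # under-reporting
print(percent_misreport(150, 100))  # over-reporting
```

Under this definition the lower bound is −100% (nothing reported although activity was recorded), while over-reporting has no upper bound, which is consistent with the asymmetric range of −100% to 4024% cited above.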
Pregnancy is associated with large physiological changes in cardiovascular, respiratory, hematologic and metabolic responses, leading to increased heart rate, respiration, resting metabolic rate and absolute energy cost [
43,
44]. These changes in relation to IPAQ’s guidelines for moderate and vigorous intensity (
www.ipaq.ki.se) may explain the poor correlation between the two methods included in the present study. The physiological changes may alter the perception of intensity level with respect to physical activity and exercise. The wide limits of agreement for MPA (−84 ± 402) and MVPA (−85 ± 452) for the total group in the present study may indicate that IPAQ-SF does not assess these intensities accurately at the individual level in pregnancy. Further, as a significant correlation between IPAQ-SF and SWA was seen only for VPA in the total group, these results indicate that IPAQ-SF alone may have limited value in assessing physical activity level among pregnant women, especially if only one measurement point is used. The demonstrated reliability of IPAQ-SF may nevertheless make it suited for repeated measurements of physical activity level over time in study participants.
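The limits of agreement referred to above come from a Bland-Altman analysis of paired differences. The following is a minimal sketch, assuming the conventional definition of the limits as the mean difference ± 1.96 standard deviations of the differences; whether the "±" values quoted above follow exactly this convention is an assumption. The function name `bland_altman` and the example minutes are hypothetical and are not study data.

```python
from statistics import mean, stdev


def bland_altman(self_report, monitor):
    """Return (bias, lower limit, upper limit) for paired measurements.

    bias   = mean of (self-report - monitor)
    limits = bias +/- 1.96 * SD of the differences (conventional choice)
    """
    diffs = [s - m for s, m in zip(self_report, monitor)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width


# Hypothetical MVPA minutes/week from questionnaire and monitor
questionnaire = [90, 30, 240, 0, 150, 60]
armband = [210, 45, 380, 20, 400, 55]
bias, lower, upper = bland_altman(questionnaire, armband)
print(f"bias = {bias:.0f} min/week, limits = {lower:.0f} to {upper:.0f}")
```

A negative bias indicates under-reporting on average, while the width of the limits describes how far an individual woman's self-report may deviate from the monitor, which is why wide limits point to poor accuracy at the individual level even when the group mean looks acceptable.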
When the total group was divided according to whether or not participants fulfilled the national physical activity guidelines, the active group (
n = 34) under-reported their MPA and MVPA by almost three hours/week. In contrast, the women in the inactive group over-reported MPA and MVPA by six (13%) and 23 (49%) minutes/week, respectively, which is lower than most findings from previous validation studies of IPAQ-SF (36–173%) [
29]. Our findings are similar to those reported by Shook et al. [
45], who demonstrated differences in self-reported physical activity level based on fitness level in the general adult population.
In another recently published study using a pregnancy-specific questionnaire [
22], self-reported physical activity levels were over-reported among both active and inactive participants. Few studies have, however, focused on possible differences between those defined as physically active and physically inactive. In the present study the degree of under-reporting of MPA and MVPA by the active group was substantially larger than the corresponding over-reporting by the inactive group, resulting in considerable impact on the result for the whole group. Finally, similar to previous studies [
30,
45], we found VPA level over-reported in both the inactive and the active group.
There may be several reasons for discrepancies in self-reporting of physical activity level among active versus inactive pregnant women. Perception and tolerance of intensity in a given activity may differ. Due to physiological changes in pregnancy, such as decreased pulmonary reserve, increased cardiac output and systemic vasodilation, the inactive women may have experienced heavier breathing at a lower activity level and classified the intensity as moderate in accordance with questionnaire guidelines, while the SWA may have registered it only as light intensity. On the other hand, the active women may have used their physiological responses before pregnancy as a reference and thereby felt insufficiently active, which might have led to under-reporting of MPA and MVPA. A Canadian study conducted in 129 adults (
n = 90 females) highlights the difficulties in selecting the proper physical activity intensity; most participants underestimated MPA and VPA and, when instructed, only 24% walked at a moderate to vigorous pace, while the majority actually walked at light intensity [
46]. In addition, active women in the present study might have spent more time walking than the inactive women. Such walking would be captured by the SWA as moderate-intensity activity, whereas according to IPAQ instructions walking should not be included in the moderate-intensity category. The SWA registered significantly more steps in the active than in the inactive group (58616 vs. 45424 steps/week,
p = 0.005, data not shown), although quantifying steps does not include aspects such as intensity.
The significant correlation between IPAQ-SF and SWA seen for MPA in the inactive group suggests that IPAQ-SF may be of some value in assessing MPA in inactive pregnant women. Further, when we removed the three outliers seen in the Bland-Altman plot in Fig.
1, we found an association between IPAQ and SWA in assessing MPA and MVPA for the inactive group, with the mean differences and limits of agreement from Bland-Altman plots being 4 ± 138 and 10 ± 172 min/week, respectively (data not shown). However, removing outliers did not change the results significantly for the total or for the active group.
A wide range of self-reported physical activity questionnaires are available, though reviews have shown that it is difficult to identify any that are superior to others [
14,
40,
47‐
49], including those specific to pregnant populations [
50]. Accordingly, dose–response relationships between self-reported physical activity level and pregnancy outcomes remain difficult to establish [
51,
52]. Though we found limited validation of IPAQ-SF when assessing MPA, VPA and MVPA in pregnancy, only a small number of other self-reported physical activity questionnaires available for use in pregnancy possess overall good validity for measuring different physical activity intensities [
23‐
25]. Therefore, IPAQ-SF’s advantage of brevity and its ability to assess physical activity level preconception, during pregnancy and postpartum, as well as later in life, are of great value. This is especially relevant as the importance of initiating lifestyle changes pre-pregnancy is increasingly recognized [
53‐
55]. Additionally, IPAQ-SF’s good test-retest reliability in this study and for the general adult population [
28] supports use of repeated measures.
Strengths and limitations
Strengths of the present study include acceptable sample sizes [
14], and that all data were cleaned and analysed according to the IPAQ protocol. Another strength is the use of the objective physical activity monitor SWA that combines information about different signals and captures movements. SWA is sensitive for several activities, from sedentary behaviour and sleeping to vigorous physical activity [
56]. Furthermore, the SWA is small, light and wireless and is worn on the upper arm, a location that is especially convenient in pregnancy compared to the waist or hip placement of other activity monitors. In addition, compliance with the SWA was high (mean wear 6.7 days, 98% on-body).
A limitation of SWA is that it must be removed when in contact with water and that it contains eight percent nickel, which may cause skin reactions. SWA has also, like accelerometers [
57,
58], been shown to have difficulties in registering inclined walking, rowing and cycling [
32,
59,
60]. Another possible limitation of the present study is that we included two different versions of SWA.
Another limitation is that we cannot report responsiveness of IPAQ-SF in the present study [
14]; in the test-retest study there is a lack of an objective comparison, while in the validation study there is a lack of two self-report measures.
Characteristics of the women in both our studies, such as age, marital status, household income and smoking habits, are similar to those reported in the largest cohort study conducted on pregnant Norwegian women (the Norwegian Mother and Child Cohort Study) as well as to the general female population of reproductive age in Norway [
54,
61,
62]. The majority of women in our studies (61.5% in the reliability study and 75% in the validation study) had higher education (college/university education) which also concurs with what was reported in the Norwegian Mother and Child Cohort Study (59.5%) [
54]. A large proportion of the general female population in Norway also has higher education (27.6–58.0%, age interval 20–39 years) [
63], although somewhat lower than what was observed in the validation study. In the reliability study, 27% of the participants were overweight/obese, which is similar to the prevalence found in the general female adult population in Norway (23% with BMI ≥ 27 kg/m²) [
64] and in participants in the Norwegian Mother and Child Cohort Study (32.8% with BMI ≥ 25 kg/m²) [
54]. In the validation study, however, only 10.9% of the participants were overweight or obese. Based on these characteristics, the participants in the reliability study seem to be representative of both the pregnant population and the general female population of Norway, while the participants in the validation study were somewhat slimmer and a slightly larger proportion had higher education. However, as this study aims to assess the test-retest reliability of a questionnaire and to compare two measurement methods within the same subject, we maintain our assumption that motivated participants who comply with the planned investigations can provide relevant data for a methodological study. Further, as the two studies aimed to assess measurement properties of IPAQ-SF within each subject, one could argue that the results might have been similar in a random sample from the pregnant population [
25]. Additionally, IPAQ has been tested among adults both in developed and developing countries and demonstrated similar results [
28].
We have no information regarding non-responders. However, there were no significant differences in socio-demographic variables when comparing those included (n = 88 in the reliability study, n = 64 in the validation study) with those excluded from the analysis (n = 18 in the reliability study and n = 31 in the validation study), except that 92% of included women in the reliability study were fully employed outside the home, compared to 100% of the excluded women (p = 0.019).