Background
Physical activity (PA) is related to a number of health outcomes. According to the Global Recommendations on Physical Activity for Health by the World Health Organization (WHO), physical inactivity is the fourth leading risk factor for global mortality, and regular participation in PA reduces the risk of coronary heart disease, stroke, diabetes, hypertension, depression, and breast and colon cancer [1].
PA is defined as any bodily movement that results in energy expenditure [
2]. Aerobic, muscle-strengthening, bone-strengthening activity, and stretching are the four main types of PA [
3]. PA is a complex behavior and thus challenging to measure. Different methods to measure PA exist, including behavioral observations, questionnaires, PA diaries, direct/indirect calorimetry, and motion sensors such as accelerometers, heart rate monitors (HRM), combined heart rate and accelerometry devices, and pedometers. Because so many different methods are available, comparability among studies is limited. Furthermore, each method poses challenges, including expense, time, recall bias and equipment needs. While the doubly labeled water (DLW) method is the most costly measurement, the most cost-effective is the administration of PA questionnaires, which can assess all types of PA and can be used in large samples. Questionnaires can also cover longer time frames, which, however, may lead to recall bias. PA questionnaires have generally been designed to minimize these potential biases as much as possible. For example, the Global Physical Activity Questionnaire (GPAQ) asks about a “typical week” to reduce the need for longer recall [4].
Due to the complex and subjective information collected, PA questionnaires may also over- or underestimate participants’ PA [5, 6]. In particular, older adults are more likely to engage in light- to moderate-intensity PA, which is the most difficult type of activity to assess by questionnaire [7]. Motion sensors, such as pedometers or accelerometers, are increasingly implemented as an additional measure of PA in a free-living environment. Accelerometry has become a common tool in recent studies [
8]. Accelerometers are small electronic devices that record acceleration associated with body movement and provide an objective estimate of duration and intensity of locomotion [
9]. Today, a multitude of different accelerometers from a number of companies are on the market. They are generally able to assess PA in at least three axes (vertical, horizontal, and perpendicular). A typical output of accelerometer measurements is expressed in activity counts per unit of time, most frequently, counts per minute. In order to make the data comparable across types of accelerometers or types of PA measurement, activity counts can be translated into quantitative estimates of energy expenditure [
10]. Each accelerometer model has its own algorithm to convert accelerometry counts into kilocalories (kcals) or metabolic equivalents of task (METs). This may lead to different output values depending on the model used, so that data from different models cannot be compared directly. Accelerometers are designed to measure all PAs; however, they also have limitations. Depending on the attachment site, single accelerometers are not able to detect all movements (e.g., upper/lower body or stationary movement) or capture the context in which the measured activities take place (e.g., leisure time or work). They are not suitable for long-term measurements; hence, repeated administration of accelerometers is important for assessing seasonal variation in PA. Water-based activities may also lead to misclassification of an individual’s PA profile, because not all devices are waterproof and some must therefore be removed during such activities. The administration of accelerometers is also logistically more complex and costly than that of questionnaires.
In a publication on best practices of PA monitors by Matthews et al., the authors state that there is a variety of possible wear positions and that a wear-period of 7 days may be sufficient [
11]. However, they suggest that further research is needed to inform the appropriate wear-time. Additionally, the daily required wear time is of strong interest, as it has been shown that modifications lead to significant differences in PA measures and adherence [
12,
13]. Data collection is very dependent on compliance by the participant to wear the device. Prior research indicates that healthy participants who are younger, unemployed or current smokers are more likely to be noncompliant [
14]. This variance in compliance is less likely to occur in cancer patients, as these are often motivated to modify their lifestyles [
15]. This is underlined by the results of a prior study in colorectal cancer patients that reported no significant differences in compliance of wear-time by age, gender, BMI or tumor stage [
16].
Many recent investigations combined the use of questionnaires and motion sensors in order to collect complementary and comprehensive data. Several systematic reviews have been performed comparing objective versus self-reported PA [
17,
18]. However, these reviews focused on different domains than the present review. Prior research articles and questionnaires predominantly focused on a specific type of PA (e.g., leisure time, work). The GPAQ, however, is a more recently used instrument that assesses several types of PA and thus may provide a more complete impression of an individual’s level of PA. It may be of future interest to review correlations between multimodal PA questionnaires such as the GPAQ and accelerometer data once this method has been applied more often.
The objective of this study is to review accelerometer settings and wear methods to determine whether a practical standard for settings/wear methods exists. Furthermore, the aim is to determine correlations between accelerometry and PA questionnaire data, overall, and by gender, age and BMI. We believe that our approach using a large set of studies and stratifying on specific subgroups helps to fill gaps in our understanding of PA assessment types in multiple populations.
Methods
A total of 57 full articles published on simultaneous PA measurement in adults with accelerometry and questionnaires in free-living conditions were reviewed. A literature search was conducted in PubMed in July 2014. Search terms included “accelerometry”, “accelerometer”, “accelerometers”, “motion sensor” or “motion sensors”, and “questionnaire” or “questionnaires”, and had to appear in the title or abstract. For all publications, the following inclusion criteria were applied: (1) all participants within each study had to be adults (18+ years) to reduce age-related differences in PA patterns, and (2) relevant investigations had to include a sample size of at least 100 participants to increase the stability of the observed associations and to allow investigation of differences by age, sex and body mass index (BMI). The following exclusion criteria were applied to improve study comparability: (1) studies with wheelchair-using or non-ambulatory participants, (2) articles not available in English, and (3) investigations lacking correlation data between accelerometry and questionnaires. Two authors (SS, MP) independently screened the studies and extracted data according to the above-mentioned criteria, regardless of publication date. Disagreements were discussed between the two authors and then resolved.
For this review, all investigations providing correlational comparison values between the different types of PA measurements for the purpose of validity assessment are presented and distinguished by sex, age, and BMI categories where possible.
Both Spearman and Pearson correlation coefficients were included, depending on which type was provided by the study. The main difference between these two measures is that the Spearman correlation coefficient is rank-based and makes no distributional assumption, so it applies to non-parametric data, whereas the Pearson correlation coefficient measures linear association and its significance test assumes normally distributed data. To make clear which coefficient was used, we present ρ for Spearman correlations and r for Pearson correlations. Reported metrics in this review were derived from the individual studies. Correlations of accelerometer-derived total PA and total measures from questionnaires assessing sedentary behavior were excluded as they are expected to be inversely correlated. Accelerometer wear methods were also extracted. In Additional files 1, 2 and 3, the number of participants is presented along with the brand and/or model of accelerometer used in the investigations, the settings of the accelerometer measurements, the questionnaires used, and the correlations between PA measured by questionnaire and accelerometer. Results organized by sex, age and BMI categories are presented sequentially.
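To illustrate why the two coefficients can differ for the same data, the following minimal sketch computes both on a small, entirely hypothetical data set in which one participant reports an extreme questionnaire value. The function names and the numbers are assumptions made for illustration and are not taken from any reviewed study.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation: strength of the linear association."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def average_ranks(values):
    """Rank values from 1 to n; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical minutes of PA per day; the last questionnaire value is extreme.
questionnaire_min = [10, 20, 30, 40, 400]
accelerometer_min = [12, 15, 22, 30, 35]

print(round(pearson_r(questionnaire_min, accelerometer_min), 2))
print(round(spearman_rho(questionnaire_min, accelerometer_min), 2))
```

Because both series increase together, the rank-based coefficient is 1, whereas the single extreme questionnaire value pulls the Pearson coefficient well below 1. This is one reason why ρ and r from different studies should be compared with caution.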
Discussion
This review identified 57 publications that compared PA questionnaires with accelerometry data. Although there have been a few systematic reviews discussing self-reported versus objective PA [
17,
18], the present work is novel both in its subgroup assessment and in its framing of accelerometer wear methods. To date, there are no set standards for the use of accelerometers with respect to wear-time, the minimal wearing time to be considered valid, or position of application, even though there seem to be trends for each of these elements. Large observational studies, such as the National Health and Nutrition Examination Survey (NHANES), have changed their protocols from attachment on hips to wrists [
76]. Some studies suggest that hip-worn accelerometers assess PA more precisely compared to wrist-worn devices [
77], whereas other investigations reported reasonably precise estimates of PA when using wrist-worn devices [
78,
79]. However, wear methods depend to some extent on the study aim, the design of the accelerometers, the activity to be captured, and acceptability within the study population. Of the 57 studies reviewed, an accelerometer wear-time of 7 consecutive days during waking hours was the most consistently reported duration of measurement (N = 37, 65 %). Further requirements included having at least four valid days out of 7 (14 out of 37 studies), with a valid day defined in most investigations as the device being worn for at least 10 h. Many studies required at least one weekend day within the wear-period. Since correlations seemed to be stronger with increased wear-time, one could consider longer accelerometer wear-time for future studies. Additionally, previous studies showed that altering wear time led to significant differences in adherence as well as PA measures [
12,
13]. It is important to note that in this review wear-time information was investigated only in the 57 included studies, and not in all available studies using accelerometry. Nevertheless, the identified inconsistencies in wear-time requirement within the 57 investigations demonstrate the need for general guidelines for the use of accelerometers in free-living conditions in order to increase comparability of these and future studies.
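The most frequently reported wear-time criteria can be expressed as a simple validity check. The sketch below assumes the thresholds named above (at least 10 h of wear for a valid day, at least four valid days out of 7, optionally one valid weekend day); the function names and the example recording are hypothetical, and individual studies applied different rules.

```python
def count_valid_days(daily_wear_hours, min_hours_per_day=10.0):
    """Count days on which the device was worn for at least the minimum."""
    return sum(1 for hours in daily_wear_hours if hours >= min_hours_per_day)


def is_valid_recording(daily_wear_hours, is_weekend_day,
                       min_hours_per_day=10.0, min_valid_days=4,
                       require_weekend_day=True):
    """Return True if a recording meets the assumed wear-time criteria."""
    if len(daily_wear_hours) != len(is_weekend_day):
        raise ValueError("one weekend flag is needed per recorded day")
    valid_flags = [hours >= min_hours_per_day for hours in daily_wear_hours]
    enough_days = sum(valid_flags) >= min_valid_days
    # A weekend day only counts if it is also a valid day.
    has_weekend = any(v and w for v, w in zip(valid_flags, is_weekend_day))
    return enough_days and (has_weekend or not require_weekend_day)


# Hypothetical 7-day recording (Monday to Sunday), hours of wear per day.
wear_hours = [12.5, 9.0, 11.0, 10.0, 8.5, 13.0, 7.0]
weekend = [False, False, False, False, False, True, True]

print(count_valid_days(wear_hours))             # 4
print(is_valid_recording(wear_hours, weekend))  # True
```

Making the thresholds explicit parameters shows how strongly the number of included participants can depend on the chosen rule, which is one motivation for general guidelines.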
It has been shown that healthy participants who are younger, unemployed or current smokers are less likely to comply with accelerometer wear-time requirements, while participants suffering from a serious disease may be more interested in participating in research and thus may be more compliant.
Investigations reviewed in this manuscript compared PA scores of questionnaires with PA measures from accelerometers. In the 57 investigations, correlations between questionnaires and accelerometry were weak to moderate. This finding is in agreement with previous reviews [
80,
81]. Potential explanations for this result might be associated with the advantages and disadvantages of both methods. Questionnaires can assess all types of PA, including stationary activities such as weight lifting. They can also cover long time frames. However, due to the complex and subjective nature of the gathered information, they may be subject to limitations in recollection or to recall bias, such as estimating or recalling the incorrect intensity [
5,
6,
82]. In contrast, accelerometers assess PA continuously and objectively. Unlike questionnaires, they are not suitable for long-term measurements, and thus seasonal activities can be captured only through repeated administration. This mismatch in the time frames covered is expected to reduce correlations. Further aspects limit PA measurement by accelerometry, such as the devices not being able to capture stationary activities, strength training, or cycling. Water-based activities can also lead to misclassification in individual PA measurement where the sensors are not waterproof or are not worn during the activity. In addition, wearing an accelerometer may itself promote PA [
16].
Data retrieved from accelerometers are commonly expressed as “counts”. This dimensionless unit cannot be meaningfully interpreted on its own, and counts therefore need to be converted to an informative measure of PA, such as METs or kilocalories (kcals). With the help of regression equations, accelerometer counts are translated into measures of energy expenditure, and measured PA can be classified into different intensities. Many different regression equations have been reported, and correlations with questionnaires vary depending on which accelerometer was used to determine the amount and intensity of PA, as different data processing algorithms result in different values of PA outcome measures [
10]. Bassett et al. [
83] reported that accelerometers may over-predict energy expenditure during walking while under-predicting the energy expenditure of many other activities. In the 57 studies, accelerometry data were reported as MET scores, time spent in physical activities, accelerometer counts per minute, or step counts. Questionnaire data were also reported in various measures (e.g., minutes per day, hours per week). This variation limits the ability to compare results across studies. Data processing guidelines for accelerometry would improve comparability among studies.
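The dependence of PA outcomes on the conversion equation can be illustrated with a minimal sketch. It assumes a linear regression from counts per minute to METs and the commonly used intensity thresholds of 3 and 6 METs. The two sets of coefficients are hypothetical placeholders and do not correspond to any published or manufacturer equation.

```python
def counts_to_mets(counts_per_minute, intercept, slope):
    """Linear conversion of activity counts per minute to METs."""
    return intercept + slope * counts_per_minute


def classify_intensity(mets):
    """Classify intensity using assumed thresholds of 3 and 6 METs."""
    if mets < 3.0:
        return "light"
    if mets < 6.0:
        return "moderate"
    return "vigorous"


# Hypothetical regression coefficients for two imaginary device models.
MODEL_A = {"intercept": 1.4, "slope": 0.0008}
MODEL_B = {"intercept": 1.0, "slope": 0.0006}

cpm = 2500  # the same recorded activity level, in counts per minute
for name, model in (("A", MODEL_A), ("B", MODEL_B)):
    mets = counts_to_mets(cpm, **model)
    print(name, round(mets, 1), classify_intensity(mets))
```

With these placeholder coefficients, the same count value is classified as moderate under one equation and as light under the other, which shows how the choice of algorithm alone can shift time spent in each intensity category and, in turn, the correlation with questionnaire data.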
Among 25 studies, vigorous activity was more strongly correlated with self-report in men than in women (e.g., r = 0.43, P < 0.05 in men vs. r = 0.05, n.s. in women [47]; ρ = 0.23, P < 0.001 in men vs. ρ = 0.09, P < 0.05 in women [43]). This could be explained by the fact that men have higher levels of vigorous PA [24, 84], which is more easily assessed by questionnaires. Women tend to engage more in light PA, which is the most challenging type of activity to recall because it dominates daily life, for example in household activities [24]. Correlations for light physical activities were investigated by Emaus et al. [31]. They showed negative correlations for self-reported and objectively measured leisure activities with light PA among both men and women (ρ = −0.23, P < 0.05 and ρ = −0.22, P < 0.05, respectively), whereas weak to moderate correlations were reported for work activities (ρ = 0.29, P < 0.01 in men and ρ = 0.40, P < 0.001 in women).
Only 6 studies investigated PA correlations by BMI categories, and among those there were small differences in how the BMI categories were defined, making comparison across studies challenging. These investigations also presented their results differently with respect to both BMI categories and PA intensity categories. This inconsistency once again shows the importance of general guidelines to enable a reasonable comparison across studies.
Although there were no conclusive findings suggesting stronger associations by age group (<65 years vs. ≥65 years), correlations tended to be slightly higher among participants in the younger groups. Notably, most PA questionnaires are designed for younger populations; they focus more on sports and recreational activities and are therefore less suited to the elderly [85]. Kowalski et al. investigated the agreement between objective and self-reported PA in older adults and found generally weak to moderate correlations (r = −0.02–0.79) [17]. Older people are more likely to engage in activities that are most inaccurately assessed by questionnaires [7]. This might explain the slightly weaker correlations among the elderly reported here.
While studies assessing the correlations between questionnaire and accelerometer data are the primary focus of this review, an alternative method to assess agreement between two quantitative measurements is the Bland-Altman plot [86]. Prior research has shown that bias in questionnaires can be revealed by Bland-Altman plots, while it may remain undetected by the use of correlation coefficients [87]. Therefore, studies using this graphical method may provide additional valuable insights [88]. Of the 57 studies presented in this review, 18 utilized Bland-Altman plots to evaluate agreement based on the mean differences between questionnaire and accelerometer data [21, 23, 28, 35, 36, 40, 42, 47, 51, 52, 56, 57, 60, 62, 68-70, 73]. We present the results of two of these studies as examples; interested readers are advised to consult the references provided above. Dahl-Petersen and colleagues [62] reported the results of Bland-Altman agreement methods and correlation coefficients. They observed moderate validity for questionnaire-based overall PA from the IPAQ compared to accelerometer data (r = 0.20–0.35, P < 0.01). Bland-Altman agreement analyses showed relatively small median differences for all measures of PA; however, moderate-intensity PA reported by the IPAQ was substantially greater than the accelerometer-based measure when walking was included [62]. Similarly, a study in an Asian population [69] showed a higher estimate of self-reported PA using the IPAQ compared to accelerometer data. These examples illustrate that, beyond correlation coefficients, the Bland-Altman method provides additional information on the agreement between questionnaires and accelerometers.
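The quantities underlying a Bland-Altman plot can be computed in a few lines. The sketch below assumes the conventional definition (bias as the mean of the paired differences, limits of agreement as bias ± 1.96 standard deviations of the differences); the data are hypothetical and chosen so that the questionnaire overestimates PA by roughly 30 min per day while still tracking the accelerometer closely.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(method_a) != len(method_b):
        raise ValueError("both methods need one value per participant")
    differences = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(differences)
    half_width = 1.96 * stdev(differences)  # sample standard deviation
    return bias, bias - half_width, bias + half_width


# Hypothetical minutes of moderate PA per day for five participants.
accelerometer_min = [20, 35, 50, 65, 80]
questionnaire_min = [52, 63, 82, 93, 112]

bias, lower, upper = bland_altman(questionnaire_min, accelerometer_min)
print(round(bias, 1), round(lower, 1), round(upper, 1))
```

In this constructed example the two series rise together, so a correlation coefficient would be high, yet the bias of about 30 min per day and limits of agreement that exclude zero reveal a systematic overestimation by the questionnaire. In a full Bland-Altman plot, the differences would additionally be plotted against the mean of the two methods for each participant.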
The strength of this review is the inclusion of more than 50 studies with at least 100 participants which results in increased stability of observed associations. However, the varying measurement conditions and methods complicate comparison of findings from different studies. The reported accelerometry metrics in this review are derived from the individual studies and thus can differ. Questionnaires that assess PA are variable, with differences in number of items, time frame, focus, or background and characteristics of study population, which further complicates comparisons among different studies. A further limitation is that information on the exclusion of PA bouts of less than 10 min, which can have a significant effect on the correlations, was not always available. In this review, only the available correlational information from each investigation was used to compare results from the 57 studies in order to facilitate comparison among all reviewed studies. However, there are also other well-established methods to demonstrate associations (e.g., regression). Furthermore, due to the minimum number of required participants for inclusion into this review, most studies included healthy participants from the general population.
Only eight studies included participants with breast or prostate cancer, arterial diseases, multiple sclerosis, fibromyalgia, total hip arthroplasty, or rheumatoid/osteoarthritis [
26,
32,
33,
67,
73,
89]. PA plays a significant role in the prevention or progression of different diseases [
1]. This illustrates the importance of continued research on PA not only in healthy populations, but particularly in diseased cohorts, in order to establish guidelines for patients or their physicians. Patients diagnosed with a disease such as cancer are often motivated to modify their lifestyles [
15].
This review highlights the need for further research on the assessment of PA in studies. Given the inconsistent correlations, the different aspects measured by questionnaires and accelerometers, and some differences in the dimensions studied, future investigations should ideally use both questionnaires and accelerometers to obtain information that is as accurate and complementary as possible. Guidelines for accelerometer settings, data processing and wear methods are also needed, and the summaries presented in this review may help foster these. As reported in this review, only a few studies investigated PA in diseased populations. Given the importance of PA in the prevention of many diseases, such as cancer, more investigations of PA assessment in diseased populations are needed.