
Identifying adults' valid waking wear time by automated estimation in activPAL data collected with a 24 h wear protocol


Published 21 September 2016 © 2016 Institute of Physics and Engineering in Medicine
Citation: Elisabeth A H Winkler et al 2016 Physiol. Meas. 37 1653, DOI 10.1088/0967-3334/37/10/1653


Abstract

The activPAL monitor, often worn 24 h d−1, provides accurate classification of sitting/reclining posture. Without validated automated methods, diaries—burdensome to participants and researchers—are commonly used to ensure measures of sedentary behaviour exclude sleep and monitor non-wear.

We developed, for use with 24 h wear protocols in adults, an automated approach to classify activity bouts recorded in activPAL 'Events' files as 'sleep'/non-wear (or not) and as occurring on a valid day (or not). Based on a simple algorithm, the approach excludes long periods without posture change or movement, adjacent low-activity periods, and days with minimal movement and wear. The algorithm was developed in one population (STAND study; overweight/obese adults aged 18–40 years) and then evaluated in AusDiab 2011/12 participants (n = 741, 44% men, aged >35 years, mean ± SD 58.5 ± 10.4 years) who wore the activPAL3 (7 d, 24 h d−1 protocol). Agreement of the algorithm with a monitor-corrected diary method (usual practice) was tested on valid days in terms of the classification of each second as waking wear (kappa; κ) and the average daily waking wear time. The algorithm showed 'almost perfect' agreement (κ > 0.8) for 88% of participants, with a median kappa of 0.94. Agreement varied significantly (p < 0.05, two-tailed) by age (poorer at older ages) but not by gender. On average, estimated wear time was approximately 0.5 h d−1 higher than by the diary method, with 95% limits of agreement of approximately ±2 h d−1 around this mean difference.

In free-living data from Australian adults, a simple algorithm developed in a different population showed 'almost perfect' agreement with the diary method for most individuals (88%). For several purposes (e.g. with wear standardisation), adopting a low-burden, automated approach would be expected to have little impact on data quality. Accuracy for total waking wear time was lower, and algorithm thresholds may require adjustment for older populations.


Original content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

Introduction

Excessive time spent in sedentary behaviours (sitting or reclining while awake with low energy expenditure, ⩽1.5 metabolic equivalents; Sedentary Behaviour Research Network 2012) has been associated with several chronic diseases and premature mortality (Thorp et al 2011, Wilmot et al 2012, Cong et al 2014, Shen et al 2014, Biswas et al 2015). Evidence regarding the health consequences of sedentary behaviour and intervention effectiveness can be improved with the use of monitors that assess time spent in sedentary behaviour objectively and accurately under free-living conditions. The activPAL is a small, unobtrusive, thigh-worn monitor that can meet this need by accurately measuring periods spent in sitting/lying posture (Lyden et al 2012). However, the device output includes periods of sitting/lying that do not constitute sedentary behaviour, such as sleep and non-wear.

The methods researchers have applied for sleep and non-wear removal, as identified in a recent review (Edwardson et al 2016), are varied and mostly high burden, limiting both accuracy and the feasibility of collecting sedentary behaviour measures. For continuous (24 h) wear protocols, usual practice has involved excluding diary-reported sleeping periods (Ryan et al 2011, Alkhajah et al 2012, Craft et al 2012, Gorman et al 2013, Reid et al 2013, Berendsen et al 2014, Aguilar-Farias et al 2015). Lower-burden alternatives have no published validity and have key limitations (Edwardson et al 2016). One study (Godfrey et al 2014) excluded very long sitting/lying bouts (>8 h) from their sitting estimates, claiming they were likely sleep. However, sleep can last <8 h and can be interspersed with movement. Activity has also been examined during assumed waking periods (Godfrey et al 2014, Smith et al 2014, Barreira et al 2015b), such as 08:00–20:00, which are unlikely to apply to every individual every day. Recently, Chastin and colleagues (Chastin et al 2014) estimated each individual's waking day as beginning with the first standing bout after ⩾2 h of continuous sitting/lying during the hours 00:00–09:00, and ending with the first bout of standing before >3 h of sitting/lying after 22:30. However, sleep can be non-nocturnal, as may be the case for older adults with polycyclic sleeping patterns or for shift workers. In overnight-removal protocols, researchers need only identify non-wear. Acceleration data have been used to this end (Harrington et al 2011, Barreira et al 2015a), with an unknown degree of validity.

To address the need for validated, low-burden automated methods, we created a simple automated algorithm to isolate adults' valid waking wear periods within activPAL data collected with a continuous wear protocol. It was developed and refined using data from a study of UK overweight/obese young adults aged 18–40 years, then tested in a large, population-based study of Australian adults aged 35–89 years. In the absence of a feasible gold standard for free-living data, we compared the algorithm with usual practice (a diary-based method). We also considered our validity findings in light of an automated method (van der Berg et al 2016) that emerged after our review (Edwardson et al 2016).

Methods

Development and validation studies conformed to the Declaration of Helsinki. All participants provided written informed consent. Ethics approval was granted by the Nottingham National Health Service Research Ethics Committee (Sedentary Time and Diabetes, STAND) and the Alfred Health Human Ethics Committee (Australian Diabetes Obesity and Lifestyle Study, AusDiab).

Algorithm development study

The STAND study (Wilmot et al 2011) included 187 overweight and obese adults aged 18–40 years (n  =  125 with relevant data). Participants wore the activPAL3 monitor continuously, 24 h d−1, for 10 d. Monitors were waterproofed, then attached to the mid-line anterior aspect of the right thigh by an adhesive medical dressing. Detailed wear and re-attachment instructions were provided to participants along with a paper diary to record the times they went to bed, went to sleep, woke up, arose from bed, and any times they removed their monitor.

Automated algorithm development

The algorithm development process was iterative and collaborative, with a strong element of trial and error. Firstly, the investigators evaluated the current practices in the literature and the procedures and experiences in removing sleep and invalid data that have been used in our studies to date (published and unpublished) employing the activPAL monitor (Edwardson et al 2016). The relevant underpinning principles and the general algorithm rules were determined, considering current practices as well as salient observations about sleep and monitor performance. These are outlined in figure 1. Key decisions, based on the current state of the field, were that the approach should be simple, focus not on 'when' sleep may occur but on 'what' sleep and removals are (for activPAL data), and fulfil immediate, addressable needs. Accordingly, we decided to develop a simple algorithm based on knowledge of the behaviours (sleep, activity, monitor wear), that could be tested using available data. It removes non-wear periods, non-wear days, and what we have termed 'sleep' from the valid waking wear data. 'Sleep' is the broader period the person spends in bed, from 'into-bed' or 'lights out' time to finally awakening or arising from bed, including brief periods out of bed such as to visit the bathroom. We did not aim to provide sub-classifications of the excluded data, such as sleep versus non-wear, or time asleep by biological definitions versus other time in bed.


Figure 1. Outline of considerations in the early stages of algorithm development.


Next, specific rules for the algorithm were discussed, trialled and decided upon based on performance in the STAND study data. Early attempts at algorithms implementing specific rules were trialled and reported at conferences. Coding issues were rectified and ultimately a single set of specific rules was chosen (figure 2). The thresholds for the rules (figure 2) can be adjusted for different populations; those we used are reported here. We implemented and report two versions, described for convenience as versions A and B of the same algorithm. These use the same rules, with minor variations that arose because they were implemented in different software by different coders. Assessing two versions evaluates the robustness of the algorithm to minor differences in how the rules may be applied by different coders and in different software packages. It also provides two sets of freely available source code for use or checking: version A (DB's STATA code; supplementary material 1 (stacks.iop.org/PM/37/1653/mmedia)) and version B (EW's SAS code; supplementary material 2).


Figure 2. An automated approach to estimating valid waking wear periods from activPAL Events data collected in adults using a continuous wear protocolᵃ.


The automated algorithm

Figure 2 summarises the algorithm's general and specific rules and displays the minor differences between the two versions. A glossary (supplementary material 3) contains further information about the key terms and definitions for both versions of the algorithm. The algorithm requires only data that are routinely available in the proprietary activPAL Events files. Events files have a separate row for each continuous period of sitting/lying and of standing, and for each individual step/stride. The algorithm attributes the entirety of each bout to the day on which the bout begins (see supplementary material 3).
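
The bout handling described above can be sketched in Python. This is a minimal illustration only, not the authors' STATA or SAS code: the `Event` and `Bout` record types, the activity codes (0 = sitting/lying, 1 = standing, 2 = stepping) and the merging of consecutive step rows into a single stepping bout are hypothetical assumptions; the glossary in the supplementary material gives the definitions actually used.

```python
from dataclasses import dataclass
from datetime import date, datetime
from itertools import groupby

# Hypothetical activity codes for this sketch (assumed, not read from a real file).
SIT_LIE, STAND, STEP = 0, 1, 2


@dataclass
class Event:
    """One Events-file row (hypothetical record type)."""
    start: datetime
    duration_s: float
    code: int  # SIT_LIE, STAND or STEP


@dataclass
class Bout:
    """Consecutive rows with the same activity code (hypothetical record type)."""
    start: datetime
    duration_s: float
    code: int
    n_events: int  # number of Events rows merged into this bout

    @property
    def day(self) -> date:
        # The entire bout is attributed to the calendar day on which it begins.
        return self.start.date()


def to_bouts(events):
    """Merge consecutive Events rows that share an activity code into bouts.

    Assumes `events` is sorted by start time and contiguous in time.
    """
    bouts = []
    for code, group in groupby(events, key=lambda e: e.code):
        rows = list(group)
        bouts.append(Bout(
            start=rows[0].start,
            duration_s=sum(e.duration_s for e in rows),
            code=code,
            n_events=len(rows),
        ))
    return bouts
```

With this rule, a sitting/lying bout that begins at 23:50 is attributed wholly to the earlier day, even if most of it falls after midnight.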

The algorithm rules are summarised briefly here. The algorithm's first step finds long periods that are most likely to be sleep or non-wear. Sleep/non-wear bouts were identified as (1) the longest bout per 24 h period (noon to noon each day), if it lasted ⩾2 h, or (2) any very long bout lasting ⩾5 h. This allows sleep/non-wear to occur at any time, any number of times (including never) within a 24 h window. Because sleep can register as multiple periods of sitting/lying interspersed with real or erroneously detected posture changes and stepping, the next step iteratively examines surrounding bouts and determines whether they are more likely to be additional sleep/non-wear (limited movement) or waking wear (more movement). Bouts were 'surrounding' if any portion was within a 15 min window before or after a sleep/non-wear bout. All bouts in the window were classed as sleep/non-wear when the window contained any of the following: a sitting/lying or standing bout that is long (⩾2 h), or moderately long (⩾30 min) with very few (⩽20) steps in between; a sleep/non-wear bout; or posture changes without intervening steps. This step repeats until no more sleep/non-wear is found. The third step identifies invalid days from limited wear and movement, using wear criteria typical of the literature and movement criteria loosely based on prior approaches (Mutrie et al 2012). Specifically, days were classed as non-wear if they met any of these criteria: limited variation in activities (⩾95% of waking wear in any one activity); limited stepping (<500 steps); or limited waking wear time (<10 h). The final step is quality control; here, we validated the algorithm against a diary-based method. Other possibilities for quality control (not performed here) are shown in supplementary material 4.
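
As an illustration of the first step only, the sketch below flags the longest bout in each noon-to-noon window when it lasts ⩾2 h, plus any bout lasting ⩾5 h. It assumes each bout is assigned to the window in which it starts and that bouts are supplied as `(start, duration_s)` tuples; the function names are hypothetical, and the iterative second step and any tie-breaking rules in the authors' code are not reproduced.

```python
from datetime import date, datetime, timedelta

TWO_H_S = 2 * 3600   # minimum duration for the longest bout in a window
FIVE_H_S = 5 * 3600  # any bout at least this long is flagged


def noon_window(t: datetime) -> date:
    """Label the noon-to-noon window containing t by the date of its opening noon."""
    return (t - timedelta(hours=12)).date()


def step1_sleep_nonwear(bouts, min_longest_s=TWO_H_S, min_any_s=FIVE_H_S):
    """Return sorted indices of bouts flagged as sleep/non-wear by step 1.

    bouts: list of (start: datetime, duration_s: float) tuples.
    Each bout is assigned to the window in which it starts (an assumption).
    """
    # Rule (2): any very long bout, wherever it falls.
    flagged = {i for i, (_, dur) in enumerate(bouts) if dur >= min_any_s}

    # Rule (1): the longest bout per noon-to-noon window, if long enough.
    longest = {}  # window label -> index of the longest bout seen so far
    for i, (start, dur) in enumerate(bouts):
        w = noon_window(start)
        if w not in longest or dur > bouts[longest[w]][1]:
            longest[w] = i
    flagged.update(i for i in longest.values() if bouts[i][1] >= min_longest_s)
    return sorted(flagged)
```

A 2.5 h overnight bout that is not the longest in its window and lasts <5 h is not flagged at this step; under the rules above, it could still be captured by the iterative second step if it lies near a flagged bout.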

Validation study

The AusDiab study was initiated in 1999/2000 as a general, population-based sample of community-dwelling adults aged ⩾25 years (n = 11247) sampled probabilistically from non-rural areas across Australia by a multistage process (Dunstan et al 2002). In the third follow-up (2011/12), 4614 adults aged ⩾35 years attended the onsite assessment at 46 sites across Australia (Tanamas et al 2013). A subsample of 782 participants was fitted at the onsite assessment with the activPAL3 monitor (77% of the 1014 invited to participate) (Healy et al 2015), and valid data (i.e. at least one valid day of wear by the diary-based method) were obtained from 741 (95% of those provided with a monitor). Waterproofed monitors were affixed on the midline, one third of the way down the thigh, with a breathable hypoallergenic dressing. Trained staff usually affixed the monitors but only checked that the placement was correct for any participant who preferred to self-attach the monitor privately. Participants were asked to wear the monitor at all times over a seven-day period beginning the day after the onsite assessment, and not to remove it (even for showering, bathing, swimming or sleep) unless it was likely to be lost or damaged (e.g. swimming in the ocean). Dressings and swabs to reattach the monitor were provided, along with a diary covering sleep (i.e. 'lights out') and wake times, and monitor removals (if any). The monitors were initialised and downloaded using the activPAL software 6.4.1 (PAL Technologies Ltd, Glasgow, UK). Monitors were initialised either to record immediately or to start recording in advance, from midnight of the first intended wear day.

Diary data were entered into an MS Access database (n = 776 participants, n = 5387 d), checked for missing times and for errors in reported dates (non-consecutive) and times (e.g. am/pm), and converted to the same time zone as the monitor data. Staff estimated missing sleep/wake times from the monitor data (n = 157 participants, n = 299 d) by looking for a single long period, or multiple long periods, of sitting/lying between days. Checks also occurred after processing, using graphs (heatmaps) of activity classifications over time for every participant, with staff re-checking the diary upon encountering suspicious data (e.g. very long periods in the valid data that looked like non-wear). Such periods were classed as wear if the participant had indicated that they did not remove the monitor, and otherwise as non-wear.

Data processing

Data were processed using STATA v14.0 (StataCorp, Texas, USA) for version A, and SAS 9.4 (SAS Institute Inc., Cary, USA) for version B. The comparison method was not a gold standard but a diary-based method, consistent with usual practice (Edwardson et al 2016), with monitor corrections, as reported previously (Healy et al 2015). Monitor corrections based on surrounding movement were used because diary reporting is often imprecise (e.g. a reported wake time of 6 am is unlikely to have occurred at precisely 06:00). Events were initially classed as awake, or as non-wear, if they mostly (i.e. ⩾50%) occurred during the corresponding diary-reported periods (e.g. wake to sleep). For example, with a diary-reported waking period of 6 am–10 pm on a particular day, an event that began at 5:58 am and finished at 6:10 am that day would initially be classed as awake, while an event that began at 5:50 am and ended at 6:02 am that day would not. Non-wear included removals, all time before waking on the first day, and all time after sleep on the last day. Then, the beginnings and ends of the sleep periods initially identified were adjusted so that they did not begin or end until the first or last event lasting at least 20 min. The diary criteria reported previously (Healy et al 2015) contained elements that are not appropriate for comparing diary and automated procedures. For better comparability, diary days were classified as invalid if they had <10 h of waking wear, using the definitions of 'days' as per each algorithm version (see glossary). To align the diary with version A, the entire bout was treated as 'sleep'/non-wear if any event in the bout was 'sleep'/non-wear according to the diary method.
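
The ⩾50% rule for the initial diary classification can be sketched as follows. The function names are hypothetical, diary periods are assumed not to overlap one another, and the later 20 min boundary adjustment is not included; the example reproduces the 6 am–10 pm case given in the text.

```python
from datetime import datetime


def overlap_s(a_start, a_end, b_start, b_end):
    """Seconds of overlap between the intervals [a_start, a_end) and [b_start, b_end)."""
    latest_start = max(a_start, b_start)
    earliest_end = min(a_end, b_end)
    return max(0.0, (earliest_end - latest_start).total_seconds())


def mostly_within(ev_start, ev_end, periods, threshold=0.5):
    """True if at least `threshold` of the event's duration lies inside `periods`.

    periods: list of (start, end) datetimes, assumed not to overlap each other.
    """
    duration = (ev_end - ev_start).total_seconds()
    if duration <= 0:
        return False  # zero-length events are left unclassified in this sketch
    inside = sum(overlap_s(ev_start, ev_end, p0, p1) for p0, p1 in periods)
    return inside / duration >= threshold
```

With a 6 am–10 pm waking period, the event from 5:58 to 6:10 am (10 of 12 min inside) is classed as awake, whereas the event from 5:50 to 6:02 am (2 of 12 min inside) is not.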

Statistical analysis

Analyses were performed in SAS 9.4 and STATA 14.0. Significance was set at p < 0.05, two-tailed. For each individual, we examined agreement in the classification of bouts as valid waking wear (yes/no) using kappa, frequency-weighted by bout duration (rounded to the nearest second, or up to one second where rounding would give zero) to indicate agreement on an approximately second-by-second basis. Differences in agreement by age and gender were examined using a non-parametric test of medians. Agreement in average daily waking wear time was assessed using the Bland–Altman approach, with variation in mean differences and error across average values tested using regression models (Brown and Richmond 2005). To indicate the impact of choosing one data reduction method over another, we estimated sitting, standing and stepping time using each algorithm version and the diary-based method independently. We examined means and standard deviations of time spent in the various activities, with and without standardisation for waking wear time, and the correlations of these algorithm measures with the diary-based measures.
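
A duration-weighted Cohen's kappa for one participant can be sketched as below; weighting each bout by its duration in whole seconds approximates second-by-second agreement. The function names, the Boolean input format and the handling of the degenerate case (both methods assigning one identical class throughout) are assumptions, and the band labels follow the Landis and Koch (1977) cut-points used in table 2.

```python
def weighted_kappa(a, b, durations_s):
    """Cohen's kappa for two binary classifications, frequency-weighted by duration.

    a, b: sequences of bools (True = valid waking wear) from the two methods.
    durations_s: bout durations in seconds.
    """
    # Round to whole seconds (Python rounds halves to even), with a 1 s minimum.
    w = [max(1, round(d)) for d in durations_s]
    total = sum(w)
    # Observed agreement: weighted share of time classified the same way.
    p_obs = sum(wi for ai, bi, wi in zip(a, b, w) if ai == bi) / total
    # Chance agreement from each method's weighted marginal proportions.
    pa = sum(wi for ai, wi in zip(a, w) if ai) / total
    pb = sum(wi for bi, wi in zip(b, w) if bi) / total
    p_exp = pa * pb + (1 - pa) * (1 - pb)
    if p_exp == 1:
        return float("nan")  # both methods assign a single identical class
    return (p_obs - p_exp) / (1 - p_exp)


def landis_koch(k):
    """Agreement band for kappa, using the cut-points in table 2."""
    if k <= 0.2:
        return "slight/none"
    if k <= 0.4:
        return "fair"
    if k <= 0.6:
        return "moderate"
    if k <= 0.8:
        return "substantial"
    return "almost perfect"
```

For example, four bouts totalling about 9 h with a single 60 s disagreement give κ ≈ 0.99 ('almost perfect').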

Results

The validation sample participants (table 1; n = 741) included men (44%) and women across a wide range of ages (36–89 years, median 57 years) and socioeconomic backgrounds, with 37% working full time and 30% retired. Most were born in Australia or New Zealand (81.6%) and very few reported currently smoking (7%). Many were categorised as overweight (43%) or obese (25%). The average BMI (mean ± SD) was 27.6 ± 5.1 kg m−2. There were some small selection biases in age, socioeconomic position and waist circumference. The algorithm development study participants (68% female) were younger (33.8 ± 5.6 years) and heavier (BMI 34.6 ± 8.9 kg m−2) than the validation sample.

Table 1. Characteristics of the validation and development samples.

Characteristic | AusDiab Wave 3 attendees (n = 4614) | Validation sample (n = 741) | pᵃ
Age, years | 59.2 (9.9), 60.5 | 58.5 (10.4), 57.0 | <0.001
Men, n (%) | 2062 (46.0%) | 324 (43.7%) | 0.547
Ethnicity, n (%) | | | 0.058
  Australia/New Zealand | 3618 (79.2%) | 605 (81.6%) |
  Other English speaking | 550 (12.1%) | 81 (10.9%) |
  Other non-English speaking | 446 (8.7%) | 55 (7.4%) |
Married/de facto, n (%) | 3524 (78.5%) | 562 (76.6%) | 0.280
Smoking status, n (%)ᵇ | | | 0.874
  Never smoker | 2527 (55.5%) | 414 (56.3%) |
  Ex-smoker | 1683 (37.9%) | 271 (36.8%) |
  Current smoker | 297 (6.7%) | 51 (6.9%) |
Body mass index, kg m−2 | 27.5 (5.0), 27.5 | 27.6 (5.1), 27.2 | 0.497
Waist circumference, cm | 93.9 (13.8), 94.3 | 93.5 (14.0), 93.8 | 0.036
Employment status, n (%) | | | 0.007
  Full time | 1456 (32.2%) | 272 (36.7%) |
  Part time | 956 (21.0%) | 157 (21.2%) |
  Retired | 1542 (34.7%) | 221 (29.8%) |
  Other not working/missing | 660 (12.1%) | 91 (12.3%) |
Gross household income, n (%) | | | <0.001
  <$30k | 818 (18.2%) | 111 (15.0%) |
  $30k to <$60k | 1033 (23.0%) | 168 (22.7%) |
  $60k to <$100k | 912 (20.3%) | 153 (20.6%) |
  ⩾$100k | 1340 (29.8%) | 260 (35.1%) |
  Refused/don't know/missing | 511 (8.8%) | 49 (6.6%) |

Characteristic | STAND baseline attendees (n = 187) | Development sample (n = 125) | pᵃ
Age, years | 32.8 (5.6), 33.7 | 32.9 (5.5), 33.8 | 0.708
Men, n (%) | 59 (31.6%) | 39 (31.2%) | 0.883
White European ethnicity, n (%) | 150 (80.2%) | 99 (79.2%) | 0.621
Smoking status, n (%) | | | 0.363
  Never smoker | 109 (58.3%) | 76 (60.8%) |
  Ex-smoker | 38 (20.3%) | 26 (20.8%) |
  Current smoker | 40 (21.4%) | 23 (18.4%) |
Body mass index, kg m−2 | 34.6 (4.9), 33.8 | 34.3 (4.7), 33.5 | 0.230
Waist circumference, cm | 103.3 (13.9), 101.0 | 102.5 (13.0), 101.0 | 0.497

ᵃp for difference between included participants (n = 741 in AusDiab; n = 125 in STAND) and those excluded (AusDiab: not selected or did not provide data, n ≈ 3873; STAND: did not provide data, n = 62). ᵇFor AusDiab: current = any amount now and ⩾100 cigarettes in lifetime; ex = none now but ⩾100 cigarettes in lifetime; never = smoked <100 in lifetime (n = 4507 attendees and 736 participants). Values are mean (standard deviation; SD), median or n (%). For AusDiab data, the mean, SD and % are corrected for the complex survey design using survey commands (linearized variance).

Both versions of the algorithm achieved near-identical agreement with the diary-based method in the waking wear (yes/no) classification of each second (table 2). The algorithm achieved high median sensitivity (0.95), specificity (>0.99) and chance-corrected agreement as indicated by kappa (0.94). Agreement was substantial or better (κ > 0.6) for almost all participants (>97%) and was 'almost perfect' (Landis and Koch 1977) for most participants (88%). Agreement with the diary did not vary significantly by gender but varied significantly by age (p < 0.001), with lower (but still good) agreement (median κ > 0.9) in those aged ⩾65 years than in their younger counterparts.

Table 2. Agreement with the diary-based method in the waking wear classification (yes/no) of each second of activPAL data on days valid according to both methods.

n | Statistic | Version A | Version B
741 (all available data)ᵃ | Sensitivity | 0.95 (0.89, 0.98) | 0.95 (0.89, 0.98)
 | Specificity | 1.00 (0.98, 1.00) | 1.00 (0.98, 1.00)
 | κ (kappa) | 0.94 (0.88, 0.97) | 0.94 (0.88, 0.97)
 | Slight/no agreement (κ ⩽ 0.2), n (%) | 1 (0.1%) | 1 (0.1%)
 | Fair agreement (κ > 0.2–0.4), n (%) | 5 (0.7%) | 5 (0.7%)
 | Moderate agreement (κ > 0.4–0.6), n (%) | 14 (1.9%) | 15 (2.0%)
 | Substantial agreement (κ > 0.6–0.8), n (%) | 66 (8.9%) | 64 (8.6%)
 | Almost perfect agreement (κ > 0.8), n (%) | 655 (88.4%) | 656 (88.5%)
717 (reasonable wear compliance)ᵇ | κ, overall | 0.94 (0.88, 0.97) | 0.94 (0.88, 0.97)
 | κ, men (n = 317) | 0.94 (0.88, 0.97) | 0.94 (0.88, 0.97)
 | κ, women (n = 400) | 0.94 (0.88, 0.97) | 0.94 (0.88, 0.98)
 | p for difference (test of medians) | p = 0.967 | p = 0.622
 | κ, 35–44 years (n = 68) | 0.94 (0.91, 0.97) | 0.94 (0.91, 0.97)
 | κ, 45–54 years (n = 201) | 0.95 (0.90, 0.98) | 0.95 (0.90, 0.98)
 | κ, 55–64 years (n = 247) | 0.94 (0.89, 0.98) | 0.94 (0.89, 0.98)
 | κ, 65–74 years (n = 145) | 0.92 (0.84, 0.96) | 0.92 (0.84, 0.96)
 | κ, ⩾75 years (n = 56) | 0.92 (0.87, 0.97) | 0.92 (0.87, 0.97)
 | p for difference (test of medians) | p < 0.001 | p < 0.001

ᵃ⩾1 d classed as valid by both the diary and the algorithm methods. ᵇWith available data for comparison and reasonable compliance with the monitoring (⩾4 valid days by the diary method), for better comparison between population subgroups. Unless stated otherwise, the table presents the median (25th, 75th percentile) of participants' agreement in the classification of activPAL bouts (weighted by duration) by the automated method and the diary-based method.

The same valid/invalid status was assigned to >98% of days that occurred from the start of the diary period onwards (table 3). The pre-diary period was not counted because monitors could plausibly have been worn for ⩾10 waking hours at that time: they were fitted on the day before the first diary day and were sometimes recording then. The algorithm rejected <1% of diary-valid days as invalid and accepted only 3–4% of diary-invalid days as valid. The discrepant classifications were seldom clear algorithm errors (n = 8 d) or diary errors (n = 5 d). Most occurred after the diary period and could reflect algorithm errors or participants continuing to wear the monitor after they stopped filling in their seven-day diary.

Table 3. Classification of each day as valid or invalid by the algorithm (versions A and B) compared with the diary-based methodᵃ.

Period | During and after the period covered by the diary | | | During the period covered by the diary | |
Algorithm classification | Diary invalidᵇ | Diary validᶜ | All | Diary invalid | Diary valid | All
Version A: invalid | 2431 (96.7%) | 24 (0.5%) | 2455 | 76 (67.9%) | 24 (0.5%) | 100
Version A: valid | 84 (3.3%) | 4933 (99.5%) | 5017 | 36 (32.1%) | 4925 (99.5%) | 4961
Version A: total | 2515 | 4957 | 7472 | 112 | 4949 | 5061
Version B: invalid | 2428 (96.5%) | 22 (0.4%) | 2450 | 75 (67.0%) | 22 (0.4%) | 97
Version B: valid | 87 (3.5%) | 4934 (99.6%) | 5021 | 37 (33.0%) | 4926 (99.6%) | 4963
Version B: total | 2515 | 4956 | 7471 | 112 | 4948 | 5060

ᵃThe total number of days varies depending on whether days were classified based on the day a bout or an activPAL event began. ᵇDays valid by the algorithm but invalid by the diary arose for the following (mutually exclusive) reasons: algorithm errors (n = 8 d); diary errors (n = 5 d); and cases where it was unclear which estimate was correct (remaining n = 54 d by version A and n = 57 d by version B). Wear status was unclear for the days after the diary period (n = 48 d by version A and n = 50 d by version B), for days close to the 10 h threshold for waking wear (n = 13 d), and where the difference arose from incompatible definitions (n = 3 d). ᶜDiary-valid days were rejected by the algorithms for these reasons (more than one could apply): wear time was close to the 10 h threshold; an apparently inactive participant did not meet the step count threshold for a valid day; and long periods during which participants did not report a removal were identified as sleep/non-wear.

The mean difference and the random error increased significantly with the average of both measures (all p < 0.001) (figure 3). On average, the algorithm (versions A and B, respectively) significantly overestimated waking wear time relative to the diary by 31 and 32 min d−1 (i.e. 3% of a 16 h waking day), with 95% limits of agreement of −86 to +149 min d−1 and −87 to +150 min d−1 (i.e. −9% to +16% of a 16 h waking day). Limiting to the days valid by both methods, the correlation in average daily waking wear time with the diary-based method was 0.67 (95% CI: 0.62, 0.72) for version A and 0.67 (0.61, 0.72) for version B.
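
The conventional Bland–Altman quantities can be computed as in the sketch below (function names hypothetical). It assumes a constant bias and constant random error; because both increased with the average of the two measures here, the regression-based approach cited in the methods (Brown and Richmond 2005) is the more appropriate analysis, and this sketch only shows the basic calculation. The helper also expresses min d−1 as a percentage of a 16 h waking day, as in the text.

```python
import statistics


def bland_altman(method_a, method_b, z=1.96):
    """Bias (mean of a - b) and 95% limits of agreement.

    Assumes the bias and the SD of the differences are constant across the
    range of values (not the regression-based approach).
    """
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the paired differences
    return bias, bias - z * sd, bias + z * sd


def pct_of_waking_day(minutes_per_day, waking_h=16.0):
    """Express a difference in min/day as a percentage of a waking day."""
    return 100.0 * minutes_per_day / (waking_h * 60.0)
```

With the figures reported above, `pct_of_waking_day(31)` is about 3.2%, and limits of −86 and +149 min d−1 correspond to about −9% and +16% of a 16 h waking day.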


Figure 3. Agreement of the algorithm with the diary in waking wear time per day for version A (a) and version B (b). Waking wear time per day was calculated as the average across the days valid by both the algorithm (version A or B) and the diary.


The mean amounts of waking wear time, sitting, standing and stepping differed only slightly (±0–3%) from those obtained by usual practice (table 4). The algorithm gave slightly higher estimates of mean sitting and lower estimates of mean standing and stepping than the diary-based method, with or without standardising the data for waking wear time. The standard deviations for waking wear time were approximately 20% larger by the algorithm than by the diary method, while those for sitting, standing and stepping differed by ±5% or less. Correlations with the diary-based estimates (table 5) were close to 1 (⩾0.97) for sitting, standing and stepping when standardising for waking wear time and for unstandardised standing and stepping, were strong for unstandardised sitting time (r = 0.88), and were lowest for waking wear time (r = 0.63).
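
Standardising an activity total for waking wear time, as in the 'per 16 h' rows of table 4, amounts to rescaling it to a 16 h waking day. The sketch assumes simple proportional rescaling applied to each valid day and then averaged; this order of averaging, and the function names, are assumptions rather than a description of the authors' procedure. Because it is a ratio, the same function works for hours or minutes.

```python
def standardise_to_16h(activity, waking_wear_h, reference_h=16.0):
    """Rescale an activity total to a reference waking day (per 16 h)."""
    if waking_wear_h <= 0:
        raise ValueError("waking wear time must be positive")
    return activity * reference_h / waking_wear_h


def mean_standardised(days, reference_h=16.0):
    """Average of per-day standardised values (assumed order of operations).

    days: list of (activity, waking_wear_h) pairs, one per valid day.
    """
    values = [standardise_to_16h(a, w, reference_h) for a, w in days]
    return sum(values) / len(values)
```

For instance, 9.1 h of sitting over 15.8 h of waking wear rescales to about 9.2 h per 16 h. A mean of per-day ratios need not equal the same calculation applied to group means.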

Table 4. Descriptive data obtained for waking wear time and activity (sitting, standing and stepping) using each method (diary, algorithm) independently.

Activity | Diary method (n = 741) | Algorithm version A (n = 741) | Algorithm version B (n = 741)
Waking wear, h d−1 | 15.6 (1.0) | 15.8 (1.2) [+1% (+18%)] | 15.8 (1.2) [+1% (+19%)]
Sitting, h d−1 | 8.8 (1.8) | 9.1 (1.9) [+4% (+3%)] | 9.1 (1.9) [+4% (+3%)]
Standing, h d−1 | 4.8 (1.5) | 4.8 (1.5) [−2% (2%)] | 4.8 (1.5) [−1% (2%)]
Stepping, min d−1 | 119.5 (40.4) | 116.7 (38.5) [−2% (−5%)] | 116.6 (38.5) [−3% (−5%)]
Sitting, h 16 h−1 d−1 | 9.0 (1.8) | 9.2 (1.7) [+2% (−4%)] | 9.2 (1.7) [+2% (−4%)]
Standing, h 16 h−1 d−1 | 5.0 (1.5) | 4.8 (1.7) [−3% (4%)] | 4.8 (1.7) [−3% (4%)]
Stepping, min 16 h−1 d−1 | 122.3 (40.0) | 118.2 (38.1) [−2% (−5%)] | 118.1 (38.1) [−3% (−5%)]

Values are mean (standard deviation; SD), estimated with STATA survey commands (linearized variance estimation); square brackets give the % difference from the diary method in the mean (SD).

Table 5. Correlation between estimates of waking wear time and activity (sitting, standing and stepping) produced using each method (algorithm and diary) independently.

Pearson's correlation (95% confidence interval)ᵃ
Activity | Version A (n = 741) | Version B (n = 741)
Waking wear, h d−1 | 0.63 (0.58, 0.69) | 0.63 (0.57, 0.69)
Sitting, h d−1 | 0.88 (0.86, 0.90) | 0.88 (0.86, 0.90)
Standing, h d−1 | 0.98 (0.97, 0.98) | 0.98 (0.97, 0.98)
Stepping, h d−1 | 0.99 (0.98, 0.99) | 0.99 (0.98, 0.99)
Sitting, h 16 h−1 d−1 | 0.97 (0.96, 0.98) | 0.97 (0.96, 0.98)
Standing, h 16 h−1 d−1 | 0.98 (0.97, 0.98) | 0.97 (0.97, 0.98)
Stepping, h 16 h−1 d−1 | 0.98 (0.97, 0.99) | 0.98 (0.98, 0.99)

ᵃEstimated by the cluster-bootstrap method.

Discussion

This study, along with a recent publication (van der Berg et al 2016), presents the first attempts at developing and validating automated estimation methods for isolating waking wear time in activPAL data collected via continuous (24 h) wear protocols. Across a very broad range of adult participants, the algorithm agreed acceptably for most individuals with a referent method that was entirely independent of the algorithm. Notably, this finding was observed in a study population (AusDiab, Australian adults ⩾35 years) that was independent of, and different from, that used for algorithm development (STAND study, UK overweight/obese adults aged 18–40 years), indicating good generalisability. Collectively, these populations covered most adult ages. The two versions, implemented in different software by different coders with slightly different definitions, performed near identically, indicating that the algorithm is fairly robust to minor variations in how it may be implemented.

Without a gold standard, errors in either method can contribute to disagreement. Nonetheless, the algorithm and diary-based method showed a high degree of agreement in many respects. For the valid days of data that would typically be used to examine physical activity and sedentary behaviour, each second was classed similarly as waking wear or not by both methods, with median sensitivity/specificity of 95%/>99% and chance-corrected agreement of κ = 0.94. Agreement was not constant across all levels of wear time. However, in a population similar to AusDiab in terms of waking hours and compliance, we would expect the algorithm's average daily waking wear time to fall within roughly −1.5 to +2.5 h d−1 of the diary estimate for 95% of individuals (e.g. −86 to +149 min d−1 for version A). This was more disagreement than van der Berg and colleagues saw between their algorithm and self-reported waking hours (−1.1, 1.2 h d−1) (van der Berg et al 2016), reflecting either a lower level of accuracy, key differences in the validation process and populations, or both.

Our accuracy was comparable with that achieved with other monitors for detecting 'bed-rest', for example, sensitivity/specificity of 97%/97% (waist-worn ActiGraph accelerometer) and 98%/97% (wrist-worn ActiGraph accelerometer) obtained in a small validation sample of youth in a laboratory setting relative to a whole-room calorimeter (Tracy et al 2014). Our agreement was lower than has been reported for non-wear algorithms relative to their referent criteria, such as limits of agreement of −134 to 143 min d−1 in free-living participants against a diary method (Winkler et al 2012) and −52 to 132 min d−1 against observation in a laboratory setting (Choi et al 2012). This might be expected with a free-living assessment against an imperfect referent, and given the additional difficulty of identifying sleep: movement differs more between non-wear and wear than between sleep and wake.

Older adults move less than younger people during their waking day (Matthews et al 2008) and are particularly prone to sleep problems such as insomnia (Sivertsen et al 2009). A less pronounced movement difference between sleep and wake may have reduced algorithm accuracy in this group. Tailoring the algorithm rules and/or thresholds to the population's movement patterns may improve accuracy. However, the algorithm relies heavily on the assumption that very long periods spent in a single posture predominantly occur during sleep or non-wear. Our algorithm, and any using similar general rules, may therefore have limited accuracy in populations prone to extremely prolonged sitting/lying during their waking hours, who step very little, or who have interrupted sleep patterns.

The algorithm-derived activity measures correlated highly with those derived from the diary method, especially when standardised for waking wear time (correlations ⩾0.97). For many purposes, the practical impact of the choice of method is therefore likely to be minimal. With correlations for waking wear time itself of only 0.6–0.7, however, the impact may be more substantial for methods such as compositional analysis (Chastin et al 2015), which rely on estimates of every waking and sleeping activity. Quality controls may be used to increase accuracy. Even with quality controls, the automated method likely entails less researcher burden than existing practice, avoiding data entry (when paper-based diaries are collected), data cleaning and the need to estimate any unreported sleep or wake times.

Of the days that occurred after the diary ended, our algorithm excluded all but 48 of 2411 (or 50 of 2527) as non-wear, without excessively removing days within the diary period as invalid. This lends some support to the idea that a simple minimum wear rule combined with minimum movement criteria can screen out unwanted data, though thresholds other than 10 h, 95% and 500 steps should be tested and optimised. The 10 h wear rule is based more on common practice than on evidence that it is optimal for removing unwanted data or for providing unbiased coverage of a day. Other criteria could potentially yield less bias and/or more valid days (and hence better reliability).
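The valid-day screen can be written as a small predicate with its thresholds exposed as parameters, which makes testing alternatives to 10 h, 95% and 500 steps straightforward. This sketch assumes the 95% criterion caps the share of waking wear time spent in any single activity class (sitting/lying, standing or stepping); that reading, the hypothetical `DaySummary` fields and the inclusive/exclusive boundaries are assumptions rather than details given in this section.

```python
from dataclasses import dataclass


@dataclass
class DaySummary:
    """Hypothetical per-day totals after sleep/non-wear has been removed."""
    wear_s: float  # waking wear time, seconds
    sitting_lying_s: float
    standing_s: float
    stepping_s: float
    steps: int


def is_valid_day(
    day: DaySummary,
    min_wear_h: float = 10.0,
    max_single_activity: float = 0.95,
    min_steps: int = 500,
) -> bool:
    """Apply a minimum-wear rule plus minimum-movement criteria to one day."""
    if day.wear_s < min_wear_h * 3600:
        return False  # too little waking wear
    if day.steps < min_steps:
        return False  # too few steps
    # ASSUMPTION: the 95% threshold is read as the largest share of waking wear
    # allowed in any one activity class.
    largest = max(day.sitting_lying_s, day.standing_s, day.stepping_s)
    return largest < max_single_activity * day.wear_s


HOUR = 3600
typical = DaySummary(14 * HOUR, 9 * HOUR, 3.5 * HOUR, 1.5 * HOUR, 7000)
short = DaySummary(9 * HOUR, 6 * HOUR, 2 * HOUR, 1 * HOUR, 5000)
motionless = DaySummary(12 * HOUR, 11.5 * HOUR, 0.4 * HOUR, 0.1 * HOUR, 600)
print(is_valid_day(typical), is_valid_day(short), is_valid_day(motionless))  # True False False
print(is_valid_day(short, min_wear_h=8.0))  # True: a laxer wear rule keeps this day
```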

Study strengths included the large, diverse, population-based sample and the assessment of performance under free-living conditions, in a sample not used for algorithm development, over a seven-day continuous wear protocol reflective of usual practice for this monitor (Edwardson et al 2016). Although the sample was not population-representative and had some biases in subsampling and participation, generalisability is likely to be better than that of typical small-scale validity studies. Relatedly, this design entailed an unavoidable weakness in the referent method: direct observation was not feasible, so usual practice was used rather than a gold standard. For the diary method, errors can include data-entry mistakes and participants reporting times incorrectly (e.g. imprecise reporting, am/pm errors, omitting a removal, or reporting a removal that actually applied to the other monitor they wore concurrently). Some disagreement would also be expected by design: the algorithm excludes long periods with limited movement, whereas the diary method excludes all reported monitor removals (regardless of duration or degree of movement) and days outside the main wear period (regardless of whether the participant wore the monitor). Importantly, correlated errors, which overstate agreement (Rennie and Wareham 1998), are unlikely given the very different sources of error for each method.

The findings may not generalise to populations not tested (children, adolescents, the very elderly) or with limited representation in our study (shift workers, people with impaired mobility). Because it has no rules to identify brief removals, the algorithm is not recommended for studies using attachment methods that allow easy removal (e.g. pouches or PAL stickies); additional rules for short removals would be needed. Further improvements may come from more complex algorithms, for example by incorporating acceleration (to detect short removals and to separate sleep from non-wear), or from raw-data approaches, which may yield behavioural classifications not yet available from activPAL data (lying down, actual sleep). Preliminary work in these areas shows promise (Dall et al 2015, Lyden et al 2015).

Conclusion

A simple algorithm isolating valid waking wear time within activPAL events data generated classifications similar (though not identical) to those of usual practice, with a much lower burden. It was robust to some variation in how the rules were implemented. Applying the algorithm to a moderately large epidemiological dataset (n ≈ 700) suggested that, for many purposes, adopting the low-burden algorithm is unlikely to worsen data quality substantially relative to usual practice (a diary-based method), though the accuracy of either method relative to true wake/sleep and wear/non-wear status remains to be established.

Acknowledgments

The research was supported by the National Institute for Health Research (NIHR) Diet, Lifestyle & Physical Activity Biomedical Research Unit based at University Hospitals of Leicester and Loughborough University, the National Institute for Health Research Collaboration for Leadership in Applied Health Research and Care—East Midlands (NIHR CLAHRC—EM) and the Leicester Clinical Trials Unit, United Kingdom. The views expressed are those of the authors and not necessarily those of the National Health Service, the NIHR or the Department of Health.

Elisabeth Winkler was supported by a National Health and Medical Research Council (NHMRC) of Australia Centre for Research Excellence Grant on Sitting Time and Chronic Disease Prevention—Measurement, Mechanisms and Interventions (#1057608). Genevieve Healy was supported by an NHMRC Career Development Fellowship (#1086029). David Dunstan was supported by an NHMRC Senior Research Fellowship (#1078360) and by the Victorian Government's Operational Infrastructure Support Program. Neville Owen was supported by an NHMRC Program Grant (#569940), an NHMRC Centre for Research Excellence Grant (#1057608), an NHMRC Senior Principal Research Fellowship (#1003960) and by the Victorian Government's Operational Infrastructure Support Program.
