Post 4. Measuring children’s physical activity using accelerometers – overview

Anyone involved in physical activity (PA) and health research knows that actually measuring children’s PA is fraught with challenges, such is the complexity of this behaviour. Kids routinely switch their activity behaviours throughout the day, depending on their routines, social interactions, physical environments, and even the weather! Throughout a typical day kids will accumulate time across the full intensity spectrum, from being sedentary through light, moderate, and vigorous PA. Getting an accurate measurement of these behaviours is difficult, and in practice researchers only end up with ‘estimates’ of PA behaviours and time spent in the different intensities, rather than precise values. This short post will introduce accelerometers as PA measurement tools, and discuss some issues, challenges, and developments in this area.


For many researchers, accelerometers are the accepted method of estimating PA, with kids typically wearing these activity monitors on the wrist, hip, back, or leg (depending on the research questions, protocol, and brand of monitor) for 7 or more days to get an idea of usual PA levels. Accelerometers can produce a massive volume of important information (for example, capturing accelerations/movement 100 times per second over 7 days), which can help us to better understand the patterning of activity behaviours, associations between PA and health outcomes or correlates, changes in PA over time, and the effectiveness of interventions or programmes which aim to increase PA levels. UK public health recommendations for kids are that at least 60 minutes per day should be accumulated in PA of moderate-to-vigorous intensity (commonly referred to as MVPA). Accelerometers are really useful because they record PA at high frequencies and produce time-stamped data, traditionally in the form of movement counts, and more recently as gravity units (more on this later on). This allows researchers to apply cutpoints or threshold values to their data to estimate time spent in MVPA (or other intensities of interest). So, for example, it is common for researchers using the ActiGraph GT3X accelerometer worn at the hip to apply the vertical axis cutpoints developed by Kelly Evenson and colleagues (where MVPA is defined as being equivalent to at least 2296 counts per minute), which are reported to classify kids’ PA of different intensities with acceptable accuracy across a range of ages.
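To make the cutpoint idea concrete, here is a minimal sketch in Python. It assumes the accelerometer data have already been summarised as vertical axis counts in 1-minute epochs (an assumption; many studies use shorter epochs and scale the threshold accordingly). The 2296 counts per minute threshold and the 60-minute target come from the text above; the function names and the example day are hypothetical.

```python
from typing import Sequence

# Moderate-intensity threshold quoted above for hip-worn ActiGraph data
# (vertical axis, counts per minute).
EVENSON_MVPA_CPM = 2296


def mvpa_minutes(counts_per_minute: Sequence[int],
                 threshold: int = EVENSON_MVPA_CPM) -> int:
    """Count the 1-minute epochs at or above the MVPA threshold."""
    return sum(1 for cpm in counts_per_minute if cpm >= threshold)


def meets_guideline(daily_mvpa_minutes: int, target_minutes: int = 60) -> bool:
    """Check a day's MVPA against the 60 minutes per day recommendation."""
    return daily_mvpa_minutes >= target_minutes


# Hypothetical day: mostly sedentary or light, with 65 minutes at or
# above the threshold.
day = [0] * 600 + [1500] * 200 + [2296] * 30 + [4000] * 35
minutes = mvpa_minutes(day)
print(minutes, meets_guideline(minutes))
```

Note that the result depends entirely on the threshold passed in, which is the crux of the problem discussed next.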

An issue though is that many different sets of cutpoints exist, depending on factors like the accelerometer brand and model, sample population of interest, and the protocols used in the different calibration studies from which the cutpoints are derived. This issue has been termed the cutpoint conundrum, and is unhelpful because it can confuse comparisons between study findings and translation of results to public health goals (e.g., prevalence of achieving daily PA recommendations). For example, Mota and colleagues compared two sets of ActiGraph cutpoints applied to the same group of children and found that boys’ MVPA differed by 114 minutes per day, and girls’ differed by 84 minutes per day. When these results were applied to daily PA recommendations of 60 minutes daily MVPA, the differences were vast. 96% of boys and 87% of girls achieved the recommended level using one MVPA cutpoint, but when the other was applied, the prevalence of achieving the guidelines was massively reduced to 17% (boys) and 5% (girls). Other studies have reported similar findings, with a very recent one published in the Scandinavian Journal of Medicine and Science in Sports. The authors compared sedentary time and PA estimates in a group of overweight/obese children. The children wore hip and wrist accelerometers, and a number of outcome metrics based on various cutpoints were used to generate the estimates of sedentary time and PA. Like Mota et al., the authors found that time spent sedentary and in different intensities of PA differed dramatically across cutpoints, depending on the wear location of the accelerometer and the accelerometer outcome metrics. The authors concluded that “it is not possible (and probably never will be) to know the prevalence of meeting the PA guidelines based on accelerometer data since apparent differences range from almost zero to nearly everyone meeting the guidelines”.
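The effect of cutpoint choice on prevalence can be illustrated with a toy example. The sketch below is not a reproduction of the Mota analysis: the two thresholds and the three children are made up purely to show how the same data can give very different answers.

```python
from typing import Sequence


def prevalence_meeting_guideline(children: Sequence[Sequence[int]],
                                 threshold: int,
                                 target_minutes: int = 60) -> float:
    """Percentage of children with at least target_minutes of 1-minute
    epochs at or above the given counts per minute threshold."""
    meeting = 0
    for counts_per_minute in children:
        mvpa = sum(1 for cpm in counts_per_minute if cpm >= threshold)
        if mvpa >= target_minutes:
            meeting += 1
    return 100.0 * meeting / len(children)


# Hypothetical single-day records for three children.
children = [
    [3500] * 70 + [100] * 650,
    [2500] * 90 + [100] * 630,
    [1200] * 120 + [100] * 600,
]

LOW_CUTPOINT = 1000   # hypothetical lenient threshold
HIGH_CUTPOINT = 3000  # hypothetical strict threshold

print(prevalence_meeting_guideline(children, LOW_CUTPOINT))
print(prevalence_meeting_guideline(children, HIGH_CUTPOINT))
```

With the lenient threshold all three children meet the guideline; with the strict one only the first child does, even though nothing about their activity has changed.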

So, the lack of comparability between accelerometer studies due to the cutpoint conundrum makes it harder for research users (particularly students, practitioners, commissioners, and policy makers) to make sense of the evidence base.

So what’s been done to address this issue?

A problem with cutpoints is that they are specific to the sample of people who took part in the cutpoint calibration study (i.e., the research study that generates the cutpoints), and then the cutpoints are used by other researchers and applied to different samples. Cutpoint calibration studies typically involve quite modest sample sizes but they should aim for the participants to be representative of the population of interest (e.g., healthy 8-10 year old boys and girls). Even when this is the case, use of the cutpoints by others introduces some degree of error into the resultant data. Applying published cutpoints can be avoided if studies generate their own population-specific cutpoints, by using kids from the study sample in the cutpoints calibration study. For example Mackintosh et al. generated ActiGraph cutpoints as a sub-study which were subsequently applied to examine the effectiveness of the CHANGE! school-based intervention. An alternative approach is to develop cutpoints for each individual child involved in the study, which is known as individual calibration. This approach requires a pre-study individual calibration phase before the kids wear the accelerometers to estimate their free-living PA. Both approaches place an additional time and resource burden on the research project, and the kids are asked to give more time to the project. However, the trade-off is that in theory, estimates of PA may be more representative of the children’s actual levels.

The last decade has seen efforts to move away from cutpoints, through the advent of machine learning techniques to predict PA mode and intensity. Machine learning involves the construction of algorithms that can learn from, and make predictions about, patterns in data. This approach lends itself to prediction and as such has been applied to PA measurement research involving accelerometers. In 2012 Stewart Trost and colleagues showed that the degree of error in predicting children’s PA intensity (reported as MET values) was lower using an artificial neural network machine learning method, compared to traditional regression-based approaches. This machine learning approach relies on data collection through prescribed PA protocols where the activities being performed are known, and which therefore become ‘labelled data’ (e.g., Pavey et al., 2016). A review of machine learning approaches applied to PA accelerometer data was included in the 2016 review of emerging analytical techniques for objective PA measurement, published by Cain Clark and colleagues. More recently, a systematic review of machine learning approaches used for the validation and calibration of accelerometers highlighted the favourable predictive ability of various machine learning techniques for PA intensity and type derived from raw acceleration data, regardless of wear location. The authors cautioned, though, that the high predictive accuracy achieved in lab-based studies is not reproduced for free-living PA, where accuracy is reduced. ‘Unsupervised’ machine learning that employs non-labelled data may allow better estimates of free-living activity behaviours to be generated, but these techniques are only recently emerging (see van Kuppevelt et al., 2019 as a recent example).

Recent developments in measuring physical activity in children and young people using accelerometers

Historically, accelerometer manufacturers used proprietary acceleration filtering algorithms to convert the raw acceleration signals to ‘counts’. Counts are dimensionless values that researchers use to estimate PA levels from their research participants (i.e., using the counts per minute cutpoint approach). The last decade has seen a growth in the adoption of raw accelerometer data and open source processing, which has had a major impact on the degree to which PA data collected from certain accelerometers can be considered comparable. The introduction of the GENEActiv accelerometer was significant because only raw acceleration data were produced and made available to users, which could then be processed using open source software. ActiGraph files can now also be converted to raw format, and the Axivity accelerometer also produces only raw accelerations and employs open source data processing. As more researchers start to use raw acceleration signals rather than the traditional counts approach, efforts have been made to investigate the ‘backwards compatibility’ of raw acceleration data to counts data reported in the multitude of earlier counts-based studies (see Brond et al. 2018 as an example). Moreover, in response to the growth of raw acceleration data in PA research, open source software has become available to process and analyse these data. The GGIR R package developed by Vincent van Hees has been used extensively with GENEActiv, ActiGraph, and Axivity data and has grown organically to become the application of choice for many researchers using raw acceleration data to study not only PA and sedentary time, but also sleep. For example, GGIR has been used to process data in large scale projects such as the Millennium Cohort Study and UK Biobank.
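The post does not name a specific raw-data metric, so purely as an illustration, the sketch below uses the Euclidean Norm Minus One (ENMO), a widely used summary of raw tri-axial acceleration: the vector magnitude of the three axes, minus 1 g for gravity, with negative values set to zero. The function names, the samples, and the epoch length are hypothetical, and real pipelines such as GGIR also handle calibration and non-wear detection, which are left out here.

```python
import math
from typing import List, Sequence, Tuple


def enmo_mg(samples_g: Sequence[Tuple[float, float, float]]) -> List[float]:
    """Euclidean Norm Minus One per sample, in milli-g, truncated at zero."""
    values = []
    for x, y, z in samples_g:
        magnitude = math.sqrt(x * x + y * y + z * z)
        values.append(max(magnitude - 1.0, 0.0) * 1000.0)
    return values


def epoch_means(values: Sequence[float], samples_per_epoch: int) -> List[float]:
    """Average consecutive samples into epochs, dropping any incomplete
    trailing epoch."""
    usable = len(values) - len(values) % samples_per_epoch
    return [
        sum(values[i:i + samples_per_epoch]) / samples_per_epoch
        for i in range(0, usable, samples_per_epoch)
    ]


# Hypothetical samples in g: lying still, a sharp movement, and a reading
# below 1 g that is truncated to zero.
samples = [(0.0, 0.0, 1.0), (0.0, 0.0, 2.0), (0.0, 0.0, 0.5)]
print(enmo_mg(samples))
```

Because every step is visible, two labs starting from the same raw file should arrive at the same values, which is the point of moving away from proprietary counts.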

In 2018, Rowlands et al. showed that the raw acceleration output from the GENEActiv and Axivity devices was comparable through the full intensity range, with raw ActiGraph output approximately 10% lower. This study is significant because it means that for the first time, research using these three devices can be compared because the ‘black box’ element of the proprietary counts algorithms is removed. The caveat here though is that the researcher decisions related to recording frequency, wear location, non-wear criteria, and acceleration cutpoints also need to be comparable between studies using raw data from these three devices. So, although the introduction of raw accelerations has tremendous potential, the use of cutpoints still presents a barrier to consistent and standardised interpretation and use of the data. To address this issue Rowlands introduced two new accelerometer metrics, termed average acceleration and the intensity gradient, which refer to the volume and the intensity profile of PA, respectively. The rationale for these metrics is that they are based directly on raw accelerations and are therefore not subject to the variation in output produced by application of different cutpoints. Rowlands argues that the metrics are therefore meaningful (relative to health outcomes), interpretable (translatable to public health messages), and comparable (with other studies). In relation to the relevance of these metrics to health outcomes, Rowlands and colleagues demonstrated independent associations with obesity-related outcomes and physical function in adolescent girls and diabetic adults. We found similar results in primary school children when we analysed the associations between the new metrics and a range of health indicators related to obesity, cardiorespiratory fitness, metabolic syndrome risk, and quality of life (manuscript under review).
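As a rough sketch of how these two metrics can be computed, the Python below takes epoch-level accelerations in milli-g. Average acceleration is simply the mean. For the intensity gradient, the sketch assumes the commonly described approach of binning the epochs by intensity and taking the slope of a straight line fitted to the natural log of time in each bin against the natural log of the bin midpoint. The 25 mg bin width, 5-second epoch, 4000 mg ceiling, and all names are assumptions made for illustration, not a definitive implementation of the published method.

```python
import math
from typing import Sequence


def average_acceleration(epoch_mg: Sequence[float]) -> float:
    """Volume of activity: mean acceleration across all epochs (mg)."""
    return sum(epoch_mg) / len(epoch_mg)


def intensity_gradient(epoch_mg: Sequence[float],
                       bin_width_mg: float = 25.0,
                       epoch_seconds: float = 5.0,
                       max_mg: float = 4000.0) -> float:
    """Slope of ln(minutes in bin) against ln(bin midpoint in mg)."""
    n_bins = int(max_mg // bin_width_mg)
    minutes_in_bin = [0.0] * n_bins
    for value in epoch_mg:
        # Clamp to the valid range so outliers land in the first or last bin.
        index = min(int(max(value, 0.0) // bin_width_mg), n_bins - 1)
        minutes_in_bin[index] += epoch_seconds / 60.0

    # Only bins with some time in them can be log-transformed.
    log_intensity, log_time = [], []
    for index, minutes in enumerate(minutes_in_bin):
        if minutes > 0:
            log_intensity.append(math.log((index + 0.5) * bin_width_mg))
            log_time.append(math.log(minutes))
    if len(log_intensity) < 2:
        raise ValueError("need time in at least two intensity bins")

    # Ordinary least squares slope.
    mean_x = sum(log_intensity) / len(log_intensity)
    mean_y = sum(log_time) / len(log_time)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log_intensity, log_time))
    sxx = sum((x - mean_x) ** 2 for x in log_intensity)
    return sxy / sxx


# Hypothetical child: lots of low-intensity time, very little high-intensity.
epochs = [10.0] * 1000 + [40.0] * 100 + [60.0] * 10 + [90.0] * 1
print(average_acceleration(epochs), intensity_gradient(epochs))
```

The gradient is negative because time drops off as intensity rises; a steeper (more negative) slope indicates less time at higher intensities, a shallower one a more even spread. Neither metric needs a cutpoint.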

As the utility of software like GGIR continues to grow and more studies employ raw accelerations to produce PA outcomes, the potential for using these types of new accelerometer metrics in PA research grows too. Standardised raw acceleration data can be interpreted and analysed ‘post-processing’ by the original researchers or by others, thus allowing the data to be studied in various ways, whether this is applying appropriate cutpoints to study prevalence of meeting PA guidelines, or comparing contemporary raw acceleration data to previous research that employed count-based outcomes. Rowlands suggests that as researchers continue to generate more raw acceleration data, there is scope to develop population-referenced age- and sex-specific percentile norms for average acceleration and intensity gradient, in much the same way as has been done for BMI, fitness, and ActiGraph counts. This would be a valuable resource for researchers, clinicians, and practitioners alike.

As the PA, computer science, and engineering research communities continue to develop new ways of processing, analysing and interpreting accelerometer data, one hopes that this will correspond with standardised metrics that are comparable between studies, and which can facilitate the interpretation and application of data to address important PA research questions.
