Introduction
As the elderly population expands, health and social costs are expected to rise as conditions associated with low bone mass (e.g., osteopenia and osteoporosis) become more common. Osteoporosis is the most common age-associated disease of the musculoskeletal system, with over 22 million women and 5.5 million men in the European Union estimated to have the condition [1]. The major clinical consequence of osteoporosis is fragility fractures, which affect around 35% of women and 17% of men in the United Kingdom (UK) and the rest of Europe [2]. In addition to this large health burden, the economic burden of fragility fractures is substantial, with a recent review estimating associated costs of $17.9 billion and £4 billion per annum in North America and the UK, respectively [3].
Diagnostic criteria for osteopenia and osteoporosis were developed by the World Health Organisation [4], leading to operational definitions based upon standardised bone mineral density (BMD) assessments (T scores) at various skeletal sites. Under these criteria, a BMD T score between −1.0 and −2.5 indicates osteopenia, and a T score of −2.5 or lower indicates osteoporosis [4]. Currently, the criterion method for BMD measurement is dual energy X-ray absorptiometry (DXA), with diagnostic confirmation preferably sought prior to any pharmacological intervention. BMD measurement using DXA may not, however, be as widely available as desired due to the high cost of the equipment, the need for supporting resources (including trained operators), and the need for a relatively permanent location due to limited transportability. Additionally, DXA emits a dose of ionising radiation, and recommendations are that exposure be kept as low as reasonably achievable [5]. Given the limiting factors of DXA and the need to screen individuals worldwide, often in rural and resource-constrained areas [6], to prevent large-scale underdiagnosis of osteoporosis, other technologies have been developed in the hope that they can provide suitable estimates of BMD without the associated costs, resource implications, and potential harms. One such technology is quantitative ultrasound (QUS), which provides a non-invasive method to estimate BMD and other potentially relevant bone structural characteristics at peripheral skeletal sites [7]. There are several types of QUS devices available (for a review, see [8]), each measuring the velocity of transmission and amplitude of the ultrasound signal at specific skeletal sites, such as the hand or the heel and, less commonly, the tibia [7]. The broadband ultrasound attenuation (BUA) measured by QUS is influenced by the density, architecture, and elasticity of the bone tissue (for reviews, see [9, 10]).
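The T score thresholds described above can be illustrated with a minimal sketch. The function name `classify_t_score` is hypothetical, and treating a T score of exactly −1.0 as normal is an assumption, since the text only states "between −1.0 and −2.5" for osteopenia.

```python
def classify_t_score(t_score: float) -> str:
    """Classify a BMD T score using the thresholds quoted in the text."""
    if t_score <= -2.5:
        # A T score of -2.5 or lower indicates osteoporosis.
        return "osteoporosis"
    if t_score < -1.0:
        # Assumption: a T score of exactly -1.0 is treated as normal.
        return "osteopenia"
    return "normal"


# Illustrative values only (not study data).
for t in (-3.1, -2.5, -1.7, -1.0, 0.4):
    print(t, classify_t_score(t))
```

The sketch is intended only to make the cut-offs concrete; it says nothing about whether these DXA-derived thresholds are appropriate for QUS measurements.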
Whilst the potential benefits of QUS in terms of cost and accessibility are clear, particularly in paediatric populations and in areas of lower socioeconomic status, the method makes no direct measurement of bone mass. Systematic reviews have identified an association between QUS-derived measurements and risk of overall fragility fracture [11, 12], although there remain concerns regarding the precision of QUS, with measurement accuracy known to be highly susceptible to factors such as the thickness of the overlying tissue and the orientation of the probe [13]. In addition, comparative analyses of bone outcomes between QUS and DXA (as the reference) have produced inconsistent findings. To better assess the suitability of QUS for measuring and screening BMD, large-scale studies, which at present remain limited, are required across different populations to accurately quantify reliability and agreement. Recently, Nguyen et al. [14] quantified associations between calcaneal QUS and DXA in 1270 women and 773 men as part of the Vietnam Osteoporosis Study. Only modest correlations were obtained between BUA and BMD measured at the lumbar spine (r = 0.34) and femoral neck (r = 0.35), leading the authors to conclude that QUS was limited and unlikely to be suitable for osteoporosis screening. In a much smaller study, Weeks et al. [15] compared calcaneal BUA and DXA measurements of bone mass in 389 children aged between 4 and 18 years. Weeks et al. [15] also reported modest correlations (r = 0.46 to 0.54), with poor agreement between quartile rankings from the different measurements (27.3 to 38.2%). The conclusions of Weeks et al. [15] and Nguyen et al. [14] that QUS is not an appropriate tool for screening contrast with those of other authors, who state that QUS should be considered an accurate diagnostic tool, at least for certain populations such as postmenopausal women [16]. More recent recommendations have called for QUS to be used as a pre-screening tool to reduce the number of DXA screenings required [17]. Given the divergent opinions on the use of QUS, there is still a need for further research, including large-scale studies that investigate the factors that may influence the appropriateness of QUS. One area that has received limited study is the reliability of QUS and the factors that may influence measurement error. Herein, we aimed to further explore the potential utility of BUA estimations of BMD, first by quantifying reliability and second by quantifying agreement with measures of BMD taken from DXA in a large sample. To do this, we made use of data available through the UK Biobank, which allowed these questions to be explored in a very large population cohort.
Discussion
Herein, we aimed to determine whether calcaneal QUS could be used to produce reliable and informative data regarding BMD, considering the need for quicker, less expensive, less resource-intensive, and less invasive methods that could be used at a population level. To achieve this, we made use of data available through the UK Biobank and first sought to determine the reliability of QUS estimates of BMD using data taken from the left and right heel. Second, we sought to quantify agreement between BMD measurements recorded with QUS and those taken from the reference DXA. Measured on the absolute scale, QUS appeared to be reliable, and most consistent for women and those with lower BMD measurements. Reliability decreased when measured on standardised scales, with large variation to be expected for BMD T scores. Similarly, when expressed in quartiles, a substantive proportion of individuals should be expected to move between adjacent quartiles. Low to modest correlations were obtained between QUS variables and DXA BMD regardless of sex and region. These low correlations were accompanied by poor diagnostic performance, with low sensitivity and positive predictive value (PPV) for both osteopenia and osteoporosis diagnoses. Collectively, the results indicate that absolute QUS BMD data are reliable, but that these values are unlikely to provide an accurate reflection of BMD of the whole body or of BMD at sites of clinical interest, such as the hip or lumbar spine. As such, the T score thresholds identified for DXA BMD would not seem to provide appropriate diagnostic criteria for QUS.
In this large population of middle-aged men (n = 100,065; aged 58.7 ± 8.6 years) and women (n = 116,688; aged 57.8 ± 8.3 years), reliability of QUS BMD was assessed by comparing values between the left and right heel measured within the same testing session. Therefore, differences in measurements would be caused by both measurement error and true differences in BMD, which provides an upper bound determination of intra-session reliability. Given this limitation, the current study is not able to precisely quantify the actual intra-session reliability on a single heel or, indeed, the more important inter-session reliability on a single heel. From a practical perspective, however, variation in QUS BMD measurements between the left and right heel should be low if the measurement is to provide a stable and representative indication of bone health. When expressed in absolute units, the standard deviation of the difference between left and right QUS BMD was equal to 0.12 g·cm⁻² for men and 0.07 g·cm⁻² for women, indicating relatively small variation given central values of approximately 0.70 g·cm⁻² and maximum values of approximately 1.5 g·cm⁻². In contrast, differences between the left and right heel appeared large and potentially unsuitable when expressed in standardised units. When expressed as a Z score, the standard deviation of differences was equal to 0.78 for men and 0.57 for women. From these initial values, we should expect 95% of QUS BMD Z scores obtained from the left and right heel to vary within ±1.1 (i.e., 1.96·√2⁻¹·0.78) for men and ±0.79 (i.e., 1.96·√2⁻¹·0.57) for women [25]. However, results from distributional regression analyses identified the existence of heteroscedasticity, such that variation in all QUS BMD variables between the left and right heel was influenced by both sex and average value, with greater variation for men and for participants with larger BMD values. Similarly, concordance analysis cast doubt upon the reliability of QUS BMD measurements when considering participants on standardised scales. The analyses identified that a substantive proportion of individuals (~35%) should be expected to change quartile ranking based upon measurement of the left and right heel. Collectively, these results indicate that, whilst the change in absolute measurement between the left and right heel may be reasonable, BMD measurements from a relatively homogeneous middle-aged population are sufficiently tightly clustered that this variation can induce substantive differences in any ranking type of assessment.
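The arithmetic behind the ±1.1 and ±0.79 values quoted above can be reproduced with a short sketch. The function name `expected_half_width` is hypothetical; the formula (1.96 multiplied by the standard deviation of differences and divided by √2) is taken directly from the expressions in the text, and the sketch makes no attempt to model the heteroscedasticity also described there.

```python
import math


def expected_half_width(sd_diff: float, z: float = 1.96) -> float:
    """Half-width of the expected 95% range: z * sd_diff / sqrt(2)."""
    return z * sd_diff / math.sqrt(2)


# Standard deviations of left-right Z score differences reported in the text.
men = expected_half_width(0.78)
women = expected_half_width(0.57)

print(f"men: ±{men:.2f}, women: ±{women:.2f}")  # men: ±1.08, women: ±0.79
```

Rounded to one decimal place, the value for men matches the ±1.1 reported above.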
Comparisons between QUS variables and DXA were generally consistent with those reported in previous studies [14, 15]. Correlations obtained in the present study were slightly higher for comparisons between QUS variables and total BMD than for comparisons that included lumbar spine or femoral neck BMD (Table 5). Slightly higher correlations were also obtained for women compared with men. Across all analyses, however, correlations were low to modest, ranging from approximately 0.30 to 0.45. Nguyen et al. [14] reported correlations of approximately 0.35 for BUA and DXA BMD measured at the lumbar spine and femoral neck. The analyses were part of the Vietnam Osteoporosis Study comprising 1270 women and 773 men, with a mean age of approximately 45 years, but with a greater age range (including participants as young as 18 years) compared with the present study. Nguyen et al. [14] proposed that the relatively low correlations were unsurprising given the fundamental differences in technologies and the differences in measurement sites, with the calcaneus comprising a lower proportion of cortical bone and being subjected to a very different loading milieu to that of the proximal femur or lumbar spine.
In addition to investigating correlations between QUS-derived variables and DXA BMD, we also investigated differences in BMD values when the two measurement devices were placed on the same standardised scale (e.g., Z scores). Analyses identified that, for both men and women, standard deviations of difference scores were approximately equal to 1.1. From these initial values, we should expect 95% of the differences in BMD Z scores between QUS and DXA to fall within ±1.5 (i.e., 1.96·√2⁻¹·1.1). The upper bounds of this interval represent a large difference in the placement of a participant within a population, thus demonstrating poor criterion agreement. Additionally, analyses identified the presence of heteroscedasticity, such that those with higher DXA BMD values would experience greater variation in their QUS BMD Z scores. For example, a man with a DXA BMD Z score of 1.5 should expect a standard deviation of difference scores of approximately 1.4 for total body or lumbar spine (Table 6), leading to QUS BMD Z scores expected to range between −0.4 and 3.4, further demonstrating poor agreement.
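The worked example above (a DXA BMD Z score of 1.5 with a standard deviation of difference scores of approximately 1.4) can be checked with a minimal sketch. The function name `expected_qus_z_range` is hypothetical, and centring the range on the DXA Z score is an assumption inferred from the −0.4 to 3.4 interval quoted in the text.

```python
import math


def expected_qus_z_range(dxa_z: float, sd_diff: float, z: float = 1.96):
    """Expected 95% range of QUS Z scores around a given DXA Z score.

    Assumption: the range is centred on the DXA Z score, with half-width
    z * sd_diff / sqrt(2), as in the expressions quoted in the text.
    """
    half_width = z * sd_diff / math.sqrt(2)
    return dxa_z - half_width, dxa_z + half_width


low, high = expected_qus_z_range(dxa_z=1.5, sd_diff=1.4)
print(f"{low:.1f} to {high:.1f}")  # -0.4 to 3.4
```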
The modest correlations and large potential differences between BMD scores reported herein culminated in poor diagnostic performance of QUS for osteopenia and osteoporosis. Similar to previous studies [21], a higher prevalence of osteopenia and osteoporosis was obtained using DXA BMD values from analyses at the lumbar spine and femoral neck compared with the whole body. Correspondingly, higher sensitivity was obtained when using total body DXA BMD as the reference for both osteopenia (0.62) and osteoporosis (0.23) when compared with using the lumbar spine (0.40 and 0.04) or femoral neck (0.37 and 0.05). In contrast, specificity was high for both osteopenia (0.81 to 0.85) and osteoporosis (0.99). A recent systematic review investigating QUS osteoporosis diagnostic performance in postmenopausal women concluded that QUS should be considered an accurate diagnostic tool [16]. The review included 15 studies with sample sizes ranging from n = 43 to n = 1132. The mean sensitivity value was equal to 0.73 ± 0.21 and the mean specificity value equal to 0.65 ± 0.18 [16]. Most sensitivity and specificity values were, however, obtained after setting a T score threshold that optimised diagnostic performance in the reporting sample, meaning that the diagnostic performance would likely be inflated. The authors identified that diagnostic performance is likely to be influenced by the QUS device used and by the prevalence of osteoporosis in the population, and that, to achieve appropriate results, T score thresholds distinct from those used for DXA would be required [16].
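For readers less familiar with the diagnostic metrics discussed above, the following sketch shows how sensitivity, specificity, and positive predictive value could be computed when QUS T scores are compared against a DXA reference at a common threshold. The function name `diagnostic_performance` and the T scores in the example are hypothetical and for illustration only; they are not study data, and this is not presented as the analysis pipeline used in the study.

```python
def diagnostic_performance(index_t, reference_t, threshold=-2.5):
    """Sensitivity, specificity and PPV of an index test against a reference.

    A measurement counts as positive when its T score is at or below
    `threshold` (here -2.5, the osteoporosis cut-off quoted in the text).
    """
    tp = fp = fn = tn = 0
    for idx, ref in zip(index_t, reference_t):
        predicted = idx <= threshold  # index test (e.g., QUS) positive
        actual = ref <= threshold     # reference test (e.g., DXA) positive
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        # Return NaN rather than raising when a denominator is zero.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
    }


# Hypothetical paired T scores (illustration only, not study data).
qus_t = [-2.8, -1.2, -0.5, -2.6, -1.9, 0.3]
dxa_t = [-2.9, -2.7, -0.8, -1.4, -2.6, 0.1]
result = diagnostic_performance(qus_t, dxa_t)
print(result)
```

Changing `threshold` for the index test alone would be one way to explore device-specific cut-offs of the kind the review authors suggest, although the sketch applies a single threshold to both measurements for simplicity.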
In conclusion, despite concerns that QUS and DXA measure very different qualities, QUS is routinely used and evaluated for its potential as a diagnostic tool [6], as it represents a safer, lower cost, lower resource, and more portable alternative to DXA. The present study comprises one of the largest and most comprehensive analyses of QUS and, despite the many practical advantages offered by the technology, several limitations must be acknowledged. QUS demonstrates only low to modest correlations with DXA BMD values; however, researchers have identified that correlations may be influenced by the specific QUS and DXA scanners compared, as no studies provide standardised equations such as those that exist between major DXA manufacturers [26, 27]. In addition, reliability of QUS BMD measurements may be limited, especially for men exhibiting larger values, or when results are expressed in standardised scales such as Z scores, T scores, or quartiles. Furthermore, osteopenia and osteoporosis diagnostic performance of QUS may be limited, depending upon a range of factors including prevalence in the population. To achieve appropriate diagnostic performance, research suggests that development of QUS-specific threshold values is required.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.