In this overview, we aimed to provide a comprehensive catalogue of frailty measures, review the evidence on their validity and reliability, and quantify the use of each measure by investigators other than its originators. We identified 27 frailty scales used in 150 studies to date and made a series of observations. First, although frailty, disability, and comorbidity are inter-related, they are distinct clinical entities [
63,
64]. Integrating disability or comorbidity items into a frailty scale is therefore debatable, as these are not equivalent concepts. Nevertheless, about half of the frailty instruments (n=14) include either disability or comorbidity components [
30,
32‐
34,
36‐
39,
48,
49,
51,
52,
55,
56]. Second, at least five frailty measures [36, 39, 41, 42, 44] were originally created to measure vulnerability, functional status, and physical performance, suggesting a lack of terminological rigor. Third, we observed that four recent scales [
51,
53,
55,
56] are based on existing measures, in particular the Fried scale. Finally, frailty scales can be confused with one another because a single instrument is sometimes given different names in different studies (for example, the Fried scale [47] has on occasion been labelled the Fried Frailty Index [65]). Conversely, several instruments share a name but differ in item content: for instance, the term “frailty index” has been used by different researchers [
34,
43,
54]. This was also the case with “frail scale” [
52,
66].
Assessment of the reliability and validity of frailty measures
The Standards for Educational and Psychological Testing [
67], a guideline describing best practice in the development of measures of complex constructs such as frailty, recommends reporting the basic properties of test construction, such as reliability and validity. However, this information was available for only a few instruments: the CSHA Clinical Frailty Scale [
32] and Edmonton Frail Scale [
52]. Both had acceptable reliability (kappa coefficient ≥ 0.7) and good concurrent and predictive validity. Two other instruments were widely tested for validity but not for reliability: the Frailty Index [
34] and the Fried scale [
47]. Reliability and validity are the most important indicators when selecting one measure over another. However, even among the 7 frailty measures with such information [
33,
35,
37,
40,
43,
49,
52], none appears to be recognized as a “gold standard”. Comparing the performance of different frailty scales in predicting an objective health outcome such as mortality was complicated by adjustment for different confounding factors across studies.
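To make the reliability threshold quoted above concrete, the following minimal sketch computes an unweighted Cohen's kappa for two raters. The function name and the rater data are hypothetical, and the reviewed studies may have used weighted kappa or other agreement statistics; the sketch only illustrates how a value is compared with the 0.7 cut-off.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters classifying the same subjects."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: proportion of subjects given the same category.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed
    # over categories (a category unused by one rater contributes zero).
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of ten participants by two assessors.
rater_a = ["frail", "frail", "robust", "robust", "frail",
           "robust", "robust", "frail", "robust", "robust"]
rater_b = ["frail", "frail", "robust", "robust", "robust",
           "robust", "robust", "frail", "robust", "robust"]

kappa = cohen_kappa(rater_a, rater_b)
acceptable = kappa >= 0.7  # threshold quoted in the text
```

With these invented ratings the assessors disagree on one participant out of ten, which gives a kappa of roughly 0.78.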
In several studies, investigators have examined the inter-relationships between different measures of frailty. For example, the Fried scale has been compared with the Frailty Index [
10,
68,
69] and the Study of Osteoporotic Fractures index [
15,
53] using different methods: correlation analyses [
69], comparison of strength of cross-sectional [
68] and prospective associations [
10,
15], and use of the c-index statistic [
53]. The Fried scale is moderately well correlated with the Frailty Index [
69], and shows a stronger association with age and sex (important criteria of construct validity [
28]) [
68] but a weaker association with mortality [
10]. The Fried scale and the Study of Osteoporotic Fractures index have a similar strength of association with falls, disability, hospitalization [
15] and death [
53]. As Streiner and Norman [
27] highlighted, we found that it was sometimes difficult to determine whether an assessment pertained to concurrent validity or to construct validity. Certain classifications into either category are therefore open to debate.
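As a rough illustration of the c-index mentioned above, the sketch below computes the concordance statistic for a binary outcome: the proportion of (event, non-event) pairs in which the participant with the event has the higher frailty score, with ties counted as one half. The function name and the data are hypothetical, and the cited studies may have used a survival-time version of the statistic instead.

```python
def c_index(scores, events):
    """Concordance statistic for a binary outcome (ties count as 0.5)."""
    concordant = 0.0
    pairs = 0
    for score_i, event_i in zip(scores, events):
        if not event_i:
            continue  # the first member of each pair must have had the event
        for score_j, event_j in zip(scores, events):
            if event_j:
                continue  # the second member must not have had the event
            pairs += 1
            if score_i > score_j:
                concordant += 1.0
            elif score_i == score_j:
                concordant += 0.5
    if pairs == 0:
        raise ValueError("need at least one event and one non-event")
    return concordant / pairs


# Hypothetical frailty scores (0-5) and mortality indicators (1 = died).
scores = [0, 1, 1, 2, 3, 3, 4, 5]
died = [0, 0, 0, 0, 1, 0, 1, 1]

c = c_index(scores, died)
```

A value of 0.5 corresponds to discrimination no better than chance and 1.0 to perfect discrimination, so two scales can be compared on the same cohort by computing this statistic for each.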
Use of the frailty instruments
We attempted to assess the use of each frailty instrument by counting the publications in which investigators other than the original creators had adopted it. The two instruments whose external validity has been most extensively evaluated against adverse health outcomes are those developed by the Fried group (Phenotype of Frailty) and the Mitnitski group (Frailty Index). These are based on two different conceptual frameworks. The Fried group has suggested that frailty represents a phenotype reflecting underlying age-related changes in multiple systems. By contrast, the Mitnitski group proposes that frailty is the accumulation of multiple deficits, with the degree of frailty denoted by the number of such deficits. This highlights that, although some investigators recognize that frailty, comorbidity, and disability are distinct entities [
28,
47,
70], others regard them as overlapping. Most reviews or editorials on frailty have implicitly presented the Phenotype of Frailty as the standard [
63,
71‐
81] whereas for others the standard is the Frailty Index [
82,
83]. Recommendations from other researchers are more nuanced. For Sternberg and colleagues [
84], the choice depends on the definition and outcomes that best suit the investigators or clinicians responsible for the screening. The European, Canadian and American Geriatric Advisory Panel [
66] recommends using a hybrid measure, the “FRAIL” scale, comprising components from both the Phenotype of Frailty and the Frailty Index.
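The deficit-accumulation framework described above can be sketched in a few lines. The example counts the deficits present in one participant and also expresses the count as a proportion of the deficits assessed, which is the usual way such an index is reported. The deficit names and values are invented for illustration and do not reproduce the item list of any published instrument.

```python
def deficit_count(deficits):
    """Number of deficits recorded as present."""
    return sum(1 for present in deficits.values() if present)


def frailty_index(deficits):
    """Proportion of assessed deficits that are present (0 to 1)."""
    if not deficits:
        raise ValueError("at least one deficit must be assessed")
    return deficit_count(deficits) / len(deficits)


# Hypothetical participant assessed on ten illustrative deficits.
participant = {
    "hypertension": True,
    "diabetes": False,
    "impaired_vision": True,
    "impaired_hearing": False,
    "difficulty_bathing": False,
    "difficulty_dressing": False,
    "low_mood": True,
    "memory_complaint": False,
    "recent_fall": False,
    "polypharmacy": False,
}

n_deficits = deficit_count(participant)  # 3 deficits present
fi = frailty_index(participant)          # 3 of 10 assessed
```

Under this view frailty is a graded quantity, in contrast to a phenotype defined by a fixed set of criteria.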
The Fried scale [
47] has been the most extensively tested for its validity and is the most widely used instrument in frailty research [
65,
78,
85‐
134]. Randomized controlled trials have also used the scale to screen elderly participants [
24,
25,
135‐
140], or as an outcome of interventions [
22,
23,
139]. Because the Fried scale is so widely used, it allows comparisons to be made between studies.
The main limitation of our assessment of the use of these instruments is that it penalizes the more recently published frailty instruments. However, the Fried scale is not the oldest measure in the field, and several more recent frailty instruments are either derived from or similar to that measure, suggesting that qualities other than duration of availability explain the popularity of this instrument. Another limitation is that we did not exclude articles that may have resulted from the original authors’ circle of influence. For example, some of the articles reporting use of the Fried scale may have been produced by former co-workers who had previously used the CHS data, the dataset in which the Fried scale was first validated.
In spite of its wide use, the Fried scale has some drawbacks common to other frailty instruments. Chiefly, different studies classify the individual components differently. For example, in the Cardiovascular Health Study (CHS), participants were considered positive for weight loss if they reported having unintentionally lost more than 10 pounds in the last year or if they had objectively lost 5% or more of the previous year’s body weight [
47]. In the Women’s Health and Aging Study-I, however, a cut-off of 10% in comparison with the self-reported weight at age 60 years [
4] was used. These important variations in the operationalization of frailty measurement make comparisons of findings between studies problematic.
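The practical effect of these two operationalizations can be shown with a small sketch. The function names, argument names and participant values are hypothetical, and treating the 10% cut-off as "at least 10% lost" is an assumption, since the text above gives only the cut-off value. The point is simply that the same person can be classified differently under the two rules.

```python
def chs_weight_loss(self_reported_unintentional_loss_lb,
                    weight_prev_year, weight_now):
    """CHS-style rule as described in the text: more than 10 lb lost
    unintentionally (self-report) OR a measured loss of at least 5%
    of the previous year's body weight."""
    measured_loss = (weight_prev_year - weight_now) / weight_prev_year
    return self_reported_unintentional_loss_lb > 10 or measured_loss >= 0.05


def whas_weight_loss(weight_at_60, weight_now, cutoff=0.10):
    """WHAS-I-style rule as described in the text: loss relative to the
    self-reported weight at age 60 (assumed here to mean >= cutoff)."""
    return (weight_at_60 - weight_now) / weight_at_60 >= cutoff


# Hypothetical participant: 150 lb at age 60 and last year, 141 lb now,
# reporting 8 lb of unintentional weight loss.
positive_chs = chs_weight_loss(8, weight_prev_year=150, weight_now=141)
positive_whas = whas_weight_loss(weight_at_60=150, weight_now=141)
```

Here the measured loss is 6%, so the participant meets the CHS-style criterion but not the 10% WHAS-I-style criterion.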
In addition to the manual counting procedure used to estimate the use of the frailty instruments, we also examined the number of citations in original research articles (excluding citations by the creators of a given frailty instrument) for the 27 papers describing the frailty instruments. Although the citation ranking of some instruments differed from the ranking obtained by manual counting, the paper on the Fried scale remained the most highly cited. Citation counts are easy to obtain, but this electronic database search cannot replace the manual counting method, because papers citing the original articles do not necessarily use the tool in question.
Among previously published reviews [
66,
83,
84,
141‐
145] on frailty measures, only one [
83] assessed them in terms of reliability and validity. Compared with the paper by De Vries and colleagues [
83], our review presents additional strengths. First, to evaluate the reliability and validity of a given instrument, we extracted data from other studies, reflecting its level of external validation. Second, to our knowledge, no article has been published on the extent to which frailty measures have been used by other researchers. The extent of use might reflect researchers’ level of preference for a given frailty measure in the absence of a consensually recognized tool. Moreover, we identified 18 other frailty instruments [
30,
32,
35‐
38,
40‐
46,
48,
52,
54‐
56], 5 of them created in 2010 or later. A limitation of our review may lie in the use of a single keyword, “frailty”, to identify relevant publications on frailty measures. Such a strategy might be considered restrictive, as it could miss some screening tools that help to identify frail elderly people. However, we included frailty instruments similar to those covered in recent reviews [
83,
84].