Background
It is well known that falls are a major contributor to mortality, morbidity and costs in our aging population. They are especially frequent in the elderly: 25.1% of men and 37% of women aged 65 years and above fall at least once within 12 months [1]. The highest incidence is reported for geriatric inpatients [1], who often have several risk factors [2] at the same time and suffer from multiple diseases: the Berlin Aging Study reported that 88% of persons aged 70 years and above have five or more somatic diseases [3]. As fall events and their consequences are very costly, at an estimated $19.2 billion per year in the U.S. [4], preventive measures have been investigated intensively [5]. These measures are themselves costly, which raises two predominant questions: Who should be treated in the first place, and who should receive which kind of preventive measure?
In order to identify persons at risk of falling, who are thus eligible for preventive treatment, many risk assessment tools, e.g. the Timed Up&Go test (TUG) [6] or the St. Thomas Risk Assessment Tool in Falling Elderly Inpatients (STRATIFY) [7], have been developed and evaluated in a multitude of studies. Comprehensive reviews can be found e.g. in [2, 8, 9]. Several tests have also been used to predict falls in outpatients, often in a specific group of patients. In a 12-month prospective study of 79 patients with a diagnosis of cognitive impairment, Kikuchi et al. report that only their fall-predicting score, a self-answered 21-item questionnaire, was predictive of future falls [10], whereas e.g. the TUG was not. In contrast, the TUG was the only predictive parameter for falls in patients after hip surgery in a 6-month prospective study by Kristensen et al. [11]. Hale et al. found that mobility scores were not associated with falls in a 12-month prospective study of 120 geriatric outpatients, but a history of falls was [12]. Oliver et al. conclude that even the best tools are not able to identify the majority of fallers [9]. Given this, and given that fall risk assessment tests are often time-consuming (e.g. the Performance-Oriented Mobility Assessment, POMA [13]) and frequently require expert knowledge, several research groups have proposed a sensor-based automatic or semi-automatic assessment using wearable inertial sensors [14-17]. Apart from offering continuous and objective data, this approach may also serve to detect fall events once they have happened, which matters because many falls go undetected and a person may lie injured in her or his flat for hours or even days. Despite promising first results of the sensor-based approach developed by the authors [18], it remains unclear how well the new methods perform in comparison with conventional fall risk assessment tools.
Therefore, the aim of this paper is to examine the predictive performance of our new sensor-based method for fall risk assessment in comparison with conventional, established methods. The comparison is based on one-year follow-up data obtained in a prospective study.
Results
Tables 1, 2 and 3 show the results of the single fall risk assessment tests for the prediction of actual fall events within a year after discharge from the geriatric ward. In Table 4, all +LR values are presented. The STRATIFY score (Table 1), a dedicated fall risk tool, has an overall classification accuracy of 48% with a good sensitivity of 79% but a low specificity of 26%. While the NPV is 63%, the PPV is only 43%, meaning that a positive assessment result does not predict actual falls well.
Table 1
Classification results and contingency table for the STRATIFY score [7] (cut-off point ≥ 2 points)
measure | value |
classification accuracy | 48% |
sensitivity | 79% |
specificity | 26% |
negative predictive value | 63% |
positive predictive value | 43% |
contingency table | fall within one year: yes | fall within one year: no | sum |
pred. yes | 15 | 20 | 35 |
pred. no | 4 | 7 | 11 |
sum | 19 | 27 | 46 |
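The performance measures reported alongside each contingency table follow directly from its four cell counts. The following is a minimal sketch, not the authors' code: the class name `ContingencyTable` and its method names are illustrative assumptions, and the counts are those of Table 1. Values recomputed this way can differ from the reported ones by about one percentage point because of rounding (for example, the NPV here is 7/11, i.e. about 63.6%).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContingencyTable:
    """2x2 table of predicted fall risk versus observed falls (hypothetical helper)."""
    tp: int  # predicted "yes", fell within one year
    fp: int  # predicted "yes", did not fall
    fn: int  # predicted "no", fell within one year
    tn: int  # predicted "no", did not fall

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    def positive_lr(self) -> float:
        # +LR = sensitivity / (1 - specificity)
        return self.sensitivity() / (1.0 - self.specificity())


# Cell counts taken from Table 1 (STRATIFY score).
stratify = ContingencyTable(tp=15, fp=20, fn=4, tn=7)

print(f"accuracy    {stratify.accuracy():.1%}")
print(f"sensitivity {stratify.sensitivity():.1%}")
print(f"specificity {stratify.specificity():.1%}")
print(f"PPV         {stratify.ppv():.1%}")
print(f"NPV         {stratify.npv():.1%}")
print(f"+LR         {stratify.positive_lr():.2f}")
```

The same helper can be applied to the counts of the other contingency tables by changing the four arguments.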
Table 2
Classification results and contingency table for the Timed Up&Go test [6] (cut-off point > 20s)
measure | value |
classification accuracy | 50% |
sensitivity | 90% |
specificity | 22% |
negative predictive value | 75% |
positive predictive value | 45% |
contingency table | fall within one year: yes | fall within one year: no | sum |
pred. yes | 17 | 21 | 38 |
pred. no | 2 | 6 | 8 |
sum | 19 | 27 | 46 |
Table 3
Classification results and contingency table for multidisciplinary geriatric team fall risk score (4 missing values)
measure | value |
classification accuracy | 55% |
sensitivity | 63% |
specificity | 50% |
negative predictive value | 68% |
positive predictive value | 44% |
contingency table | fall within one year: yes | fall within one year: no | sum |
pred. yes | 10 | 13 | 23 |
pred. no | 6 | 13 | 19 |
sum | 16 | 26 | 42 |
Table 4
+LR values of all five classification models including the confidence intervals
model | +LR | confidence interval |
STRATIFY score | 1.07 | 0.71-1.61 |
Timed Up&Go test | 1.15 | 0.83-1.59 |
Team Assessment | 1.25 | 0.63-2.49 |
model CONV | 2.64 | 1.07-6.5 |
model SENSOR | 2.61 | 0.94-7.26 |
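The +LR point estimates in Table 4 can be recomputed from the contingency counts of the individual tables. The sketch below also attaches a confidence interval using the common log method; this is an assumption, since the text does not state which interval method or confidence level was used, so the bounds it produces are not expected to match Table 4 exactly. The function name `positive_lr_with_ci` and the default `z=1.96` are illustrative choices.

```python
import math


def positive_lr_with_ci(tp, fp, fn, tn, z=1.96):
    """Return (+LR, lower, upper) using the log method (assumed, not from the paper)."""
    sensitivity = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)
    lr = sensitivity / false_positive_rate
    # Standard error of ln(+LR) for a 2x2 table.
    se_log = math.sqrt(1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn))
    lower = math.exp(math.log(lr) - z * se_log)
    upper = math.exp(math.log(lr) + z * se_log)
    return lr, lower, upper


# Contingency counts (tp, fp, fn, tn) taken from Tables 1, 2, 3, 5 and 6.
counts = {
    "STRATIFY score": (15, 20, 4, 7),
    "Timed Up&Go test": (17, 21, 2, 6),
    "Team Assessment": (10, 13, 6, 13),
    "model CONV": (13, 7, 6, 20),
    "model SENSOR": (11, 6, 8, 21),
}

results = {name: positive_lr_with_ci(*cells) for name, cells in counts.items()}
for name, (lr, lower, upper) in results.items():
    print(f"{name:18s} +LR {lr:.2f}  (log-method interval {lower:.2f}-{upper:.2f})")
```

An interval that includes 1 means that a positive test result does not demonstrably raise the odds of a fall, which is the criterion used when a +LR is described as statistically significant or not.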
The Timed Up&Go test results in Table 2 show an overall classification accuracy of 50%, where a high sensitivity of 90% is offset by a very low specificity of 22%. As with the STRATIFY score, the NPV (75%) is higher than the PPV (45%).
The geriatric team's fall risk assessment score (Table 3) shows more balanced, though not much better, results: a classification accuracy of 55% is accompanied by a sensitivity of 63% and a specificity of 50%. The NPV is 68% and the PPV is 44%. The +LR values (Table 4) of all three simple fall risk assessments (1.07, 1.15 and 1.25) confirm their low predictive power, yet among these the team score has the highest hit ratio.
The automatically generated classification model CONV (Table 5), based on clinical and geriatric assessment data, shows markedly better performance values than the three previous tests: the classification accuracy is 72% with a sensitivity of 68% and a specificity of 74%. Furthermore, the NPV (77%) and the PPV (65%) are balanced and at a fair level. The overall good performance of this model is also shown by the Brier score of 0.2, an AUC of 0.74 and a statistically significant +LR value of 2.64 (Table 4).
Table 5
Classification results and contingency table for logistic regression model based on clinical data and fall risk assessment tests
measure | value |
classification accuracy | 72% |
sensitivity | 68% |
specificity | 74% |
negative predictive value | 77% |
positive predictive value | 65% |
Brier score | 0.20 |
AUC | 0.74 |
contingency table | fall within one year: yes | fall within one year: no | sum |
pred. yes | 13 | 7 | 20 |
pred. no | 6 | 20 | 26 |
sum | 19 | 27 | 46 |
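Unlike the measures derived from the contingency table, the Brier score and the AUC are computed from the predicted fall probabilities of the logistic regression model. As these probabilities are not reported here, the following sketch uses a small, purely hypothetical set of outcomes and predictions to illustrate both definitions; the function names are likewise illustrative.

```python
def brier_score(outcomes, probabilities):
    """Mean squared difference between predicted probability and observed outcome (0 or 1)."""
    return sum((p - y) ** 2 for y, p in zip(outcomes, probabilities)) / len(outcomes)


def auc(outcomes, probabilities):
    """Probability that a random faller receives a higher score than a random non-faller."""
    fallers = [p for y, p in zip(outcomes, probabilities) if y == 1]
    non_fallers = [p for y, p in zip(outcomes, probabilities) if y == 0]
    wins = 0.0
    for p_faller in fallers:
        for p_non_faller in non_fallers:
            if p_faller > p_non_faller:
                wins += 1.0
            elif p_faller == p_non_faller:
                wins += 0.5  # ties count as half a win
    return wins / (len(fallers) * len(non_fallers))


# Hypothetical data for illustration only (1 = fell within one year, 0 = did not fall).
toy_outcomes = [1, 1, 1, 0, 0, 0, 0, 1]
toy_probabilities = [0.9, 0.7, 0.4, 0.3, 0.6, 0.2, 0.1, 0.8]

toy_brier = brier_score(toy_outcomes, toy_probabilities)
toy_auc = auc(toy_outcomes, toy_probabilities)
print(f"Brier score {toy_brier:.3f}, AUC {toy_auc:.4f}")
```

As a point of reference, a model that always predicts a probability of 0.5 obtains a Brier score of 0.25 and an AUC of 0.5, so lower Brier scores and higher AUC values indicate better performance.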
The classification model SENSOR (Table 6, [18]) is comparable to the CONV model in its measures: classification accuracy is 70%, with a sensitivity of 58% and a specificity of 78%. NPV (72%) and PPV (65%) are at a similar level. The +LR value of 2.61, however, does not reach statistical significance because its confidence interval is wider and includes 1.
Table 6
Classification results and contingency table for logistic regression model based on sensor data and long-term physical activity level
measure | value |
classification accuracy | 70% |
sensitivity | 58% |
specificity | 78% |
negative predictive value | 72% |
positive predictive value | 65% |
Brier score | 0.21 |
AUC | 0.72 |
contingency table | fall within one year: yes | fall within one year: no | sum |
pred. yes | 11 | 6 | 17 |
pred. no | 8 | 21 | 29 |
sum | 19 | 27 | 46 |
Discussion
The performance of the simple fall risk assessment tools used in this study, namely the STRATIFY score, the Timed Up&Go (TUG) test and the geriatric care team rating, is limited. In a recent meta-analysis, Oliver et al., who developed the STRATIFY score, report the following values for geriatric patients: SENS 67.2%, SPEC 51.2%, PPV 23.1% and NPV 86.5% (n = 1285 patients, four different studies) [9]. Kim et al. have also evaluated the STRATIFY score, albeit in a much younger cohort (mean age 56 years, n = 5489 patients, 60 fallers), and find: SENS 55%, SPEC 75.3%, PPV 2.4% and NPV 99.3% [25]. Our results show a slightly worse performance than was reported in the meta-analysis by Oliver et al. [9]. This may well be due to our very small sample size. The same applies to the Timed Up&Go test. Nordin et al. studied the predictive validity of the TUG in 183 patients with a mean age of 84.3 years, using a cut-off point of 20s [26]. They report a sensitivity of 79% and a specificity of 32%. In the large Tromsø study, Thrane et al. find sensitivity values of 44-14% and specificity values of 58-90% for the TUG, depending on the choice of cut-off point (here between 12 and 17s) [27]. Kristensen et al. report SENS 95%, SPEC 35%, PPV 41%, NPV 93%, +LR 1.5 and -LR 0.1 for the TUG's predictive performance in patients after hip surgery (mean age 81 years, n = 59 patients, 19 fallers, cut-off value 24s) [11]. Our results (Table 2) also show a remarkable sensitivity of 90% for the TUG, yet the specificity is far too low for a screening test. This is confirmed by the low +LR value of 1.15.
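When comparing predictive values across studies, it helps to keep in mind that PPV and NPV depend on the proportion of fallers in the cohort, whereas sensitivity and specificity do not. The sketch below illustrates this with the figures quoted above for Kim et al.; it assumes that the quoted sensitivity and specificity are rounded values and that the prevalence is 60 fallers among 5489 patients. The function name `predictive_values` is illustrative.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied to a cohort with the given fall prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Figures quoted for Kim et al.: SENS 55%, SPEC 75.3%, 60 fallers among 5489 patients.
kim_ppv, kim_npv = predictive_values(0.55, 0.753, 60 / 5489)
print(f"Kim et al. cohort:  PPV {kim_ppv:.1%}, NPV {kim_npv:.1%}")

# Same sensitivity and specificity at the fall prevalence of our sample (19 of 46).
our_ppv, our_npv = predictive_values(0.55, 0.753, 19 / 46)
print(f"At prevalence 19/46: PPV {our_ppv:.1%}, NPV {our_npv:.1%}")
```

With only about 1% fallers, even a moderately specific test yields a very low PPV and a very high NPV, which is why the predictive values of Kim et al. are not directly comparable with those of cohorts in which falls are common.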
Both tests are very simple to perform, either by history taking or by conducting a simple physical test, and both take only a couple of minutes. Therefore, these tests may serve well - and in fact are frequently used - as general screening methods, if necessary followed by more complex, multimodal assessment inventories such as the Physiological Profile Assessment (PPA) [28].
The geriatric care team fall risk score may be perceived as a very subjective measure, yet it represents the professional opinion of several experienced experts, which is very likely based on an intuitive understanding of the complex concept 'fall risk' as well as on a multitude of observations of a given patient. This solid foundation is reflected by the fair performance values of this score, which are the most balanced of the three simple tests. Similar results have been found in [26], where a 'global rating of fall risk' (low/high) by staff members achieved a sensitivity of 56% and a specificity of 80%.
The automatically induced model CONV (Table 5) shows better performance values than all of the above tests. This is of course due to the approach of including basic clinical data such as sex, BMI and age in the induction process, but also to the combination of different assessment methods, ranging from a physical test (TUG) and a measure of daily activity capability (Barthel index) to a fall risk score (STRATIFY). In the induction process, the most relevant parameters or scores are identified and included, so that performance is optimized. The multitude of candidate parameters may capture the multi-factorial concept of fall risk more adequately than a single test. The performance measures show that CONV can identify most of the fallers and non-fallers correctly, based on their one-year outcome. Thus, this model could be suitable as a screening test for geriatric patients, facilitating the prescription of preventive measures.
The previously computed SENSOR model, which is based solely on accelerometer sensor data and overall activity levels, performs almost as well as the CONV model. The Brier scores (0.21 vs. 0.20), the AUC values (0.72 vs. 0.74) and the +LR values (2.61 vs. 2.64) are nearly the same. Therefore, regarding our preliminary results, we may conclude that by using sensor data - which may be recorded over extended periods of time during normal daily activities with a small and unobtrusive device - we can match the performance of conventional methods with regard to fall risk assessment in a sample of geriatric patients. The advantage of our approach is that no expert physiotherapist, nurse or physician is needed to perform the assessment. This could be done by the wearable device itself, using long-term motion data along with the developed algorithm. Potential drawbacks of our approach are possible technical failures such as data loss from the accelerometer device, acceptance issues, limited battery lifetime and the current lack of technical infrastructures (e.g. sensor-enhanced health information systems [29, 30]). Technical equipment and infrastructures are of course costly, but so are falls and their consequences in the first place.
Future prospective studies will have to be conducted with more patients and over an even longer period of time, evaluating the validity of our approach and our preliminary results in an independent patient sample on the one hand and the cost-benefit relation on the other hand. From an ethical point of view, however, one might argue that every fall that is avoided is a big benefit for the individual.
From a technical point of view, more research work is needed to look into the potential predictive parameters that can be extracted from sensor data [31]. Furthermore, in this study we have not considered any information from the patients' electronic health records (EHRs) [32], such as diagnoses or additional history. Considering the multi-factorial aetiology of falls [33], our sensor-based information may well be used in combination with, or as a supplement to, conventional geriatric assessment tools and other clinical data.
Limitations
Our sample size is small and this, in comparison with the large trials evaluating the conventional methods, limits the generalizability of our results. So does the fact that, due to the sample size, we could not use separate training and test data sets for model induction. Nevertheless, we chose a well-established procedure to avoid over-fitting of our models, namely ten-fold cross-validation repeated ten times. The necessity for written consent to be returned by the patients via surface mail may have led to the exclusion of persons with cognitive impairments, even though consent by a third party was an option. Furthermore, in our follow-up study telephone interviews were used to identify fall events. This approach is error-prone, as many factors (e.g. cognitive impairment) may affect the recall of such events. In addition, given the patient group, risk factors may have changed significantly within the twelve-month period. Therefore, daily recordings as well as more frequent interviews, e.g. on a monthly basis as recommended in ProFaNE consensus criteria recommendation no. 7 [22], might have reduced the error rate, but were not performed due to a lack of resources. Hence, in our future prospective studies, we will include a more stringent monitoring approach.
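The resampling scheme mentioned above, ten-fold cross-validation repeated ten times, can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually implemented: the function name `repeated_kfold_indices`, the fixed seed and the absence of stratification by outcome are all assumptions, and the model fitting and evaluation steps are omitted.

```python
import random


def repeated_kfold_indices(n_samples, n_folds=10, n_repeats=10, seed=0):
    """Yield (train, test) index lists for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a new random partition for every repetition
        folds = [order[i::n_folds] for i in range(n_folds)]
        for k, test in enumerate(folds):
            # All folds except the k-th one form the training set.
            train = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield train, test


# 46 patients, as in the contingency tables of the two logistic regression models.
splits = list(repeated_kfold_indices(46))
print(len(splits), "train/test splits")
print("first test fold:", sorted(splits[0][1]))
```

Within each repetition every patient is used exactly once for testing, so each prediction that enters the performance measures comes from a model that was not fitted on that patient.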
From an economic perspective, it remains unclear if the prediction results are good enough to justify the implementation of costly preventive measures for the false positives [5]. A cost-benefit analysis should be conducted, comparing direct and indirect costs of fall events with those of preventive measures. Furthermore, despite promising preliminary studies (e.g. [34-36]), the patients' acceptance of long-term monitoring should be assessed, e.g. using the Sensor Acceptance Model [37].
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
MM supervised the study, carried out the data analysis and drafted the manuscript. AR designed and conducted the telephone interviews. KHW participated in drafting the study protocol. MG conducted the sensor data measurements and participated in the data analysis. GN performed the geriatric assessment tests and participated in drafting the study protocol. HMZS conceived of the study and participated in its coordination. MS participated in the data analysis and the computation of performance values. All authors read and approved the final manuscript.