Discussion
The partitioning conditional tree algorithm makes it possible to detect the underlying structure of how specific diseases within arbitrary multimorbidity patterns influence health care costs. Using this statistical approach, we identified several variables associated with total costs, inpatient costs, medication costs and nursing care costs in multimorbid elderly patients. These results were verified using ensemble methods.
With respect to total costs, and independently of the other coexisting conditions, PD and cardiac insufficiency were identified as the most influential variables, with PD being the more important of the two. Compared to patients suffering from neither condition, PD increases predicted mean total costs 3.5-fold to approximately € 11,000 per 6 months, and cardiac insufficiency 2-fold to approximately € 6,100.
The high total costs of PD are largely due to costs of nursing care, for which the respective partitioning tree model predicted more than € 7,000 on average in this patient group. When nursing care costs were excluded from total costs, PD no longer appeared in the tree for total costs, while the split on cardiac insufficiency remained significant (p = 0.004), predicting mean total costs of € 3,790 (n = 132) if cardiac insufficiency is present and € 2,177 (n = 918) otherwise (tree not shown). The same reduced tree structure resulted when only the costs of informal nursing care were excluded from total costs, predicting mean total costs of € 4,052 if cardiac insufficiency is present and € 2,260 otherwise (p = 0.001; tree not shown). This indicates that the high nursing costs of PD are largely due to informal care.
If PD is not present, mean nursing care costs are influenced by income and age: low income is associated with higher costs and, among those with higher income, older age is associated with higher costs. Taking comparatively more affluent patients aged ≤76 years without PD as the reference group, patients without PD with similar income aged 77–83 and >83 years incur more than 3-fold and almost 11-fold mean nursing costs, respectively. If PD is present, mean nursing costs are almost 21 times as high as in the reference group, irrespective of age and income. In patients with comparatively low income without PD, mean nursing costs are almost 5 times as high as in the reference group, irrespective of age.
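The split order described above can be written as a small decision function that assigns a patient to one of the five terminal nodes of the nursing cost tree. This is an illustrative sketch only: the income cut-point used by the tree is not reported here, so `low_income` is a hypothetical pre-computed flag, and the multiples are the rounded descriptors from the text rather than model output. The names `Patient`, `nursing_cost_node` and `REPORTED_MULTIPLE` are our own.

```python
from dataclasses import dataclass

# Approximate multiples of mean nursing costs relative to the reference group
# (no PD, comparatively higher income, aged <= 76), as worded in the text.
# These are rounded descriptors, not exact model estimates.
REPORTED_MULTIPLE = {
    "PD": "almost 21-fold",
    "no PD, low income": "almost 5-fold",
    "no PD, higher income, age <= 76": "reference (1-fold)",
    "no PD, higher income, age 77-83": "more than 3-fold",
    "no PD, higher income, age > 83": "almost 11-fold",
}


@dataclass
class Patient:
    has_pd: bool
    low_income: bool  # hypothetical flag: income below the tree's (unreported) cut-point
    age: int


def nursing_cost_node(p: Patient) -> str:
    """Return the terminal node a patient falls into, following the split order in the text."""
    if p.has_pd:
        return "PD"  # with PD there is no further split on age or income
    if p.low_income:
        return "no PD, low income"  # low income: no further split on age
    if p.age <= 76:
        return "no PD, higher income, age <= 76"
    if p.age <= 83:
        return "no PD, higher income, age 77-83"
    return "no PD, higher income, age > 83"
```

For example, `nursing_cost_node(Patient(has_pd=False, low_income=False, age=80))` returns `"no PD, higher income, age 77-83"`, and looking that key up in `REPORTED_MULTIPLE` gives `"more than 3-fold"`.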
PD was also found to increase medication costs. However, the highest mean medication costs were associated with the coexistence of COPD and insomnia. Besides these conditions, diabetes mellitus significantly increases medication costs if COPD and PD are not present. Compared to patients with none of diabetes, PD or COPD, diabetes (without PD) increases mean medication costs by 40%, PD by 222%, and COPD by 66%, or by as much as 271% if insomnia is also present.
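Because the medication results are stated as percentage increases while the other cost sectors are described as n-fold changes, converting between the two may ease comparison: an increase of X% corresponds to a factor of 1 + X/100, so the 222% increase for PD is a 3.22-fold mean and the 271% increase for COPD with insomnia is 3.71-fold. A minimal Python sketch of this conversion, using only the percentages reported above (the function name is our own):

```python
def percent_increase_to_fold(percent: float) -> float:
    """Convert 'increases by X%' into a multiplicative factor relative to the reference group."""
    return 1.0 + percent / 100.0


# Reported increases in mean medication costs relative to patients
# without diabetes, PD or COPD.
reported = {
    "diabetes (without PD)": 40,
    "PD": 222,
    "COPD": 66,
    "COPD + insomnia": 271,
}

for group, pct in reported.items():
    print(f"{group}: +{pct}% -> {percent_increase_to_fold(pct):.2f}-fold")
```

Expressed this way, the medication figures sit on the same scale as the 3.5-fold and 2-fold changes reported for total costs.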
While the partitioning tree algorithm identified no variable significantly associated with outpatient costs, cerebral ischemia and/or chronic stroke (CI/CS) was found to increase inpatient costs 3.5-fold, with no other variables being significant in the model.
Except for costs of nursing care, socio-demographic variables did not significantly influence costs of care.
Strengths and limitations
In general, a main advantage of partitioning tree algorithms over traditional analytical methods is their simplified representation of high-dimensional data and their direct interpretability. Because of the chosen 0/1 recursive partitioning framework, the lack of smoothness, a common disadvantage of tree-based modelling, could be neglected.
Compared to classical parametric regression techniques, tree-based decision models make no distributional assumptions. Therefore, the estimates are not affected by a misspecified error distribution. At the same time, trees aim to separate disjoint, homogeneous subsets by minimizing within-group variance and maximizing between-group variance.
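As an illustration of the variance criterion just described, the sketch below selects, among 0/1 predictors, the split that minimizes the pooled within-group sum of squares. Since the total sum of squares is fixed within a node, this is equivalent to maximizing the between-group sum of squares. Note that this is the CART-style criterion, not the permutation-test criterion of CTREE, and the function names and toy data are hypothetical. With 0/1 predictors each variable offers exactly one possible split point.

```python
from statistics import fmean


def sse(values):
    """Within-group sum of squared deviations from the group mean."""
    if not values:
        return 0.0
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def best_binary_split(X, y, names):
    """Pick the 0/1 predictor whose split minimizes the pooled within-group SSE.

    X: list of rows, each a list of 0/1 indicators; y: list of costs.
    Returns (name, within_sse), or None if no predictor separates the data.
    """
    best = None
    for j, name in enumerate(names):
        left = [yi for row, yi in zip(X, y) if row[j] == 0]
        right = [yi for row, yi in zip(X, y) if row[j] == 1]
        if not left or not right:
            continue  # predictor is constant in this node: no split possible
        within = sse(left) + sse(right)
        if best is None or within < best[1]:
            best = (name, within)
    return best
```

With `X = [[1, 0], [1, 1], [0, 0], [0, 1]]`, `y = [10, 11, 2, 3]` and `names = ["A", "B"]`, splitting on `A` leaves a within-group SSE of 1.0 versus 64.0 for `B`, so `best_binary_split` returns `("A", 1.0)`.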
As a main disadvantage, CART and related decision tree algorithms such as C4.5 suffer from high variance caused by their hierarchical binary partitioning: an error in the first split propagates to all subsequent splits. In addition, because they exhaustively maximize an information criterion, they are prone to overfitting and to a selection bias in favour of covariates that offer many possible split points.
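The selection bias towards covariates with many possible split points can be reproduced with a short simulation, sketched here under our own assumptions (sample size 50, a minimum node size of 5, Gaussian noise, 200 repetitions). Both predictors are pure noise, yet an exhaustive CART-style search over all cut-points tends to prefer the continuous predictor simply because it offers more candidate splits. The results come from simulated data, not from the study, and all names are hypothetical.

```python
import random
from statistics import fmean


def sse(values):
    """Within-group sum of squared deviations from the group mean."""
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def best_within_sse(x, y, min_node=5):
    """Smallest within-group SSE over all admissible cut-points of x (exhaustive search)."""
    pairs = sorted(zip(x, y))
    best = float("inf")
    for k in range(min_node, len(pairs) - min_node + 1):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot cut between tied x values
        left = [yi for _, yi in pairs[:k]]
        right = [yi for _, yi in pairs[k:]]
        best = min(best, sse(left) + sse(right))
    return best


def share_continuous_chosen(reps=200, n=50, seed=1):
    """Fraction of runs in which a noise continuous predictor beats a noise binary one."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(reps):
        y = [rng.gauss(0, 1) for _ in range(n)]
        x_binary = [rng.randint(0, 1) for _ in range(n)]  # one possible cut-point
        x_cont = [rng.random() for _ in range(n)]  # up to n - 2 * min_node + 1 cut-points
        if best_within_sse(x_cont, y) < best_within_sse(x_binary, y):
            wins += 1
    return wins / reps
```

Under this setup, `share_continuous_chosen()` is expected to return a value well above the 0.5 that an unbiased selection between two equally uninformative predictors would give.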
Instead of using traditional classification and regression trees (CART) or related tree algorithms such as ID3 or C4.5, we applied a different approach embedded in the framework of statistical inference (see [29, 33]). We used a conditional inference tree (CTREE) based on multiple permutation tests, which combines tree-based regression with the statistical theory of conditional inference. In contrast to CART or C4.5, CTREE controls for variable selection bias by choosing splits based on statistical tests and their significance values. Permutation tests provide a statistically founded stopping criterion. Thus, our model overcomes the typical problems of classical tree algorithms described above.
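The variable-selection and stopping logic of a conditional inference tree can be sketched for a single node as follows. This is a simplified illustration under stated assumptions, not the CTREE implementation used in the analysis: it uses a Monte Carlo permutation test on the absolute difference in group means for binary predictors together with a Bonferroni adjustment, whereas CTREE is based on a general class of linear test statistics from the conditional inference framework. The function names, `alpha=0.05` and `n_perm=999` are our own choices.

```python
import random
from statistics import fmean


def mean_diff(x, y):
    """Absolute difference in mean outcome between x == 1 and x == 0."""
    g1 = [yi for xi, yi in zip(x, y) if xi == 1]
    g0 = [yi for xi, yi in zip(x, y) if xi == 0]
    if not g1 or not g0:
        return 0.0  # constant predictor: no association can be shown
    return abs(fmean(g1) - fmean(g0))


def perm_pvalue(x, y, n_perm, rng):
    """Monte Carlo permutation p-value for the association between binary x and y."""
    observed = mean_diff(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any link between x and y
        if mean_diff(x, y_perm) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def select_split_variable(X, y, names, alpha=0.05, n_perm=999, seed=0):
    """One node of a simplified conditional inference tree.

    1. Test each predictor's association with y by permutation.
    2. Bonferroni-adjust the p-values for the number of predictors.
    3. Stop (return None) if no adjusted p-value is below alpha;
       otherwise return (name, adjusted p) for the smallest p-value.
    """
    rng = random.Random(seed)
    m = len(names)
    results = []
    for j, name in enumerate(names):
        x = [row[j] for row in X]
        p = perm_pvalue(x, y, n_perm, rng)
        results.append((min(1.0, p * m), name))
    p_adj, name = min(results)
    return (name, p_adj) if p_adj < alpha else None
```

If `select_split_variable` returns `None`, the node is not split further; otherwise the returned variable defines the next split and the procedure is repeated in each daughter node. Selection is driven by p-values rather than by the largest attainable improvement in a fit criterion, which is the idea CTREE uses to avoid favouring covariates with many split points.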
To verify our results and the detected splitting variables, we calculated variable importance scores based on conditional random forests for each cost sector.
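The idea behind permutation-based variable importance can be sketched briefly: a variable is important if randomly permuting its values makes the predictions of an already-fitted model noticeably worse. The sketch below shows only the plain (unconditional) version and takes the fitted model as an ordinary function; it does not fit a forest. Conditional variable importance, as used with conditional random forests, additionally permutes a variable only within strata of correlated covariates. The function names and the toy model are hypothetical.

```python
import random
from statistics import fmean


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return fmean((a - b) ** 2 for a, b in zip(y_true, y_pred))


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in MSE when one column of X is randomly permuted.

    predict: a fitted model, given as a function mapping a row (list) to a prediction.
    Returns one score per column; larger scores indicate more important variables.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    scores = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)  # destroy the association of column j with y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            increases.append(mse(y, [predict(r) for r in X_perm]) - baseline)
        scores.append(fmean(increases))
    return scores
```

For a model that uses only the first column, the second column receives an importance of exactly 0.0, because permuting it cannot change any prediction.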
When comparing CTREE results to CART, on the one hand CART led to more splits in all cost sectors, while pruning did not yield suitable trees. On the other hand, CART confirmed the findings of CTREE by showing identical nodes and/or grown tree structures. Furthermore, CTREE led to theoretically plausible splitting variables. Based on these findings, as well as on the calculated error terms and the results of the conditional random forests, our study supports the superiority of the CTREE algorithm for these data.
Our statistical analysis was based on a pre-imputed master data set provided by the data management of the MultiCare study group, which had used the hot deck method and conditional means to impute missing values. Although we are aware of the benefits of multiple imputation, we decided to use the master data set for the sake of consistency and because the proportion of missing values in the variables used for our analysis was very small. In addition, tree-based algorithms can handle missing data as well as complete data, usually under the assumption that values are Missing Completely At Random (MCAR).
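For readers unfamiliar with hot-deck imputation, a minimal sketch of the random hot-deck variant follows: each missing value is replaced by an observed value drawn at random from a donor in the same adjustment class. The donor classes and the conditional-mean step actually used by the MultiCare data management are not described here, so the `by` key, the record layout and the function name are hypothetical.

```python
import random
from collections import defaultdict


def hot_deck_impute(records, target, by, seed=0):
    """Random hot-deck imputation within adjustment classes.

    records: list of dicts in which missing values are None.
    target: key whose missing values are imputed.
    by: key defining donor classes (e.g. a hypothetical age group or sex variable).
    Returns a new list; the input records are not modified. Records whose class
    has no observed donor keep their missing value.
    """
    rng = random.Random(seed)
    donors = defaultdict(list)
    for r in records:
        if r[target] is not None:
            donors[r[by]].append(r[target])  # collect observed values per class
    imputed = []
    for r in records:
        r = dict(r)  # copy so the caller's data stay untouched
        if r[target] is None and donors[r[by]]:
            r[target] = rng.choice(donors[r[by]])
        imputed.append(r)
    return imputed
```

Unlike multiple imputation, a single hot-deck draw produces one completed data set, so the extra uncertainty due to imputation is not reflected in later standard errors; with a very small share of missing values this effect is limited.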
Patients with response limitations for medical reasons (blindness, deafness, dementia, etc.) as well as nursing home residents were excluded from the study sample. Therefore, the impact of the respective chronic conditions on health care costs could not be analyzed. Yet conditions associated with response difficulties may strongly influence health care costs. For example, dementia is a very important and prevalent condition in the elderly and is associated with high health care costs. Dementia is often present in late stages of various diseases, such as PD, CI, chronic stroke and others. Future studies analyzing the impact of multimorbidity on health care costs should therefore consider surrogate responders for data collection in such response-limiting conditions.
Acknowledgements
The study was funded by the German Federal Ministry of Education and Research (grant numbers 01ET0725-31 and 01ET1006A-K).
This article is on behalf of the MultiCare Cohort Study Group, which consists of Attila Altiner, Horst Bickel, Wolfgang Blank, Monika Bullinger, Hendrik van den Bussche, Anne Dahlhaus, Lena Ehreke, Michael Freitag, Angela Fuchs, Jochen Gensichen, Ferdinand Gerlach, Heike Hansen, Sven Heinrich, Susanne Höfels, Olaf von dem Knesebeck, Hans-Helmut König, Norbert Krause, Hanna Leicht, Melanie Luppa, Wolfgang Maier, Manfred Mayer, Christine Mellert, Anna Nützel, Thomas Paschke, Juliana Petersen, Jana Prokein, Steffi Riedel-Heller, Heinz-Peter Romberg, Ingmar Schäfer, Martin Scherer, Gerhard Schön, Susanne Steinmann, Sven Schulz, Karl Wegscheider, Klaus Weckbecker, Jochen Werle, Siegfried Weyerer, Birgitt Wiese, and Margrit Zieger.
We are grateful to the general practitioners in Bonn, Dusseldorf, Frankfurt/Main, Hamburg, Jena, Leipzig, Mannheim and Munich who supplied the clinical information on their patients, namely Theodor Alfen, Martina Amm, Katrin Ascher, Philipp Ascher, Heinz-Michael Assmann, Hubertus Axthelm, Leonhard Badmann, Horst Bauer, Veit-Harold Bauer, Sylvia Baumbach, Brigitte Behrend-Berdin, Rainer Bents, Werner Besier, Liv Betge, Arno Bewig, Hannes Blankenfeld, Harald Bohnau, Claudia Böhnke, Ulrike Börgerding, Gundula Bormann, Martin Braun, Inge Bürfent, Klaus Busch, Jürgen Claus, Peter Dick, Heide Dickenbrok, Wolfgang Dörr, Nadejda Dörrler-Naidenoff, Ralf Dumjahn, Norbert Eckhardt, Richard Ellersdorfer, Doris Fischer-Radizi, Martin Fleckenstein, Anna Frangoulis, Daniela Freise, Denise Fricke, Nicola Fritz, Sabine Füllgraf-Horst, Angelika Gabriel-Müller, Rainer Gareis, Benno Gelshorn, Maria Göbel-Schlatholt, Manuela Godorr, Jutta Goertz, Cornelia Gold, Stefanie Grabs, Hartmut Grella, Peter Gülle, Elisabeth Gummersbach, Heinz Gürster, Eva Hager, Wolfgang-Christoph Hager, Henning Harder, Matthias Harms, Dagmar Harnisch, Marie-Luise von der Heide, Katharina Hein, Ludger Helm, Silvia Helm, Udo Hilsmann, Claus W. Hinrichs, Bernhard Hoff, Karl-Friedrich Holtz, Wolf-Dietrich Honig, Christian Hottas, Helmut Ilstadt, Detmar Jobst, Gunter Kässner, Volker Kielstein, Gabriele Kirsch, Thomas Kochems, Martina Koch-Preißer, Andreas Koeppel, Almut Körner, Gabriele Krause, Jens Krautheim, Nicolas Kreff, Daniela Kreuzer, Franz Kreuzer, Judith Künstler, Christiane Kunz, Doris Kurzeja-Hüsch, Felizitas Leitner, Holger Liebermann, Ina Lipp, Thomas Lipp, Bernd Löbbert, Guido Marx, Stefan Maydl, Manfred Mayer, Stefan-Wolfgang Meier, Jürgen Meissner, Anne Meister, Ruth Möhrke, Christian Mörchen, Andrea Moritz, Ute Mühlmann, Gabi Müller, Sabine Müller, Karl-Christian Münter, Helga Nowak, Erwin Ottahal, Christina Panzer, Thomas Paschke, Helmut Perleberg, Eberhard Prechtel, Hubertus Protz, Sandra Quantz, Eva-Maria Rappen-Cremer, Thomas Reckers, Elke Reichert, Birgitt Richter-Polynice, Franz Roegele, Heinz-Peter Romberg, Anette Rommel, Michael Rothe, Uwe Rumbach, Michael Schilp, Franz Schlensog, Ina Schmalbruch, Angela Schmid, Holger Schmidt, Lothar Schmittdiel, Matthias Schneider, Ulrich Schott, Gerhard Schulze, Heribert Schützendorf, Harald Siegmund, Gerd Specht, Karsten Sperling, Meingard Staude, Hans-Günter Stieglitz, Martin Strickfaden, Hans-Christian Taut, Johann Thaller, Uwe Thürmer, Ljudmila Titova, Michael Traub, Martin Tschoke, Maya Tügel, Christian Uhle, Kristina Vogel, Florian Vorderwülbecke, Hella Voß, Christoph Weber, Klaus Weckbecker, Sebastian Weichert, Sabine Weidnitzer, Brigitte Weingärtner, Karl-Michael Werner, Hartmut Wetzel, Edgar Widmann, Alexander Winkler, Otto-Peter Witt, Martin Wolfrum, Rudolf Wolter, Armin Wunder, and Steffi Wünsch.
We also thank Corinna Contenius, Cornelia Eichhorn, Sarah Floehr, Vera Kleppel, Heidi Kubieziel, Rebekka Maier, Natascha Malukow, Karola Mergenthal, Christine Müller, Sandra Müller, Michaela Schwarzbach, Wibke Selbig, Astrid Steen, Miriam Steigerwald, and Meike Thiele for data collection as well as Ulrike Barth, Elena Hoffmann, Friederike Isensee, Leyla Kalaz, Heidi Kubieziel, Helga Mayer, Karine Mnatsakanyan, Michael Paulitsch, Merima Ramic, Sandra Rauck, Nico Schneider, Jakob Schroeber, Susann Schumann, and Daniel Steigerwald for data entry.