Introduction
Orthodontic care, and especially its effectiveness, has increasingly become a focus of political and public attention in recent years. In 2018, the German Federal Ministry of Health commissioned the Institute for Health and Social Research (Institut für Gesundheits- und Sozialforschung, IGES) to evaluate orthodontic treatments and their potential influence on oral health. The IGES report concluded that the dental health benefits of orthodontic treatment are currently not supported by evidence, which in turn does not prove that such benefits are absent [
15]. Moreover, oral health comprises not only dental health aspects like tooth loss or caries, but also functional, emotional, and social issues. In this context, oral health-related quality of life has been shown to be reduced in children, adolescents and adults with specific malocclusions [
1,
23,
38]. Dental appearance might have significant psychosocial effects. As the population becomes increasingly aware of dental appearance and is highly informed about orthodontic treatment options, the general demand for orthodontic treatment has risen [
17,
20]. According to the German Health Interview and Examination Survey for Children and Adolescents (KiGGS Wave 2) by the Robert Koch Institute with a cohort of 15,023 children and adolescents, 25.8% of all 3‑ to 17-year-old girls and 21.1% of all 3‑ to 17-year-old boys were in active orthodontic treatment between the years 2014 and 2017. During this time span, 13-year-old girls and 14-year-old boys underwent orthodontic treatment most frequently (55.0% and 50.8%, respectively). The authors mentioned an increase in uptake of orthodontic care over the past decade [
35]. However, population-based data on orthodontic treatment in Germany, including its outcome and effectiveness, are lacking.
The measurement of treatment outcome and effectiveness has been discussed thoroughly in the international orthodontic literature. To assess the severity and complexity of malocclusions before and after orthodontic treatment, numerous grading systems have been proposed [
6,
7,
24,
31]. However, currently no internationally recognized and consistently used quality assessment tool exists. One of the many systems is the Peer Assessment Rating (PAR) Index developed by Richmond et al. to provide an objective assessment of treatment success [
32]. The PAR Index is an occlusal index that is able to quantitatively evaluate orthodontic treatment outcome by measuring pre- and posttreatment models and the respective improvement rate. It has shown excellent validity and reliability [
31,
36]. When reporting effectiveness and treatment outcome with the PAR Index, some authors suggest not relying solely on the change between pre- and posttreatment PAR scores, but also taking the final total PAR score as an indicator of a good occlusal outcome. Improvement rates seem to be less sensitive because they are confounded by the pretreatment PAR score [
30].
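To illustrate why the improvement rate depends on the pretreatment score, the following minimal Python sketch computes the percentage PAR reduction for two hypothetical cases that finish with the same final score. The function name `par_improvement_rate` and the example scores are assumptions made for illustration and are not taken from the study data.

```python
def par_improvement_rate(pre_score: float, post_score: float) -> float:
    """Return the percentage reduction of the PAR score from T0 to T1."""
    if pre_score <= 0:
        raise ValueError("pretreatment PAR score must be positive")
    return (pre_score - post_score) / pre_score * 100.0


# Two hypothetical cases that both finish with a final PAR score of 5 points.
mild_start = par_improvement_rate(15, 5)    # mild initial malocclusion
severe_start = par_improvement_rate(40, 5)  # severe initial malocclusion

print(f"mild start:   {mild_start:.1f}% improvement")
print(f"severe start: {severe_start:.1f}% improvement")
```

Although both hypothetical cases end with the same occlusal result, the case with the higher pretreatment score shows the larger improvement rate, which is why the final score is suggested as a complementary indicator.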
The outcome of orthodontic treatment might be influenced by specific patient-, practitioner-, and treatment-related factors [
21,
30]. Evidence varies as to which patient gender or type of malocclusion is associated with better occlusal outcomes [
5,
19,
41,
42]. Treatment-related factors like the type of appliance (fixed vs. removable) or the number of arches treated (single vs. dual arch treatment) were part of corresponding research as well [
25,
33,
39]. Although international study groups have previously reported about orthodontic treatment outcome and potential influencing factors in large cohorts and for a variety of treatment modalities [
16,
30], such research about orthodontic reality in Germany is scarce and often involves patients of only one or two orthodontic providers/university hospitals [
21,
40].
In the light of the above-mentioned need for national research, the aim of this explorative multicenter cohort study was to evaluate the effectiveness of orthodontic treatments in Germany as well as potential predictive factors within this cohort.
Statistical analyses
Due to the exploratory nature of this study, no sample size calculation was performed. We considered previous studies [
3,
11,
29,
37] and aimed for a similar sample size, while taking into account the wider range of patient-, practitioner- and treatment-related factors involved in our cohort study. A sample size of 60 patients per study center was considered sufficient.
Reliability testing was performed by evaluating intraexaminer (IG) and interexaminer reliability (IG vs. NCB) using the intraclass correlation coefficient (ICC) for total PAR scores at T0 and T1. For intraexaminer reliability, 20% of the study models at one of the study centers were randomly selected and rescored by the principal investigator (IG) 30 days after the first scoring. For interexaminer reliability, all cases scored by NCB were additionally scored by IG.
Our primary endpoint was the weighted PAR score reduction between T0 and T1. Because our data failed the Shapiro–Wilk normality test, nonparametric tests were performed. For continuous variables, descriptive statistics (mean, standard deviation, minimum, 1st quartile, median, 3rd quartile and maximum) were calculated and compared by the Wilcoxon test. Qualitative variables were summarized by count and percentage, and their influence was analyzed using crosstabs in combination with Pearson’s chi-square and Fisher’s exact test. Binary logistic regression was used to identify predictive factors of a final PAR score ≤5. Independent variables that indicated statistical significance and/or clinical significance in crosstab analyses were tested in this model. Odds ratios (OR) and 95% confidence intervals (CI) were provided for potential influencing factors.
Statistical analyses were performed with SPSS® statistical package (version 23, IBM, Armonk, NY, USA), in cooperation with the Institute of Medical Statistics and Computational Biology of the University Hospital of Cologne. A two-sided p-value of less than 0.05 was considered to indicate statistical significance. No adjustment for multiple testing was performed; thus, all analyses, except those related to the primary endpoint, were considered to be exploratory.
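As an illustration of how an odds ratio and its 95% confidence interval can be obtained, the following Python sketch uses the crude 2×2-table estimate with a Wald interval on the log scale. It is a stand-in for, not a reproduction of, the binary logistic regression run in SPSS: for a single binary predictor both approaches give the same odds ratio, whereas a model with several predictors yields adjusted values. The function name `odds_ratio_ci` and all counts are hypothetical.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Crude odds ratio with a Wald confidence interval from a 2x2 table.

    a: exposed, outcome reached      b: exposed, outcome not reached
    c: unexposed, outcome reached    d: unexposed, outcome not reached
    z: normal quantile (1.96 corresponds to a 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts (not study data): 20 of 30 patients with a given
# treatment modality and 80 of 100 without it reach a final PAR score <= 5.
or_value, ci_low, ci_high = odds_ratio_ci(20, 10, 80, 20)
print(f"OR = {or_value:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```

An interval that includes 1 would indicate that the association is not statistically significant at the two-sided 5% level.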
Discussion
Since the aim of this study was to report on orthodontic reality in Germany, we defined very few inclusion and exclusion criteria. The age limit was chosen because we incorporated patient-reported outcomes such as oral health-related quality of life in our cohort study and therefore used questionnaires with an age recommendation. These patient-reported outcomes will, however, be part of future publications. Every study center was asked to consecutively screen all patients with upcoming posttreatment record taking. Yet, on average only 57.2% of the screened patients were recruited, mostly because patients and/or parents were reluctant to read and fill out the informed consent documents and the above-mentioned questionnaire. Although our recruitment rate and the corresponding dropout rate are in line with previous studies involving patient-reported outcomes and questionnaires [
18,
34], one has to keep in mind that the present results might not fully represent orthodontic reality in Germany. The results should be regarded as a hint towards the potential quality of orthodontic care in Germany, which proved to be high within the selected study sample.
The study centers involved were chosen in order to represent orthodontic care at university hospitals as well as at orthodontic practices, yet patient and treatment characteristics along with the detected treatment outcome might not be representative for all orthodontic practitioners in Germany. Nevertheless, uniting seven national—geographically and conceptually different—study centers within this quality of orthodontic care study is unique [
2,
4,
10,
21,
28,
29].
The patient characteristics within our study were similar to comparable studies: An enlarged overjet (= KIG ‘D’) was the most frequent indication to treat among the study participants. In Western Europe and especially in Germany, this occlusal feature is a frequent trait, which underlines that our study sample seems to be representative [
14,
22]. More females than males made up the sample, which is characteristic for the gender distribution in orthodontics [
13,
30], particularly with increasing patient age. Patient gender was not a significant influencing factor for treatment effectiveness, in accordance with other studies [
8,
13]. Mean age at active treatment start was 14.8 years. Quach et al. found similar gender and age distributions among their UK sample [
30]. González-Gil-de-Bernabé et al. reported on older patients (17 years) within their Spanish sample [
13], while Freitas et al. analyzed data from patients who were 13.5 years old on average [
11]. Depending on the research question and methodology (analyses of specific treatment modalities or of quality of care in general), the international literature shows no consistency regarding the mean age at the beginning of orthodontic treatment. With regard to patient age, we found a significantly different distribution across the PAR categories ‘greatly improved’, ‘improved’, and ‘worse or no different’: On the one hand, patients with great improvement were 0.6 years younger than patients who achieved mere improvement. On the other hand, these patients were older than the patients who finished with an improvement of less than 30%. Thus, according to our data, no clinically relevant conclusion can be drawn regarding the correlation between patient age at active treatment start and occlusal outcome.
Average active treatment duration within this sample was 31.3 months; active MBA treatments lasted 26.2 months on average. This is comparable to other studies that reported about treatment duration [
2,
3,
10,
11,
40]. There are many factors that may potentially influence treatment duration in orthodontics, for example, the treatment modality or the need for orthognathic surgery [
26]; other influencing factors are individual occlusal traits like impacted canines. Our sample comprised almost every aspect of malocclusion and treatment modality because the aim of this study was to report about orthodontic reality with its numerous facets. Although we found a significantly different distribution of treatment duration within the PAR categories of improvement—namely that the treatment duration was longer in the ‘greatly improved’ group than in the ‘improved’ group but shorter than in the ‘worse or no different’ group—no clear conclusion can be drawn about the influence of treatment duration with regards to occlusal outcome. Yet, one has to keep in mind that unwanted side effects for oral soft and hard tissues might be more probable in prolonged orthodontic treatment. Therefore, active treatment duration should be as long as necessary and as short as possible.
In general and regardless of potential confounding factors, the PAR score reduction within this German sample indicated a high standard of orthodontic care. An improvement rate higher than 70% is generally considered a good standard of orthodontic treatment [
32]. In our sample, a mean PAR improvement rate of 83.54% was achieved, which seems to be rather high compared to the reported improvement in similar studies [
2,
3]. Freitas et al. reported an improvement of 78.54% in their Brazilian sample, which underwent premolar extraction treatment [
11]. Ponduri et al. investigated the PAR Index reduction of orthodontic as well as orthognathic surgery treatments and found improvement rates of 77% and 74%, respectively [
29]. An improvement rate about as high as in our sample was reported by Isherwood et al. [
16] and Onyeaso et al. [
28]. Moreover, the distribution of cases within our sample with regard to the PAR categories ‘greatly improved’ (49.0%), ‘improved’ (50.1%), and ‘worse or no different’ (0.9%) represents a very high standard of treatment. Although other study groups have also reported negligible numbers of treatments that resulted in a ‘worse or no different’ outcome, the 0.9% of ‘worse or no different’ cases in our sample seems to be a very low proportion [
2,
3]. On the other hand, researchers like Ponduri et al. [
29] and Isherwood et al. [
16] even reported not a single ‘worse or no different’ treatment within their samples. Within the present study sample, all PAR components improved significantly over the course of treatment. Yet, the improvement rates of the PAR components ‘buccal segments’ and ‘centerline’ were only 44.02% and 33.23%, respectively, whereas the PAR component ‘upper anterior segments’ showed an improvement of 90.94%. These differing PAR component-specific improvement rates are in line with previous studies [
2,
10,
37]. Interestingly, in the present sample the PAR component ‘buccal segments’ was reduced by 52.86% in the high SE group and only by 36.92% in the low SE group (p < 0.001), highlighting the potential impact of staff experience.
While many authors investigate improvement measures with regards to the PAR Index, few report about the final occlusal result as an indicator of treatment quality. Quach et al. expressed the importance of this indicator [
30]. Improvement measures alone should be interpreted with caution because the initial PAR score seems to be highly relevant and influential for the categories of PAR Index improvement (‘greatly improved’, ‘improved’, ‘worse or no different’), bearing in mind that a case only qualifies for ‘great improvement’ when the initial PAR score is more than 22 points. Thus, Quach et al. took a closer look at the final occlusal outcome and the percentage of treatments that finished with an almost ‘ideal occlusion’ of ≤5 PAR points; 67.9% of the 495 treated and analyzed patients from the UK had such an almost ‘ideal occlusion’ at the end of treatment, while the improvement rate was 80.5% [
30]. The 335 analyzed patients from our study were treated towards high-quality results more frequently; 81.5% fell in this outcome category, again an indicator of a very high standard of orthodontic care in this German sample. Remarkably, even if a final PAR score of 5 points or less stands for an almost ‘ideal occlusion’ and a high-quality treatment result, this low score might as well comprise a bilateral single tooth crossbite, for example.
Specifically, several treatment modalities were significantly more often associated with ‘great improvement’ and high-quality treatment outcome than others. Yet, based on the aims and design of the study, no scientific explanation for the difference in appliance performance can be given. Treatments with HA resulted in high-quality treatment outcome, as has been shown before [
4]. Furthermore, treatments with L‑MBA were associated with high effectiveness. Note that most of the HA treatments and all L‑MBA cases of this sample were treated in specialized practices/university hospitals, which might be a biasing factor regarding the quality of treatment outcome. A recent systematic review on lingual orthodontics came to the conclusion that especially individualized treatment goals seemed to be achievable by fully customized L‑MBA such as those used in the present sample [
27]. In addition, the PAR component ‘buccal segments’ was significantly more improved in the present L‑MBA group compared to all other treatments, possibly due to their biomechanical properties.
While patients who were treated with RME appliances had less than half the odds of achieving an almost ‘ideal occlusion’ compared to the rest (OR 0.441), this negative association did not apply to achieving the PAR category ‘greatly improved’. Significantly more patients with RME classified for ‘great improvement’ in comparison to the rest (78.6 vs. 44.9%), while significantly fewer patients who underwent RME treatment achieved an almost ‘ideal occlusion’ in comparison to the rest (66.7 vs. 83.9%). This apparent contradiction reflects the above-mentioned difference between the two outcome measures ‘high-quality treatment result’ and ‘great improvement’. A case can be regarded as ‘greatly improved’ because of a reduction of its PAR score by >22 points, but it does not necessarily finish with an almost ‘ideal occlusion’ of ≤5 PAR points at the end of treatment. RME is used when treating crossbites and a compromised transverse occlusion. In general, especially the PAR component ‘buccal segments’ proved to finish with rather high PAR points. Therefore, it seems clinically plausible that RME treatments do not finish with a low total PAR score, but can nevertheless be regarded as ‘greatly improved’. Our result that RME treatments significantly reduce the chance of achieving high-quality treatment results should not lead to direct clinical implications or restrictions, particularly in view of the above-mentioned considerations.
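The difference between the two outcome measures can be made concrete with a small Python sketch. It assumes the commonly used PAR cut-offs: an improvement below 30% counts as ‘worse or no different’, a reduction of at least 22 points as ‘greatly improved’, and everything in between as ‘improved’; a final score of ≤5 points is flagged as an almost ‘ideal occlusion’. The function name `classify_par_outcome`, the handling of boundary values, and the example scores are assumptions for illustration only and are not taken from the study data.

```python
def classify_par_outcome(pre_score: float, post_score: float):
    """Return (improvement category, almost-ideal-occlusion flag) for one case.

    Assumed cut-offs: < 30% reduction -> 'worse or no different';
    reduction of >= 22 points -> 'greatly improved'; otherwise 'improved'.
    """
    if pre_score <= 0:
        raise ValueError("pretreatment PAR score must be positive")
    reduction = pre_score - post_score
    percent = reduction / pre_score * 100.0
    if percent < 30:
        category = "worse or no different"
    elif reduction >= 22:
        category = "greatly improved"
    else:
        category = "improved"
    almost_ideal = post_score <= 5  # final-score criterion, independent of category
    return category, almost_ideal


# Hypothetical cases (pretreatment score, posttreatment score).
for pre, post in [(40, 12), (18, 3), (20, 16)]:
    print(pre, post, classify_par_outcome(pre, post))
```

In this sketch the first hypothetical case is ‘greatly improved’ without reaching ≤5 points, whereas the second reaches ≤5 points but can only be ‘improved’ because its pretreatment score is too low for a 22-point reduction.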
Similar results were found when analyzing the effectiveness of OS treatments compared to the rest. Significantly more patients who underwent orthognathic surgery classified for ‘great improvement’ in comparison to the rest (80.0 vs. 47.1%), while a significantly lower percentage of patients who were treated with surgery achieved an almost ‘ideal occlusion’ in comparison to the rest (65.0 vs. 82.8%). These findings could be explained by the previously mentioned difference between both outcome measures, but should be regarded with caution because the OS subgroup comprised only 20 patients, while the rest made up 314 patients within the mentioned analyses.
Based on our results, a key predictive factor for finishing with an almost ‘ideal occlusion’ was high staff experience (in years). This result is not surprising and is supported by other studies [
10,
11,
30]. High staff experience and an increased skill level are likely to come along with increased treatment effectiveness, especially with regard to achieving one of the most important and at the same time most challenging orthodontic goals: the correction of buccal occlusion. This result should encourage university hospitals with a high number of postgraduates and rather low staff experience to ensure sufficient supervision by highly experienced staff members, so that differences in clinical experience and skills do not necessarily affect treatment quality.
Furthermore, the results of this study suggest a trend indicating that a combination of removable (mostly functional) and fixed appliances might result in high-quality treatment outcomes. Whenever treatments included an RA/functional phase, the odds of finishing with a PAR score ≤5 points were almost twice as high (OR 1.92). Yet, this result should only be interpreted as a trend because it did not reach statistical significance. Quach et al. found a similar correlation between the combination of functional plus fixed appliances and high effectiveness [
30]. However, treating patients with removable plus fixed appliances prolongs the total treatment duration. As orthodontists we try to treat our patients as quickly and efficiently as possible, but should also take the above-mentioned findings into account during treatment planning.
The present study has several limitations, some of which have already been mentioned. Although recruitment procedures were discussed with professionals from the Clinical Trials Center of the University Hospital of Cologne, a positive selection of potential study participants, and with it a selection bias, cannot be completely ruled out. However, a willful (positive) selection of patients was defined as unacceptable, and existing good clinical practice guidelines as well as the individual commitment of research partners should not be doubted in general. Study centers with a rather long recruitment duration did not prove to provide study participants whose treatments were more efficient than the rest; in fact, the duration of recruitment was rather short at study centers that contributed a large number of high-quality treatments. Yet, we cannot be certain about the generalizability of the data. Especially with regard to specific treatment modalities such as L‑MBA or HA, it is important to keep in mind that highly specialized centers were part of this study. In addition, the study sample comprised a large number of patients, which was not necessarily the case for the analyzed subgroups. Thus, results for treatment modalities with only a small number of patients should be read with a degree of caution. Moreover, not every potential aspect of the variety of orthodontic treatment modalities was analyzed. We chose the treatment modalities carefully with regard to frequently applied procedures, yet some aspects of orthodontic reality, like aligner treatments, might be missing.
Another characteristic of our methodology was the use of the PAR Index for measuring treatment effectiveness. Although this index can be regarded as the gold standard for measuring treatment effects, there are some PAR-specific aspects to keep in mind. One of them is that the PAR Index is preferred in permanent dentition cases and often scores higher in these cases than in mixed-dentition cases. Another crucial aspect is the above-mentioned fact that a final PAR score of 5 points might be far from representing a truly ideal occlusion. The PAR Index provides very good insight into the quality of orthodontic care within a sample, but, like any occlusal index, it does not represent the absolute occlusal truth. Finally, when looking at our results in detail, one has to keep in mind that T1 was not directly after active orthodontic treatment, but rather after a retention phase at the time of final record taking. Furthermore, the time interval between the end of active treatment and record taking varied considerably (6.3–26.9 months) between the study centers. Potential relapse or further improvement due to specific retention protocols was not accounted for.
In addition to the findings discussed above, patient-reported outcomes were measured and analyzed in the course of this multicenter cohort study. This specific aspect of quality of orthodontic care will be part of future publications.