Background
From classical to contemporary orthodontics, analysis of the spatial relationships among the teeth, jaws, and cranium has relied heavily on cephalometry. On a standardized cephalometric radiograph, predefined anatomic landmarks are marked so that various orthodontic and facial morphometric analyses can be applied for diagnosis and treatment planning. Despite several methodological limitations, such as nonlinear magnification and image distortion, its integral role in orthodontics, as well as in orthognathic and facial plastic surgery, is indisputable [1–3].
The accuracy of the marked cephalometric landmarks can affect the results of clinical analyses and the resulting treatment decisions [4, 5]. Because the boundaries that distinguish abnormalities lie within a few millimeters or a few degrees, even a slight error can cause a misclassification that may lead to malpractice. What makes this more challenging is that the human skull is a highly complex 3D object, whereas a lateral cephalogram projects it onto the sagittal plane, so that structures lying along the direction perpendicular to that plane overlap [6, 7].
Although cephalometric tracing is generally performed by trained orthodontists in clinical practice, numerous reports have raised concerns about significant intra- and inter-observer variability, attributed to the various limitations of the task and its time-consuming nature [7, 8]. To achieve better accuracy and reliability in cephalometric tracing, the need for fully automatic tracing software has repeatedly been raised [9].
Over the decades, there have been several studies on computer-aided landmark detection. Cardillo and Sid-Ahmed used template matching and gray-scale morphological operators [10]. Ibragimov et al. applied random forest and game-theoretic techniques in the 2014 ISBI Grand Challenge and achieved good performance [11]. Tree-based approaches, namely random forest regression with a hierarchical framework and binary pixel classification with randomized trees, were used by Chu et al. and Vandaele et al., respectively [12, 13]. Despite these efforts, the methods remained limited in accuracy and in their handling of uncertainty, placing only around 70% of the estimated landmarks within the clinically acceptable range of 2 mm from the ground truth points [9].
Over the last decade, various clinical fields have reported gains in clinical efficiency from the application of artificial intelligence. In particular, recent studies in the dental field have shown that deep learning models perform excellently as diagnostic aid systems in clinical applications [14, 15]. Many deep learning-based studies on computer-aided landmark detection have outperformed earlier approaches. Lee et al. and Arik et al. adopted the basic concept of the convolutional neural network (CNN) for a pixel classification algorithm [16, 17]. U-shaped deep CNNs have been widely used to estimate landmark positions precisely [18–21]. However, a single CNN model provides no uncertainty for its predictions, which is an obstacle to clinicians accepting the output of the algorithm.
In this paper, we propose a novel framework for locating cephalometric landmarks with confidence regions based on uncertainties, using Bayesian Convolutional Neural Networks (BCNN) [22]. With Bayesian inference over iterative CNN model calculations, we can derive a 95% confidence region for an identified landmark that accounts for model uncertainty, and significantly increase the in-region accuracy. Given the uncertainty and the confidence region of the estimated location, clinicians are expected to be able to judge whether the results of the framework are reliable and to make a more accurate diagnosis.
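As a rough illustration of how a 95% confidence region might be derived from repeated stochastic predictions, the sketch below fits a 2D Gaussian to sampled landmark positions and tests whether a point falls inside the corresponding ellipse. The Gaussian-ellipse form of the region, the function names, and the simulated samples standing in for repeated model passes are all assumptions made for illustration; they are not taken from the framework itself.

```python
import math
import random

# 95% quantile of the chi-square distribution with 2 degrees of freedom.
# For 2 degrees of freedom the quantile has the closed form -2 * ln(1 - p).
CHI2_95_2DOF = -2.0 * math.log(0.05)


def fit_region(samples):
    """Fit a 2D Gaussian to sampled (x, y) landmark positions.

    Returns the mean (mx, my) and the sample covariance (sxx, sxy, syy).
    """
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    sxx = sum((x - mx) ** 2 for x, _ in samples) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in samples) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in samples) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def in_region(point, mean, cov, chi2=CHI2_95_2DOF):
    """Return True if the squared Mahalanobis distance of point is <= chi2."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Quadratic form with the closed-form inverse of a 2x2 covariance matrix.
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return d2 <= chi2


def region_area(cov, chi2=CHI2_95_2DOF):
    """Area of the confidence ellipse: pi * chi2 * sqrt(det(covariance))."""
    sxx, sxy, syy = cov
    return math.pi * chi2 * math.sqrt(sxx * syy - sxy ** 2)


def simulate_predictions(n=200, seed=0):
    """Hypothetical stand-in for n stochastic forward passes of a model."""
    rng = random.Random(seed)
    return [(rng.gauss(120.0, 2.0), rng.gauss(85.0, 1.0)) for _ in range(n)]


if __name__ == "__main__":
    mean, cov = fit_region(simulate_predictions())
    print("estimated landmark:", mean)
    print("region area (squared pixels):", region_area(cov))
    print("mean inside region:", in_region(mean, mean, cov))
```

Under these assumptions, a small region area indicates that the sampled predictions agree closely, whereas a large or elongated region signals a landmark whose position the model is less sure about.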
Discussion
Cephalometry is a very important criterion in orthodontic diagnosis and treatment planning. However, the lack of certainty in the definitions of cephalometric landmarks causes inter-observer variability, because individual tendencies enter into the measurement. Manual errors due to fatigue can also add to intra-observer variability. Moreover, manually plotting cephalometric landmarks is time-consuming, and the time required should be reduced for the sake of productivity. Therefore, it is necessary to establish an automatic framework for orthodontic analysis that can rapidly estimate and analyze accurate and reliable cephalometric landmarks.
We presented an automated framework for detecting cephalometric landmarks, which is the first attempt to provide 95% confidence regions around the estimated landmark positions. With the confidence regions, clinicians can intuitively gauge the accuracy of a landmark computed by the system, especially from the location and size of its region. Landmark detection with confidence regions will not only inform clinicians of the confidence in the estimated landmarks but also save time by narrowing the areas they need to consider when making decisions.
A notable result is the distinct negative correlation between the error and the SDR. The correlation was analyzed using the Pearson correlation coefficient [28], and the coefficient between the mean error and the SDR (2 mm) is −0.689. It is natural that the larger the error, the lower the SDR, but exceptionally low SDRs are observed for Porion and A-point. This is due to low tracing accuracy. In fact, the differences between the landmarks marked by the two experts are 3.31 ± 2.29 mm and 2.89 ± 2.31 mm, respectively, with a large SD. This means that the tracing position of one or both experts fluctuates greatly. Indeed, previous research has shown that manual tracing of Porion and A-point has low reliability [29, 30]. The effect of the low reproducibility of A-point can also be seen in Table 2. SNA and SNB share the same components except for A-point and B-point; however, their diagonal accuracies are 72.69 and 83.13, a large difference. Labeling errors are inevitable in tasks such as landmark detection, where there is no gold standard and the ground truth must be generated through manual labeling. This problem can be mitigated with very large amounts of data or very accurate landmark tracing.
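For readers who want to reproduce this kind of analysis, the following minimal sketch computes a Pearson correlation coefficient between per-landmark mean errors and SDRs. The function name and the toy input values are hypothetical and chosen only to show the mechanics; they are not the values measured in this study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered products and squares; the 1/(n-1) factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical per-landmark values, for illustration only.
    toy_mean_error_mm = [0.9, 1.2, 1.6, 2.4, 3.1]
    toy_sdr_2mm = [95.0, 90.0, 84.0, 70.0, 66.0]
    print("Pearson r:", pearson(toy_mean_error_mm, toy_sdr_2mm))
```

A coefficient close to −1 would mean that landmarks with larger mean errors almost always have lower SDRs, while a value nearer 0 would indicate that the two measures are only loosely related.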
Some landmarks are difficult to identify: A-point, Articulare, Soft tissue pogonion, Orbitale, and Gonion [11, 20, 23, 31]. These landmarks show higher errors or lower SDRs than the others, and we discuss the reasons here. A-point is located on the curve of the premaxilla and is often influenced by head position, which makes tracing difficult. Preceding studies have noted that A-point is one of the landmarks most commonly affected by identification errors [29, 32]. Articulare, the intersection of the external dorsal contour of the mandibular condyle and the temporal bone, is an example of the cephalostat affecting performance. The ear rods of the cephalostat, which fix the head position, also appear on the lateral cephalogram and distract the model from the temporal bone. For Soft tissue pogonion, as mentioned in [31], there is a large difference between the annotations in test 1 and test 2. Because we used both sets together for testing in this study, the average performance decreases. We confirmed the same tendency in the annotations for Orbitale. Finally, because the lateral cephalogram is a 2D image, when the left and right sides of the jaw are not exactly superimposed, Gonion could be marked on either side, which could reduce performance.
Several limitations of the framework remain. Because the model is trained on local geometric features only, the spatial relationships among landmarks are not considered. This contributes to aberrant outcomes, e.g., B-point plotted inside the mouth. In addition, the proposed framework does not consider contours, even though multiple landmarks lie directly on the edges of the bone and on the skin, especially at the lower jaw. However, by incorporating a mechanism that considers spatial features, such as a game-theoretic framework [11], into the convolutional structure, we expect performance to improve further. Moreover, image preprocessing techniques that make edges more prominent, such as the Laplacian filter, can improve the accuracy of landmarks located on the lower jaw.
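To make the suggested preprocessing step concrete, the sketch below applies a 4-neighbour discrete Laplacian to a grayscale image stored as nested lists and subtracts it from the image to steepen edges. The kernel choice, the `weight` parameter, and the function names are assumptions for illustration; the text does not specify how the filter would be configured.

```python
def laplacian(image):
    """Apply the 4-neighbour discrete Laplacian to a 2D list of intensities.

    Border pixels are left at 0.0 because they lack a full neighbourhood.
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            out[r][c] = (image[r - 1][c] + image[r + 1][c]
                         + image[r][c - 1] + image[r][c + 1]
                         - 4.0 * image[r][c])
    return out


def sharpen(image, weight=1.0):
    """Subtract the weighted Laplacian so that intensity edges become steeper."""
    lap = laplacian(image)
    return [[pixel - weight * lap_value
             for pixel, lap_value in zip(row, lap_row)]
            for row, lap_row in zip(image, lap)]


if __name__ == "__main__":
    # A tiny synthetic image with a vertical edge between columns 1 and 2.
    step_edge = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
    for row in sharpen(step_edge):
        print(row)
```

In flat regions the Laplacian is zero and the image is unchanged, so only pixels next to intensity transitions, such as bone contours, are altered. In practice the result would usually be clipped back to the valid intensity range before being passed to a model.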
There are also issues regarding generalizability and the reliability of the ground truth. The study used a small dataset (only 400 patients) that nevertheless covered a wide range of ages (6 to 60 years). In addition, the mean intra-observer variability of the two experts was 1.73 and 0.90 mm, respectively, and the manual inter-observer variability between the two experts produced an LE of 2.02 ± 1.53 mm on the test set. This variability is quite large, considering that the first precision range is 2 mm. Therefore, there is a high probability that unwanted bias exists in the trained model, which limits clinical application with this dataset alone. A large amount of high-quality data generated by the consensus of several experienced specialists would address this problem. As an extension of our study, we plan to collaborate with several medical centers to collect patient data from various races and regions in order to build a model that can be applied to a broad population.
An automated framework for detecting cephalometric landmarks using a Bayesian Convolutional Neural Network (BCNN) is proposed in this study. It not only achieves high overall accuracy within the 2 mm range but also provides a 95% confidence region for each landmark. The accuracy rates within the 2, 2.5, 3, and 4 mm ranges are 82.11, 88.63, 92.28, and 95.96%, respectively.
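The accuracy rates at the four precision ranges can be computed along the lines of the sketch below, which takes the rate to be the share of landmarks whose distance to the ground truth is at most each threshold. The function names, the inclusive comparison, and the `mm_per_pixel` spacing parameter are assumptions for illustration, and the example errors are made up.

```python
import math


def radial_error_mm(predicted, truth, mm_per_pixel):
    """Euclidean distance between two (x, y) pixel positions, in millimeters."""
    return math.hypot(predicted[0] - truth[0], predicted[1] - truth[1]) * mm_per_pixel


def detection_rates(errors_mm, thresholds_mm=(2.0, 2.5, 3.0, 4.0)):
    """Percentage of errors that fall within each threshold (inclusive)."""
    if not errors_mm:
        raise ValueError("errors_mm must not be empty")
    total = len(errors_mm)
    return {t: 100.0 * sum(e <= t for e in errors_mm) / total
            for t in thresholds_mm}


if __name__ == "__main__":
    # Hypothetical radial errors in millimeters, for illustration only.
    toy_errors = [1.0, 2.2, 3.5, 5.0]
    for threshold, rate in detection_rates(toy_errors).items():
        print(f"within {threshold} mm: {rate:.2f}%")
```

Because the thresholds are nested, the rates are non-decreasing as the range widens, which matches the pattern of the reported figures.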
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.