Discussion
The mouth serves as the entry point for the digestive system, where food is prepared for digestion with the help of the teeth. Oral health is a significant public health concern beyond personal health problems, as disorders in oral health have been linked to various diseases, including gastrointestinal system conditions [
53], cardiovascular diseases [
54], and diabetes mellitus [
55]. Additionally, these disorders can impose a significant financial burden on countries [
56]. Therefore, the assessment of oral health-related risk factors is critical to the maintenance of both general and oral health.
Our study revealed that several factors contribute to a high risk of caries, including age, sex, body mass index, tooth brushing frequency, socioeconomic status, employment status, education level, marital status, hypertension, diabetes, renal disease, consumption of sugary snacks, dry mouth, and time spent in front of the TV, telephone and computer. Age is considered to be a risk factor for poor oral health, as the impact of factors causing caries and periodontal disease on teeth increases with age [
57]. Studies using artificial intelligence have reported an increase in tooth loss [
58], root caries [
35], and early childhood caries [
30,
32] with age. The findings of our study are consistent with the literature in this regard.
When dental caries incidence rates are analysed according to sex, it is generally observed that the prevalence of dental caries is greater in females than in males. This is often attributed to one of three factors: earlier tooth eruption in females, resulting in longer exposure to the caries-forming oral environment; easier access to food sources for women; and hormonal fluctuations during processes such as menstruation and pregnancy [
59]. Hung et al. [
35] demonstrated a significant relationship between sex and caries risk, which is similar to the findings of our study, whereas Park et al. [
30] concluded that there was no significant relationship between sex and caries risk. However, it is important to note that their study investigated only early childhood caries in children aged 1 to 5 years and did not consider the factors that contribute to a higher incidence of caries in women. We believe that these differences in findings are due to the age group studied and the exclusion of relevant factors.
Overweight and obesity are major global public health problems that are characterized by excess body fat relative to lean body mass [
60]. Factors strongly correlated with the predisposition to overweight and obesity include decreased physical activity, increased sedentary lifestyle, and poor dietary habits [
61]. In our study, a significant relationship was found between body mass index and caries risk group. The greater and more frequent consumption of foods rich in fat and carbohydrates in overweight individuals may explain this relationship.
Among the habits affecting oral health, tooth brushing frequency and consumption of sugary snacks were found to be among the factors associated with increased risk of caries according to our study. It is important to note that the consumption of sugary snacks is a significant risk factor for caries formation, with studies showing that it increases the risk of caries fivefold [
62]. This finding is supported by the findings of other studies in the literature [
29,
30,
32,
58]. Flossing, which is a crucial aspect of oral health, is known to prevent root caries [
63]. Although flossing was not identified as a significant caries risk factor in our study, Hung et al. [
35] reported that nonusers had a greater incidence of root caries than flossers. It is important to note that their study focused only on root caries, and the limited number of flossers in our study may have contributed to the difference in findings.
Regular dental care by a professional increases the chances of early detection, prevention and treatment of oral diseases [
64,
65]. Previous studies have shown that people who do not receive regular dental care from a professional have worse oral health than those who do receive regular dental care [
66]. Our study revealed significant correlations between socioeconomic status, employment status, education level, and caries risk groups, which is consistent with the findings of other studies in the literature [
28‐
30,
35,
58]. Social factors are likely to affect access to dental care, and it can be concluded that they affect oral health and caries risk.
Chronic diseases are defined as medical conditions that require a life-long course of treatment and last for more than 3 months. These diseases affect elderly people more frequently, with 80% having one chronic disease and 50% having at least two [
67]. There is a relationship between oral diseases and systemic chronic diseases, with inflammation being a key factor linking most of these conditions [
68]. Our study revealed that caries risk groups were associated with hypertension, diabetes and chronic kidney disease. Diabetes causes periodontal damage [
69] and dry mouth [
70] by directly affecting the salivary glands, which has a negative impact on oral health. Hypertension indirectly exacerbates caries, as antihypertensive drugs can cause xerostomia by decreasing saliva secretion [
71]. Our study found that dry mouth is a significant factor for increasing the risk of caries (
p = 0.005). Other studies by Hung et al. [
35] and Elani et al. [
58] also identified diabetes mellitus and hypertension as risk factors for caries. However, unlike our study, Hung et al. [
35] reported that stroke, heart disease, COPD, and vision, walking and memory problems were also associated with an increased risk of caries, and Elani et al. [
58] reported that heart disease was also associated with an increased risk of caries. Individuals with physical and mental disabilities, such as visual impairment, inability to walk, and memory problems, may have difficulty maintaining oral hygiene. As a result, they are at an increased risk of developing caries. In our study, we found that systemic factors such as stroke, heart disease, COPD, and physical and mental disabilities, such as visual, walking, and memory problems, were not associated with caries risk groups. This may be due to the lower number of individuals with these diseases and disabilities in our study, unlike those with diabetes mellitus and hypertension.
There is a suggested correlation between spending more than 3 h in front of a screen, being married, and oral health [
26,
35]. Although marital status did not directly affect caries risk, it was strongly correlated with age, which is one of the factors directly affecting caries risk (
p < 0.00001). Additionally, married individuals may neglect personal care and oral hygiene due to their busy schedules. Increased screen time may lead to a sedentary lifestyle and unhealthy living conditions. Thus, it can be assumed that social factors such as these can indirectly increase the risk of caries.
Research indicates that the use of tobacco and the consumption of alcohol increase the risk of dental caries [
72,
73]. By altering the temperature and humidity of the oral environment, smoking negatively affects the buffering capacity of saliva [
74]. This altered environment causes the bacterial flora to deteriorate, leading to an increase in cariogenic bacteria [
75,
76]. Similarly, toxic substances such as nicotine found in cigarettes can cause periodontal disease by affecting the immune response in the surrounding tissues [
74]. In our study, smoking was not a significant contributing factor to caries risk. We aimed to investigate the prominent indicators of dental caries at the community level and therefore included factors that may be directly associated with dental caries, as well as other variables that may affect these factors. The aim was not to investigate the effect of any single factor on the risk group but rather to evaluate all factors together and select the appropriate machine learning algorithms to determine the risk group. Thus, although tobacco use is expected to affect dental caries incidence, the lack of significant results may be because dietary habits, oral hygiene knowledge, lifestyles and social factors other than smoking vary from person to person. Additionally, alcohol consumption may increase host susceptibility to infections such as periodontitis because ethyl alcohol impairs the function of neutrophils, macrophages and T cells [
77]. In addition to its direct effects, poor oral hygiene in alcoholic patients is one of the main effects of alcohol on oral health [
78]. In our study, alcohol consumption was not found to be a significant risk factor for caries. This is probably due to its low prevalence (7.7%) in our study group. In contrast to our findings, Hung et al. [
35] reported that tobacco and alcohol use significantly contribute to the risk of root caries. Most of the machine learning studies in the literature dealt with caries risk in children and therefore did not evaluate smoking status or alcohol consumption. Because no other machine learning study has evaluated these variables in adults, our results could be compared only with those of Hung et al. [
35]. We recommend that future studies should evaluate the effect of smoking on the risk of caries in adults.
Machine learning is being used in oral health to provide dentists with a tool to improve the oral health status of individuals, enabling them to make early decisions to prevent dental caries and thus improve overall quality of life. There are many studies in the current literature using machine learning techniques to assess oral and dental health. Kang et al. [
26] collected data from a child oral health survey conducted by the Korean Centre for Disease Control and Prevention in 2018 and created a dental caries prediction model using the RF, gradient boosting decision tree (GBDT), SVM, LR, artificial neural network, convolutional neural network, and long short-term memory machine learning algorithms. RF achieved the highest performance among these methods, with 92% accuracy, 90% F1-score, 94% precision and 87% recall. As in that study, the RF algorithm was very successful in our study, with 86% accuracy, 87% F1-score, 87% precision and 87% recall.
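For readers comparing the figures quoted here, the following is a minimal sketch of how accuracy, precision, recall, specificity and F1-score are derived from the counts of a confusion matrix. It assumes a binary outcome for simplicity; the function name and the toy labels are illustrative and are not taken from any of the cited studies. For multi-class risk groups, these metrics are typically computed per class and then averaged.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> Dict[str, float]:
    """Compute common classification metrics for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}


# Toy labels (hypothetical): 1 = caries present, 0 = caries absent.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
toy_result = binary_metrics(toy_true, toy_pred)
```

Because accuracy alone can hide poor detection of the minority class, reporting recall and precision alongside it, as the studies discussed here do, gives a fuller picture of model behaviour.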
Kang et al. [
27] conducted another study with the same dataset and used GBDT, RF, LR, SVM and long short-term memory algorithms; GBDT achieved the highest success, with an accuracy, F1-score, precision and recall of 95%, 93%, 99% and 88%, respectively. In this study, the DT model achieved 82% accuracy, 82% F1-score, 82% precision and 82% recall.
Ramos-Gomez et al. [
28] analysed the answers given by the parents or caregivers of children to questions asked to predict the probability of dental caries in children aged 2–7 years using the RF machine learning algorithm and obtained accuracy rates of 62% and 73% for active caries and caries history, respectively.
Sadegh-Zadeh et al. [
29] sampled a total of 780 parents and children under the age of five to assess the risk of dental caries in children aged 5 years and under. They employed ten different machine learning modeling techniques to build a highly accurate classification model to predict caries risk from these data and showed that the RF and MLP models had the best accuracy (97.4%). In our study, as in that study, the MLP model was the most successful, with 96% accuracy.
Hung et al. [
35] used data from the 2015–2016 National Health and Nutrition Examination Survey to predict root caries and revealed that the SVM algorithm performed best, with 97% accuracy, 94% specificity, 95% precision and 99% recall, for identifying root caries. In our study, this algorithm demonstrated 84.2% accuracy, 84% F1-score, 84% precision and 46% recall.
Park et al. [
30] analysed the data of 4195 children between 1 and 5 years of age from the Korean National Health and Nutrition Examination Survey (NHANES) from 2007 to 2018 for the prediction of early childhood caries using the LR, XGBoost, RF and LightGBM algorithms and found that LR had the highest accuracy among the four prediction models (76%). The LR model achieved 47% accuracy in the present study.
Yang et al. [
31] used linear regression and RF classifier machine learning algorithms to estimate the DMFT scores of 12-year-old children and reported prediction accuracies of 15.24% and 43.27%, respectively.
Kumar et al. [
79] utilized machine learning algorithms, including RF, DT, LR and NB, to provide a model for dental caries detection and showed that DT provided the most accurate model, with an accuracy of 85.62%. The NB model showed 77% accuracy, 85% F1-score, 80% precision and 90% recall in that study and 29% accuracy, 19% F1-score, 46% precision and 30% recall in our study, making it the least successful model in both studies.
Qu et al. [
32] used the LR, RF and AdaBoost algorithms to create an early childhood caries risk prediction model based on behavioural factors and showed that the RF model had the highest accuracy (82%).
Elani et al. [
58] conducted a study using extreme gradient boosting trees, RF, neural networks, a light gradient boosting machine, and LR models to determine the socioeconomic predictors of tooth loss and reported that the RF model achieved the highest performance, with an accuracy rate of 84.3% for edentulism.
Karhade et al. [
33] used Google Cloud AutoML to develop an automated machine learning algorithm to classify children according to early childhood caries status, and the model considering only 2 variables (child's oral health status and child age) showed a high accuracy rate of 67%.
Wang et al. [
34] used machine learning algorithms, including the extreme gradient boosting and NB algorithms, to predict the oral health status index score and referrals for treatment needs (RFTN) in children aged 2–17 years. They used random bootstrap samples with manually added Gaussian noise and achieved 93% recall and 49% specificity in predicting RFTN.
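The idea of drawing bootstrap samples and adding Gaussian noise can be illustrated generically as follows. This is only a sketch of the general technique: the function name, the noise standard deviation and the toy rows are assumptions, and the exact procedure used by Wang et al. is not reproduced here.

```python
import random
from typing import List, Optional


def noisy_bootstrap(rows: List[List[float]], noise_sd: float,
                    n_samples: Optional[int] = None,
                    seed: int = 0) -> List[List[float]]:
    """Resample rows with replacement and jitter each value with N(0, noise_sd)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(rows) if n_samples is None else n_samples
    sample = []
    for _ in range(n):
        row = rng.choice(rows)  # sampling with replacement
        sample.append([x + rng.gauss(0.0, noise_sd) for x in row])
    return sample


# Hypothetical numeric features, e.g. [age in years, brushing frequency per day].
toy_rows = [[7.0, 2.0], [9.0, 1.0], [12.0, 2.0], [15.0, 0.0]]
augmented = noisy_bootstrap(toy_rows, noise_sd=0.1, n_samples=8)
```

The noise level is a tuning choice: too little noise simply duplicates records, whereas too much can distort the relationship between the features and the outcome.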
Kang et al. [
26], Sadegh-Zadeh et al. [
29], Elani et al. [
58], Yang et al. [
31] and Qu et al. [
32] found that RF is the machine learning model with the highest success rate. In our study, this model was the second most successful model after the MLP algorithm. Additionally, we found that the DT machine learning model has an accuracy of over 80%, which is also consistent with the findings of Kang et al. [
26] and Kumar et al. [
79].
Some of our features exhibited a low correlation level with DMFT, prompting us to explore feature selection. During the feature selection process, we considered the correlation of independent variables with DMFT. Initially, we set the correlation threshold at 0.1. In this case, variables such as age, tooth brushing frequency, socio-economic status, employment status, education level, and marital status remained in the dataset, while other independent variables were excluded. With a correlation threshold of 0.1, the accuracy rates for NB, LR, SVM, DT, RF, and MLP models were 52%, 48%, 44%, 53%, 54%, and 52%, respectively.
Subsequently, we adjusted the correlation threshold to 0.05. This time, 18 independent variables, including age, gender, BMI, tooth brushing frequency, socio-economic status, employment status, education level, marital status, hypertension, diabetes, renal failure, consumption of sugary snacks, frequency of snack consumption, radiotherapy history, dry mouth, time spent in front of the TV, phone, or computer, and walking disorder were included, while 12 independent variables with a correlation with DMFT lower than 0.05 were excluded. In this case, the accuracy rates for NB, LR, SVM, DT, RF, and MLP models were 38%, 48%, 45%, 74%, 76%, and 77%, respectively.
In both scenarios, there was a slight increase in accuracy for NB and LR models. However, notably, when the correlation threshold was set at 0.1, SVM, DT, RF, and MLP models exhibited dramatic decreases in accuracy. As a result, the highest level of accuracy was achieved when all independent variables were included without feature selection.
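The threshold-based feature selection described here can be sketched as follows. This is a minimal illustration, assuming Pearson correlation and a comparison on its absolute value; the text does not state which correlation coefficient was used, and the function names, column names and toy values are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_by_correlation(features: Dict[str, Sequence[float]],
                          target: Sequence[float],
                          threshold: float) -> List[str]:
    """Keep features whose absolute correlation with the target meets the threshold."""
    return [name for name, column in features.items()
            if abs(pearson(column, target)) >= threshold]


# Toy data (hypothetical): a DMFT-like target and four candidate features.
toy_dmft = [1, 2, 3, 4, 5, 6]
toy_features = {
    "age": [20, 30, 40, 50, 60, 70],   # strongly correlated
    "brushing": [3, 3, 2, 2, 1, 1],    # strongly negatively correlated
    "flag": [1, 0, 1, 0, 0, 1],        # weakly correlated
    "constant": [1, 1, 1, 1, 1, 1],    # no variance, never selected
}
kept_strict = select_by_correlation(toy_features, toy_dmft, threshold=0.1)
kept_loose = select_by_correlation(toy_features, toy_dmft, threshold=0.05)
```

A univariate filter of this kind evaluates each variable in isolation, so it can discard features that are informative only in combination with others. This may be one reason why the nonlinear models performed best when all variables were retained.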
This study differs from other studies in the literature because it focuses on caries risk group assessment rather than caries presence. This approach goes beyond existing studies and offers a more effective strategy for identifying the caries potential of individuals and taking preventive measures in advance. In addition, to the best of our knowledge, this is the first study to address oral health risk groups in adults with machine learning algorithms. Another valuable advantage of this study is that it clearly demonstrates the relationships between oral health risk factors of individuals and the interactions of these factors with DMFT risk groups. However, interpreting these relationships only from the table may lead to misleading results, because the other variables are not held constant. Furthermore, this study reflects only the dietary and social practices of the Turkish population.
Because the data used in our study were collected prospectively, the dataset is limited in size, which can be considered one of the limitations of the study. Using larger datasets may strengthen the generalizability of the findings.