Introduction

Age estimation from developing teeth is frequently required in forensic cases of skeletal remains, mass disasters, children with no identification papers and asylum seekers as well as in the fields of archaeology and anthropology. After about 14 years of age, the third molar is the only immature tooth available to estimate age. Several questions relate to estimating age from the third molar. Which methods have little bias (consistently over- or under-estimating age)? How accurately can age be estimated? What do we mean by accuracy? Do population differences in dental maturity influence accuracy? There is an urgent need for an evidence-based reference to address some of these questions. The first aim of this study was to calculate the bias (difference between dental and real age) of age-estimating methods that use mandibular third molar (M3) root formation. The probability of being at least 18 given M3 root stage is also of interest. The second aim was to apply diagnostic tests of accuracy showing how root stage discriminates between individuals at least 18 years of age and those younger and to apply this knowledge to predict the likelihood of age 18 for a single individual. For this part of the study we used a separate reference sample of 1,663 radiographs. We propose an age interval for M3 root stages to aid interpretation of the term 'on the balance of probabilities'. We highlight the similarity in M3 apex maturity between world groups from published data and illustrate how a small group difference in average age has little impact on the confidence interval of estimated age for an individual.

Materials and Methods

Bias and accuracy of age estimation

The target sample used to calculate bias was panoramic radiographs of 300 individuals aged 11 to 25 (n = 20 per year of age) shown in Figure 1. This sample consisted of radiographs from 78 male and 93 female White Caucasians and 60 males and 67 females of Bangladeshi ethnic origin (total 138 males and 160 females). These were patients X-rayed during 2002 for diagnosis and treatment at the Royal London Hospital Dental Institute; they were selected consecutively from the archive of available radiographs and do not form part of the reference study. An identical number of each age group was selected. Maturity of permanent teeth is not significantly different in the two ethnic groups.1 Selection criteria were the presence of a left mandibular M3 and age from 11 to 25. Exclusions were M3 with abnormally short, malformed roots and pathology other than dental caries or where the radiographic image was inadequate to visualise the root stage or apex. Root stage of M3 was assessed by the second author with the aid of a magnifier using Figure 2 (from Liversidge,2 which is adapted from Moorrees et al.3 with added descriptions) and Levesque et al.4 (based on Demirjian et al.5). Kappa was calculated by re-assessment of root stages from 30 radiographs to determine reliability. Dental age was calculated using methods listed in Table 1.1,2,4,6-32 Three of these do not give sex-specific data,20,30,32 several omit some stages26,30 and one stage ('Ri', root initiation) has been interpolated from an illustration.30 Two methods are detailed in Table 2 and include adapted maturity data from Levesque et al.,4 and adjusted mean age within stage from Liversidge2 for combined groups. This adjustment was the addition of 0.33 year to the mean age within stage for the combined groups.
The 95% confidence interval (CI) for estimated age for a single individual for each stage was calculated from the product of 1.96 and the standard deviation (SD) of mean age for each root stage. Levesque et al.4 do not give SD but this was interpolated for this paper from the cumulative curves for each stage in their illustration and calculated using the normal equivalent deviate.33 Actual age was subtracted from dental age and mean difference (defined as bias), standard deviation (SD) and mean absolute difference were calculated for each method.34 Bias was tested using the t-test with a significance level of 0.05. Methods with bias not significantly different from zero were further analysed by root stage.
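The calculations described above can be sketched in a few lines of Python. This is a minimal illustration rather than the analysis actually run for this study: the function names and the five pairs of ages are hypothetical, and only the t statistic is computed (a p-value would additionally need the t distribution).

```python
import math
import statistics


def bias_summary(dental_ages, known_ages):
    """Summarise the differences (dental age minus known age) for one method."""
    diffs = [d - k for d, k in zip(dental_ages, known_ages)]
    bias = statistics.mean(diffs)                        # mean difference = bias
    sd = statistics.stdev(diffs)                         # SD of the differences
    mean_abs_diff = statistics.mean([abs(x) for x in diffs])
    t = bias / (sd / math.sqrt(len(diffs)))              # one-sample t against zero
    return {"bias": bias, "sd": sd, "mean_abs_diff": mean_abs_diff, "t": t}


def individual_ci(mean_age, sd, z=1.96):
    """95% CI of estimated age for a single individual: mean +/- 1.96 * SD."""
    half_width = z * sd
    return mean_age - half_width, mean_age + half_width


# Hypothetical ages in years, for illustration only (not data from this study).
dental = [15.2, 16.8, 17.5, 18.1, 19.0]
known = [14.6, 17.1, 18.4, 17.5, 20.3]
summary = bias_summary(dental, known)
low, high = individual_ci(17.0, 2.0)
print(summary)
print(round(low, 2), round(high, 2))
```

A negative bias indicates that the method under-estimates age on average; the mean absolute difference ignores the sign and so reflects the typical size of the error.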

Figure 1: Age distribution of the target sample of radiographs.

Individuals with M3 in crown stages (blue bars), root stages (yellow bars) and mature (green bars)

Figure 2

Descriptive criteria for root stages of mandibular left third molar (M3)

Table 1 Methods of age estimation using M3, region, sample size, age range and type of data tested on the target sample
Table 2 Estimated mean age and 95% confidence interval (CI) in years for M3 molar root stages. Adapted maturity data (halfway between mean age entering tooth stage); CI for Demirjian stages5 calculated from estimated SD from Figure in Levesque et al.4 Mean for Moorrees stages adjusted by addition of 0.33 year, pooled groups from Liversidge.2 CI calculated as mean ± product of SD and 1.96

Diagnostic tests

Diagnostic tests were carried out using data from Liversidge2 where M3 stage was crown complete ('Cc') or later with additional data from 67 individuals with mature M3 apices from London aged 23 to 25. This total reference sample was radiographic data from 1,663 individuals (White and Bangladeshi groups in London and Black and Cape Coloured in South Africa). The number of individuals in each root stage was tabulated against two age categories (younger than 18 and at least 18). The probability of an individual in this sample being at least 18 was calculated by root stage. The 95% CI of this ratio was calculated and compared between males and females and groups. No significant differences were noted and data were combined. The accuracy of diagnostic tests was investigated. A positive diagnosis was defined as being at least 18 years of age and the test was root stage of M3. A test threshold divided the reference sample into two groups. If the test threshold was 'Cc' and 'Ri', a positive test was root stage 'Ri' or any later maturity stage and a negative test was M3 having no initial root visible. If the test threshold was 'A1/2' and 'Ac' (apex closed), a positive test was the latter stage and a negative test was any stage up to and including 'A1/2'. Sensitivity, specificity and likelihood ratio of a positive (LR+) and negative (LR-) test result were calculated (including 95% CI) for each threshold (root stage) as well as the area under the receiver operating characteristic (ROC) curve. Confidence intervals were calculated using SPSS 14.0 and the Excel program at http://vl.academicdirect.org/applied_statistics/binomial_distribution/ref/CIcalculator.xls.
The checklist for the reporting of diagnostic accuracy studies (STARD) has been followed where possible for the reference sample (http://www.stard-statement.org/).35 Data were collected at the Institute of Dentistry, Barts and The London School of Medicine and Dentistry, London, Dental Schools of the University of the Western Cape, Tygerberg near Cape Town, University of Witwatersrand, Johannesburg, and University of the Limpopo MEDUNSA campus, Pretoria, South Africa from 2003 to 2007. Criteria for selection were recorded date of birth and date of X-ray allowing decimal age to be calculated, and a clear image of an unimpacted M3. The reference standard was age at least 18 on the date of X-ray. Root stage was assessed without blinding the age category as these reference data are part of a worldwide collaborative study (age 2-25 years) comparing the timing of all permanent tooth formation by the first author. The age range of this study was suitable and the patients were drawn from teaching hospitals that provide primary care and treatment of dental caries. For further details see Liversidge.2

Results

Bias and accuracy of age estimation

Kappa was 0.95 for Demirjian stages and 0.91 for Moorrees stages, showing excellent agreement in root stage assessment. Figure 1 shows the age and sex distribution of the target sample as well as the proportion of individuals in M3 crown stages (grey bars), root stages (open bars) and mature (grey bars). Only individuals with M3 root stages from initial root to apex half closed were included in this part of the analysis (N = 157), as once the tooth is mature, age cannot be estimated from development. Bias, SD and mean absolute difference are shown in Tables 3,4,5. Most methods showed significant bias, that is, consistently over- or under-estimated age. All methods followed the pattern of over-estimating younger individuals and under-estimating older individuals. Methods based on maturity data under-estimated age significantly. Six methods estimated age with bias not significantly different to zero. These include maturity data from Levesque et al. adapted for age estimation (see Table 2),4 data from Spain, Turkey, China and South African Blacks11,12,13,23 and adjusted data from the reference study (all groups combined).2 Measures of accuracy include SD of bias as well as mean absolute difference between known and dental ages. From the six methods showing little bias, the method with the lowest mean absolute difference was adjusted data for combined groups from the reference data at 1.45 years (Moorrees stages) and for Demirjian stages was the adapted maturity data from Levesque et al.4 at 1.71 years. Standard deviation of bias was very similar for all methods; the exception was a study based on a small sample. The reliability of estimated age is expressed as the 95% CI and indicates a 95% chance that the actual age falls within the interval. This is calculated using the SD of bias; for almost all methods this was around 2 years, making the 95% CI ± 4 years around the mean. One aspect of age estimation is to assess if an individual has reached 18. 
The 95% CI in Table 2 extends to 18 years of age for most root stages. A large proportion of individuals in later root stages will be at least 18 and diagnostic tests can help quantify this probability.

Table 3 Bias (mean, SD) and mean absolute difference between dental and known ages. Negative bias indicates an under-estimate. Ns: not significant
Table 4 Bias (mean, SD) and mean absolute difference (mean abs diff) of two best methods, by age group. Age groups 22-24 have only one individual each
Table 5 Bias (mean, SD), mean absolute difference (mean abs diff) for two methods by root stage.*ns p >0.05

Diagnostic tests

The probability and 95% CI of an individual from the reference sample of 1,663 radiographs being at least 18 years by M3 formation stage (positive predictive value) are shown in Table 6. Only a small proportion of individuals in early root stages were at least 18. Once the inner root walls of the distal root were parallel ('Rc') or apex was half closed ('A1/2'), the probability of being at least 18 was high and a large proportion of individuals in these stages were 18 or older. Tests of diagnostic accuracy using root stages as thresholds are shown in Table 7. A threshold is where we divide the reference sample into two groups: for instance, those up to stage 'Cc' and those with 'Ri' or more root formed. From the reference sample 967 out of 1,663 individuals were at least 18 and had root present and seven were at least 18 and were staged as 'Cc'. For this threshold the sensitivity of the diagnostic test was 0.99: of the individuals in this sample who were at least 18, 99% had initial ('Ri') or more root. Sensitivity measures how well a test (root stage) detects disease in a study group (in our case correctly identifying age 18). Specificity is the probability that the test will produce a true negative result, in our case the former stage of the threshold, in an individual aged younger than 18. For the threshold 'A1/2' and 'Ac', specificity is high, meaning that a negative test result (M3 up to and including stage 'A1/2') discriminates well between the two age categories, detecting individuals younger than 18. For each threshold there is a combination of sensitivity and specificity and these can be combined in the ROC plot reflecting the performance of a test. The area under the ROC curve was 0.904 (95% CI 0.889, 0.919) indicating that a randomly selected individual from the older age category will have a more advanced root compared to a randomly chosen individual from the younger age category 90% of the time.
This suggests that diagnosing age at least 18 from M3 root stages can discriminate reasonably well between the two age categories although there is fairly high level of false negatives and false positives.
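The measures described above all follow from the two-by-two table at a given threshold, as the following Python sketch shows. For the threshold 'Cc' and 'Ri', the true positive and false negative counts (967 and 7) are those given in the text, which leaves 689 individuals younger than 18. The split of those 689 into false positives and true negatives used below is hypothetical and is not taken from Table 7. The function name is also an assumption.

```python
def diagnostic_measures(tp, fn, fp, tn):
    """Sensitivity, specificity and likelihood ratios from a 2x2 table.

    tp: at least 18, latter (more mature) stage of the threshold
    fn: at least 18, up to the former stage
    fp: younger than 18, latter (more mature) stage
    tn: younger than 18, up to the former stage
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


# tp and fn are from the text; fp and tn are a HYPOTHETICAL split of the
# 689 individuals younger than 18 (1,663 - 974), for illustration only.
measures = diagnostic_measures(tp=967, fn=7, fp=589, tn=100)
for name, value in measures.items():
    print(name, round(value, 3))
```

Sensitivity depends only on the older age category and specificity only on the younger one, which is why neither changes with the proportion of individuals aged at least 18 in the sample.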

Table 6 Probability (95% confidence interval CI) of being at least 18 years by M3 stage of the reference sample. 18+/N, number of individual in that stage at least 18 divided by the total number of individuals in that stage
Table 7 Tests of diagnostic accuracy for age at least 18. N = 1,663. For root stage abbreviations and descriptions see Fig. 2 TP true positive (in latter (or more mature) stage of threshold and at least 18), FN false negative (up to former stage and at least 18), FP false positive (in latter stage or more mature stage and younger than 18), TN true negative (up to former stage and younger than 18), LR+ likelihood ratio for a positive test (latter stage of cut off), LR- likelihood ratio for a negative test (former stage of cut off)

The likelihood ratio of a positive test (LR+) for cut off point 'A1/2' and 'Ac' was 13.61. This means that a mature M3 (stage 'Ac') is more than 13 times more likely in an individual in the older age category compared to an individual younger than 18. LR- at cut off point 'Cc' and 'Ri' shows that if M3 is in a formation stage less than 'Ri' (negative test result) this is one twentieth as likely in an individual in the older category compared to one in the younger category. In other words, an M3 with no root present is twenty times more likely in an individual younger than 18 compared to one 18 or older. Our results show that at early root stages, a negative test result is good at predicting the probability of being younger than 18, while for apex stages a positive test is good at predicting the probability of being at least 18.

Discussion

Accuracy of age estimation

The first question we wish to answer is which methods have little bias (consistently over- or under-estimating age)? On average only six of the 37 methods tested have little bias. How accurately can age be estimated and what do we mean by accuracy? An accurate method calculates dental age close to known age. Reliability or precision of this estimate relates to the SD of bias and for all methods based on adequate sample size was around two years. Absolute mean difference is another measure of accuracy.34 In order to test bias and accuracy, several important features of both target and reference sample should be fulfilled such as sufficient number, a sufficiently wide age range in order to include early and late maturing individuals and a uniform age distribution.36 Having similar numbers across the age range ensures that accuracy is consistent over the entire age range, rather than peaking at mean age, where the largest number of individuals occur in a normal distribution. Selection of radiographs is an important issue and an archived collection of patients attending a teaching hospital is not ideal and our selection of consecutive radiographs from those available only partly overcomes this problem. Although our target sample of 300 is not large, it is clear from Figure 1 that a considerable proportion of individuals aged 20 or older had a mature mandibular left M3 and the number of individuals with developing M3s in older age cohorts decreases. Only individuals with immature third molars qualify to have age estimated by root development. Once a tooth is fully mature, age cannot be estimated by root development and these individuals are excluded from the dentally immature target sample. The inclusion of individuals with mature M3s is something that previous studies of bias and accuracy have failed to clarify.

Which method of age estimation using M3 is best? Our understanding of measuring and assessing this has advanced in the last few years with contributions from anthropology, palaeo-anthropology and forensic identification of genocide victims.37 Measures such as bias, SD of bias, mean absolute difference between dental and known ages, diagnostic tests, and Bayesian statistics are all useful to determine which method is best.34,37 It is clear from our results that only a handful of methods estimated age with little bias and the two detailed in Table 2 are the recommended methods of choice. Few previous studies compare performance of M3 age estimation. A common feature is the over-estimation of younger individuals and an under-estimation of older individuals and this probably relates to early and later maturers. An over-estimation at early root stages and under-estimation at late root stages was noted by Thevissen et al. using a sample of 780 aged 16-22.38 Our considerably smaller sample also showed this pattern. Thorson and Hägg39 found a systematic under-estimation of age that increased with age from a sample of 375 aged 14-25. The 95% CI of bias was considerably greater in girls (± 4.5 years) compared to boys (± 2.8 years). This greater variation in females is apparent in late root stages of maturation.2,4 Kullman40 found that age was over-estimated by over a year using Moorrees stages of third molars (age range 12-19 years, n = 72).

The mean absolute difference between biological and real age is an indication of the magnitude of inaccuracy and in this study ranged from 1.45 to 1.97 years in the six methods that showed bias not significantly different from zero. Previous reports are 1.3 and 1.5 years in males and females respectively7 and 1.13 years using a polynomial and Bayesian approach.38 It is unclear if these studies used independent target and reference samples. Mean absolute difference using M3 to estimate age differed considerably from the value of 0.66 year using earlier developing teeth and the method of Willems et al.41 (n = 827 developing teeth, n = 946 age 3-16 years).42

How to use Figure 3 to estimate age for a single individual

Figure 3

Root stages of M3 with 95% confidence interval (CI) and 51% age coverage

To estimate age, a developing M3 must be assigned a formation stage. Tooth formation is a continuum which we divide into specific stages and having clearly written criteria of root maturation helps answer the question: has the root reached a specified stage or not? If a tooth appears to be between stages, Demirjian suggests that it be assigned to the earlier stage.5 Once the root stage has been selected, dental age can be read from Table 2 or Figure 3 depending on which stage assessment is used. A feature of biological maturation is that it varies considerably with age. We assume that an individual is maturing at an average age and assign a mean age after assessing M3 root stage. The 95% CI of this mean age can be interpreted as the age interval within which we are 95% sure that the individual's chronological age occurs. The average age of most M3 formation stages has a large SD of up to two years, resulting in a 95% CI of estimated age for a single individual of between four and six years. The range for initial root formation of M3 was 11 to 20 years; similarly the age range for apex half closed was 15 to 24 years.2 This nine year interval in timing of these maturational stages makes age estimation using M3 inaccurate compared to other developing permanent teeth. Age in individuals who are dentally advanced will be over-estimated and in those dentally delayed will be under-estimated (see Fig. 4). In our sample a few individuals mature M3 very much later than average and age is hugely under-estimated (three individuals in the penultimate root stage were 22-24 years of age).

Figure 4: Scatterplot of the difference between dental age (Levesque et al. adapted) and known age, plotted against known age.

This shows how age varies within each root stage; for example, stage G varies from age 16 to 24. Horizontal line indicates no difference between dental and known age

The balance of probabilities is the burden of proof in English Civil Law and this applies to age-disputed asylum-seeking individuals. We propose a new age interval (51% coverage) from the reference sample for each developing root stage. This represents an age interval centred around mean age that includes just over half of individuals in that stage. We interpret this as appropriate to express if the age of an individual with an immature M3 lies above or below an age threshold on the balance of probabilities. For example if a male presents with M3 at R1/2 (root length equal to crown height), the 51% age interval from Table 8 is 15.70 to 17.90 years. Looking at Figure 5, this interval lies below the line at 18 years. Thus, on the balance of probabilities, this individual is younger than 18. Similarly, if M3 is in stage A1/2 or mature, on the balance of probabilities, age is at least 18. These data are also available for other permanent teeth.44
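One way such an interval could be obtained is sketched below. The text does not state whether the 51% interval was taken from the observed ages or from a fitted distribution, so the sketch assumes that age within a stage is normally distributed. The function names, the mean of 16.8 years and the SD of 1.6 years are hypothetical values for illustration and are not taken from Table 8.

```python
from statistics import NormalDist


def coverage_interval(mean_age, sd, coverage=0.51):
    """Interval centred on the mean that contains `coverage` of a normal
    distribution with the given mean and SD (assumption: ages within a
    stage are normally distributed)."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)  # about 0.69 for 51%
    return mean_age - z * sd, mean_age + z * sd


def balance_of_probabilities(interval, threshold=18.0):
    """Compare an age interval with an age threshold."""
    low, high = interval
    if high < threshold:
        return "younger than threshold"
    if low >= threshold:
        return "at least threshold"
    return "interval spans threshold"


# Hypothetical mean and SD for one root stage (not values from Table 8).
interval = coverage_interval(16.8, 1.6)
verdict = balance_of_probabilities(interval)
print(tuple(round(x, 2) for x in interval), verdict)
```

Under this assumption the 51% interval is roughly the mean ± 0.69 SD, compared with the mean ± 1.96 SD for the 95% CI, which is why it is so much narrower.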

Table 8 Age interval 51% coverage for M3 formation stages. N* includes 15 individuals of unknown sex
Figure 5: Age interval that includes 51% coverage of reference sample for males, females and combined sex.

Lines at age 16 and 18 indicate that if an individual presents with M3 in A1/2 or mature, on the balance of probabilities, age is at least 18

Dental maturation and regional differences in dental maturation

Maturity is measured as the age when 50% of individuals have reached or passed a specific maturity event (see Cameron45). The mean age of a maturity event (such as gingival eruption or a tooth stage) is also defined as the average age of entering the stage; some individuals will enter the stage at a younger age, some will enter considerably later. In third molars this age range can be nine or ten years from the age of the youngest individual to the age when all individuals have reached the specific stage. The mean age of entering a stage is not equivalent to the average age 'within stage' most frequently reported. Figure 6 (left) shows the proportion of boys who have reached M3 maturity (Demirjian stage H) for age. Data were interpolated from Levesque et al. and are shown as the dotted line.4 Smoothed cumulative curves were calculated using probit regression for Koreans21 and Hispanic Texans.22 The proportion of 15 year olds who have reached M3 maturity is plotted against the midpoint of the age interval (15.5) and so forth for each subsequent age group until the age when 100% of individuals have reached this stage. Comparing maturity between groups is done by comparing the age when half of individuals have reached this stage, shown as an arrow. Figure 6 (right) shows the standard error of mean age for the group and 95% CI of mean age for an individual entering the final maturity stage. This shows that the 95% confidence intervals of mean age for a single individual in these groups overlap. Although a significant ethnic difference has been shown in South African Blacks compared to the three other groups of the reference study,2 this is at the group level where the difference in standard error of mean age in groups differed significantly to zero. 
A single individual from one group is not significantly different from any other group, because of the magnitude of the standard deviation and this ethnic difference at the group level is of no consequence when estimating age at the individual level.37 Results from this study show that methods based on data from South African Blacks12 and southern China23 can estimate age with little bias and similar accuracy on our target sample. These findings suggest that features of a reference sample such as size, shape, range of the age distribution and selection of radiographs are more important than the ethnic or geographic group. This also suggests that population specific reference studies are not required for age estimation of a single patient or forensic case.
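The distinction between a group-level and an individual-level difference can be illustrated with a short Python sketch. The confidence interval of a group mean uses the standard error (SD divided by the square root of the sample size), whereas the interval for a single individual uses the SD itself. All numbers and function names below are hypothetical and chosen only to show the effect; they are not taken from the studies cited.

```python
import math


def group_mean_ci(mean_age, sd, n, z=1.96):
    """95% CI of the group mean, based on the standard error."""
    se = sd / math.sqrt(n)
    return mean_age - z * se, mean_age + z * se


def individual_ci(mean_age, sd, z=1.96):
    """95% CI for a single individual, based on the SD."""
    return mean_age - z * sd, mean_age + z * sd


def overlap(a, b):
    """True if two (low, high) intervals overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


# Two HYPOTHETICAL groups whose mean age entering a stage differs by 0.6 year.
group_a = {"mean_age": 19.5, "sd": 2.0, "n": 400}
group_b = {"mean_age": 20.1, "sd": 2.0, "n": 400}

groups_overlap = overlap(group_mean_ci(**group_a), group_mean_ci(**group_b))
individuals_overlap = overlap(
    individual_ci(group_a["mean_age"], group_a["sd"]),
    individual_ci(group_b["mean_age"], group_b["sd"]),
)
print(groups_overlap, individuals_overlap)
```

With these values the intervals of the group means are separate while the intervals for a single individual overlap almost entirely, which is the pattern described for Figure 6 (right).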

Figure 6: Smoothed maturity curves (left) of M3 for males showing proportion of age group that has reached stage H (apex closed).

Dashed line is interpolated from Levesque et al.4, other lines calculated from data of Koreans21 and Hispanics in Texas.22 Arrows indicate mean age entering this stage, when 50% have reached stage H. Confidence intervals of mean age entering stage H from these studies (right). Standard error of the group (open circle) and 95% CI for a single individual (filled circle)

Diagnostic tests: probability of being at least 18, given M3 root stage

What is the probability of an individual at a specific M3 stage being at least 18? This is expressed as the positive predictive value (PPV); it increases with root formation stage and in the reference study for a mature apex was 557/607 = 0.951. Previous studies of much smaller samples also report high values, shown in Table 9, and unreported data from Australian Aborigines are 0.885, n = 25 (Liversidge and Townsend 2006),46 Japan 0.941, n = 68 (personal communication, K.Kuroe), Malaysia 1.00, n = 84 (personal communication, K. Peariasamy), Sudan 0.934, n = 212 (personal communication, F. Elamin) and Nigeria 0.907, n = 118 (personal communication, M. Ukpong). PPV is useful, however it is highly sensitive to prevalence of the positive test group (older age category), takes account of neither false negative nor false positive values and can rarely be generalised beyond the study group.47

Table 9 Positive predictive value of M3 stage 'Ac' (left side) and majority age from published reports. This expresses the probability of an individual being at least 18 years of age if M3 is mature. N, number of individuals in stage 'Ac'

Diagnostic tests: discriminating between age categories

Positive predictive value, sensitivity and specificity are measures of how well the test (root stage of M3) discriminates between the two age categories (at least 18 years of age and younger than 18). However, these are partial measures of performance and should not be interpreted separately.48 Diagnostic tests are positive or negative, and in our case, the test result is classified as positive or negative depending on which side of the threshold level the third molar occurs. For each threshold there is a combination of sensitivity and specificity and these can be combined in the ROC plot which is independent of prevalence.48 A test that discriminates completely will have an ROC area under the curve of one, while an area of 0.5 has no discrimination. Our result of 0.904 compares well with values from other studies of 0.831, 0.863 and 0.899,18 0.847 and 0.853,38 and 0.72.49 Diagnostic tests help us answer a number of questions.48,50 How good is M3 root stage in detecting age 18 (sensitivity)? Our results show that thresholds at early root stages are better at detecting age 18 than late root stages. How good is this test in detecting individuals younger than 18 (specificity)? Apical stages of formation are better than early root stages. How many false conclusions occur when using this test (error rate)? Error rate is considerable at 36% and 26% at early and late root stages respectively. A lower error rate of 17% is reported using the ratio of tooth length to apical width.19 Our results show that diagnostic multilevel tests are effective at discriminating between the two age categories when the test results are at the extremes of the root maturation; that is, the diagnostic test performs well at early root stages and apical stages and as such is a useful and valid test. In forensic dentistry it is useful to know how well M3 root stage predicts the likelihood of age being at least 18.
Sensitivity and specificity do not do this but describe how these two age categories predict particular test results.47
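The interpretation of the area under the ROC curve used in this paper (the chance that a randomly chosen individual from the older category has a more advanced root than a randomly chosen individual from the younger category) can be computed directly from a table of stage by age category. The Python sketch below does this for ordered stages, counting ties as one half. The function name and the three-stage counts are hypothetical, not the reference data.

```python
def auc_from_stage_counts(younger, older):
    """Area under the ROC curve for an ordinal test.

    younger[i] and older[i] are the numbers of individuals younger than 18
    and at least 18 in stage i, with stages ordered from least to most
    mature. Each younger/older pair scores 1 if the older individual is in
    a more advanced stage and 0.5 if both are in the same stage.
    """
    total_pairs = sum(younger) * sum(older)
    score = 0.0
    younger_in_earlier_stages = 0
    for n_younger, n_older in zip(younger, older):
        score += n_older * (younger_in_earlier_stages + 0.5 * n_younger)
        younger_in_earlier_stages += n_younger
    return score / total_pairs


# HYPOTHETICAL counts for three ordered stages, for illustration only.
auc = auc_from_stage_counts(younger=[30, 15, 5], older=[5, 15, 30])
print(auc)
```

Identical stage distributions in the two age categories give 0.5 and complete separation gives 1, matching the two reference points described above.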

Diagnostic tests: application to a single individual

Given a root stage, what is the probability of being at least 18? Every threshold in Table 7 has a positive or negative test result and the likelihood ratios of a positive (LR+) and negative test result (LR-) can be calculated (see Habbema et al.,48 Knottnerus et al.50). These are independent of prevalence and clinically relevant to express the probability of a diagnostic test result at the patient level. Applying diagnostic tests allows us to predict the likelihood of being at least 18 given the M3 root stage. The test result of a mature M3 apex is more than 13 times more likely to occur in an individual at least 18 as opposed to someone younger than 18. Similarly LR- of 0.05 for threshold 'Cc' and 'Ri' means that a negative test (stage 'Cc') is one twentieth as likely to be seen if age is at least 18 as if age is in the younger category. In other words, an individual younger than 18 is 20 times more likely to have a negative test than an individual aged 18 and older. The values of LR+ (mature apex and age at least 18) were calculated for Hispanics in Texas (left and right side combined)22 and Koreans21 as 8.20 (95% CI 6.30-10.67) and 367.70 (95% CI 51.82-2,608.20) respectively. The smaller the proportion of dentally advanced individuals who reach M3 maturity before 18 years of age, the larger the LR+, and in this Korean group only one individual had reached maturity before 18.

Before the test is applied, the only measure of probability of being at least 18 is the prevalence of this age category in the population. In the United Kingdom, this can be calculated as 0.74 and 0.75 for males and females respectively (Office for National Statistics 200751) indicating the chance of being at least 18 if randomly selected from the population. The nationalities accounting for the highest number of applications for asylum were Afghani, Iranian, Chinese, Iraqi, Eritrean (Home Office 200752) and prevalence of being at least 18 from these countries is not well documented. Likelihood ratios indicate by how much a given diagnostic test result will raise or lower the pre-test probability of the target disorder. LRs greater than 1 increase the probability that the target disorder is present, and the higher the LR the greater this increase. Conversely, LRs less than 1 decrease the probability of the target disorder, and the smaller the LR, the greater the decrease in probability and the smaller its final value. Likelihood ratios greater than 10 or less than 0.1 are useful as the meaningful change in pre to post test probability can aid decision making for the individual. After performing a diagnostic test (assessing the root formation of M3) the probability of being at least 18 can be quantified using the likelihood ratio and Fagan's nomogram based on Bayes' theorem47 and the probability of being at least 18 if M3 is mature changes from 0.75 to 0.98. This is similar to that reported by Cameriere using ratio of tooth length to apical width.19
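The step from pre-test to post-test probability can also be done arithmetically rather than read from the nomogram: convert the pre-test probability to odds, multiply by the likelihood ratio and convert back to a probability. The following Python sketch uses the pre-test probability of 0.75 and the LR+ of 13.61 given above; the function name is an assumption.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a pre-test probability with a likelihood ratio.

    Odds form of Bayes' theorem: post-test odds = pre-test odds * LR.
    """
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Pre-test probability 0.75 and LR+ 13.61 for a mature M3, as in the text.
p_mature = post_test_probability(0.75, 13.61)
print(round(p_mature, 2))
```

With these inputs the pre-test odds are 3 to 1 and the post-test odds about 41 to 1, which corresponds to the change from 0.75 to 0.98.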

Conclusions

This study provides two new methods of age estimation from M3, using the root stages of Demirjian and Moorrees. Both these methods show bias (difference between known and dental ages) not significantly different from zero and mean absolute difference of 1.45 and 1.71 years. Standard deviation of bias was around 2 years and 95% CI of estimated age will be at least ± 4 years. If M3 is 'A1/2' or 'Ac', on the balance of probabilities, age is at least 18. The similarity in M3 maturity between some world groups suggests that population specific reference studies are unnecessary to estimate age at the individual level.

The probability of being at least 18 in the reference sample if M3 was mature was 0.945. Early root stages had high sensitivity (M3 initial root and age at least 18) and rule out being at least 18, while the two last stages ('A1/2' and 'Ac') had high specificity (M3 being mature and age being at least 18 was 0.96) ruling in the age category at least 18. Area under the ROC curve was 0.904 (95% CI 0.889, 0.919). Once the M3 is mature, age cannot be estimated from root stage and the likelihood of being at least 18 is an appropriate measure at the individual level. The likelihood ratio of being at least 18 if M3 was mature was 13.61, i.e. a mature M3 was more than 13 times more likely in an individual aged 18 or older compared to one younger than 18.