Introduction
Chile was one of the many countries in Latin America immersed in a socio-political conflict between neoliberalism supporters and those who opposed them with socialist ideologies [
1,
2]. This complex political situation contributed to the disruption of democracy in Chile through a violent military coup led by the armed forces in 1973. During 17 years of the dictatorial mandate, mass disappearances, illegal detentions, executions, and torture occurred, affecting 3,227 people, according to reports on human rights violations in Chile [
1].
The latest Chilean public policy is the implementation of ‘The National Search Plan for Truth and Justice’, which seeks to clarify the circumstances of the victims’ disappearance and/or death and to continue their search, recovery and identification [
3]. The Human Rights Unit (HRU) of the Chilean Forensic Medical Service is the only public institution in charge of analysing, selecting and sampling skeletal material to identify these victims [
4]. Since 2007, following the recommendations of a panel of international forensic experts, positive identifications have been performed exclusively through DNA analysis conducted in accredited international laboratories [
5]. However, the HRU employs a multidisciplinary approach, which includes anthropological analysis of all skeletal material [
1].
Sex is an essential biological attribute when analysing unknown human remains in forensic investigations because accurate sex estimation can eliminate individuals with inconsistent profiles (e.g., the opposite sex) from further investigative consideration [
6]. Skeletal sex estimation in adults relies on morphological and physiological differences between males and females [
7]. This sexual dimorphism is determined by a complex interaction between genetic, functional and environmental factors [
8,
9]. In forensic investigations, the pelvis is considered the most dimorphic and accurate skeletal region for sex estimation in adults [
10‐
12]. This is because the shape and functionality of the female pelvis are specific due to its obstetric adaptation, which, to some degree, transcends population variances [
8,
13]. Morphoscopic (visual) and morphometric (metric) methods have been developed to assess sex using the pelvis. The former has been criticised for being subject to observer bias; however, practitioners still prefer visual rather than metric assessments because they are faster to apply, more cost-effective and straightforward [
14,
15].
The most frequently used morphoscopic sex estimation method, also currently used by the HRU, is Phenice [
16]. This standard was created using African-American and European-American individuals and involves observing three sexually dimorphic traits of the pubic bone: the ventral arc (VA), the subpubic concavity (SPC) and the medial aspect of the ischio-pubic ramus (MA). The presence or absence of specific characteristics in each trait indicates female or male, respectively, and sex is assigned when at least two of the three traits agree. The reported original accuracy of this method is 96.00% (95.56% males; 96.84% females; -1.28% sex bias). Numerous validation studies of the original method in different population samples have been performed, resulting in accuracies ranging from 58.6 to 96.6% [
12,
17‐
22]. The differences in achieved accuracy can be attributed to the application of the method to populations outside the original reference sample (e.g., distant geographically, genetically and/or temporally) as the level of sexual dimorphism is known to vary between populations [
23,
24].
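The two-of-three decision rule and the sex bias statistic described above can be sketched as follows. This is a minimal illustration, not the published scoring protocol: the function names are hypothetical, and the definition of sex bias as male accuracy minus female accuracy is an assumption inferred from the reported figures (95.56% and 96.84% giving -1.28%).

```python
def phenice_classify(va_female, spc_female, ma_female):
    """Majority rule over three presence/absence traits.

    Each argument is True when the trait shows the female expression.
    Sex is assigned when at least two of the three traits agree.
    """
    female_votes = sum([va_female, spc_female, ma_female])
    return "female" if female_votes >= 2 else "male"


def sex_bias(male_accuracy, female_accuracy):
    """Assumed convention: male accuracy minus female accuracy (in %).

    A negative value means females are classified correctly more often.
    """
    return male_accuracy - female_accuracy


# Two female-type traits outvote one male-type trait.
print(phenice_classify(True, True, False))

# Accuracies reported for the original Phenice method.
print(round(sex_bias(95.56, 96.84), 2))
```

With the original accuracies, the second call reproduces the reported sex bias of -1.28%.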
Klales et al. [
22] revised the Phenice [
16] method to facilitate the admissibility requirements in court (i.e., Daubert guidelines) by including an ordinal scale of scores and a regression analysis with classification probabilities [
25]. Klales et al. [
22] used the Hamann-Todd and the W.M. Bass skeletal collections of mixed ethnicity (predominantly from the U.S.) to develop their modified version of Phenice [
16]. The accuracy of the logistic regression for estimating sex was 86.2% using all three traits (74.4% males; 98.0% females; -23.6% sex bias). This sex bias is substantial and shows that females were correctly classified far more often than males. Thus, the high differential bias renders the method’s practical application unreliable [26]. Notably, the authors did not address this issue. The Klales et al. [
22] method was also tested in non-US samples, showing correct classification accuracies ranging from 66 to 95% [
27‐
30].
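To make the idea of ordinal scores combined with a logistic regression and classification probabilities concrete, a minimal sketch is given below. The intercept and coefficients are hypothetical placeholders chosen only for illustration; they are not the values published by Klales et al., and the direction of the scale (higher scores being more male) follows the score distributions described in this paper.

```python
import math

# HYPOTHETICAL placeholder values, not the published coefficients.
INTERCEPT = -9.0
COEF = {"VA": 1.5, "SPC": 1.0, "MA": 0.5}


def prob_male(scores):
    """Probability of male from ordinal trait scores (1 to 5)."""
    z = INTERCEPT + sum(COEF[trait] * scores[trait] for trait in COEF)
    return 1.0 / (1.0 + math.exp(-z))


def classify(scores, threshold=0.5):
    """Return the estimated sex and the probability attached to that estimate."""
    p = prob_male(scores)
    if p >= threshold:
        return "male", p
    return "female", 1.0 - p


sex, probability = classify({"VA": 1, "SPC": 2, "MA": 1})
print(sex, round(probability, 3))
```

Reporting the probability alongside the estimate is what allows an error rate to be attached to each individual classification.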
Considering the importance of providing scientific information in the medico-legal system, sex assignment should include a statistical probability of correct classification established on population-specific data [26]. Thus, some researchers have recalibrated the logistic regression equation by Klales et al. [22] for population-specific application; all recalibrations improved classification accuracy and decreased sex bias values [
28‐
30].
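Recalibration of this kind amounts to re-estimating the regression coefficients on scores recorded from the target population. The sketch below fits a univariate logistic regression by plain gradient descent; it is an illustration only, assuming a single trait score as predictor, and the scores and sexes listed are synthetic toy values, not data from any study.

```python
import math


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(male) = 1 / (1 + exp(-(b0 + b1 * x))) by gradient descent."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += p - y          # gradient of the log-loss w.r.t. b0
            g1 += (p - y) * x    # gradient of the log-loss w.r.t. b1
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


def predict(b0, b1, x):
    """Probability of male for a single trait score."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


# SYNTHETIC toy data: trait scores (1 to 5) and sex (1 = male, 0 = female).
scores = [1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5]
sexes = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1]

b0, b1 = fit_logistic(scores, sexes)
print(round(predict(b0, b1, 1), 3), round(predict(b0, b1, 5), 3))
```

In practice a statistical package would be used and the fitted model would be assessed with cross-validation; the loop above only shows what is being re-estimated.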
A small number of studies have considered population-specific sex estimation standards for Chileans; all of them are morphometric, using measurements of long bones and the scapula [
2,
8,
31]. However, forensic practitioners in the HRU still prioritise morphoscopic sex estimation methods, including Phenice [
16]. Therefore, the present study is designed for direct end-user application and aims to examine the accuracy of the original Phenice [
16] and Klales et al. [
22] methods, and thereafter present new population-specific logistic regression equations for the Chilean population. The latter will be particularly important for those cases associated with the identification of human rights victims and unknown skeletal cases dated from the second half of the 20th century.
Discussion
The present study assessed the performance of two well-known morphoscopic sex estimation methods in a Chilean population. Currently, there are no validation studies for those methods specific to the Chilean population. Therefore, the results of this study inform forensic practitioners of the error rates associated with both methods. The classification accuracies obtained were over 85%. However, both methods showed a marked imbalance in misclassification between the sexes, revealing the need for population-specific models. Therefore, 14 population-specific equations derived from Chilean data were presented, most providing correct classification of sex above 90% and half with an associated sex bias value of ~ 5%. These functions will enhance the ability of forensic practitioners working with Chilean human rights cases and unknown skeletal remains associated with atrocities of the second half of the 20th century to achieve more accurate outcomes leading to potential identifications.
Intra-observer agreement
The reliability of any forensic method (i.e., quantification of observer error) is just as important as achieving an accurate classification of sex; ethical and professional practice requires both. According to the Kappa statistic values presented here, all pelvic traits for both methods showed ‘almost perfect agreement’ (K > 0.81), according to Landis and Koch [
34]. The only trait that showed a Kappa value under 0.90 was the MA when applying the Phenice [
16] method. This result corresponds with a comparable study testing the same method in a Portuguese population, indicating that MA was the least reliable trait among the three assessed [
21]. In addition, this result aligns with the warnings by Phenice [
16], who noted that the medial aspect of the ischiopubic ramus was likely to be the most ambiguous trait of the three assessed.
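For readers unfamiliar with the statistic, a minimal sketch of how a Kappa value is obtained from two scoring sessions and mapped to the Landis and Koch agreement bands is given below. It assumes the unweighted Cohen's Kappa (the text above does not state whether a weighted variant was used), and the two score lists are synthetic toy values.

```python
from collections import Counter


def cohen_kappa(first, second):
    """Unweighted Cohen's Kappa for two scoring sessions of the same items."""
    n = len(first)
    observed = sum(a == b for a, b in zip(first, second)) / n
    c1, c2 = Counter(first), Counter(second)
    # Agreement expected by chance from the marginal score frequencies.
    expected = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)
    return (observed - expected) / (1 - expected)


def landis_koch(kappa):
    """Agreement label following the Landis and Koch bands."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# SYNTHETIC toy scores from a first and a repeated scoring session.
session_1 = [1, 2, 2, 3, 4, 4, 5, 3, 2, 1]
session_2 = [1, 2, 2, 3, 4, 4, 5, 3, 3, 1]

k = cohen_kappa(session_1, session_2)
print(round(k, 3), landis_koch(k))
```

Note that the function is undefined when both sessions assign a single identical score to every item, because chance agreement then equals 1.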
Frequency distributions
When analysing the distribution of the presence-absence of features applying Phenice [
16], the most accurate in females was the VA, with only five individuals misclassified, and the least accurate was the SPC. The number of misclassifications in males was noticeably low for all traits (< 3.0%). VA and SPC showed the highest accuracies in males, and MA was the lowest. Overall, for both sexes, the VA was shown to be the most accurate sex indicator, which accords with Phenice [
16] and previous studies examining this method [
19,
36,
37]. On the other hand, similarly to this study, the MA has also been reported as the least accurate indicator in European males [
18], Mexicans [
28], Hispanics [
29] and Portuguese [
21].
When analysing the frequency distribution after applying the Klales et al. [
22] scoring system, females predominantly clustered into the lower scores (1 and 2), with a score of 5 not being assigned. Similar score distributions were described by Gómez-Valdés et al. [
28] in Mexican females. A score of 3 was present in less than 10.0% of Chilean females for each trait, except for the SPC. Most females scored 2 (62.3%) and 3 (31.9%) for the SPC, indicating predominately intermediate shapes in this trait for Chilean females. Further, 48.0% of males also scored 3 in this trait, showing a considerable overlap between sexes, which could indicate a smaller level of sexual dimorphism for this feature in this population.
Males were slightly more variable than females in score frequency when applying Klales et al. [22], similar to what was observed in the ‘Hispanic’ samples in the study by Klales and Cole [29]. Chilean males were mainly grouped into mid-high scores (3 and 4), similar to the scores reported by Kenyhercz et al. [30] for their ‘Hispanic’ sample. ‘Hispanic’ has been defined by the U.S. Census Bureau as a ‘person of Cuban, Mexican, Puerto Rican, South or Central American, or other Spanish culture or origin, regardless of race’ [38]. Thus, it is not surprising that these scores are similar to those recorded by previous studies on Hispanic populations, considering that the term ‘Hispanic’ encompasses all people of Spanish lineage [39]. Only two Chilean males scored 1, and less than 5.0% of the male sample for each trait scored 5, except for the VA. These results indicate that Chilean males predominantly display intermediate shapes in most pelvic traits and are less robust overall than the males in the sample analysed by Klales et al. [22].
Classification accuracy of Phenice (1969) and Klales et al. (2012)
The Phenice [16] method applied to the Chilean population performed as expected, with 96.98% correct classification, slightly higher than reported in the original study (Table 5). This result is comparable to Rae Jager and Eliopoulos [21], who examined a Portuguese population, achieving 96.0% accuracy. Similarities in skeletal morphology might exist between these populations, considering that Spain conquered Chile in 1541, and immigration from other European countries (including Portugal) to Chile started at the beginning of the 19th century [40].
The sex bias value for the Chilean population is higher than reported by Phenice [16] (Table 5). Only two males (65 and 78 years old) and six females were misclassified; all six misclassified females were over 50 years old (average age 63 years). Previous studies suggested that the accuracy of the Phenice [
16] method decreases in females with increasing age at death, which corresponds with the results of this study [
12,
17].
A recent study by DesMarais et al. [
41] examined age relative to greater sciatic notch (GSN) morphology in Australian females; it was demonstrated that this trait becomes narrower with increasing age. Interestingly, the latter was significant only in menopausal females (> 49 years old) and not in males of the same age. This finding could indicate that female pelvic morphology changes as age increases, affecting the GSN and potentially other features in the pelvis. However, it is worth noting that Sharma et al. [
42] examined morphological changes in pelvic bone remodelling in women through life, with specific reference to parturition. That study concluded that the phenotypic plasticity detected in older women was due to childbirth and not related to increasing age, as other studies suggested. The present study has no clinical information about parturition, so the hypothesis of Sharma et al. [
42] cannot be tested. Nevertheless, all females misclassified in this study were > 50 years old, with 50 years being the average menopause age in Chile [
43].
The overall classification accuracy achieved in the present study (87.2%) was similar to that originally reported by Klales et al. [22] (Table 5). However, the percentage of correct classification achieved in this study was lower than in other non-U.S. populations testing the same method, such as Mexico (95%) [
28], South Africa (93.5%) [
30], and Portugal (92.7%). A possible explanation for this result is that due to the variations in levels of sexual dimorphism between populations, the range of variation and descriptions given by Klales et al. [
22] (scoring 1 to 5) might not align with the degree of morphological variation existent in the Chilean population.
Relative to the sex bias values, the Klales et al. [22] method showed a lower value in this population than in the original study (Table 5). However, it was still unacceptably high at -15.4%. Of the 196 males, 33 (aged 20 to 94 years; average age 49 years) were misclassified; no evident trend in the age distribution of these individuals was observed. The fact that a higher percentage of males was classified as females could indicate that Chileans have a smaller degree of sexual dimorphism than the population analysed by Klales et al. [
22] and/or the range of variation proposed by Klales et al. [
22] does not fit with the morphology of Chilean males.
Furthermore, the 33 males misclassified using Klales et al. [
22] included the same two males misclassified using the Phenice [
16] method (see above). Further, the single female misclassified using Klales et al. [
22] was similarly misclassified using the Phenice [
16] method. Thus, those three individuals are likely outliers relative to sex in the Chilean sample, especially considering they were misclassified using both standards. It is also possible that biological sex is incorrectly recorded in the collection records.
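The accuracy and sex bias figures discussed in this section can be reproduced from the misclassification counts given in the text. The sketch below is a worked check rather than the authors' analysis: the number of females (69) is not stated here and is an assumption inferred from 196 males and a female share of 26.0%, and sex bias is again assumed to be male accuracy minus female accuracy.

```python
def summarise(n_male, n_female, male_errors, female_errors):
    """Per-sex accuracy, overall accuracy and sex bias, all in percent."""
    male_acc = 100 * (n_male - male_errors) / n_male
    female_acc = 100 * (n_female - female_errors) / n_female
    total = n_male + n_female
    overall = 100 * (total - male_errors - female_errors) / total
    return {
        "male": male_acc,
        "female": female_acc,
        "overall": overall,
        "sex_bias": male_acc - female_acc,  # assumed convention
    }


N_MALE = 196    # stated in the text
N_FEMALE = 69   # ASSUMPTION: inferred from females being 26.0% of the sample

phenice = summarise(N_MALE, N_FEMALE, male_errors=2, female_errors=6)
klales = summarise(N_MALE, N_FEMALE, male_errors=33, female_errors=1)

print(round(phenice["overall"], 2))   # 96.98
print(round(klales["overall"], 1))    # 87.2
print(round(klales["sex_bias"], 1))   # -15.4
```

Under the assumed female count, the counts are consistent with the reported 96.98% and 87.2% overall accuracies and the -15.4% sex bias for the Klales et al. method.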
Population-specific models
A total of 14 functions were formulated using the Chilean population data. The univariate population-specific equation using VA (P1) and the multivariate equation using the combination of all three pelvic traits (PM4) showed higher overall accuracies than the original method of Phenice, both with a sex bias slightly over the acceptable limit (5.2% and 7.7%, respectively). The univariate population-specific equation using VA (K1) and the multivariate equation using the combination of VA and SPC (KM1) showed better overall accuracies than the original Klales et al. method, with a sex bias of 5.2% and 4.6%, respectively. These results support previous studies indicating that population-specific equations outperform the original non-specific methods, increasing the percentage of correct classification and reducing the sex bias [
28‐
30,
44].
The VA was the most accurate trait in the univariate functions for the Chilean population. This supports Phenice’s statement, indicating this feature ‘is the least likely to be ambiguous’ [
16]. In addition, this also accords with previous studies indicating that the VA is the most accurate indicator of sex [
19,
21,
28,
29]. Multivariate functions varied in classification performance; from the eight proposed, the most accurate included all pelvic traits (PM4) using Phenice’s method, and the function that includes the VA and SPC (KM1) using Klales et al. [
22], both with over 96.0% overall correct classification.
When comparing univariate and multivariate functions, the univariate function analysing the VA is the most accurate, considering the percentage of correct classification and the sex bias value. Using this univariate function will be beneficial in analysing human rights and forensic cases, especially because most associated skeletal remains are found incomplete or fragmented.
Finally, although the focus of this study was to create population-specific models to be applied mainly to human rights cases, and to some extent to criminal cases of the same period (~ 1970s), it would be beneficial to explore whether these models can be applied with the same accuracy to contemporary forensic cases, or whether they need to be updated for the contemporary population.
Limitations of the study
The main limitation of this study concerns ‘collection biases’ inherent to the analysis of physical skeletal collections. These biases can include the under-representation of one particular sex, socio-economic status, and age distribution (amongst other factors) [
24,
45]. Females are under-represented in the present study, comprising only 26.0% of the total sample. Of the female sample, 53.6% are between 50 and 79 years old, whereas the male sample is more evenly distributed across age. In addition, most individuals analysed came from areas of low-income status, occupying the cheapest burial sites in the General Cemetery of Santiago [
32]. It is acknowledged that the equations derived from the data analysed are optimised for the sample studied [
23,
24]. Therefore, applying these models to a broader, more diverse Chilean sample (e.g., including different socio-economic backgrounds) needs to be tested as adjustments could be needed.