Introduction
Differentiated vulvar intraepithelial neoplasia (dVIN) is the immediate precursor of human papillomavirus (HPV)–independent vulvar squamous cell carcinoma (VSCC), and is postulated to develop on the background of chronic dermatoses, driven by TP53 mutations [1–4]. Recent literature suggests that dVIN has an accelerated rate of progression to VSCC (median interval: 41.4 months), and a high recurrence rate [5–7]. In view of this, current treatment guidelines [8, 9] recommend surgical excision of lesions that are histologically diagnosed as dVIN. Evidently, accurate histological diagnosis is crucial to allow appropriate patient management.
On histology, distinguishing dVIN from dermatoses, such as lichen sclerosus (LS), can present a challenge, as dVIN often exhibits subtle atypical features that mimic the reactive changes seen in chronic dermatoses [10–12]. The difficulty of diagnosing dVIN can give rise to diagnostic variability, which has the potential to critically affect treatment decisions [13, 14].
Although the diagnostic difficulty of dVIN has been acknowledged in the literature [2–5], there are insufficient data on the inter-observer agreement in its histological assessment. In a previous study, we established the features that helped to reliably distinguish dVIN from LS and that could be interpreted with substantial agreement by pathologists at our center [15]. However, it remains to be determined whether a similar level of agreement can be achieved between pathologists from different practice settings.
In the current study, therefore, we evaluated the inter-observer agreement for the diagnosis of dVIN, and for the interpretation of its histological features, among a bi-national, multi-institutional group of pathologists. We also assessed the pathologists' perception of the diagnostic usefulness of the histological features. Our aim was thereby to identify reliable features that may facilitate the diagnosis of dVIN. In addition, we correlated the immunohistochemical expression patterns of p53 with the consensus histological diagnoses, as this marker is frequently used as an ancillary tool to support the histological diagnosis of dVIN.
Discussion
To the best of our knowledge, this is the first bi-national, multi-institutional ring study to assess the inter-observer agreement in the histological assessment of dVIN. Agreement on the diagnosis among the nine participating pathologists was moderate, while that between the participant pairs varied from slight to substantial. These results were similar to those of the only previous study on inter-observer agreement in dVIN [13], and indicate that the diagnostic agreement for dVIN remains suboptimal.
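The verbal agreement levels quoted above (slight, moderate, substantial) are usually read off κ-values. The following is a minimal Python sketch of that step, assuming Cohen's κ for a single pair of raters and the commonly cited Landis and Koch bands. The statistic and cut-offs actually used in this study are not restated in this section, and the slide labels below are hypothetical, not study data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same slides."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of slides")
    n = len(rater_a)
    # Observed proportion of slides on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's own label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def agreement_label(kappa):
    """Map kappa to a verbal level (assumed Landis and Koch bands)."""
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "near-perfect"


# Hypothetical diagnoses for 12 slides by two pathologists (not study data).
pathologist_1 = ["dVIN"] * 7 + ["LS"] * 5
pathologist_2 = ["dVIN"] * 6 + ["LS"] + ["dVIN"] + ["LS"] * 4

kappa = cohen_kappa(pathologist_1, pathologist_2)
print(round(kappa, 2), agreement_label(kappa))
```

In this hypothetical pair, the raters agree on 10 of 12 slides, yet κ is lower than the raw agreement because part of that agreement is expected by chance.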
As histological diagnoses guide treatment decisions, variability in the diagnoses can result in treatment disparities [31]. Therefore, to improve the diagnostic reliability and to assure a similar standard of care, we suggest consensus evaluation of dVIN cases with a panel of pathologists experienced in vulvar neoplasia. Regular inter-disciplinary communication between gynecologists/dermatologists and pathologists can also enhance relevant knowledge and expertise.
An essential step to ensure a reliable histological diagnosis is to identify representative features which can be reproducibly interpreted by pathologists. We identified the most helpful features as parakeratosis, cobblestone appearance, chromatin abnormality, angulated nuclei, atypia discernable under × 100, and altered cellular alignment, based on the proportions of substantial/near-perfect agreement between the participant pairs, and the ratings of diagnostic usefulness. We observed that the participants recorded parakeratosis and cobblestone appearance as very useful for diagnosing dVIN, particularly where the nuclear atypia could not be discerned under × 100.
Previously, van den Einden et al. proposed that the presence of atypical mitoses in the basal layer, basal cellular atypia, dyskeratosis, prominent nucleoli, and elongated and anastomosing rete ridges were the most predictive features of dVIN [13]. In a subsequent survey among vulva pathology experts, only basal layer atypia was judged by consensus as an “essential” diagnostic feature [14]. However, neither of these studies assessed the agreement in the interpretation of these features. In our previous study, we obtained substantial agreement in the interpretation of macronucleoli, angulated nuclei, individual cell keratinization, deep keratinization, and deep squamous eddies, between pathologists at our center [15]. In the current study, however, a similar level of agreement for these features was not observed. We speculate that our previous results may have been influenced by the similar standard of histological interpretation among participants who work in close collaboration at the same center.
In this study, we also correlated the histological consensus diagnoses with the immunohistochemical expression of p53, as this marker is commonly used to aid the diagnosis of dVIN. p53-mutant patterns have been reported to accurately reflect underlying TP53 mutations, which characterize dVIN [19, 20, 32]. Substantial concordance of p53-IHC patterns with the histological consensus diagnoses was recorded, which confirms that routine use of this marker can improve the diagnostic accuracy for dVIN.
However, 6 (26%) of the slides in this study that were diagnosed as dVIN by consensus showed wild-type p53-expression. This is in line with recent literature, which states that 17–42% of cases of dVIN can show wild-type p53-expression [4], and implies that p53-IHC may not effectively inform the diagnosis in every case of dVIN. Furthermore, p53-IHC patterns in VSCC and the adjacent dVIN may not show perfect concordance [22]. A recent study reported that while dVIN adjacent to p53-wild-type VSCC always shows wild-type p53-expression, dVIN adjacent to p53-mutant VSCC can show wild-type p53-expression in 31.4% of cases [22]. In our study, all of the lesions judged as dVIN by consensus and showing wild-type p53-expression were present adjacent to VSCC. Similar to the previous study [22], we observed that 67% (4/6) of these VSCCs showed wild-type p53-expression, while 33% (2/6) showed p53-mutant patterns (results not presented). This limitation of p53-IHC should be borne in mind, particularly when using this marker to confirm the presence of dVIN in resection margins of VSCC. For dVINs that show wild-type p53-expression, the diagnosis must rely on histological assessment, which, as our study indicates, may be fraught with variability. In view of this, we believe that ancillary biomarkers (immunohistochemical/molecular) need to be established to aid the diagnosis of the p53-wild-type subcategory of dVIN.
Through this study, we intended to estimate the diagnostic variability of dVIN in the real world. To ensure an accurate representation of this variability, (i) pathologists with varying levels of experience and from academic and non-academic centers were included, (ii) diagnostic criteria were not pre-determined to allow the participants to interpret the histology in light of their own experience, and (iii) assessments of outlier participants were not excluded.
Nevertheless, this study has several limitations. We used the majority (consensus) diagnosis of each slide as the diagnostic gold standard. It could be argued that the consensus represents merely another diagnostic opinion rather than a standard of truth. dVIN is known to originate in a background of chronic dermatoses, and there is no clear, universally accepted threshold for identifying atypia/dysplasia. This threshold is often influenced by the pathologists’ training and/or practice experience. Unless a reliable IHC marker is established, every method to ascertain a gold-standard diagnosis will have some bias.
There is also little consensus on the ideal method for measuring observer agreement in pathology diagnosis. It has been suggested that neither percentages of agreement nor κ-statistics take into account the prevalence of a particular diagnosis in a set of cases, or completely rule out concordances due to chance [33, 34]. The validity of the cut-offs that are used to interpret levels of agreement from κ-values has also been challenged [30, 35].
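The prevalence concern can be illustrated with a small worked example. The following Python sketch assumes two raters, a binary diagnosis (dVIN versus no dysplasia), and Cohen's κ. All counts are hypothetical and chosen only to show the effect; they are not data from this study.

```python
def agreement_from_2x2(both_pos, a_only, b_only, both_neg):
    """Return (percent agreement, Cohen's kappa) from a 2x2 table of counts.

    both_pos: slides both raters called dVIN
    a_only:   slides only rater A called dVIN
    b_only:   slides only rater B called dVIN
    both_neg: slides neither rater called dVIN
    """
    n = both_pos + a_only + b_only + both_neg
    if n == 0:
        raise ValueError("the table must contain at least one slide")
    observed = (both_pos + both_neg) / n
    a_pos = both_pos + a_only  # slides rater A called dVIN
    b_pos = both_pos + b_only  # slides rater B called dVIN
    # Chance agreement from each rater's marginal frequencies.
    expected = (a_pos * b_pos + (n - a_pos) * (n - b_pos)) / (n * n)
    kappa = 1.0 if expected == 1.0 else (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical sets of 100 slides, each with 80% raw agreement.
balanced = agreement_from_2x2(40, 10, 10, 40)  # dVIN called on about half the slides
skewed = agreement_from_2x2(75, 10, 10, 5)     # dVIN called on most slides

print("balanced: agreement=%.2f kappa=%.2f" % balanced)
print("skewed:   agreement=%.2f kappa=%.2f" % skewed)
```

Both hypothetical sets have the same percentage of agreement, but κ is markedly lower in the set dominated by one diagnosis. Under these assumptions, neither figure alone conveys how reliably the raters distinguish the two diagnoses, which is why the case mix of a slide set matters when agreement statistics are compared across studies.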
It could also be argued that our study over-estimated the diagnostic variability. Unlike in routine practice, participants diagnosed the slides without clinical information, serial sections, or IHC. The selection contained a higher proportion of dVIN than no-dysplasia slides, which may not reflect routine practice. We lacked statistical power to evaluate the influence of level of experience or practice setting on the diagnostic variability. Furthermore, the inter-observer agreement in the interpretation of p53-IHC was not assessed. To gain further insight into these aspects, we have set up a larger study among a geographically disparate group of pathologists, which includes the assessment of p53-IHC.
In conclusion, the suboptimal level of diagnostic agreement for dVIN observed in this study affirms the difficulty of the diagnosis. We identified parakeratosis, cobblestone appearance, chromatin abnormality, angulated nuclei, atypia discernable under × 100, and altered cellular alignment as helpful diagnostic features of dVIN. For cases with a histological suspicion of dVIN, we suggest consensus-based pathological evaluation to improve diagnostic reliability.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.