Introduction
Despite therapeutic advances over the past two decades, the prognosis of glioblastoma remains poor. Virtually all patients eventually progress after standard therapy with radiation and temozolomide and succumb to their disease. The time to progression, however, is variable, and determination of disease status is required to select patients for second-line therapies, including clinical trials. This is often challenging because treatment-related changes, commonly referred to as “pseudo-progression”, occur in 20–30% of patients, and at a higher rate in patients with MGMT promoter methylation [1–5]. Unfortunately, contrast-enhanced magnetic resonance imaging (MRI), as well as other advanced imaging techniques, is currently unable to reliably differentiate true tumor progression from pseudo-progression [4–6]. As a result, the “gold standard” for assessment of disease status in many cases of presumed progression of glioblastoma rests on histopathological assessment. Several studies have suggested that pseudo-progression can be confirmed histologically, but the determination of pseudo-progression versus active disease may or may not be prognostically significant [7–12].
Although considered the current “gold standard” for assessment of disease status, neuropathological assessment of recurrence can be difficult, as tissue frequently contains a mixture of treatment-related changes and viable tumor in varying amounts [8, 10–12]. As a consequence, there can be discrepancies in the pathologic diagnoses when the same specimens are read by different neuropathologists. Incomplete tissue sampling can also be a confounding factor.
Confidence in a “standard” clinical diagnostic assay requires that similar results be obtained from identical specimens evaluated by different observers. We therefore designed this survey to evaluate the consistency of the neuropathological diagnosis in patients who had completed standard radiation with concurrent temozolomide and subsequently underwent early surgery for a presumed recurrence of their glioblastoma.
Discussion
Assessment of disease status in patients with presumed recurrence of glioblastoma remains a major challenge in neuro-oncology. Because clinical histories, physical examinations, and advanced imaging studies fail to reliably distinguish active tumor from treatment effect (pseudo-progression), patients require a surgical procedure to clarify the underlying cause of their deteriorating clinical and/or radiographic status. Prior studies have focused on the diagnostic challenges of histopathological determination of disease status at presumed recurrence, highlighting the importance of this issue. One of these studies demonstrated a relationship between the diagnosis of disease activity and survival when cases were dichotomized into active tumor present in any amount versus no active tumor present [8]. Other studies have failed to find consistent relationships between histopathological findings and outcome [9–12]. However, inter-observer discrepancies in pathological interpretation have so far not been addressed.
This study had two principal findings. First, while raters generally agreed that some surgical specimens contained either active tumor or no active tumor, other cases elicited diverse pathologic interpretations and, consequently, highly variable diagnoses. In the latter cases, responses were distributed almost equally among “active tumor”, “inactive tumor/treatment effect”, and “unable to classify”. The second finding was the variable discordance between the reported percentage of “active tumor” or “treatment effect” and the final pathologic diagnosis (Fig. 4).
Confidence in any “standard” clinical diagnostic assay requires that similar results be obtained from identical specimens evaluated by experienced interpreters. Certainly, diagnostic evaluation of tissue samples requires a degree of subjective judgment that may be difficult to standardize across diverse clinicians and institutions. The fact remains, however, that diagnostic consistency is lacking in patients with glioblastoma who undergo surgery to establish a firm diagnosis after completing chemoradiation. The findings brought to light in this study have important implications, since an accurate pathologic diagnosis is critical in directing standard and experimental care in this setting. Ideally, a given test would reflect the underlying disease status with 100% sensitivity and specificity. However, as these criteria are difficult to meet in clinical practice, the best available test is often accepted as the “gold standard” [16]. The results of this survey suggest that uniform agreement is lacking among pathologists reading the same tissue sections. Entry into clinical trials would be directly affected if one pathologist designated a patient as having active tumor while another decided that the same specimen was not diagnostic for active tumor. Histopathology thus may not be reproducible enough to serve as a final reference standard.
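Inter-observer agreement of the kind examined here is conventionally quantified with chance-corrected statistics such as Fleiss’ kappa. As a purely illustrative sketch (not part of the survey analysis, and using hypothetical rating counts rather than actual study data), the calculation can be expressed as:

```python
# Illustrative only: Fleiss' kappa for multi-rater agreement.
# The rating counts below are hypothetical, not data from this survey.

def fleiss_kappa(counts):
    """Fleiss' kappa for a matrix where rows are cases, columns are
    diagnostic categories, and each entry is the number of raters
    assigning that category to that case (equal raters per case)."""
    n_cases = len(counts)
    n_cats = len(counts[0])
    n_raters = sum(counts[0])
    total = n_cases * n_raters
    # Observed agreement for each case, averaged across cases
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_cases
    # Agreement expected by chance, from pooled category proportions
    p_j = [sum(row[j] for row in counts) / total for j in range(n_cats)]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical example: 12 raters, 3 cases, categories ordered as
# (active tumor, inactive tumor/treatment effect, unable to classify)
ratings = [
    [10, 1, 1],  # near-consensus case
    [2, 9, 1],   # near-consensus case
    [4, 4, 4],   # evenly split case, as observed for some specimens
]
print(round(fleiss_kappa(ratings), 2))  # kappa ≈ 0.21, i.e. modest agreement
```

Even a majority of near-consensus cases yields only modest chance-corrected agreement when a minority of cases split evenly, which mirrors the pattern described above.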
The problem of inter-observer consistency is not unique to neuro-oncology. One prominent example in which a lack of consistency in tissue testing posed a great challenge is Her-2 testing in patients with breast cancer. Testing and test interpretation were not uniform, and this problem was of great clinical relevance because the presence or absence of Her-2 overexpression has direct implications for treatment and prognosis. Recognizing this problem, convened experts standardized the interpretation of Her-2 testing, which in turn directly impacted clinical practice and research [17]. Similarly, there was uncertainty about the reproducibility of Gleason grading in prostate cancer, which is the basis for clinical staging and stratification of patients in clinical trials. Inter-observer studies eventually demonstrated that Gleason scoring was, for most prostate cancers, reproducible enough to serve as a reference standard [18].
The present study has several significant limitations. First, cases and images were selected based on availability and presumed suitability for this survey by one of the neuropathologists on the study team (PCB), and only a small number of cases (13) was included in this review. It should be noted that the selection of cases could have influenced the study’s results, and that more complex cases may have been relatively over- or under-represented in this survey. Second, the survey design did not truly represent a “real-life” clinical scenario and was somewhat artificial. For feasibility, only digitized images were sent to participants rather than actual glass slides, and the images selected for this study represented only a small cross-sectional area of the entire specimen. Moreover, immunohistochemical stains and proliferation markers, such as Ki67 or the mitosis marker p-HH3, were not included in the survey material. Third, differential understanding of the terminology could have affected inter-observer variability in responses, in part because the terms “active tumor”, “inactive tumor”, and “treatment effect” are not well established in the literature and because participants had not been provided with clear definitions as part of this survey. These three answer options were offered to make survey participants “commit” to a summary diagnosis of overall disease activity or to state that they were unable to classify the case. This was felt to be important because the differentiation of active versus inactive disease is critically relevant for therapeutic decisions. The survey findings illustrate that such a dichotomization of results was not feasible in all cases included in this survey.
In contrast to this survey, in current clinical practice neuropathologists have a variety of ways to formulate a clinical impression of the presence, absence, and relative abundance of active tumor, inactive tumor, and treatment effect. Fourth, there may be considerable variability in the interpretation of pathology slides between neuropathologists and general pathologists, and across different levels of experience. This was not captured in our study, as we sent the survey only to pathologists who routinely interpret glioblastoma specimens at the respective NCI-designated cancer centers, 92% of whom were neuropathologists by training. The issue of differing levels of specialization and experience among pathologists will need to be carefully considered in future studies addressing this clinical topic. In addition, future studies of this clinical question should include larger numbers of cases than this pilot survey, and cases should be selected in an unbiased way, for example by enrolling consecutive cases as they present in clinical practice, ideally across several participating institutions.
We do not believe that these limitations negate the message that histopathology, at least as applied to tumors sampled by the surgeons in this study, may not be consistent enough to serve as a reliable reference standard. Formal criteria need to be developed across multiple institutions to ensure more uniform and reproducible diagnoses in patients undergoing repeat surgery for presumed progression of disease. Given the complexity and importance of this topic, we feel that this challenge would best be taken on by an expert committee, such as the recently launched subcommittee of the Response Assessment in Neuro-Oncology (RANO) group, to ensure input from neuropathologists at different institutions as well as from other clinical disciplines involved in the treatment of malignant gliomas. Aspects to consider when developing these criteria include optimization of tissue sampling (sending the complete tissue to pathology), as well as new or improved strategies for quantifying viable tumor within a given sample, including immunohistochemistry, next-generation sequencing, and proteomics. Once a new set of criteria is proposed, it should be rigorously tested, initially using a training set and then, for validation, in an adequately powered prospective clinical study.