Introduction
MRI provides the highest sensitivity for the detection of breast cancer [1–5] and it plays a central role in the screening of patients with a hereditary or familial high risk of developing breast cancer [6]. To achieve a significant risk reduction, either prophylactic bilateral mastectomy or annual screening is offered to the high-risk population [7, 8]. Moreover, women at an increased risk of breast cancer tend to develop the disease at a much younger age [7] and are consequently screened from a younger age and for a longer period of time. Although these patients usually undergo multimodality screening, it has been shown that MRI is the best modality with which to detect familial breast cancer, regardless of patient age, breast density, or risk status [9, 10]. An important proportion of these lesions are MRI-only lesions [9] and it has been shown that MRI particularly detects the small (less than 10 mm in diameter) and more aggressive types of breast cancer [11]. However, it has been postulated that the imaging characteristics of cancers that develop in women at very high risk are less specific and may resemble benign lesions (fibroadenoma-like masses and benign kinetic features) [12, 13]. Consequently, it has been recommended that, in high-risk women, small enhancing lesions should be regarded with suspicion and either biopsied or followed up at 6 months [13]. The BI-RADS lexicon can be used to describe enhancing breast lesions in a standardized and commonly understandable way.
While the BI-RADS lexicon provides a common language for lesion description in a standardized and structured approach [14, 15], it does not provide guidance on how lesions that present with certain features should be managed. The Kaiser score is able to fill this gap [16, 17]; it is a clinical decision rule that combines BI-RADS features in a simple, machine learning–derived flowchart. Following the flowchart yields a diagnostic score, ranging from 1 to 11, that reflects an increasing probability of malignancy, with scores greater than 4 requiring biopsy. As the Kaiser score combines several criteria to achieve a diagnosis, we hypothesized that the cancers detected in high-risk women could objectively be diagnosed as such using the Kaiser score, even though they might present with a circumscribed appearance that was referred to as “fibroadenoma-like” in prior works.
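The biopsy threshold described above can be written as a minimal sketch. The function name `recommend_biopsy` and its parameters are illustrative assumptions; the sketch takes an already assigned score as input and does not reproduce the flowchart that derives the score from BI-RADS features.

```python
def recommend_biopsy(kaiser_score: int, threshold: int = 4) -> bool:
    """Return True if the score exceeds the threshold, i.e. biopsy is advised.

    Assumption: the score has already been assigned by a reader using the
    published flowchart; only the final thresholding step is shown here.
    """
    if not 1 <= kaiser_score <= 11:
        raise ValueError("Kaiser score must lie between 1 and 11")
    # Scores greater than 4 require biopsy; scores of 4 or lower do not.
    return kaiser_score > threshold


# Usage: apply the rule to a few hypothetical scores.
decisions = {score: recommend_biopsy(score) for score in (1, 4, 5, 11)}
```

Keeping the threshold as a parameter makes it explicit that the cutoff of 4 is a reported convention that could, in principle, be varied in a sensitivity analysis.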
Consequently, we assessed the ability of the Kaiser score to diagnose malignancy in a consecutive population of histologically proven, suspicious (MRI BI-RADS 4), contrast-enhancing lesions diagnosed in a breast cancer screening program for high-risk patients.
Discussion
This study investigated the benefit of implementing the Kaiser score as a decision tool for MRI-suspicious (BI-RADS 4), contrast-enhancing lesions diagnosed in patients at high risk of developing breast cancer. The results are clinically highly relevant, as they refute the notion of benign-appearing cancers in the investigated setting. Furthermore, we showed that the Kaiser score is applicable in high-risk patients independent of whether a lesion appears as a mass, a non-mass lesion, or a focus. The diagnostic performance equaled that of the Kaiser score applied in other scenarios [26–28]. The thresholds established in other indications could be reproduced, allowing exclusion of cancer with high certainty. Potentially, 45 to 72% of all unnecessary biopsies could have been avoided by applying the Kaiser score prior to biopsy.
The Kaiser score uses a small set of morphological and dynamic features that were chosen by machine-learning methodology (presence of spiculations/root sign, enhancement kinetics, lesion margin, internal enhancement pattern, and ipsilateral edema). The result is a three-step flowchart that yields a score, ranging from 1 to 11, reflecting the probability of malignancy. Thus, the assessment of enhancing lesions can be simplified and structured, and the results can be used for evidence-based decision-making. Scores below 5 should be considered benign, while histological workup is mandatory for higher scores [16]. This approach was initially tested in an exploratory study on biopsy-proven lesions in a mixed study population [17] and thereafter validated in consecutive problem-solving cases [26], suspicious MRI-only lesions [27], and lesions that presented as suspicious mammographic microcalcifications [28]. The application of the Kaiser score relies on generally recommended standard breast MRI protocols (T2-weighted sequences and dynamic, contrast-enhanced, T1-weighted sequences), and it was shown to be independent of the type of scanners/vendors used [27] and helpful for less experienced radiologists [26]. It does not require any additional functional imaging, such as DWI or MR spectroscopy, or postprocessing software [17]. Yet, it allows the integration of further diagnostic data, whether clinical (such as bloody discharge), conventional findings (e.g., suspicious mammographic calcifications), or quantitative information (e.g., DWI), as discussed in [16].
We found that the Kaiser score is highly accurate in the setting of high-risk patients. All readers achieved a high sensitivity, with the only false-negative results occurring in non-mass lesions and foci. This could be explained by the difficulty of determining the margin type or discerning the enhancement pattern in lesions smaller than 5 mm, especially on old examinations of lower quality. Notably, although not statistically significant owing to the small sample size, all false-negative ratings were obtained in examinations older than 10 years, stressing the importance of high image quality for the interpretation of these lesions. The established cutoff of recommending biopsy for Kaiser scores exceeding 4 [16, 26, 27] was applicable in our study cohort. Thus, even if lesions were initially categorized as BI-RADS 4, scores of 4 or lower were robustly indicative of a benign outcome. No diagnostic test is perfect: if low Kaiser scores are used to avoid unnecessary biopsies, this comes at the cost of false-negative findings, that is, missed cancers. In healthcare, the application of a decision-making tool such as the Kaiser score is always an ethical issue: how many avoided unnecessary biopsies are worth one missed cancer? None of the false-negative lesions presented as masses on MRI. We therefore conclude that the Kaiser score can be safely applied to downgrade mass lesions, but caution should be used when interpreting non-mass lesions and foci. The number of false-negative findings in this study was low, and the missed lesions were either luminal A type invasive cancers or DCIS. It can therefore be assumed with reasonable safety that downgrading a lesion would not have changed the patients’ prognosis but rather would have led to a delayed diagnosis of a biologically less significant malignancy. Patients in this setting undergo annual screening, so the maximum diagnostic delay would be one year. Whether such downgraded lesions should be primarily assigned BI-RADS 3 and undergo an additional follow-up at 6 months is discussed elsewhere [29].
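The trade-off between avoided biopsies and missed cancers can be made explicit with a short calculation. The following is a minimal sketch under stated assumptions: the class `BiopsyCounts`, the function `summarize`, and all counts are hypothetical illustrations and are not results from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BiopsyCounts:
    """Hypothetical counts of biopsied lesions by histology and score."""
    true_positive: int   # malignant, score above threshold
    false_negative: int  # malignant, score at or below threshold (missed)
    true_negative: int   # benign, score at or below threshold (biopsy avoidable)
    false_positive: int  # benign, score above threshold (biopsy still performed)


def summarize(counts: BiopsyCounts) -> dict:
    """Compute sensitivity, avoidable-biopsy fraction, and their trade-off."""
    malignant = counts.true_positive + counts.false_negative
    benign = counts.true_negative + counts.false_positive
    if malignant == 0 or benign == 0:
        raise ValueError("need at least one malignant and one benign lesion")
    # Number of avoided unnecessary biopsies per missed cancer.
    if counts.false_negative:
        avoided_per_missed = counts.true_negative / counts.false_negative
    else:
        avoided_per_missed = float("inf")
    return {
        "sensitivity": counts.true_positive / malignant,
        "avoidable_biopsy_fraction": counts.true_negative / benign,
        "avoided_per_missed_cancer": avoided_per_missed,
    }


# Invented example counts, for illustration only.
example = summarize(
    BiopsyCounts(true_positive=38, false_negative=2,
                 true_negative=60, false_positive=40)
)
```

In this sketch, the avoidable-biopsy fraction is the share of benign lesions scored at or below the threshold, which corresponds to the specificity among biopsied lesions.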
The results once more corroborate the usefulness of a structured and evidence-based diagnostic approach. In high-risk MRI screening, the low prevalence of malignancy is connected to an inherent risk of false-negative findings [30]. Radiologists seemingly compensate for this by using a rather low biopsy threshold. Although the fifth edition of the BI-RADS lexicon [15] can be used for standardized lesion description [14], our results highlight the limitations of empirical BI-RADS 4 category assignments that do not follow objective rules in high-risk patients.
Previous studies have shown that the imaging phenotypes of malignancy differed in women at high risk, with a high percentage of invasive cancers appearing as fibroadenoma-like masses, but without fibroadenoma-like internal enhancement or enhancement kinetics [12, 31]. However, our results demonstrate that there are no cancers with exclusively benign criteria. The structured combination of morphological and functional criteria provided by the Kaiser score avoids misinterpretations of a single diagnostic criterion such as circumscribed margins.
The combination of diagnostic criteria is available due to the multiparametric character of breast MRI. Recently, alternative, abbreviated protocols have been proposed for screening women with dense breast tissue [6, 32]. The aim is to reduce the scan time by acquiring only one pre-contrast and one early post-contrast T1-weighted image set. Consequently, the reader can obtain a quick overview of the presence or absence of enhancement on a single, high-contrast, maximum intensity projection (MIP) image, followed by characterization of enhancement with respect to configuration, morphology, margins, and internal architecture based on an analysis of the individual subtracted images [32]. Nonetheless, the shape of the enhancement curve was shown to be relevant for estimating the probability of malignancy, which increases from a type I (persistent) to a type III (wash-out) curve. In the framework of the machine learning–derived Kaiser score, the enhancement curve type is the second most important diagnostic criterion. Thus, in a high-risk patient, with no information about the enhancement kinetics, a circumscribed lesion with enhancement must always be considered suspicious. Our study, therefore, provides indirect evidence against abbreviated, non-dynamic protocols for high-risk screening: due to the lack of diagnostic information provided by the enhancement kinetics, unnecessary biopsies will be performed. While the alternative approach of ultrafast early perfusion imaging may potentially compensate for that, its applicability for avoiding unnecessary biopsies in a combined diagnostic model has not yet been proven.
The main limitation of this study was that the MRI scans analyzed were acquired with old protocols and on different MRI equipment, with different field strengths and sequence parameters. This was not avoidable, as patients were recruited consecutively from a longitudinal, prospective, high-risk screening study. On the other hand, this limitation can also be seen as a strength, as it corroborates the general applicability of the Kaiser score, which is based on regular BI-RADS features intended to be used independently of MRI protocols and scanning equipment. Nevertheless, the heterogeneous image quality may be the reason that only a fair-to-moderate inter-reader agreement could be achieved, in contrast to previously reported data [26, 27]. Another reason might be that readers were not trained before the study, as was done in a previous study, which further contributed to inter-reader variation [14].
In conclusion, this study provides evidence that the Kaiser score may be used to avoid unnecessary biopsies in high-risk patients recalled from screening due to the detection of BI-RADS 4 lesions, in particular for lesions presenting as masses. This could have a positive impact on healthcare costs as well as on patient concern.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.