Introduction
Coronavirus disease 2019 (COVID-19), caused by the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), has claimed over 6.5 million lives in more than 200 nations as of October 2022. The clinical manifestations of severe COVID-19 are dominated by respiratory symptoms, including acute respiratory distress syndrome (ARDS) [1] and pneumonia, while some patients have also developed severe myocardial damage [2]. COVID-19 is currently diagnosed through polymerase chain reaction (PCR) tests and rapid antigen tests, which determine the presence of SARS-CoV-2 in biological samples [3]. SARS-CoV-2 gains entry to the human body via angiotensin-converting enzyme 2 (ACE2), a membrane-bound carboxypeptidase that is abundantly expressed in the lungs and the heart [4, 5]. ACE2 plays a central role in the renin–angiotensin–aldosterone system (RAAS) [6], whose principal effectors regulate vasoconstriction, oxidative stress, and inflammation [7, 8]. Recent research has associated the pathophysiology of COVID-19 with altered expression of the ACE2 gene after viral infection. Gheware et al. [9] observed markedly increased ACE2 protein expression in the lung tissue of patients with severe COVID-19. Other studies analysed the involvement of ACE2 in SARS-CoV and extrapolated their findings to COVID-19, given that SARS-CoV and SARS-CoV-2 are genetically similar and induce similar symptomatology [10, 11]. Li et al. [12] found that SARS-CoV-2 affects ACE2 expression during viral entry, which may involve local immune responses and result in lung and cardiovascular injury. Similar findings were reported by Tay et al. [13], in which SARS-CoV-2 infection altered ACE2 expression and caused RAAS dysfunction. RAAS dysfunction, in turn, increases inflammation and vascular permeability in the airways and leads to acute lung damage. Patients with severe COVID-19 may develop ARDS, which can be fatal.
Patients with lung adenocarcinoma (LUAD) also display variable ACE2 expression across the different cell types within their tumours [14–16]. Similar to COVID-19, altered ACE2 expression in LUAD is associated with inflammatory signalling through the actions of the RAAS [17, 18]. Yang et al. [14] showed the prognostic value of altered ACE2 expression in LUAD, where ACE2 was associated with tumour immune infiltration and prognosis. In addition, Feng et al. [19] identified ACE2 as an inhibitor of cancer development, metastasis, and angiogenesis in adenocarcinoma-dominated non-small cell lung cancer (NSCLC). Consequently, the clinical manifestations of altered ACE2 expression, such as inflammation and ARDS, are comparable in LUAD and COVID-19 [20]. However, gene expression profiling requires adequate tissue samples, typically obtained by core biopsy, which is invasive and expensive and captures only a portion of the abnormality. Thus, gene expression profiling is not routinely performed for COVID-19 and, to the best of our knowledge, has not been conducted on large patient cohorts.
Medical imaging, on the other hand, plays a vital role in routine clinical practice because of its ability to capture visual representations of the function of organs or tissues (physiology). Measurable characteristics of these representations are known as ‘image features’ and can describe, for example, the size and location of abnormalities. Computed tomography (CT) provides an alternative means of detecting COVID-19 through its clinical manifestations in the lung, such as widespread regions of ground-glass change and consolidation [21]. Advances in computerized medical image analysis have enabled ‘radiomics’, a high-throughput quantitative technique that extracts image characteristics that cannot be quantified by visual inspection alone [22]. In a recent study, Li and Xia [23] determined the diagnostic value of CT imaging features for COVID-19 and found COVID-19 to be associated with features such as ground-glass opacities (GGOs), consolidation with vascular enlargement, and interlobular and septal thickening.
These quantitative capabilities of CT also enable ‘radiogenomics’, a developing research discipline that aims to identify image features that share statistical associations with molecular characteristics (‘radiogenomics features’). Such features can be determined by identifying image features that have statistically significant associations with gene expression [22, 24, 25]. Previous studies have demonstrated that radiogenomics features can detect a variety of diseases other than COVID-19 and predict prognosis and treatment response. An et al. [26] reported that radiogenomics features are associated with mammalian target of rapamycin (mTOR) pathway gene activity in hepatocellular carcinoma (HCC); the mTOR signalling pathway governs cellular activities and offers opportunities for targeted anti-tumour treatment. Lee et al. [27] identified a collection of radiogenomics features that are predictive of postsurgical metastases in patients with pathological stage T1 renal cell carcinoma (pT1 RCC). In contrast to conventional imaging features, radiogenomics features have been shown to provide unique insights into intratumour heterogeneity, which can be linked to clinical outcome. Despite the potential of radiogenomics, the association between ACE2 expression and the clinical manifestations of COVID-19 has not previously been investigated using radiogenomics.
In this study, we propose a radiogenomics framework for identifying and selecting radiogenomics features that signify altered ACE2 expression (‘ACE2-RGF’). This is achieved by determining radiogenomics relationships from the imaging and ACE2 expression data of LUAD patients. We hypothesize that CT data can be used to derive ACE2-RGF that serve as surrogate biomarkers for altered ACE2 expression. In addition, we anticipate that ACE2-RGF could encode unique pathophysiological information common to LUAD and COVID-19 and may therefore serve as a biomarker for COVID-19 classification and the identification of critical illness. We investigated these hypotheses on several publicly available CT datasets of LUAD and COVID-19, assessing the ability of ACE2-RGF to separate LUAD and COVID-19 patients from healthy subjects (hereafter denoted ‘normal’) and to distinguish COVID-19 patients with critical illness from those with mild symptoms.
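The framework rests on selecting image features that are statistically associated with ACE2 expression. The exact statistic and significance threshold are not given in this section, so the following is only a minimal sketch under stated assumptions: it scores each radiomics feature by its Spearman correlation with one gene's expression across patients and keeps features whose Benjamini-Hochberg adjusted p-value (q-value) falls below 0.05. The function names `benjamini_hochberg` and `select_radiogenomic_features`, the FDR threshold and the synthetic data are hypothetical and do not reproduce the authors' implementation.

```python
import numpy as np
from scipy.stats import spearmanr


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Step-up rule: each q-value is the minimum scaled value at or above its rank.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(adjusted, 1.0)
    return q


def select_radiogenomic_features(features, expression, names, alpha=0.05):
    """Keep features whose Spearman correlation with one gene's expression
    stays significant after FDR correction (hypothetical selection rule).

    features:   (n_patients, n_features) radiomics matrix
    expression: (n_patients,) expression values for the same patients
    """
    rhos, pvals = [], []
    for j in range(features.shape[1]):
        rho, p = spearmanr(features[:, j], expression)
        rhos.append(rho)
        pvals.append(p)
    q = benjamini_hochberg(pvals)
    return [(names[j], rhos[j], q[j]) for j in range(len(names)) if q[j] < alpha]


# Synthetic example: only feature_0 is constructed to track expression.
rng = np.random.default_rng(0)
expression = rng.normal(size=100)
features = rng.normal(size=(100, 5))
features[:, 0] = expression + 0.3 * rng.normal(size=100)
names = [f"feature_{j}" for j in range(5)]
selected = select_radiogenomic_features(features, expression, names)
```

A rank correlation is used here because it is unaffected by monotonic rescaling of image intensities; a Pearson correlation or a regression model could be dropped into the same loop.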
Experiments
The proposed radiogenomics framework was assessed in two sets of experiments: i) ACE2-RGF classifying LUAD/normal and COVID-19/normal subjects, compared with features selected from LUAD data; and ii) ACE2-RGF classifying COVID-19/normal subjects and identifying subjects with critical illness, compared with features selected from COVID-19 data.
First, we derived ACE2-RGF from the NRG-S and NRG-H datasets according to their correlation with ACE2 gene profiles; these features were then used with MLR to measure their ability to classify LUAD/normal and COVID-19/normal subjects. Radiomics features were also extracted from the NRG-S and NRG-H datasets. A variety of conventional feature selection techniques were employed to determine the most representative features for these tasks, including analysis of variance (ANOVA), mutual information [42], recursive feature elimination (RFE) [43] with a support vector classifier estimator, minimum redundancy maximum relevance (mRMR) [44], ReliefF [45], random forest with 100 estimators and Gini impurity, least absolute shrinkage and selection operator (Lasso) [46], Ridge, and Elastic Net [47] with an L1 ratio of 0.5. Apart from the settings stated above, these techniques were implemented with their default parameters to promote model generalizability and reproducibility, in line with recent radiomics and radiogenomics machine learning research [48, 49]. Each resulting collection of selected image features is denoted LUAD-RF, indexed by the selection technique; for instance, LUAD-RF_ANOVA denotes radiomics features extracted from LUAD subjects and selected using ANOVA. The performance of ACE2-RGF was compared with that of the LUAD-RF variants and of all extracted radiomics features (‘LUAD-AF’).
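Most of the listed selectors have counterparts in scikit-learn. The sketch below is a hypothetical configuration, not the authors' code: it assumes a fixed budget of k = 10 features per technique, a linear-kernel SVC for RFE (scikit-learn's `RFE` needs an estimator exposing `coef_`, which the default RBF-kernel `SVC` does not), and illustrative `alpha` values for Lasso and Elastic Net. mRMR and ReliefF are not part of scikit-learn and are omitted.

```python
from functools import partial

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import (RFE, SelectFromModel, SelectKBest,
                                       f_classif, mutual_info_classif)
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_selectors(k=10):
    """Selectors that each keep exactly k features (k is an assumption)."""
    # threshold=-inf makes SelectFromModel rank purely by importance or |coef|.
    top_k = dict(threshold=-np.inf, max_features=k)
    return {
        "ANOVA": SelectKBest(f_classif, k=k),
        "MutualInfo": SelectKBest(partial(mutual_info_classif, random_state=0), k=k),
        # RFE needs coef_, so a linear-kernel SVC is assumed here.
        "RFE": RFE(SVC(kernel="linear"), n_features_to_select=k),
        "RandomForest": SelectFromModel(
            RandomForestClassifier(n_estimators=100, criterion="gini", random_state=0),
            **top_k),
        # alpha=0.01 is illustrative: with the default alpha=1.0, every Lasso
        # coefficient is zero when standardized features are regressed on 0/1 labels.
        "Lasso": SelectFromModel(Lasso(alpha=0.01), **top_k),
        "Ridge": SelectFromModel(Ridge(), **top_k),
        "ElasticNet": SelectFromModel(ElasticNet(alpha=0.01, l1_ratio=0.5), **top_k),
    }


# Synthetic stand-in for an extracted radiomics matrix and binary labels.
X, y = make_classification(n_samples=200, n_features=30, n_informative=5, random_state=0)
X = StandardScaler().fit_transform(X)
selected = {name: np.flatnonzero(sel.fit(X, y).get_support())
            for name, sel in build_selectors(k=10).items()}
```

Each entry of `selected` plays the role of one LUAD-RF variant; for example, `selected["ANOVA"]` corresponds to LUAD-RF_ANOVA.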
Next, ACE2-RGF was used with MLR to measure its ability to separate COVID-19 patients from normal subjects. For this experiment, radiomics features were extracted from the CC-CCII datasets. The same feature selection techniques were applied to the extracted radiomics features, and the resulting collections of selected image features were denoted COVID-19-RF. The performance of ACE2-RGF was compared with that of COVID-19-RF and of all extracted radiomics features (‘COVID-19-AF’).
Lastly, ACE2-RGF was used with MLR to measure its ability to identify COVID-19 critical illness. For this experiment, radiomics features were again extracted from the CC-CCII datasets. We applied the same feature selection procedure to these radiomics features, and the resulting collections of selected image features were denoted COVID-Crt-RF. The performance of ACE2-RGF was compared with that of COVID-Crt-RF and of all extracted radiomics features (‘COVID-Crt-AF’).
Fivefold cross-validation was performed for all experiments. We randomly sampled 250 patients from each of the LUAD and normal classes (500 in total) and randomly divided this sample into training and validation sets with an 80/20 split, giving 200 training and 50 validation examples per class in each fold. Identical patient splits were used for all compared feature sets, and no subject appeared in both the training and validation sets of a fold. The test set comprised all available COVID-19 patients together with the control subjects not chosen for the cross-validation sample. Although the training sets of different folds contained different subjects, the same set of ACE2-RGF features was extracted from each. We evaluated our MLR models using accuracy (ACC), area under the ROC curve (AUC), F1 score, F1 score of the positive (LUAD/COVID-19) class only (F1 POS), precision (PREC), recall (RECA), and specificity (SPEC). The best model was defined as the one with the highest mean of F1 and AUC on the validation set of its fold.
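The evaluation protocol can be sketched with scikit-learn under several assumptions: MLR is approximated by `LogisticRegression`, the overall F1 score is macro-averaged, and folds are stratified by class. These choices, the function names and the synthetic data are hypothetical. A fixed `random_state` reproduces identical splits for every feature set being compared, and the best model is taken from the fold whose validation mean of F1 and AUC is highest.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import StratifiedKFold


def specificity(y_true, y_pred):
    """True-negative rate, TN / (TN + FP)."""
    return float(np.mean(y_pred[y_true == 0] == 0))


def cross_validate(X, y, n_splits=5, seed=0):
    """Stratified k-fold CV; returns per-fold metrics and the best fold's model."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    folds = []
    for train_idx, val_idx in skf.split(X, y):
        model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
        yv = y[val_idx]
        prob = model.predict_proba(X[val_idx])[:, 1]
        pred = model.predict(X[val_idx])
        metrics = {
            "ACC": accuracy_score(yv, pred),
            "AUC": roc_auc_score(yv, prob),
            "F1": f1_score(yv, pred, average="macro"),  # assumed macro-averaged
            "F1_POS": f1_score(yv, pred, pos_label=1),
            "PREC": precision_score(yv, pred, zero_division=0),
            "RECA": recall_score(yv, pred),
            "SPEC": specificity(yv, pred),
            "n_val": len(val_idx),
        }
        folds.append((model, metrics))
    # Best model: highest mean of F1 and AUC on its own validation fold.
    best_model, best = max(folds, key=lambda f: (f[1]["F1"] + f[1]["AUC"]) / 2)
    return folds, best_model, best


# Synthetic stand-in: 250 "normal" (0) and 250 "LUAD" (1) subjects, 12 features.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 250)
X = rng.normal(size=(500, 12)) + 0.8 * y[:, None]
folds, best_model, best = cross_validate(X, y)
```

With 250 subjects per class, each stratified fold holds out 50 subjects per class, matching the 200/50 per-class split described above.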
Discussion
Our main findings are that: i) our framework can encode ACE2-RGF imaging biomarkers from LUAD data that are distinct from the radiomics features selected for COVID-19 classification and critical illness identification; ii) ACE2-RGF can distinguish COVID-19 patients from normal subjects and can be combined with COVID-19-RF to improve classification performance; iii) ACE2-RGF can also effectively identify COVID-19 patients with critical illness; and iv) the same ACE2-RGF can serve as a biomarker for multiple applications, as shown for both COVID-19 classification and critical illness identification.
The ACE2-RGF comprises 12 radiomics features (Table 1) that encode textural information in CT images. Notably, none of the ACE2-RGF features was among the most frequently selected features of COVID-19-RF (Table 4) or COVID-Crt-RF (Table 7). The texture descriptors in ACE2-RGF provide a 2D isotropic quantification of the second spatial derivative of an image and thus highlight locations with rapid intensity changes within the CT image. Such textural information is consistent with the CT findings reported in ARDS and COVID-19 [50, 51], including ground-glass opacity, vascular enlargement and the crazy-paving pattern. In contrast, COVID-19-RF encoded statistical and texture features from images decomposed by a 3D wavelet transform with LLH filters, whereas COVID-Crt-RF encoded a distinct collection of features derived from images decomposed with a variety of low- and high-pass filter combinations (LLL, LLH, HLL and HLH) and from LoG-filtered images with Gaussian sigma values of 1 and 4 mm. These findings indicate that our radiogenomics framework derived image features associated with ACE2 that encode unique information on disease manifestations related to variations in ACE2 expression. Conventional machine learning-based approaches, by contrast, quantify and select image features optimized for particular tasks and thus may neglect important imaging representations related to the pathophysiology of the disease. This is because multiple ‘optimal’ feature sets may be selected for a given task, even though different feature sets may offer distinct information [52, 53].
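The ‘second spatial derivative’ described above is the image Laplacian, and the LoG filters mentioned for COVID-Crt-RF smooth the image with a Gaussian before taking it. As a minimal illustration only (not the feature-extraction pipeline used in this study), SciPy's `scipy.ndimage.gaussian_laplace` can be applied to a synthetic slice containing a bright square ‘opacity’; the 0.7 mm pixel spacing and the helper name `log_response` are hypothetical. Sigma is given in millimetres, as in the text, and converted to pixels per axis.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace


def log_response(image, sigma_mm, spacing_mm):
    """Laplacian-of-Gaussian response of a 2D slice.

    sigma_mm is converted to pixels separately for each axis using the
    (row, column) pixel spacing in mm.
    """
    sigma_px = [sigma_mm / s for s in spacing_mm]
    return gaussian_laplace(np.asarray(image, dtype=float), sigma=sigma_px)


# Synthetic slice: a bright 32 x 32 "opacity" on a dark background.
slice_ = np.zeros((128, 128))
slice_[48:80, 48:80] = 100.0
spacing = (0.7, 0.7)  # hypothetical in-plane pixel spacing in mm
fine = log_response(slice_, sigma_mm=1.0, spacing_mm=spacing)
coarse = log_response(slice_, sigma_mm=4.0, spacing_mm=spacing)
```

The response is close to zero in flat regions and peaks where intensity changes rapidly, that is, at the lesion boundary. The larger sigma responds to coarser structure with a weaker peak, because for a step edge the unnormalized LoG peak shrinks roughly as 1/sigma².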
Compared with LUAD-AF and the LUAD-RF variants, the ACE2-RGF derived by our radiogenomics framework performed consistently in classifying both LUAD (Table 2) and COVID-19 (Table 3) patients from normal subjects, whereas MLR models using LUAD-AF and LUAD-RF showed a substantial decline in performance when classifying COVID-19 patients from normal subjects. These results indicate that ACE2-RGF encoded imaging representations of pathophysiological information common to LUAD and COVID-19. Although ACE2-RGF performed worse than COVID-19-RF in separating COVID-19 patients from normal subjects (Table 5), using ACE2-RGF did not require identifying and extracting COVID-19-RF features. Our findings indicate that the imaging representations encoded by ACE2-RGF are associated with alterations in ACE2 expression and are relevant to the pathophysiology of both LUAD and COVID-19. However, such shared information may not provide optimal classification performance for either LUAD or COVID-19 specifically.
Notably, MLR models trained with COVID-19-AF performed similarly to MLR models trained with the various COVID-19-RF in classifying COVID-19 patients from healthy subjects (Table 2). This suggests that, although individual radiomics features in COVID-19-AF may encode distinct information, collectively they are capable of classifying COVID-19. Conversely, conventional machine learning frameworks that quantify task-specific image features may neglect radiomics features that encode information relevant to classifying COVID-19, such as statistical and textural features computed with various LoG filters.
COVID-19 classification performance improved when ACE2-RGF was fused with COVID-19-RF (Table 6). Because ACE2-RGF encoded pathophysiological image features linked with COVID-19 that are distinct from those in COVID-19-RF, the two feature sets are complementary. Our results suggest that conventional machine learning frameworks that quantify task-specific image features may neglect the underlying pathophysiological information of COVID-19 and its clinical manifestations due to altered ACE2 expression, such as the involvement of the lower respiratory tract in individuals with early-stage or moderate COVID-19 and the possibility of progression to ARDS [54].
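How the two feature sets were fused is not detailed in this section. A common choice, assumed in the sketch below, is early (feature-level) fusion: each subject's ACE2-RGF and COVID-19-RF vectors are concatenated before a single classifier is trained. The helpers `fuse` and `holdout_auc`, the logistic-regression classifier and the synthetic feature matrices are all hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def fuse(*feature_sets):
    """Feature-level (early) fusion: concatenate columns subject by subject."""
    if len({fs.shape[0] for fs in feature_sets}) != 1:
        raise ValueError("all feature sets must describe the same subjects")
    return np.hstack(feature_sets)


def holdout_auc(X, y, seed=0):
    """AUC of a standardized logistic regression on a stratified 80/20 split."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_tr, y_tr)
    return roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])


# Synthetic stand-ins: 12 "ACE2-RGF" and 8 "COVID-19-RF" columns whose noise is
# independent, so each set carries partly complementary evidence about the label.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 200)
ace2_rgf = rng.normal(size=(400, 12)) + 0.3 * y[:, None]
covid_rf = rng.normal(size=(400, 8)) + 0.4 * y[:, None]
fused = fuse(ace2_rgf, covid_rf)
aucs = {name: holdout_auc(X, y) for name, X in
        [("ACE2-RGF", ace2_rgf), ("COVID-19-RF", covid_rf), ("fused", fused)]}
```

On real data, comparing `aucs["fused"]` with the single-set AUCs would indicate whether the two sets are complementary; on synthetic data the numbers are illustrative only.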
Our framework could also identify COVID-19 patients with critical illness: the performance of the MLR model trained with ACE2-RGF for identifying COVID-19 critical illness was similar to that of models trained with COVID-Crt-RF (Table 8). This suggests that ACE2-RGF may not contain imaging representations exclusive to COVID-19 critical illness, but rather imaging characteristics associated with altered ACE2 expression that are tied to the progression of COVID-19 to critical illness [55]. Notably, the performance gap between ACE2-RGF and the best-performing COVID-Crt-RF for identifying critical illness was smaller than the gap between ACE2-RGF and the best-performing COVID-19-RF for COVID-19 classification. One explanation is that critically ill COVID-19 patients commonly have multiple complications, such as ARDS, that are related to or result from ACE2 and RAAS failure [56, 57].
Our framework demonstrated the potential to serve as an imaging biomarker for both COVID-19 classification and critical illness identification using the same set of ACE2-RGF. We attribute this to the encoding of altered ACE2 expression in ACE2-RGF. Recent research has implicated ACE2 in the infection, development, and clinical manifestations of COVID-19 in the human body [58]. It has also been suggested that ACE2 and its variants affect the binding of SARS-CoV-2 and hence disease severity following infection [59]. Therefore, the ACE2-RGF derived by our framework could serve as a valuable biomarker that complements existing image-based frameworks and offers new research possibilities for deriving additional features for future automated COVID-19 classification and critical illness identification.
We used traditional handcrafted image features encompassing shape, first-order statistics, and texture. These features are widely adopted in radiogenomics research because they are well accepted, readily understood and explainable. Deep learning feature extractors have recently advanced substantially, notably in extracting deep image features that complement handcrafted features. For instance, in a recent lung cancer radiogenomics study by Xia et al. [25], deep learning features were found to capture unique information that differed from the traditional feature set. However, these deep learning features lacked interpretability and descriptiveness. Our primary focus was to analyse whether ACE2-RGF could be encoded from CT images while providing explanatory insights, a requirement that the traditional handcrafted feature set adequately fulfilled. In future work, we plan to explore whether deep learning features can complement our study and offer additional insights.
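For readers unfamiliar with handcrafted radiomics, first-order statistics are computed directly from the intensities of the voxels inside a region of interest. The sketch below loosely follows common radiomics conventions (energy as the sum of squared intensities, entropy over fixed-width intensity bins); the bin width of 25, the function name and the toy region are assumptions, and dedicated toolkits such as PyRadiomics define further variants and preprocessing steps.

```python
import numpy as np


def first_order_features(image, mask, bin_width=25.0):
    """A few first-order statistics over voxels where mask is True.

    bin_width (used only for entropy) is an assumed discretization setting.
    """
    x = image[mask].astype(float)
    mean = x.mean()
    centered = x - mean
    var = np.mean(centered ** 2)
    # Discretize intensities into fixed-width bins starting at the ROI minimum.
    bins = np.floor(x / bin_width) - np.floor(x.min() / bin_width)
    _, counts = np.unique(bins, return_counts=True)
    p = counts / counts.sum()
    return {
        "Mean": mean,
        "Variance": var,
        "Skewness": np.mean(centered ** 3) / var ** 1.5 if var > 0 else 0.0,
        "Kurtosis": np.mean(centered ** 4) / var ** 2 if var > 0 else 0.0,
        "Energy": np.sum(x ** 2),
        "Entropy": float(-np.sum(p * np.log2(p))),  # in bits
    }


# Toy ROI: two voxels at 0 and two at 30, which fall into two different bins.
roi = np.array([[0, 0], [30, 30]])
feats = first_order_features(roi, np.ones_like(roi, dtype=bool))
```

Texture features such as those in ACE2-RGF extend this idea by describing how intensities are arranged spatially rather than only how they are distributed.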
A limitation of our study is the lack of ACE2 expression data for the COVID-19 patients, which limits the ability to optimize ACE2-RGF for COVID-19 classification and critical illness identification. We anticipate that, with ACE2 expression data from COVID-19 patients, our model could be improved by identifying and selecting ACE2-RGF directly from COVID-19 imaging data. In addition, as data on COVID-19 critical illness and ACE2 expression become increasingly available, our future work will assess the performance and robustness of the proposed radiogenomics framework across multiple independent datasets.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.