Multi-input Cardiac Image Super-Resolution Using Convolutional Neural Networks

Oktay, Ozan; Bai, Wenjia; Lee, Matthew; Guerrero, Ricardo; Kamnitsas, Konstantinos; Caballero, Jose; de Marvao, Antonio; Cook, Stuart; O’Regan, Declan; Rueckert, Daniel

doi:10.1007/978-3-319-46726-9_29


Conference paper
First Online: 02 October 2016


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9902)

Abstract

3D cardiac MR imaging enables accurate analysis of cardiac morphology and physiology. However, due to the requirements for long acquisition times and breath-holds, the clinical routine is still dominated by multi-slice 2D imaging, which hampers the visualization of anatomy and quantitative measurements because relatively thick slices are acquired. As a solution, we propose a novel image super-resolution (SR) approach based on a residual convolutional neural network (CNN) model. It reconstructs high-resolution 3D volumes from 2D image stacks for more accurate image analysis. The proposed model allows the use of multiple input data acquired from different viewing planes for improved performance. Experimental results on 1233 cardiac short- and long-axis MR image stacks show that the CNN model outperforms state-of-the-art SR methods in terms of image quality while being computationally efficient. We also show that, when SR-CNN is used as the initial upscaling method for subsequent analysis, image segmentation and motion tracking benefit more from it than from conventional interpolation methods.


1 Introduction

3D magnetic resonance (MR) imaging with near-isotropic resolution provides good visualization of cardiac morphology and enables accurate assessment of cardiovascular physiology. However, 3D MR sequences usually require long breath-holds and repetition times, which lead to scan times that are infeasible in clinical routine; 2D multi-slice imaging is therefore used instead. Due to limitations on signal-to-noise ratio (SNR), the acquired slices are usually thick compared to the in-plane resolution, which degrades the visualization of anatomy and hampers further analysis. Attempts to improve image resolution are typically carried out either during the acquisition stage (sparse k-space filling) or retrospectively through super-resolution (SR) of single or multiple image acquisitions.

Related work: Most SR methods recover the missing information from examples observed in training images, which are used as a prior to link low- and high-resolution (LR-HR) image patches. Based on the way they utilize training data, single-image SR methods fall into two categories: non-parametric and parametric. The former aims to recover HR patches from LR ones via a co-occurrence prior between the target image and external training data. Atlas-based approaches such as the patch-match method [15] and non-local-means-based single-image SR [10] are two examples of this category. These approaches are computationally demanding, as candidate patches have to be searched in the training dataset to find the most suitable HR candidate. Instead, compact generative models can be learned from the training data to define the mapping between LR and HR patches. Parametric generative models, such as coupled-dictionary-learning-based approaches, have been proposed to upscale MR brain [14] and cardiac [3] images. These methods use a sparsity constraint to express the link between LR and HR patches. Similarly, random-forest-based non-linear regressors have been proposed to predict HR patches from LR data and have been successfully applied to diffusion tensor images [1]. Recently, convolutional neural network (CNN) models [5, 6] have been put forward to replace the inference step, as they have enough capacity to perform complex non-linear regression tasks. Even with a shallow network composed of a few layers, these models [6] achieved superior results over other state-of-the-art SR methods.

Contributions: In this work, we extend the SR-CNN proposed in [5, 6] with an improved layer design and training objective function, and show its application to cardiac MR images. In particular, the proposed approach simplifies the LR-HR mapping problem through residual learning, which allows a deeper network to be trained for improved performance. Additionally, the new model can be considered more data-adaptive, since the initial upscaling is performed by a learned deconvolution layer instead of a fixed kernel [6]. More importantly, a multi-input extension of the SR-CNN model is proposed and exploited to achieve better SR image quality: by making use of multiple images acquired from different slice directions, the HR image reconstruction can be further improved and constrained. Similar multi-image SR approaches have been proposed in [11, 12] to synthesize HR cardiac images; however, these approaches did not make use of large available training datasets to learn the appearance of anatomical structures in HR. Compared to state-of-the-art image SR approaches [6, 15], the proposed method shows improved performance in terms of peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) [18]. We show that cardiac image segmentation can benefit from SR-CNN, as segmentations generated from super-resolved images are similar to manual segmentations on HR images in terms of volume measures and surface distances. Lastly, we show that cardiac motion tracking results can be improved using SR-CNN, as it visualizes the basal and apical parts of the myocardium more clearly than conventional interpolation methods (see Fig. 1).

2 Methodology

The SR image generation is formulated as an inverse problem that recovers the high-dimensional data through the MR image acquisition model [7], which has been the starting point of the approaches in [3, 11, 15]. The model links the HR volume \(\varvec{y} \in \mathbb {R}^M\) to the low-dimensional observation \(\varvec{x} \in \mathbb {R}^N\) (\(N \ll M\)) through a series of operators as \(\varvec{x} = \varvec{D} \varvec{B} \varvec{S} \varvec{M} \varvec{y} + \varvec{\eta }\), where \(\varvec{M}\) defines the spatial displacements caused by respiratory and cardiac motion, \(\varvec{S}\) is the slice selection operator, \(\varvec{B}\) is a point-spread function (PSF) used to blur the selected slice, \(\varvec{D}\) is a decimation operator, and \(\varvec{\eta }\) is Rician noise. The solution to this inverse problem estimates the conditional distribution \(p(\varvec{y} | \varvec{x})\) by minimizing a cost function \(\varPsi \) defined between \(\varvec{y}\) and its estimate \(\varPhi (\varvec{x}, \varvec{\varTheta })\) obtained from the LR input data. The estimate is produced by a CNN parameterized by \(\varvec{\varTheta }\) that models \(p(\varvec{y} | \varvec{x})\) via a collection of hidden variables. For the smooth \(\ell _{1}\) norm, the training objective is \(\min \limits _{\varvec{\varTheta }} \, \sum _{i} \, \varPsi _{\ell _1} \left( \varPhi (\varvec{x}_i, \varvec{\varTheta }) - \varvec{y}_i \right) \), where \(\varPsi _{\ell _1} (r) = 0.5\, r^2\) if \(|r|<1\) and \(\varPsi _{\ell _1} (r) = |r|-0.5\) otherwise (applied element-wise to the residual), and \((\varvec{x}_i,\varvec{y}_i)\) denote the training samples. The next section describes the proposed CNN model.
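The training objective can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: it assumes each prediction \(\varPhi (\varvec{x}_i, \varvec{\varTheta })\) and target \(\varvec{y}_i\) is flattened into a list of voxel intensities, and the helper names `smooth_l1` and `sr_loss` are hypothetical.

```python
from typing import List, Sequence


def smooth_l1(r: float) -> float:
    """Smooth l1 penalty: quadratic (0.5 * r^2) for |r| < 1, linear (|r| - 0.5) otherwise."""
    a = abs(r)
    return 0.5 * r * r if a < 1.0 else a - 0.5


def sr_loss(predictions: Sequence[List[float]], targets: Sequence[List[float]]) -> float:
    """Sum of element-wise smooth-l1 penalties over all training pairs (x_i, y_i).

    predictions[i] stands in for Phi(x_i, Theta) and targets[i] for y_i,
    each flattened to a list of voxel values (hypothetical data layout).
    """
    if len(predictions) != len(targets):
        raise ValueError("need one target per prediction")
    total = 0.0
    for pred, target in zip(predictions, targets):
        if len(pred) != len(target):
            raise ValueError("prediction and target must have the same size")
        total += sum(smooth_l1(p - t) for p, t in zip(pred, target))
    return total


# Residuals of 0.5 and 2.0 fall in the quadratic and linear regimes respectively.
print(sr_loss([[0.5, 2.0]], [[0.0, 0.0]]))  # 0.125 + 1.5 = 1.625
```

The two branches meet at \(|r| = 1\) (both give 0.5), so the penalty is continuous there.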

Single Image Network: The proposed model, shown in Fig. 2, is formed by stacking a series of convolutional (Conv) layers and rectified linear units (ReLU) [8] to estimate the non-linear mapping \(\varPhi \), as proposed in [6] to upscale natural images. The intermediate feature maps \(h_{j}^{(n)}\) at layer \(n\) are computed through Conv kernels (hidden units) \(w_{kj}^{(n)}\) as \(h_{j}^{(n)} = \max \left( 0,\, \sum _{k=1}^K h_{k}^{(n-1)} *w_{kj}^{(n)} \right) \), where \(*\) is the convolution operator. As suggested by [16], to obtain better non-linear estimations, the proposed architecture uses small Conv kernels (3 \(\times \) 3 \(\times \) 3) and a large number of Conv+ReLU layers; such an approach allows a deeper network to be trained. Unlike the models proposed in [5, 6], we include the initial upscaling operation within the model as a deconvolution (Deconv) layer, \(h_j^{(0)} = (\varvec{x} \uparrow U) *w_j\), where \(\uparrow \) is a zero-padding upscaling operator and \(U = M/N\) is the upscaling factor. In this way, the upsampling filters can be optimized for SR by training the network end-to-end, which improves the image signal quality in image regions closer to the boundaries. Instead of learning to synthesize an HR image directly, the CNN model is trained to predict the residuals between the LR input data and the HR ground truth. These residuals are then added to the linearly upscaled input image (the output of the Deconv layer) to reconstruct the output HR image. As a result, a simpler regression function \(\varPhi \) is learned, in which mostly high-frequency signal components, such as edges and texture, are predicted (see Fig. 2). At training time, the reconstructed HR images are evaluated with the \(\varPsi _{\ell _1} (\cdot )\) function, and the model weights are updated by back-propagating the error it defines.
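To make the Deconv and residual steps concrete, the sketch below reduces them to a single 1D intensity profile along the slice direction, with one feature map per layer. It is a hedged illustration under several assumptions not stated in the text: the Conv+ReLU stack operates on the Deconv output, the kernels are fixed example values rather than learned weights, and the final Conv layer is linear so that the predicted residual can be negative. All function names are hypothetical. With the triangular kernel [0.5, 1, 0.5] and U = 2, zero-insertion followed by convolution reproduces linear interpolation, i.e. a Deconv layer can represent linear upscaling as a special case.

```python
from typing import List, Sequence


def zero_insert(x: Sequence[float], u: int) -> List[float]:
    """Zero-padding upscaling (x up U): follow each sample by U - 1 zeros."""
    out: List[float] = []
    for v in x:
        out.append(float(v))
        out.extend([0.0] * (u - 1))
    return out


def conv1d_same(x: Sequence[float], w: Sequence[float]) -> List[float]:
    """'Same'-size 1D filtering with zero padding and an odd-length kernel.

    Like most CNN libraries this computes a cross-correlation; for the
    symmetric kernels used below it equals a true convolution.
    """
    half = len(w) // 2
    out: List[float] = []
    for i in range(len(x)):
        acc = 0.0
        for k, wk in enumerate(w):
            j = i + k - half
            if 0 <= j < len(x):
                acc += x[j] * wk
        out.append(acc)
    return out


def conv_relu(x: Sequence[float], w: Sequence[float]) -> List[float]:
    """One Conv+ReLU layer: h = max(0, h_prev * w)."""
    return [max(0.0, v) for v in conv1d_same(x, w)]


def residual_sr(x: Sequence[float], u: int, deconv_kernel: Sequence[float],
                layer_kernels: Sequence[Sequence[float]]) -> List[float]:
    """Deconv upscaling, Conv+ReLU stack, linear last layer, residual addition."""
    base = conv1d_same(zero_insert(x, u), deconv_kernel)  # h^(0): Deconv output
    h = base
    for w in layer_kernels[:-1]:
        h = conv_relu(h, w)
    residual = conv1d_same(h, layer_kernels[-1])  # predicted high-frequency detail
    return [b + r for b, r in zip(base, residual)]


linear_kernel = [0.5, 1.0, 0.5]
print(conv1d_same(zero_insert([0.0, 2.0, 4.0], 2), linear_kernel))
# [0.0, 1.0, 2.0, 3.0, 4.0, 2.0]: linear interpolation, except the last
# sample, which is pulled down by the zero padding at the boundary
```

The boundary artefact in the last sample is the kind of effect a learned, rather than fixed, upscaling kernel can in principle compensate for.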
In [19], the \(\ell _1\) norm was shown to be a better metric than the \(\ell _2\) norm for image restoration and SR problems. This is attributed to the fact that, under the \(\ell _1\) norm, the weight updates are not dominated by large prediction errors.
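The argument can be illustrated by comparing per-voxel gradients. The \(\ell _2\) penalty \(0.5\,r^2\) has gradient \(r\), which grows with the error, whereas the smooth \(\ell _1\) gradient is clipped to [-1, 1]. The sketch below (hypothetical helper names) measures what fraction of the total gradient magnitude comes from the single largest residual.

```python
import math
from typing import Callable, Sequence


def grad_l2(r: float) -> float:
    """Derivative of 0.5 * r**2."""
    return r


def grad_smooth_l1(r: float) -> float:
    """Derivative of the smooth l1 penalty: r inside |r| < 1, sign(r) outside."""
    return r if abs(r) < 1.0 else math.copysign(1.0, r)


def largest_error_share(residuals: Sequence[float], grad: Callable[[float], float]) -> float:
    """Fraction of the summed gradient magnitude contributed by the largest residual."""
    mags = [abs(grad(r)) for r in residuals]
    return max(mags) / sum(mags)


residuals = [0.5, 0.5, 10.0]  # two small errors and one outlier
print(largest_error_share(residuals, grad_l2))         # 10/11, roughly 0.91
print(largest_error_share(residuals, grad_smooth_l1))  # 0.5
```

Under \(\ell _2\) the outlier supplies over 90% of the update signal in this toy case; under smooth \(\ell _1\) it supplies half.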

Multi-image Network: The single-image model is extended to multi-input SR by creating multiple input channels (MC) from the given images, which are resampled to the same spatial grid and depict the same anatomy. In this way, SR performance is enhanced by merging multiple image stacks acquired from different imaging planes, e.g. long-axis (LAX) and short-axis (SAX) stacks, into a single SR volume. However, when only a few slices are acquired, a mask or distance map is required as an additional input to the network to identify the missing information. Moreover, the number of parameters has to be increased so that the model can learn to extract features in image regions where the masks are defined, which increases the training time accordingly. For this reason, a Siamese network [4] is proposed as a third model (see Fig. 3) for comparison; such networks have been used for similar problems, such as shape recognition from multiple images [17]. The first stage of the network resamples the input images onto a fixed HR spatial grid. In the second stage, the same types of image features are extracted from each channel using shared filter weights. In the final stage, the features are pooled and passed to another Conv network to reconstruct the output HR image. The view pooling layer averages the corresponding features from all channels over the areas where the images overlap. The proposed models are initially pre-trained with a small number of layers to better initialize the training of the final, deeper network, which improves network performance [5].
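The view pooling step can be sketched as a masked average across channels. In this minimal illustration (hypothetical names, 1D lists standing in for feature maps), every channel passes through the same feature extractor, mirroring the shared Siamese weights, and each voxel of the pooled map averages only the channels whose resampled image covers that voxel. Assigning 0.0 to voxels covered by no channel is an assumption of this sketch, not something the text specifies.

```python
from typing import Callable, List, Sequence

FeatureMap = List[float]


def shared_features(channels: Sequence[Sequence[float]],
                    extractor: Callable[[Sequence[float]], FeatureMap]) -> List[FeatureMap]:
    """Apply one extractor (i.e. one set of shared weights) to every input channel."""
    return [extractor(ch) for ch in channels]


def view_pool(features: Sequence[Sequence[float]], masks: Sequence[Sequence[int]]) -> FeatureMap:
    """Average features over the channels whose mask is set at each voxel."""
    n_vox = len(features[0])
    pooled: FeatureMap = []
    for v in range(n_vox):
        vals = [f[v] for f, m in zip(features, masks) if m[v]]
        pooled.append(sum(vals) / len(vals) if vals else 0.0)  # uncovered voxel -> 0.0
    return pooled


def double(ch: Sequence[float]) -> FeatureMap:
    """Toy stand-in for the shared Conv feature-extraction layers."""
    return [2.0 * v for v in ch]


# Channel 0 ~ SAX stack, channel 1 ~ resampled LAX slice (toy values).
feats = shared_features([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]], double)
print(view_pool(feats, [[1, 1, 0], [1, 0, 1]]))  # [4.0, 4.0, 10.0]
```

Only the first voxel, where both views overlap, mixes information from the two channels; elsewhere the single covering channel passes through unchanged.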

Table 1. Quantitative comparison of different image upsampling methods.


3 Results

The models are evaluated on end-diastolic frames of cine cardiac MR images acquired from 1233 healthy adult subjects. The images are upscaled in the direction orthogonal to the SAX plane. The proposed method is compared against linear, cubic spline, and multi-atlas patchmatch (MAPM) [15] upscaling methods in four experiments: image quality assessment for (a) single-input and (b) multi-input cases, (c) left-ventricle (LV) segmentation, and (d) LV motion tracking.

Experimental details: In the first experiment, an image dataset containing 1080 3D SAX cardiac volumes with voxel size 1.25 \(\times \) 1.25 \(\times \) 2.00 mm is randomly split into two subsets used for single-image model training (930) and testing (150). The images are intensity-normalized and cropped around the heart. Synthetic LR images are generated using the acquisition model given in Sect. 2 and resampled to a fixed resolution of 1.25 \(\times \) 1.25 \(\times \) 10.00 mm. The PSF is set to a Gaussian kernel with a full width at half maximum (FWHM) equal to the slice thickness [7]. Multiple acquisitions could also be used to form the LR/HR pairs, but spatial misalignments between them would introduce an unbalanced bias near sharp edges. For the evaluation of the multi-input models, a separate clinical dataset of 153 pairs of LAX cardiac image slices and SAX image stacks is used, of which 10 pairs are held out for evaluation. Spatial misalignment between SAX and LAX images is corrected using image registration [9]. For the single-image and multi-channel models, seven consecutive Conv layers are used after the upscaling layer. In the Siamese model, the channels are merged after the fourth Conv layer.
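The synthetic degradation can be sketched along the slice direction only. This minimal version is an assumption-laden illustration rather than the authors' pipeline: it takes a 1D through-plane intensity profile sampled at the 2.00 mm HR spacing, omits the motion (\(\varvec{M}\)) and slice-selection (\(\varvec{S}\)) operators, replicates edge values when blurring, and models Rician noise as the magnitude of a complex signal whose real and imaginary parts receive independent Gaussian noise. The function names and the `noise_sigma` parameter are hypothetical. The Gaussian PSF uses the standard conversion \(\sigma = \mathrm{FWHM} / (2\sqrt{2\ln 2})\) with the FWHM equal to the 10.00 mm slice thickness.

```python
import math
import random
from typing import List, Sequence


def gaussian_psf(fwhm_mm: float, spacing_mm: float) -> List[float]:
    """Normalized 1D Gaussian kernel (B) whose FWHM equals the slice thickness."""
    sigma = fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0))) / spacing_mm  # in samples
    radius = max(1, math.ceil(3.0 * sigma))
    weights = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def simulate_lr_profile(hr: Sequence[float], hr_spacing_mm: float = 2.0,
                        lr_thickness_mm: float = 10.0, noise_sigma: float = 0.0,
                        seed: int = 0) -> List[float]:
    """Blur (B), decimate (D) and add Rician noise (eta) to an HR profile."""
    kernel = gaussian_psf(lr_thickness_mm, hr_spacing_mm)
    radius = len(kernel) // 2
    n = len(hr)
    blurred = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # replicate edge values
            acc += hr[j] * w
        blurred.append(acc)
    factor = round(lr_thickness_mm / hr_spacing_mm)  # 10 mm / 2 mm: keep every 5th sample
    lr = blurred[::factor]
    rng = random.Random(seed)
    # Rician noise: magnitude of (signal + n_re) + i * n_im with Gaussian n_re, n_im.
    return [math.hypot(v + rng.gauss(0.0, noise_sigma), rng.gauss(0.0, noise_sigma))
            for v in lr]


hr_profile = [0.0] * 10 + [1.0] * 10  # a sharp edge sampled at 2 mm
print([round(v, 3) for v in simulate_lr_profile(hr_profile)])
```

A 20-sample HR profile becomes 4 LR samples, and the sharp edge is spread over neighbouring LR samples by the PSF.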

Image Quality Assessment: The upscaled images are compared with the ground-truth HR 3D volumes in terms of PSNR and SSIM [18]. The latter assesses the correlation of local structures and is less sensitive to image noise. The results in Table 1 show that learning the initial upscaling kernels (de-CNN) improves (\(p=.007\)) the quality of the generated HR images compared to a convolution-only network (CNN) with the same number of trainable parameters. Additionally, the performance of the 7-layer network is compared against the 4-layer shallow network from [6] (sh-CNN). Adding further Conv layers to the 7-layer model was found to be ineffective, as it increased training time while yielding negligible performance improvement. Fig. 4 shows that CNN-based methods can learn better HR synthesis models even after a small number of training epochs. The same figure shows that the model without residual learning (nrCNN) underperforms and requires a large number of training iterations.
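For reference, PSNR between a ground-truth HR volume and an upscaled estimate can be computed as below. The text does not state which peak value is used; this sketch assumes the intensity range of the reference image, and the function name is hypothetical. SSIM [18] relies on local windowed statistics and is not reproduced here.

```python
import math
from typing import Optional, Sequence


def psnr(reference: Sequence[float], estimate: Sequence[float],
         data_range: Optional[float] = None) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if data_range is None:
        data_range = max(reference) - min(reference)  # assumed peak value
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(data_range ** 2 / mse)


print(round(psnr([0.0, 1.0, 0.0, 1.0], [0.0, 1.0, 0.0, 0.9]), 2))  # 26.02
```

Because PSNR depends on the chosen peak value, results are only comparable across methods when the same normalization is used for all of them.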

Multi-input Model: In the second experiment, we show that the single-image SR model can be enhanced by providing additional information from two- and four-chamber (2/4CH) LAX images. The results in Table 2 show that including LAX information in the model yields a modest improvement in image quality. The improvement is mostly observed in image regions close to where the SAX and LAX slices overlap, as can be seen in Fig. 5 (a–d). The results also show that the multi-channel (MC) model performs slightly better than the Siamese model, as it has more degrees of freedom, whereas the latter is more practical as it trains faster and requires fewer trainable parameters.

Table 2. Image quality results obtained with three different models: single-image de-CNN, Siamese, and multi-channel (MC) that uses multiple input images.


Table 3. Segmentation results for different upsampling methods, CSpline (\(p=.007\)) and MAPM (\(p=.009\)). They are compared in terms of mean and Hausdorff distances (MYO) and LV cavity volume differences (w.r.t. manual annotations).


Segmentation Evaluation: As a subsequent image analysis step, 18 SAX SR images are segmented using a state-of-the-art multi-atlas method [2]. The SR images generated from clinical 2D stack data with different upscaling methods are automatically segmented, and the segmentations are compared with manual annotations performed on the ground-truth HR 3D images. Additionally, the HR images are segmented with the same method to establish a lower error bound. Segmentation quality is evaluated based on the LV cavity volume and surface-to-surface distances for the myocardium (MYO). The results in Table 3 show that CNN-upscaled images can produce segmentation results similar to those obtained from HR images. The main differences between the SR methods are observed in image areas with thin and detailed boundaries (e.g. the apex). As can be seen in Fig. 5 (e–h), MAPM over-smooths areas closer to image boundaries. Inference with the proposed model is much less computationally demanding than brute-force search (MAPM), which requires hours for a single image, whereas SR-CNN runs in 6.8 s on a GPU or 5.8 min on a CPU per image on average. The shorter runtime makes SR-CNN more applicable to subsequent analysis, as it can replace standard interpolation methods.
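The evaluation measures can be sketched for segmentations represented as surface point sets and voxel counts. This illustration assumes the mean surface distance is the symmetric average of closest-point distances (definitions vary across papers), uses a brute-force nearest-point search that is only practical for small point sets, and defaults to the 1.25 \(\times \) 1.25 \(\times \) 2.00 mm HR voxel size from the experimental details; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def closest_distances(a: Sequence[Point], b: Sequence[Point]) -> List[float]:
    """For each point in a, the distance to its nearest point in b (brute force)."""
    return [min(math.dist(p, q) for q in b) for p in a]


def surface_distances(a: Sequence[Point], b: Sequence[Point]) -> Tuple[float, float]:
    """Symmetric mean surface distance and Hausdorff distance between two surfaces."""
    d_ab = closest_distances(a, b)
    d_ba = closest_distances(b, a)
    mean = (sum(d_ab) + sum(d_ba)) / (len(d_ab) + len(d_ba))
    hausdorff = max(max(d_ab), max(d_ba))
    return mean, hausdorff


def cavity_volume_ml(n_voxels: int, spacing_mm: Tuple[float, float, float] = (1.25, 1.25, 2.0)) -> float:
    """LV cavity volume in millilitres from a voxel count (1 ml = 1000 mm^3)."""
    return n_voxels * math.prod(spacing_mm) / 1000.0


auto = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
manual = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 0.0, 3.0)]
print(surface_distances(auto, manual))  # (0.6, 3.0)
```

The single missed point barely moves the mean distance but sets the Hausdorff distance, which is why the two measures are reported together.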

Motion Tracking: The clinical applications of SR extend to MYO tracking, which can benefit from SR as a preprocessing step that better highlights the ventricle boundaries. End-diastolic MYO segmentations are propagated to the end-systolic (ES) phase using B-spline FFD registration [13]. ES meshes generated with the CNN and linear upscaling methods are compared, based on Hausdorff distance, with tracking results obtained from 10 3D SAX HR images. The proposed SR method produces more accurate tracking results (\(4.73{\pm }1.03\) mm, \(p=0.01\)) than linear interpolation (\(5.50{\pm }1.08\) mm). We observe that images upscaled with the CNN model follow the apical boundaries more accurately, as shown in the supplementary material: www.doc.ic.ac.uk/~oo2113/M16

4 Discussion and Conclusion

The results show that the proposed SR approach outperforms conventional upscaling methods both in terms of image quality metrics and subsequent image analysis accuracy. It is also computationally efficient and can be applied to image analysis tasks such as segmentation and tracking. The experiments show that these applications can benefit from SR images, since 2D stack image analysis with SR-CNN can achieve quantitative results similar to analysis on isotropic volumes without requiring long acquisition times. We also show that the proposed model can easily be extended to multiple-input scenarios to obtain better SR results. SR-CNN's applicability is not limited to cardiac images; it can be applied to other anatomical structures as well. In the proposed approach, inter-slice and inter-stack spatial misalignments due to motion are handled using a registration method. However, we observe that large slice misplacements can degrade SR accuracy. Future research will focus on this aspect of the problem.

References

[1] Alexander, D.C., Zikic, D., Zhang, J., Zhang, H., Criminisi, A.: Image quality transfer via random forest regression: applications in diffusion MRI. In: Golland, P., Hata, N., Barillot, C., Hornegger, J., Howe, R. (eds.) MICCAI 2014, Part III. LNCS, vol. 8675, pp. 225–232. Springer, Heidelberg (2014)
[2] Bai, W., Shi, W., O’Regan, D.P., Tong, T., Wang, H., Jamil-Copley, S., Peters, N.S., Rueckert, D.: A probabilistic patch-based label fusion model for multi-atlas segmentation with registration refinement: application to cardiac MR images. IEEE TMI 32(7), 1302–1315 (2013)
[3] Bhatia, K.K., Price, A.N., Shi, W., Rueckert, D.: Super-resolution reconstruction of cardiac MRI using coupled dictionary learning. In: IEEE ISBI, pp. 947–950 (2014)
[4] Bromley, J., Guyon, I., LeCun, Y., Säckinger, E., Shah, R.: Signature verification using a “Siamese” time delay neural network. In: NIPS, pp. 737–744 (1994)
[5] Dong, C., Deng, Y., Loy, C.C., Tang, X.: Compression artifacts reduction by a deep convolutional network. In: IEEE CVPR, pp. 576–584 (2015)
[6] Dong, C., Loy, C.C., He, K., Tang, X.: Image super-resolution using deep convolutional networks. IEEE PAMI 38(2), 295–307 (2016)
[7] Greenspan, H.: Super-resolution in medical imaging. Comput. J. 52(1), 43–63 (2009)
[8] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
[9] Lötjönen, J., Pollari, M., Kivistö, S., Lauerma, K.: Correction of movement artifacts from 4-D cardiac short- and long-axis MR data. In: Barillot, C., Haynor, D.R., Hellier, P. (eds.) MICCAI 2004. LNCS, vol. 3217, pp. 405–412. Springer, Heidelberg (2004)
[10] Manjón, J.V., Coupé, P., Buades, A., Fonov, V., Collins, D.L., Robles, M.: Non-local MRI upsampling. MedIA 14(6), 784–792 (2010)
[11] Odille, F., Bustin, A., Chen, B., Vuissoz, P.-A., Felblinger, J.: Motion-corrected, super-resolution reconstruction for high-resolution 3D cardiac cine MRI. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 435–442. Springer, Heidelberg (2015). doi:10.1007/978-3-319-24574-4_52
[12] Plenge, E., Poot, D.H.J., Niessen, W.J., Meijering, E.: Super-resolution reconstruction using cross-scale self-similarity in multi-slice MRI. In: Mori, K., Sakuma, I., Sato, Y., Barillot, C., Navab, N. (eds.) MICCAI 2013, Part III. LNCS, vol. 8151, pp. 123–130. Springer, Heidelberg (2013)
[13] Rueckert, D., Sonoda, L.I., Hayes, C., Hill, D.L., Leach, M.O., Hawkes, D.J.: Nonrigid registration using free-form deformations: application to breast MR images. IEEE TMI 18(8), 712–721 (1999)
[14] Rueda, A., Malpica, N., Romero, E.: Single-image super-resolution of brain MR images using overcomplete dictionaries. MedIA 17(1), 113–132 (2013)
[15] Shi, W., Caballero, J., Ledig, C., Zhuang, X., Bai, W., Bhatia, K., de Marvao, A.M.S.M., Dawes, T., O’Regan, D., Rueckert, D.: Cardiac image super-resolution with global correspondence using multi-atlas patchmatch. In: Mori, K., Sakuma, I., Sato, Y., Barillot, C., Navab, N. (eds.) MICCAI 2013, Part III. LNCS, vol. 8151, pp. 9–16. Springer, Heidelberg (2013)
[16] Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
[17] Su, H., Maji, S., Kalogerakis, E., Learned-Miller, E.: Multi-view convolutional neural networks for 3D shape recognition. In: IEEE CVPR, pp. 945–953 (2015)
[18] Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE TIP 13(4), 600–612 (2004)
[19] Zhao, H., Gallo, O., Frosio, I., Kautz, J.: Is L2 a good loss function for neural networks for image processing? arXiv preprint arXiv:1511.08861 (2015)


Author information

Authors and Affiliations

Biomedical Image Analysis Group, Imperial College London, London, UK
Ozan Oktay, Wenjia Bai, Matthew Lee, Ricardo Guerrero, Konstantinos Kamnitsas & Daniel Rueckert
Institute of Clinical Science, Imperial College London, London, UK
Antonio de Marvao, Stuart Cook & Declan O’Regan
Magic Pony Technology, London, UK
Jose Caballero


Corresponding author

Correspondence to Ozan Oktay.

Editor information

Editors and Affiliations

University College London, London, UK
Sebastien Ourselin
The Hebrew University of Jerusalem, Jerusalem, Israel
Leo Joskowicz
Harvard Medical School, Boston, MA, USA
Mert R. Sabuncu
Istanbul Technical University, Istanbul, Türkiye
Gozde Unal
Harvard Medical School, Boston, MA, USA
William Wells


About this paper

Cite this paper

Oktay, O. et al. (2016). Multi-input Cardiac Image Super-Resolution Using Convolutional Neural Networks. In: Ourselin, S., Joskowicz, L., Sabuncu, M.R., Unal, G., Wells, W. (eds) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2016. MICCAI 2016. Lecture Notes in Computer Science, vol 9902. Springer, Cham. https://doi.org/10.1007/978-3-319-46726-9_29


DOI: https://doi.org/10.1007/978-3-319-46726-9_29
Published: 02 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46725-2
Online ISBN: 978-3-319-46726-9
eBook Packages: Computer Science, Computer Science (R0)
