Real-Time Segmentation of Non-rigid Surgical Tools Based on Deep Learning and Tracking

  • Conference paper
  • In: Computer-Assisted and Robotic Endoscopy (CARE 2016)

Abstract

Real-time tool segmentation is an essential component in computer-assisted surgical systems. We propose a novel real-time automatic method based on Fully Convolutional Networks (FCN) and optical flow tracking. Our method exploits the ability of deep neural networks to produce accurate segmentations of highly deformable parts along with the high speed of optical flow. Furthermore, the pre-trained FCN can be fine-tuned on a small number of medical images without the need to hand-craft features. We validated our method using existing and new benchmark datasets, covering both ex vivo and in vivo real clinical cases where different surgical instruments are employed. Two versions of the method are presented: non-real-time and real-time. The former, using only deep learning, achieves a balanced accuracy of 89.6% on a real clinical dataset, outperforming the (non-real-time) state of the art by 3.8 percentage points. The latter, a combination of deep learning with optical flow tracking, yields an average balanced accuracy of 78.2% across all validated datasets.
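
To make the hybrid design described above more concrete, the following Python sketch shows one way to interleave a slow but accurate FCN pass on keyframes with fast Lucas-Kanade optical flow propagation in between, together with the standard balanced-accuracy definition used for evaluation. The function names, the keyframe interval and the mask-propagation step are illustrative assumptions, not the authors' implementation; only the overall FCN-plus-optical-flow idea comes from the paper.

    # Minimal sketch of the FCN + optical flow idea, assuming OpenCV and NumPy.
    # `fcn_segment` is a hypothetical placeholder for the fine-tuned FCN.
    import cv2
    import numpy as np

    def fcn_segment(frame_bgr):
        """Placeholder for the FCN forward pass; returns a binary tool mask."""
        raise NotImplementedError  # e.g. an FCN-8s fine-tuned on surgical frames

    def balanced_accuracy(pred, gt):
        """Balanced accuracy = (sensitivity + specificity) / 2 for binary masks."""
        pred, gt = pred.astype(bool), gt.astype(bool)
        tp = np.logical_and(pred, gt).sum()
        tn = np.logical_and(~pred, ~gt).sum()
        fp = np.logical_and(pred, ~gt).sum()
        fn = np.logical_and(~pred, gt).sum()
        sens = tp / max(tp + fn, 1)
        spec = tn / max(tn + fp, 1)
        return 0.5 * (sens + spec)

    def segment_video(frames, keyframe_interval=10):
        """Run the FCN on keyframes, propagate the mask with LK optical flow."""
        prev_gray, mask, points = None, None, None
        for i, frame in enumerate(frames):
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            if i % keyframe_interval == 0 or points is None or len(points) == 0:
                # Slow, accurate path: FCN segmentation + feature re-seeding.
                mask = fcn_segment(frame).astype(np.uint8)
                points = cv2.goodFeaturesToTrack(gray, maxCorners=200,
                                                 qualityLevel=0.01,
                                                 minDistance=5, mask=mask)
            else:
                # Fast path: track tool features with pyramidal Lucas-Kanade.
                new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray,
                                                              points, None)
                good_new = new_pts[status.flatten() == 1]
                good_old = points[status.flatten() == 1]
                if len(good_new) > 0:
                    # Shift the previous mask by the median feature motion
                    # (a crude stand-in for the paper's propagation step).
                    shift = np.median(good_new - good_old, axis=0).flatten()
                    M = np.float32([[1, 0, shift[0]], [0, 1, shift[1]]])
                    mask = cv2.warpAffine(mask, M, (mask.shape[1], mask.shape[0]))
                points = good_new.reshape(-1, 1, 2)
            prev_gray = gray
            yield mask

In this sketch the keyframe interval trades accuracy for speed: more frequent FCN passes track deformation better, while longer optical-flow stretches keep the pipeline real-time.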

Acknowledgements

This work was supported by the Wellcome Trust [WT101957], EPSRC (NS/A000027/1, EP/H046410/1, EP/J020990/1, EP/K005278), the NIHR BRC UCLH/UCL High Impact Initiative and a UCL EPSRC CDT Scholarship Award (EP/L016478/1). The authors would like to thank NVIDIA for the donated GeForce GTX TITAN X GPU, their colleagues E. Maneas, S. Moriconi, F. Chadebecq, M. Ebner and S. Nousias for the FetalFlexTool ground truth, and E. Maneas for preparing the setup with an ex vivo placenta.

Author information

Corresponding author

Correspondence to Luis C. García-Peraza-Herrera.

Electronic supplementary material

Below is the link to the electronic supplementary material.

440896_1_En_8_MOESM1_ESM.mp4

Supplementary material 1 (mp4 1369 KB)

440896_1_En_8_MOESM2_ESM.mp4

Supplementary material 2 (mp4 1221 KB)

440896_1_En_8_MOESM3_ESM.mp4

Supplementary material 3 (mp4 1118 KB)

Supplementary material 4 (mp4 3555 KB)

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

García-Peraza-Herrera, L.C. et al. (2017). Real-Time Segmentation of Non-rigid Surgical Tools Based on Deep Learning and Tracking. In: Peters, T., et al. Computer-Assisted and Robotic Endoscopy. CARE 2016. Lecture Notes in Computer Science, vol 10170. Springer, Cham. https://doi.org/10.1007/978-3-319-54057-3_8

  • DOI: https://doi.org/10.1007/978-3-319-54057-3_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-54056-6

  • Online ISBN: 978-3-319-54057-3

  • eBook Packages: Computer Science, Computer Science (R0)
