05.05.2021 | Original Article
Against spatial–temporal discrepancy: contrastive learning-based network for surgical workflow recognition
Authors: Tong Xia, Fucang Jia
Published in: International Journal of Computer Assisted Radiology and Surgery, Issue 5/2021
Abstract
Purpose
Automatic workflow recognition from surgical videos is fundamental to developing context-aware systems for modern operating rooms. Although many approaches have been proposed for this complex task, challenges remain, such as the fine-grained characteristics of surgical activities and the spatial–temporal discrepancies in surgical videos.
Methods
We propose a contrastive learning-based convolutional recurrent network with multi-level prediction to tackle these problems. Specifically, split-attention blocks are employed to extract spatial features. Through a mapping function in the step-phase branch, the current workflow can be predicted on two mutually boosting levels. Furthermore, a contrastive branch is introduced to learn spatial–temporal features that are invariant to irrelevant changes in the environment.
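The two components described above can be sketched as follows. Note that the step-to-phase mapping table, the feature dimensions, and the InfoNCE-style loss form are illustrative assumptions for this sketch, not the paper's exact implementation:

```python
import numpy as np

# Hypothetical many-to-one mapping from fine-grained steps to coarser phases;
# the real taxonomy would follow the dataset's annotation scheme.
STEP_TO_PHASE = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2}

def predict_step_and_phase(step_logits):
    """Predict the current step, then derive the phase via the mapping,
    giving a prediction on two mutually boosting levels."""
    step = int(np.argmax(step_logits))
    phase = STEP_TO_PHASE[step]
    return step, phase

def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss: pull the anchor embedding toward its
    positive (e.g. a temporally adjacent frame) and away from negatives,
    so the learned features ignore irrelevant environmental changes."""
    def cos(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    logits = np.array([cos(anchor, positive)] +
                      [cos(anchor, n) for n in negatives]) / temperature
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                     # positive sits at index 0
```

In this sketch the phase prediction costs nothing extra at inference time: it is derived deterministically from the step prediction, which is one plausible reading of "multi-level prediction with only surgical step labels".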
Results
We evaluate our method on the Cataract-101 dataset. The results show that our method achieves an accuracy of 96.37% with only surgical step labels, which outperforms other state-of-the-art approaches.
Conclusion
The proposed convolutional recurrent network based on step-phase prediction and contrastive learning can leverage fine-grained characteristics and alleviate spatial–temporal discrepancies to improve the performance of surgical workflow recognition.