Published in: International Journal of Computer Assisted Radiology and Surgery | Issue 11/2019

29.04.2019 | Original Article

Segmenting and classifying activities in robot-assisted surgery with recurrent neural networks

Authors: Robert DiPietro, Narges Ahmidi, Anand Malpani, Madeleine Waldram, Gyusung I. Lee, Mija R. Lee, S. Swaroop Vedula, Gregory D. Hager



Abstract

Purpose

Automatically segmenting and classifying surgical activities is an important prerequisite to providing automated, targeted assessment and feedback during surgical training. Prior work has focused almost exclusively on recognizing gestures (short, atomic units of activity such as pushing a needle through tissue), whereas we also focus on recognizing higher-level maneuvers, such as a suture throw. Maneuvers exhibit more complexity and variability than the gestures from which they are composed; however, working at this granularity has the benefit of being consistent with existing training curricula.

Methods

Prior work has focused on hidden Markov model and conditional-random-field-based methods, which typically leverage unary terms that are local in time and linear in model parameters. Because maneuvers are governed by long-term, nonlinear dynamics, we argue that the more expressive unary terms offered by recurrent neural networks (RNNs) are better suited for this task. Four RNN architectures are compared for recognizing activities from kinematics: simple RNNs, long short-term memory, gated recurrent units, and mixed history RNNs. We report performance in terms of error rate and edit distance, and we use a functional analysis-of-variance framework to assess hyperparameter sensitivity for each architecture.
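To make the idea of a non-local unary term concrete, here is a minimal sketch of a simple (Elman-style) RNN that labels every frame of a kinematic sequence. It is an illustration only, not the authors' implementation: the function names, the one-dimensional input, and the hand-picked weights are all assumptions, and the architectures compared in the paper (LSTM, GRU, mixed history RNNs) are more elaborate and are trained rather than hand-set.

```python
import math


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def simple_rnn_labels(frames, W_x, W_h, b, W_y, c):
    """Label each frame with a simple RNN.

    h_t = tanh(W_x x_t + W_h h_{t-1} + b), scores_t = W_y h_t + c,
    and the label for frame t is the index of the largest score.
    """
    h = [0.0] * len(b)  # initial hidden state
    labels = []
    for x in frames:
        pre = [p + q + r for p, q, r in zip(matvec(W_x, x), matvec(W_h, h), b)]
        h = [math.tanh(v) for v in pre]
        scores = [s + t for s, t in zip(matvec(W_y, h), c)]
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    return labels


# Hypothetical toy setup: 1-D input, 1 hidden unit, 2 activity classes.
W_x, W_h, b = [[1.0]], [[2.0]], [0.0]
W_y, c = [[-1.0], [1.0]], [0.0, 0.0]

# The frames with value 0.0 carry no local evidence about the class.
frames = [[-1.0], [0.0], [0.0], [3.0], [0.0], [0.0]]
labels = simple_rnn_labels(frames, W_x, W_h, b, W_y, c)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The zero-valued frames are identical as inputs, yet they receive different labels depending on what came before them. A unary term that is local in time could not make that distinction, which is the property the argument above relies on.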

Results

We obtain state-of-the-art performance for both maneuver recognition from kinematics (4 maneuvers; error rate of \(8.6 \pm 3.4\%\); normalized edit distance of \(9.3 \pm 4.3\%\)) and gesture recognition from kinematics (10 gestures; error rate of \(15.2 \pm 6.0\%\); normalized edit distance of \(8.4 \pm 6.3\%\)).
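For readers unfamiliar with the two metrics, the following sketch shows one common way to compute them. It assumes that error rate is the fraction of mislabeled frames and that edit distance is the Levenshtein distance between segment-level label sequences (frame labels with consecutive repeats collapsed). The function names and the toy labels are hypothetical, and the normalization used for the reported figures is not reproduced here.

```python
from itertools import groupby


def frame_error_rate(true_frames, pred_frames):
    """Fraction of time steps whose predicted label differs from the truth."""
    if not true_frames or len(true_frames) != len(pred_frames):
        raise ValueError("sequences must be non-empty and of equal length")
    wrong = sum(t != p for t, p in zip(true_frames, pred_frames))
    return wrong / len(true_frames)


def to_segments(frames):
    """Collapse runs of identical frame labels into one label per segment."""
    return [label for label, _ in groupby(frames)]


def edit_distance(a, b):
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute
        prev = curr
    return prev[-1]


# Hypothetical 10-frame trial with three activities A, B, C.
true_frames = ["A"] * 4 + ["B"] * 4 + ["C"] * 2
pred_frames = ["A"] * 3 + ["B"] * 2 + ["A"] + ["B"] * 2 + ["C"] * 2

print(frame_error_rate(true_frames, pred_frames))        # 0.2
print(to_segments(pred_frames))                          # ['A', 'B', 'A', 'B', 'C']
print(edit_distance(to_segments(true_frames),
                    to_segments(pred_frames)))           # 2
```

The two metrics are complementary: only two frames are wrong here, yet the prediction contains two spurious segments, which the edit distance exposes and the frame-level error rate largely hides.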

Conclusions

Automated maneuver recognition is feasible with RNNs, an exciting result which offers the opportunity to provide targeted assessment and feedback at a higher level of granularity. In addition, we show that multiple hyperparameters are important for achieving good performance, and our hyperparameter analysis serves to aid future work in RNN-based activity recognition.
Footnotes
1
First, the dataset-level normalization provides an immediate connection to the original edit distances, which aids analysis in future work; in contrast, one cannot invert the sequence-level normalization without access to the number of segments in each predicted sequence. Second, the dataset-level normalization continues to penalize predicted sequences with the same weight as more spurious segments are added; in contrast, the sequence-level normalization penalizes predicted sequences less and less as spurious segments are added.
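The second point can be checked numerically with a small sketch. The conventions below are assumptions made for illustration: sequence-level normalization is taken to divide by the longer of the true and predicted segment sequences, and dataset-level normalization is taken to divide by a single constant shared across the dataset (`NORM_CONST`, a hypothetical value). The exact definitions used in the paper are not given in this excerpt.

```python
def edit_distance(a, b):
    """Levenshtein distance between two segment-label sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def sequence_level(true_seg, pred_seg):
    # Assumed convention: divide by the longer of the two segment sequences.
    return edit_distance(true_seg, pred_seg) / max(len(true_seg), len(pred_seg))


def dataset_level(true_seg, pred_seg, norm_const):
    # Assumed convention: divide by one constant shared by the whole dataset.
    return edit_distance(true_seg, pred_seg) / norm_const


true_seg = ["A", "B", "C", "D"]
NORM_CONST = 4  # hypothetical dataset-wide constant

seq_scores, data_scores = [], []
for k in range(4):
    # Append k spurious segments (alternating labels, so each is a new segment).
    pred_seg = true_seg + ["X" if i % 2 == 0 else "Y" for i in range(k)]
    seq_scores.append(sequence_level(true_seg, pred_seg))
    data_scores.append(dataset_level(true_seg, pred_seg, NORM_CONST))

for k, (s, d) in enumerate(zip(seq_scores, data_scores)):
    print(f"{k} spurious: sequence-level {s:.3f}, dataset-level {d:.3f}")
```

Under these assumptions, each additional spurious segment raises the dataset-level score by the same amount (0.25 here), while the sequence-level increments shrink (0.200, then 0.133, then 0.095). Multiplying a dataset-level score by `NORM_CONST` also recovers the raw edit distance directly, which is the first point of the footnote.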
 
Metadata
Title
Segmenting and classifying activities in robot-assisted surgery with recurrent neural networks
Authors
Robert DiPietro
Narges Ahmidi
Anand Malpani
Madeleine Waldram
Gyusung I. Lee
Mija R. Lee
S. Swaroop Vedula
Gregory D. Hager
Publication date
29.04.2019
Publisher
Springer International Publishing
Published in
International Journal of Computer Assisted Radiology and Surgery / Issue 11/2019
Print ISSN: 1861-6410
Electronic ISSN: 1861-6429
DOI
https://doi.org/10.1007/s11548-019-01953-x
