Deep learning for video-based assessment of surgical skills

Yanik, Erim
Issue Date
Electronic thesis
Mechanical engineering
Surgical skill assessment is crucial for training and certification. The current gold standard is real-time proctoring in the operating room (OR), a model introduced by Halsted over a century ago. To assess skills, proctors use objective rating scales such as the Objective Structured Assessment of Surgical Skills (OSATS) or the time and error metrics of the Fundamentals of Laparoscopic Surgery (FLS) program. These assessment techniques have several drawbacks: they are subjective, with poor inter-rater reliability, and they are susceptible to distribution errors, recall bias, and halo effects. Video-based assessment (VBA) can ensure patient safety by allowing surgeons to provide post-hoc feedback. Still, VBA is not real-time, post-hoc evaluation may lead to burnout, and the evaluation remains prone to subjective interpretation. Deep learning (DL) can circumvent these limitations. However, current models operate on video snippets and ignore long-term representations, which prevents them from providing meaningful feedback. In addition, models predominantly rely on motor actions, e.g., hand motion, so the assessment is not holistic. Finally, they are data-hungry and restricted to their training domain, so they cannot adapt to new tasks without retraining on large datasets.
This thesis addresses these limitations to make real-life implementation of VBA feasible. Firstly, we propose a DL model, the Video-Based Assessment Network (VBA-Net), that can automatically and objectively assess surgical skills in real time from complete videos while generating statistically verifiable feedback via Class Activation Maps (CAMs). We also benchmark VBA-Net and achieve state-of-the-art performance on the commonly used public dataset JIGSAWS. Secondly, we fuse neural activations from the prefrontal cortex, which have been shown to differentiate skill levels, with motor action data for holistic skill assessment. We also compare motor actions with neural activations.
For comparison, we use a t-test on the distributions of performance metrics, e.g., accuracy and R², after 100 repetitions. VBA-Net performs slightly better with neural activations than with motor actions (p < .05), and multimodality leads to the best performance (p < .05). This shows the saliency of neural activations and the advantage of holistic inputs in skill assessment. Lastly, we develop an Adaptive VBA-Net (A-VBANet) to deliver domain-agnostic skill classification via one-shot learning without retraining. A-VBANet successfully adapts to independent physical simulators with a single video-based sample. Further, using the simulator data, A-VBANet successfully adapts to an OR task, laparoscopic cholecystectomy, with only one sample.
The technology developed in this thesis has the potential to significantly impact patient care by providing automated, video-based assessment tools for surgeons. Large-scale adoption will require the development of surgical video repositories. However, the development of meta-learning approaches will mitigate the need for intensive annotation and enable the assessment of intra-operative videos based on simulator data, which is abundant and easier to obtain.
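The statistical comparison mentioned in the abstract (a t-test on metric distributions collected over 100 repetitions) can be illustrated with a minimal sketch. The abstract does not say which t-test variant was used, so this example assumes Welch's two-sample t-test and, because the sample size is large, approximates the two-sided p-value with a normal distribution. The function name `welch_t_test` and the accuracy values are hypothetical placeholders generated for illustration; they are not results from the thesis.

```python
import math
import random
import statistics


def welch_t_test(a, b):
    """Welch's two-sample t-test (an assumed variant, not confirmed by the source).

    Returns (t statistic, Welch-Satterthwaite degrees of freedom,
    two-sided p-value from a normal approximation, reasonable for large samples).
    """
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    # Squared standard errors of each sample mean (sample variance / n).
    se2_a = statistics.variance(a) / len(a)
    se2_b = statistics.variance(b) / len(b)
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    # Normal approximation to the t distribution for the two-sided p-value.
    p = 2.0 * (1.0 - statistics.NormalDist().cdf(abs(t)))
    return t, df, p


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical accuracies from 100 repeated training runs per input type.
    acc_motor = [rng.gauss(0.90, 0.02) for _ in range(100)]
    acc_neural = [rng.gauss(0.91, 0.02) for _ in range(100)]
    t, df, p = welch_t_test(acc_neural, acc_motor)
    print(f"t = {t:.3f}, df = {df:.1f}, approx. p = {p:.4f}")
```

For small samples, the normal approximation is too optimistic and an exact t distribution (for example from a statistics library) would be the safer choice.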
December 2022
School of Engineering
Rensselaer Polytechnic Institute, Troy, NY