Deep learning for video-based assessment of surgical skills

Yanik, Erim
Other Contributors
Anderson, Kurt
Zhang, Lucy
De, Suvranu
Intes, Xavier
Issue Date
Mechanical engineering
Terms of Use
This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute (RPI), Troy, NY. Copyright of original work retained by author.
Abstract
Surgical skill assessment is crucial for training and certification. The current gold standard, developed by Halsted over a century ago, is real-time proctoring in the operating room (OR). To assess skills, proctors use objective rating scales such as the Objective Structured Assessment of Technical Skills (OSATS) or the time and error metrics of the Fundamentals of Laparoscopic Surgery (FLS) program. In practice, these assessment techniques remain subjective, with poor inter-rater reliability, distribution errors, recall bias, and halo effects. Video-based assessment (VBA) can protect patient safety by allowing surgeons to provide feedback post hoc. Still, VBA is not real-time, post-hoc evaluation may lead to burnout, and it is prone to subjective interpretation. Deep learning (DL) can circumvent these limitations. However, current models operate on video snippets and ignore long-term representations, which prevents them from providing meaningful feedback. They also rely predominantly on motor actions, e.g., hand motion, which does not give a holistic assessment. Finally, they are data-hungry and restricted to their training domain, so they cannot adapt to new tasks without retraining on large datasets. This thesis addresses these limitations to make real-life implementation of VBA feasible. Firstly, we propose a DL model, the Video-Based Assessment Network (VBA-Net), that automatically and objectively assesses surgical skills in real time from complete videos while generating statistically verifiable feedback via Class Activation Maps (CAMs). We benchmark VBA-Net on the commonly used public dataset JIGSAWS and achieve state-of-the-art performance. Secondly, we fuse neural activations from the prefrontal cortex, which have been shown to differentiate skill levels, with motor action data for holistic skill assessment. We also compare motor actions with neural activations.
For comparison, we use a t-test on the distributions of performance metrics, e.g., accuracy and R², after 100 repetitions. VBA-Net performs slightly better with neural activations than with motor actions (p < .05), and multimodal input yields the best performance (p < .05). This shows the saliency of neural activations and the advantage of holistic inputs in skill assessment. Lastly, we develop an Adaptive VBA-Net (A-VBANet) to deliver domain-agnostic skill classification via one-shot learning without retraining. A-VBANet successfully adapts to independent physical simulators with a single video-based sample. Further, using the simulator data, A-VBANet successfully adapts to the OR task, laparoscopic cholecystectomy, with only one sample. The technology developed in this thesis has the potential to significantly impact patient care by providing automated, video-based assessment tools for surgeons. Large-scale adoption will require the development of surgical video repositories. However, meta-learning approaches will mitigate the need for intensive annotation and enable the assessment of intra-operative videos based on simulator data, which is abundant and easier to obtain.
December 2022
School of Engineering
Dept. of Mechanical, Aerospace, and Nuclear Engineering
Rensselaer Polytechnic Institute, Troy, NY
Rensselaer Theses and Dissertations Online Collection
Restricted to current Rensselaer faculty, staff and students in accordance with the Rensselaer Standard license. Access inquiries may be directed to the Rensselaer Libraries.