1 Introduction

Virtual reality (VR) is increasingly being used in surgical training as it offers a risk-free, interactive, repeatable, and easily accessible platform that can be utilised to develop standardised training programs. Despite an emerging body of evidence related to the effectiveness of VR in surgical training [1, 10, 15, 40], it is clear that the availability of a surgical simulator alone cannot promote best practice amongst surgical trainees. For example, a study in the United States observed that only 14% of surgical residents completed VR training when participation was voluntary [3]. Thus, even when facilities for VR training exist, a lack of awareness, trainee motivation, and limited access to simulators inhibit their usage [22, 25]. To overcome these barriers, an appropriate VR-based curriculum should be developed and integrated into mandatory competency-based surgical training programs [31, 33].

Optimal skill acquisition during simulation-based training relies on the availability of performance feedback, task variety with a range of difficulty levels, and the opportunity for extensive deliberate practice [13, 21, 33]. The incorporation of the above considerations into a VR-based module of a surgical curriculum is likely to improve trainees’ readiness for the operating room. The availability of immediate performance feedback is a required component of deliberate practice [9]. Its purpose is to reinforce strengths, address weaknesses, and foster improvements in the learner by providing insights into the consequences of their actions and by highlighting the differences between intended and actual results [33]. While some simulators provide feedback by means of an expert supervising practice [6, 27], others have been developed with in-built real-time procedural feedback. For example, a dental simulator exists that compares the user’s tool position, tool orientation and force application to an expert data set, and displays its feedback on the screen [30]. Similarly, Sewell et al. [32] have developed a system that provides real-time feedback on bone visibility, drilling velocity and force. The University of Melbourne VR Temporal Bone Surgery Simulator [26] provides step-by-step procedural feedback [38] and technical verbal feedback on drill handling skills [7, 18, 19, 41, 42].

Another important aspect of a surgical curriculum is practice variation, which is essential to prepare trainees for anatomical variation between patients [13, 33]. In the context of VR simulation, practice variation refers to the availability of multiple specimens of varying difficulty levels. The availability of such practice variation has been shown to improve surgical performance on previously unseen temporal bone models by Otolaryngology residents [29]. Various VR surgical simulators for laparoscopy [3, 27] and temporal bone drilling [5, 16, 23, 34] have been developed to offer a selection of cases with a range of difficulties.

To maximise skill acquisition and support self-directed learning, real-time feedback must be provided when practising on different specimens. However, performance feedback does not appear to be available across the full range of cases on existing surgical simulators, limiting their educational value. Moreover, there are at present no reported methods that transfer feedback models automatically between cases as an alternative to the time-consuming and data-intensive process of developing feedback models individually for each case.

According to the concepts of transfer learning [28], feedback transfer can be defined as transferring the same task (providing feedback on performance) from a source domain to a target domain. The differences in domains can be characterised as the variations in anatomy. Although the feature space (metrics on which feedback should be provided) is the same, the values that these metrics take may differ according to anatomical variations between specimens. Therefore, the transfer of feedback from one specimen to another can be characterised as a domain adaptation problem [2]. It is not practical to obtain labelled data for each new specimen to train a new model or to retrain an existing one. As such, unsupervised learning (for example, instance weighting for covariate shift, self-labelling methods, changes in feature representation, and cluster-based learning [20]) is commonly used in solving problems of this form.

In contrast to using unsupervised learning for domain adaptation, we investigate a simpler, direct transfer approach supported by a pre-processing task that makes the source and target domains similar. To this end, we define regions of a specimen where surgical skills can be considered to be consistent. By defining these regions, we account for anatomical variation between specimens. We assume that the source and target specimens are similar enough that changes between specimens in the values of the metrics (features) on which feedback is provided are negligible. This enables direct transfer of a feedback model of one region in the original specimen to the corresponding region in another specimen. Using this method, we transfer the neural network based model developed for providing technical feedback in VR temporal bone surgery in Ma et al. [18] to new specimens. We show through a user study that the feedback provided by the transferred models is as accurate as that provided by the original model. We also show that practice on multiple specimens with transferred performance feedback results in positive acquisition of surgical skills.

2 VR Environment

The VR platform used in this research is the University of Melbourne temporal bone surgery simulator (see Fig. 1). Virtual models of multiple temporal bones, generated from segmented micro-CT scans of cadaveric bones, are available to drill on this simulator. A haptic device that emulates the operation of a surgical drill provides tactile feedback during an operation. Depth perception is achieved through NVIDIA 3D vision technology. A MIDI controller is used as an input device to change environment variables such as magnification level and burr size. Using the VR simulator, surgeons can perform ear operations to remove disease and improve hearing. The surgery under consideration in this paper is cortical mastoidectomy. This is a common procedure performed to remove mastoid air cells as a treatment for chronic otitis media, with or without cholesteatoma or mastoiditis. It is also performed as an initial step of cochlear implant surgery and various lateral skull base operations. A cortical mastoidectomy requires routine identification of key anatomical structures including the tegmen mastoideum, sigmoid sinus, incus, and facial nerve to be used as landmarks to ensure safe removal of the mastoid bone.

Fig. 1. A surgeon performing an operation on the VR temporal bone surgery simulator.

3 Types of Performance Feedback

Surgical skills are multi-faceted. As such, surgeons provide performance feedback and guidance on different aspects of surgical skill during training. To emulate this, the simulation system considers four main aspects of skill that need to be acquired: procedural knowledge, knowledge of landmarks/boundaries of the operative field, manipulation of environmental variables, and drill handling/technical skills. The effectiveness of these types of feedback/guidance methods on one specimen has been established by Davaris et al. [7].

Procedural guidance is provided using the step-by-step guidance method of Wijewickrema et al. [38]. The steps were obtained by manually segmenting an expert procedure. Each step of the surgery is highlighted sequentially on the temporal bone; the next step is shown only once the current step is completed.

Verbal warnings are issued when the drill nears an anatomical structure, to make trainees aware of the boundaries of the operative field [35]. To this end, a distance threshold was defined for each anatomical structure, and crossing it generates a proximity warning. Further, to help trainees learn the anatomical structures, the temporal bone can be made transparent so that the underlying structures can be viewed.
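The threshold check behind such proximity warnings can be sketched as follows. This is a minimal illustration only, not the simulator's implementation: the structure names, the threshold values (assumed to be in millimetres), the point-cloud representation of each structure, and the function `proximity_warnings` are all hypothetical.

```python
import math

# Hypothetical per-structure distance thresholds in mm (illustrative values only).
THRESHOLDS_MM = {"facial_nerve": 2.0, "sigmoid_sinus": 1.5}

def proximity_warnings(burr_pos, structure_points, thresholds=THRESHOLDS_MM):
    """Return the sorted names of structures closer to the burr than their threshold."""
    warnings = []
    for name, points in structure_points.items():
        # Distance from the burr to the nearest sampled point of the structure.
        nearest = min(math.dist(burr_pos, p) for p in points)
        if nearest < thresholds[name]:
            warnings.append(name)
    return sorted(warnings)

# Toy geometry: each structure is represented by a few sampled surface points.
structures = {
    "facial_nerve": [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    "sigmoid_sinus": [(10.0, 0.0, 0.0)],
}
print(proximity_warnings((1.0, 0.0, 0.0), structures))  # ['facial_nerve']
```

Because the thresholds are keyed by structure rather than by specimen, the same table can be reused unchanged on any specimen in which the structures have been segmented.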

Feedback on environmental settings such as magnification level and burr size is provided as verbal advice. The ideal values of these settings differ according to where the surgeon is drilling. For example, at the start of a cortical mastoidectomy, an overall view of the surgical space is required, and therefore, a lower magnification level is used. When drilling in tighter spaces, a higher magnification level is required. Advice on how to change these values is provided by comparing the current values against value ranges calculated from pre-collected expert data per surgical region. The region calculation process is discussed in Sect. 4.
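The range comparison can be sketched as below. The region names, the numeric ranges, and the function `setting_advice` are hypothetical placeholders; in the simulator the ranges are derived from pre-collected expert data, which is not reproduced here.

```python
# Hypothetical expert value ranges per region: setting -> (low, high).
EXPERT_RANGES = {
    "open_cortex": {"magnification": (1.0, 2.0), "burr_size": (5.0, 7.0)},
    "near_facial_nerve": {"magnification": (3.0, 5.0), "burr_size": (1.0, 3.0)},
}

def setting_advice(region, current, ranges=EXPERT_RANGES):
    """Compare current settings with the expert range for the region being drilled."""
    advice = []
    for setting, value in current.items():
        low, high = ranges[region][setting]
        if value < low:
            advice.append(f"increase {setting}")
        elif value > high:
            advice.append(f"decrease {setting}")
        # Values inside the expert range generate no advice.
    return advice

print(setting_advice("near_facial_nerve", {"magnification": 2.0, "burr_size": 6.0}))
# ['increase magnification', 'decrease burr_size']
```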

For the provision of technical feedback (feedback on surgical technique or motor skills), the method discussed in Ma et al. [18] is used. As with environmental settings, surgeons adopt different surgical techniques when drilling in different regions of the temporal bone. For example, higher speed and force may be used when drilling in an open area, while lower speed and force may be used when near anatomical structures. As such, different behaviour models were trained for different regions and used to provide technical feedback. Figure 2 shows an overview of the technical feedback generation process.

Fig. 2. Method of providing feedback on surgical technique.

For the offline training of the neural network classifier, a dataset of 16 surgeries recorded by 7 experts and 34 surgeries from 18 novices was used. The surgical performances were segmented into strokes, that is, continuous drilling motions without abrupt changes in direction [12]. All strokes in expert and novice performances were considered to be expert and novice strokes respectively. The strokes were separated according to region. Isolation forests [17] were used to remove outliers. Characteristics (or metrics) of each stroke, such as length, duration, speed, and force, were then calculated to represent a stroke. These were used to train, for each region, a neural network with one hidden layer. The number of hidden neurons for each region was chosen using cross validation [18].
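To make the stroke representation concrete, the following sketch computes the four metrics named above from one stroke. The sample format (timestamp in seconds, 3D position, force) and the use of the mean as the force summary are assumptions made for illustration; the exact definitions are those of Ma et al. [18]. Outlier removal and classifier training are not shown.

```python
import math

def stroke_metrics(samples):
    """Summarise one stroke given as a list of (t_seconds, (x, y, z), force) samples."""
    # Path length: sum of distances between consecutive positions.
    length = sum(math.dist(a[1], b[1]) for a, b in zip(samples, samples[1:]))
    duration = samples[-1][0] - samples[0][0]
    speed = length / duration if duration > 0 else 0.0
    # Assumption: force is summarised by its mean over the stroke.
    mean_force = sum(s[2] for s in samples) / len(samples)
    return {"length": length, "duration": duration, "speed": speed, "force": mean_force}

# A toy stroke with three samples.
samples = [
    (0.0, (0.0, 0.0, 0.0), 0.2),
    (0.5, (3.0, 4.0, 0.0), 0.4),
    (1.0, (3.0, 4.0, 12.0), 0.6),
]
metrics = stroke_metrics(samples)
print(metrics["length"], metrics["duration"], metrics["speed"])  # 17.0 1.0 17.0
```

A feature vector of this kind, computed per stroke and grouped by region, is the sort of input on which a per-region classifier would be trained.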

In real time, strokes are segmented from the surgical trajectory, and the neural network classifier for the relevant region is used to identify whether each stroke is an expert or a novice stroke. In the case of a novice stroke, an adversarial example [11], a small modification of the metrics that changes the prediction of the model from novice to expert, is generated. The resulting change is recorded in a buffer as an increase or decrease of the metrics that were changed to generate the expert prediction. Once multiple instances of the same change are generated in a row, the change is presented to the user as verbal auditory feedback (for example, ‘decrease force’) [37].
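The buffering step can be sketched as follows. The class `FeedbackBuffer`, the repeat count of three, and the choice to reset the run when an expert stroke occurs are assumptions for illustration; the source states only that the same change must be generated multiple times in a row.

```python
class FeedbackBuffer:
    """Emit verbal feedback only after the same change recurs on consecutive strokes."""

    def __init__(self, required_repeats=3):  # threshold of 3 is an assumed value
        self.required_repeats = required_repeats
        self.last_change = None
        self.count = 0

    def push(self, change):
        """change is e.g. ('force', 'decrease'), or None for an expert stroke."""
        if change is None:
            # Assumption: an expert stroke interrupts the current run.
            self.last_change, self.count = None, 0
            return None
        if change == self.last_change:
            self.count += 1
        else:
            self.last_change, self.count = change, 1
        if self.count >= self.required_repeats:
            # Reset so the same message is not repeated on every later stroke.
            self.last_change, self.count = None, 0
            metric, direction = change
            return f"{direction} {metric}"
        return None

buffer = FeedbackBuffer()
outputs = [buffer.push(("force", "decrease")) for _ in range(3)]
print(outputs)  # [None, None, 'decrease force']
```

Requiring a run of identical changes filters out isolated misclassifications, so the trainee hears feedback only on persistent deviations from expert technique.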

4 Transfer of Feedback Models

As a method of adapting the feedback models to specimens other than the one they were developed on, we explored a method of direct transfer. We assumed that surgical technique (and environmental settings) are similar in the same region on all specimens and that the specimens are similar enough that the values of the metrics (features) that the feedback is provided on remain the same. As such, once the regions are defined on a new specimen, feedback models developed on the original specimen can be transferred to be used on this new specimen without any changes to the models themselves. Note that this assumption is only valid for specimens with no abnormal or pathological anatomy, which is the case for the specimens considered here.

For this purpose, we used the same process that was used to generate the regions in the original specimen [35]. Regions were identified as the areas surrounding or between anatomical structures. The width of a region was pre-defined and morphological operations were used to generate them. For example, to generate the area around an anatomical structure, we dilated the voxels belonging to that structure and subtracted the original voxels from the dilated result. To obtain regions between anatomical structures, we used dilation and erosion in tandem. Figure 3 shows the regions generated for different specimens.
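The dilate-and-subtract step for a region surrounding a structure can be sketched as below. The sketch assumes that voxels are integer coordinate triples, that the structuring element is the 6-connected neighbourhood, and that region width is measured in dilation iterations; none of these details is specified in the text, and the function names are hypothetical. Regions between structures are not covered.

```python
def neighbours(voxel):
    """The voxel itself plus its six face-adjacent neighbours (assumed structuring element)."""
    x, y, z = voxel
    return {
        (x, y, z),
        (x + 1, y, z), (x - 1, y, z),
        (x, y + 1, z), (x, y - 1, z),
        (x, y, z + 1), (x, y, z - 1),
    }

def dilate(voxels, iterations=1):
    """Binary dilation of a voxel set, repeated `iterations` times."""
    result = set(voxels)
    for _ in range(iterations):
        result = set().union(*(neighbours(v) for v in result))
    return result

def surrounding_region(structure, width):
    """Shell of the given width around a structure: dilation minus the structure itself."""
    return dilate(structure, width) - set(structure)

structure = {(0, 0, 0), (1, 0, 0)}
region = surrounding_region(structure, 1)
print(len(region))  # 10
```

Because the region is computed from each specimen's own segmentation, the procedure can be rerun on a new specimen, after which the models attached to a region can be looked up by region label alone.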

Fig. 3. Definition of regions where surgical technique is considered to be uniform: (a) original specimen and (b)–(d) transfer specimens. The anatomical structures and the regions defined around them are shown in opaque and transparent colours respectively. (Color figure online)

For the generation of proximity warnings on different specimens, we used the same distance thresholds that were defined for the original specimen. We manually segmented steps of an expert procedure for each specimen in order to provide procedural guidance.

5 Validation of the Feedback Transfer

5.1 Study Design

We conducted a user study of 14 medical students to evaluate the accuracy of feedback transfer and to test the effect of the transferred feedback on skill acquisition. The ratio of postgraduate (MD) to undergraduate (MBBS) students was 5:2 and the male to female ratio was 4:3. This study was approved by the Royal Victorian Eye and Ear Hospital Human Ethics Committee (#17/1312H). Written consent was obtained from all participants.

Participants were first shown a video tutorial on how to perform a cortical mastoidectomy on our VR simulator. Then, they were shown how to use the simulator and given five minutes of familiarisation time. Participants then performed the same surgery on the VR simulator with no automated guidance (pre-test). The pre-test was performed in order to gauge their initial skill level, to account for individual variations in aptitude. It was carried out on the specimen that the original feedback models were developed on (Bone 0). Next, they underwent training on four specimens (in the same order) with real-time automated guidance. The first of the training sessions was on the original bone. The next three sessions were on different specimens (Bones 1–3), and the automated feedback on these was transferred from the original specimen using the method discussed above. After this, on the same day, the participants performed a post-test: a cortical mastoidectomy without feedback on the original specimen. Note that the ‘transfer’ temporal bone specimens were from the same side of the head as the original specimen (right-hand side). All procedures were recorded both by the simulator and with screen capture software. The study design is shown in Fig. 4.

Fig. 4. Design of the validation study.

5.2 Accuracy of Transfer

To determine the accuracy of the provided technical feedback, an expert surgeon identified errors in the feedback by analysing anonymised videos according to the following criteria [36]:

  • False positives (FP): feedback was provided while stroke technique was acceptable.

  • Wrong content (WC): participants’ technique was accurately detected as poor, but the content of the feedback was inaccurate.

  • False negatives (FN): feedback was not provided while stroke technique was unacceptable.

The accuracy of the feedback (ACC) was calculated for each training session as \(ACC = \frac{TF - FP - WC}{TF + FN} \times 100\%\), where TF is the total number of feedback instances provided in a session. Feedback accuracy was compared between specimens using a Kruskal-Wallis test. There was no significant difference in the accuracy level of the feedback provided by the original model when compared to that of the transferred models. Figure 5 illustrates this comparison.
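As a worked illustration of the ACC formula, the sketch below evaluates it for one session. The counts are invented for the example and are not taken from the study; the function name `feedback_accuracy` is likewise hypothetical.

```python
def feedback_accuracy(total_feedback, false_positives, wrong_content, false_negatives):
    """ACC = (TF - FP - WC) / (TF + FN) * 100, as a percentage."""
    denominator = total_feedback + false_negatives
    if denominator == 0:
        raise ValueError("ACC is undefined when no feedback was given or missed")
    return (total_feedback - false_positives - wrong_content) / denominator * 100.0

# Invented counts: 18 feedback instances, 1 false positive, 1 with wrong content,
# and 2 occasions on which feedback should have been given but was not.
acc = feedback_accuracy(total_feedback=18, false_positives=1, wrong_content=1, false_negatives=2)
print(f"{acc:.1f}%")  # 80.0%
```

The numerator counts feedback that was both warranted and correct in content, while the denominator counts every occasion on which feedback was given or should have been given.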

Fig. 5. Accuracy of the technical feedback. Bone 0 is the original specimen on which the feedback models were developed. Bones 1–3 are the new specimens that these models were transferred to. No significant difference was observed in the accuracy levels of the feedback on all specimens.

5.3 Effectiveness of Transfer

To investigate the effect of the transferred feedback on skill acquisition, participant performance in the pre- and post-tests was evaluated by a blinded expert surgeon. To this end, a validated assessment scale designed for temporal bone surgery [14] was used. This scale comprises two parts, a checklist instrument and a global instrument, and assesses the competency of the surgeon in performing the surgery as a whole. This takes into consideration all aspects of surgical skill, for example, knowledge of landmarks and procedure as well as technical skills. The checklist and global instruments consist of 22 and 10 items respectively, each based on a Likert scale ranging from 1 (unable to perform), through 3 (performs with minimal prompting), to 5 (performs easily with good flow). Comparison of pre- and post-test scores using a Wilcoxon signed rank test showed significant improvement in performance (checklist score: \(p = 0.001\) and global score: \(p = 0.002\)). Figure 6 shows the comparison between pre- and post-test scores.
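For readers unfamiliar with the paired test used here, the following sketch computes the two rank sums on which the Wilcoxon signed rank test is based. The scores are invented and are not the study data, and the function `wilcoxon_rank_sums` is a hypothetical helper; obtaining a p-value from the rank sums requires statistical tables or a statistics library, which is not shown.

```python
def wilcoxon_rank_sums(pre, post):
    """Return (W_plus, W_minus) for paired samples, dropping zero differences."""
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        # Find the run of tied absolute differences starting at position i.
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based; ties share the mean rank
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus

# Invented pre- and post-test scores for five participants.
pre_scores = [10, 12, 9, 14, 11]
post_scores = [13, 12, 11, 13, 16]
print(wilcoxon_rank_sums(pre_scores, post_scores))  # (9.0, 1.0)
```

A large imbalance between the two rank sums, as in this toy example, indicates that post-test scores tend to exceed pre-test scores; the test quantifies how unlikely such an imbalance would be under no change.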

Fig. 6. Comparison of pre- and post-test performance results: (a) checklist score and (b) global score. For both scores, significant improvements were observed in the post-test when compared to the pre-test.

6 Discussion

The results of this study demonstrate the accuracy of the feedback transfer, as no significant difference was observed between the accuracy of the feedback of the original and transferred models. Furthermore, participants showed significant improvement in surgical performance after training on specimens with transferred feedback models, demonstrating that the transferred feedback (along with other factors such as repeated practice) had a positive impact on skill acquisition. However, it has already been established that repeated practice (without feedback) is not sufficient to impart surgical skills in mastoidectomy in a novice cohort such as the participants in our study [7]. Therefore, we can attribute the improvements in performance to the effectiveness of the feedback.

Successful feedback transfer (of the type outlined in this study) will allow VR simulators to meet the requirement of deliberate practice to have immediate and continuous feedback [9, 33]. The provision of instant, unsupervised performance feedback by VR simulators offers a time efficient alternative to the current dependency on continuous expert supervision. Thus, this VR curriculum may serve as a valuable adjunct to current surgical training. In addition, developing a library of virtual temporal bone models covering anatomical variants complete with automated feedback could provide a valuable training resource for rural trainees where exposure to varying cases is limited.

It would also be beneficial to apply feedback transfer to VR simulation in other types of surgery, including laparoscopic surgery [24] and neurosurgery [4], or even endovascular procedures [8, 39]. However, a potential barrier to the reapplication of this direct feedback transfer technique is the need for comparable pre-processing of the simulation cases, that is, defining anatomical regions that facilitate the transfer of feedback models.

A limitation of this work is that the developed method was for feedback transfer between specimens with normal anatomy. As surgical behaviour may not be the same when operating on abnormal or pathological specimens, this direct transfer method may not be as accurate for those. For example, for an abnormally large specimen, values of feedback metrics such as stroke length may not be directly transferable. In such cases, the region-based method could be used in conjunction with more complicated domain adaptation techniques and/or a limited amount of labelled data from the abnormal or pathological specimens to overcome this. This may also be used to improve the accuracy of transfer between normal specimens. This is a future avenue of research we will explore.

A further study limitation is that only three of the four types of performance guidance/feedback provided during training were automatically transferred. Procedural guidance was provided by segmenting an expert procedure performed on each specimen. In future work, this process will also be automated, albeit using different techniques from those used for transferring technical feedback. A simulation-based surgical training program that incorporates other concepts of curriculum design that were not considered here (such as practice distribution, task difficulty including pathological cases, and proficiency-based training) [33] will also be developed and validated.

The generalisability of our results is limited by the small number of specimens, the cohort size, and the use of a single expert reviewer. Further studies will be conducted to address these limitations, using a larger number of specimens and a larger cohort that includes participants with intermediate-level surgical skills (surgical residents). Assessments by multiple experts will also be performed to reduce the subjectivity of assessment.

7 Conclusion

We introduced a method of transferring technical feedback models from the specimen they were developed on to other specimens and showed that the feedback provided by the transferred models was as accurate as that of the original model. We also showed that the transferred feedback assisted in positive skill acquisition. This enables the development of self-directed, simulation-based surgical curricula that can be used as adjuncts to traditional surgical training methods.