Year: 2022 | Volume: 5 | Issue: 2 | Page: 62-71
A mixed-methods, validity informed evaluation of a virtual OSCE for undergraduate medicine clerkship
Giovanna Sirianni¹, Jenny S. H. Cho², David Rojas³, Jana Lazor¹, Glendon Tait⁴, Yuxin Tu², Joyce Nyhof-Young¹, Kulamakan Kulasegaram¹
1 Department of Family and Community Medicine, Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada; MD Program, Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
2 MD Program, Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
3 MD Program, Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada; Department of Obstetrics and Gynaecology, Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
4 MD Program, Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada; Department of Psychiatry, Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
Date of Submission: 22-Jan-2022
Date of Acceptance: 12-May-2022
Date of Web Publication: 09-Sep-2022
Dr. Giovanna Sirianni
Department of Family and Community Medicine, Temerty Faculty of Medicine, University of Toronto, 500 University Avenue, 5th Floor, Toronto, Ontario M5G 1V7
Source of Support: None, Conflict of Interest: None
Background: Pandemic-related disruptions to the learning environment have threatened clinical skills development and assessment for medical students and prompted a shift to virtual objective structured clinical examinations (vOSCEs). This study explores the benefits and limitations of vOSCEs from the perspective of key stakeholders and makes recommendations for improving future vOSCEs. Materials and Methods: Using a mixed-methods, utilization-focused program evaluation, we examined development and implementation evidence addressing content, response process, and feasibility, as per Messick’s validity framework. Test data were analyzed to inform reliability, and evidence on acceptability and consequential validity was reviewed. A 14-question online survey was sent to students and faculty, followed by stakeholder focus groups. Descriptive statistics were calculated for survey data, and deidentified focus group transcripts were independently reviewed and analyzed using constant comparative, descriptive thematic analysis. Results: The survey results showed that the vOSCE was a feasible option for assessing history-taking, clinical reasoning, and counseling skills. Limitations related to assessing subtle aspects of communication skills and physical examination competencies, and to technical disruptions. Beyond benefits and drawbacks, major qualitative themes included recommendations for faculty development, technology limitations, professionalism, and equity in the virtual environment. The reliability of the six vOSCE stations reached a satisfactory level, with G-coefficients of 0.51 (day 1) and 0.53 (day 2). Conclusions: The implementation of a virtual, summative clerkship OSCE demonstrated adequate validity evidence and feasibility. The key lessons learned relate to faculty development content and to ensuring equity and academic integrity. 
Future study directions include examining the role of vOSCEs in the assessment of virtual care competencies and the larger role of OSCEs in the context of workplace-based assessment and competency-based medical education.
Keywords: Clinical competence, medical, program evaluation, students, validity
How to cite this article:
Sirianni G, Cho JS, Rojas D, Lazor J, Tait G, Tu Y, Nyhof-Young J, Kulasegaram K. A mixed-methods, validity informed evaluation of a virtual OSCE for undergraduate medicine clerkship. Educ Health Prof 2022;5:62-71
How to cite this URL:
Sirianni G, Cho JS, Rojas D, Lazor J, Tait G, Tu Y, Nyhof-Young J, Kulasegaram K. A mixed-methods, validity informed evaluation of a virtual OSCE for undergraduate medicine clerkship. Educ Health Prof [serial online] 2022 [cited 2022 Oct 7];5:62-71. Available from: https://www.ehpjournal.com/text.asp?2022/5/2/62/355837
Introduction
Pandemic-related disruptions to learning environments pose a threat to clinical skills development and assessment for medical students. They prompted a shift to virtual offerings, including virtual objective structured clinical examinations (vOSCEs), at the University of Toronto Faculty of Medicine. The current assessment literature in this emerging area has noted the overall acceptability of the vOSCE from learner and faculty assessor perspectives. Prepandemic vOSCE studies commented on their acceptability, their potential to assess learners in remote locations, and the potential for financial savings. The main limitations of the vOSCE format thus far have been the difficulty of accurately and thoroughly assessing physical examination skills, and technical challenges during examination administration. Systematic studies of validity evidence and implementation remain rare. One large student cohort study had the benefit of scale; however, it was limited to a two-station OSCE administration and offered no validity evidence relevant to measurement. Several 2020 publications with larger student cohorts outlined detailed suggestions for vOSCE implementation during the pandemic. Drawing on this literature, we report a systematic evaluation of validity evidence and implementation lessons for a large vOSCE at the University of Toronto. We use elements of Messick’s validity framework to report on our process, results, and recommendations.
Materials and Methods
We used a mixed-methods program evaluation case study that drew on five data sources:

(1) development, feasibility, and implementation evidence generated during the vOSCE, addressing content, response process, and general feasibility relevant to the valid implementation of assessments;
(2) test data analysis to inform reliability, together with other measurement-relevant analyses providing internal structure evidence;
(3) acceptability and consequential validity evidence;
(4) a 14-question online survey sent to students and faculty assessors via Qualtrics; and
(5) stakeholder focus groups to gain an in-depth understanding of vOSCE-related perspectives.

We integrated these data sets through the lenses of validity evidence and utilization-focused evaluation, with the aim of supporting future development. This mixed-methods study received Research Ethics Board (REB) approval (Protocol #40132) at the University of Toronto.
The clerkship OSCE is a comprehensive, high-stakes, ten-station examination taken by one of the largest cohorts of medical students in Canada. It covers clinical content and skills across all clerkship courses. Students must pass it to graduate. Because of multiple pandemic waves and accompanying public health restrictions on gathering sizes, the MD program administered three vOSCEs from October 2020 to August 2021. We adopted key implementation assumptions:

- establishing extensive administrative support;
- preparing for technical disruptions;
- allowing additional time between stations; and
- having back-up plans for technology failure.
Sample size and sampling methods
An online survey was distributed to students (n = 254) and assessors (n = 154) after the October 2020 vOSCE iteration, followed by two reminders. Consent was explained in the recruitment email and implied by survey completion. A second recruitment email invited students and assessors from the October 2020 and March 2021 vOSCE iterations to join virtual focus groups of three to five participants. All focus group participants signed consent forms. Each focus group was conducted via Zoom by an education scientist (JNY) with qualitative research expertise who was unaffiliated with the vOSCEs. Sessions were audio-recorded and transcribed.
Descriptive statistics were calculated for the deidentified survey data. Examination psychometrics were analyzed through pre- and postexamination generalizability analyses. Together with vOSCE station reliability data, these provided internal structure evidence. Survey and focus group data informed consequential validity and feasibility issues pertinent to future implementations. The qualitative arm was guided by a constructivist approach. This approach recognizes the interrelated nature of various contexts from the perspectives of both students and assessors.
For the qualitative arm, transcripts were deidentified prior to analysis. We analyzed the focus group transcripts using descriptive thematic analysis as outlined by Braun and Clarke, following a constant comparative approach. Initially, two authors (JC, GS) generated descriptive codes through rereading, reflection, and team discussion. Common recurrent categories were then identified and refined inductively and deductively, and a joint preliminary coding framework was created. Themes were further refined with a third author (JNY). Saturation was considered reached when new focus groups did not offer novel perspectives on vOSCE experiences.
Results
Content and response process validity
Our vOSCE construct validity was informed by key sources of evidence, including content and response process. These were addressed through the examination blueprint and through consideration of how students and raters would approach the vOSCE stations.

Organizers determined that some key skill domains assessed by our in-person summative clerkship OSCE lent themselves well to a virtual format, including history-taking, counseling, and clinical decision making. Other skills, notably physical examination, could not easily be assessed virtually. They posed important response process validity issues as well as authenticity challenges.

We considered whether physical examination aspects should nonetheless be included in assessments. The main alternative considered was a narrative format in which students would verbalize which physical examination maneuvers they would perform and how. However, this primarily addresses the “knows” or “knows how” level of competence in Miller’s framework. We therefore felt that it would neither meet program needs nor authentically demonstrate student skills. Moreover, faculty would have less experience assessing verbalized physical examination skills than directly observing skills performance. As such, the major content change was the removal of observed physical examinations.
Because of tight pandemic-induced timelines for our first vOSCE, we opted to leverage existing technology, including our current assessment platform and Zoom, given their cost-effectiveness and familiarity. Our four Zoom links were analogous to OSCE examination sites. To allow seamless examination functioning, each required multiple levels and types of support. Unanticipated in-station delays were primarily absorbed by an expanded 7-minute gap between stations, which allowed students to make up time lost to technical disruptions.
In the vOSCE, we continued to ask postencounter probe questions. These assessed clinical decision making and offered opportunities to assess students’ interpretation of clinical findings, laboratory results, radiologic images, and electrocardiograms. For rater assessment, we continued to use the global rating scales from our in-person examination, with two changes. We removed the physical examination scales, and we modified the verbal and nonverbal global rating scales to account for the switch from in-person to virtual encounters.
Another important aspect of response process validity was the provision of extensive faculty development (FD) to help assessors adapt their role to the virtual format. Previously, faculty assessor orientation occurred only on the examination day. For the vOSCE, we expanded this to include multimodal synchronous and asynchronous FD opportunities, as well as just-in-time examination day orientation and technical support.
Findings from internal structure
Using 2019 OSCE administration data, we conducted a generalizability analysis and decision study to determine the minimum number of stations for an acceptably reliable summative OSCE. A decision study extends a generalizability study by projecting reliability under alternative measurement designs (e.g., different numbers of stations), so that reliability and resourcing can be balanced. This yielded a six-station plan with a predicted reliability of 0.4. This was within the range our assessment committee considered acceptable, as previous 2013 and 2019 analyses of the in-person OSCE had shown reliabilities ranging from 0.4 to 0.6. Although we aim for high reliability in summative assessment (>0.5), we accepted that the multidisciplinary content of a comprehensive OSCE may yield variable reliabilities.
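The decision-study logic described above can be illustrated with a minimal sketch. It assumes a simple, fully crossed persons × stations (p × s) design, in which the relative G-coefficient for n stations is σ²_p / (σ²_p + σ²_ps,e / n). The design actually used for the reported analyses, for example with examiners nested within stations or separate examination days, may include additional facets. The function names and the variance components in the example are hypothetical. They are not estimates from this study.

```python
"""Minimal D-study sketch for a crossed persons x stations (p x s) design.

Assumptions: one-facet design; hypothetical variance components, not
estimates from the vOSCE data.
"""
from typing import Optional


def g_coefficient(var_person: float, var_residual: float, n_stations: int) -> float:
    """Projected relative G-coefficient for an exam with n_stations.

    var_person:   variance of students' universe (true) scores
    var_residual: person-by-station interaction confounded with error
    """
    if n_stations < 1:
        raise ValueError("n_stations must be at least 1")
    return var_person / (var_person + var_residual / n_stations)


def min_stations(var_person: float, var_residual: float,
                 target: float, max_stations: int = 30) -> Optional[int]:
    """Smallest station count whose projected G-coefficient meets target."""
    for n in range(1, max_stations + 1):
        if g_coefficient(var_person, var_residual, n) >= target:
            return n
    return None  # target not reachable within max_stations


def spearman_brown(g_observed: float, n_observed: int, n_new: int) -> float:
    """Rescale an observed coefficient to a new number of stations.

    For a one-facet p x s design this gives the same projection as
    g_coefficient() computed from the underlying variance components.
    """
    k = n_new / n_observed
    return k * g_observed / (1 + (k - 1) * g_observed)


if __name__ == "__main__":
    var_p, var_ps_e = 1.0, 9.0  # hypothetical variance components
    for n in (4, 6, 8, 10):
        print(f"{n:2d} stations: G = {g_coefficient(var_p, var_ps_e, n):.3f}")
    print("Minimum stations for G >= 0.45:", min_stations(var_p, var_ps_e, 0.45))
```

With these made-up components, six stations project to G = 0.40 and eight stations are needed to reach 0.45. The point is the shape of the trade-off: each added station yields a smaller gain in reliability. The specific numbers are not meaningful.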
Postexamination, we evaluated the reliability of the six stations on each day. Reliability reached satisfactory levels, with G-coefficients of 0.51 (day 1) and 0.53 (day 2). These were higher than those of a previous in-person clerkship OSCE (0.436 [day 1] and 0.448 [day 2]). We then used the vOSCE iteration data to conduct a decision study. It estimated the number of stations future OSCEs would require for a reliability coefficient in keeping with our historical data. We determined that six virtual stations would continue to yield an acceptable reliability of 0.51. Station discrimination indices are reported in [Table 1].
Table 1: Discrimination indices of stations included on day 1 and day 2 of the vOSCE
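The paper does not state how the station discrimination indices in Table 1 were calculated. This minimal sketch assumes one common choice, the corrected station-total correlation. It is the Pearson correlation between students' scores on one station and their total scores on the remaining stations, so that a station is never correlated with itself. The function names and the score matrix are hypothetical.

```python
"""Corrected station-total correlation as a station discrimination index.

Assumption: this is one common definition; the study's exact method is
not specified. Scores below are invented for illustration.
"""
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns NaN if either variable is constant."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with n >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def corrected_station_total(scores: Sequence[Sequence[float]]) -> List[float]:
    """scores[i][j] is student i's score on station j.

    For each station j, correlate station j with the sum of all other
    stations (the "rest score"), excluding station j from its own total.
    """
    n_stations = len(scores[0])
    indices = []
    for j in range(n_stations):
        station = [row[j] for row in scores]
        rest = [sum(row) - row[j] for row in scores]
        indices.append(pearson(station, rest))
    return indices


if __name__ == "__main__":
    # Hypothetical scores: 5 students x 4 stations (e.g., percent scores).
    scores = [
        [72, 65, 80, 70],
        [60, 58, 75, 62],
        [85, 78, 82, 88],
        [55, 60, 70, 50],
        [90, 82, 79, 91],
    ]
    for j, d in enumerate(corrected_station_total(scores), start=1):
        print(f"Station {j}: discrimination = {d:.2f}")
```

Under this definition, higher values indicate stations that better separate stronger from weaker overall performers. A value near zero or below flags a station that may warrant review.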
Acceptability and consequential validity
The final survey response rate was 33.5% for students and 64.2% for examiners [Table 2] and [Table 3]. Overall, both groups felt the vOSCE ran smoothly and was feasible for assessing history-taking, clinical reasoning, and counseling skills. As to whether future iterations of the OSCE should occur virtually, results were more equivocal.
Table 2: vOSCE program evaluation; survey results from students and faculty assessors
Focus group results
A total of 15 faculty assessors were divided among four focus groups, and three students were recruited into one focus group. All but one of the faculty assessors had extensive experience with OSCE-based assessment; the remaining assessor was new. All students were in their fourth year and had experienced both virtual and in-person OSCEs.
Four main themes were developed: (1) the benefits and drawbacks of vOSCEs for clinical skills assessment, (2) vOSCE preparation requirements distinct from those of an in-person examination, (3) the importance of technology, and (4) considerations of professionalism and equity [Table 4].

Focus group data converged with survey data on the benefits of the virtual format, including its suitability for assessing history-taking, counseling, and clinical decision making. The focus groups also reiterated the limitations of the virtual format for assessing physical examination skills, nonverbal communication, and body language. Interestingly, participants did not feel that virtual station encounters were a barrier to developing patient rapport. However, assessors voiced concerns about how the lack of exposure to in-person OSCE formats might affect students in the future. They described OSCEs as a rite of passage and were concerned that students would be less prepared for future high-stakes examinations.

Other benefits of the vOSCE format included efficiency and convenience. Participants felt the virtual format saved time. It also increased access to assessors and standardized patients by removing geographical limitations. Participants also saw potential opportunities in the virtual format, including the assessment of virtual medicine competencies and improved efficiency from digitizing the assessment platform.
Participants had mixed views on whether pre-examination faculty preparation was adequate. Although some felt the process was smooth and thorough, others felt more could have been done before the assessment. Assessors noted that the virtual format required advance preparation of examination materials and technical components, whereas in-person assessments required no advance preparation. Assessors also discussed the increased complexity of their vOSCE tasks. These often included simultaneously screen-sharing the stem and station materials, watching for visual station prompts, attending to student performance, and completing online assessment forms.
Participants raised significant concerns about technical glitches (e.g., Internet connectivity), which aligned with survey responses. They lamented the lack of audio cues in the Zoom platform to alert students and examiners to station transitions. Others described a lack of standardization in the virtual format, which they considered concerning and contrary to the intent of an OSCE. Assessors also missed the quality assurance oversight and student feedback that session video-recording had provided in our previous in-person OSCEs.
Finally, participants expressed concerns about professionalism and about equity for students with disabilities, including how to ensure the virtual format is accessible to those requiring accommodation. They also noted a barrier for students from lower socioeconomic backgrounds, who might lack a private, quiet space at home in which to sit the examination, or reliable Internet access. However, student participants felt the MD program did support individuals requiring a secure campus space to sit the examination.
Student and assessor focus groups diverged in two main areas: academic integrity and the future of vOSCEs in medical education. Students felt that breaches of academic integrity would be unlikely because they were monitored during the assessment. Assessors, by contrast, felt that students could feasibly use notes or examination aids; however, no assessors reported academic integrity concerns from their own experience with vOSCEs. Students felt that future iterations of the OSCE should occur virtually, as physical examination competencies were being adequately assessed in the workplace. Assessors instead favored a hybrid model that would combine a vOSCE with in-person assessment of physical examination skills.
Discussion
This study evaluated the transition to a vOSCE through a validity lens, including the often-neglected concerns of feasibility and acceptability. In our local context, this implementation of a virtual, summative clerkship OSCE demonstrated adequate validity evidence, feasibility, and acceptability. As such, the MD program continues to offer a virtual clerkship OSCE during the ongoing pandemic to balance stakeholder safety with the need for continued student assessment and feedback. From a consequential validity perspective, the vOSCE offers a viable alternative to traditional in-person clinical assessment during an unprecedented period.

Both students and assessors found the vOSCE acceptable for the assessment of history-taking, counseling skills, and clinical decision making. Its convenience and accessibility allowed students and assessors to participate in clinical assessments from a private, socially distanced space. The vOSCE also enhanced assessor involvement, as geographical limitations no longer applied. Assessors also noted that online assessment forms were an improvement over the previous paper-based forms, making completion easier, especially for narrative comments.

Both groups felt that the inability to demonstrate and assess physical examination skills was the biggest limitation of the vOSCE format. Other concerns included technical disruptions and the additional complexity of assessor tasks during virtual assessments. Assessors also felt that students were missing opportunities for professional identity formation, especially in preparing for future high-stakes clinical assessments.
A major lesson learned in our context was the need for extensive FD in the virtual setting, because assessor tasks were more complex. In response, we developed the following supports to supplement our usual examination day orientation with just-in-time support:

- an assessor information letter with embedded hyperlinks to resource material;
- multiple synchronous webinars that were recorded for asynchronous access; and
- drop-in Zoom help sessions.

Recommendations from the qualitative arm suggest further FD improvements:

- a virtual simulated session for faculty to familiarize themselves with the vOSCE format;
- a buddy system pairing junior faculty with a more senior peer assessor; and
- one-on-one support for stakeholders unfamiliar with the platform or tasks.

Another important focus group recommendation was to explicitly prepare students and faculty for expected technical disruptions and delays. Indeed, our stakeholder orientations already discuss the additional time allotted between stations to mitigate delays, and they advise faculty on how to address technical disruptions. We also had technical support on each of our Zoom links to provide expertise as required. Such expert technical support is a key recommendation for other institutions considering vOSCEs.
An unexpected theme from our focus groups was the challenges faced by students with disabilities (e.g., hearing and visual impairments) and how to create an equitable assessment environment for them. Currently, all our students have access to examination accommodations prior to OSCEs, whether in person or virtual. Students primarily require additional time related to learning challenges; we did not have students with identified visual or hearing impairments during the vOSCEs. The onus will be on the program to resolve such accessibility issues in the future, perhaps by using descriptive text for students with hearing impairments. Descriptive text may also be of general utility to all students to help with information processing.

An equitable test environment was another important issue. In particular, participants expressed concerns about students without a quiet and private place to sit their examination or without access to high-speed Internet. The program already prioritizes this aspect and offers students lacking such amenities the opportunity to sit the examination on campus.

Concerns about professionalism and academic integrity require further exploration. Although students felt breaches of academic integrity were unlikely, assessors expressed concern about mitigating this risk, especially given the high stakes of the assessment. To minimize the risk of students using unauthorized aids, we asked them to:

- ensure virtual backgrounds were not active;
- show their ID badge on camera;
- leave cameras and microphones on throughout the examination; and
- take notes only on paper, rather than on their device.

Future measures could include asking students to provide a 360-degree view of their room prior to the examination and to show their papers to the camera to confirm they are blank. Programs should take these points into consideration when administering vOSCEs.
Students and examiners appear to diverge on whether vOSCE use should continue once the pandemic is behind us. Although students felt that vOSCEs are acceptable, assessors felt that assessing physical examination skills in an examination is important enough to warrant a hybrid in-person and virtual administration. Future examination blueprints should consider which skills are best assessed in a virtual format, potentially leaving other skills to workplace-based assessment. For example, a vOSCE may be best utilized to assess specific competencies in the provision of virtual clinical care, which suggests a need for new assessment constructs. Because they are free from locational constraints, vOSCEs may also provide opportunities for cross-university collaboration. Such collaboration could pool expertise and resources for a notoriously resource-intensive examination. Finally, the vOSCE format allows for the assessment of student learning at distributed, remote, and rural sites.
We note several study limitations. First, we were able to recruit only a small number of students to the focus groups. They may not be representative of the larger cohort, which limits generalizability. However, we noted student and assessor congruence across most focus groups. Second, focus group results may be biased by group influence and social desirability; however, such undue influence was not noted across the diverse sessions.
Conclusions
This study explores the benefits and limitations of vOSCEs from the perspective of key stakeholders. It outlines key validity evidence that should be considered to provide a robust assessment, and it offers recommendations for improving future vOSCEs, particularly around FD. It has also helped identify several areas for future study, including the role of OSCEs in conjunction with workplace-based assessment in the context of competency-based medical education. Furthermore, this study has identified issues that have not been extensively discussed to date: assessment equity, accessibility, and ensuring that virtual examinations occur in an environment where academic integrity is upheld.
The shift to vOSCEs is promising. Considering which elements of this approach may live on in medical education beyond the pandemic is an exciting direction for the assessment of health professions trainees.
Thank you to Aaron Forward (Standardized Patient Program Project Manager), Shibu Thomas (Clerkship Coordinator), Frazer Howard (Data Analyst), MD Program Administrators, Discover Commons staff, and Standardized Patient Program staff for their dedication to implementing a vOSCE. Additional thanks to Dr. Mirek Otremba for sharing his team’s work and to the Office of Faculty Development at the MD Program. Thank you to the students and faculty who participated in this study. Thank you to the John Bradley Summer Research Program and Heather Sampson for supporting Ms. Cho’s work on this project.
Financial support and sponsorship
None.
Conflicts of interest
There are no conflicts of interest.
GS: concept, design, literature search, data acquisition, article preparation, article editing, and article review; JSHC: literature search, data analysis, article preparation, article editing; DR: concept, design, data acquisition, data analysis, article editing, and article review; JL: concept, design, data acquisition, article preparation, article editing, and article review; GT: concept, design, article editing, and article review; YT: data acquisition, data analysis, article editing, and article review; JNY: literature search, data acquisition, data analysis, article preparation, article editing, and article review; KK: concept, design, literature search, data analysis, article preparation, article editing, and article review.
References
Ryan A, Carson A, Reid K, Smallwood D, Judd T. Fully online OSCEs: A large cohort case study. MedEdPublish 2020;9:214.
Lewandowski R, Stratton A, Gupta TS, Cooper M. Twelve tips for OSCE-style tele-assessment. MedEdPublish 2020;9:168.
Hannon P, Lappe K, Griffin C, Roussel D, Colbert-Getz J. An objective structured clinical examination: From examination room to Zoom breakout room. Med Educ 2020;54:861.
Major S, Sawan L, Vognsen J, Jabre M. COVID-19 pandemic prompts the development of a web-OSCE using Zoom teleconferencing to resume medical students’ clinical skills training at Weill Cornell Medicine-Qatar. BMJ STEL 2020;6:376-7.
Khan FA, Williams M, Napolitano CA. Resident education during COVID-19, virtual mock OSCE’S via Zoom: A pilot program. J Clin Anesth 2021;69:110107.
Hannan TA, Umar SY, Rob Z, Choudhury RR. Designing and running an online objective structured clinical examination (OSCE) on Zoom: A peer-led example. Med Teach 2021;43:651-5.
Novack DH, Cohen D, Peitzman SJ, Beadenkopf S, Gracely E, Morris J. A pilot test of WebOSCE: A system for assessing trainees’ clinical skills via teleconference. Med Teach 2002;24:483-7.
Bulik RJ, Frye AW, Callaway MR, Romero CM, Walters DJ. Clinical performance assessment and interactive video teleconferencing: An iterative exploration. Teach Learn Med 2002;14:124-32.
Prettyman AV, Knight EP, Allison TE. Objective structured clinical examination from virtually anywhere! J Nurse Pract 2018;14:E157-63.
American Educational Research Association, American Psychological Association, National Council on Measurement in Education. Standards for Educational and Psychological Testing. Washington, DC: American Educational Research Association; 2014.
Govaerts M, van der Vleuten CP. Validity in work-based assessment: Expanding our horizons. Med Educ 2013;47:1164-74.
Patton MQ. Utilization-Focused Evaluation. 4th ed. Thousand Oaks, CA: Sage Publications; 2008.
Sandelowski M. Theory unmasked: The uses and guises of theory in qualitative research. Res Nurs Health 1993;16:213-8.
Sandelowski M. What’s in a name? Qualitative description revisited. Res Nurs Health 2010;33:77-84.
Braun V, Clarke V. Using thematic analysis in psychology. Qual Res Psychol 2006;3:77-101.
Miller GE. The assessment of clinical skills/competence/performance. Acad Med 1990;65:S63-7.
Webb NM, Shavelson RJ. Generalizability theory: Overview. In: Everitt BS, Howell D, editors. Encyclopedia of Statistics in Behavioural Science. Vol 2. Chichester: John Wiley & Sons Ltd; 2005. p. 717-9.
Patterson F, Lievens F, Kerrin M, Zibarras L, Carette B. Designing selection systems for medicine: The importance of balancing predictive and political validity in high-stakes selection contexts. Int J Select Assess 2012;20:486-96.