FACULTY DEVELOPMENT
Year: 2018 | Volume: 1 | Issue: 2 | Page: 33-35

Forty-five common rater errors in medical and health professions education


Kenneth D Royal
Department of Clinical Sciences, North Carolina State University, Raleigh, North Carolina, USA

Date of Web Publication: 7-Feb-2019

Correspondence Address:
Dr. Kenneth D Royal
Department of Clinical Sciences, North Carolina State University, Raleigh, North Carolina
USA

Source of Support: None, Conflict of Interest: None


DOI: 10.4103/EHP.EHP_27_18

  Abstract 


Minimizing the influence of rater errors is a persistent and considerable challenge for educators in the medical and health professions. This article presents a list of 45 common rater errors that assessors and evaluators should be cognizant of while rating performance assessments. Readers are encouraged to examine each rater error type, reflect on the extent to which s/he has previously committed each error, and identify strategies for mitigating and preventing errors in future performance assessment scenarios.

Keywords: Assessment, clinical education, evaluation, grading, medical education, performance assessment, standardized patients


How to cite this article:
Royal KD. Forty-five common rater errors in medical and health professions education. Educ Health Prof 2018;1:33-5

How to cite this URL:
Royal KD. Forty-five common rater errors in medical and health professions education. Educ Health Prof [serial online] 2018 [cited 2019 Feb 20];1:33-5. Available from: http://www.ehpjournal.com/text.asp?2018/1/2/33/251905



In the medical and health professions, raters are commonly used in both real practice and simulated settings to directly observe and evaluate an individual while performing/demonstrating a variety of skills, tasks, procedures, and/or behaviors. Using rubrics, checklists, and other instruments, raters provide scores that may be used for formative (e.g., teaching), summative (e.g., determining competency), or other (e.g., documenting clinical skills for accreditation) purposes. Score results often carry moderate to high stakes for examinees; thus, it is imperative that the scores/ratings are valid indicators of performance.

However, obtaining valid scores through performance assessments is typically much more challenging than through more objective assessment formats, such as multiple-choice examinations. Whereas multiple-choice examinations involve three primary sources of measurement error (instrumentation, examinees, and conditions of administration),[1] performance assessments are much more complex. More specifically, the inclusion of human raters introduces an inescapable element of subjectivity that poses an additional and significant threat to score validity. Suffice it to say, there are at least four potential sources of measurement error in performance assessment scenarios: instrument, examinees, conditions of administration, and raters. Although numerous strategies are available to minimize sources of error, research has long noted that reducing rater errors is the most difficult.[2]

The purpose of this brief article is threefold. First, the author intends to bring attention to the critical issue of rater errors in performance assessments. Second, the author intends to identify and describe 45 types of rater errors that were identified from a multidisciplinary review of the literature.[1],[3],[4] Third, it is the author's hope that this list will help raters not only become more aware of potential cognitive biases that might affect examinees' scores and distort score validity but also become better equipped to mitigate and prevent many of these errors in future performance assessments [Table 1].
Table 1: List and description of 45 common rater errors




  Discussion and Recommendations


The list presented in [Table 1] provides a sobering perspective on the challenges raters face when assigning ratings in performance assessment scenarios. Fortunately, there are some tips that can help reduce many rater errors.

First, any individual who is tasked with rating performances should have received prior training on the topic of rater inconsistencies and undergone a series of rater calibration exercises (also known as “norming”) with other raters. The purpose of these exercises is to standardize raters in such a way that no examinee will be unduly advantaged or disadvantaged as a result of being evaluated by a given rater. Persons unfamiliar with the rater calibration/norming process should consult works by Allen[5] and Maki[6] for a thorough overview. If raters have never engaged in this activity, they should immediately consult an expert in educational assessment who can provide the requisite training and/or guidance on how to set up a robust rater training program.

Second, it is important to identify the type and quantity of errors that raters have committed in the past. As George Santayana famously stated, “Those who cannot remember the past are condemned to repeat it.” Therefore, raters are encouraged to review each type of error and mark each error that she/he has committed in the past. The rater should thoughtfully consider why each flagged error occurred previously and what she/he can do to avoid committing this error again in the future. Simply becoming aware of one's tendency to commit a particular error often is enough to avoid committing that same error again. Of course, some types of errors may pose a more persistent challenge.

Third, raters are encouraged to discuss errors with other individuals who also provide ratings of the same examinees. It is critical that raters understand that mitigating rater errors requires a combination of planning, teamwork, ongoing communication, and evaluation. Thus, raters should frequently converse with fellow raters not only to re-calibrate but also to discuss any issues, such as new information or other changes, that might affect one's ratings in some way. These conversations typically are particularly effective for mitigating some of the most common rater errors, such as “drift” and “fatigue,” and may help mitigate or prevent many other types of errors.

Finally, those responsible for analyzing data should become familiar with various techniques for scoring performance assessment data. Perhaps the most common approaches to data analysis include calculating traditional summary statistics and inter-rater reliability estimates as a validity check. Although these techniques are fundamental to understanding the data, they leave much to be desired methodologically. More recently, specialized techniques such as generalizability theory[7],[8] and Many-Facet Rasch Measurement (MFRM) modeling[9],[10] have become commonplace in high-stakes settings. Although detailed discussion of these techniques is beyond the scope of this article, readers are encouraged to learn more about these techniques as they may be useful for identifying and differentiating various sources of error, and in the case of the MFRM, producing linear measures that account for differences among facets (e.g., task difficulty and rater leniency/stringency) before calculating an examinee's score.
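To illustrate the first of these approaches, one common inter-rater reliability estimate for two raters assigning categorical ratings is Cohen's kappa, which corrects observed agreement for the agreement expected by chance alone. The sketch below is a minimal illustration using hypothetical pass/fail ratings (the data and names are invented for this example, not drawn from the article).

```python
from collections import Counter

def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    assert len(ratings_a) == len(ratings_b) and ratings_a
    n = len(ratings_a)
    # Proportion of examinees on whom the two raters agree
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: sum over categories of the product of each
    # rater's marginal proportions for that category
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical pass/fail ratings from two raters on ten examinees
rater1 = ["pass", "pass", "fail", "pass", "fail",
          "pass", "pass", "fail", "pass", "pass"]
rater2 = ["pass", "pass", "fail", "fail", "fail",
          "pass", "pass", "pass", "pass", "pass"]
print(round(cohen_kappa(rater1, rater2), 2))  # 0.52
```

Here the raters agree on 8 of 10 examinees (80%), but because both rated 70% of examinees "pass," substantial agreement is expected by chance (58%), yielding kappa of about 0.52, which is only moderate agreement. This is precisely the sense in which raw agreement statistics "leave much to be desired."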


  Conclusion


Minimizing the influence of rater errors is a persistent and considerable challenge for educators in the medical and health professions. Readers also are encouraged to remain cognizant of rater errors and do their best to ensure that minimal error emanating from subjective elements manifests in examinees' scores. To help accomplish this goal, a list of rater errors believed to be the most comprehensive ever assembled was presented. Readers are encouraged to examine each rater error type, reflect on the extent to which she/he has previously committed each error, and identify strategies for mitigating and preventing errors in future performance assessment scenarios.

Financial support and sponsorship

Nil.

Conflicts of interest

Dr. Royal is the editor-in-chief of Education in the Health Professions. All peer-review activities relating to this manuscript were independently performed by other members of the editorial board.



 
  References

1. Royal KD, Hecker KG. Rater errors in clinical performance assessments. J Vet Med Educ 2016;43:5-8.
2. Linacre JM. Many-Facet Rasch Measurement. Chicago, IL: MESA Press; 1989.
3. Johnson RL, Penny JA, Gordon B. Assessing Performance: Developing, Scoring, and Validating Performance Tasks. New York: Guilford Press; 2009.
4. Wesolowski BC, Wind SA, Engelhard G. Rater fairness in music performance assessment: Evaluating model data fit and differential rater functioning. Music Sci 2015;19:147-70.
5. Allen M. Assessing Academic Programs in Higher Education. San Francisco: Jossey-Bass; 2004.
6. Maki P. Assessing for Learning: Building a Sustainable Commitment across the Institution. Sterling: Stylus Publishing; 2004.
7. Brennan RL. Generalizability Theory. New York: Springer-Verlag; 2001.
8. Chiu CW. Scoring Performance Assessments Based on Judgements: Generalizability Theory. New York: Kluwer; 2001.
9. Linacre JM, Engelhard G, Tatum DS, Myford CM. Measurement with judges: Many-faceted conjoint measurement. Int J Educ Res 1994;21:569-77.
10. Lunz ME, Schumacker RE. Scoring and analysis of performance examinations: A comparison of methods and interpretations. J Appl Meas 1997;1:219-38.



 
 