Humans and Machines: Modeling the Stochastic Behavior of Raters in Educational Assessment

Valid assessment of educational progress requires measurement of student competencies by means both authentically reflective of the ways students learn and reliably operationalized on a large scale. Technology has played an ever-greater role in both instruction and assessment, and the automated processing of, and feedback on, naturally expressed student responses has matured significantly in scientific methodology and in adoption within educational practice. In particular, automated scoring of student responses in large-scale assessments has enabled the inclusion of more authentic, open-ended response formats, where the economic cost of human scoring would otherwise preclude such items in favor of multiple-choice or other selected-response, rather than constructed-response, formats. The mix of human and machine scoring presents significant challenges in the interpretation and summarization of rating data in educational assessment. This presentation reviews the landscape, focuses on specific methodological challenges of managing unreliability and bias as well as combining information from multiple raters, and discusses practical applications of the hierarchical rater model (HRM) framework in addressing these challenges.
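The HRM's central idea is that each observed rating is a noisy, possibly biased signal of an "ideal" rating determined by the response itself, so rater severity (bias) and rater inconsistency (spread) can be modeled separately. The following is a minimal sketch of that rater layer only; the score scale, the bias and spread values, and the two hypothetical raters are illustrative assumptions, not material from the talk:

```python
import math
import random

# Illustrative sketch of the HRM's rater layer: an "ideal" category xi
# reflects the quality of a response; a rater's observed rating is drawn
# from a discretized normal centered at xi shifted by that rater's
# severity bias, with spread reflecting the rater's unreliability.

CATEGORIES = range(5)  # score categories 0..4 (assumed scale)

def rating_probs(ideal, bias, sd):
    """P(observed rating k | ideal category) for one rater."""
    weights = [math.exp(-0.5 * ((k - (ideal + bias)) / sd) ** 2)
               for k in CATEGORIES]
    total = sum(weights)
    return [w / total for w in weights]

def simulate_rating(ideal, bias, sd, rng):
    """Draw one observed rating from the rater's response distribution."""
    return rng.choices(list(CATEGORIES),
                       weights=rating_probs(ideal, bias, sd))[0]

rng = random.Random(0)
ideal = 2  # category an error-free, unbiased rater would assign

# Two hypothetical raters: lenient and consistent vs. severe and noisy.
lenient = [simulate_rating(ideal, +0.5, sd=0.5, rng=rng) for _ in range(1000)]
severe = [simulate_rating(ideal, -1.0, sd=1.5, rng=rng) for _ in range(1000)]
print(sum(lenient) / len(lenient), sum(severe) / len(severe))
```

Averaging many simulated ratings shows the lenient rater's mean drifting above the ideal category and the severe rater's mean falling below it, which is the kind of systematic bias the HRM is designed to separate from student proficiency.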

Richard Patz is a visiting scholar conducting research in educational measurement. He earned a doctoral degree in statistics at Carnegie Mellon University and has held a variety of scientific and executive roles in testing organizations. He recently served ACT in several capacities, including chief measurement officer and CEO of its assessment technologies subsidiary. Richard has published research on statistical methods for item response data, assessment design and development, and measurement technology.

Tuesday, February 13, 2018 - 2:00pm
Patz presentation (PDF, 1.79 MB)