# Peter Halpin Workshops

On behalf of the BEAR Center, please join us for a series of workshops presented by visiting faculty Peter Halpin of New York University. Professor Halpin’s research focuses on psychometrics (confirmatory factor analysis, item response theory, latent class analysis) as well as statistical methods for complex and technology-enhanced educational assessments. His work has been published in methodological journals including Psychometrika, Structural Equation Modeling, and Multivariate Behavioral Research, as well as general interest journals including Educational Researcher.

The workshop schedule is below:

Monday, April 18

1 - 4pm

2326 Tolman

Item Design for Assessments Involving Collaboration

Peter F. Halpin (joint work with Yoav Bergner)

Overview: The use of collaboration and group work for assessment purposes has a relatively long history, and also features prominently in current initiatives concerning the measurement of "21st century skills." However, fundamental questions remain about how to design assessments involving collaboration. In this workshop I'll discuss item design, focusing on various strategies for deriving collaborative "two-player" items from conventional "one-player" items. Then I'll demo an XBlock for Open edX (currently in Beta!) that allows small groups to collaborate using online chat while writing an assessment. The demo will involve workshop participants teaming up to write a short collaborative assessment, which will provide some of the data that we will analyze in later workshops.

Details: One central issue in the assessment of collaborative problem solving (CPS) is whether and how to simultaneously measure student performance in a traditional content domain, such as math or science, in conjunction with CPS. For example, the PISA 2015 CPS assessment did not evaluate content domain knowledge, but conceptualized collaboration in a general problem solving context. On the other hand, recent reforms to educational standards have often called for the incorporation of collaboration and other "non-cognitive skills" within existing curricula. In this workshop, I'll address the problem of designing assessments that involve CPS but are anchored in a content domain, specifically mathematics.

To make things concrete, I'll consider the following question: How can a conventional "one-player" mathematics test question be adapted to a "two-player" collaborative context? In pragmatic terms, the goal is to arrive at a number of recipes for creating collaborative tasks from existing assessment materials. Clearly, this is not an ideal approach to designing group-worthy tasks. However, the reason for starting with existing assessment materials is to retain their strengths (e.g., established psychometric methods), while building towards more authentic and meaningful assessment contexts. The resulting tasks will necessarily represent a compromise between genuine group work and what we can measure well.

For example, one easy way to build a collaborative component into an existing mathematics assessment is just to change the instructions (e.g., "work with a partner") while retaining the assessment materials. A slightly more interesting task might incorporate elements of a jigsaw or hidden profile. The basic idea is that each student sees some incomplete portion of the item stimulus, and must share the information that he or she possesses to arrive at a solution. A third task type involves students collaborating to request the information that they want to use to answer a question (e.g., in the form of hints). When hints are devised well, this invites students to co-construct the solution path. A fourth type of item involves questions with vector-valued answers. For example, instead of asking students to determine whether a line with a given slope and intercept intersects a certain point, we can turn the problem on its head by providing the point and asking one student to provide the slope and another the intercept. This type of task requires students to collaborate while providing an answer, not simply while obtaining the information used to provide an answer.
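To illustrate the fourth item type, here is a minimal Python sketch of how a slope-and-intercept item might be scored. The function name `score_line_item`, the tolerance, and the example point are all hypothetical and chosen only for illustration; they are not taken from the Open edX platform used in the workshop.

```python
import math

def score_line_item(slope, intercept, point, tol=1e-9):
    """Return True if the line y = slope * x + intercept passes through `point`.

    `slope` comes from one student and `intercept` from the other, so the
    response is only correct when the two answers are consistent.
    """
    x, y = point
    return math.isclose(slope * x + intercept, y, abs_tol=tol)

# Hypothetical item: find a line through the point (2, 7).
target = (2.0, 7.0)
print(score_line_item(3.0, 1.0, target))  # True: 3 * 2 + 1 = 7
print(score_line_item(3.0, 2.0, target))  # False: 3 * 2 + 2 = 8
```

Because many (slope, intercept) pairs are correct, neither student can answer well without knowing what the partner intends to submit, which is what makes the answer itself collaborative.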

After reviewing the theory and practical implementation of each item type, I'll provide a demo of a web-based platform built on Open edX that allows for synchronous chat among small groups. Workshop participants will be invited to author their own items, and will be asked to team up to write a short collaborative assessment, which will provide some of the data that we will analyze in later workshops.

Tuesday, April 19

9am - noon

3515 Tolman

Modeling the Effects of Collaboration on Student Performance

Peter F. Halpin (joint work with Yoav Bergner)

Overview: This workshop addresses the analysis of "outcome data" in assessments that involve collaboration among students. I'll start with the following question: When pairs of individuals work together on a conventional educational assessment, how does their collective performance differ from what would be expected of the two individuals working independently? I'll review past work on the study of small groups, build on this to develop an IRT-based approach, and then consider extensions to assessments that involve non-conventional item types (see workshop: "Item Design for Assessments Involving Collaboration"). After the review, we will analyze our data from the previous workshop.

Details: Perhaps the simplest method for incorporating collaboration into an existing assessment is just to change the instructions while retaining the rest of the assessment materials. Concretely, two students could be presented with one copy of a math test and instructed that their performance will be evaluated based only on what they record on the test form. From the perspective of group-worthy tasks, this is a worst-case scenario. From the perspective of psychometric modeling, this is the easiest case to deal with. So I'll start with this situation and consider the implications in a standard IRT framework.

First we need to define what it means for a dyad to get a test item correct. I'll cover various scoring rules and their precedents in the literature on small groups and teamwork. Then I'll provide some definitions of successful and unsuccessful collaborative outcomes, and also some specific models of successful collaboration. Next I'll translate these models into a standard IRT framework, which allows for a consideration of the implications of the different models for assessment design. Here we will be concerned with questions such as the following: In order to assess whether team A is collaborating according to model B, what type of questions should we ask them? Finally, I'll talk about how to test the various models using a likelihood ratio approach, estimate effect sizes for the effect of collaboration on test performance, and review some empirical results.
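As a rough illustration of what a baseline for "two individuals working independently" could look like in an IRT setting, the Python sketch below uses a two-parameter logistic (2PL) item response function and one possible scoring rule: the dyad is credited if at least one member would have answered correctly alone. Both the 2PL choice and this scoring rule are assumptions made for the example and are not necessarily the models presented in the workshop; the function names are hypothetical.

```python
import math

def p_correct(theta, a, b):
    """2PL probability that a student with ability `theta` answers an item
    with discrimination `a` and difficulty `b` correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def p_dyad_independent(theta1, theta2, a, b):
    """Probability that at least one of two students answers correctly,
    assuming they respond independently (an assumed baseline rule)."""
    p1 = p_correct(theta1, a, b)
    p2 = p_correct(theta2, a, b)
    return 1.0 - (1.0 - p1) * (1.0 - p2)

# Two students of average ability on an item of average difficulty.
print(p_correct(0.0, 1.0, 0.0))                # 0.5
print(p_dyad_independent(0.0, 0.0, 1.0, 0.0))  # 0.75
```

Observed dyad performance could then be compared against a baseline of this kind, with departures in either direction read as candidate evidence of more or less successful collaboration.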

In the second part of the workshop, we will go over the finer details of the analyses using the data from our previous workshop on item design.

Wednesday, April 20

1 - 4pm

2320 Tolman

Measuring Student Engagement During Collaboration

Peter F. Halpin (joint work with Alina A. von Davier)

Overview: This workshop addresses the analysis of "process data" in assessments that involve collaboration among students. I'll start by talking about how to interpret process data from a psychometric perspective, then I'll give an overview of a particular modeling framework that I've been working on, based on the Hawkes process. After considering the statistical set-up, we'll do some data analyses with an R package I'm developing for the estimation of Hawkes processes. Many of the details are in the attached paper, which is currently under review at JEM.

Details: I'll begin by talking about alternatives to the assumption of local independence that can be useful for defining temporally complex tasks. Then I'll decompose the statistical dependence in a collaborative performance assessment into a) a part that depends on interactions between students (inter-individual dependence), and b) an additional part that depends only on the actions of individual students considered in isolation (intra-individual dependence). This provides a general set-up for modeling inter-individual dependence in performance assessments that involve collaboration.

Next I'll provide an overview of temporal point processes, and specifically the Hawkes process as a parametric modeling framework that captures these two sources of dependence. I'll provide a review of some basic results on specification, estimation, and goodness-of-fit for the Hawkes process, but I'll keep the focus on application of the model.
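For readers new to the model, the following Python sketch computes the conditional intensity of a multivariate Hawkes process with an exponential kernel, a common textbook parameterization. The parameter names (`mu`, `alpha`, `beta`) and the function name are assumptions for this example and do not describe the API of the R package discussed in the workshop.

```python
import math

def hawkes_intensity(t, i, events, mu, alpha, beta):
    """Conditional intensity of stream `i` at time `t`.

    events   : list of lists; events[j] holds the event times of stream j
    mu       : baseline rate for each stream
    alpha    : alpha[i][j] is the excitation of stream i by events of stream j
               (i == j: intra-individual, i != j: inter-individual)
    beta     : decay rate of the exponential kernel
    """
    rate = mu[i]
    for j, times in enumerate(events):
        for s in times:
            if s < t:  # only past events contribute
                rate += alpha[i][j] * beta * math.exp(-beta * (t - s))
    return rate

# Student 0 acts at t = 1.0; student 1 has not acted yet.
events = [[1.0], []]
mu = [0.5, 0.5]
alpha = [[0.2, 0.3], [0.4, 0.1]]
print(hawkes_intensity(2.0, 1, events, mu, alpha, beta=1.0))
```

In this sketch the off-diagonal entries of `alpha` carry the inter-individual dependence: a larger `alpha[1][0]` means that an action by student 0 raises the short-term rate of actions by student 1 more strongly.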

In the present application, the Hawkes process is useful for inferring whether the actions of one student are associated with increased probability of further actions by his or her partner(s) in the near future. This leads to an intuitive notion of engagement among collaborators. I'll present a model-based index that can be used to quantify the level of engagement exhibited by individual team members, and show how this can be aggregated to the team level. I'll also present some preliminary results about the standard error of the proposed engagement index, which allows for considerations about how to design tasks such that engagement can be measured reliably. Finally, I'll summarize some empirical results from pilot data.

After all that, I'll introduce an R package that I am working on, and we can go through the source code together, talk about issues in estimation, and run some analyses.