Berkeley Evaluation & Assessment Research (BEAR) Center
Director: Mark Wilson
Convener: Carolyn Hofstetter
Coordinator: Deborah Peres


BEAR Seminar, Spring 2005

Unless otherwise specified, events take place on Tuesdays, from 2-4 PM at:
UC Berkeley, Graduate School of Education
3507 Tolman Hall

Date | Speaker(s) | Title
Feb 1 (special time and room: 12:30 - 1:45 in 5509 Tolman) | Derek Briggs, University of Colorado | Generalizability in Item Response Modeling
Feb 15 | Kathleen Scalise, UCB | BEAR CAT: Toward a Theoretical Basis for Dynamically Driven Content in Computer-Mediated Environments
March 1 | Diane Allen | Item Response Modeling in Behavioral Research
March 15 | Linda Woodward, UCB; Kathy Long, UCB | Assessing Science Knowledge (ASK)
March 29 (special room: 2515) | Dylan Wiliam, ETS | Formative Assessment and the Regulation of Learning
Apr 26 (special room: 2515) | Joan Herman, UCLA; Steve Schneider, WestEd; Mark Wilson, UCB; Carlos Ayala, Sonoma State | Building Science Assessment Systems That Serve Accountability and Student Learning: The CAESL Model
May 3 (special room: 2515) | Tom Gumpel, Virginia Commonwealth U.; Parisa Muller, UCB | Bullies and their victims: The measurement of peer aggression and victimization
May 10 | Yiyu Xie, UCB | Three Studies of Person by Item Interactions in International Achievement Tests
May 17 (Room 2515 Tolman) | Jose Felipe Martinez, UCLA | A Multilevel Study of the Effects of Opportunity to Learn (OTL) on Student Reading Achievement: Issues of Measurement, Equity, and Validity


Feb 1 SPECIAL TIME: 12:30 - 1:45 Generalizability in Item Response Modeling

Derek Briggs
University of Colorado at Boulder

The process of measuring a latent variable is a mix of development and design. Item Response Theory (IRT) is a useful tool for the analysis and scaling of test facets. Item response models can help test developers in characterizing the items, raters or forms to be included in a measurement instrument. Item response modeling is typically less useful in addressing design questions pertaining to the relative variability of facets of test design. For such decisions, particularly in the context of multi-faceted measurement instruments, Generalizability Theory (GT) can play an important role.

As theoretical models, the two approaches seem, on the surface at least, incompatible. Brennan (2001), for example, writes "Generalizability Theory is primarily a sampling model, whereas Item Response Theory is principally a scaling model." Nonetheless, because each approach can provide information fundamental to the development and design of measurement instruments, one would expect to see IRT and GT applied in tandem, both in large-scale testing and smaller scale efforts (see, for example, Bock, Brennan & Muraki, 2002). In practice, the use of IRT alone seems more prevalent in the measurement literature than the sequential use of both IRT and GT. When the variability of the different facets in a multi-faceted test is not explicitly taken into account, use of IRT alone may lead to certain issues being less well examined than would be desirable, and of course, this is particularly important when high stakes decisions are being made.

In this presentation we discuss an approach we call Generalizability in Item Response Modeling (GIRM). The foundation for this approach was first supplied in unpublished work by Michael Kolen and Deborah Harris. The GIRM approach essentially incorporates the sampling model of GT into the scaling model of IRT by making distributional assumptions about the relevant measurement facets. By specifying a random effects measurement model, and taking advantage of the flexibility of Markov Chain Monte Carlo (MCMC) estimation methods, it becomes possible to estimate GT variance components concurrently with traditional IRT parameters. It is shown how GT and IRT can be linked together, in the context of a single facet measurement design with binary items, and a multi-faceted design with polytomous items. Using simulated data and the software WinBUGS, the GIRM approach is shown to produce results comparable to those from a standard GT analysis, but with certain advantages due to the incorporation of the IRT formulation.

To see the full paper on which this talk was based, look at: http://spot.colorado.edu/~briggsd/GIRM_BriggsWilson_020105.pdf
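
As a point of reference for the "standard GT analysis" mentioned in the abstract, the following is a minimal sketch of a classical, ANOVA-based G study for a single-facet persons-by-items design with binary items. It is not the GIRM approach itself (which the abstract says relies on MCMC estimation in WinBUGS). The function names, the Rasch-type data generator, and the sample sizes are assumptions made only for illustration.

```python
import numpy as np


def g_study_p_by_i(scores):
    """Classical ANOVA variance components for a crossed persons x items design.

    `scores` is a 2-D array with one row per person and one column per item.
    Negative variance estimates are truncated at zero, a common convention.
    """
    x = np.asarray(scores, dtype=float)
    n_p, n_i = x.shape
    grand = x.mean()
    person_means = x.mean(axis=1)
    item_means = x.mean(axis=0)

    # Sums of squares for persons, items, and the residual (p x i, error).
    ss_p = n_i * np.sum((person_means - grand) ** 2)
    ss_i = n_p * np.sum((item_means - grand) ** 2)
    ss_res = np.sum((x - grand) ** 2) - ss_p - ss_i

    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))

    # Expected-mean-square solutions for the variance components.
    var_res = ms_res
    var_p = max((ms_p - ms_res) / n_i, 0.0)
    var_i = max((ms_i - ms_res) / n_p, 0.0)

    # Generalizability coefficient for relative decisions with n_i items.
    denom = var_p + var_res / n_i
    g_coef = var_p / denom if denom > 0 else float("nan")
    return {
        "var_person": var_p,
        "var_item": var_i,
        "var_residual": var_res,
        "g_coefficient": g_coef,
    }


def simulate_rasch(n_persons, n_items, seed=0):
    """Simulate binary responses from a Rasch-type model (illustrative assumption)."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(0.0, 1.0, size=(n_persons, 1))  # person abilities
    b = rng.normal(0.0, 1.0, size=(1, n_items))        # item difficulties
    prob = 1.0 / (1.0 + np.exp(-(theta - b)))
    return (rng.random((n_persons, n_items)) < prob).astype(int)


scores = simulate_rasch(n_persons=500, n_items=20)
result = g_study_p_by_i(scores)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

For this single-facet design, the generalizability coefficient computed this way coincides with Cronbach's alpha whenever the person variance estimate is not truncated, which gives a quick sanity check on the output.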


Feb 15 BEAR CAT: Toward a Theoretical Basis for Dynamically Driven Content in Computer-Mediated Environments

Kathy Scalise
UC Berkeley

This presentation will describe a new computer adaptive assessment approach for data driven content (DDC) in the UC Berkeley "Smart Homework" implementation of ChemQuery, an NSF-funded assessment project, used here for autonomous learning. The intent of dynamic learning with data driven content (DDC) in computer-mediated learning environments is to interactively adapt the flow of content so that each student receives individualized learning materials and interventions more suited to their needs than in traditional one-size-fits-all applications. In research presented in this paper, measurement technologies similar to some models underlying computer-adaptive testing approaches (CAT) are used to map knowledge spaces and drive computer-mediated learning environments with DDC. BEAR (Berkeley Evaluation and Assessment Research) extensions to CAT will be presented, which may direct the flow and difficulty not only of assessments but also of other e-learning materials and feedback to tailor the learning experience to student needs. A measurement model, the iota model, is introduced and tested as a multifacet Rasch model to estimate "pathway" parameters through BEAR CAT testlets. Testlets are small bundles of items that act as questions and follow-up probes to interactively measure and assign scores to students. The function of the bundle measurement models applied is mathematically equivalent to the semi-linear neural net model. Research questions consider whether the iota model can serve as a valid and reliable item design to collect data and implement interactions in data-driven content, whether path scores through the testlet modeled to a cognitive framework can be considered equivalent, and how three testlet designs compare in fit and other measurement considerations.

View the PowerPoint slides from this presentation.


March 1 Item Response Modeling in Behavioral Research

Diane Allen
UC Berkeley

This project, related to the National Cancer Institute (NCI), aims to show how item response modeling (IRM) techniques might be of use to behavioral scientists, since little IRM has been applied in this area to date. Patient-reported attitudes and outcomes are of particular interest to behavioral scientists, especially when trying to determine the mediating or moderating effects of factors such as self-efficacy, self-regulation of motivation, and decisional balance on behavior change that occurs after intervention. The Institute provided the data sets, a few references regarding the instruments used, and some ideas about what analyses might be of interest. Since the data were all collected prior to the involvement of IRM analysts, exploration of the data in various configurations was required, and some less than ideal circumstances had to be acknowledged (e.g., wording changes within items across sites). The intent of the project was to determine which data sets best illustrated the principles and benefits of IRM techniques, and to report the analyses in a way that could be well understood by an audience with little IRM experience. Three presentations at a conference in June 2004 and several papers have resulted. The presentations and papers introduce IRM using the NCI data and demonstrate, for example, instrument evaluation, multidimensional analysis, equating, and differential item functioning (DIF) analyses.

View the PowerPoint slides from this presentation.


March 15 Assessing Science Knowledge (ASK)

Linda Woodward, BEAR Center
Kathy Long, Lawrence Hall of Science

The Assessing Science Knowledge (ASK) project is designed to define, field test, and validate effective assessment tools and techniques to be used by grade 3 to 6 classroom teachers to assess, guide, and confirm student learning in science. The assessments are being conceptualized, developed, and refined using one exemplary science-education program, the Full Option Science System (FOSS).

During the first two years of the project, the BEAR Center is helping with the development and refinement of the frameworks, progress maps, items, scoring guides, and other elements of the system. We are providing guidance and support in the psychometric data analysis and, where appropriate, also performing such analysis and assisting in the interpretation of the results.


March 29
SPECIAL ROOM: 2515
Formative Assessment and the Regulation of Learning

Dylan Wiliam, ETS

In this seminar, Dylan Wiliam, Director of the Learning and Teaching Research Center at ETS, will discuss the role that assessment can play in supporting learning and as a powerful catalyst in promoting effective teacher professional development. This will entail discussion of the nature of teacher expertise, the failure of much educational research to impact practice, and an embedding of work on formative assessment within the wider theoretical field of the regulation of learning, as proposed by Perrenoud.

View the PowerPoint slides from this presentation.
Look at papers available for downloading on Dylan Wiliam's web site.


April 26
SPECIAL ROOM: 2515
Building Science Assessment Systems That Serve Accountability and Student Learning: The CAESL Model

Carlos Ayala, Sonoma State
Joan Herman, UCLA
Steve Schneider, WestEd
Mark Wilson, UCB

The No Child Left Behind legislation offers both great opportunity and substantial peril for science education in the United States. By requiring that science join reading and mathematics as annual subjects of states’ standards-based assessment by the year 2007, NCLB pushes science to the center stage of public attention and helps to assure that it gets the priority it deserves in school curriculum. With the right kinds of assessments, NCLB can help to promote more effective science teaching and learning that well prepares students for success in the 21st Century, as envisioned by the National Science Education Standards (1995).

Yet history shows many examples of good intentions gone awry, and considerable evidence shows the limitations of accountability assessments in supporting meaningful reform goals and informing classroom practice (Herman, 2004). There is danger of state science assessments pushing teaching and learning in undesirable directions, counterproductive to the goals of scientific literacy. Moreover, if we do not extend the state of the art, we may miss the opportunity to have science assessments that truly support classroom teaching and learning, formative assessments that represent a known and powerful strategy for improving student learning (Bell & Cowie, 2001; Black & Wiliam, 1998; 2001a; Shepard, 2000).

The national Center for the Assessment and Evaluation of Student Learning (CAESL) is committed to the design of new assessment systems that meet this challenge. Building on current knowledge about the nature of quality assessment, the shortcomings of existing systems and the uniqueness of science as a discipline, the CAESL Assessment model takes as its pinnacle specific goals for student learning. These not only serve as the common focal points for state, district and/or school, and classroom-embedded assessments, but also are the basis for a consistent system for measuring progress across all levels.

Symposium presentations present background on the CAESL model, operationalize its key innovative features, and present the results of a feasibility study conducted in middle school science, addressing the following research questions:
• How can cognitive theory be used to systematically link assessments to the core types of learning that constitute science?
• How can psychometric advances in progress variables and link tests be used to provide greater coherence across national/state standards, large-scale assessments and classroom assessments?

You can view the slides and papers from this presentation on the CAESL website: http://www.edgateway.net/cs/caesl/print/docs/558


May 3
SPECIAL ROOM: 2515
Bullies and their victims: The measurement of peer aggression and victimization

Tom Gumpel, Virginia Commonwealth U.
Parisa Muller, UCB

In this talk, we discuss the development and validation of the School Violence Inventory (SVI) and, through five different studies, its use in understanding participant roles in school aggression/bullying and victimization. Following an extensive development phase, which delineated a 3 x 3 conceptual model of school-based aggression, the self-report SVI was created. The instrument focuses on physical, relational, and sexual aggression as well as physical, relational, and sexual victimization, and provides a profile for each respondent on each of these six dimensions. It has also been used to measure the treatment effectiveness of school violence interventions.

In the first study, middle and high school students in Israel (N = 10,383) completed the SVI and were designated as uninvolved, pure-aggressors, pure-victims, and mixed aggressor-victims for physical, relational, and sexual aggression and physical, relational, and sexual victimization. Between-group comparisons showed a main effect for grade level for all types of aggression, with children in 10th and 11th grades showing the highest levels of all six types of aggression/victimization. Multiple hierarchical regressions showed different trajectories for each of the four participant roles. In the second study (N = 1004), special education status was examined for each of the six dimensions of the SVI. Children with ADHD are significantly more involved in both aggression and victimization than their peers with and without learning disabilities. In the third study (N = 1398), clinical depression and post-traumatic stress disorder (PTSD) were examined. Physical and relational victimization was significantly associated with all types of PTSD symptoms, depressive symptoms, and disruptions in social relationships, after accounting for the contributions of school, gender, and grade. PTSD symptoms partially mediate the association of physical victimization and both depressive symptoms and disruptions in social relationships; PTSD symptoms do not mediate the association of relational victimization and these outcomes. The fourth study (N = 3950) examined the relationships between being an extreme bully and psychopathy symptoms. Adolescents exhibiting psychopathic tendencies tended to be uninvolved in extreme school aggression. However, extreme sexual aggression did predict psychopathy. The fifth study (N = 6741) examined the influence of respondent characteristics vis-à-vis school influences using a two-level HLM model.
For all types of aggression, significantly higher amounts of SVI variance are explained on a school-wide level than on an individual level, bringing into view a dispositional vs. situational understanding of school based violence. Implications for future research on both psychometric and substantive issues in the measurement of school based violence and victimization are discussed.


May 10 Three Studies of Person by Item Interactions in International Achievement Tests

Yiyu Xie
UC Berkeley

This presentation describes three independent investigations of interactions between persons and items associated with cultural differences in international assessments of educational achievement. It uses some modified item response models to account for influential elements that are valued differently among nations. The research tackles three specific issues: (1) whether the measurement unit used in the mathematics assessment (imperial versus metric) has an impact on American students’ performance; (2) whether the tendency to guess on multiple-choice items is constant across nations; and (3) how to make substantive interpretations of items that exhibit differential item functioning (DIF). Each investigation has its own emphasis, varying from test design to model comparison. Together they use a wide range of international assessment data, including data for selected countries from the Third International Mathematics and Science Study (TIMSS) and the Program for International Student Assessment (PISA). The research also employs a variety of computer programs, namely ConQuest, WinBUGS, and the NLMIXED procedure in SAS; all of these programs offer the flexibility to customize the item response models for many test situations. All three studies indicate that the interactions between examinees and items may vary systematically from country to country, and that item response models need to be carefully selected for the purpose of comparison in international assessments.
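
The abstract does not say which modified item response models were used. As one hedged illustration of the second issue, the sketch below uses the three-parameter logistic (3PL) model, a common way to represent guessing on multiple-choice items, and lets the lower asymptote differ by country. The function name, the country labels, and all parameter values are hypothetical.

```python
import math


def p_correct_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model.

    theta: person ability; a: discrimination; b: difficulty;
    c: lower asymptote, often read as a pseudo-guessing parameter.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical country-specific guessing parameters for the same item.
guessing_by_country = {"country_A": 0.15, "country_B": 0.25}

# For a low-ability examinee the response is driven mostly by guessing,
# so differences in c translate almost directly into differences in
# the probability of success.
low_ability = -3.0
probs = {
    country: p_correct_3pl(low_ability, a=1.0, b=0.0, c=c)
    for country, c in guessing_by_country.items()
}
for country, p in probs.items():
    print(f"{country}: P(correct | theta={low_ability}) = {p:.3f}")
```

If the guessing tendency were constant across nations, a single value of c per item would suffice; allowing it to vary is one way such a hypothesis could be examined.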


May 17
Room 2515 Tolman
A Multilevel Study of the Effects of Opportunity to Learn (OTL) on Student Reading Achievement: Issues of Measurement, Equity, and Validity

Jose Felipe Martinez
UCLA

The study employed three-level hierarchical linear modeling techniques to investigate the effects of opportunities to learn (OTL) reading and writing on student achievement. Three questions are the focus of the study. One, how are educational opportunities distributed across classrooms and schools? Two, are there differences between measures of student achievement in terms of their sensitivity to educational opportunities or instructional practices? And finally, what are the consequences of ignoring the classroom level and using two-level models in analyses of student achievement—in terms of the effects of the school environment and opportunities to learn on student achievement?

Student and teacher reports of OTL both reveal large differences between classrooms in terms of the OTL offered to students, but only minimal differences between schools. Results also indicate that within-classroom variability in student OTL reports is not simply the result of measurement error but also reflects true differences between students’ educational experiences. Student OTL reports are more powerful as predictors of student performance than global classroom reports provided by the teacher and constitute a useful source of information for the researcher investigating OTL effects. Results from three-level models indicate that the classroom environment is at least as important a determinant of student achievement as the larger school environment. Using a two-level model that ignores classroom nesting results in inflated estimates of the extent of direct school effects on student achievement. At the same time, however, the model underestimates the overall effect the educational system can have on the achievement of its students (the combined effect of classrooms and schools). In addition, the effects of OTL (the OTL slopes) vary considerably across classrooms but remain relatively stable across schools; this suggests that the classroom environment largely determines the extent to which students benefit from additional educational opportunities. Finally, contextual effects often discussed in the literature at the school level are instead observed at the classroom level when a three-level model is used. Two-level models therefore not only underestimate the total effects of schools, but also present a distorted picture of the mechanisms through which schools influence student achievement.
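
The consequence of ignoring the classroom level can be illustrated with a small simulation. The sketch below is not the study's analysis: it uses balanced simulated data and simple ANOVA (method-of-moments) variance components instead of fitted hierarchical linear models, and the function names, sample sizes, and standard deviations are all assumptions chosen for illustration.

```python
import numpy as np


def simulate_scores(n_schools, n_classes, n_students,
                    sd_school, sd_class, sd_student, seed=0):
    """Balanced three-level data: students within classrooms within schools."""
    rng = np.random.default_rng(seed)
    school = rng.normal(0.0, sd_school, size=(n_schools, 1, 1))
    classroom = rng.normal(0.0, sd_class, size=(n_schools, n_classes, 1))
    student = rng.normal(0.0, sd_student,
                         size=(n_schools, n_classes, n_students))
    return school + classroom + student


def variance_components(y):
    """ANOVA variance estimates with and without the classroom level."""
    n_s, n_c, n = y.shape
    grand = y.mean()
    school_means = y.mean(axis=(1, 2))
    class_means = y.mean(axis=2)

    ms_school = n_c * n * np.sum((school_means - grand) ** 2) / (n_s - 1)
    ms_class = (n * np.sum((class_means - school_means[:, None]) ** 2)
                / (n_s * (n_c - 1)))
    ms_within_class = (np.sum((y - class_means[:, :, None]) ** 2)
                       / (n_s * n_c * (n - 1)))
    # Two-level view: every student in a school is treated as exchangeable.
    ms_within_school = (np.sum((y - school_means[:, None, None]) ** 2)
                        / (n_s * (n_c * n - 1)))

    three_level = {
        "school": (ms_school - ms_class) / (n_c * n),
        "classroom": (ms_class - ms_within_class) / n,
        "student": ms_within_class,
    }
    two_level = {
        "school": (ms_school - ms_within_school) / (n_c * n),
        "student": ms_within_school,
    }
    return three_level, two_level


y = simulate_scores(n_schools=50, n_classes=4, n_students=20,
                    sd_school=0.5, sd_class=0.7, sd_student=1.0)
three_level, two_level = variance_components(y)
print("three-level:", {k: round(v, 3) for k, v in three_level.items()})
print("two-level:  ", {k: round(v, 3) for k, v in two_level.items()})
```

In this balanced setting, part of the classroom variance is absorbed into the two-level school component, so the school estimate exceeds its three-level counterpart while remaining below the combined classroom and school variance, which mirrors the pattern described above.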

View the PowerPoint slides from this presentation.



BEAR Center
Graduate School of Education
University of California, Berkeley
Berkeley, CA 94720

© 2002-2008 BEAR Center