Berkeley Evaluation & Assessment Research Center | Director: Mark Wilson
Unless otherwise specified, events take place on Tuesdays,
from 2-4 PM at:
Yiyu Xie, UCB
Derek Briggs
The process of measuring
a latent variable is a mix of development and design. Item Response Theory
(IRT) is a useful tool for the analysis and scaling of test facets. Item
response models can help test developers in characterizing the items, raters
or forms to be included in a measurement instrument. Item response modeling
is typically less useful in addressing design questions pertaining to the
relative variability of facets of test design. For such decisions,
particularly in the context of multi-faceted measurement instruments,
Generalizability Theory (GT) can play an important role.
As theoretical models, the two approaches seem, on the surface at least,
incompatible. Brennan (2001), for example, writes "Generalizability Theory
is primarily a sampling model, whereas Item Response Theory is principally
a scaling model." Nonetheless, because each approach can provide information
fundamental to the development and design of measurement instruments, one
would expect to see IRT and GT applied in tandem, both in large-scale
testing and smaller scale efforts (see, for example, Bock, Brennan & Muraki,
2002). In practice, the use of IRT alone seems more prevalent in the
measurement literature than the sequential use of both IRT and GT. When
the variability of the different facets in a multi-faceted test is not
explicitly taken into account, use of IRT alone may lead to certain issues
being less well examined than would be desirable, and of course, this is
particularly important when high-stakes decisions are being made.
In this presentation we discuss an approach we call Generalizability
in Item Response Modeling (GIRM). The foundation for this approach was
first supplied in unpublished work by Michael Kolen and Deborah Harris.
The GIRM approach essentially incorporates the sampling model of
GT into the scaling model of
IRT by making distributional assumptions about the relevant measurement
facets. By specifying a random effects measurement model, and taking
advantage of the flexibility of Markov Chain Monte Carlo (MCMC) estimation
methods, it becomes possible to estimate GT variance components concurrently
with traditional IRT parameters. It is shown how GT and IRT can be linked
together in the context of a single-facet measurement design with binary
items and of a multi-faceted design with polytomous items. Using simulated
data and the software WinBUGS, the GIRM approach is shown to produce results
comparable to those from a standard GT analysis, but with certain advantages
due to the incorporation of the IRT formulation.
To see the full paper on which this talk was based, look at:
http://spot.colorado.edu/~briggsd/GIRM_BriggsWilson_020105.pdf
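As a rough illustration of the "standard GT analysis" that the abstract uses as its point of comparison, the sketch below simulates binary responses under a Rasch model and estimates variance components for a single-facet, persons-by-items design from ANOVA mean squares. It is not the GIRM approach itself, which the abstract says is estimated with MCMC in WinBUGS. The function names, sample sizes, and the standard-normal distributions for ability and difficulty are all assumptions made for this example.

```python
import math
import random


def simulate_rasch(n_persons, n_items, seed=0):
    """Simulate 0/1 responses with P(x = 1) = logistic(theta_p - b_i).

    Abilities and difficulties are drawn from N(0, 1); this is an
    assumption of the sketch, not something stated in the abstract.
    """
    rng = random.Random(seed)
    thetas = [rng.gauss(0.0, 1.0) for _ in range(n_persons)]
    difficulties = [rng.gauss(0.0, 1.0) for _ in range(n_items)]
    return [
        [1 if rng.random() < 1.0 / (1.0 + math.exp(-(t - b))) else 0
         for b in difficulties]
        for t in thetas
    ]


def variance_components(x):
    """ANOVA estimates for a crossed persons x items design (one score per cell)."""
    n_p, n_i = len(x), len(x[0])
    grand = sum(map(sum, x)) / (n_p * n_i)
    p_means = [sum(row) / n_i for row in x]
    i_means = [sum(row[j] for row in x) / n_p for j in range(n_i)]

    ms_p = n_i * sum((m - grand) ** 2 for m in p_means) / (n_p - 1)
    ms_i = n_p * sum((m - grand) ** 2 for m in i_means) / (n_i - 1)
    ss_res = sum(
        (x[p][j] - p_means[p] - i_means[j] + grand) ** 2
        for p in range(n_p)
        for j in range(n_i)
    )
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))

    # Solve the expected-mean-square equations for the components.
    return {
        "person": (ms_p - ms_res) / n_i,
        "item": (ms_i - ms_res) / n_p,
        "residual": ms_res,  # person-by-item interaction confounded with error
    }


def g_coefficient(components, n_items):
    """Generalizability coefficient for relative decisions over n_items items."""
    person = components["person"]
    return person / (person + components["residual"] / n_items)


if __name__ == "__main__":
    data = simulate_rasch(n_persons=500, n_items=20, seed=1)
    vc = variance_components(data)
    print(vc, g_coefficient(vc, n_items=20))
```

Note that these components are on the raw-score metric, whereas an IRT-based formulation works on the logit scale, so the numbers from this sketch are not directly comparable to IRT parameter estimates.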
Kathy Scalise
This presentation will describe a new computer adaptive assessment approach
for data driven content (DDC) in the UC Berkeley "Smart Homework"
implementation of ChemQuery, an NSF-funded assessment project, used here for
autonomous learning. The intent of dynamic learning with data driven content
(DDC) in computer-mediated learning environments is to interactively adapt
the flow of content so that each student receives individualized learning
materials and interventions more suited to their needs than in traditional
one-size-fits-all applications. In research presented in this paper,
measurement technologies similar to some models underlying computer-adaptive
testing approaches (CAT) are used to map knowledge spaces and drive
computer-mediated learning environments with DDC. BEAR (Berkeley Evaluation
and Assessment Research) extensions to CAT will be presented, which may
direct the flow and difficulty not only of assessments but also of other
e-learning materials and feedback to tailor the learning experience to
student needs. A measurement model, the iota model, is introduced and tested
as a multifacet Rasch model to estimate "pathway" parameters through BEAR
CAT testlets. Testlets are small bundles of items that act as questions and
follow-up probes to interactively measure and assign scores to students. The
function of the bundle measurement models applied is mathematically
equivalent to the semi-linear neural net model. Research questions consider
whether the iota model can serve as a valid and reliable item design to
collect data and implement interactions in data-driven content, whether path
scores through the testlet modeled to a cognitive framework can be
considered equivalent, and how three testlet designs compare in fit and
other measurement considerations.
Diane Allen
This project, related to the National Cancer Institute, aims to show how item response
modeling (IRM) techniques might be of use to behavioral scientists, since little IRM has
been applied in this area to date. Patient-reported attitudes and outcomes are of
particular interest to behavioral scientists, especially when trying to determine the
mediating or moderating effects of self-efficacy, self-regulation of motivation, and
decisional balance, for example, on behavior change that occurs after intervention.
The Institute provided the data sets and a few references regarding the instruments
used, and some ideas about what analyses might be of interest. Since the data were all
collected prior to the involvement of IRM analysts, exploration of the data in various
configurations was required, and some less than ideal circumstances had to be
acknowledged (e.g., wording changes within items across sites). The intent of this
project was to determine which data sets best illustrated the principles and benefits
of IRM techniques, and report the analyses in a way that could be well understood by
an audience with little IRM experience. Three presentations at a conference in
June 2004 and several papers have resulted. The presentations and
papers introduce IRM using the NCI data, and demonstrate, for example, instrument
evaluation, multidimensional analysis, equating, and DIF analyses.
Linda Woodward, BEAR Center
The Assessing Science Knowledge (ASK) project is designed to define, field test, and validate effective assessment tools and techniques to be used by grade 3 to 6 classroom teachers to assess, guide, and confirm student learning in science. The assessments are being conceptualized, developed, and refined using one exemplary science-education program, the Full Option Science System (FOSS).
During the first two years of the project, the BEAR Center is helping with the development and refinement of the frameworks, progress maps, items, scoring guides, and other elements of the system. We are providing guidance and support in the psychometric data analysis and, where appropriate, also performing such analysis and assisting in the interpretation of the results.
Dylan Wiliam, ETS
In this seminar, Dylan Wiliam, Director of the Learning and Teaching Research
Center at ETS, will discuss the role that assessment can play in supporting
learning, and as a powerful catalyst in promoting effective teacher
professional development. This will entail discussion of the nature of
teacher expertise, the failure of much educational research to impact
practice, and an embedding of work of formative assessment within the wider
theoretical field of the regulation of learning, as proposed by Perrenoud.
Carlos Ayala, Sonoma State
The No Child Left Behind legislation offers both great opportunity and substantial peril for science education in the United States. By requiring that science join reading and mathematics as annual subjects of states’ standards-based assessment by the year 2007, NCLB pushes science to the center stage of public attention and helps to assure that it gets the priority it deserves in school curriculum. With the right kinds of assessments, NCLB can help to promote more effective science teaching and learning that well prepares students for success in the 21st Century, as envisioned by the National Science Education Standards (1995).
Yet history shows many examples of good intentions gone awry, and considerable evidence shows the limitations of accountability assessments in supporting meaningful reform goals and informing classroom practice (Herman, 2004). There is danger of state science assessments pushing teaching and learning in undesirable directions, counterproductive to the goals of scientific literacy. Moreover, if we do not extend the state of the art, we may miss the opportunity to have science assessments that truly support classroom teaching and learning, formative assessments that represent a known and powerful strategy for improving student learning (Bell & Cowie, 2001; Black & Wiliam, 1998; 2001a; Shepard, 2000).
The national Center for the Assessment and Evaluation of Student Learning (CAESL) is committed to the design of new assessment systems that meet this challenge. Building on current knowledge about the nature of quality assessment, the shortcomings of existing systems and the uniqueness of science as a discipline, the CAESL Assessment model takes as its pinnacle specific goals for student learning. These not only serve as the common focal points for state, district and/or school, and classroom-embedded assessments, but also are the basis for a consistent system for measuring progress across all levels.
Symposium presentations present background on the CAESL model, operationalize its key innovative features, and present the results of a feasibility study conducted in middle school science, addressing the following research questions:
You can view the slides and papers from this presentation
on the CAESL website:
http://www.edgateway.net/cs/caesl/print/docs/558
Tom Gumpel, Virginia Commonwealth U.
In this talk, we discuss the development and validation of the School Violence Inventory (SVI) and its use to understand participant roles in school aggression/bullying and victimization through the discussion of five different studies. Following an extensive development phase which delineated a 3 x 3 conceptual model of school-based aggression, the self-report SVI was created. The instrument focuses on physical, relational, and sexual aggression as well as physical, relational, and sexual victimization and provides a profile for each respondent on each of these six dimensions and has also been used to measure treatment effectiveness of school violence interventions.
In the first study, middle and high school students in Israel (N = 10,383) completed the SVI and were designated as uninvolved, pure-aggressors, pure-victims, and mixed aggressor-victims for physical, relational, and sexual aggression and physical, relational, and sexual victimization. Between-group comparisons showed a main effect for grade level for all types of aggression, with children in 10th and 11th grades showing the highest levels of all six types of aggression/victimization. Multiple hierarchical regressions showed different trajectories for each of the four participant roles. In the second study (N = 1004), special education status was examined for each of the six dimensions of the SVI. Children with ADHD are significantly more involved in both aggression and victimization than their peers with and without learning disabilities. In the third study (N = 1398), clinical depression and post-traumatic stress disorder (PTSD) were examined. Physical and relational victimization was significantly associated with all types of PTSD symptoms, depressive symptoms, and disruptions in social relationships, after accounting for the contributions of school, gender, and grade. PTSD symptoms partially mediate the association of physical victimization and both depressive symptoms and disruptions in social relationships; PTSD symptoms do not mediate the association of relational victimization and these outcomes. The fourth study (N = 3950) examined the relationships between being an extreme bully and psychopathy symptoms. Adolescents exhibiting psychopathic tendencies tended to be uninvolved in extreme school aggression. However, extreme sexual aggression did predict psychopathy. The fifth study (N = 6741) examined the influence of respondent characteristics vis-à-vis school influences using a two-level hierarchical linear model (HLM).
For all types of aggression, significantly higher amounts of SVI variance are explained on a school-wide level than on an individual level, bringing into view a dispositional vs. situational understanding of school-based violence. Implications for future research on both psychometric and substantive issues in the measurement of school-based violence and victimization are discussed.
Yiyu Xie
This presentation describes three independent investigations of interactions between persons and items associated with cultural differences in international assessments of educational achievement. It uses modified item response models to account for influential elements that are valued differently among nations. The research tackles three specific issues: (1) whether the measurement unit used in the mathematics assessment (imperial versus metric) has an impact on American students’ performance; (2) whether the tendency to guess in assessments that adopt the multiple-choice item format is constant across nations; and (3) how to make substantive interpretations of items that exhibit differential item functioning (DIF). Each investigation has its own emphasis, varying from test design to model comparison. Together they use a wide range of international assessment data, including data for selected countries from the Third International Mathematics and Science Study (TIMSS) and the Program for International Student Assessment (PISA). The research also employs a variety of computer programs, namely ConQuest, WinBUGS, and the NLMIXED procedure in SAS; all of these programs allow the item response models to be customized for many test situations. All three studies indicate that the interactions between examinees and items may vary systematically from country to country, and that item response models need to be carefully selected for the purpose of comparison in international assessments.
Jose Felipe Martinez
The study employed three-level hierarchical linear modeling techniques to investigate the effects of opportunities to learn (OTL) reading and writing on student achievement. Three questions are the focus of the study.
One, how are educational opportunities distributed across classrooms and schools?
Two, are there differences between measures of student achievement in terms of their sensitivity to educational opportunities or instructional practices?
And finally, what are the consequences of ignoring the classroom level and using two-level models in analyses of student achievement—in terms of the effects of the school environment and opportunities to learn on student achievement?
Student and teacher reports of OTL both reveal large differences between classrooms in terms of the OTL offered to students, but only minimal differences between schools. Results also indicate that within-classroom variability in student OTL reports is not simply the result of measurement error but also reflects true differences between students’ educational experiences. Student OTL reports are more powerful as predictors of student performance than global classroom reports provided by the teacher and constitute a useful source of information for the researcher investigating OTL effects. Results from three-level models indicate that the classroom environment is at least as important a determinant of student achievement as the larger school environment. Using a two-level model that ignores classroom nesting results in inflated estimates of the extent of direct school effects on student achievement. At the same time, however, the model underestimates the overall effect the educational system can have on the achievement of its students (the combined effect of classrooms and schools). In addition, the effects of OTL (the OTL slopes) vary considerably across classrooms but remain relatively stable across schools; this suggests that the classroom environment largely determines the extent to which students benefit from additional educational opportunities. Finally, contextual effects often discussed in the literature at the school level are instead observed at the classroom level when a three-level model is used. Two-level models therefore not only underestimate the total effects of schools, but also present a distorted picture of the mechanisms through which schools influence student achievement.
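The claim that ignoring classroom nesting inflates the school-level estimate while understating the combined effect of classrooms and schools can be seen in a small variance-decomposition sketch. The code below uses simple ANOVA (method-of-moments) estimators on balanced toy data rather than the hierarchical linear models used in the study. The function names and the scores are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def mean(values):
    return sum(values) / len(values)


def three_level_components(schools):
    """Variance components for students nested in classrooms nested in schools.

    `schools` is a list of schools, each a list of classrooms, each a list of
    student scores. The design is assumed to be balanced.
    """
    n_s, n_c, n = len(schools), len(schools[0]), len(schools[0][0])
    class_means = [[mean(c) for c in school] for school in schools]
    school_means = [mean(cms) for cms in class_means]
    grand = mean(school_means)

    ms_school = n_c * n * sum((m - grand) ** 2 for m in school_means) / (n_s - 1)
    ms_class = n * sum(
        (cm - sm) ** 2
        for sm, cms in zip(school_means, class_means)
        for cm in cms
    ) / (n_s * (n_c - 1))
    ms_student = sum(
        (y - cm) ** 2
        for school, cms in zip(schools, class_means)
        for classroom, cm in zip(school, cms)
        for y in classroom
    ) / (n_s * n_c * (n - 1))

    return {
        "school": (ms_school - ms_class) / (n_c * n),
        "classroom": (ms_class - ms_student) / n,
        "student": ms_student,
    }


def two_level_components(schools):
    """Same data, but classroom membership is ignored (students within schools)."""
    pooled = [[y for classroom in school for y in classroom] for school in schools]
    n_s, m = len(pooled), len(pooled[0])
    school_means = [mean(s) for s in pooled]
    grand = mean(school_means)

    ms_school = m * sum((sm - grand) ** 2 for sm in school_means) / (n_s - 1)
    ms_within = sum(
        (y - sm) ** 2 for s, sm in zip(pooled, school_means) for y in s
    ) / (n_s * (m - 1))

    return {"school": (ms_school - ms_within) / m, "student": ms_within}


# Hypothetical scores: 2 schools x 2 classrooms x 2 students.
toy_schools = [
    [[0, 2], [4, 6]],
    [[4, 6], [8, 10]],
]

if __name__ == "__main__":
    print(three_level_components(toy_schools))
    print(two_level_components(toy_schools))
```

For this toy data the three-level estimates are 4 (school), 7 (classroom), and 2 (student), while the two-level school estimate is about 6.33. That is larger than the three-level school component of 4, yet smaller than the combined classroom-plus-school variance of 11, which mirrors the pattern described in the abstract.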
UC Berkeley, Graduate School of Education
3507 Tolman Hall
| Date | Speaker | Title |
|---|---|---|
| Feb 1 (SPECIAL TIME & ROOM: 12:30 - 1:45 in 5509 Tolman) | Derek Briggs, University of Colorado | Generalizability in Item Response Modeling |
| Feb 15 | Kathleen Scalise, UCB | BEAR CAT: Toward a Theoretical Basis for Dynamically Driven Content in Computer-Mediated Environments |
| March 1 | Diane Allen | Item Response Modeling in Behavioral Research |
| March 15 | Linda Woodward, UCB; Kathy Long, UCB | Assessing Science Knowledge (ASK) |
| March 29 (SPECIAL ROOM: 2515) | Dylan Wiliam, ETS | Formative Assessment and the Regulation of Learning |
| Apr 26 (SPECIAL ROOM: 2515) | Joan Herman, UCLA; Steve Schneider, WestEd; Mark Wilson, UCB; Carlos Ayala, Sonoma State | Building Science Assessment Systems That Serve Accountability and Student Learning: The CAESL Model |
| May 3 (SPECIAL ROOM: 2515) | Tom Gumpel, Virginia Commonwealth U.; Parisa Muller, UCB | Bullies and their victims: The measurement of peer aggression and victimization |
| May 10 | Yiyu Xie, UCB | Three Studies of Person by Item Interactions in International Achievement Tests |
| May 17 (Room 2515 Tolman) | Jose Felipe Martinez, UCLA | A Multilevel Study of the Effects of Opportunity to Learn (OTL) on Student Reading Achievement: Issues of Measurement, Equity, and Validity |
Feb 1 SPECIAL TIME: 12:30 - 1:45
Generalizability in Item Response Modeling
University of Colorado at Boulder
Feb 15
BEAR CAT:
Toward a Theoretical Basis for Dynamically Driven Content
in Computer-Mediated Environments
UC Berkeley
View the PowerPoint slides from this presentation.
March 1
Item Response Modeling in Behavioral Research
UC Berkeley
View the PowerPoint slides from this presentation.
March 15
Assessing Science Knowledge (ASK)
Kathy Long, Lawrence Hall of Science
March 29
SPECIAL ROOM: 2515
Formative Assessment and the Regulation of Learning
View the PowerPoint slides from this presentation.
Papers are available for download on Dylan Wiliam's web site.
April 26
SPECIAL ROOM: 2515
Building Science Assessment
Systems That Serve Accountability and Student Learning: The CAESL Model
Joan Herman, UCLA
Steve Schneider, WestEd
Mark Wilson, UCB
• How can cognitive theory be used to systematically link assessments to the core types of learning that constitute science?
• How can psychometric advances in progress variables and link tests be used to provide greater coherence across national/state standards, large-scale assessments and classroom assessments?
May 3
SPECIAL ROOM: 2515
Bullies and their victims: The measurement of peer aggression and victimization
Parisa Muller, UCB
May 10
Three Studies of Person by Item Interactions in International Achievement Tests
UC Berkeley
May 17
Room 2515 Tolman
A Multilevel Study of the Effects of Opportunity to Learn (OTL) on Student Reading Achievement: Issues of Measurement, Equity, and Validity
UCLA
© 2002-2008 BEAR Center