| January 24 |
Supporting Teachers' Formative Assessment Practices: An Example Involving Science Notebooks
Alicia Alonzo
Abstract: Literature on teachers’ formative assessment practices
has both extolled their impact on student achievement and lamented
their seeming absence in most classrooms. To realize the potential
benefits of formative assessment, teachers need support in incorporating
these practices into their classrooms. In this presentation, I will
discuss the beliefs, knowledge, and skills which have been shown to
affect teachers’ formative assessment practices – drawing on both
research literature and my recent study of science notebooks. I then
draw on these to propose professional development experiences that may
impact teachers’ formative assessment practices.
Click here to view the slides from this presentation.
|
| January 31 |
CAESL Brown Bag Lunch: The 2009 NAEP Science Framework
Rich Shavelson, Stanford University
Steve Schneider, Alice Fu & Mike Timms, WestEd
Abstract: This brown bag session will cover various aspects of the development of the new 2009 Science Framework for the National Assessment of Educational Progress (NAEP). Under contract to the National Assessment Governing Board, WestEd's National Center for Improving Science Education (NCISE) and the Council of Chief State School Officers conducted an 18-month process to develop the Framework, involving hundreds of individuals across the country, including some of the nation's leading scientists, science educators, policymakers, and assessment experts. The Governing Board also engaged an external review panel to evaluate the draft Framework and convened a public hearing to gather additional input during the development process.
Four CAESL members will address different aspects of the development of the Framework during this session.
* Steve Schneider: The committee process and review cycles by which the NAEP Framework and Test Specifications documents were produced.
* Rich Shavelson: The assessment specifications that are in the Framework and the Test Specs documents and what is new for 2009.
* Alice Fu: The science content of the Framework, with particular reference to what is different for 2009.
* Mike Timms: The Interactive Computer Tasks that will form a new part of the assessments from 2009.
Click here to view the handout for this presentation.
Click here to view the slides for this presentation.
|
| February 7 |
BEAR IT: Berkeley Evaluation and Assessment Research Information Technology
Cathleen Kennedy, Sevan Tutunciyan & Richard Vorp
University of California at Berkeley
Abstract:
Come and see new information technologies under development in the
BEAR Center:
Advances in GradeMap - Software to facilitate multidimensional
item response modeling and the interpretation of longitudinal response
data. The GradeMap program accommodates the calibration of multiple
forms linked by common items and produces reports of respondent change.
We will also demonstrate reports that support the analysis of item
and person fit, alternate forms analysis, and traditional item statistics.
Standard Setting using ConstructMap - This software assists users
in the evaluation of criterion-referenced cut-points. Calibrated item
estimates and person proficiency estimates are imported into ConstructMap
and then the software is used to demonstrate the impact of selecting
alternative cut-points.
The BEAR Scoring Engine - This web-based software is called by external
applications to compute multidimensional proficiency estimates. Calibrated
item parameters and response data are sent to the Scoring Engine in
XML files via an HTTP request, the Scoring Engine computes proficiency
estimates, and then transmits the estimates back to the calling program
via an XML output file. Input response data and the returned proficiency
data files comply with the IMS QTI (Question and Test Interoperability) XML specifications.
We will demonstrate two applications that call the Scoring Engine
and highlight interface techniques.
Online Assessment Delivery - Preview an online system
under development that delivers an assessment, gathers responses,
and produces multidimensional proficiency reports in real time.
Click here to view the slides from this presentation.
|
| February 21 |
Validity, Reliability, and Responsiveness of Movement Ability Measure: Using Item Response Models
Diane Allen
University of California at Berkeley
Abstract:
Instruments created to test subjective factors related to human performance
frequently lack validity because participant responses can indicate
interpretations of items that differ radically from measurers'. Item
response modeling (IRM) methods can assist in the development and
testing of instruments that retain verifiable links to their subjective
constructs and thus support using them to test theory. The purpose
of this talk is to demonstrate IRM methods used to generate and test
the Movement Ability Measure (MAM), a self-report questionnaire asking
for people's perceptions of their ability to move. The MAM was generated
to match closely the Movement Continuum Theory of physical therapy,
proposed by Cott et al. (1995) and extended and operationalized for
this study. The responses of 318 adults (age range 18-101 years) provided
evidence of content, construct, and criterion validity, and an internal
consistency of .94; responses from 34 adults revealed a test-retest
reliability of .84. Wright Maps showed the strong relationship between
the theorized construct and the empirical data. A six-dimensional
model fit the data better than a unidimensional model although the
dimensions correlate highly. Results of the MAM and a 32-item self-reported
functional assessment instrument correlated at r = .76. Repeated measures
of 34 patients (age range 19-85 years) undergoing physical therapy
in outpatient clinics indicated that the MAM was responsive to intervention
both after 2 weeks (p < .00003) and at 2 months or discharge, whichever
came first (p < .00002). Correlation between these patients' and their
physical therapists' responses regarding their movement at initial
visit was moderate, at r = .68. The evidence supported the predictions
of the Movement Continuum Theory: current movement ability increased,
and the gap between current and preferred movement abilities decreased
following physical therapy for these patients with mostly orthopedic
diagnoses. Thus, the IRM methods supported generation of an instrument
closely linked to its underlying construct, and able to provide evidence
supporting the overall theory within which the construct rests. Similar
IRM methods might enhance the development and testing of additional
theories related to human performance.
Click here to view the slides for this presentation.
Click here to view the handout from this presentation.
|
| Thursday, March 9 |
Assessment for e-Learning: Case Studies of an Emerging Field
Cathleen Kennedy and the Technology and Assessment Group (TAG): Diana J. Bernbaum, Kristen Burmester, S. Veeragouder Harrell, Kathleen Scalise, and Mike Timms
UC Berkeley
Abstract: This symposium will discuss the rapidly emerging field of computer-based assessment in e-learning. In e-learning products, a variety of assessment approaches are being used for such diverse purposes as adaptive delivery of content, individualizing learning materials, dynamic feedback, cognitive diagnosis, score reporting and course placement. This symposium discusses evidence-based assessment principles in e-learning. Four case studies of e-learning products with assessment components will be presented. The products in the case studies were selected for exhibiting at least one exemplary aspect regarding assessment and measurement. The principles of the BEAR Assessment System will be used as a framework of analysis for these products with respect to key measurement principles, such as evidence identification and accumulation.
Click here to view slides for this presentation.
Click here to view additional slides for this presentation.
|
| March 21 |
Some Problems with Confidence Intervals
Juliet P. Shaffer
University of California at Berkeley
Abstract: Social-behavioral scientists are often warned about the defects of
hypothesis testing, and exhorted to rely instead on confidence interval
and effect size estimation. However, there are also problems unique
to confidence intervals that are much more rarely addressed. For example,
if attention is paid only to intervals not including the null value, the
confidence coverage of those intervals is often much less than the nominal
value. This phenomenon will be explained and illustrated, related issues
will be discussed, and the possible impact of the results in the educational
context will be noted.
Click here to view slides for this presentation.
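The coverage problem described in this abstract can be illustrated with a small simulation. The sketch below is not taken from the presentation; it assumes a normal mean with known standard deviation, a small true effect (0.1), and a sample size of 25, all of which are illustrative choices. It compares the coverage of all 95% intervals with the coverage of only those intervals that exclude the null value of zero.

```python
import random
from statistics import NormalDist


def simulate_coverage(true_mean=0.1, sigma=1.0, n=25, reps=20000,
                      level=0.95, seed=0):
    """Return (overall coverage, coverage among intervals excluding 0).

    All parameter values are illustrative assumptions, not figures
    from the talk.
    """
    rng = random.Random(seed)
    se = sigma / n ** 0.5                      # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    covered = 0
    selected = 0
    selected_covered = 0
    for _ in range(reps):
        # Draw the sample mean directly from its sampling distribution.
        xbar = rng.gauss(true_mean, se)
        lo, hi = xbar - z * se, xbar + z * se
        covers = lo <= true_mean <= hi
        excludes_null = lo > 0.0 or hi < 0.0
        covered += covers
        if excludes_null:
            selected += 1
            selected_covered += covers
    overall = covered / reps
    conditional = selected_covered / selected if selected else float("nan")
    return overall, conditional


if __name__ == "__main__":
    overall, conditional = simulate_coverage()
    print(f"coverage of all intervals:            {overall:.3f}")
    print(f"coverage of intervals excluding zero: {conditional:.3f}")
```

Under these assumptions the overall coverage stays near the nominal 95%, while the intervals that exclude zero cover the true mean far less often, because selecting on significance favors sample means that overshoot a small true effect.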
|
| April 5-7 |
International Objective Measurement Workshop (IOMW)
Nathaniel Brown & Brent Duckor, Coordinators
University of California at Berkeley
|
| April 18 |
Using IRT in an Intelligent Tutoring System
Mike Timms, Ph.D.
University of California at Berkeley
Abstract:
Providing feedback, including hints, is one of the key steps in the tutoring process. However, a persistent challenge in the development of computer-based intelligent tutoring systems (ITSs) is how to determine accurately when a student needs help, and then determine what the best help is for that individual student.
In this session I will describe a study in which I investigated the feasibility of predicting students’ need for help in an ITS using Item Response Theory (IRT).
The first part of my study involved analysis of data from the PACT (Pittsburgh Advanced Cognitive Tutors) Geometry Tutor and a randomized study that compared three versions of a tutoring system that used IRT. The analysis of prior data showed that the use of hints was related to the students’ beginning ability and the size of the gap between that initial ability and the difficulty of the item.
For the second part of my study, I worked with staff from the Principled Assessment Design for Inquiry (PADI) project to develop three versions of a computer-based self-assessment system used with the Full Option Science System (FOSS) curriculum on Force and Motion. The self-assessment system, or tutor, was designed to help middle-school students to learn to solve physics problems using the equation for calculating speed.
The full version of the tutor used item response theory to give students hints appropriate to the size of their learning gap. The feedback version of the tutor provided feedback on errors that students made, but gave no hints on how to repair those errors. The limited version of the tutor gave neither error feedback nor hints, only confirming whether responses were right or wrong.
I will report on the results of the comparative study of these three versions and discuss the implications of design decisions that were made in the development process.
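As a rough illustration of how an IRT-based gap could drive hint selection, the sketch below uses the Rasch model, in which the probability of a correct response depends on the difference between student ability and item difficulty (both in logits). It is not the tutor's actual implementation: the function names, the hint labels, and the gap thresholds are hypothetical and chosen only for illustration.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def hint_level(ability, difficulty, thresholds=(0.0, 1.0)):
    """Pick a hint level from the gap between difficulty and ability.

    The thresholds (in logits) and the three labels are hypothetical;
    a real tutor would calibrate them empirically.
    """
    gap = difficulty - ability
    small, large = thresholds
    if gap <= small:
        return "none"       # item is at or below the student's ability
    if gap <= large:
        return "general"    # modest gap: a general prompt may be enough
    return "specific"       # large gap: give a more detailed hint


if __name__ == "__main__":
    for ability, difficulty in [(1.0, 0.0), (0.0, 0.5), (-1.0, 1.0)]:
        p = rasch_probability(ability, difficulty)
        print(ability, difficulty, round(p, 2), hint_level(ability, difficulty))
```

The design choice here is to key the hint on the gap itself rather than on the predicted probability; the two are monotonically related under the Rasch model, so either could serve as the trigger.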
|