The Berkeley Evaluation and
Assessment Research (BEAR) Center coordinates several seminars designed to
provide a forum for researchers to share cutting-edge findings and to prompt
congenial discussion of educational assessment and evaluation topics.
| Sep. 8 |
Evaluating Full-day vs. Half-day Kindergarten via Marginal Structural Models
Ed Bein, Ph.D., UC Berkeley
This talk reports on an analysis of the Early Childhood Longitudinal Survey - Kindergarten Cohort (ECLS-K) to assess the relative impact of full-day vs. half-day kindergarten on short- and longer-term reading achievement. The analysis is based on the counterfactual, or potential outcomes, model of the effects of interventions and policies, and uses marginal structural models, a biostatistical approach built on that model. The talk will present the basic ideas of the counterfactual outcomes model and consider the benefits and challenges of applying it to the analysis of the ECLS-K.
|
| Sep. 22 |
Developing an Integrated Assessment System (DIAS) for Teacher Education
Heeju Jang, Amy Dray, Xiaoting Huang and Mark Wilson
UC Berkeley
In recent years, there has been a strong emphasis in U.S. educational policy on the importance of teachers’ content or subject matter knowledge to quality teaching. However, traditional notions of “content knowledge” do not properly capture the nature of the subject matter knowledge needed for teaching (Shulman, 1986, 1987; Wilson, Shulman, & Richert, 1987; Ball & Bass, 2003). Moreover, traditional assessments, such as PRAXIS or existing observational protocols, are inadequate to evaluate the complex and multidimensional nature of teaching practice. This talk introduces a multi-year research collaboration between the Berkeley Evaluation and Assessment Research Center and the University of Michigan’s School of Education that aims to develop, implement, and evaluate a prototype for an integrated assessment system in elementary mathematics teacher education. The assessment system is intended to focus on teaching practice grounded in professional and disciplinary knowledge, trace teaching practice as it develops over time across multiple contexts, address the multiple purposes of a broad array of stakeholders, and create the foundation for programmatic coherence and professional development of those who work with beginning teachers.
|
| Oct. 6 |
What Have We Learned from Two National Evaluations of Reading First?
Beth C. Gamse, Abt Associates
The No Child Left Behind Act of 2001 (PL 107-110) established the Reading First Program (Title I,
Part B, Subpart 1), a major federal initiative to ensure that all children can read at or above grade
level by the end of third grade. With the exception of almost $12 billion in Title I funding, which
supports general reading activities for children in low-income schools, Reading First is substantially
larger, more ambitious and more explicit in its guidance than any previous school-based early literacy
initiative ever undertaken in the U.S.
In 2003, the U.S. Department of Education launched two major evaluations of the Reading First
program: The Reading First Implementation Evaluation and the Reading First Impact Study. The
Reading First Implementation Evaluation was designed to describe how Reading First (RF) was
implemented nationally, using a large, nationally representative sample of RF schools. The study
also included a nationally representative sample of Title I School-Wide Project schools that were not
implementing the program, to provide a national context for the description of the reading programs in RF schools. The primary strength of this study comes from its nationally representative sample,
giving the results strong external validity and generalizability. As with most quasi-experimental
studies, its primary weakness is one of internal validity, such that we are unable to causally attribute
any positive findings to the program.
The Reading First Impact Study, on the other hand, used a regression discontinuity design, a rigorous
quasi-experimental design that, under certain conditions, can produce unbiased estimates of program
impacts. The study design provides strong internal validity, allowing us to make causal claims about
the impact of the program. Its primary weakness is that the study sample was constrained to only
include districts and states (or sites) that strictly adhered to an objective rating system when awarding
RF funds. The purposively selected study sample, therefore, does not provide external validity for
study findings. Further, impact estimates, absent detailed, externally valid information about program
implementation, are difficult to interpret.
Both studies released final reports late in 2008, and those final studies have been discussed widely in
the education policy community. This presentation will summarize results of both studies, provide a
synthesis of how these results contribute to our understanding of the implementation and impact of
this important federal reading initiative, and discuss the strengths and challenges of each study’s
methodology.
|
| Oct. 20 |
Learning Progressions in Middle School Science Instruction and Assessment:
A Proposal to the IES
Linda Morell, UC Berkeley
At the heart of calls for reform and improvement in science education is a concern to see more emphasis on the assessment of students’ understanding of scientific content and their ability to reason scientifically. We have written a grant proposal for a project that seeks to understand (1) how students interpret science concepts, (2) how they reason about those concepts (and perhaps, more generally), and (3) the relationship between students’ knowledge of content and their reasoning. We plan to develop assessments for learning progressions in physical science (Structure of Matter) together with assessments of students’ scientific reasoning, assessments that will measure these features in a valid and reliable way and inform how learning may progress in each. We plan to use the BEAR Assessment System (BAS) to develop and refine the assessment materials, including construct maps, items, scoring guides, and other interpretative tools for teachers. We plan to use the latest software tools from the FADS project to advance the development of these materials, as well as possibly to deliver and score some of the assessments. Data will be gathered from middle grade science students (grade 8) in San Francisco Unified School District, an ethnically, culturally, and linguistically diverse urban school district. The researchers plan to work with teachers and others to develop and revise assessment materials designed to measure students’ understanding and their ability to coordinate theory and evidence to construct domain-specific arguments about the particulate model of matter (grade 8). We anticipate that this work in each of these domains will enable us to construct a model of the major features of the ways in which students’ knowledge of the domain and their ability to reason develop. In this presentation, we will discuss the content of the proposal as well as the strategies employed to develop and complete the proposal in a timely way.
|
| Nov. 3 |
Item Response Modeling:
Applications to Large-Scale Assessment of Academic Achievement
Xiaohui Zheng, UC Berkeley
The call for standards-based reform and educational accountability has led to increased attention to large-scale assessments. Over the past two decades, large-scale assessments have been providing policymakers and educators with timely information about student learning and achievement to facilitate their decisions regarding schools, teachers and students. For large-scale assessments, the outcomes are far from straightforward. There have been great concerns about generalizing and interpreting assessment results, due to a rather large number of observed and unobserved variables co-existing and interacting at different levels of the assessment system. A wide variety of advanced techniques are available for the analysis of large-scale assessments, many of which do not fully capture the complex nature of the data.
This research explores multilevel item response modeling and its application to large-scale assessments. Building on the Multidimensional Random Coefficients Multinomial Logit (MRCML) framework and the Generalized Linear Latent and Mixed Model (GLLAMM) framework, three forms of multilevel item response models are presented in three separate studies. Each study addresses a specific measurement issue. The first study proposes a Latent Growth Item Response Model (LG-IRM) for the analysis of longitudinal assessment data. A growth model is incorporated into the item response function from a multidimensional perspective. Instead of using ability scores, the LG-IRM provides a direct representation of item responses in the longitudinal model. The second study discusses Multilevel Structural Equation Models (MSEMs) for the relationship between latent variables. MSEMs that combine measurement models with multilevel regression models are used to explore the effect of the school-level latent variable on the student-level latent variable. The third study investigates between-school Differential Item Functioning (DIF) as well as Differential Facet Functioning (DFF). School-to-school variability is examined in terms of differential functioning of items for students in different schools. An explanatory DFF model that includes covariates is also formulated to predict school effects on the DFF.
|
| Nov. 17 |
Addressing the incommensurability between state-administered, standards-based exams in
mathematics, and the assessment-mediated problem solving exercises required to support coherent and progressive learning and teaching in the classroom
Bernard R. Gifford, UC Berkeley
I will be discussing a novel ensemble of enabling technologies, designed and built from the ground up to translate into practice the vision of assessment-guided learning and instruction in the content area of mathematics articulated by the architects of No Child Left Behind (NCLB). These architects imagined that teachers, administrators, and policymakers would eventually learn to leverage the student performance data generated by State-administered exams to customize and target evidence-based instructional resources and interventions to those students most in need of this type of support. The ensemble, Learning Conductor: Mathematics (“Conductor”), incorporates a quartet of interrelated technologies:
1) Item Generator (I-Gen) employs sophisticated parameterization techniques to transmute the proven psychometrically valid but highly compacted assessments incorporated in State-administered end-of-year exams into the stream of similarly valid, but more finely tuned and less compact classroom-oriented assessment items needed to evaluate the effectiveness of the finite sequences of instructional activities deliberately scored and orchestrated by teachers to help students achieve targeted levels of proficiency in specific strands of the mathematics curriculum.
2) Problem Generator (P-Gen) generates solutions to the mathematics assessment items generated by I-Gen in the form of elaborated “Worked-examples,” a problem-solving format shown to be significantly effective in increasing student mathematical proficiency. P-Gen also dynamically generates geometrical mathematical representations to complement these worked-examples, another capability proven to be effective in increasing student mathematical proficiency.
3) Open Communications (O-Gen) is a collection of Open-Source communication and collaboration resources architected to make the assessment/Worked-example couplets (“Couplets”) generated by I-Gen and P-Gen accessible to students, teachers, schools, public libraries, public housing learning centers, and other education-minded community-based organizations on a location-independent basis.
4) Open Source relational database application (D-Gen) is architected and programmed to track the actions of individual students as they negotiate the mathematical couplets generated by I-Gen and P-Gen. D-Gen can be used to produce progress reports that will make clear to students the influence of their assessment-based problem-solving activities on their growth as capable mathematics learners. D-Gen’s report generation capabilities can also be used to provide teachers with the kinds of data they will need to adjust their own teaching practices to take into account the assessment-guided learning activities of their students.
I will discuss the origins and development of Learning Conductor and, more importantly, the theory of assessment-guided learning and teaching that is beginning to emerge from these efforts to address the incommensurability of end-of-term and everyday classroom assessment items. I will also discuss how Learning Conductor can i) support the creation and evaluation of novel assessment items; ii) buttress recent theories on “learning progressions” with empirical evidence; iii) support ongoing efforts to redefine and reconstitute mathematics homework; and iv) support the efforts of teachers working in educational settings characterized by a high degree of student heterogeneity.
|
| Dec. 1 |
The preparation of aspiring educational researchers in the empirical qualitative and quantitative traditions of social science: Methodological rigor, social and theoretical relevance, and more
Judith Warren Little, UC Berkeley
In 2008, at the behest of a group of education deans from eleven major research universities (recipients of 10 years of Research Training Grant funding), the Spencer Foundation convened a task force to offer guidance for doctoral-level research preparation. This report is the result of the task force’s work over approximately one year. It is organized from a community of practice perspective, identifying four “universes” that doctoral students must be prepared to inhabit when they complete their doctoral study. Two composite vignettes profile prototypical student histories and graduate school trajectories; they provide the vehicle for introducing specific aspects of research preparation in the social science traditions and the various contexts (coursework, research groups, GSRs, peer writing groups, etc.) in which students develop expertise and acquire experience. The report devotes a separate section to the role of the advisor(s), and concludes with a set of recommendations targeted at each of the four “universes.”
Download pdf of the task force's report.
|
|
|