Biblio
Gender differences and similarities in PISA 2003 mathematics: A comparison between the United States and Hong Kong. International Journal of Testing, 9, 20–40.
. (2009). 
From principles to practice: An embedded assessment system. Applied Measurement in Education, 13, 181–208.
. (2000).
Formulating the Rasch Differential Item Functioning Model Under the Marginal Maximum Likelihood Estimation Context and Its Comparison With Mantel–Haenszel Procedure in Short Test and Small Sample Conditions. Educational and Psychological Measurement, 71, 1023–1046.
. (2011). 
Formulating latent growth using an explanatory item response model approach. Journal of Applied Measurement, 13, 1–22.
. (2011). 
Formulating latent growth using an explanatory item response model approach. Journal of Applied Measurement, 13, 1.
. (2012).
Explanatory secondary dimension modeling of latent differential item functioning. Applied Psychological Measurement, 35, 583–603.
. (2011). 
Explanatory item response models: A brief introduction. Assessment of competencies in educational contexts, 91–120.
. (2008). 
The evidence-based reasoning framework: Assessing scientific reasoning. Educational Assessment, 15, 123–141.
. (2010). 
Evaluating SAT coaching: gains, effects and self-selection. Rethinking the SAT: The Future of Standardized Testing in University Admissions, 217–233.
. (2004).
Does participation in an intervention affect responses on self-report questionnaires? Health Education Research, 21, i98–i109.
. (2006). 
Dimensionality, Dependence, or Both?: An Application of the Item Bundle Model to Multidimensional Data. Position Paper; Policy, Organization, Measurement, & Evaluation; Graduate School of Education; University of California, Berkeley.
. (2001).
. (2006).
Contributions of Middle Grade Students to the Validation Process of a National Science Assessment Study. Middle Grades Research Journal, 3, 1–22.
. (2008). 
ConstructMap Version 4 (computer program). University of California, Berkeley, CA: BEAR Center.
. (2008). 
Constructing One Scale to Describe Two Statewide Exams. Journal of Applied Measurement, 10, 170–184.
. (2009). 
Concrete, abstract, formal, and systematic operations as observed in a "Piagetian" balance-beam task series. Journal of Applied Measurement, 11, 11.
. (2010).
Concrete, abstract, formal, and systematic operations as observed in a "Piagetian" balance-beam task series. Journal of Applied Measurement, 11, 11–23.
. (2009). 
On the conceptual foundations of psychological measurement. Journal of Physics: Conference Series, 459, 012008. Retrieved from http://stacks.iop.org/1742-6596/459/i=1/a=012008
. (2013).
Complex composites: Issues that arise in combining different modes of assessment. Applied Psychological Measurement, 19, 51–72.
. (1995).
A competence model for environmental education. Environment and Behavior, 0013916513492416.
. (2013). 
A comparative analysis of the ratings in performance assessment using generalizability theory and the many-facet Rasch model. Journal of Applied Measurement, 10, 408–423.
. (2008). 
Cognitive diagnosis using item response models. Zeitschrift für Psychologie/Journal of Psychology, 216, 74–88.
. (2008). 