Biblio
Does participation in an intervention affect responses on self-report questionnaires? Health Education Research, 21, i98–i109.
. (2006). 
Evaluating SAT coaching: gains, effects and self-selection. Rethinking the SAT: The Future of Standardized Testing in University Admissions, 217–233.
. (2004). 
The evidence-based reasoning framework: Assessing scientific reasoning. Educational Assessment, 15, 123–141.
. (2010). 
Explanatory item response models: A brief introduction. Assessment of competencies in educational contexts, 91–120.
. (2008). 
Explanatory secondary dimension modeling of latent differential item functioning. Applied Psychological Measurement, 35, 583–603.
. (2011). 
Formulating latent growth using an explanatory item response model approach. Journal of Applied Measurement, 13, 1.
. (2012). 
Formulating latent growth using an explanatory item response model approach. Journal of Applied Measurement, 13, 1–22.
. (2011). 
Formulating the Rasch Differential Item Functioning Model Under the Marginal Maximum Likelihood Estimation Context and Its Comparison With Mantel–Haenszel Procedure in Short Test and Small Sample Conditions. Educational and Psychological Measurement, 71, 1023–1046.
. (2011). 
From principles to practice: An embedded assessment system. Applied Measurement in Education, 13, 181–208.
. (2000). 
Gender differences and similarities in PISA 2003 mathematics: A comparison between the United States and Hong Kong. International Journal of Testing, 9, 20–40.
. (2009). 
Gender differences in large-scale math assessments: PISA trend 2000 and 2003. Applied Measurement in Education, 22, 164–184.
. (2009). 
Generalizability in item response modeling. Journal of Educational Measurement, 44, 131–155. Retrieved from http://onlinelibrary.wiley.com/doi/10.1111/j.1745-3984.2007.00031.x/full
. (2007). 
A gentle introduction to Rasch measurement models for metrologists. Journal of Physics: Conference Series, 459, 012002. Retrieved from http://stacks.iop.org/1742-6596/459/i=1/a=012002
. (2013). 
Improving assessment evidence in e-learning products: some solutions for reliability. International Journal of Learning Technology, 5, 191–208.
. (2010). 
Improving measurement in health education and health behavior research using item response modeling: comparison with the classical test theory approach. Health Education Research, 21, i19–i32.
. (2006). 
Improving measurement in health education and health behavior research using item response modeling: introducing item response modeling. Health Education Research, 21, i4–i18.
. (2006). 
Introducing equating methodologies to compare test scores from two different self-regulation scales. Health Education Research, 21, i110–i120.
. (2006). 
Introducing multidimensional item response modeling in health behavior and health education research. Health Education Research, 21, i73–i84.
. (2006). 
An introduction to multidimensional measurement using Rasch models. Journal of Applied Measurement, 4, 87–100.
. (2003). 
An IRT modeling of change over time for repeated measures item response data using a random weights linear logistic test model approach. Asia Pacific Education Review, 13, 487–494.
. (2012). 
A LLTM approach to the examination of teachers' ratings of classroom assessment tasks. Psychology Science, 50, 417.
. (2008).