Measuring Hate Speech: Unifying Deep Learning with Item Response Theory

The hate speech measurement project began in 2017 at UC Berkeley’s D-Lab. Our goal is to apply data science to track changes in hate speech over time and across social media. After two and a half years, we are now nearing completion of a comprehensive, groundbreaking method to measure hate speech with precision while mitigating the influence of human bias.

This project website was created in October 2019 to tell our story. We are continuing to expand its content as we begin to report on our findings and solicit new partnerships.

Unifying Deep Learning with Item Response Theory: Interval Measurement, Annotator Debiasing, Efficiency, and Explainability

Outcomes are commonly measured as binary variables: a comment is toxic or not, an image has sexual content or it doesn’t, and so on. But the real world is more complex: most target variables are inherently continuous. Physical quantities such as temperature and weight are measured as interval variables, where magnitudes are meaningful. How can we achieve that same interval measurement for arbitrary outcomes, creating continuous scales with meaningful magnitudes?

We propose a method for measuring phenomena as continuous, interval variables by unifying deep learning with the Constructing Measures approach to Rasch item response theory (IRT). We decompose the target construct into ordinal components measured as survey items, which are then transformed via an IRT non-linear activation into a continuous measure of unprecedented quality. We estimate first-order labeler bias and eliminate its influence on the final construct when creating a training dataset, which supersedes the notion of inter-rater reliability as a quality metric. To our knowledge this IRT bias adjustment has never before been implemented in machine learning but is critical for algorithmic fairness. We further estimate the response quality of each individual labeler, allowing responses from low-quality labelers to be removed.
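To make the IRT transformation concrete: a Rasch rating-scale model maps a comment’s latent score and a survey item’s difficulty to probabilities over that item’s ordinal response categories. The sketch below is a generic NumPy illustration of that model, not the project’s actual estimation code; the parameter names and values are hypothetical.

```python
import numpy as np

def rating_scale_probs(theta, difficulty, thresholds):
    """Rasch rating-scale model: probability of each ordinal category.

    theta       -- latent score of the comment (e.g. degree of hatefulness)
    difficulty  -- item difficulty (how "hard" the item is to endorse)
    thresholds  -- ordered step thresholds between adjacent categories
    """
    steps = theta - difficulty - np.asarray(thresholds)
    # Category k's log-numerator is the sum of the first k step terms;
    # category 0 gets the empty sum, i.e. 0.
    logits = np.concatenate(([0.0], np.cumsum(steps)))
    probs = np.exp(logits - logits.max())   # numerically stable softmax
    return probs / probs.sum()

# A comment with a higher latent score shifts probability mass
# toward the higher ordinal categories of the item.
low  = rating_scale_probs(-2.0, 0.0, [-1.0, 1.0])
high = rating_scale_probs( 2.0, 0.0, [-1.0, 1.0])
```

In a model like this, labeler bias can be represented as a per-labeler shift in the step thresholds, which is why estimating and removing it is possible in principle.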

Our IRT scaling translates naturally into multi-task, weight-sharing deep learning architectures in which our theorized outcome components become supervised, ordinal latent variables for the neural networks’ internal representation learning. Our multitask architecture exploits a proportional odds activation function and quadratic weighted kappa loss function designed for ordinal outcomes. This leads to a new form of model explanation because each continuous prediction can be directly explained by the constituent ordinal components in the penultimate layer.
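To illustrate the two ordinal building blocks named above, the sketch below implements a proportional-odds (cumulative-logit) activation and the quadratic weighted kappa agreement metric. This is a hypothetical NumPy sketch, not the project’s code: in training, the loss would be a differentiable relaxation of this metric applied to the activation’s output probabilities.

```python
import numpy as np

def proportional_odds_probs(eta, cutpoints):
    """Turn a single score eta into ordinal category probabilities.

    Uses the cumulative-logit model P(y <= k) = sigmoid(c_k - eta),
    with ordered cutpoints c_0 < ... < c_{m-1} for m + 1 categories.
    """
    cum = 1.0 / (1.0 + np.exp(-(np.asarray(cutpoints) - eta)))
    cum = np.concatenate((cum, [1.0]))   # P(y <= m) = 1
    return np.diff(cum, prepend=0.0)     # P(y = k) as successive differences

def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Agreement metric for ordinal labels: 1 = perfect, 0 = chance level.

    Disagreements are penalized by the squared distance between
    categories, so being off by two costs more than being off by one.
    """
    O = np.zeros((n_classes, n_classes))         # observed confusion matrix
    for t, p in zip(y_true, y_pred):
        O[t, p] += 1
    idx = np.arange(n_classes)
    w = (idx[:, None] - idx[None, :]) ** 2 / (n_classes - 1) ** 2
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()  # chance agreement
    return 1.0 - (w * O).sum() / (w * E).sum()
```

Because the penultimate layer holds these per-item ordinal probabilities, each continuous prediction can be traced back to its constituent components, which is the source of the explainability described above.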

We demonstrate our method on a new dataset of 50,000 online comments labeled to measure a spectrum from hate speech to counterspeech, and sourced from YouTube, Twitter, and Reddit. We evaluate Universal Sentence Encoders, BERT, RoBERTa, and XLNet as contextual representation models for the comment text, and benchmark our predictive accuracy against Google Jigsaw’s Perspective API models.

Dr. Claudia von Vacano is the Executive Director of the D-Lab and the Digital Humanities at Berkeley, and is on the boards of the Social Science Matrix and Berkeley Center for New Media. She has worked in policy and educational administration since 2000, and at the UC Office of the President and UC Berkeley since 2008. She received a Master’s degree from Stanford University in Learning, Design, and Technology. Her doctorate is in Policy, Organizations, Measurement, and Evaluation from UC Berkeley. Her expertise is in organizational theory and behavior and in educational and language policy implementation. The Phi Beta Kappa Society, the Andrew W. Mellon Foundation, the Rockefeller Brothers Foundation, and the Thomas J. Watson Foundation, among others, have recognized her scholarly work and service contributions.

Chris Kennedy is a PhD student in Biostatistics specializing in randomized controlled trials, machine learning, and causal inference for precision medicine and elections. As a D-Lab consultant he advises researchers on statistical analysis and computing in R.

Click for online presentation slides with animations

Click for recorded talk

Tuesday, March 17, 2020 - 2:00pm
Berkeley Way West
Presentation slides (PDF, 1.61 MB)