Teaching Machines to Hate: Scalable Detection of Online Hate Speech

Reports of hate speech targeting minority groups have risen dramatically since the launch of Donald Trump’s presidential campaign. Although this surge is well reported, it is unknown how rates of hate speech vary across online platforms and over time. To address this gap, this study identifies and examines online incidents of hate speech, designing a replicable research methodology in collaboration with the Anti-Defamation League. We develop a theoretically informed codebook and hand-label hate speech in approximately 9,000 online comments. We then apply supervised machine learning algorithms to distinguish hate speech from non-hate speech in this labelled corpus, first investigating traditional natural language processing feature engineering combined with standard machine learning algorithms.
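As a rough illustration of what "traditional feature engineering" can mean for comment text, the sketch below turns a comment into word n-gram counts that a standard classifier could consume. It is a hypothetical minimal featurizer (the function name `ngram_features`, the tokenizing regex, and the `n_max` parameter are assumptions), not the study's actual feature set.

```python
import re
from collections import Counter


def ngram_features(text, n_max=2):
    """Hypothetical minimal featurizer: count word 1-grams up to n_max-grams.

    Lowercases the text and keeps runs of letters, digits, and apostrophes
    as tokens; punctuation is dropped.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    feats = Counter()
    for n in range(1, n_max + 1):
        # Slide a window of length n over the token list.
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats


if __name__ == "__main__":
    print(ngram_features("Go home, go home!"))
```

Sparse count features like these are typically fed to linear models or tree ensembles; richer pipelines add character n-grams, TF-IDF weighting, or lexicon features.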

After integrating hyperparameter optimization and model ensembling, we achieve an area under the ROC curve (AUC) of 0.78. Performance improves notably when traditional feature engineering is replaced with pre-trained GloVe word embeddings (AUC = 0.85). By contrast, deep recurrent neural networks (LSTM and GRU) did not improve predictive performance (AUC = 0.71), potentially due to sample size limitations or suboptimal hyperparameter tuning. Our system can be applied to new text from Facebook, Twitter, The New York Times, and a variety of other platforms to identify hate speech automatically and at scale over time. Future extensions may include randomized trials to develop interventions that durably reduce hate speech.
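To make the reported AUC figures concrete: AUC equals the probability that a randomly chosen hate-speech comment receives a higher classifier score than a randomly chosen non-hate comment, with ties counted as one half. The sketch below computes it directly from that definition; the function name `roc_auc` and the 0/1 label encoding are assumptions for illustration, not the study's evaluation code.

```python
def roc_auc(labels, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 (1 = hate speech, an assumed encoding)
    scores: classifier scores, higher meaning "more likely hate speech"
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Three of the four positive/negative pairs are ordered correctly.
    print(roc_auc([1, 0, 1, 0], [0.9, 0.4, 0.35, 0.1]))  # 0.75
```

This pairwise loop is quadratic in corpus size, which is acceptable for a few thousand comments; a rank-based (Mann-Whitney U) formulation gives the same value in O(n log n). An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation.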

Tuesday, October 31, 2017 - 2:00pm