Seema Jayachandran: Using machine learning and qualitative interviews to design a five-question survey module for women's agency

Tuesday, February 15, 2022

2:00 - 4:00 PM (PST) in Berkeley Way West room 1212 and Zoom


Open-ended interview questions elicit rich information about people's lives, but in large-scale surveys, social scientists often need to measure complex concepts using only a few close-ended questions. We propose a new method to design a short survey measure for such cases by combining mixed-methods data collection and machine learning. We identify the best survey questions based on how well they predict a "gold standard" measure of the concept derived from qualitative interviews. We apply the method to create a survey module and index for women's agency. We measure agency for 209 women in Haryana, India, first, through a semi-structured interview and, second, through a large set of close-ended questions. We use qualitative coding methods to score each woman's agency based on the interview, which we treat as her true agency. To determine the close-ended questions most predictive of the "truth," we apply statistical algorithms that build on LASSO and random forest but constrain how many variables are selected for the model (five in our case). The resulting five-question index is as strongly correlated with the coded qualitative interview as is an index that uses all of the candidate questions. This approach of selecting survey questions based on their statistical correspondence to coded qualitative interviews could be used to design short survey modules for many other latent constructs.
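
To make the selection step concrete, here is a minimal sketch of one way to cap a LASSO fit at five questions: lower the penalty step by step and keep the last fit that uses at most five predictors. It is an illustration, not the authors' algorithm, which the abstract describes only in outline. The data are synthetic, and the function names, the 30 candidate questions, the 1-to-5 answer scale, and the noise level are all assumptions made for the example. Only the sample size of 209 and the five-question limit come from the abstract.

```python
import numpy as np


def lasso_coordinate_descent(X, y, lam, n_iter=50):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    Assumes the columns of X are standardised and y is centred.
    """
    n, p = X.shape
    beta = np.zeros(p)
    residual = y.copy()
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of column j with the residual that excludes column j.
            rho = X[:, j] @ residual / n + col_sq[j] * beta[j]
            # Soft-thresholding sets weak predictors exactly to zero.
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            residual += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta


def select_k_questions(X, y, k=5, n_lambdas=30):
    """Return indices of at most k columns chosen along a decreasing penalty path."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    lam_max = np.abs(Xs.T @ yc).max() / len(yc)  # smallest penalty with no predictors
    best = np.array([], dtype=int)
    for lam in lam_max * np.geomspace(1.0, 0.01, n_lambdas):
        support = np.flatnonzero(lasso_coordinate_descent(Xs, yc, lam))
        if len(support) > k:
            break  # the next, weaker penalty would exceed the question budget
        if len(support) > 0:
            best = support
    return best


# Synthetic stand-in data (an assumption, not the study's data).
rng = np.random.default_rng(0)
n_women, n_questions = 209, 30
answers = rng.integers(1, 6, size=(n_women, n_questions)).astype(float)
true_weights = np.zeros(n_questions)
true_weights[:5] = [1.0, 0.8, 0.6, 0.5, 0.4]
# Plays the role of the coded qualitative interview score.
interview_score = answers @ true_weights + rng.normal(0.0, 0.5, n_women)

selected = select_k_questions(answers, interview_score, k=5)

# Build the short index by least squares on the selected questions only.
design = np.column_stack([np.ones(n_women), answers[:, selected]])
coefs, *_ = np.linalg.lstsq(design, interview_score, rcond=None)
index = design @ coefs
corr = np.corrcoef(index, interview_score)[0, 1]

print("selected questions:", selected.tolist())
print("in-sample correlation with interview score:", round(float(corr), 3))
```

The correlation printed here is computed on the same data used for selection, so it is optimistic. A real evaluation would estimate it on held-out respondents, for example by cross-validation.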