A paper I’m working on at the moment.
Applying big-data techniques to small data: Latent Semantic Analysis of interviews investigating reasons for parents’ choice of school
ABSTRACT In economics, preferences are revealed from the measurable attributes of actual choices. However, choices in education are complex, and many factors underlie how parents choose a school for their children. In-depth interviews on this question suggest that a wide variety of attributes play an important role in the process. For economic analysis, these attributes are not easily identified and measured within a traditional revealed preference framework. In this study I apply latent semantic analysis to a set of 22 open-ended interviews exploring how and why parents choose a particular school, in order to extract latent choice attributes in a measurable form. Latent semantic analysis is used to elicit key words from these interviews that reveal the attributes associated with a parent’s actual choice of school. These words, such as ‘encourage’ and ‘support’, represent particular choice mindsets that frame a wide range of possible choice attributes into a smaller bundle of evaluated attributes that can be mapped to a parent’s actual choice. Latent semantic analysis is first used to calculate the semantic distance between individual interviews and a set of target words. The semantic distances are then statistically analysed for clustering by school type (such as public, independent, Catholic and government selective). The semantically revealed words can then be analysed in a more traditional economic framework as trade-offs between particular preference attributes. Importantly, this analysis indicates that there exist distinct groups of parents who are motivated by different choice mindsets but ultimately choose the same type of school. Some of the advantages and disadvantages of applying big-data techniques to small data sets of relatively large open-ended text responses are discussed.
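The semantic-distance step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's actual pipeline. The four toy "interviews", the target words, the choice of k = 2 latent dimensions, the use of raw word counts without any weighting, and cosine distance as the distance measure are all assumptions made for the example.

```python
import numpy as np

# Toy stand-ins for interview transcripts (invented for illustration,
# not data from the study).
interviews = {
    "parent_a": "teachers encourage and support every child",
    "parent_b": "the school will support and encourage my daughter",
    "parent_c": "strong exam results and academic ranking matter",
    "parent_d": "academic results and exam ranking decided it",
}
# Hypothetical target words; "academic" is added only for contrast.
target_words = ["encourage", "support", "academic"]


def term_document_matrix(docs):
    """Return (word -> row index, raw-count term-by-document matrix)."""
    vocab = sorted({word for text in docs.values() for word in text.split()})
    index = {word: i for i, word in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, text in enumerate(docs.values()):
        for word in text.split():
            counts[index[word], j] += 1.0
    return index, counts


def cosine_distance(a, b):
    """1 - cosine similarity between two vectors."""
    return 1.0 - float(a @ b) / float(np.linalg.norm(a) * np.linalg.norm(b))


def lsa_distances(docs, targets, k=2):
    """Distance between each document and each target word in a k-dim LSA space."""
    index, counts = term_document_matrix(docs)
    # Singular value decomposition: counts = U diag(s) V^T.
    u, s, vt = np.linalg.svd(counts, full_matrices=False)
    # Keep the k largest singular values. Scaling both sides by sqrt(s) is
    # one common convention for comparing terms with documents (assumption).
    scale = np.sqrt(s[:k])
    term_vecs = u[:, :k] * scale
    doc_vecs = vt[:k, :].T * scale
    return {
        name: {
            word: cosine_distance(doc_vecs[j], term_vecs[index[word]])
            for word in targets
        }
        for j, name in enumerate(docs)
    }


distances = lsa_distances(interviews, target_words)
for name, row in distances.items():
    print(name, {word: round(d, 3) for word, d in row.items()})
```

With these toy texts, the interviews that talk about encouragement and support sit closer to those target words than the interviews about exam results do. A table of distances like this, one row per interview, is the kind of input that could then be grouped by school type for the clustering step.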