Applying big-data techniques to small-data: Latent Semantic Analysis of interviews investigating reasons for parents’ choice of school

A paper I’m working on at the moment.

ABSTRACT: In economics, preferences are revealed from the measurable attributes of actual choices. However, choices in education are complex, and many factors underlie the decision processes by which parents choose a school for their children. In-depth interviews investigating how and why parents choose a particular school suggest that a wide variety of attributes play an important role in this process. For economic analysis, these attributes are not easily identified and measured within a traditional revealed preference framework. In this study I apply latent semantic analysis to a set of 22 open-ended interviews exploring how and why parents choose a particular school for their children, in order to extract latent choice attributes in a measurable form. Latent semantic analysis is used to elicit key words from these interviews that reveal the attributes associated with a parent’s actual choice of school. These words, such as ‘encourage’ and ‘support’, represent particular choice mindsets that frame a wide range of possible choice attributes into a smaller bundle of evaluated attributes that can be mapped to a parent’s actual choice. The analysis first calculates the semantic distance between individual interviews and a set of target words. These semantic distances are then statistically analysed for clustering by school type (such as public, independent, Catholic and government selective). The semantically revealed words can then be analysed in a more traditional economic framework as trade-offs between particular preference attributes. Importantly, this analysis indicates that there exist distinct groups of parents who are motivated by different choice mindsets but ultimately choose the same type of school. Some of the advantages and disadvantages of applying big-data techniques to small data sets of relatively large open-ended text responses are discussed.

Using linguistic analysis to understand how parents choose schools for their children

In economics, there is limited use of linguistic analysis to understand decision-making processes and the contextual relationship between preferences. Over the last six months I have undertaken field research to understand how parents choose a school for their children and the decision architecture associated with this choice. The objective was not simply to collect information about stated preferences per se, but to understand the complexity of the decision process. I conducted 22 exploratory interviews with parents from Melbourne and regional Victoria – with a reasonable level of diversity in family demographics – about how they approach the problem of choosing a school for their children.

The purpose of these interviews was principally to look for interesting economic ideas and questions arising from field observations. The intent was not to achieve a statistically robust collection of interviews of limited scope, but instead to look for opportunities that would warrant targeted econometric, experimental or theoretical research in the later part of my PhD. The presentation I gave at the 2014 ‘Cooperation and conflict in the family’ conference, on an intergenerational discount heuristic, is one of the ideas that arose from these field observations and interviews.

Latent Semantic Analysis

An everyday illustration of the idea behind Latent Semantic Analysis (LSA) is a search engine such as Google, where words that are semantically or contextually similar to the query are also returned in the search results. Type in “run” and the search will also pick up “ran”, “runs” and “running”. LSA allows natural language processing of vast collections of data, such as web pages, to provide information about how words are related to each other in (semantic) context. It does this by representing words and documents as vectors (vectorial semantics) and applying singular value decomposition to the resulting term-document matrix. In this way, the data itself is used to create a ‘latent semantic dictionary/thesaurus’ which reflects the context of the documents being analysed.
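The mechanics described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the five toy documents stand in for real text (they are not the interview data), the number of latent dimensions k = 2 is arbitrary, and the helper names (project, cosine, similarity) are hypothetical. Projecting both words and documents through the truncated left singular vectors is one common convention among several, and not necessarily the one used in the paper.

```python
import numpy as np

# Hypothetical toy "documents"; these are NOT the interview transcripts.
docs = [
    "teachers encourage students",
    "teachers support students",
    "teachers encourage support",
    "fees uniform cost",
    "fees cost",
]

# Vocabulary and a word -> row lookup.
vocab = sorted({word for doc in docs for word in doc.split()})
row = {word: i for i, word in enumerate(vocab)}

# Term-document count matrix: one row per word, one column per document.
A = np.zeros((len(vocab), len(docs)))
for j, doc in enumerate(docs):
    for word in doc.split():
        A[row[word], j] += 1

# Singular value decomposition, truncated to k latent dimensions.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2  # assumed value; in practice this has to be chosen for the corpus
U_k = U[:, :k]


def project(counts):
    """Map a vector of word counts into the k-dimensional latent space."""
    return U_k.T @ counts


def cosine(a, b):
    """Cosine of the angle between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def similarity(word, doc_index):
    """Cosine similarity between a target word and a document in latent space."""
    one_hot = np.zeros(len(vocab))
    one_hot[row[word]] = 1.0
    return cosine(project(one_hot), project(A[:, doc_index]))


# Compare the target word 'encourage' with every document.
for j, doc in enumerate(docs):
    print(f"{similarity('encourage', j):+.2f}  {doc}")
```

In this toy corpus the second document never uses ‘encourage’, yet it should score close to the documents that do, because it shares their context (‘teachers’, ‘students’). A semantic distance can then be taken as one minus the cosine similarity.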

LSA captures the contextual relationships between text documents and word meanings. Taking into account the context in which words are used is important for linguistic analysis, because the contextual meaning of words changes over time and across social groups. An example is how the meaning of ‘terrific’ has changed. Latent semantic analysis of documents from the second half of the 19th century would show ‘terrific’ as similar to ‘horror’, while documents from the second half of the 20th century would show ‘horror’ as the opposite of ‘terrific’.
