Applying big-data techniques to small-data: Latent Semantic Analysis of interviews investigating reasons for parents’ choice of school

A paper I’m working on at the moment.

ABSTRACT    In economics, preferences are revealed from the measurable attributes of actual choices. However, choices in education are complex. Many factors underlie the decision processes associated with how parents choose a school for their children. In-depth interviews investigating how and why parents choose a particular school suggest that a wide variety of attributes plays an important role in this process. For economic analysis, these attributes are not easily identified and measured within a traditional revealed preference framework. In this study I apply latent semantic analysis to a set of 22 open-ended interviews, exploring how and why parents choose a particular school for their children, to extract latent choice attributes in a measurable form. Latent semantic analysis is used to elicit key words from these interviews that reveal the attributes associated with a parent’s actual choice of school. These words, such as ‘encourage’ and ‘support’, represent particular choice mindsets that frame a wide range of possible choice attributes into a smaller bundle of evaluated attributes that can be mapped to a parent’s actual choice. Latent semantic analysis is first used to calculate the semantic distance between individual interviews and a set of target words. Semantic distances are then statistically analysed for clustering by school type (such as public, independent, Catholic and government selective). These semantically revealed words can then be analysed in a more traditional economic framework as trade-offs between particular preference attributes. Importantly, this analysis indicates that there exist distinct groups of parents who are motivated by different choice mindsets but ultimately choose the same type of school. Some of the advantages and disadvantages of applying big-data techniques to small-data sets of relatively large open-ended text responses are discussed.
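To make the ‘semantic distance between individual interviews and a set of target words’ concrete, the following is a minimal sketch of one common way to compute it, not the procedure used in the study. It assumes raw word counts, a truncated singular value decomposition, and cosine similarity (distance can be taken as one minus similarity); the abstract does not specify any of these. The toy sentences, the function name lsa_word_doc_similarity and the choice of k = 2 dimensions are all hypothetical.

```python
import numpy as np

# Hypothetical toy "interviews"; these are NOT excerpts from the study's data.
TOY_DOCS = [
    "teachers encourage support children",
    "teachers support encourage learning",
    "encourage support children learning",
    "results discipline ranking fees",
    "discipline results fees uniform",
]


def lsa_word_doc_similarity(docs, targets, k=2):
    """Cosine similarity between each document and each target word
    in a k-dimensional latent semantic space (assumed setup)."""
    tokenised = [d.lower().split() for d in docs]
    vocab = sorted({w for doc in tokenised for w in doc})
    index = {w: i for i, w in enumerate(vocab)}

    # Term-document count matrix: rows are words, columns are documents.
    X = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(tokenised):
        for w in doc:
            X[index[w], j] += 1.0

    # Truncated SVD: keep only the k largest singular values.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = min(k, len(s))
    word_vecs = U[:, :k] * s[:k]    # one row per word
    doc_vecs = Vt[:k, :].T * s[:k]  # one row per document

    def unit(M):
        # Normalise rows to unit length, leaving all-zero rows unchanged.
        norms = np.linalg.norm(M, axis=1, keepdims=True)
        return M / np.where(norms == 0, 1.0, norms)

    T = unit(word_vecs[[index[t] for t in targets]])
    D = unit(doc_vecs)
    return D @ T.T  # shape: (number of documents, number of targets)


if __name__ == "__main__":
    sims = lsa_word_doc_similarity(TOY_DOCS, ["support", "discipline"], k=2)
    print(np.round(sims, 3))
```

Each row of the result is one interview and each column one target word, so the rows can be grouped by school type and compared statistically. With real interviews, the usual refinements (stop-word removal, term weighting, a larger k) would matter considerably more than they do in this toy case.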

The complexity of educational choices made by parents for their children

At a high level, choice decisions relate to trade-offs between consumption and savings, now and across subsequent time periods, subject to constraints and uncertainty. For parents, educational choices for their children are constrained by the parents’ income, time and regulations, and are subject to high levels of uncertainty over very long time frames. Parental choice relating to investments in their children’s education only really occurs in the broad range of the socio-economic ‘middle class’. For the very wealthy, choice is the default of ‘only the best’, which requires little to no effort in decision making despite the cost of the education itself. Parents in low socio-economic conditions lack both the time and experience to research education options and the monetary resources to capture opportunities as they arise, leading to an acquiescence to the default choice of no action.

Rational choice theory suggests that parents are utility maximisers who make decisions from clear value preferences and can be relied upon to make decisions in the best interests of their children (Becker & Tomes 1976, 1979). Yet in deciding which school a child should attend, under rational choice conditions, a parent is required to make a series of complex intergenerational and intertemporal choices that would challenge seasoned economists. Educational choices are predominantly path dependent, subject to imperfect information and, in most cases, irreversible. Ordinary parents, however, need to make these decisions with little training and limited time to evaluate options. Instead, parents rely on a suite of behavioural heuristics in order to achieve a good outcome for their children. Individual choice is also context dependent, subject to the experiences of parents, their expectations of the future, a duty to their children and emotional attachment.

How can a parent make optimal decisions in the face of so many possible choices and outcomes, when those choices are necessarily sequential and irreversible once made? To overcome the complexity of choice, humans have developed decision strategies which allow shortcuts to be taken to achieve a ‘good’ outcome in the face of incomplete information and limited time for evaluation. These heuristics, or intuitive decision rules, allow mathematically hard problems to be solved under restrictive conditions, where a good outcome is achieved at the expense of a perfect outcome. For a parent, a perfect outcome is only possible by chance and impossible by deliberate calculation.

While heuristics are ‘quick & dirty’ solutions, they draw on highly sophisticated underlying processes. Tversky & Kahneman (1983), testing the conjunction rule in likelihood rankings using the classic ‘Bill & Linda’ experiments, showed that there was no difference between naïve and statistically sophisticated participants in how often they violated the rule. Gigerenzer & Goldstein (1996) tested the effectiveness of fast and frugal decision heuristics, such as ‘take the best, ignore the rest’, against sophisticated statistical estimation strategies, such as Bayesian networks. Their research showed that fast and frugal heuristics did not fall too far behind a Bayesian network approach. More interestingly, as the quality of available information used for estimation decreased, heuristic strategies became more effective when compared with the more sophisticated strategies.
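The logic of ‘take the best, ignore the rest’ can be sketched in a few lines. This is a simplified illustration only: it omits the recognition and guessing steps of the full procedure described by Gigerenzer & Goldstein (1996), and the cue names, their ordering and the three example schools are hypothetical rather than drawn from the interviews.

```python
# Hypothetical binary cues, ordered from most to least valid (an assumption).
CUES = ["close_to_home", "sibling_attends", "strong_results"]


def take_the_best(a, b, cues):
    """Compare two options cue by cue, in order of assumed validity.

    The first cue on which the options differ decides the choice and
    all remaining cues are ignored. Returns "a", "b", or None when no
    cue discriminates (the full heuristic would then guess).
    """
    for cue in cues:
        va, vb = a.get(cue, 0), b.get(cue, 0)
        if va != vb:
            return "a" if va > vb else "b"
    return None


# Hypothetical schools described by the cues above.
school_x = {"close_to_home": 1, "sibling_attends": 0, "strong_results": 0}
school_y = {"close_to_home": 1, "sibling_attends": 1, "strong_results": 0}
school_z = {"close_to_home": 1, "sibling_attends": 0, "strong_results": 1}

if __name__ == "__main__":
    # Both are close to home, so the sibling cue decides in favour of school_y.
    print(take_the_best(school_x, school_y, CUES))
    # The sibling cue favours school_y; school_z's stronger results are never consulted.
    print(take_the_best(school_y, school_z, CUES))
```

The second comparison shows why the heuristic is frugal: once a cue discriminates, no further information is weighed, however favourable it might be to the other option.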

The complexity of the decision architecture associated with making choices, combining both rational choice and behavioural components, is illustrated in the ‘Choice Process’ diagram below:

The Choice Process

————————————————

Becker, GS & Tomes, N 1976, ‘Child endowments and the quantity and quality of children’, The Journal of Political Economy, 84(4), S143-S162.

Becker, GS & Tomes, N 1979, ‘An equilibrium theory of the distribution of income and intergenerational mobility’, The Journal of Political Economy, 1153-1189.

Tversky, A & Kahneman, D 1983, ‘Extensional versus intuitive reasoning: The conjunction fallacy in probability judgment’, Psychological Review, 90(4), 293.

Gigerenzer, G & Goldstein, DG 1996, ‘Reasoning the fast and frugal way: Models of bounded rationality’, Psychological Review, 103(4), 650.

McFadden, D 2001, ‘Economic choices’, The American Economic Review, 91(3), 351-378.
