Latent Semantic Analysis

An everyday application of Latent Semantic Analysis (LSA) is web search, where results containing words semantically or contextually similar to the query are also returned.  Type in “run” and the search will also pick up “ran”, “runs” and “running”. LSA allows natural language processing of vast collections of data, such as web pages, to reveal how words are related to each other in (semantic) context: words are converted into vectors (vectorial semantics) and singular value decomposition is applied to the resulting term-document matrix.  In this way, the data itself is used to create a ‘latent semantic dictionary/thesaurus’ which reflects the context of the documents being analysed.
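A minimal sketch of that mechanism, using a hypothetical toy corpus and plain NumPy (the documents, vocabulary and choice of k=2 latent dimensions are illustrative assumptions, not from the original text): counts go into a term-document matrix, SVD reduces it, and terms that share contexts end up close together even when they never co-occur.

```python
import numpy as np

# Hypothetical toy corpus: two topics, chosen only for illustration.
docs = [
    "run race marathon",
    "run sprint race",
    "bank money loan",
    "money loan credit",
]

# Build the term-document count matrix A (rows = terms, columns = documents).
vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}
A = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        A[index[w], j] += 1

# Singular value decomposition; keep only the k largest singular values,
# which discards noise and exposes the latent 'semantic' structure.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
terms = U[:, :k] * s[:k]  # term vectors in the k-dimensional latent space

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# 'run' and 'marathon' share contexts, so their latent vectors align;
# 'run' and 'credit' come from unrelated contexts, so they do not.
sim_related = cosine(terms[index["run"]], terms[index["marathon"]])
sim_unrelated = cosine(terms[index["run"]], terms[index["credit"]])
```

On a real collection the counts would typically be tf-idf weighted and k would be in the hundreds, but the pipeline is the same.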

LSA captures the contextual relationships between text documents and word meanings.  Taking into account the context in which words are used is important for linguistic analysis, because the contextual meaning of words changes over time and across social groups.  An example of the importance of context is how the meaning of ‘terrific’ has shifted.  Latent semantic analysis of documents from the second half of the 19th century would show ‘terrific’ as similar to ‘horror’, while documents from the second half of the 20th century would show ‘terrific’ as now being the opposite of ‘horror’.

LSA can also be used to categorize documents based on similarities in word usage and ‘commonality of context’.  LSA is one of the methods used to try to identify (or at least speculate about) who the ‘real’ Shakespeare was, by comparing the works of Shakespeare with the writings of his contemporaries.  It is this application of LSA that I’m most interested in.
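The same decomposition also yields document vectors, which is what categorization and authorship comparison rest on. A hedged sketch (the two “author” passages below are invented stand-ins, not real attribution data): passages whose word usage shares a context land close together in the latent space, regardless of exact word overlap.

```python
import numpy as np

# Hypothetical passages: two invented 'authors' with different word habits.
passages = [
    "thee thou doth speak",       # author A
    "thou doth speak verily",     # author A
    "data model sample test",     # author B
    "model sample data result",   # author B
]

# Term-document count matrix, as before.
vocab = sorted({w for p in passages for w in p.split()})
index = {w: i for i, w in enumerate(vocab)}
A = np.zeros((len(vocab), len(passages)))
for j, p in enumerate(passages):
    for w in p.split():
        A[index[w], j] += 1

# Document vectors come from the right singular vectors, scaled by
# the singular values, truncated to k latent dimensions.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
doc_vecs = Vt[:k].T * s[:k]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Passages by the same 'author' cluster together in the latent space.
same_author = cosine(doc_vecs[0], doc_vecs[1])
diff_author = cosine(doc_vecs[0], doc_vecs[2])
```

An attribution study would compare a disputed text's vector against each candidate author's documents in exactly this way, just with far larger passages and vocabularies.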

Where the meaning of words should remain constant within narrow time frames and geographical locations, it should be possible to identify economic decision types through their unique ‘commonality of context’.  The relationship between preferences across groups can then be analysed and may shed light on the underlying architecture of the decision process.