Latent Semantic Analysis
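APPLICATIONS LSA is applied, among other things, to the problems of synonymy (different words sharing a meaning) and polysemy (one word having several meanings).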
OCCURRENCE MATRIX LSA uses a term-document matrix, which describes the occurrences of terms in documents. It is a sparse matrix whose rows correspond to terms, typically stemmed words that appear in the documents, and whose columns correspond to documents. A typical weighting of the matrix elements is tf-idf (term frequency-inverse document frequency): each element is proportional to the number of times the term appears in the document, and rare terms are upweighted to reflect their relative importance. This matrix is common to standard semantic models as well, though it is not always expressed explicitly as a matrix, since its mathematical properties as a matrix are not always used.
RANK LOWERING After constructing the occurrence matrix, LSA finds a low-rank approximation to the term-document matrix. This approximation can be motivated in several ways.
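A minimal Python sketch of both steps, assuming NumPy, documents that are already tokenized (and stemmed, if desired), and the weighting "raw count times log(N/df)", which is only one of several tf-idf variants in use. The function names `tfidf_term_document_matrix` and `rank_k_approximation` are hypothetical. The matrix puts terms in rows and documents in columns, and the rank-k approximation keeps the k largest singular values of the SVD.

```python
from collections import Counter

import numpy as np


def tfidf_term_document_matrix(docs):
    """Return (vocab, X): a tf-idf weighted term-document matrix.

    Rows of X are terms (in `vocab` order), columns are documents.
    Weighting is raw count * log(N / df); a term found in every document gets weight 0.
    """
    vocab = sorted({term for doc in docs for term in doc})
    row = {term: i for i, term in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for term, c in Counter(doc).items():
            counts[row[term], j] = c
    df = np.count_nonzero(counts, axis=1)  # number of documents containing each term
    idf = np.log(len(docs) / df)           # rarer terms receive larger weights
    return vocab, counts * idf[:, None]


def rank_k_approximation(X, k):
    """Best rank-k approximation of X in the Frobenius norm, via a truncated SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


# Hypothetical toy corpus, for illustration only.
docs = [
    ["car", "engine", "drive"],
    ["truck", "engine", "drive"],
    ["flower", "petal", "garden"],
]
vocab, X = tfidf_term_document_matrix(docs)
X2 = rank_k_approximation(X, k=2)
print(f"{X.shape} {np.linalg.norm(X - X2):.4f}")  # prints (7, 3) 1.0986
```

The printed error is the square root of the sum of the squared discarded singular values; here only one is discarded, and it equals log 3, which is what the Eckart-Young theorem predicts for a truncated SVD.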
A consequence of the rank lowering is that some dimensions get "merged" and come to depend on more than one term.
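The merging can be seen on a hypothetical three-document count matrix (raw counts, no tf-idf, chosen only for illustration): "car" and "truck" never occur in the same document, but both co-occur with "engine" and "drive". This minimal NumPy sketch, assuming a rank-2 truncated SVD, compares the cosine similarity of the two term rows before and after rank lowering.

```python
import numpy as np

# Hypothetical toy corpus: rows are terms, columns are three documents.
terms = ["car", "truck", "engine", "drive", "flower", "petal"]
X = np.array([
    [1, 0, 0],  # car
    [0, 1, 0],  # truck
    [1, 1, 0],  # engine
    [1, 1, 0],  # drive
    [0, 0, 1],  # flower
    [0, 0, 1],  # petal
], dtype=float)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Keep the two largest singular values (rank-2 approximation).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
X_k = (U[:, :k] * s[:k]) @ Vt[:k, :]

car, truck = terms.index("car"), terms.index("truck")
cos_before = cosine(X[car], X[truck])
cos_after = cosine(X_k[car], X_k[truck])
print(f"{cos_before:.2f} -> {cos_after:.2f}")  # prints 0.00 -> 1.00
```

The discarded singular direction is the difference between the first two documents, which is exactly what distinguishes "car" from "truck"; once it is dropped, the two rows coincide. On realistic corpora the effect is usually partial, so similarities rise rather than become identical.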
The merging mitigates synonymy, since the rank lowering is expected to merge the dimensions associated with terms that have similar meanings. It also mitigates polysemy, since components of polysemous words that point in the "right" direction are added to the components of words that share this sense. Conversely, components that point in other directions tend either to cancel out or, at worst, to be smaller than the components in the direction of the intended sense.
IMPLEMENTATION Concretely, the rank reduction is often achieved through singular value decomposition (SVD): the terms are then represented in a vector space of lower dimensionality than the total number of terms in the vocabulary. The SVD is typically computed with large-matrix methods (for example, Lanczos methods), but it may also be computed incrementally, with greatly reduced resources, via a neural-network-like approach that does not require the large, full-rank matrix to be held in memory.
LIMITATIONS OF LSA LSA has a number of drawbacks.
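In particular, the resulting dimensions can be difficult to interpret: a merged dimension may combine terms that have no meaningful relation, which leads to results that can be justified at the mathematical level but have no interpretable meaning in natural language.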
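One way to probe whether a reduced dimension has a natural-language reading is to list the terms with the largest absolute loadings on it. The sketch below assumes NumPy and a hypothetical toy count matrix; `top_terms` is an illustrative helper, not part of any library. Absolute values are used because singular vectors are determined only up to sign.

```python
import numpy as np

# Hypothetical term-document count matrix (rows = terms, columns = documents).
terms = ["car", "truck", "engine", "drive", "flower", "petal"]
X = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
], dtype=float)

# Columns of U are the term-space directions of the LSA dimensions.
U, s, Vt = np.linalg.svd(X, full_matrices=False)


def top_terms(U, terms, dim, n=3):
    """Return the n terms with the largest absolute loading on LSA dimension `dim`."""
    order = np.argsort(-np.abs(U[:, dim]))  # the sign of a singular vector is arbitrary
    return [terms[i] for i in order[:n]]


print(top_terms(U, terms, 0, n=4))
print(top_terms(U, terms, 1, n=2))
```

Here the first dimension mixes car, truck, engine, and drive, which a reader might label "vehicles"; nothing in the decomposition guarantees that such a label exists for every dimension.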