I wrote a tutorial on Latent Semantic Analysis (LSA). It can be accessed by following this link. I believe LSA is a very interesting method for ranking documents in response to a query.

LSA is a method for discovering hidden concepts in document data. Each document and each term (word) is expressed as a vector whose elements correspond to these concepts. Each element of a vector gives the degree to which the document or term participates in the corresponding concept. The goal is not to describe the concepts verbally, but to represent documents and terms in a unified way that exposes document-document, document-term, and term-term similarities or semantic relationships that are otherwise hidden.

An Example

Suppose we have the following set of five documents:

d1 : Romeo and Juliet.
d2 : Juliet: O happy dagger!
d3 : Romeo died by dagger.
d4 : “Live free or die”, that’s the New-Hampshire’s motto.
d5 : Did you know, New-Hampshire is in New-England.

and the search query: dies, dagger ...
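To make the idea concrete, here is a minimal sketch in Python with NumPy of how the five documents and the query could be mapped into a small concept space and ranked. Several choices in it are assumptions rather than anything stated above: the hand-normalized vocabulary (lowercased, stop words dropped, "died" and "dies" mapped to "die"), the number of concepts `k = 2`, scaling the vectors by the singular values, representing the query as the centroid of its term vectors, and scoring with cosine similarity.

```python
import numpy as np

# Hand-normalized documents. ASSUMPTION: lowercasing, stop-word removal and
# mapping "died"/"dies" to "die" are illustrative preprocessing choices.
docs = {
    "d1": ["romeo", "juliet"],
    "d2": ["juliet", "happy", "dagger"],
    "d3": ["romeo", "die", "dagger"],
    "d4": ["live", "free", "die", "new-hampshire"],
    "d5": ["new-hampshire"],
}
query = ["die", "dagger"]

names = list(docs)
terms = sorted({t for words in docs.values() for t in words})

# Term-document count matrix A: rows are terms, columns are documents.
A = np.array([[docs[d].count(t) for d in names] for t in terms], dtype=float)

# Truncated SVD: keep only the k strongest "concepts".
k = 2  # ASSUMPTION: two concepts, chosen for illustration
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# Each term and each document becomes a k-dimensional concept vector.
# ASSUMPTION: both are scaled by the singular values.
term_vecs = U_k * s_k      # shape (number of terms, k)
doc_vecs = Vt_k.T * s_k    # shape (number of documents, k)

# ASSUMPTION: the query is the centroid of its term vectors.
q_vec = np.mean([term_vecs[terms.index(t)] for t in query], axis=0)


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


scores = {d: cosine(q_vec, doc_vecs[i]) for i, d in enumerate(names)}
ranking = sorted(scores, key=scores.get, reverse=True)

for d in ranking:
    print(d, round(scores[d], 3))
```

Because terms and documents live in the same concept space, the same `cosine` helper can compare any pair of them, which is what lets a document be scored against a query term it does not literally contain.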