The Research Of Optimization Technology In Latent Semantic Indexing Based On Pseudo Text

Posted on: 2011-01-14
Degree: Master
Type: Thesis
Country: China
Candidate: D B Guo
Full Text: PDF
GTID: 2178360302488552
Subject: Computer software and theory
Abstract/Summary:
Synonymy and polysemy are pervasive in natural language, so it is hard for users to express what they really want to retrieve from the Internet using keywords that are matched only by word form. Latent Semantic Indexing (LSI) maps synonyms onto the same dimension of the latent semantic space and maps the different senses of a polysemous word onto different dimensions. It thereby alleviates the synonymy and polysemy problem to some extent. Compared with concept dictionaries, LSI offers high computability and adapts well across domains, and it has become a research topic in natural language processing.

This thesis aims to improve the accuracy of the term vectors and text vectors in the latent semantic space by strengthening reasonable term co-occurrence information. Based on this idea, it proposes an optimization framework for LSI based on Pseudo Text, where a Pseudo Text is a new text constructed from the original text collection in a supervised way. Within this framework, it proposes two optimization strategies: a method based on semantic blocks and a method based on semantic resources.

LSI treats the whole text as the window through which term associations propagate, which leads to inaccurate association measures between terms. To address this, the thesis proposes the optimization method based on semantic blocks. The method splits the text collection in a supervised way so that similar terms fall into the same semantic block. This strengthens the semantic relatedness of similar terms and improves the term vectors and text vectors in the latent semantic space.

In addition, LSI builds the latent semantic space without supervision, which makes the term vectors and text vectors inaccurate. The optimization method based on semantic resources adds guiding information to the text collection in order to strengthen the association measure between synonymous terms. As a result, the term vectors and text vectors in the latent semantic space become more accurate.

Finally, a Patent Retrieval System based on LSI was developed as the experimental system, which visualizes the experimental results of the proposed methods.
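The idea of strengthening term co-occurrence through added texts can be illustrated with a toy sketch. The abstract does not say how Pseudo Texts are actually constructed, so the sketch below makes an assumption: each Pseudo Text is simply a group of known synonyms appended to the collection as one extra text. The corpus, the vocabulary, the function names, and the choice of `k = 2` are all hypothetical, and raw term counts stand in for whatever weighting the thesis uses. LSI itself is computed with a truncated SVD from NumPy.

```python
import numpy as np

def term_doc_matrix(docs, vocab):
    """Raw term-frequency matrix: one row per term, one column per text."""
    row = {term: i for i, term in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for token in doc.split():
            A[row[token], j] += 1.0
    return A

def lsi_term_vectors(A, k):
    """Rank-k term vectors (rows of U_k * S_k) from a truncated SVD."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] * s[:k]

def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is zero."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0

def synonym_similarity(docs, vocab, a, b, k=2):
    """Cosine similarity of terms a and b in the k-dimensional latent space."""
    T = lsi_term_vectors(term_doc_matrix(docs, vocab), k)
    return cosine(T[vocab.index(a)], T[vocab.index(b)])

# Hypothetical toy collection: "car" and "automobile" never share a text.
vocab = ["car", "engine", "wheel", "automobile", "dealer"]
corpus = ["car engine wheel", "automobile dealer"]

# Assumed form of a Pseudo Text: one synonym group written as an extra text.
pseudo_texts = ["car automobile"]

before = synonym_similarity(corpus, vocab, "car", "automobile")
after = synonym_similarity(corpus + pseudo_texts, vocab, "car", "automobile")
print(f"before: {before:.3f}  after: {after:.3f}")
```

In this toy collection the two synonyms share no text, so their latent vectors are unrelated until the Pseudo Text supplies the missing co-occurrence. A real collection would need term weighting and a much larger `k`.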
Keywords/Search Tags:LATENT SEMANTIC INDEXING, LATENT SEMANTIC SPACE, SEMANTIC BLOCK, PSEUDO TEXT