Research Of Chinese-Text Retrieval Based On Latent Semantic Indexing

Posted on: 2009-10-27
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Li
Full Text: PDF
GTID: 2178360245989580
Subject: Computer application technology
Abstract/Summary:
Most information on the Internet is text. The explosive growth of text poses a great challenge to information retrieval, making it increasingly difficult to find useful information quickly and accurately. The most widely used retrieval method, keyword matching, compares only the explicit surface form of words, yet natural language is full of uncertainty such as synonymy and polysemy. Users often cannot express what they really want to retrieve with keywords or keyword chains alone.

The Latent Semantic Indexing (LSI) model is easy to compute and requires little human intervention. A latent semantic space is built by truncated singular value decomposition, and terms and documents are projected onto this LSI space. The semantic relationships among terms are thereby extracted to represent the semantic structure of natural language, which improves retrieval performance.

This thesis focuses on how to improve the performance of a Chinese text retrieval system based on LSI and its features. First, the key techniques and mathematical basis of LSI are analyzed in depth, with worked examples aimed at Chinese text retrieval. Second, term weighting, which is of great importance in LSI, is studied in detail, and a new weighting scheme based on a non-linear function and a location factor is proposed, which further improves retrieval performance. Because the LSI space makes it convenient to compute relations among documents, "doc-doc retrieval" is put forward to make users' retrieval more effective; it offsets the loss of retrieval precision caused by inaccurately phrased or entered queries. Finally, an experimental platform, the "Chinese LSI Analysis System", has been developed. In this system, each key step of LSI corresponds to a dedicated experimental method, and the results are presented visually. All claims in the thesis are supported by experiments on this system.
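
The pipeline the abstract describes (weight the term-document matrix, truncate its SVD, project queries and documents into the LSI space, then compare them) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the toy corpus uses English placeholder terms and is assumed to be already word-segmented, `log(1 + tf)` is a generic non-linear weighting and not the scheme proposed in the thesis (no location factor is modeled), and the helper names `build_lsi`, `fold_in` and `cosine_to_all` are hypothetical.

```python
import numpy as np

# Toy corpus, assumed already word-segmented (Chinese text would need a
# segmentation step first; English placeholder terms keep the sketch readable).
terms = ["car", "automobile", "engine", "flower", "petal"]
counts = np.array([
    # d0 d1 d2 d3   (raw term frequencies, terms x documents)
    [1, 0, 0, 0],   # car
    [0, 1, 0, 0],   # automobile
    [1, 1, 0, 0],   # engine
    [0, 0, 1, 1],   # flower
    [0, 0, 1, 2],   # petal
], dtype=float)


def weight(tf):
    """Illustrative non-linear weighting, log(1 + tf). NOT the thesis's scheme."""
    return np.log1p(tf)


def build_lsi(term_doc, k):
    """Truncated SVD: keep the k largest singular triplets of the matrix."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k, s_k = U[:, :k], s[:k]
    # Row j holds document j's coordinates in the k-dimensional LSI space.
    doc_coords = Vt[:k, :].T * s_k
    return U_k, s_k, doc_coords


def fold_in(vec, U_k):
    """Project a term-space vector (query or new document) into the LSI space."""
    return vec @ U_k


def cosine_to_all(vec, rows):
    """Cosine similarity between one LSI vector and every row of `rows`."""
    denom = np.linalg.norm(vec) * np.linalg.norm(rows, axis=1)
    return rows @ vec / np.maximum(denom, 1e-12)


U_k, s_k, doc_coords = build_lsi(weight(counts), k=2)

# Keyword query "car": only d0 contains the literal term.
query = np.zeros(len(terms))
query[terms.index("car")] = 1.0
query_sims = cosine_to_all(fold_in(weight(query), U_k), doc_coords)

# Doc-doc retrieval: use document d2 itself as the query.
doc_sims = cosine_to_all(doc_coords[2], doc_coords)

print("query 'car' vs documents:", np.round(query_sims, 3))
print("document d2 vs documents:", np.round(doc_sims, 3))
```

In this toy setup, d1 ("automobile engine") scores about as high as d0 for the query "car" even though it never contains that word, because both documents share "engine" and collapse onto the same latent dimension. The doc-doc comparison works the same way without any query text at all, which is the property the abstract relies on when queries are phrased inaccurately.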
Keywords/Search Tags: Information Retrieval, Latent Semantic Indexing, Term Weighting, doc-doc retrieval