Text Retrieval Based On Keywords

Posted on: 2012-10-11
Degree: Master
Type: Thesis
Country: China
Candidate: B Zhang
Full Text: PDF
GTID: 2178330335959845
Subject: Signal and Information Processing
Abstract/Summary:
With the rapid development of the Internet and the growing number of Internet users, the amount of information online keeps expanding. The number of Internet users in China has now reached 430 million. As a huge information database, the Internet offers people rich information, but finding useful information quickly among such an enormous volume is a major problem. To address it, search engines have developed rapidly and become an important way for people to obtain information. Text retrieval technology is the basis of the search engine. The basic task of text retrieval is to return information relevant to the user's input. Keyword-based retrieval extracts keywords from the user's input and uses them as the search criteria. The topic of the user's input is not limited to a single field. This thesis aims to provide a more accurate information retrieval method, that is, one whose returned results have high precision with respect to the input. To improve accuracy, the research works at the sentence level, and a more efficient keyword-based sentence retrieval model is proposed. First, the raw texts are preprocessed, including word segmentation and part-of-speech (POS) tagging. Second, several common query expansion methods are compared and analyzed; query expansion is needed to make the input meaningful and to extract keywords. Then a model is built to detect relevance. The search results are optimized by training the model, and an effective method that combines query expansion with a language model is presented. Comparative experiments are carried out between Lucene and this method.
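As a rough illustration of how query expansion and a language model might be combined for sentence retrieval, the following Python sketch expands the query keywords and ranks sentences by query likelihood. It is not the thesis's implementation: the `EXPANSIONS` table, the function names, and the weights `lam` and `mu` are all hypothetical, and treating the expanded query as one term sequence for the bigram estimate is a simplification.

```python
import math
from collections import Counter

# Hypothetical expansion table; the abstract does not name the expansion resource.
EXPANSIONS = {"car": ["automobile"], "price": ["cost"]}


def expand_query(terms):
    """Return the original keywords followed by any expansion terms."""
    expanded = list(terms)
    for term in terms:
        for extra in EXPANSIONS.get(term, []):
            if extra not in expanded:
                expanded.append(extra)
    return expanded


def score_sentence(query, sentence, coll_counts, coll_size, lam=0.5, mu=0.1):
    """Log-likelihood of the query terms under a sentence language model.

    lam weights the bigram estimate against the unigram estimate; mu mixes in
    a collection model so unseen terms keep a non-zero probability.
    Both defaults are illustrative assumptions.
    """
    if not sentence:
        return float("-inf")
    unigrams = Counter(sentence)
    bigrams = Counter(zip(sentence, sentence[1:]))
    vocab = len(coll_counts)
    log_prob = 0.0
    for prev, term in zip([None] + query[:-1], query):
        # Add-one smoothed probability of the term in the whole collection.
        p_coll = (coll_counts[term] + 1) / (coll_size + vocab + 1)
        p_uni = (1 - mu) * unigrams[term] / len(sentence) + mu * p_coll
        if prev is not None and unigrams[prev] > 0:
            # Interpolate the bigram and unigram estimates.
            p_bi = bigrams[(prev, term)] / unigrams[prev]
            prob = lam * p_bi + (1 - lam) * p_uni
        else:
            # No usable context in this sentence: fall back to the unigram estimate.
            prob = p_uni
        log_prob += math.log(prob)
    return log_prob


def rank(query_terms, sentences):
    """Rank tokenized sentences for a keyword query, best first."""
    query = expand_query(query_terms)
    coll_counts = Counter(tok for sent in sentences for tok in sent)
    coll_size = sum(coll_counts.values())
    scored = [(score_sentence(query, s, coll_counts, coll_size), s) for s in sentences]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Calling `rank(["car", "price"], sentences)` on a list of tokenized sentences returns `(score, sentence)` pairs. Because of the expansion step, a sentence containing only "automobile" and "cost" can still score well.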
Finally, building on the preceding research, the last part of this thesis gives a simple but effective method for extracting reasons. Analysis of the experimental results leads to the following conclusions: some text retrieval methods can also be applied to sentence retrieval; query expansion can improve the search results; for language-model-based retrieval, a balance of unigram and bigram models performs better than the unigram model alone; Lucene is more efficient on large-scale corpora because of its inverted index; and detecting reasons through indicative words is easy and efficient, but it performs poorly on implied reasons.
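Reason detection through indicative words can be sketched in a few lines of Python. The cue list and the function name below are assumptions made for illustration, since the abstract does not list the indicative words actually used. The sketch also shows the limitation noted above: a sentence whose reason is only implied contains no cue word and yields nothing.

```python
import re

# Hypothetical cue words; longer phrases come first so "because of" wins over "because".
CUE_PATTERN = re.compile(
    r"\b(because of|because|due to|as a result of|since)\b",
    re.IGNORECASE,
)


def extract_reason(sentence):
    """Return the text following the first cue word, or None if no cue is found."""
    match = CUE_PATTERN.search(sentence)
    if match is None:
        return None
    # Drop surrounding spaces and trailing punctuation from the extracted clause.
    reason = sentence[match.end():].strip(" ,.;")
    return reason or None
```

A cue such as "since" can also be temporal, so a practical system would likely need extra filtering on top of plain pattern matching.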
Keywords/Search Tags: text retrieval, query expansion, relevance detection model, reason extraction