Passage Retrieval System And Its Application

Posted on: 2011-01-18
Degree: Master
Type: Thesis
Country: China
Candidate: W Lin
GTID: 2178330338989599
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, Internet search engines have developed rapidly and are now used by millions of people, but the major search engines have largely ignored the need for long-text input. At the same time, a search engine can quickly return a large number of texts, yet only a few passages in those texts are useful to the user. A passage retrieval engine that accepts long-text input and finds related passages would solve both of these problems of common search engines. In existing research, passage retrieval has been treated and studied only as a component of question answering, and no system satisfies the demand for long-text input to find related passages. To solve this problem, the author developed the HaiTianYuan Passage Retrieval System and studied the related algorithms in it.

To accept long-text input and find related passages, a passage retrieval process is proposed: passages are segmented during the indexing phase; the input passage is then converted into an ordered keyword sequence; an algorithm based on keyword intersection searches for "related" passages; finally, passage similarity is computed and the passages are ranked. Several efficient algorithms are proposed within this process: an efficient window-based passage segmentation algorithm, together with a strategy of segmenting passages before searching, which shortens search time while maintaining high precision; a keyword picking algorithm that represents the input passage by converting it into a searchable form; and an algorithm that improves the performance of SiteQ in computing passage similarity by working at the level of word density.

The data used in the system comes from the Haitianyuan Finance Spider. To evaluate the performance of the Passage Retrieval System, the author proposes an evaluation method for each step of the system.
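The abstract does not give the details of the segmentation or search algorithms, so the following is only a minimal sketch of the general idea, under stated assumptions: passages are fixed-size windows of consecutive sentences, tokens are split on whitespace, and a passage counts as "related" when it shares at least `min_hits` keywords with the input. The function names and the `window`, `step`, and `min_hits` parameters are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def segment_by_window(sentences, window=3, step=3):
    """Group consecutive sentences into passages of at most `window` sentences.

    Assumption: a fixed-size sentence window; the thesis's own windowing
    rule is not described in the abstract.
    """
    return [" ".join(sentences[i:i + window])
            for i in range(0, len(sentences), step)]


def build_index(passages):
    """Build an inverted index: lowercase word -> set of passage ids."""
    index = defaultdict(set)
    for pid, passage in enumerate(passages):
        for word in passage.lower().split():
            index[word.strip(".,;:!?\"'")].add(pid)
    return index


def search(index, keywords, min_hits=2):
    """Return ids of passages sharing at least `min_hits` keywords.

    Passages are ordered by number of shared keywords (most first),
    with ties broken by passage id.
    """
    hits = defaultdict(int)
    for kw in keywords:
        for pid in index.get(kw.lower(), ()):
            hits[pid] += 1
    ranked = sorted(hits.items(), key=lambda item: (-item[1], item[0]))
    return [pid for pid, count in ranked if count >= min_hits]


sentences = [
    "Stocks rose sharply today.",
    "The central bank cut interest rates.",
    "Analysts expect bond yields to fall.",
    "Heavy rain flooded the city.",
]
passages = segment_by_window(sentences, window=2, step=2)
index = build_index(passages)
print(search(index, ["bank", "interest", "rates"]))
```

Segmenting at index time, as in this sketch, means a query only has to look up its keywords in the index and never has to split documents while searching, which is one plausible reading of the "segment before searching" strategy.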
To evaluate the keyword picking algorithm, the author compares the keywords picked by the algorithm with those picked by humans, and the precision proves to be good (the precision of picking 7 keywords in 10 keywords is 85%). To evaluate the performance of "SiteQ-Title", MRR (Mean Reciprocal Rank) is used: the experiment shows that the MRR of "SiteQ" is higher than that of "MITRE", and that "SiteQ-Title" performs better than "SiteQ". In the final result, average precision is above 27% and average recall is above 93% when the input is longer than 60 words. This shows that the Passage Retrieval System can satisfy the requirements described above.

Finally, the application of passage retrieval to question answering is studied in this thesis, covering synonym expansion, question type recognition, and answer extraction. This research was applied in the Hai Tian Yuan Passage Retrieval System and the Hai Tian Yuan Question-Answering System.
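For reference, MRR as used in the evaluation above is conventionally the average, over all queries, of 1/rank of the first relevant result, with a query contributing 0 when no relevant result is returned. A minimal sketch of that standard definition follows; the function name and the input layout (one ranked list and one set of relevant ids per query) are assumptions for illustration, not the thesis's evaluation code.

```python
def mean_reciprocal_rank(ranked_results, relevant):
    """Compute MRR over a set of queries.

    ranked_results: list of ranked id lists, one per query.
    relevant: list of sets of relevant ids, aligned with ranked_results.
    A query with no relevant id in its ranked list contributes 0.
    """
    if not ranked_results:
        return 0.0
    total = 0.0
    for results, rel in zip(ranked_results, relevant):
        for rank, item in enumerate(results, start=1):
            if item in rel:
                total += 1.0 / rank  # only the first relevant hit counts
                break
    return total / len(ranked_results)


# Three queries: first relevant hit at rank 2, at rank 1, and not found.
score = mean_reciprocal_rank(
    [["p3", "p1"], ["p2"], ["p9"]],
    [{"p1"}, {"p2"}, {"p4"}],
)
print(score)  # (1/2 + 1 + 0) / 3 = 0.5
```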
Keywords/Search Tags: passage retrieval, passage similarity, question answering