Research On Patent Information Retrieval Based On Distributed Multi-index Fusion

Posted on: 2011-02-13
Degree: Master
Type: Thesis
Country: China
Candidate: H J Pu
Full Text: PDF
GTID: 2178330332961139
Subject: Computer application technology
Abstract/Summary:
Information retrieval plays an increasingly important role in daily life. However, existing retrieval techniques still fall short as users' needs keep growing. This thesis focuses on Chinese patent retrieval and a result fusion model, and on the theory and application of the indices pool.

The thesis studies methods of patent information search on a distributed system, including distributed indexing and distributed search, which also form the basis of the indices pool theory.

For information retrieval over massive data, how to build the indices is an important research topic. The thesis proposes a theory of the indices pool, discusses the effect of the indices on search results, and builds a Nutch-based application that implements the indices pool.

The thesis also studies theories and methods of Chinese patent search. It fuses the results of keyword search and semantic search to improve recall. In addition, the result set is adjusted by relatedness so that related patents rank higher and are easier for the searcher to find.

The main contributions of the thesis are:

(1) It studies patent search on a distributed system, focusing on distributed crawling, distributed indexing, and distributed search.

(2) It proposes the concept of an application-oriented indices pool and discusses a method for evaluating indices. It implements an application based on the indices pool theory and Nutch, which demonstrates the correctness of the indices pool approach.

(3) It proposes an information fusion model. The RSSI fusion model is designed around the features of Chinese patents. It fuses the results of keyword search and semantic search, taking the length of the result lists and the relevance scores into account at the strategy level, and it optimizes recall and mean average precision.
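To make the idea of fusing keyword and semantic results concrete, the following is a minimal sketch of score-based fusion. It is not the RSSI model, whose details the abstract does not give. It assumes a generic weighted sum of min-max normalized scores (a CombSUM-style scheme). The function names, the `keyword_weight` parameter, and the patent identifiers are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale raw scores to [0, 1] so two result lists become comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as equally relevant.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(
    keyword: Dict[str, float],
    semantic: Dict[str, float],
    keyword_weight: float = 0.5,  # hypothetical tuning parameter
) -> List[Tuple[str, float]]:
    """Weighted-sum fusion of two scored result lists (assumed scheme)."""
    kw = min_max_normalize(keyword)
    sem = min_max_normalize(semantic)
    fused = {}
    # The union of both lists is what lets fusion raise recall:
    # a document found by only one searcher still appears in the output.
    for doc in set(kw) | set(sem):
        fused[doc] = (
            keyword_weight * kw.get(doc, 0.0)
            + (1.0 - keyword_weight) * sem.get(doc, 0.0)
        )
    # Sort by descending score, breaking ties by document id.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Illustrative (made-up) scores from two searchers.
keyword_hits = {"CN1": 12.0, "CN2": 8.0, "CN3": 4.0}
semantic_hits = {"CN2": 0.9, "CN4": 0.7, "CN3": 0.5}
ranking = fuse(keyword_hits, semantic_hits)
```

In this sketch a document returned by both searchers accumulates score from each, while a document returned by only one still enters the fused list. A model that also weighs result list length, as the abstract describes for RSSI, would need additional terms that are not shown here.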
Keywords/Search Tags: Information Retrieval, Fusion Model, Distributed Computing