Design And Implementation Of Cross-language Parallel Retrieval System Based On Hadoop For Patent

Posted on: 2017-02-14
Degree: Master
Type: Thesis
Country: China
Candidate: X Shen
Full Text: PDF
GTID: 2308330503958929
Subject: Computer Science and Technology
Abstract/Summary:
Patent data contains a large amount of technical information and plays an important role in the development of science and technology. People can obtain much useful information by searching patent data. However, different countries write and store patent data in different languages, which makes it difficult to retrieve foreign patents. At the same time, the volume of patent data is growing rapidly, so handling huge numbers of patent documents is a problem that a patent retrieval system needs to solve. To address these two problems, this thesis designs and implements a Hadoop-based cross-language parallel retrieval system for massive patent collections. The main work includes:
1. This thesis analyzes in detail the bilingual-dictionary-based query translation technology used in cross-language retrieval, and puts forward a disambiguation method using a topic model. The method chooses the appropriate translation according to the topic distribution of words. An experiment shows that the method can effectively solve the disambiguation problem in query translation.
2. This thesis designs a distributed storage method for massive patent literature. Structured information, such as patent attributes and patent document vectors, is stored in HBase, while patent text is stored directly in HDFS. A parallel retrieval method based on Hadoop is also designed.
3. Using the above research results, this thesis designs a Hadoop-based cross-language parallel retrieval system for massive patent data. The system also provides query translation error correction, relevance feedback, viewing of patent translations, and other functions. Users can obtain the information they need more quickly through this system.
Keywords/Search Tags:cross-language retrieval, parallel retrieval, topic model, similarity, Hadoop
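
The abstract does not spell out the disambiguation algorithm, so the following is only a minimal sketch of how topic-based translation selection could work, not the thesis's actual method. It assumes a toy topic model (`TOPIC_WORD`) and a toy bilingual dictionary (`DICTIONARY`); both tables, their probabilities, and the function names are hypothetical. The query's topic mixture is estimated from terms that have a single translation, and each ambiguous term then takes the candidate that is most probable under that mixture.

```python
from typing import Dict, List

# Hypothetical toy topic model: P(target word | topic). Values are made up.
TOPIC_WORD: Dict[str, Dict[str, float]] = {
    "software": {"container": 0.30, "cluster": 0.30, "server": 0.30,
                 "vessel": 0.01, "pressure": 0.02},
    "chemistry": {"vessel": 0.30, "pressure": 0.30, "reactor": 0.30,
                  "container": 0.05, "cluster": 0.02},
}

# Hypothetical bilingual dictionary: source term -> candidate translations.
DICTIONARY: Dict[str, List[str]] = {
    "容器": ["container", "vessel"],  # ambiguous
    "压力": ["pressure"],
    "集群": ["cluster"],
}


def infer_topic_weights(query: List[str]) -> Dict[str, float]:
    """Estimate P(topic | query) from terms with exactly one translation."""
    weights = {topic: 0.0 for topic in TOPIC_WORD}
    for term in query:
        candidates = DICTIONARY.get(term, [])
        if len(candidates) == 1:
            for topic, dist in TOPIC_WORD.items():
                weights[topic] += dist.get(candidates[0], 0.0)
    total = sum(weights.values())
    if total == 0.0:
        # No unambiguous evidence: fall back to a uniform topic mixture.
        return {topic: 1.0 / len(TOPIC_WORD) for topic in TOPIC_WORD}
    return {topic: w / total for topic, w in weights.items()}


def translate(query: List[str]) -> List[str]:
    """Pick, for each term, the candidate most probable under the query's topics."""
    weights = infer_topic_weights(query)

    def score(candidate: str) -> float:
        return sum(weights[t] * TOPIC_WORD[t].get(candidate, 0.0) for t in weights)

    # Terms missing from the dictionary are passed through untranslated.
    return [max(DICTIONARY.get(term, [term]), key=score) for term in query]


print(translate(["压力", "容器"]))
print(translate(["容器", "集群"]))
```

With these toy tables, the ambiguous term 容器 resolves to "vessel" next to 压力 (pressure) and to "container" next to 集群 (cluster). A real system would learn the topic-word distributions from a patent corpus rather than hard-coding them.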