
The Application And Research Of Big Data In Patent Information Analysis

Posted on: 2017-03-31 | Degree: Master | Type: Thesis
Country: China | Candidate: P Liu
GTID: 2349330503468227 | Subject: Computer software and theory

Abstract/Summary:
With the rapid development of science and technology, patents have attracted much attention as an important indicator of technological innovation, and research institutions and enterprises are increasingly concerned with mining patent information. Patent texts are already classified by a specific method. However, they are unstructured and growing explosively, so traditional methods based on statistical analysis struggle to mine their deeper information. When text mining techniques are applied to patent texts, the algorithms lack scalability and the data platforms lack processing capacity. The rise of big data has brought new opportunities to patent data analysis, and using big data theories and tools to process patent texts has become a new trend.

Targeting the analysis of patent texts, this thesis analyzes the applications of big data in patent information analysis. It takes text clustering as the entry point and improves the traditional K-Means text clustering algorithm according to the characteristics of patent texts. Finally, it designs a parallel version of the patent text clustering process on the big data processing platform Hadoop and its parallel processing framework, MapReduce. The research covers the following:

(1) Based on the current difficulties of patent information analysis, a requirements analysis was completed. The applications of big data in patent information analysis were then analyzed in combination with big data theories and technologies.
(2) Based on the results of the requirements analysis, patent text clustering was chosen as the entry point of the research. To meet the requirements of patent text clustering, the traditional K-Means algorithm was improved with a method for detecting outliers and a density-based strategy for choosing the initial cluster centers.
(3) Taking the characteristics of MapReduce into account, the whole patent text clustering process was designed to run in parallel, including word segmentation, feature selection, TF-IDF weight calculation, text representation, and clustering with the algorithm proposed in this thesis.
(4) Finally, a Hadoop cluster was set up and experiments on several groups of data were designed to test the effect of the improved K-Means algorithm and the feasibility of the parallel design of patent text clustering.

Experiments show that the improved algorithm and the MapReduce-based parallel design of text clustering achieve good results on patent texts, verifying that big data theories and technologies can be applied to patent information analysis.
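Item (2) does not spell out the outlier test or the density measure, so the following Python sketch shows only one common way to realise the idea, under stated assumptions: a point's density is the number of other points within a hypothetical `radius`, points with fewer than `min_neighbors` neighbours are dropped as outliers, and each further initial center is the point that maximises density times distance to the nearest center already chosen. Euclidean distance on small dense vectors is used for readability; TF-IDF document vectors would more typically use cosine distance. All function names are illustrative, not taken from the thesis.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def local_density(points, radius):
    """Assumed density measure: number of other points within `radius` of each point."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and euclidean(p, q) <= radius)
        for i, p in enumerate(points)
    ]


def remove_outliers(points, radius, min_neighbors):
    """Drop points with fewer than `min_neighbors` neighbours, treating them as outliers."""
    density = local_density(points, radius)
    return [p for p, d in zip(points, density) if d >= min_neighbors]


def density_initial_centers(points, k, radius):
    """Pick k initial centers that are both dense and far apart (hypothetical strategy)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    density = local_density(points, radius)
    # First center: the densest point (ties go to the earliest index).
    chosen = [max(range(len(points)), key=lambda i: density[i])]
    while len(chosen) < k:
        remaining = [i for i in range(len(points)) if i not in chosen]
        # Score = density x distance to the nearest already-chosen center.
        best = max(
            remaining,
            key=lambda i: density[i] * min(euclidean(points[i], points[c]) for c in chosen),
        )
        chosen.append(best)
    return [points[i] for i in chosen]
```

The returned centers would then seed the ordinary K-Means loop. Weighting by density steers the selection away from isolated points, while the distance factor spreads the initial centers across different clusters, something the random initialisation of plain K-Means does not guarantee.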
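To illustrate how the TF-IDF step in item (3) can be expressed as map and reduce functions, here is a minimal in-process Python sketch. It assumes whitespace tokenisation in place of real Chinese word segmentation, omits feature selection, uses the plain weight tf × log(N/df), and simulates Hadoop's shuffle with a dictionary. `run_job`, `tf_mapper` and the other names are hypothetical; they are neither the thesis's code nor the Hadoop API.

```python
import math
from collections import Counter, defaultdict


def run_job(records, mapper, reducer):
    """In-process stand-in for one MapReduce job: map -> shuffle (group by key) -> reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)   # shuffle: group values by key
    results = []
    for out_key in sorted(groups):              # within a reducer, Hadoop presents keys sorted
        results.extend(reducer(out_key, groups[out_key]))
    return results


def segment(text):
    # Placeholder: whitespace split. Chinese patent text would need a real word segmenter.
    return text.split()


def tf_mapper(doc_id, text):
    """Map: emit (term, (doc_id, term frequency)) for every distinct term in one document."""
    tokens = segment(text)
    for term, count in Counter(tokens).items():
        yield term, (doc_id, count / len(tokens))


def make_tfidf_reducer(num_docs):
    """Reduce: all postings of a term arrive together, so len(postings) is its document frequency."""
    def reducer(term, postings):
        idf = math.log(num_docs / len(postings))
        for doc_id, tf in postings:
            yield (doc_id, term), tf * idf
    return reducer


def tfidf_vectors(docs):
    """Run the job, then regroup the weights into one sparse vector (dict) per document."""
    weights = run_job(docs, tf_mapper, make_tfidf_reducer(len(docs)))
    vectors = defaultdict(dict)
    for (doc_id, term), w in weights:
        vectors[doc_id][term] = w
    return dict(vectors)
```

The key design point is that grouping by term in the shuffle hands each reduce call every posting of one term, so the document frequency is simply the size of that group and no extra pass over the corpus is needed beyond knowing N.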
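The clustering step itself parallelises in the usual MapReduce fashion: mappers assign each document vector to its nearest current center, and one reduce call per center averages the vectors assigned to it. The sketch below simulates that iteration loop in plain Python, assuming the current centers are available to every mapper (in Hadoop, for example, shipped as a side file with each iteration) and that convergence is declared once no center's squared shift exceeds a small tolerance. The function names are illustrative, and the thesis's own job layout may differ.

```python
from collections import defaultdict


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_mapper(point, centers):
    """Map: assign one vector to its nearest center; emit (center index, (vector, count))."""
    nearest = min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))
    return nearest, (point, 1)


def kmeans_reducer(partials):
    """Reduce: sum the (vector, count) pairs for one center and return their mean."""
    total, count = None, 0
    for vec, n in partials:
        total = list(vec) if total is None else [t + v for t, v in zip(total, vec)]
        count += n
    return tuple(t / count for t in total)


def mapreduce_kmeans(points, centers, max_iter=20, tol=1e-9):
    """Driver: one simulated MapReduce job per K-Means iteration."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        groups = defaultdict(list)
        for p in points:                        # map phase: each call is independent
            key, value = kmeans_mapper(p, centers)
            groups[key].append(value)           # shuffle: group by center index
        new_centers = list(centers)             # a center with no points keeps its position
        for idx, partials in groups.items():    # reduce phase: one call per center
            new_centers[idx] = kmeans_reducer(partials)
        shift = max(squared_distance(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers
```

Emitting `(vector, count)` pairs rather than bare vectors means a combiner could sum partial results on each mapper before the shuffle, reducing network traffic; the reducer logic stays the same either way.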
Keywords/Search Tags: big data, patent, text clustering, MapReduce