The Application And Research Of Big Data In Patent Information Analysis

Posted on:2017-03-31

Degree:Master

Type:Thesis

Country:China

Candidate:P Liu

Full Text:PDF

GTID:2349330503468227

Subject:Computer software and theory

Abstract/Summary:

With the rapid development of science and technology, patents, as an important indicator of technological innovation, have attracted much attention. Research institutions and enterprises are increasingly concerned with mining patent information. Although patent texts are already classified under specific schemes, traditional methods based on statistical analysis struggle to extract deeper information, because patent texts are unstructured and their volume is growing explosively. When text mining techniques are applied to patent texts, the algorithms lack scalability and the data platforms lack processing capacity. The rise of big data brings new opportunities for patent data analysis, and using the theories and tools of big data to process patent texts is a new trend.

With the analysis of patent texts as its goal, this thesis examines the applications of big data in patent information analysis and takes clustering as the entry point, improving the traditional K-Means text clustering algorithm according to the characteristics of patent texts. It then gives a parallel design of the patent text clustering process on the big data processing platform Hadoop and its parallel processing framework MapReduce. The research is as follows:

(1) Based on the current difficulties of patent information analysis, a requirements analysis was completed. The applications of big data in patent information analysis were then analyzed in combination with the theories and technologies of big data.

(2) Based on the result of the requirements analysis, patent text clustering was chosen as the entry point for the research. To meet the requirements of patent text clustering, the traditional K-Means clustering algorithm was improved by designing a method to detect outliers and a density-based strategy to choose the initial cluster centers.

(3) Drawing on the characteristics of MapReduce, the whole process of patent text clustering was designed in a parallel way, including word segmentation, feature selection, TF-IDF weight calculation, text representation, and clustering with the algorithm proposed in this thesis.

(4) Finally, the effect of the improved K-Means algorithm and the feasibility of the parallel design of patent text clustering were tested by setting up a Hadoop environment and running experiments on several groups of data.

Experiments show that the improved algorithm and the MapReduce-based parallel design of text clustering perform well on patent texts, which verifies that the theories and technologies of big data can be applied to the analysis of patent information.
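
The pipeline outlined in items (2) and (3) can be illustrated with a small single-machine sketch. It is not the thesis's algorithm, whose details the abstract does not give. It assumes the texts are already segmented into tokens, uses cosine distance on TF-IDF vectors, treats a document with too few neighbours within a radius as an outlier, and seeds K-Means from the densest documents that are far enough apart. The function names and the parameters `radius` and `min_neighbours` are hypothetical. The `map_assign` and `reduce_center` functions only mirror the roles a map step and a reduce step would play in a MapReduce job; no Hadoop API is used.

```python
import math
from collections import Counter, defaultdict


def tfidf(docs):
    """Turn token lists into L2-normalised sparse TF-IDF vectors (dicts)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def distance(a, b):
    """Cosine distance between two normalised sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return 1.0 - sum(w * b.get(t, 0.0) for t, w in a.items())


def pick_seeds(vectors, k, radius, min_neighbours):
    """Assumed density rule: density = number of other documents within radius."""
    n = len(vectors)
    density = [
        sum(1 for j in range(n)
            if j != i and distance(vectors[i], vectors[j]) <= radius)
        for i in range(n)
    ]
    kept = [i for i in range(n) if density[i] >= min_neighbours]
    outliers = [i for i in range(n) if density[i] < min_neighbours]
    seeds = []
    # Densest documents first; skip any that lie too close to a chosen seed.
    for i in sorted(kept, key=lambda i: -density[i]):
        if all(distance(vectors[i], vectors[s]) > radius for s in seeds):
            seeds.append(i)
        if len(seeds) == k:
            break
    return seeds, kept, outliers  # may hold fewer than k seeds


def map_assign(vec, centers):
    """Map role: emit the index of the nearest center for one document."""
    return min(range(len(centers)), key=lambda c: distance(vec, centers[c]))


def reduce_center(members):
    """Reduce role: sum the vectors of one cluster and renormalise."""
    total = defaultdict(float)
    for vec in members:
        for t, w in vec.items():
            total[t] += w
    norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
    return {t: w / norm for t, w in total.items()}


def cluster(docs, k, radius=0.6, min_neighbours=1, iterations=10):
    """Return ({doc index: cluster index}, [outlier doc indices])."""
    vectors = tfidf(docs)
    seeds, kept, outliers = pick_seeds(vectors, k, radius, min_neighbours)
    centers = [vectors[s] for s in seeds]
    labels = {}
    for _ in range(iterations):
        new_labels = {i: map_assign(vectors[i], centers) for i in kept}
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        groups = defaultdict(list)
        for i, c in labels.items():
            groups[c].append(vectors[i])
        centers = [reduce_center(groups[c]) if groups[c] else centers[c]
                   for c in range(len(centers))]
    return labels, outliers
```

Calling `cluster(docs, k)` on a list of token lists returns the cluster index of every retained document together with the indices of the documents set aside as outliers. Renormalising the summed vectors makes this a spherical variant of K-Means, chosen here only because it pairs naturally with cosine distance.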

Keywords/Search Tags:

big data, patent, text clustering, MapReduce

Related items

1	Research On Patent Technology-efficacy-application Diagram Building And Application Based On Mapreduce Computational Model
2	Patent Text Clustering Analysis And Visualization
3	Research Of The Patent Map And Its Application In The Biomedical Field
4	Design And Implementation Of Patent Information Management System Based On .NET
5	Research On Cross Language Patent Text Analysis
6	Clustering And Classification Of Data And Text Using Such Technologies As Genetic Algorithm
7	Research On Technology Opportunity Analysis Method Based On Outlier Patents
8	Hot Spot Identification Of Potential Patent Requirements Of Enterprises Based On Multi-Source Data
9	Exploration And Realization Of Land Price Classification Model Based On MapReduce
10	Study On Methods Of Data Mining And Text Mining Based On Fuzzy Logic And Neural Network