Research On Patent Text Clustering Based On Improved K-means Algorithm

Posted on:2021-04-27

Degree:Master

Type:Thesis

Country:China

Candidate:T F Li


GTID:2428330647464148

Subject:Computer Science and Technology

Abstract/Summary:

Patents record inventions and innovations and contain technical information from many disciplines. As a cornerstone of technological development, patents not only show the current state of technology but also indicate its future direction in each discipline. As science and technology develop ever more rapidly, patent data is growing exponentially. With patent information accumulating continuously, extracting valuable patent information from huge patent databases is of great value for competitive intelligence. Because patents contain important information on technological inventions and innovations, big-data analysis of patents can reveal important technological development trends, which is of great significance for assessing technological competitiveness and planning future technological development.

Data mining techniques can be used to extract the valuable information contained in large volumes of patent data and to filter out worthless patent information. Similar patents can then be analyzed and compared by clustering, and information about related patents, such as associated information, complementary information, citing and cited information, and development trends, can be extracted. Data mining techniques include clustering algorithms, neural networks, decision trees, genetic algorithms, rough sets, fuzzy sets, association rules, and many others.

The k-means algorithm is a clustering algorithm, and this thesis applies it to cluster analysis of patent data. Standard k-means is sensitive to noise, and selecting the initial cluster centers at random makes its clustering results unstable; existing improved algorithms require the user to select parameters, so their results depend on the parameter values chosen. To address these issues, an improved algorithm based on gradient transition is proposed. It requires no parameters to be set, obtains the initial cluster centers in an unsupervised way, and effectively removes noise points. Simulation experiments on a text data set from the UCI Machine Learning Repository show that the algorithm is stable, resistant to interference, and accurate: the clustering results fluctuate by about 5%, and 96% of the noise points are removed. The algorithm can therefore be applied to text clustering problems.

Using patents related to the steel industry as the data for analysis, natural language processing methods are applied to convert the patent texts into weights, and the k-means algorithm improved by gradient transition is used to cluster the patents. Keywords are then identified and the characteristics of each cluster are collected for further in-depth data mining, so that the related patent data sets can be given corresponding labels, improving the capability and efficiency of patent analysis.
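The abstract does not specify the gradient-transition algorithm, so it is not reproduced here. For reference, the following is a minimal sketch of the baseline pipeline it improves on, assuming whitespace-tokenized English text: patent texts converted into TF-IDF weight vectors (TF-IDF with smoothed IDF is an assumption; the abstract only says the text is converted into weights), then clustered with standard k-means using randomly chosen initial centers. The function names `tfidf_matrix` and `kmeans` and the sample patent snippets are hypothetical.

```python
import math
import random
from collections import Counter


def tfidf_matrix(docs):
    """Turn whitespace-tokenized documents into L2-normalized TF-IDF rows.

    Assumption: smoothed IDF, log((1 + n) / (1 + df)) + 1, so terms shared
    by every document still receive a small positive weight.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        tf = Counter(toks)
        length = len(toks) or 1  # guard against empty documents
        row = [tf[t] / length * idf[t] for t in vocab]
        norm = math.sqrt(sum(w * w for w in row)) or 1.0
        rows.append([w / norm for w in row])
    return vocab, rows


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, seed=0, max_iter=100):
    """Standard (Lloyd's) k-means with randomly chosen initial centers.

    The random choice of initial centers is the step the abstract names
    as a source of unstable clustering results.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if a cluster empties
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


if __name__ == "__main__":
    # Hypothetical steel-industry patent snippets, for illustration only.
    patents = [
        "hot rolling mill strip cooling control",
        "rolling mill strip thickness control",
        "blast furnace slag heat recovery",
        "blast furnace coke injection slag",
    ]
    _, X = tfidf_matrix(patents)
    for seed in (0, 1, 2):
        labels, _ = kmeans(X, k=2, seed=seed)
        print(seed, labels)
```

Because `kmeans` draws its initial centers at random, changing `seed` can change the resulting partition on less clearly separated data; that instability is what the thesis's parameter-free, unsupervised selection of initial centers is intended to address.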

Keywords/Search Tags:

patent analysis, feature extraction, gradient transition, clustering, initial clustering center, transition factor

Related items

1	A Patent Clustering Method Based On The Characteristics Of Multiple Problems
2	Design And Development Of Patent Extraction And Analysis System
3	Precise Clustering Algorithm For Chinese Text Based On K-means
4	Ksummary Analysis Method Based On Adaptive Multiple Clustering
5	Research On Fuzzy Kernel Clustering Algorithm Driven By Viewpoint
6	Research On Clustering Algorithm Of K-medoids And Its Application In Text Clustering
7	Research On Text Clustering And Its Application In Topic Detection Analysis
8	Research On Problems Related To The Initial Center Selection In K-means Clustering Algorithm
9	Study On Text Clustering And Keyphrase Extraction Of Patent Document
10	Improvement Based On FCM Clustering Algorithm