Research On Core Patent Recognition Method Based On Text Data Mining

Posted on: 2022-05-24
Degree: Master
Type: Thesis
Country: China
Candidate: M Chen
GTID: 2518306473991659
Subject: Computer software and theory
Abstract/Summary:
Core patents generally refer to the patents that cover the key core technologies of a product in a given technical field. Identifying the core patents of an industry is an effective way to mine information about that industry's key core technologies, and it can provide technical research and development guidance for enterprises in the industry. To identify core patents more comprehensively, this paper proposes a core patent recognition method based on text data mining. The method first uses text-mining-based intelligent algorithms to subdivide patents by technical field, and then, within each resulting niche, scores the patents with an improved PageRank algorithm to identify the core patents of that area. The core patent recognition task in this paper comprises three sub-tasks: a classification sub-task, a clustering sub-task, and a core patent recognition sub-task. This paper studies the intelligent algorithms used in each of the three sub-tasks in depth and applies the method to an analysis of patents in the field of power systems and equipment. The main research contents are as follows:

(1) A multi-feature patent text classification algorithm based on BERT-A-BiLSTM is proposed. Patent texts contain many terms specific to their professional field, and it is difficult for the BERT model to capture their precise semantics. To address this, this paper uses an improved TF-IDF algorithm to extract statistical features of the text, concatenates them with the semantic features extracted by the neural network to form the final text features, and feeds these into a Softmax classifier to obtain the classification results. Experiments show that the proposed algorithm performs well on every evaluation metric.

(2) A patent text clustering method based on improved K-means is proposed. In the clustering sub-task, the classified patents are further grouped by technical topic. Most current patent text clustering analyses use the traditional K-means algorithm, whose random selection of initial cluster centers affects the clustering result. To reduce the instability that random initial centers introduce into the clustering results, this paper proposes an initial-center selection method that combines distance and density: a fitness function is built from the density of each data point's neighborhood and the point's distance from the initial cluster centers already identified, and data points are selected one by one as initial cluster centers according to their fitness values. Experiments on the Iris, Wine, and Cancer datasets verify the effectiveness of the method, which to a certain extent reduces the tendency of randomly chosen initial centers to drive the clustering toward a local optimum. In addition, an empirical analysis on patent text data from the Botou area of Cangzhou City verifies the feasibility of the proposed patent text clustering method.

(3) A core patent recognition algorithm based on improved PageRank is proposed. This algorithm is the key step of the core patent recognition method in this paper. In recent years, scholars who identify core patents with PageRank or its improved variants have ignored characteristics inherent in patent citation networks, such as centrality, as well as the influence of time factors on in-degree. This paper therefore improves PageRank accordingly. It proposes a method that allocates PR-value weights by combining citation-network centrality with patent age, and it further optimizes the algorithm with respect to how time factors affect patent evaluation results, based on the literature aging rate. By analyzing the Markov transition probability matrix, the paper proves that the improved PageRank converges. Experiments show that the improved algorithm comprehensively accounts for the influence of network centrality and patent age, is better at identifying high-quality patents with short publication times, and makes the identification of core patents more accurate.
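Item (1) fuses statistical and semantic features before a Softmax classifier. The sketch below illustrates only that fusion step, under stated assumptions: it uses a plain smoothed TF-IDF (not the thesis's improved variant, whose details the abstract does not give), treats the BERT-A-BiLSTM output as an already-computed vector, and uses hypothetical function names and hand-set linear-layer weights.

```python
import math
from collections import Counter


def tf_idf_vectors(docs: list[list[str]]) -> tuple[list[str], list[list[float]]]:
    """Plain smoothed TF-IDF over tokenized documents (a stand-in for the improved TF-IDF)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1
        vectors.append([
            (counts[t] / length) * (math.log((1 + n_docs) / (1 + df[t])) + 1.0)
            for t in vocab
        ])
    return vocab, vectors


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(statistical: list[float], semantic: list[float],
             weights: list[list[float]], bias: list[float]) -> list[float]:
    """Concatenate the two feature vectors, then apply a linear layer plus softmax."""
    features = statistical + semantic  # feature "stitching" is plain concatenation
    if any(len(row) != len(features) for row in weights):
        raise ValueError("each weight row must match the fused feature length")
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)


if __name__ == "__main__":
    vocab, vecs = tf_idf_vectors([["relay", "protection", "relay"],
                                  ["transformer", "winding"]])
    semantic = [0.5, -0.5]  # hypothetical stand-in for a BERT-A-BiLSTM embedding
    n_features = len(vocab) + len(semantic)
    weights = [[0.0] * n_features, [1.0] * n_features]  # hypothetical "trained" weights
    print(classify(vecs[0], semantic, weights, [0.0, 0.0]))
```

In a real model the linear-layer weights would be learned jointly with the network; they are fixed here only to keep the example deterministic.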
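Item (2) chooses initial cluster centers one at a time using a fitness that combines neighborhood density with distance from the centers already chosen. The abstract does not state the fitness formula, so this sketch assumes fitness = (1 + density) * (distance to the nearest chosen center), starts from the densest point, and uses a hypothetical neighborhood `radius`; it then runs ordinary K-means (Lloyd iterations) from those deterministic centers.

```python
import math

Point = tuple[float, ...]


def density_distance_init(points: list[Point], k: int, radius: float) -> list[Point]:
    """Pick k initial centers by an assumed density-times-distance fitness."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Neighborhood density: number of other points within `radius` (hypothetical parameter).
    density = [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]
    # Start from the densest point; max() keeps the lowest index on ties.
    chosen = [max(range(len(points)), key=lambda i: density[i])]
    while len(chosen) < k:
        candidates = [i for i in range(len(points)) if i not in chosen]
        # Favor dense points that lie far from every center chosen so far.
        chosen.append(max(
            candidates,
            key=lambda i: (1 + density[i]) * min(math.dist(points[i], points[c]) for c in chosen),
        ))
    return [points[i] for i in chosen]


def nearest(p: Point, centers: list[Point]) -> int:
    """Index of the center closest to p."""
    return min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))


def kmeans(points: list[Point], centers: list[Point],
           max_iter: int = 100) -> tuple[list[Point], list[int]]:
    """Standard Lloyd iterations starting from the given centers."""
    centers = list(centers)
    for _ in range(max_iter):
        clusters: list[list[Point]] = [[] for _ in centers]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Move each center to its cluster mean; leave it in place if the cluster is empty.
        new_centers = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centers[c]
            for c, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, [nearest(p, centers) for p in points]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    init = density_distance_init(pts, k=2, radius=2.0)
    centers, labels = kmeans(pts, init)
    print(init, labels)
```

Because the initialization is deterministic, repeated runs on the same data give the same partition, which is the instability the random initialization of traditional K-means lacks.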
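The weighting idea in item (3) can be sketched as a weighted PageRank over a citation graph. The abstract does not give the thesis's formulas, so every weighting choice here is an assumption: a citing patent splits its score among the patents it cites in proportion to (1 + in-degree) * exp(-decay * age), where decay = ln 2 / half_life stands in for the literature aging rate. The function name, `half_life`, and the input format are hypothetical.

```python
import math


def aged_pagerank(citations: dict[str, list[str]], ages: dict[str, float],
                  damping: float = 0.85, half_life: float = 5.0,
                  tol: float = 1e-10, max_iter: int = 1000) -> dict[str, float]:
    """Weighted PageRank with an assumed centrality-and-age weighting of citation edges.

    citations maps a citing patent to the patents it cites; ages gives years since
    publication for every patent in the graph.
    """
    nodes = sorted(ages)
    referenced = {u for cited in citations.values() for u in cited}
    unknown = (set(citations) | referenced) - set(ages)
    if unknown:
        raise ValueError(f"missing ages for: {sorted(unknown)}")
    n = len(nodes)
    decay = math.log(2) / half_life  # hypothetical aging-rate parameterization
    in_degree = dict.fromkeys(nodes, 0)
    for cited in citations.values():
        for u in cited:
            in_degree[u] += 1
    # Target weight: more-cited and younger patents receive a larger share.
    weight = {v: (1 + in_degree[v]) * math.exp(-decay * ages[v]) for v in nodes}
    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(max_iter):
        new = dict.fromkeys(nodes, (1 - damping) / n)  # teleportation term
        dangling = 0.0
        for v in nodes:
            out = citations.get(v, [])
            if not out:
                dangling += rank[v]  # patents citing nothing spread their score uniformly
                continue
            total = sum(weight[u] for u in out)
            for u in out:
                new[u] += damping * rank[v] * weight[u] / total
        for v in nodes:
            new[v] += damping * dangling / n
        delta = sum(abs(new[v] - rank[v]) for v in nodes)  # L1 change between iterations
        rank = new
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    citations = {"A": ["C"], "B": ["C", "D"], "D": ["C"]}
    ages = {"A": 1.0, "B": 2.0, "C": 10.0, "D": 1.0}
    scores = aged_pagerank(citations, ages)
    for patent, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(patent, round(score, 4))
```

In this toy graph the old but heavily cited patent C still ranks first, while the age term shifts more of B's score toward the younger patent D than plain in-degree weighting alone would.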
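Item (3) states that convergence of the improved PageRank is proved through the Markov transition matrix. The thesis's own proof is not reproduced in the abstract; the standard argument for any damped PageRank with a column-stochastic transition matrix runs as follows, assuming positive edge weights $w_j$ and a damping factor $0 < d < 1$.

```latex
Let $N$ be the number of patents, $\mathrm{Out}(i)$ the set of patents cited by patent $i$,
and $w_j > 0$ the weight of patent $j$. Define
\[
M_{ji} =
\begin{cases}
\dfrac{w_j}{\sum_{u \in \mathrm{Out}(i)} w_u}, & j \in \mathrm{Out}(i),\\[4pt]
\dfrac{1}{N}, & \mathrm{Out}(i) = \emptyset,\\[4pt]
0, & \text{otherwise,}
\end{cases}
\qquad
G = d\,M + \frac{1-d}{N}\,\mathbf{1}\mathbf{1}^{\top}.
\]
Every column of $M$ is nonnegative and sums to 1, so $\lVert M x \rVert_1 \le \lVert x \rVert_1$.
For probability vectors $x, y$ we have $\mathbf{1}^{\top}(x - y) = 0$, hence
\[
\lVert G x - G y \rVert_1 = d\,\lVert M (x - y) \rVert_1 \le d\,\lVert x - y \rVert_1 .
\]
$G$ maps the probability simplex into itself and is a contraction there, so by the Banach
fixed-point theorem the iteration $r^{(t+1)} = G\,r^{(t)}$ converges to a unique stationary
vector $r^{*}$, with
\[
\lVert r^{(t)} - r^{*} \rVert_1 \le d^{\,t}\,\lVert r^{(0)} - r^{*} \rVert_1 .
\]
```

At $d = 0.85$ the error bound shrinks by at least 15% per iteration, whatever positive weights the centrality and age terms assign.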
Keywords/Search Tags: Core patent, Text mining, Text classification, Text clustering, Identification of core patents