
Research On Patent Classification Method Based On Similarity Measure

Posted on: 2021-10-20
Degree: Master
Type: Thesis
Country: China
Candidate: T Tong
GTID: 2518306047981589
Subject: Software engineering
Abstract/Summary:
With the rapid development of human society, technological innovation is accelerating and competition within each industry is becoming increasingly fierce, which places higher demands on the technological innovation of enterprises in a given field. As a special kind of knowledge text rich in information, patents provide strong support for technological development in fields such as education, finance and production. Faced with massive volumes of patent texts, adapting traditional text classification algorithms to the characteristics of patent texts has become an urgent problem. Similarity measurement methods study the distance between samples. Patent classification algorithms based on statistics and machine learning are already relatively mature, and further improving their classification accuracy is difficult. Choosing an effective similarity measure to achieve better classification results has therefore become a current research focus.

Based on the characteristics of patent texts and existing similarity measurement methods, this paper proposes new similarity measures. First, TF-IDF is used as the feature selection method for patent abstracts. To account for the influence of low-frequency patent terms on the classification results, the chi-square (CHI) statistic is introduced, and a new similarity measure for patent abstracts is proposed on the basis of cosine similarity (the cosine of the angle between two feature vectors), which improves the accuracy of patent classification. Building on this method and taking into account both the structured and unstructured characteristics of patents, a hybrid similarity measure based on the IPC classification number and the abstract is proposed. It combines the similarity of the IPC classification numbers with the similarity of the patent abstracts to further improve the accuracy of similarity measurement and classification.

Second, building on the SAO-x structure (a subject-predicate-object structure extended with the purpose of the patent, focusing on "for" phrases, "to" phrases and gerund phrases), this paper extracts the SAO-x structures in the claims of the patent text as feature items and proposes a new multi-dimensional similarity calculation method based on the Jaccard distance and the Mahalanobis distance; this method computes SAO-x structure similarity, which is then used to classify patents. Finally, the effectiveness of the two proposed similarity measurement algorithms is evaluated experimentally.
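The abstract says TF-IDF features and the CHI statistic are combined with cosine similarity, but it gives no formula. The sketch below assumes one plausible reading: each TF-IDF component is multiplied by the term's chi-square score (the maximum over classes), and a 1-nearest-neighbour rule assigns the class. The smoothed IDF, the 1-NN classifier, all function names and the toy tokens are assumptions made for illustration, not the thesis's method.

```python
import math
from collections import Counter

# Assumption: CHI-weighted TF-IDF + cosine + 1-NN is an illustrative reading,
# not the exact measure proposed in the thesis.


def fit_idf(docs):
    """Smoothed IDF: log((1 + N) / (1 + df)) + 1, so no known term gets weight 0."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}


def vectorize(tokens, idf):
    """Term frequency times IDF for one tokenized abstract (unknown terms dropped)."""
    tf = Counter(t for t in tokens if t in idf)
    total = sum(tf.values())
    return {t: (c / total) * idf[t] for t, c in tf.items()} if total else {}


def chi_square(docs, labels):
    """Per-term CHI weight: the largest 2x2 chi-square statistic over all classes."""
    n = len(docs)
    sets = [set(d) for d in docs]
    scores = {}
    for term in set().union(*sets):
        best = 0.0
        for cls in set(labels):
            a = sum(term in s and y == cls for s, y in zip(sets, labels))      # in class, has term
            b = sum(term in s and y != cls for s, y in zip(sets, labels))      # other class, has term
            c = sum(term not in s and y == cls for s, y in zip(sets, labels))  # in class, lacks term
            d = n - a - b - c                                                   # other class, lacks term
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores[term] = best
    return scores


def weighted_cosine(u, v, weights):
    """Cosine similarity after scaling every TF-IDF component by its CHI weight."""
    wu = {t: x * weights.get(t, 0.0) for t, x in u.items()}
    wv = {t: x * weights.get(t, 0.0) for t, x in v.items()}
    dot = sum(x * wv.get(t, 0.0) for t, x in wu.items())
    nu = math.sqrt(sum(x * x for x in wu.values()))
    nv = math.sqrt(sum(x * x for x in wv.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def predict(tokens, train_docs, labels, idf, weights):
    """1-nearest-neighbour label under the CHI-weighted cosine similarity."""
    query = vectorize(tokens, idf)
    sims = [weighted_cosine(query, vectorize(doc, idf), weights) for doc in train_docs]
    return labels[max(range(len(sims)), key=sims.__getitem__)]


# Toy, already-tokenized abstracts (hypothetical data).
docs = [["battery", "anode", "lithium"], ["battery", "cathode", "lithium"],
        ["engine", "piston", "fuel"], ["engine", "fuel", "injector"]]
labels = ["battery", "battery", "engine", "engine"]
idf = fit_idf(docs)
chi = chi_square(docs, labels)
print(predict(["lithium", "anode", "separator"], docs, labels, idf, chi))  # battery
```

Terms that occur in every document, or in none of a class's complement, get a CHI weight of 0 here and therefore drop out of the similarity; a real system might instead use something like `1 + chi` to keep them.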
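The hybrid measure combines IPC classification numbers with abstract similarity, but the abstract does not say how. A minimal sketch, assuming (1) IPC similarity is the depth of the shared prefix in the section / class / subclass / main group / subgroup hierarchy divided by the deeper symbol's depth, (2) a patent's IPC list is compared by symmetric best-match averaging, and (3) the two similarities are blended with a hypothetical weight `alpha`. The abstract similarity is passed in as a number so any abstract measure can be plugged in; the IPC symbols are illustrative only.

```python
# Assumption: prefix-depth IPC similarity and a linear blend with weight alpha
# are illustrative choices, not the thesis's exact hybrid formula.


def ipc_levels(symbol):
    """Split an IPC symbol such as 'H01M 4/13' into its five hierarchy levels."""
    symbol = symbol.strip()
    main, _, sub = symbol[4:].strip().partition("/")
    return [symbol[:1], symbol[1:3], symbol[3:4], main, sub]


def ipc_code_similarity(a, b):
    """Shared hierarchy depth divided by the depth of the deeper of the two symbols."""
    la, lb = ipc_levels(a), ipc_levels(b)
    shared = 0
    for x, y in zip(la, lb):
        if not x or x != y:
            break
        shared += 1
    depth = max(sum(1 for x in la if x), sum(1 for x in lb if x))
    return shared / depth if depth else 0.0


def ipc_set_similarity(codes_a, codes_b):
    """Average best-match similarity between two IPC lists, taken in both directions."""
    if not codes_a or not codes_b:
        return 0.0
    best_a = [max(ipc_code_similarity(a, b) for b in codes_b) for a in codes_a]
    best_b = [max(ipc_code_similarity(a, b) for a in codes_a) for b in codes_b]
    return (sum(best_a) + sum(best_b)) / (len(best_a) + len(best_b))


def hybrid_similarity(codes_a, codes_b, abstract_sim, alpha=0.4):
    """Blend of IPC similarity and an abstract similarity in [0, 1]; alpha is hypothetical."""
    return alpha * ipc_set_similarity(codes_a, codes_b) + (1 - alpha) * abstract_sim


print(ipc_code_similarity("H01M 4/13", "H01M 10/05"))  # 0.6
print(hybrid_similarity(["H01M 4/13"], ["H01M 4/58"], abstract_sim=0.5))
```

Weighting by prefix depth rewards patents that agree deep in the IPC hierarchy; the value of `alpha` would have to be tuned on labelled data.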
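The abstract names the Jaccard and Mahalanobis distances for SAO-x similarity but not how they are combined. A minimal sketch under one plausible reading (an assumption, not the thesis's formula): each patent is a list of (subject, action, object, purpose) tuples already extracted from its claims. A per-slot Jaccard distance gives a 4-dimensional distance vector, and the Mahalanobis norm of that vector, using a covariance estimated from sample patent pairs, accounts for correlated slots. Extraction is out of scope; the tuples, the ridge term and all function names are hypothetical.

```python
import math
from statistics import mean

# Assumption: "Jaccard per SAO-x slot, then Mahalanobis norm of the distance vector"
# is an illustrative reading of the thesis's multi-dimensional measure.


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B|; two empty sets are treated as identical."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def sao_x_distance_vector(p, q):
    """Per-slot Jaccard distances (subject, action, object, purpose) between two patents."""
    return [jaccard_distance({t[i] for t in p}, {t[i] for t in q}) for i in range(4)]


def covariance(rows, ridge=1e-3):
    """Sample covariance of distance vectors, plus a small ridge so it stays invertible."""
    k, n = len(rows[0]), len(rows)
    mu = [mean(r[j] for r in rows) for j in range(k)]
    cov = [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
            for j in range(k)] for i in range(k)]
    for i in range(k):
        cov[i][i] += ridge
    return cov


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def mahalanobis_norm(d, cov):
    """sqrt(d^T cov^-1 d): Mahalanobis distance of d from the all-zero (perfect-match) vector."""
    x = solve(cov, d)
    return math.sqrt(max(0.0, sum(di * xi for di, xi in zip(d, x))))


def sao_x_similarity(p, q, cov):
    """Map the Mahalanobis distance of the Jaccard distance vector into (0, 1]."""
    return 1.0 / (1.0 + mahalanobis_norm(sao_x_distance_vector(p, q), cov))


# Hypothetical SAO-x tuples; a real covariance needs many more sample pairs.
p1 = [("battery", "comprise", "anode", "to store energy")]
p2 = [("battery", "include", "cathode", "to store energy")]
p3 = [("engine", "inject", "fuel", "for combustion")]
sample = [sao_x_distance_vector(x, y) for x, y in [(p1, p2), (p1, p3), (p2, p3), (p1, p1)]]
cov = covariance(sample)
print(sao_x_similarity(p1, p1, cov))  # 1.0
```

With an identity covariance the Mahalanobis norm reduces to the Euclidean length of the Jaccard distance vector; the estimated covariance is what lets correlated slots (for example subject and object) avoid being counted twice.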
Keywords/Search Tags: Patent classification, Similarity measure, CHI-square, SAO-x structure, Mahalanobis distance