| Research on patent similarity is currently of great importance. When applying for a patent, users need to search the patent database for similar patents in order to avoid infringement, to perform novelty searches, and to draw inspiration from related inventions. This places specific demands on the retrieval of similar patents. The claims are the core content of a patent text: they set out comprehensively the technical scope that the patent protects, and patent similarity is usually judged on the basis of the claims. This paper studies patent claims in depth and, based on the SAO sentence features of patent texts, proposes a patent text similarity algorithm built on syntactic representation. The work rests on the assumption that similar patents contain similar keywords and sentences. First, the similarity between sentences of patent texts is calculated from the semantic information of their keywords and the structural characteristics of the sentences; the similarity between patent texts is then calculated from the sentence similarities. The main work of this paper is as follows. First, text mining technology is used to extract keywords from the patent texts; for words that are segmented poorly, their word-formation rules are summarized and a rule-based named entity recognition technique is proposed to identify them accurately. Then, considering that sentences in patent texts follow a Subject-Action-Object (SAO), Subject-Action (SA), or Action-Object (AO) syntactic structure, this paper proposes a Chinese patent text similarity algorithm that cuts the patent text into a set of sentences. The semantic information of the keywords in the sentences and the position information of the keywords are used together to calculate the similarity between the patents' sentences. In addition, since the "sub-sentences" in the text are sequential, there is a connection between adjacent "sub-sentences" in
the text. Therefore, the sentences of a patent text can be treated as a time series, and the similarity between patent texts is calculated by comparing their sentence sequences with the dynamic time warping (DTW) algorithm. Finally, experiments verify the effectiveness of the algorithm: the results show that the proposed patent text similarity algorithm outperforms the traditional algorithm. |
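
The sequence-comparison step described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the sentence similarity formula, so the sketch assumes each sentence has already been reduced to a set of keywords and uses plain Jaccard overlap as a stand-in for the combined semantic and positional measure. The function names (`jaccard_similarity`, `dtw_distance`, `patent_similarity`) and the normalisation by the longer sequence length are likewise assumptions made for illustration.

```python
from typing import Callable, Sequence, Set

Sentence = Set[str]  # assumption: a sentence is represented by its keyword set


def jaccard_similarity(a: Sentence, b: Sentence) -> float:
    """Placeholder sentence similarity in [0, 1] based on keyword overlap."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dtw_distance(
    seq_a: Sequence[Sentence],
    seq_b: Sequence[Sentence],
    sim: Callable[[Sentence, Sentence], float] = jaccard_similarity,
) -> float:
    """Classic dynamic time warping over two ordered sentence sequences.

    The local cost of aligning two sentences is 1 - similarity.
    """
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("both sentence sequences must be non-empty")
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = 1.0 - sim(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # seq_a advances, seq_b[j-1] is reused
                cost[i][j - 1],      # seq_b advances, seq_a[i-1] is reused
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def patent_similarity(
    seq_a: Sequence[Sentence],
    seq_b: Sequence[Sentence],
    sim: Callable[[Sentence, Sentence], float] = jaccard_similarity,
) -> float:
    """Map the DTW distance to a similarity score in [0, 1].

    Assumption: divide by the longer sequence length, which bounds the DTW
    distance from above when every local cost is at most 1.
    """
    return 1.0 - dtw_distance(seq_a, seq_b, sim) / max(len(seq_a), len(seq_b))


if __name__ == "__main__":
    claim_a = [{"sensor", "detects", "temperature"}, {"controller", "adjusts", "valve"}]
    claim_b = [{"sensor", "detects", "temperature"}, {"sensor", "detects", "temperature"},
               {"controller", "adjusts", "valve"}]
    print(patent_similarity(claim_a, claim_b))
```

Because DTW allows one sentence to align with several consecutive sentences in the other text, two claims that differ only by a repeated or split sub-sentence still receive a high score, which is the property that motivates treating the sentence set as a time series.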