Application of Semantic Association in Patent Text

Posted on: 2021-03-18
Degree: Master
Type: Thesis
Country: China
Candidate: T Y Fu
GTID: 2428330611997327
Subject: Engineering
Abstract/Summary:
In recent years, with the rapid development of China's economy, patents have become increasingly important. As a new kind of resource, intellectual property, represented by patents, plays an increasingly prominent role in fierce industrial and market competition. China has a complete industrial system, and it has ranked first in the world in the number of invention patents for three consecutive years. Patents are carriers of technical information and one of its best sources, so patent classification is becoming more and more important: processing and classifying large numbers of patents can provide strong guidance on development trends for industries and enterprises. Traditional manual classification, however, takes a long time and requires extensive background knowledge.

The purpose of this thesis is to combine the KNN algorithm with semantic association techniques in order to improve the classification of patent documents and reduce the algorithm's running time, so that patent classification becomes both efficient and fast. Because of the conventions of patent writing, patent abstracts are compact, and their language is concise, accurate and standardized, so their content is clear at a glance and their features are highly generalized. Compared with ordinary text, patent abstracts are therefore better suited to targeted algorithms and to classification methods based on semantic association.

In practice, patents are first sorted into broad classes according to the patent number. The experimental data in this study are ship and marine engineering patents, which form a fine-grained subdivision with a known set of categories, so the KNN (k-nearest neighbor) algorithm is used. The drawback of KNN is that its computational efficiency drops markedly when the dataset is large, because the algorithm performs no pre-processing before testing. To address this shortcoming, three-way decisions are introduced into KNN: the data are pre-processed in advance, which reduces their dimensionality and improves the efficiency of the algorithm.

The main work of this thesis is as follows:

(1) Keyword extraction using the Sentence-LDA topic model, semantic association and the TF-IDF algorithm. Patent document processing begins with pre-processing the obtained data, which includes word segmentation and stop-word removal. Keywords and their corresponding weights are then computed using semantic association and TF-IDF. Finally, topic-word classification of patent abstract text is built on the Sentence-LDA topic model, achieving both dimensionality reduction of the patent abstracts and keyword extraction.

(2) Fast classification of patent documents using a KNN algorithm improved with three-way decisions. Starting from the keywords extracted in (1) and the dimension-reduced data, the dataset is processed with three-way decisions to improve the accuracy and running time of the KNN algorithm.
Keywords/Search Tags: Semantic analysis, KNN, Three-way Decision, Sentence-LDA, TF-IDF
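
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea, not the thesis's actual method. It assumes already-tokenized abstracts, uses a smoothed TF-IDF weighting to keep the top-weighted keywords per document (a simple stand-in for the keyword-based dimension reduction; Sentence-LDA and semantic association are omitted), and applies a three-way decision to the KNN vote share: accept, defer, or reject. The toy documents, the labels, the function names, and the thresholds `alpha` and `beta` are all hypothetical.

```python
import math
from collections import Counter

def idf_table(docs):
    """Smoothed inverse document frequency computed from tokenized training docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}

def vectorize(doc, idf, top_k=None):
    """TF-IDF weights for one document; optionally keep only the top_k keywords."""
    tf = Counter(term for term in doc if term in idf)
    total = sum(tf.values())
    if total == 0:
        return {}
    weights = {term: (count / total) * idf[term] for term, count in tf.items()}
    if top_k is not None:
        keep = sorted(weights, key=weights.get, reverse=True)[:top_k]
        weights = {term: weights[term] for term in keep}
    return weights

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def three_way_knn(query, train_vecs, labels, k=3, alpha=0.7, beta=0.4):
    """Return (majority label, decision) for the k most similar training docs.

    Assumed rule: accept if the majority vote share >= alpha, reject if it
    is <= beta, and defer (boundary region) otherwise.
    """
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: cosine(query, train_vecs[i]),
                    reverse=True)[:k]
    votes = Counter(labels[i] for i in ranked)
    label, count = votes.most_common(1)[0]
    share = count / len(ranked)
    if share >= alpha:
        return label, "accept"
    if share <= beta:
        return label, "reject"
    return label, "defer"

# Hypothetical toy corpus standing in for segmented patent abstracts.
train_docs = [
    ["hull", "welding", "steel", "plate"],
    ["hull", "steel", "frame", "structure"],
    ["hull", "plate", "coating", "steel"],
    ["propeller", "blade", "shaft", "engine"],
    ["engine", "shaft", "propeller", "gear"],
    ["engine", "fuel", "propeller", "blade"],
]
labels = ["hull", "hull", "hull", "propulsion", "propulsion", "propulsion"]

idf = idf_table(train_docs)
train_vecs = [vectorize(doc, idf, top_k=3) for doc in train_docs]
query_vec = vectorize(["steel", "hull", "plate", "frame"], idf, top_k=3)

print(three_way_knn(query_vec, train_vecs, labels, k=3))
```

In this sketch a deferred document is one whose neighbors disagree too much for a confident label; such cases could be passed to a slower classifier or to manual review, while accepted cases are labeled immediately.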