Automatic Classification Of Chinese Patents

Posted on: 2018-08-07
Degree: Master
Type: Thesis
Country: China
Candidate: S X Niu
GTID: 2348330536460959
Subject: Computer applications
Abstract/Summary:
Science and technology are primary productive forces in a rapidly developing society. Patents, which carry technical information, have become an important measure of the innovation capacity of a country or an enterprise. Patent documents are also an important resource for protecting the rights of individuals, organizations, and companies. The study, processing, analysis, and mining of patent data is therefore of great significance, and patent classification is a critical part of it. Patent mining has made great progress in recent years, as have text classification and patent classification technology. However, patent classification still leaves room for improvement. Based on the basic framework and principles of text categorization, this thesis designs an automatic text classification system for Chinese patents based on the vector space model and word vectors. The main contents of the system are as follows: (1) First, patent abstracts are downloaded as the original data set. The patent text is then preprocessed and a representation of the text is derived. (2) Second, word vectors for the patent domain are obtained by training on the patent documents. The final text representation combines the text representation with these word vectors. Two methods are proposed: a feature selection method based on word vectors, and a method that combines word vectors with the vector space model. (3) Finally, a machine learning method is used to train a model, which then classifies the text and yields the classification accuracy. To verify the validity of the automatic patent classification approach, the thesis tests on the standard data set Stanford Sentiment Treebank (SST). The resulting representations are classified with the support vector machine (SVM) algorithm, the random forest (RF) algorithm, and other classical algorithms, and the results are analyzed. The experimental results show that the proposed method is valid for the classification of Chinese patent documents.
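The abstract does not say how the vector space model and the word vectors are combined. One common way to do it, shown here only as a minimal sketch and not as the thesis's actual method, is to weight each word vector by the word's TF-IDF score and average the results into a single document vector. The toy tokens, the two-dimensional `embeddings` table, and the function names below are all hypothetical. Real Chinese patent text would first need word segmentation, and the word vectors would be trained on the patent corpus.

```python
import math
from collections import Counter


def tf_idf_weights(docs):
    """Return one {term: tf-idf weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            # Relative term frequency times a smoothed inverse document frequency.
            term: (count / len(doc))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return weights


def weighted_doc_vector(weights, embeddings, dim):
    """TF-IDF-weighted average of the word vectors of one document."""
    total = [0.0] * dim
    weight_sum = 0.0
    for term, w in weights.items():
        vec = embeddings.get(term)
        if vec is None:  # skip terms without a word vector
            continue
        for i in range(dim):
            total[i] += w * vec[i]
        weight_sum += w
    if weight_sum == 0.0:
        return total  # all-zero vector when no term has an embedding
    return [x / weight_sum for x in total]


# Hypothetical toy corpus: already-tokenized patent abstracts.
docs = [
    ["battery", "electrode", "lithium"],
    ["electrode", "coating", "battery"],
    ["network", "packet", "router"],
    ["router", "network", "protocol"],
]

# Hypothetical 2-dimensional word vectors (assumed, not trained).
embeddings = {
    "battery": [1.0, 0.0], "electrode": [0.9, 0.1],
    "lithium": [0.8, 0.0], "coating": [0.7, 0.2],
    "network": [0.0, 1.0], "packet": [0.1, 0.9],
    "router": [0.0, 0.8], "protocol": [0.2, 0.7],
}

doc_vectors = [weighted_doc_vector(w, embeddings, 2) for w in tf_idf_weights(docs)]
```

Each entry of `doc_vectors` is a fixed-length feature vector that could then be passed to a classifier such as an SVM or a random forest, the algorithms the abstract names for the final step.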
Keywords/Search Tags: vector space model (VSM), Stanford Sentiment Treebank (SST), support vector machine (SVM), random forest (RF)