The Study Of Automatic Chinese Patent Classification Based On Deep Learning Theory And Method

Posted on: 2017-03-18
Degree: Master
Type: Thesis
Country: China
Candidate: S G Ma
Full Text: PDF
GTID: 2308330503463925
Subject: Library and file management
Abstract/Summary:
In the current era of economic globalization, science and technology have become the primary productive force. The progress of countries and enterprises depends increasingly on innovation and technology. Because patents are carriers of technology, their quantity and quality have become an important indicator of national innovation capacity, and the number of patent applications has increased greatly as a result. WIPO statistics show that patent documents record 90%-95% of the world's inventions and thus reflect the level of scientific and technological development worldwide. How to obtain and use this technical information to provide strategic support for the development of countries and enterprises has attracted considerable interest from experts and scholars.
At present, patent texts are classified mainly by hand, with automatic classification used only as an adjunct. Large-scale automatic patent text classification has not yet been implemented, so research on it is increasingly meaningful. With the semantic features of patent text and automatic classification techniques, patent workers will be able to classify large numbers of patent texts automatically and efficiently, which improves classification efficiency and makes the technical information in patent texts easier to use.
Therefore, based on the basic framework and principles of a patent text classification system, this paper designed an automatic Chinese patent classification model based on deep learning theory, as follows.
First, after preprocessing the patent texts and selecting features, we obtained a formal representation of the patent texts. Then, based on deep learning theory, we used denoising autoencoders to build a deep network that automatically learns low-dimensional feature codes of the patent texts, used a support vector machine on the top layer of the network to classify the texts, and repeatedly adjusted the parameters of the layers according to the classification results until we obtained a well-performing classifier. Finally, we used the classifier to classify labeled patent texts and computed the accuracy, recall, and F values of the classification test to validate the feasibility of the designed method. Furthermore, to verify the effectiveness and superiority of the proposed method, we classified the same patent texts with classical algorithms, including K-nearest neighbor, support vector machine, and back-propagation neural network, and compared the results. The proposed method achieved an average classification accuracy of more than 95% and a recall of more than 94% on the test set, better than the classical algorithms, which demonstrates its effectiveness and superiority.
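The feature-learning step described above can be illustrated with a minimal sketch of a single denoising autoencoder layer in NumPy. This is not the thesis's implementation: the tied weights, masking noise, sigmoid units, cross-entropy loss, hyperparameters, function names, and the synthetic bag-of-words data are all assumptions made for illustration. A deep network of the kind the abstract describes would presumably stack several such layers, training each on the codes of the previous one, and then pass the final codes to a support vector machine (for example scikit-learn's `SVC`); neither the stacking nor the classifier-driven parameter adjustment is shown here.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_denoising_autoencoder(X, n_hidden, corruption=0.3, lr=0.5,
                                epochs=200, seed=0):
    """Train one denoising autoencoder layer with tied weights (assumed design).

    X is an (n_samples, n_features) array with values in [0, 1].
    Returns the learned parameters and the reconstruction loss per epoch.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    W = rng.normal(0.0, 0.1, size=(n_features, n_hidden))
    b_hidden = np.zeros(n_hidden)
    b_visible = np.zeros(n_features)
    eps = 1e-9  # keeps the logarithms finite
    losses = []
    for _ in range(epochs):
        # Corrupt the input by zeroing a random fraction of the features.
        mask = rng.random(X.shape) > corruption
        X_noisy = X * mask
        # Encode the corrupted input, then decode with the transposed weights.
        H = sigmoid(X_noisy @ W + b_hidden)
        X_hat = sigmoid(H @ W.T + b_visible)
        # Cross-entropy between the clean input and its reconstruction.
        loss = -np.mean(np.sum(X * np.log(X_hat + eps)
                               + (1 - X) * np.log(1 - X_hat + eps), axis=1))
        losses.append(loss)
        # Backpropagation: with sigmoid outputs and cross-entropy loss,
        # the output-layer error is simply X_hat - X.
        d_out = (X_hat - X) / n_samples
        d_hidden = (d_out @ W) * H * (1 - H)
        # Tied weights receive gradient from both the encoder and the decoder.
        grad_W = X_noisy.T @ d_hidden + d_out.T @ H
        W -= lr * grad_W
        b_hidden -= lr * d_hidden.sum(axis=0)
        b_visible -= lr * d_out.sum(axis=0)
    return (W, b_hidden, b_visible), losses


def encode(X, params):
    """Map clean inputs to their low-dimensional feature codes."""
    W, b_hidden, _ = params
    return sigmoid(X @ W + b_hidden)


# Synthetic stand-in for binary term-presence vectors of two patent classes:
# each class mostly uses its own group of ten terms (hypothetical data).
data_rng = np.random.default_rng(42)
labels = np.repeat([0, 1], 20)
term_group = (np.arange(20) >= 10).astype(int)
probs = np.where(labels[:, None] == term_group[None, :], 0.8, 0.05)
X = (data_rng.random((40, 20)) < probs).astype(float)

params, losses = train_denoising_autoencoder(X, n_hidden=4)
codes = encode(X, params)  # 4-dimensional codes, one row per document
```

Here `codes` plays the role of the low-dimensional feature codes; in a full pipeline they would be split into training and test sets, fed to the classifier, and scored with accuracy, recall, and F values.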
Keywords/Search Tags: patent text categorization, deep learning, support vector machine, denoising auto encoder