
Research And Application Of Text Mining In The Patent Automatic Classification Based On Neural Network

Posted on: 2010-10-26
Degree: Master
Type: Thesis
Country: China
Candidate: F Ma
GTID: 2178360275488262
Subject: Information Science
Abstract/Summary:
Patent information records the inventions of human society and the trajectory of their success. Massive patent databases make it hard to find the patents one needs. To help, each patent is assigned an International Patent Classification (IPC) code. However, patents are still classified manually, which is inefficient and costly and gives poorly consistent results. Automatic classification of patent texts is therefore of considerable practical significance.

Automatic patent classification is the process of automatically assigning a patent to a category of a given classification system, based on the content of its text (title and abstract). Patent information is held in large-scale, unstructured text. To uncover the useful knowledge hidden in that text, this thesis introduces text mining techniques into the automatic classification system. The system is implemented with a radial basis function neural network (RBFNN).

Feature vectors are constructed in three steps. First, the text is segmented into words with ICTCLAS. To further improve segmentation accuracy, terms taken from the IPC information are added to the existing dictionary. Second, information gain (IG) and mutual information (MI) are used as feature-selection criteria to reduce dimensionality. Third, feature weights in the vector space model (VSM) are computed with the classical TF×IDF formula. Terms can carry different importance depending on where they occur in the text. To reflect this, the thesis also proposes a position-aware weighting algorithm (PTF×IDF). In the classification stage, the RBFNN is used to classify the patent texts automatically.
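The feature-selection and weighting steps above can be sketched as follows. This is a minimal illustration, not the thesis's code, and it rests on several assumptions. The toy corpus, the IPC-like labels, and all function names are hypothetical. The input is already segmented, so ICTCLAS is not reproduced. IG and MI are computed from document frequencies, and MI is aggregated as a maximum over classes, which is one common convention. TF×IDF uses the common form tf × ln(N/df). The abstract does not give the PTF×IDF formula, so the optional per-field weights (e.g. title counted double) are only a hypothetical stand-in for a position-aware scheme.

```python
import math
from collections import Counter

# Hypothetical toy corpus of already-segmented patents (in the thesis,
# segmentation is done with ICTCLAS). Each document maps a field to its tokens.
DOCS = [
    ({"title": ["battery", "electrode"], "abstract": ["lithium", "electrode", "cell"]}, "H01M"),
    ({"title": ["engine", "piston"], "abstract": ["fuel", "piston", "cylinder"]}, "F02B"),
    ({"title": ["electrode", "cell"], "abstract": ["lithium", "anode", "cell"]}, "H01M"),
    ({"title": ["piston", "ring"], "abstract": ["engine", "cylinder", "seal"]}, "F02B"),
]

def terms_of(doc):
    """Distinct terms of a document across all of its fields."""
    return {t for tokens in doc.values() for t in tokens}

def class_entropy(labels, classes):
    """Class entropy H(C) in bits for a list of class labels."""
    if not labels:
        return 0.0
    counts = Counter(labels)
    probs = [counts[c] / len(labels) for c in classes]
    return -sum(p * math.log2(p) for p in probs if p > 0)

def information_gain(term, docs):
    """IG(t) = H(C) - P(t) H(C|t) - P(~t) H(C|~t), from document frequencies."""
    classes = sorted({c for _, c in docs})
    with_t = [c for d, c in docs if term in terms_of(d)]
    without_t = [c for d, c in docs if term not in terms_of(d)]
    n = len(docs)
    return (class_entropy([c for _, c in docs], classes)
            - len(with_t) / n * class_entropy(with_t, classes)
            - len(without_t) / n * class_entropy(without_t, classes))

def mutual_information(term, docs):
    """MI_max(t) = max over classes c of log2(P(t, c) / (P(t) P(c)))."""
    n = len(docs)
    df_t = sum(1 for d, _ in docs if term in terms_of(d))
    best = float("-inf")
    for c in {lab for _, lab in docs}:
        n_c = sum(1 for _, lab in docs if lab == c)
        a = sum(1 for d, lab in docs if lab == c and term in terms_of(d))
        if a > 0:  # log(0) is undefined; skip classes that never contain the term
            best = max(best, math.log2(a * n / (df_t * n_c)))
    return best

def tf_idf(doc, docs, vocab, field_weights=None):
    """w(t, d) = tf(t, d) * ln(N / df(t)) for each term in vocab.

    field_weights (hypothetical, e.g. {"title": 2.0}) counts each occurrence
    with a weight that depends on the field it appears in; without it every
    occurrence counts 1, i.e. plain TF x IDF.
    """
    field_weights = field_weights or {}
    tf = Counter()
    for field, tokens in doc.items():
        for t in tokens:
            tf[t] += field_weights.get(field, 1.0)
    vec = []
    for t in vocab:
        df = sum(1 for d, _ in docs if t in terms_of(d))
        vec.append(tf[t] * math.log(len(docs) / df) if df else 0.0)
    return vec

vocab = sorted({t for d, _ in DOCS for t in terms_of(d)})
# Dimensionality reduction: keep the 4 terms with the highest IG (ties keep vocab order).
selected = sorted(vocab, key=lambda t: information_gain(t, DOCS), reverse=True)[:4]
for t in selected:
    print(t, round(information_gain(t, DOCS), 3), round(mutual_information(t, DOCS), 3))
print(tf_idf(DOCS[0][0], DOCS, selected))                  # plain TF x IDF
print(tf_idf(DOCS[0][0], DOCS, selected, {"title": 2.0}))  # position-weighted variant
```

In this toy corpus, every term that occurs in both documents of one class and in no other document gets the maximum IG of 1 bit. Terms that occur in only one document score about 0.311. With the title weight of 2, the vector entry for "electrode" rises from 2·ln 2 to 3·ln 2, because that term appears once in the title and once in the abstract.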
During the classification process, the k-means clustering method is used to determine the number of hidden-layer nodes and their centers. The RBF width is then varied over several values to find the one that gives the best RBFNN classifier performance. Next, the connection weights of the output layer are computed with the least-squares error method and saved. Finally, the test samples are classified. The experimental results show that the text-mining-based automatic patent classification system achieves good classification results, with an average F1 value above 70%.
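The RBFNN training procedure described above can be sketched as follows. It uses k-means centers, a width sweep, least-squares output weights, and macro-averaged F1. This is a minimal stdlib-only sketch under stated assumptions, not the thesis implementation. The toy 2-D data and every function name are hypothetical. The number of hidden nodes k is fixed by hand here, whereas the thesis derives it from clustering. Hidden units are assumed to be Gaussian, exp(-‖x − c‖² / (2σ²)), with "width" meaning σ. The output weights solve the normal equations, with a tiny ridge term added for numerical safety. Widths are compared on a small validation set.

```python
import math

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Plain k-means; centers start at the first k points (deterministic)."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda i: sq_dist(p, centers[i]))].append(p)
        for i, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous center
                centers[i] = [sum(col) / len(g) for col in zip(*g)]
    return centers

def hidden(x, centers, width):
    """Gaussian RBF layer: exp(-||x - c||^2 / (2 * width^2)) per center."""
    return [math.exp(-sq_dist(x, c) / (2 * width ** 2)) for c in centers]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def train_rbfnn(xs, labels, classes, k, width, ridge=1e-8):
    """k-means centers, then least-squares output weights (one vector per class)."""
    centers = kmeans(xs, k)
    phi = [hidden(x, centers, width) + [1.0] for x in xs]  # 1.0 = bias unit
    dim = len(phi[0])
    # Normal equations (Phi^T Phi + ridge * I) w = Phi^T t; the tiny ridge
    # term is an added assumption for numerical safety.
    ata = [[sum(r[i] * r[j] for r in phi) + (ridge if i == j else 0.0)
            for j in range(dim)] for i in range(dim)]
    weights = []
    for c in classes:
        t = [1.0 if lab == c else 0.0 for lab in labels]  # one-hot target column
        atb = [sum(r[i] * ti for r, ti in zip(phi, t)) for i in range(dim)]
        weights.append(solve(ata, atb))
    return centers, weights

def predict(x, centers, weights, classes, width):
    """Return the class whose output unit gives the largest score."""
    h = hidden(x, centers, width) + [1.0]
    scores = [sum(w * v for w, v in zip(ws, h)) for ws in weights]
    return classes[max(range(len(classes)), key=lambda i: scores[i])]

def macro_f1(true, pred, classes):
    """Average over classes of F1 = 2PR / (P + R)."""
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(true, pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(true, pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(true, pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# Toy 2-D vectors standing in for (much higher-dimensional) patent feature vectors.
train_x = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9], [0.1, 0.3], [4.8, 5.0]]
train_y = ["A", "B", "A", "B", "A", "B"]
val_x, val_y = [[0.3, 0.2], [4.9, 5.3]], ["A", "B"]
classes = ["A", "B"]

best = None  # (macro F1 on the validation set, width)
for width in (0.5, 1.0, 2.0):
    centers, weights = train_rbfnn(train_x, train_y, classes, k=2, width=width)
    pred = [predict(x, centers, weights, classes, width) for x in val_x]
    score = macro_f1(val_y, pred, classes)
    if best is None or score > best[0]:
        best = (score, width)
print(best)
```

The normal equations are solved in pure Python so the sketch runs without dependencies. With NumPy available, a dedicated least-squares routine would be the more robust choice. On this well-separated toy data every width reaches F1 = 1. Because the sweep keeps the first width that attains the maximum, the smallest width is retained.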
Keywords/Search Tags: Patent, Automated Text Categorization, Text Mining, Radial Basis Function Neural Network