The Research Of Automatic Text Categorization System Based On Neural Networks

Posted on: 2008-10-01
Degree: Master
Type: Thesis
Country: China
Candidate: Z L Wang
GTID: 2178360215987737
Subject: Information Science
Abstract/Summary:
Automatic text categorization (ATC) is the task of automatically sorting a set of documents into categories from a predefined set. ATC is an effective means of organizing and managing massive information resources, and it facilitates the storage, retrieval, transmission, development, and utilization of information. Research on ATC therefore has practical significance and applied value.

As a frequently used ATC method, neural network classification has the advantages of self-learning ability and robustness, but it also has the shortcomings of long training time and poor interpretability. Building on the characteristics of the Radial Basis Function Neural Network (RBFNN), such as simple network design, fast convergence, good generalization, and better interpretability, this thesis studies in depth the performance of the RBFNN algorithm in Chinese ATC.

The RBFNN ATC system can be divided into two main processes: converting text into a vector space model, and constructing the RBFNN classifier. In the first process, ICTCLAS, developed by the Chinese Academy of Sciences, is used for text segmentation and stop-word removal; different feature selection and weighting methods are then applied to extract the features that make up the text vector space. In the second process, k-means clustering is used to determine the number of hidden-layer nodes and their centers; the RBF width is adjusted across different values to obtain the best classifier performance; and finally the least squares method is used to calculate the connection weights of the output layer. In the experiments, the average F1 value exceeded 85%, showing that RBFNN performs well on ATC.

An evaluation index system for ATC systems is also proposed, based on the Analytic Hierarchy Process (AHP). The indicator weights were calculated with the Expert Choice software from a judgment matrix obtained through an expert questionnaire. The results guided the design and performance testing of the ATC system.
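
The classifier construction described in the abstract (k-means for the hidden-layer centers, a tunable RBF width, least squares for the output weights) can be illustrated with a minimal sketch. This is not the thesis's implementation: the Gaussian basis function, the bias column, the one-hot targets, the function names (kmeans, rbf_features, train_rbfnn, predict), and the two-dimensional toy data standing in for weighted text vectors are all assumptions made for illustration.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns k cluster centers (hidden-layer centers)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distance from every sample to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers

def rbf_features(X, centers, width):
    """Gaussian hidden-layer activations (assumed basis function)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    Phi = np.exp(-d2 / (2.0 * width ** 2))
    # Append a constant column so the output layer has a bias term.
    return np.hstack([Phi, np.ones((len(X), 1))])

def train_rbfnn(X, y, n_classes, k, width):
    """Pick centers by k-means, then solve the output weights by least squares."""
    centers = kmeans(X, k)
    Phi = rbf_features(X, centers, width)
    targets = np.eye(n_classes)[y]  # one-hot class targets
    W, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
    return centers, W

def predict(X, centers, W, width):
    """Assign each sample to the class with the largest output."""
    return (rbf_features(X, centers, width) @ W).argmax(axis=1)

# Toy data: two synthetic clusters standing in for document vectors.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(5.0, 0.5, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Try several RBF widths, mirroring the width adjustment the abstract mentions.
for width in (0.5, 1.0, 2.0):
    centers, W = train_rbfnn(X, y, n_classes=2, k=4, width=width)
    accuracy = (predict(X, centers, W, width) == y).mean()
    print(f"width={width}: training accuracy={accuracy:.2f}")
```

In a real setting the width and the number of centers would be chosen on held-out documents and scored with F1 rather than training accuracy, and the input rows would be the feature-weighted text vectors produced by the first process.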
Keywords/Search Tags:Automated Text Categorization System, Radial Basis Function Neural Network, Influence Factors of ATC System, Analytic Hierarchy Process