Study On The Text Classification Method Based On The Optimization RS-BPNN

Posted on: 2013-01-21
Degree: Master
Type: Thesis
Country: China
Candidate: Z X Zou
GTID: 2248330374464386
Subject: Computer application technology
Abstract/Summary:
With the growth of the IT industry and the rapid development of Internet technology, information resources are expanding quickly. Because most of these resources exist as dynamic, heterogeneous Web text, finding the information people need in this huge volume of data has become a focus of attention. Web text classification is one of the main ways to address this problem. This thesis therefore discusses Web text classification technology from the following four aspects.

First, an improved χ² statistical method is proposed. The traditional χ² statistic tends to assign high weights to low-frequency words that have little discriminative power, so feature words with strong discriminative power receive relatively low weights. The improved algorithm takes each word's frequency within documents into account, which mitigates this defect to some extent.

Second, an optimized back-propagation (BP) algorithm is proposed. Unlike the traditional back-propagation algorithm, it fine-tunes the learning step while the classifier is being built, so the learning step must be calculated before the connection weights of the neural network are adjusted. The learning step is calculated with the delta-bar-delta rule, which helps avoid two problems: a learning step that is too large makes the network oscillate, and one that is too small slows convergence and can leave the network trapped in a local minimum.

Third, representing a text with the vector space model (VSM) produces a high-dimensional feature space, which makes it difficult for the neural network to converge. To address this, feature selection is combined with rough set theory, and a text classification method based on an optimized RS-BPNN is proposed. The method first uses the improved χ² statistic proposed in this thesis to reduce the dimensionality, then uses rough-set attribute reduction to delete redundant features, and finally uses the optimized neural network to classify the texts.

Fourth, an experimental text classification system based on the optimized RS-BPNN algorithm is designed and implemented. Several groups of comparative experiments are run on the system using a Chinese corpus. The experimental results show that the optimized RS-BPNN method simplifies the topology of the BP network and speeds up the slow convergence of the BP algorithm.
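The two numerical building blocks named in the abstract can be sketched as follows. This is a minimal illustration, not the thesis implementation: `chi_square` computes only the standard χ² term/category statistic (the abstract does not give the formula of the improved, frequency-aware version, so it is not reproduced here), and `delta_bar_delta_step` follows the commonly cited form of the delta-bar-delta rule. All function names, parameter names and default values (`kappa`, `phi`, `theta`) are assumptions chosen for the example.

```python
import numpy as np


def chi_square(a, b, c, d):
    """Standard chi-square statistic for one term/category pair.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that do not contain the term
    d: documents outside the category that do not contain the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate contingency table: no evidence either way
    return n * (a * d - c * b) ** 2 / denom


def delta_bar_delta_step(weights, grad, rates, bar,
                         kappa=0.01, phi=0.1, theta=0.7):
    """One delta-bar-delta update with a separate learning step per weight.

    weights: current connection weights
    grad:    current gradient of the error with respect to each weight
    rates:   current per-weight learning steps
    bar:     exponential average of past gradients (the "bar delta")
    Returns the updated (weights, rates, bar).
    """
    agreement = bar * grad
    # Same sign as the averaged past gradient: increase the step additively.
    rates = np.where(agreement > 0, rates + kappa, rates)
    # Opposite sign (likely overshoot): shrink the step multiplicatively.
    rates = np.where(agreement < 0, rates * (1.0 - phi), rates)
    # Learning steps are computed first, then the weights are adjusted.
    new_weights = weights - rates * grad
    new_bar = (1.0 - theta) * grad + theta * bar
    return new_weights, rates, new_bar
```

In a training loop, `grad` would come from back-propagation on each pass, and `rates` and `bar` would be carried over from one call to the next, typically starting from a small constant learning step and a zero average.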
Keywords/Search Tags: Web text categorization, rough set, BP neural network, χ² statistical approach