
Chinese Text Classification Based on Rough Set and Neural Network

Posted on: 2009-02-08 | Degree: Master | Type: Thesis
Country: China | Candidate: L Zhang
GTID: 2208360242488524 | Subject: Computer software and theory
Abstract/Summary:
With the rapid development of communication and computer technology, information processing has become an essential means for people to obtain useful information. As one of the most important research areas in automatic information processing, Text Categorization has broad application prospects. Rough Set theory and Neural Networks are widely used in pattern recognition systems, but they have rarely been applied to or studied in Text Categorization. Rough Set theory can extract explicit categorization rules through information reduction without lowering categorization accuracy, but it is sensitive to noisy data, which makes the resulting rules fuzzy. Neural Networks learn well from fuzzy data, but they cannot remove uncertain and vague information, and their performance degrades because text vectors are very high-dimensional. This thesis therefore combines Rough Set theory with a Neural Network, and designs and implements a Chinese Text Categorization System based on Rough Set and BP Neural Network. The main work includes:
1) A study of the traditional solutions to key technical problems in Text Categorization and an analysis of the characteristics of existing methods, together with a detailed introduction to Rough Set theory and Neural Networks and an analysis of the advantages of combining the two.
2) Building on a vector space model representation of text, the thesis closely integrates Rough Set theory and the Neural Network into a model of a Chinese Text Categorization System based on Rough Set and BP Neural Network. First, the vector space is reduced with Rough Set theory; the reduced vectors are then used to train a BP Neural Network, and new texts are classified using the categorization knowledge obtained from training.
Combining the two methods in this way greatly improves the effectiveness of each in Text Categorization.
3) To overcome the shortcomings of current attribute reduction algorithms, the thesis improves the Johnson algorithm of Rough Set theory by exploiting characteristics of the text vector space model. The improved algorithm uses the degree of each feature in the vector as heuristic information, which increases reduction speed and yields better reducts.
4) A prototype Chinese Text Categorization System based on Rough Set and BP Neural Network is developed and evaluated with closed and open tests on an objective, comprehensive Chinese corpus. The results show that the system achieves relatively high categorization accuracy and that Chinese Text Categorization based on Rough Set and BP Neural Network is feasible.
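The reduction step in items 2) and 3) can be illustrated with the classic Johnson heuristic: build the discernibility entries (for each pair of documents in different categories, the set of terms on which they differ), then greedily pick the term that appears in the most remaining entries until every entry is covered. The sketch below is a minimal illustration, assuming discretized (here binary term-presence) document vectors. The function names and the optional `weights` argument are hypothetical; the weighting merely stands in for the thesis's "degree of the feature" heuristic, whose exact definition the abstract does not give.

```python
from itertools import combinations


def discernibility_entries(rows, labels):
    """For every pair of documents with different categories, collect the
    set of attribute (term) indices on which the two documents differ."""
    entries = []
    for i, j in combinations(range(len(rows)), 2):
        if labels[i] == labels[j]:
            continue
        diff = frozenset(a for a, (x, y) in enumerate(zip(rows[i], rows[j])) if x != y)
        if diff:  # an empty set means the pair is inconsistent and cannot be discerned
            entries.append(diff)
    return entries


def johnson_reduct(rows, labels, weights=None):
    """Greedy Johnson reduction. The optional per-attribute weights are a
    hypothetical stand-in for a feature-based heuristic; with no weights
    this is the plain Johnson algorithm. The result is not always minimal."""
    entries = discernibility_entries(rows, labels)
    n_attrs = len(rows[0])
    if weights is None:
        weights = [1.0] * n_attrs
    reduct = []
    while entries:
        # Score each attribute by the (weighted) number of uncovered entries it hits
        scores = [0.0] * n_attrs
        for entry in entries:
            for a in entry:
                scores[a] += weights[a]
        best = max(range(n_attrs), key=lambda a: scores[a])  # ties -> lowest index
        reduct.append(best)
        # Entries containing the chosen attribute are now discerned
        entries = [e for e in entries if best not in e]
    return sorted(reduct)


# Toy term-presence table: 4 documents x 4 terms, two categories
rows = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
]
labels = ["A", "A", "B", "B"]

print(johnson_reduct(rows, labels))                        # [0]
print(johnson_reduct(rows, labels, weights=[1, 1, 1, 2]))  # [3]
```

In the toy table, term 0 alone separates category A from category B, so the plain heuristic keeps only that column; raising the weight of term 3 steers the greedy choice to an equally valid alternative reduct. The reduced vectors keep only the selected columns.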
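The training stage can likewise be sketched as a one-hidden-layer BP (error back-propagation) network with sigmoid units, trained by plain batch gradient descent on squared error. This is an assumed minimal configuration: the `BPNetwork` class, layer sizes, learning rate and epoch count are hypothetical and are not the network described in the thesis. `X` stands for already-reduced document vectors and `T` for one-hot category targets.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class BPNetwork:
    """Hypothetical one-hidden-layer feed-forward network trained by
    back-propagation (batch gradient descent on 0.5 * squared error)."""

    def __init__(self, n_in, n_hidden, n_out, lr=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)
        self.lr = lr

    def forward(self, X):
        H = sigmoid(X @ self.W1 + self.b1)  # hidden activations
        Y = sigmoid(H @ self.W2 + self.b2)  # output activations
        return H, Y

    def train(self, X, T, epochs=3000):
        n = len(X)
        for _ in range(epochs):
            H, Y = self.forward(X)
            # Output-layer delta for E = 0.5 * sum((Y - T)^2) with sigmoid outputs
            d_out = (Y - T) * Y * (1.0 - Y)
            # Back-propagate the error to the hidden layer (uses W2 before the update)
            d_hid = (d_out @ self.W2.T) * H * (1.0 - H)
            self.W2 -= self.lr * H.T @ d_out / n
            self.b2 -= self.lr * d_out.sum(axis=0) / n
            self.W1 -= self.lr * X.T @ d_hid / n
            self.b1 -= self.lr * d_hid.sum(axis=0) / n

    def predict(self, X):
        # Category index = output unit with the largest activation
        return self.forward(X)[1].argmax(axis=1)


# Toy reduced term vectors and one-hot targets for two categories
X = np.array([[1, 0, 1, 0], [1, 1, 0, 0], [0, 1, 0, 1], [0, 0, 1, 1]], dtype=float)
T = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)

net = BPNetwork(n_in=4, n_hidden=4, n_out=2)
net.train(X, T)
print(net.predict(X))
```

Classifying a new text then amounts to building its vector over the same reduced terms and calling `predict`. Fewer input columns after reduction means a smaller `W1` and cheaper training, which is the motivation the abstract gives for reducing before training.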
Keywords/Search Tags: Text Categorization, Rough Set, Neural Network, Vector Space Reduction, Attribute Reduction