Research And Application Of Text Categorization Algorithm Based On Rough Set Theory

Posted on: 2008-03-17
Degree: Master
Type: Thesis
Country: China
Candidate: Z Z Wang
Full Text: PDF
GTID: 2178360215471615
Subject: Management Science and Engineering
Abstract/Summary:
With the rapid development of the Internet, the information resources available online are growing exponentially, and it has become almost impossible to process this mass of information manually. In recent years, more and more researchers have turned their attention to how such information can be organized and managed efficiently and effectively. Traditional information retrieval is not well suited to handling large amounts of text data. Users need more effective algorithms to rank documents by importance or relevance, to compare their classes, or to find patterns and trends across multiple documents. Text categorization, one of the key technologies toward this goal, has therefore received wide attention from researchers.

Text categorization has recently become an important research area in information technology. This thesis proposes an automatic text categorization method based on rough set theory. Applying rough set theory to the detection of harmful information in BBS (bulletin board systems) has both theoretical significance and practical value. Rough set theory has already been applied successfully to machine learning, knowledge acquisition, decision analysis, knowledge discovery, expert systems and pattern recognition. Its main advantage is that it can effectively analyze and process inaccurate, inconsistent and uncertain information without any prior information. This thesis studies methods of text categorization, mainly attribute reduction methods and a value reduction algorithm. The main work is as follows:

1. The process of text categorization and the basics of rough set theory are described, and several text categorization algorithms are analyzed and compared.
2. Attribute reduction is the main object of study. Several reduction algorithms are proposed, and an improved attribute reduction algorithm based on rough sets and Tabu search is developed; its effectiveness is demonstrated by experiments. The value reduction algorithm in rough set theory is also discussed.
3. Text mining technology is applied in a BBS information monitoring system, where a well-trained text categorization model improves the system's ability to find and weed out harmful information and so clean up the network environment. The model is retrained regularly on new text as the system is used, so that the categorization results keep improving. This should be of practical help in both research and application.

Finally, the contents of the thesis are summarized and directions for future work in text data mining are proposed.
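
As a rough illustration of what attribute reduction on a decision table means, the sketch below computes the rough-set dependency degree (the share of objects in the positive region) and selects attributes greedily until the dependency of the full attribute set is reached. This is a plain greedy heuristic, not the improved Tabu search algorithm of the thesis, whose details the abstract does not give. The function names, the attribute names `term_a`, `term_b`, `term_c` and the toy table are all hypothetical; the table assumes term features have already been discretized to 0/1.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, cond_attrs, decision):
    """Fraction of rows in the positive region: rows whose equivalence
    class under cond_attrs carries a single decision value."""
    if not rows:
        return 0.0
    positive = 0
    for block in partition(rows, cond_attrs):
        if len({rows[i][decision] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)


def greedy_reduct(rows, cond_attrs, decision):
    """Greedy forward selection (an approximation, may keep a redundant
    attribute): add the attribute that raises the dependency degree most
    until it equals the dependency of the full attribute set."""
    target = dependency(rows, cond_attrs, decision)
    chosen = []
    while dependency(rows, chosen, decision) < target:
        best = max(
            (a for a in cond_attrs if a not in chosen),
            key=lambda a: dependency(rows, chosen + [a], decision),
        )
        chosen.append(best)
    return chosen


# Hypothetical decision table: each row is a document, condition
# attributes mark whether a term occurs, "label" is the decision.
table = [
    {"term_a": 1, "term_b": 1, "term_c": 0, "label": "harmful"},
    {"term_a": 1, "term_b": 0, "term_c": 1, "label": "harmful"},
    {"term_a": 0, "term_b": 1, "term_c": 0, "label": "normal"},
    {"term_a": 0, "term_b": 0, "term_c": 1, "label": "normal"},
]

print(greedy_reduct(table, ["term_a", "term_b", "term_c"], "label"))
```

In this toy table `term_a` alone separates the two classes, so the other two attributes can be dropped without losing any classification ability; a search-based method such as Tabu search explores the same space of attribute subsets less greedily.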
Keywords/Search Tags: Rough Set, Discretization, Attribute Reduction, Decision Table