Application Of Rough Set Theory In Chinese Text Categorization

Posted on: 2007-10-24 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: W F Du | Full Text: PDF
GTID: 1118360212459956 | Subject: Traffic Information Engineering & Control
Abstract/Summary:
With the explosive growth of data, information processing has become an indispensable tool for acquiring information and knowledge. Text categorization is an important research field within information processing: it is the process of automatically assigning a text to a category, based on the content of the text, under an established categorization system.

This dissertation studies several critical issues in text categorization, including the Vector Space Model, the fuzzification of real-valued word frequency vectors, knowledge acquisition based on rough sets, the computation of rule strength in the knowledge base, and conflict resolution when the results of related rules disagree. A complexity comparison with other categorization methods is also given. In addition, we implement a text categorization system based on this method and obtain experimental results for its categorization accuracy. The main content is as follows:

Part One: Knowledge reduction research based on rough set theory

1. The improvement of the Skowron discernibility matrix. The condition that the elements of the Skowron discernibility matrix must satisfy is improved. As a result, the condition is simpler to verify and fewer elements of the discernibility matrix satisfy it, so the complexity of computing reducts with the discernibility function decreases effectively.

2. The relationship among several knowledge reduction approaches. For decision tables, positive domain reduction, entropy reduction, distributive reduction, distribution reduction, approximate reduction and others have been introduced from different points of view. It is proved that distributive reduction and entropy reduction are equivalent. Furthermore, for consistent decision tables, positive domain reduction, entropy reduction, distributive reduction, distribution reduction and approximate reduction are all equivalent.

3. Logical characteristics of knowledge reduction approaches. The knowledge in a decision table is represented with rules, which can be regarded as formulae of a non-classical logical system. This dissertation studies the logical characteristics of knowledge reduction using a logical approach. It is proved that the rules are equivalent before and after the reduction of the decision table under the...
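
As a rough illustration of the discernibility-matrix idea mentioned in item 1, the sketch below builds the classic (unimproved) discernibility matrix for a small consistent decision table and enumerates its reducts by brute force. It is not the improved condition proposed in the dissertation, whose details are not given in this abstract. The toy table, the attribute names (`stock`, `goal`, `match`), the fuzzified levels (`high`, `low`) and the function names are all hypothetical and chosen only for illustration.

```python
from itertools import combinations

def discernibility_matrix(rows, cond_attrs, decision):
    """For each pair of objects with different decisions, record the
    condition attributes on which the two objects differ."""
    matrix = {}
    for i, j in combinations(range(len(rows)), 2):
        if rows[i][decision] != rows[j][decision]:
            diff = {a for a in cond_attrs if rows[i][a] != rows[j][a]}
            if diff:  # an empty set would mean an inconsistent pair
                matrix[(i, j)] = diff
    return matrix

def minimal_reducts(cond_attrs, matrix):
    """Brute-force search: a reduct is a minimal attribute set that
    intersects every non-empty entry of the discernibility matrix."""
    found = []
    for size in range(1, len(cond_attrs) + 1):
        for subset in combinations(cond_attrs, size):
            candidate = set(subset)
            if any(r <= candidate for r in found):
                continue  # a smaller reduct is already contained in it
            if all(candidate & entry for entry in matrix.values()):
                found.append(candidate)
    return found

# Hypothetical decision table: fuzzified term-frequency levels per document.
COND = ["stock", "goal", "match"]
rows = [
    {"stock": "high", "goal": "low",  "match": "low",  "category": "finance"},
    {"stock": "high", "goal": "low",  "match": "high", "category": "finance"},
    {"stock": "low",  "goal": "high", "match": "high", "category": "sports"},
    {"stock": "low",  "goal": "low",  "match": "high", "category": "sports"},
]

matrix = discernibility_matrix(rows, COND, "category")
reducts = minimal_reducts(COND, matrix)
print(reducts)
```

In this toy table the term `stock` alone separates the two categories, so it is the only reduct. The brute-force search is exponential in the number of attributes, which is why reducing the number of matrix entries that must be checked, as the abstract describes, matters for realistic vocabularies.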
Keywords/Search Tags: data mining, rough set, fuzzy clustering, text categorization, vector space model