
A Chinese Text Automatic Classification Algorithm Based on eEPs

Posted on: 2007-01-11
Degree: Master
Type: Thesis
Country: China
Candidate: H T Xu
Full Text: PDF
GTID: 2208360185971225
Subject: Computer software and theory
Abstract/Summary:
To organize and analyze the massive volume of Web information effectively, and to help users quickly find the knowledge and information they need, Web pages must be categorized automatically by content. The rapid development of the Web provides an unprecedented experimental environment and application platform for automatic text categorization, and also poses new challenges. Automatic text categorization, as the basis of automatic Web page categorization, is developing rapidly.

Feature extraction based on document frequency (DF) has low computational complexity, and its performance is comparable to that of information gain (IG). DF is suitable for large-scale text classification tasks. However, because DF uses only document frequency to measure the distinguishing capacity of a feature, we find that it has two disadvantages.

Emerging patterns (EPs) are itemsets whose supports change significantly from one data class to another. They can serve as a good classification model because they capture knowledge that discriminates between different classes of data. EPs have excellent categorization performance. Essential emerging patterns (eEPs) are a special kind of EP. eEPs retain all the virtues of EPs that are useful for constructing accurate classifiers, and they are fewer in number, which makes mining and using them more efficient.

EP-based categorization methods treat samples as sets of items rather than as points in an n-dimensional space. They build classifiers by finding the patterns (itemsets) whose supports change significantly from one data class to another. Their performance is comparable to that of C4.5 and naive Bayes. EP-based categorization methods have been applied successfully in many fields, such as DNA analysis, but we have seen no reports of their application to automatic text categorization.
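
The idea of an itemset whose support changes significantly between classes can be illustrated with a minimal Python sketch. It assumes the common definition from the EP literature, in which the growth rate of an itemset is the ratio of its support in a target class to its support in a background class; the thesis may use a different formulation. The function names, the threshold `rho`, and the toy documents are hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, List

Doc = FrozenSet[str]  # a document viewed as a set of items (terms)


def support(itemset: Doc, docs: List[Doc]) -> float:
    """Fraction of documents that contain every item of the itemset."""
    if not docs:
        return 0.0
    return sum(1 for d in docs if itemset <= d) / len(docs)


def growth_rate(itemset: Doc, background: List[Doc], target: List[Doc]) -> float:
    """Ratio of target-class support to background-class support.

    Assumed convention: infinity when the itemset occurs only in the
    target class, and 0 when it occurs in neither class.
    """
    s_bg = support(itemset, background)
    s_tg = support(itemset, target)
    if s_bg == 0.0:
        return float("inf") if s_tg > 0.0 else 0.0
    return s_tg / s_bg


def is_emerging_pattern(itemset: Doc, background: List[Doc],
                        target: List[Doc], rho: float = 2.0) -> bool:
    """True when the growth rate reaches the (hypothetical) threshold rho."""
    return growth_rate(itemset, background, target) >= rho


# Toy data: each document is reduced to a set of terms.
sports = [
    frozenset({"match", "team", "goal"}),
    frozenset({"team", "coach"}),
    frozenset({"goal", "team", "league"}),
    frozenset({"market", "team"}),
]
finance = [
    frozenset({"market", "stock"}),
    frozenset({"stock", "bank", "team"}),
    frozenset({"bank", "market"}),
    frozenset({"stock", "price"}),
]

for pattern in (frozenset({"team"}), frozenset({"team", "goal"}), frozenset({"market"})):
    rate = growth_rate(pattern, background=finance, target=sports)
    print(sorted(pattern), rate, is_emerging_pattern(pattern, finance, sports))
```

In this toy data, `{"team"}` appears in every sports document but in only one of four finance documents, so its growth rate from finance to sports is 4.0, while `{"team", "goal"}` never appears in finance and therefore has an infinite growth rate under the assumed convention.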
Keywords/Search Tags:Chinese text automatic categorization, Feature extraction, Document frequency, Distinguish capacity, Emerging patterns