The Research On Text Categorization Technology Based On Hierarchical Categorization And Ensemble Learning

Posted on: 2008-03-22
Degree: Master
Type: Thesis
Country: China
Candidate: H W Zhang
GTID: 2178360215469813
Subject: Computer application technology
Abstract/Summary:
With the rapid development and spread of the Internet, the amount of electronic text has grown enormously. How to organize and process large volumes of document data, and how to find the information users are interested in quickly, accurately and completely, is a major challenge for information science and technology. As a key technology for organizing and processing large amounts of document data, text classification helps users find the information they need quickly. Moreover, as the technical basis of information filtering, information retrieval, search engines, text databases, digital libraries and similar applications, text classification has broad application prospects.

This thesis studies text classification and its related technologies, and presents several methods and techniques aimed at improving classification speed, precision and stability. Our primary work is as follows.

1) Research on hierarchical text categorization

Hierarchical classification organizes categories in a tree-structured taxonomy. It offers a considerable improvement in classification performance and is an effective classification method. The local top-down approach is a popular hierarchical technique for text. It first picks the most relevant categories at the top level and then recursively chooses among the lower-level categories, that is, the children of the selected top-level categories. This approach is computationally efficient, but it must make several correct decisions in a row to classify a single example correctly, and errors made at the top levels usually cannot be recovered.
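To make the error-propagation problem concrete, the following Python sketch contrasts the greedy local top-down procedure with a beam-search variant that keeps several candidate paths and ranks the leaves it reaches by the mean log-probability per level. The taxonomy, the probability values and the function names are hypothetical, and the beam variant is a generic technique shown for comparison, not the method developed in this thesis.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical taxonomy (parent -> children); "root" is an artificial top node.
TAXONOMY: Dict[str, List[str]] = {
    "root": ["sports", "science"],
    "sports": ["football", "tennis"],
    "science": ["physics", "biology"],
    "physics": ["quantum", "optics"],
}

# Hypothetical outputs of the local classifiers for one document:
# PROBS[parent][child] = P(child | parent, document).
PROBS: Dict[str, Dict[str, float]] = {
    "root": {"sports": 0.55, "science": 0.45},
    "sports": {"football": 0.6, "tennis": 0.4},
    "science": {"physics": 0.95, "biology": 0.05},
    "physics": {"quantum": 0.9, "optics": 0.1},
}


def greedy_top_down(probs: Dict[str, Dict[str, float]], root: str = "root") -> List[str]:
    """Local top-down: commit to the single best child at every level."""
    path, node = [], root
    while node in TAXONOMY:
        scores = probs[node]
        node = max(TAXONOMY[node], key=lambda child: scores[child])
        path.append(node)
    return path


def beam_top_down(probs: Dict[str, Dict[str, float]], beam_width: int = 2,
                  root: str = "root") -> List[str]:
    """Keep the `beam_width` best partial paths per level; rank the leaves
    reached by mean log-probability per level (geometric mean of the local
    probabilities), so deep leaves are not penalised merely for their depth."""
    beam: List[Tuple[float, List[str]]] = [(0.0, [root])]
    leaves: List[Tuple[float, List[str]]] = []
    while beam:
        expanded: List[Tuple[float, List[str]]] = []
        for logp, path in beam:
            node = path[-1]
            if node not in TAXONOMY:  # leaf reached: score it, stop expanding
                leaves.append((logp / (len(path) - 1), path[1:]))
                continue
            for child in TAXONOMY[node]:
                expanded.append((logp + math.log(probs[node][child]), path + [child]))
        # All expanded paths have the same depth, so raw log sums are comparable.
        expanded.sort(key=lambda e: e[0], reverse=True)
        beam = expanded[:beam_width]
    return max(leaves, key=lambda e: e[0])[1]


if __name__ == "__main__":
    print(greedy_top_down(PROBS))  # ['sports', 'football']
    print(beam_top_down(PROBS))    # ['science', 'physics', 'quantum']
```

Here the greedy classifier commits to "sports" on a narrow 0.55 vs 0.45 margin and can never reach "quantum", while the beam keeps "science" alive long enough to find the better leaf. With `beam_width=1` the beam version reduces to the greedy one. The per-level averaging matters when leaves sit at different depths: since every local probability is at most 1, multiplying them along a path systematically favours shallow leaves, and averaging the log-probabilities removes that built-in advantage.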
To address these characteristics, a new hierarchical text classification method is proposed. Instead of picking only the most relevant category at each inner level, it considers multiple paths through the taxonomy. It also takes into account the depth of every leaf node in the taxonomy, which helps balance the bias in the results caused by differences in leaf-node depth.

2) Research on ensemble text categorization

Ensemble learning solves a single problem by training multiple versions of a classifier, and can significantly improve the generalization performance of a learning system. Ensemble theory shows that the generalization error of an ensemble depends on two factors: the strength of the individual classifiers and the correlation between them. With this in mind, a method for selecting ensemble members based on group decision is proposed, and the details of the ensemble learning procedure are described. The proposed method is evaluated experimentally on standard datasets from a machine learning repository and compared with existing classifier ensemble methods.
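The strength/correlation trade-off can be seen in a small experiment. In Breiman's analysis of random forests it takes the form of the upper bound PE* ≤ ρ̄(1 − s²)/s², where s is the strength of the individual classifiers and ρ̄ their mean correlation. The abstract does not describe the group-decision selection rule itself, so the Python sketch below substitutes a plain exhaustive search: among all subsets of a fixed size, it keeps the one whose majority vote is most accurate on validation data, using mean pairwise disagreement (a rough inverse of correlation) as the tie-breaker. The function names, the toy labels and the choice of disagreement as the diversity measure are assumptions for illustration only.

```python
from collections import Counter
from itertools import combinations
from typing import List, Sequence, Set

Labels = Sequence[int]


def majority_vote(members: Sequence[Labels]) -> List[int]:
    """Plurality vote per example; ties go to the smallest label."""
    voted = []
    for column in zip(*members):
        counts = Counter(column)
        top = max(counts.values())
        voted.append(min(lbl for lbl, c in counts.items() if c == top))
    return voted


def accuracy(pred: Labels, y: Labels) -> float:
    return sum(p == t for p, t in zip(pred, y)) / len(y)


def disagreement(a: Labels, b: Labels) -> float:
    """Fraction of examples on which two classifiers disagree (diversity proxy)."""
    return sum(x != z for x, z in zip(a, b)) / len(a)


def mean_disagreement(members: Sequence[Labels]) -> float:
    pairs = list(combinations(members, 2))
    if not pairs:  # a single member has no pairs
        return 0.0
    return sum(disagreement(a, b) for a, b in pairs) / len(pairs)


def select_subset(pool: Sequence[Labels], y: Labels, size: int) -> List[int]:
    """Exhaustively pick the `size`-member subset with the best majority-vote
    accuracy on validation data; ties go to the more diverse subset, then to
    the subset enumerated first."""
    best, best_key = None, None
    for subset in combinations(range(len(pool)), size):
        members = [pool[k] for k in subset]
        key = (accuracy(majority_vote(members), y), mean_disagreement(members))
        if best_key is None or key > best_key:
            best, best_key = list(subset), key
    return best


# Hypothetical validation data: classifiers 0, 1, 2 are strong but make
# identical errors; classifiers 3 and 4 are weaker but err on other examples.
Y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]


def with_errors(wrong: Set[int]) -> List[int]:
    """Predictions equal to Y except at the given (wrong) indices."""
    return [1 - t if i in wrong else t for i, t in enumerate(Y)]


POOL = [with_errors({0, 1}), with_errors({0, 1}), with_errors({0, 1}),
        with_errors({2, 3, 4}), with_errors({5, 6, 7})]

if __name__ == "__main__":
    print(accuracy(majority_vote(POOL[:3]), Y))  # 0.8
    chosen = select_subset(POOL, Y, 3)
    print(chosen, accuracy(majority_vote([POOL[k] for k in chosen]), Y))  # [0, 3, 4] 1.0
```

Voting over the three individually strongest members gives only 0.8 because they repeat the same mistakes, while the selected subset of one strong and two weaker but differently-erring classifiers reaches 1.0 on this toy data. Exhaustive search is only practical for small pools; larger pools would need a greedy or heuristic search.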
Keywords/Search Tags: Text Classification, Hierarchical Classification, Ensemble Learning, Generalization, Group Decision