
Chinese Text Classification Based On Active Learning

Posted on: 2007-05-23
Degree: Master
Type: Thesis
Country: China
Candidate: G X Sun
GTID: 2178360182493932
Subject: Computer software and theory
Abstract/Summary:
With the spread of information technology, effective automatic classification of information, especially Chinese text, has become an important branch of Chinese information processing. At present, most Chinese text classification systems adopt a "training-classifying" mode, which is a passive learning process. With this approach, the performance of the classification system depends entirely on the quality of the training material; the system lacks adaptability and is not suited to processing massive, heterogeneous, dynamic text information. To address this, we propose a Chinese text classification system based on active learning, called ALCTCS, and investigate its structure, feature selection, and classification process.

First, we construct the system model of ALCTCS. Training extends into the classification process and drives it, while feedback from the classification results retrains the system, so that the two form an organic whole. By introducing an active learning mechanism during training, the system breaks the relative independence of training and classification found in the traditional mode. It also overcomes the complete dependence on the training material, which strengthens the adaptability of the system.

Secondly, on the basis of term frequency (TF) and mutual information (MI), we introduce a "reserve-choose" mechanism for feature selection and design a feature selection algorithm based on the wrapper approach. This algorithm overcomes the information-loss problem of combined algorithms and can reflect changes in the features of the text.

Moreover, we design a text classification algorithm based on clustering, which balances the trade-off between time efficiency and classification accuracy during classification.

Finally, we implemented the two algorithms and analyzed their performance experimentally.
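
As a rough illustration of the two statistics that the feature selection builds on, the sketch below computes term frequency (TF) and a pointwise mutual information (MI) score between a term and a category, MI(t, c) = log(P(t, c) / (P(t) P(c))). It is not the thesis's "reserve-choose" algorithm, whose details the abstract does not give. The function names, the document-count estimates of the probabilities, the `min_tf` filter, and the max-over-categories ranking are all assumptions made for this example, and the documents are assumed to be already segmented into tokens.

```python
import math
from collections import Counter

def tf_and_mi(docs, labels):
    """Return (tf, mi) for tokenized docs with one class label each.

    tf[t]      : total occurrences of term t in the corpus
    mi[(t, c)] : log(P(t, c) / (P(t) * P(c))), estimated from document counts
    """
    n = len(docs)
    tf = Counter()         # raw term frequency over the whole corpus
    df = Counter()         # number of documents containing t
    df_c = Counter()       # number of documents of class c containing t
    n_c = Counter(labels)  # number of documents per class
    for tokens, c in zip(docs, labels):
        tf.update(tokens)
        for t in set(tokens):
            df[t] += 1
            df_c[(t, c)] += 1
    mi = {}
    for (t, c), k in df_c.items():
        # P(t, c) = k/n, P(t) = df[t]/n, P(c) = n_c[c]/n
        mi[(t, c)] = math.log(k * n / (df[t] * n_c[c]))
    return tf, mi

def top_features(docs, labels, k, min_tf=1):
    """Rank terms by their best MI over all classes (an assumed criterion),
    dropping terms whose corpus frequency is below min_tf."""
    tf, mi = tf_and_mi(docs, labels)
    best = {}
    for (t, c), score in mi.items():
        if tf[t] >= min_tf:
            best[t] = max(best.get(t, float("-inf")), score)
    # Sort by descending score, breaking ties alphabetically.
    return sorted(best, key=lambda t: (-best[t], t))[:k]
```

MI computed this way tends to favor rare terms, which is one common reason to combine it with a frequency threshold such as `min_tf`; how the thesis combines the two measures may differ.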
Keywords/Search Tags: Chinese text, classification, feature selection, wrapper, clustering