Research On Chinese Text Classification Algorithm Based On Active Learning Approach

Posted on: 2007-04-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhang
Full Text: PDF
GTID: 2178360212967029
Subject: Computer Science and Technology
Abstract/Summary:
Automatic text categorization assigns natural language texts to given topics and is an important problem in natural language processing. When texts are represented with the vector space model (VSM), the feature space is high-dimensional, which is characteristic of text classification tasks. This inevitably increases the computational complexity of training and leads to long training times. In addition, the training set may contain texts that contribute little to classification and can even decrease classification accuracy. To address this problem, we apply active learning to text classification.

An active learning strategy allows the learner to dynamically select the more informative samples from the candidate training set to compose the training set, and to remove noisy samples. On the one hand, a smaller training set is enough to reach the same testing accuracy, so training time decreases. On the other hand, testing accuracy can even increase. Active learning can therefore improve the training efficiency of text classification tasks that have a very large number of training samples and a high-dimensional feature space.

In our work, we applied the active learning algorithm Rsm to text classification and implemented a Chinese text classification system based on a Radial Basis Function Neural Network (RBFNN). The main focus of the research is text classification based on active learning. First, the text classification system was constructed: we used VSM to represent texts, LTC to compute feature weights, and RBFNN as the classifier. Second, we applied three active learning algorithms, Rsm, Rand and QBC, to the system and implemented them on top of the RBFNN.

Experimental results show that Rsm has advantages over the other two active learning methods in both testing accuracy and time cost.
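
As an illustration of two building blocks named in the abstract, the following is a minimal Python sketch, not the thesis implementation. It assumes the LTC variant commonly cited in the text categorization literature, weight = log(tf + 1) * log(N / df) followed by cosine normalization (the abstract does not state which variant was used), and it assumes vote entropy as the disagreement measure for QBC (query by committee). The function names `ltc_weights`, `vote_entropy` and `select_by_committee` are hypothetical. Rsm is not sketched because the abstract does not define it.

```python
import math
from collections import Counter


def ltc_weights(docs):
    """Compute LTC weights for tokenized documents.

    docs: list of token lists. Returns one {term: weight} dict per document.
    Assumed formula: log(tf + 1) * log(N / df), then cosine normalization.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        raw = {
            term: math.log(freq + 1.0) * math.log(n_docs / df[term])
            for term, freq in tf.items()
        }
        # Cosine normalization; an all-zero vector is left as zeros.
        norm = math.sqrt(sum(w * w for w in raw.values()))
        vectors.append(
            {term: (w / norm if norm > 0 else 0.0) for term, w in raw.items()}
        )
    return vectors


def vote_entropy(votes):
    """Entropy of the labels predicted by the committee for one document."""
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def select_by_committee(committee_votes, k):
    """Return indices of the k pool documents with the highest disagreement.

    committee_votes: one list of predicted labels per pool document,
    with one label from each committee member.
    """
    ranked = sorted(
        range(len(committee_votes)),
        key=lambda i: vote_entropy(committee_votes[i]),
        reverse=True,
    )
    return ranked[:k]
```

In a pool-based loop, the selected documents would be labeled, added to the training set, and the classifier retrained before the next selection round. A Rand baseline would replace `select_by_committee` with uniform random sampling from the pool.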
Keywords/Search Tags: Natural Language Processing, Text Classification, Feature Selection and Extraction, Active Learning