Fast Large-Scale Hierarchical Text Classification Based on Class Centroids

Posted on: 2012-10-05 | Degree: Master | Type: Thesis
Country: China | Candidate: Y D Miao | Full Text: PDF
GTID: 2248330371465746 | Subject: Computer application technology
Abstract/Summary:
Large-scale text classification is currently a very active topic, especially for web data and hierarchical classification. Traditional classifiers find it particularly difficult to achieve good results on Wikipedia-based data, which has a very large number of categories.

In this thesis we propose a centroid-based classification method for data with a large number of categories. The method not only reduces the training cost but also gives good results on categories that have few training samples and a huge number of features. Feature combination improves its performance further. The system was evaluated in the 1st LSHTC evaluation, and the results show that the method performs well in all evaluation tasks.

When the number of categories in the training set is huge, a plain linear search over all categories is slow at prediction time. We therefore use the hierarchical information to speed up prediction (testing). Compared with the other methods that participated in the LSHTC evaluation, our system has a clear advantage in both time and memory cost even without hierarchical information. With hierarchical information, our method is 5-6 times faster than our own flat method.

For the multi-label classification problem, we propose a method that converts the problem into a single-label one by using label expansion and a rebalanced weighting method. With a combined ranking algorithm used for multi-label prediction, the system again performs well. The system was evaluated in the 2nd LSHTC evaluation, and the results show good performance in both tasks.
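As an illustration of the general idea only, the following is a minimal sketch of centroid-based classification, with a flat search over all category centroids and a greedy top-down search over a category tree. It is not the thesis's implementation: the sparse dictionary representation, the cosine scoring, the greedy descent, and all function names (`normalize`, `train_centroids`, `predict_flat`, `predict_top_down`) are assumptions made for this example, and the feature combination, label expansion, and rebalanced weighting steps are not covered.

```python
import math
from collections import defaultdict


def normalize(vec):
    """Scale a sparse vector (dict: feature -> weight) to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    if norm == 0.0:
        return dict(vec)
    return {f: w / norm for f, w in vec.items()}


def train_centroids(samples):
    """Build one unit-length centroid per category.

    `samples` is an iterable of (vector, label) pairs. Training is a
    single pass that sums the normalized documents of each category.
    """
    sums = defaultdict(lambda: defaultdict(float))
    for vec, label in samples:
        for f, w in normalize(vec).items():
            sums[label][f] += w
    return {label: normalize(total) for label, total in sums.items()}


def cosine(a, b):
    """Dot product of two unit-length sparse vectors (their cosine)."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(w * b.get(f, 0.0) for f, w in a.items())


def predict_flat(centroids, vec):
    """Flat prediction: compare the document with every category centroid."""
    doc = normalize(vec)
    return max(centroids, key=lambda label: cosine(doc, centroids[label]))


def predict_top_down(children, node_centroids, vec, root="root"):
    """Greedy top-down prediction over a category tree (assumed scheme).

    `children` maps a node to its list of child nodes; leaves have no entry.
    `node_centroids` holds a centroid for every non-root node. At each level
    only the children of the current node are scored, so far fewer
    comparisons are needed than in the flat search.
    """
    doc = normalize(vec)
    node = root
    while children.get(node):
        node = max(children[node], key=lambda c: cosine(doc, node_centroids[c]))
    return node
```

The flat search costs one comparison per category, which is what becomes slow when the category set is huge; the top-down variant only scores the children of one node per level, at the risk that a wrong choice near the root cannot be corrected further down.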
Keywords/Search Tags: Text classification, multi-label classification, class centroid, feature selection