Text Classification Based On A Novel Ensemble Multi-label Learning Method

Posted on: 2016-04-15
Degree: Master
Type: Thesis
Country: China
Candidate: T Zhang
GTID: 2308330473465341
Subject: Communication and Information System
Abstract/Summary:
With the rapid development of the Internet, the volume of text resources has grown exponentially. As an important means of organizing and managing large amounts of text, text classification not only makes text easier to use but also helps extract latent, valuable knowledge. Because a single text often has multiple properties, text classification often reduces to a multi-label learning problem. Multi-label learning, an active new research area in machine learning, is an important means of modeling objects with multiple meanings and has been widely applied to information retrieval, Web mining, bioinformatics, and automatic tagging.

To investigate the application of multi-label learning to text classification, this thesis studies text classification, dataset processing, text pre-processing, text transformation, feature selection, term weighting, multi-label classification, and multi-label performance measurement. On this basis, it also studies how to improve feature selection and multi-label classification algorithms.

The main work of this thesis includes:

(1) Text Classification Framework Based on Multi-label Learning
Text classification is often treated as a case of multi-label learning. Building on an in-depth study of text classification and multi-label learning, this thesis proposes a novel framework covering text transformation, multi-label learning algorithms, and performance measurement.

(2) Text Classification Based on Ensemble Multi-label Learning (En-MLKNN)
Three feature selection methods have been applied successfully to text classification, but the importance of each attribute varies across applications. Based on the proposed framework, this thesis uses an ensemble method to improve the prediction performance of text classification and proposes En-MLKNN, a novel multi-label learning method built on the multi-label learning method MLKNN. Experiments on two classic datasets show that En-MLKNN outperforms several state-of-the-art multi-label learning algorithms on multiple evaluation criteria.

(3) Text Classification Based on Ensemble Multi-label Cost-sensitive Learning
Although En-MLKNN can be applied to text classification and performs well, class imbalance in the data remains unresolved. Drawing on cost-sensitive learning, this thesis refines En-MLKNN into En-MLCKNN to address the class imbalance problem. Experiments on the same datasets show that En-MLCKNN outperforms several state-of-the-art multi-label learning algorithms on multiple evaluation criteria.
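The abstract names MLKNN as the base learner but gives no equations. The sketch below follows the standard ML-kNN formulation of Zhang and Zhou (2007), assuming dense numeric feature vectors (for example TF-IDF weights), Euclidean distance, and a smoothing constant s = 1. The helpers `fit_ensemble` and `ensemble_predict` are hypothetical: the abstract says only that En-MLKNN uses an ensemble method motivated by differing feature selections, so training one ML-kNN per feature subset and averaging the posteriors is an assumption, not the thesis's method. `hamming_loss` is one common multi-label criterion; the abstract does not say which criteria were used.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class MLKNN:
    """Standard ML-kNN: per-label MAP decision from neighbour label counts."""

    def __init__(self, k=3, s=1.0):
        self.k = k  # number of nearest neighbours
        self.s = s  # Laplace smoothing constant (assumed 1.0)

    def _neighbours(self, x, exclude=None):
        # Indices of the k nearest training instances; `exclude` skips the
        # instance itself when counting neighbours of a training point.
        ranked = sorted((euclidean(x, xi), i) for i, xi in enumerate(self.X) if i != exclude)
        return [i for _, i in ranked[: self.k]]

    def _label_counts(self, idx):
        # For each label, how many of the given neighbours carry it.
        return [sum(self.Y[i][lbl] for i in idx) for lbl in range(self.q)]

    def fit(self, X, Y):
        self.X, self.Y = X, Y
        n, self.q = len(X), len(Y[0])
        k, s = self.k, self.s
        # Smoothed prior probability that each label is relevant.
        self.prior1 = [(s + sum(y[lbl] for y in Y)) / (2 * s + n) for lbl in range(self.q)]
        # c1[lbl][j] / c0[lbl][j]: training instances with / without the label
        # whose k neighbours contain exactly j instances carrying it.
        c1 = [[0] * (k + 1) for _ in range(self.q)]
        c0 = [[0] * (k + 1) for _ in range(self.q)]
        for i in range(n):
            counts = self._label_counts(self._neighbours(X[i], exclude=i))
            for lbl in range(self.q):
                (c1 if Y[i][lbl] else c0)[lbl][counts[lbl]] += 1
        # Smoothed likelihoods P(count = j | label relevant / irrelevant).
        self.like1 = [[(s + c[j]) / (s * (k + 1) + sum(c)) for j in range(k + 1)] for c in c1]
        self.like0 = [[(s + c[j]) / (s * (k + 1) + sum(c)) for j in range(k + 1)] for c in c0]
        return self

    def predict_proba(self, x):
        # Posterior probability that each label is relevant for x.
        probs = []
        for lbl, j in enumerate(self._label_counts(self._neighbours(x))):
            p1 = self.prior1[lbl] * self.like1[lbl][j]
            p0 = (1 - self.prior1[lbl]) * self.like0[lbl][j]
            probs.append(p1 / (p1 + p0))
        return probs

    def predict(self, x):
        # MAP rule: assign a label when its posterior exceeds 0.5.
        return [1 if p > 0.5 else 0 for p in self.predict_proba(x)]


# HYPOTHETICAL ensemble (not the thesis's published rule): one ML-kNN per
# feature subset, e.g. the features kept by each feature selection method.
def fit_ensemble(X, Y, feature_subsets, k=3):
    members = []
    for feats in feature_subsets:
        Xs = [[row[f] for f in feats] for row in X]
        members.append((feats, MLKNN(k=k).fit(Xs, Y)))
    return members


def ensemble_predict(members, x):
    # HYPOTHETICAL combination: average member posteriors, threshold at 0.5.
    total = [0.0] * len(members[0][1].prior1)
    for feats, model in members:
        for lbl, p in enumerate(model.predict_proba([x[f] for f in feats])):
            total[lbl] += p
    return [1 if t / len(members) > 0.5 else 0 for t in total]


def hamming_loss(Y_true, Y_pred):
    # Fraction of (instance, label) slots predicted incorrectly.
    wrong = sum(a != b for yt, yp in zip(Y_true, Y_pred) for a, b in zip(yt, yp))
    return wrong / (len(Y_true) * len(Y_true[0]))
```

On a toy set of six 2-D points where the first three carry only label 0 and the last three only label 1, `MLKNN(k=2)` predicts `[1, 0]` for the query `[0.5, 0.5]`. The smoothing constant keeps every likelihood positive, so the posterior stays defined even for neighbour counts never seen in training. A cost-sensitive variant in the spirit of En-MLCKNN might replace the fixed 0.5 threshold with per-label thresholds that favour rare labels, but the abstract does not describe the actual cost model.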
Keywords: text classification, multi-label learning, cost-sensitive, machine learning