
Research On Multi-label Text Classification Algorithm Based On Label Correlation

Posted on: 2017-09-30
Degree: Master
Type: Thesis
Country: China
Candidate: S H Wang
GTID: 2348330566956738
Subject: Software engineering
Abstract/Summary:
The Big Data era demands effective text mining, and text classification is one of its key technologies. The assumption that each sample carries a single class label can no longer accurately describe the semantics of real objects; compared with single-label classification, multi-label classification better matches the characteristics of real-world objects. Current mainstream multi-label classifiers include CBA, CMAR, and ML-kNN, but each has shortcomings. Defects in existing label-correlation classifiers leave their performance insufficient in some respects, so they cannot meet the demand for accurate classification. This thesis proposes an improved label-correlation multi-label classification model based on AdaBoost-SVM, called LC-ASVM. The model determines frequent label pairs by the degree of support among feature labels, which clarifies the relationships between labels and yields more accurate classification features. It computes the correlation coefficient between the labels of each pair using the Spearman rank correlation coefficient and then builds the label correlation matrix. It measures the confidence with which the SVM matches each category by the projection distance between feature points and the hyperplane, and it updates the label correlation matrix layer by layer through the AdaBoost iterations; the matrix tends to stabilize as the AdaBoost classifier converges. Improving accuracy in this way raises the time complexity of the SVM classifier, a cost compounded by the AdaBoost iterative strategy. To offset the resulting loss of efficiency, the thesis further applies K-medoids for data reduction, balancing the efficiency and the classification performance of the LC-ASVM model. Parameter-optimization experiments and comparisons with other classification algorithms determine the optimal parameters of the LC-ASVM model and assess its generalization performance and accuracy. Experimental results show that the method improves classification accuracy over the original AdaBoost-SVM classifier and other mainstream multi-label classifiers, and that it performs well on both large and small sample sets.
Keywords/Search Tags: multi-label classification, correlation analysis, AdaBoost, SVM
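
The abstract describes building a label correlation matrix from frequent label pairs and the Spearman rank correlation coefficient, but gives no formulas. The following is a minimal sketch of that one step, not the thesis's implementation. It rests on several assumptions: support is taken to be the fraction of samples in which both labels occur, the threshold `min_support` is a hypothetical parameter, pairs below the threshold are left at 0, and tied values receive average ranks. The function names are illustrative only.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant label column gets correlation 0
    return cov / sqrt(var_x * var_y)


def label_correlation_matrix(Y, min_support):
    """Y is a list of rows, each a list of 0/1 label indicators.

    Only label pairs whose co-occurrence support reaches min_support
    (a hypothetical threshold) receive a Spearman coefficient.
    """
    n_samples, n_labels = len(Y), len(Y[0])
    columns = [[row[j] for row in Y] for j in range(n_labels)]
    matrix = [[1.0 if i == j else 0.0 for j in range(n_labels)]
              for i in range(n_labels)]
    for i, j in combinations(range(n_labels), 2):
        both = sum(a and b for a, b in zip(columns[i], columns[j]))
        if both / n_samples >= min_support:
            matrix[i][j] = matrix[j][i] = spearman(columns[i], columns[j])
    return matrix


# Toy label matrix: 4 samples, 3 labels (made-up data for illustration).
Y = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
]
M = label_correlation_matrix(Y, min_support=0.5)
```

With this toy data only labels 0 and 1 co-occur often enough to count as a frequent pair, so only that entry of `M` is filled in. How LC-ASVM then updates the matrix during the AdaBoost iterations is not specified in the abstract and is not sketched here.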