
Research On Multi-label Learning And Its Application In Text Classification

Posted on: 2019-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: K Rui
Full Text: PDF
GTID: 2417330551960996
Subject: Statistics
Abstract/Summary:
The concept of multi-label learning originates from text classification. Over the last decade, multi-label learning has developed into a new field of machine learning. Previously, an item such as a picture or a document was usually described by a single, fixed concept label. However, with the explosive growth of data generated by the Internet, a real-world sample is often described by multiple concept labels and carries multiple meanings. Traditional single-label classification algorithms therefore no longer meet current demand.

In recent years, research on multi-label learning has focused on improving classification precision and algorithm efficiency, with little consideration of whether the data set contains redundant features. Most mainstream multi-label feature selection algorithms calculate the information entropy between features and labels and use it to measure the correlation between them. However, most of these methods are not complementary and are computationally complex, and redundant features often restrict the performance of multi-label classification algorithms. Moreover, in the field of text categorization, the limitations of high-dimensional samples and the problem of classifying boundary texts also restrict the classification efficiency of classifiers.

To address these problems, the research work of this thesis mainly includes:

(1) Aiming at the problems of multi-label learning, we construct the Rough Entropy of Positive Region and use it to measure the correlation between features and labels. The features and labels are divided into subspaces, and the important features are selected in a certain proportion. The approach based on the Rough Entropy of Positive Region compensates for the deficiencies of traditional information entropy. The selected features are more reasonable, and the dimensionality of the data sets is also reduced.

(2) In the field of text classification, a new and effective kNN text classification algorithm is proposed based on the minimum risk cost of three-way decisions theory. According to the minimum risk cost theory, risk loss values are set and the set of documents in the boundary domain is identified. The documents found in the boundary domain are then classified using their membership degree. This method greatly improved the performance of the kNN classifier.
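The kNN idea in contribution (2) can be illustrated with a minimal sketch. It is not the thesis's algorithm, whose details the abstract does not give. It assumes the standard decision-theoretic form of three-way decisions, in which six loss values yield two thresholds, alpha and beta. It also assumes cosine similarity between document vectors and a similarity-weighted vote as the "membership degree". The names `thresholds`, `cosine`, and `three_way_knn`, and the loss parameters, are hypothetical.

```python
import math
from collections import Counter


def thresholds(l_pp, l_bp, l_np, l_pn, l_bn, l_nn):
    """Derive (alpha, beta) from six loss values (assumed formulation).

    l_xy is the loss of taking action x (P = accept, B = defer, N = reject)
    when the document does (y = p) or does not (y = n) belong to the class.
    """
    alpha = (l_pn - l_bn) / ((l_pn - l_bn) + (l_bp - l_pp))
    beta = (l_bn - l_nn) / ((l_bn - l_nn) + (l_np - l_bp))
    return alpha, beta


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def three_way_knn(query, docs, labels, k, alpha, beta):
    """Classify `query`; return (label or None, region name)."""
    # Keep the k training documents most similar to the query.
    ranked = sorted(
        ((cosine(query, d), y) for d, y in zip(docs, labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]

    # Estimate P(class | query) as the majority label's share of neighbours.
    votes = Counter(y for _, y in ranked)
    top_label, top_count = votes.most_common(1)[0]
    p = top_count / len(ranked)

    if p >= alpha:
        return top_label, "positive"   # confident: accept the majority label
    if p <= beta:
        return None, "negative"        # too little support: reject
    # Boundary domain: decide by similarity-weighted membership (assumption).
    membership = Counter()
    for similarity, y in ranked:
        membership[y] += similarity
    return max(membership, key=membership.get), "boundary"
```

With the illustrative losses `thresholds(0, 1, 4, 4, 1, 0)`, a query whose neighbours almost all share one label is accepted directly. A query with a mixed neighbourhood falls into the boundary domain and is resolved by the weighted vote rather than by a simple majority.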
Keywords/Search Tags:Multi-label learning, Text classification, Rough sets theory, Rough entropy, Rough Entropy of Positive Region, kNN