Research On Multi-label Learning And Its Application In Text Classification

Posted on:2019-04-25

Degree:Master

Type:Thesis

Country:China

Candidate:K Rui

Full Text:PDF

GTID:2417330551960996

Subject:Statistics

Abstract/Summary:

The concept of multi-label learning originated in text classification. Over the last decade, multi-label learning has become a field of machine learning in its own right. In the past, an item, a picture, or a document was usually described by a single, fixed concept label. However, with the explosive growth of data generated by the Internet, a real-world sample is often associated with multiple concept labels and carries multiple meanings. Traditional single-label classification algorithms therefore no longer meet current demands. In recent years, research on multi-label learning has focused on improving classification precision and algorithmic efficiency, while giving little consideration to whether the data set contains redundant features. In mainstream multi-label feature selection algorithms, most scholars compute the information entropy between features and labels and use it to measure the correlation between them. However, most of these methods are not complementary and are computationally complex, and redundant features often restrict the performance of multi-label classification algorithms. Moreover, in the field of text classification, the limitations of high-dimensional samples and the problem of classifying boundary texts also restrict the classification efficiency of classifiers. To address the above problems, the research work of this thesis mainly includes:

(1) For the problems of multi-label learning, we construct the Rough Entropy of Positive Region and use it to measure the correlation between features and labels. By dividing the features and labels into subspaces, important features are selected in a certain proportion. The Rough Entropy of Positive Region makes up for the deficiency of traditional information entropy: the selected features are more reasonable, and the dimensionality of the data set is reduced.

(2) In the field of text classification, a new and effective kNN text classification algorithm is proposed based on the minimum risk cost of three-way decision theory. According to the minimum-risk-cost theory, risk loss values are set and the set of documents in the boundary region is identified. The documents in the boundary region are then classified by their membership degree. This method greatly improves the performance of the kNN classifier.
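
The three-way kNN idea in item (2) can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the abstract gives no formulas, so the thresholds below follow the standard minimum-risk derivation from decision-theoretic rough sets, and the loss values, the function names `thresholds` and `three_way_knn`, the one-category-at-a-time decision, and the inverse-distance membership degree are all assumptions made for illustration.

```python
import math


def thresholds(l_pp, l_bp, l_np, l_pn, l_bn, l_nn):
    """Return (alpha, beta) from six illustrative loss values.

    l_xy is the loss of action x (p = accept, b = defer, n = reject)
    when the document's true state is y (p = in category, n = not).
    """
    alpha = (l_pn - l_bn) / ((l_pn - l_bn) + (l_bp - l_pp))
    beta = (l_bn - l_nn) / ((l_bn - l_nn) + (l_np - l_bp))
    return alpha, beta


def three_way_knn(query, train, label, k, alpha, beta):
    """Decide whether `query` belongs to `label`; returns (region, decision)."""
    # k nearest training documents by Euclidean distance.
    neighbours = sorted(train, key=lambda item: math.dist(query, item[0]))[:k]
    # Fraction of neighbours carrying the label, used as P(label | query).
    p = sum(1 for _, y in neighbours if y == label) / k
    if p >= alpha:
        return "positive", True
    if p <= beta:
        return "negative", False
    # Boundary region: fall back to an assumed membership degree,
    # the inverse-distance-weighted share of neighbours with the label.
    weights = [(1.0 / (1.0 + math.dist(query, v)), y) for v, y in neighbours]
    total = sum(w for w, _ in weights)
    membership = sum(w for w, y in weights if y == label) / total
    return "boundary", membership >= 0.5


# Hypothetical loss values and toy one-dimensional "documents".
alpha, beta = thresholds(l_pp=0, l_bp=2, l_np=6, l_pn=8, l_bn=2, l_nn=0)
train = [((0.0,), "sport"), ((1.0,), "sport"), ((2.0,), "sport"),
         ((4.0,), "news"), ((5.0,), "news"), ((6.0,), "news")]
print(three_way_knn((2.9,), train, "sport", 3, alpha, beta))
```

Documents whose neighbour vote is clear-cut are accepted or rejected directly; only those falling between the two thresholds go through the second, membership-based step, which is where a plain kNN majority vote is least reliable.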

Keywords/Search Tags:

Multi-label learning, Text classification, Rough sets theory, Rough entropy, Rough Entropy of Positive Region, kNN

Related items

1	Research On Tendentious Label And Streaming Data Feature Selection Algorithm
2	Using Rough Sets To Analyze The Factors Affecting University Student Employment
3	The Evaluation Research Of Postgraduate Education Quality On The Basis Of Rough Set Theory
4	The Application Of Rough Set Model In The Analysis Of The Results Of High School Mathematics Simulation Text
5	The Exploration Of Rough Set Theory's Application In Senior Middle School Physics Teaching
6	Study On The Model Of Large Equipment Purchase In University Evaluation Of Rough Sets Andtuple Linguistic
7	Analysis Of Examination Results Based On Rough Set Theory And Its Application
8	Research On Evaluation Model Of University Teacher’s Research Ability Based On Rough Set And SVM Technology
9	Analysis Of Middle School Mathematics Test Based On Rough Set Theory
10	The Application Study On College Graduates’ Employment With Rough Set Theory