
Research On Multi-label Text Classification Methods Based On Rough Sets

Posted on: 2017-04-24
Degree: Master
Type: Thesis
Country: China
Candidate: J Zhang
GTID: 2348330512451230
Subject: Computer software and theory
Abstract/Summary:
With the widespread use of e-commerce platforms and social media sites, a large amount of review information about product performance and consumer experience has accumulated on the Internet. These data reflect users' consumption patterns and expose the limitations of businesses' service information. Analyzing and exploring such data has practical significance for understanding user consumption behavior, supporting e-business decision-making, and improving marketing strategy.

Text mining is an important branch of data mining, but traditional single-label supervised learning methods struggle to meet the demands of diverse text information processing. For text mining, it is therefore important to study multi-label text classification and to apply multi-label learning methods appropriately to all kinds of text data. As an effective tool for dealing with uncertain information, rough set theory has produced many research results in classification rule learning and attribute reduction. Targeting two practical applications, web document classification and aspect mining of reviews, we study multi-label text classification methods based on rough set theory. The main research contents and conclusions are as follows:

(1) Construction and analysis of an experimental multi-label text corpus
We select a large number of web page documents and automotive product reviews as the experimental corpus. After processing the corpus with text mining methods, we build Chinese multi-label text datasets. In addition, to address the problem of identifying multiple performance aspects in review text, we propose a recognition framework based on multi-label learning.

(2) Multi-label text classification based on a robust fuzzy rough set model
To handle the uncertainty and noise in multi-label data, a novel multi-label robust fuzzy rough classification model is proposed. The model extends the k-mean robust statistics fuzzy rough classification model, which was designed for single-label classification. First, for each unlabeled instance, the membership with respect to each label is obtained from similarity measures. Second, the degree of correlation is defined from these memberships. Finally, an appropriate threshold is chosen to separate the relevant labels from the irrelevant ones. Experimental results on real multi-label text datasets indicate that the proposed model performs strongly on multi-label classification of web page text.

(3) Chained multi-aspect recognition method with label-specific features based on rough sets
To handle automotive product reviews that evaluate several performance aspects at once, we propose a chained multi-aspect recognition method with label-specific features based on rough sets. By extracting label-specific features for every label and building a classifier chain over these features, the method addresses the multi-aspect recognition problem. On the Sina car review corpus, in a comparison with a variety of multi-label classification methods, the subset accuracy of the proposed method reaches up to 95%. Hence, the method is feasible for recognizing the multiple aspects of automobile reviews.
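The three-step procedure in part (2) and the subset accuracy metric in part (3) can be illustrated with a minimal sketch. This is not the thesis's model: cosine similarity, the "mean of the k largest similarities" membership, the threshold of 0.5, and all function names below are assumptions chosen only to make the idea concrete.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def label_memberships(x, X_train, Y_train, k=2):
    """Step 1 (assumed form): membership of x in each label.

    For each label, average the k largest similarities between x and
    the training instances that carry that label. Averaging over k
    neighbours instead of taking a single extreme value is what makes
    the statistic less sensitive to one noisy sample.
    """
    sims = [cosine_similarity(x, row) for row in X_train]
    memberships = []
    for j in range(len(Y_train[0])):
        positives = sorted(
            (s for s, y in zip(sims, Y_train) if y[j] == 1), reverse=True
        )
        top_k = positives[:k]
        memberships.append(sum(top_k) / len(top_k) if top_k else 0.0)
    return memberships


def predict_labels(x, X_train, Y_train, k=2, threshold=0.5):
    """Steps 2-3 (assumed form): keep labels whose membership reaches the threshold."""
    scores = label_memberships(x, X_train, Y_train, k)
    return [1 if s >= threshold else 0 for s in scores]


def subset_accuracy(Y_true, Y_pred):
    """Fraction of instances whose predicted label set matches exactly."""
    exact = sum(1 for t, p in zip(Y_true, Y_pred) if list(t) == list(p))
    return exact / len(Y_true)


# Toy data: two features, two labels (purely illustrative).
X_train = [[1, 0], [1, 0], [0, 1], [0, 1]]
Y_train = [[1, 0], [1, 0], [0, 1], [0, 1]]

pred_a = predict_labels([1, 0], X_train, Y_train)  # close to label 0 only
pred_b = predict_labels([1, 1], X_train, Y_train)  # between both groups
```

Subset accuracy is a strict metric: a prediction counts as correct only when every label of the instance is right, so a single wrong label makes the whole instance count as an error.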
Keywords/Search Tags: Multi-label text categorization, Feature selection, Rough sets, Aspect recognition of product