
Research On Uncertainty Measure And Feature Selection For Weakly Multi-Label Data

Posted on: 2019-06-18
Degree: Master
Type: Thesis
Country: China
Candidate: S C Gao
GTID: 2428330626452078
Subject: Computer Science and Technology
Abstract/Summary:
Handling multi-label data is a significant task in machine learning and data mining and has attracted extensive research. However, in real-world applications such as image recognition and text categorization, obtaining complete labels for training is difficult and costly. It is usually much easier to obtain partially labeled multi-label data, or multi-label data with missing labels, i.e., weakly multi-label data. So far, few studies have addressed uncertainty measurement for weakly multi-label data, even though such measurement helps reveal the essential characteristics of the data. Information entropies have been used to describe and evaluate uncertainty, but they are typically designed for single-label data and are not suitable for multi-label data, especially weakly multi-label data. This thesis therefore proposes a new form of conditional entropy to measure the uncertainty in weakly multi-label data, in order to further explore the value of such data and to support feature selection on it. The main work and contributions are as follows:

(1) We propose Monotonous Tolerance Conditional Entropy to describe and measure the uncertainty of weakly multi-label data. The measure handles the feature space and the incomplete label space by means of the similarity classes and tolerance classes from fuzzy rough sets, and combines them with conditional entropy to obtain a definition better suited to weakly multi-label data. We establish the soundness of the definition through theoretical analysis and formula derivation, and experimental results on real-life data sets further demonstrate the important properties of the proposed measure.

(2) We present a feature selection algorithm for weakly multi-label data based on the proposed uncertainty measure. Feature ranking and feature selection are important applications of uncertainty measurement. Based on the measure, we define attribute significance and reduction for weakly multi-label data and propose a feature selection algorithm. Comparison with other algorithms shows that our feature selection algorithm is effective.
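The abstract does not give the definition of Monotonous Tolerance Conditional Entropy, so the following Python sketch is only a generic illustration of the two ideas it names: a tolerance-based conditional entropy for data with missing labels, and greedy feature selection driven by that entropy. Everything in it is an assumption made for illustration: the function names, the crisp equality test on features (the thesis works with fuzzy rough sets), the encoding of a missing label as `None`, the entropy formula itself, and the toy data. It should not be read as the thesis's measure or algorithm, and no monotonicity property is claimed for it.

```python
from math import log2


def compatible(y1, y2):
    """Tolerance on labels: a missing label (None) matches any value."""
    return all(a is None or b is None or a == b for a, b in zip(y1, y2))


def tolerance_entropy(X, Y, features):
    """Illustrative entropy: mean of log2(|S(i)| / |S(i) & T(i)|) over samples.

    S(i) is the set of samples equal to sample i on the chosen features;
    T(i) is the set of samples whose labels are compatible with sample i's.
    """
    n = len(X)
    total = 0.0
    for i in range(n):
        # S(i): samples that agree with sample i on every chosen feature.
        similar = [j for j in range(n)
                   if all(X[i][f] == X[j][f] for f in features)]
        # Members of S(i) whose (possibly incomplete) labels tolerate i's.
        consistent = [j for j in similar if compatible(Y[i], Y[j])]
        # Sample i is always in both sets, so the ratio is well defined.
        total += log2(len(similar) / len(consistent))
    return total / n


def greedy_select(X, Y):
    """Add the feature that yields the lowest entropy at each step,
    stopping once the entropy of the full feature set is reached."""
    remaining = list(range(len(X[0])))
    selected = []
    target = tolerance_entropy(X, Y, remaining)
    current = tolerance_entropy(X, Y, selected)
    while remaining and current > target + 1e-12:
        best = min(remaining,
                   key=lambda f: tolerance_entropy(X, Y, selected + [f]))
        selected.append(best)
        remaining.remove(best)
        current = tolerance_entropy(X, Y, selected)
    return selected


# Hypothetical toy data: two discrete features, two labels, None = missing.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [[1, None], [1, 0], [0, None], [0, 1]]

print(tolerance_entropy(X, Y, [0]))  # feature 0 separates the label groups
print(tolerance_entropy(X, Y, [1]))  # feature 1 does not
print(greedy_select(X, Y))
```

In this toy data the first feature alone makes every similarity class label-consistent, so the greedy loop stops after selecting it. Real feature values are rarely discrete, which is one reason a fuzzy or threshold-based similarity relation would replace the equality test used here.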
Keywords/Search Tags: uncertainty measurement, weakly multi-label data, conditional entropy, rough set, feature selection