Research On Feature Selection Algorithms For Partially Labeled Hybrid Data

Posted on:2023-12-23

Degree:Master

Type:Thesis

Country:China

Candidate:Z C Yan

Full Text:PDF

GTID:2568306839468044

Subject:Computer Science and Technology

Abstract/Summary:


With the development of the Internet of Things, artificial intelligence, and other information technologies, the scale and dimensionality of data are growing geometrically. High-dimensional data reduce the running efficiency of many machine learning algorithms and degrade the classification performance of classifiers. Feature selection is an effective data preprocessing method that reduces data dimensionality and improves both data compactness and the efficiency of learning algorithms. As an important theory of granular computing, rough set theory has become an active research topic in feature selection, knowledge discovery, data mining, and related fields. Many real-world data sets are hybrid data composed of symbolic features, numerical features, and features with missing values. Moreover, because obtaining labels often requires expensive resources or lengthy experimental procedures, usually only a small portion of the objects is labeled, which gives rise to partially labeled hybrid data. For partially labeled and dynamic data sets, however, existing feature selection algorithms based on rough set theory often require a large amount of repeated computation, and the large number of unlabeled objects may even reduce classification accuracy. To fill these gaps, this dissertation studies decision label annotation and feature selection for partially labeled hybrid data based on rough set theory. The main research work of the thesis is as follows.

(1) When a set of objects is added to or deleted from partially labeled hybrid data, incremental updating mechanisms for information granularity are proposed by analyzing how the local neighborhood granules of the data change. On this basis, incremental feature selection algorithms are proposed for the addition or deletion of a single object and of a group of objects. A series of comparative experiments on real data sets verifies the effectiveness and efficiency of the proposed algorithms and shows that the group incremental feature selection algorithm is the more efficient of the two.

(2) When a set of features is added to or deleted from partially labeled hybrid data, incremental updating mechanisms for information granularity are established by analyzing the changes in the neighborhood granules of the data set in combination with incremental learning. On this basis, incremental feature selection algorithms are designed for the addition and deletion of feature sets. Theoretical analysis and comparative experiments on real data sets verify the effectiveness and efficiency of the proposed algorithms.

(3) Because decision labels carry rich information, a large number of unlabeled objects may reduce the classification accuracy of the feature selection results. This dissertation therefore proposes an extended decision label annotation algorithm based on the enlarged neighborhood granule, which uses the labeled objects to annotate a portion of the unlabeled objects and thereby improves the classification accuracy of the selected feature subset. In addition, an extended information gain measure is designed for partially labeled hybrid data, and on this basis an information-gain-based semi-supervised feature selection algorithm is proposed. Experimental results on partially labeled hybrid data sets demonstrate the effectiveness of the proposed algorithms.

(4) To overcome the limitation of relying on a single measure in existing feature selection algorithms for ordered data sets, a multi-measure based on the dominance-based rough set is proposed. It considers not only certain information but also discernibility information. A greedy forward feature selection algorithm is then designed on this basis. Theoretical analysis and related experimental results verify the effectiveness and efficiency of the algorithm.
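The abstract describes its methods only at a high level. As a rough illustration of the building blocks it relies on (neighborhood granules over hybrid data, a rough-set style measure computed on labeled objects only, annotation of unlabeled objects from labeled neighbors, and greedy forward selection), the following Python sketch may help. It is not the thesis's algorithm: the neighborhood radius `delta`, treating a missing value as compatible with any value, using the share of label-pure neighborhood granules (a neighborhood dependency) as the evaluation measure, and the one-pass annotation rule are all assumptions made for illustration. The incremental updates, the enlarged neighborhood granule, the extended information gain, and the dominance-based multi-measure are not reproduced, and every function name is hypothetical.

```python
from typing import Hashable, List, Optional, Sequence, Set

Row = Sequence[object]      # one object; None marks a missing feature value
Label = Optional[Hashable]  # None marks an unlabeled object


def in_neighborhood(x: Row, y: Row, features: Sequence[int],
                    numeric: Set[int], delta: float) -> bool:
    """True if y falls in x's neighborhood granule on `features`.

    Assumed hybrid relation: numeric features match when |x - y| <= delta,
    symbolic features match on equality, and a missing value matches anything.
    """
    for f in features:
        a, b = x[f], y[f]
        if a is None or b is None:
            continue  # assumption: a missing value never separates two objects
        if f in numeric:
            if abs(a - b) > delta:
                return False
        elif a != b:
            return False
    return True


def dependency(data, labels, features, numeric, delta) -> float:
    """Share of labeled objects whose granule (among labeled objects) is label-pure.

    Unlabeled objects are ignored entirely in this hypothetical measure.
    """
    labeled = [i for i, lab in enumerate(labels) if lab is not None]
    if not labeled:
        return 0.0
    pure = 0
    for i in labeled:
        granule = [j for j in labeled
                   if in_neighborhood(data[i], data[j], features, numeric, delta)]
        if all(labels[j] == labels[i] for j in granule):
            pure += 1
    return pure / len(labeled)


def annotate_unlabeled(data, labels, features, numeric, delta) -> List[Label]:
    """Label an unlabeled object only if all of its labeled neighbors agree."""
    labeled = [i for i, lab in enumerate(labels) if lab is not None]
    result = list(labels)
    for i, lab in enumerate(labels):
        if lab is not None:
            continue
        votes = {labels[j] for j in labeled
                 if in_neighborhood(data[i], data[j], features, numeric, delta)}
        if len(votes) == 1:
            result[i] = votes.pop()
    return result


def greedy_forward_selection(data, labels, numeric, delta=0.1, eps=1e-9) -> List[int]:
    """Repeatedly add the feature with the largest dependency until no gain remains."""
    all_features = list(range(len(data[0])))
    target = dependency(data, labels, all_features, numeric, delta)
    selected: List[int] = []
    remaining = list(all_features)
    best = dependency(data, labels, selected, numeric, delta)
    while remaining and best < target - eps:
        scores = {f: dependency(data, labels, selected + [f], numeric, delta)
                  for f in remaining}
        f = max(remaining, key=scores.__getitem__)  # ties go to the lower index
        if scores[f] <= best + eps:
            break  # no candidate improves the measure
        selected.append(f)
        remaining.remove(f)
        best = scores[f]
    return selected


# Toy partially labeled hybrid data: features 0 and 2 numeric, feature 1 symbolic.
data = [
    [0.10, "a", 0.5],
    [0.12, "a", 0.1],
    [0.80, "b", 0.5],
    [0.85, "b", 0.9],
    [0.11, None, 0.3],
    [0.82, "b", 0.2],
]
labels = ["yes", "yes", "no", "no", None, None]
numeric = {0, 2}

print(greedy_forward_selection(data, labels, numeric))         # [0]
print(annotate_unlabeled(data, labels, [0, 1], numeric, 0.1))  # ['yes', 'yes', 'no', 'no', 'yes', 'no']
```

On this toy data, feature 0 alone already makes every labeled granule label-pure, so the greedy loop stops after one step, while feature 2 on its own reaches a dependency of only 0.5. Note that every call to `dependency` rescans all pairs of labeled objects; this is the kind of repeated computation that the incremental updating mechanisms in (1) and (2) aim to avoid when objects or features change.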

Keywords/Search Tags:

Partially labeled hybrid data, Feature selection, Granular computing, Rough set, Incremental learning, Multi-measure


Related items

1	Research On Online Streaming Feature Selection Algorithm Based On Granular Computing Theory
2	Research On Semi-supervised Feature Selection Model And Algorithm For Mixed-type Data
3	The Optimal Scale Selection For Multi-granular Labeled Decision Systems
4	The Optimal Scale Selection For Generalized Multi-granular Labeled Decision Systems
5	Research On Feature Selection Method Based On Dominant Neighborhood Rough Se
6	Research On Incremental Mechanisms Of Feature Selection And Robust Fuzzy Rough Computing Models For Ordered Data
7	The Research On Incomplete Label Distribution Feature Selection Based On Granular Computing
8	Research Of Rough Set And Granular Computing Theory Application In Air Tickets Recommender System
9	On Multi-Granular Labeled Classification For Spatial Remote Sensing Data
10	Research On Hybrid Attribute Data Knowledge Acquisition Method Based On Granular Computing