
Research On Feature Selection Algorithms For Partially Labeled Hybrid Data

Posted on: 2023-12-23
Degree: Master
Type: Thesis
Country: China
Candidate: Z C Yan
GTID: 2568306839468044
Subject: Computer Science and Technology
Abstract/Summary:
With the development of the Internet of Things, artificial intelligence, and other information technologies, the scale and dimensionality of data are growing geometrically. High-dimensional data reduce the running efficiency of many machine learning algorithms and degrade the performance of classifiers. Feature selection is an effective data preprocessing method that reduces data dimensionality and improves both the compactness of the data and the efficiency of learning algorithms. As an important theory of granular computing, rough set theory has become an active research topic in feature selection, knowledge discovery, data mining, and related fields.

Many real-world data sets are hybrid data composed of symbolic features, numerical features, and features with missing values. Moreover, because obtaining labels often requires expensive resources or long experimental procedures, usually only a small portion of the objects can be labeled, which yields partially labeled hybrid data. For partially labeled and dynamic data sets, however, existing feature selection algorithms based on rough set theory often require a large amount of repeated computation, and the large number of unlabeled objects can even reduce classification accuracy. To fill these gaps, this dissertation studies decision label annotation and feature selection for partially labeled hybrid data based on rough set theory. The main research work is as follows.

(1) When a set of objects is added to or deleted from partially labeled hybrid data, incremental updating mechanisms for information granularity are proposed by analyzing the changes in the local neighborhood granules of the data. On this basis, incremental feature selection algorithms are proposed for the variation of a single object and of a group of objects. Comparative experiments on real data sets verify the effectiveness and efficiency of the proposed algorithms and show that the group incremental feature selection algorithm is the more efficient of the two.

(2) When a set of features is added to or deleted from partially labeled hybrid data, incremental updating mechanisms for information granularity are established by analyzing the changes in the neighborhood granules of the data set in combination with incremental learning. On this basis, incremental feature selection algorithms are designed for the addition and the deletion of a feature set. Theoretical analysis and comparative experiments on real data sets verify the effectiveness and efficiency of the proposed algorithms.

(3) Because decision labels carry rich information, a large number of unlabeled objects may reduce the classification accuracy of the feature selection results. This dissertation therefore proposes an extended decision label annotation algorithm based on enlarged neighborhood granules: to improve the classification accuracy of the selected feature subset, a portion of the unlabeled objects are annotated with decision labels by making use of the labeled objects. In addition, an extended information gain measure is designed for partially labeled hybrid data, and on this basis an information-gain-based semi-supervised feature selection algorithm is proposed. Experimental results on partially labeled hybrid data sets demonstrate the effectiveness of the proposed algorithms.

(4) To overcome the limitation of relying on a single measure in existing feature selection algorithms for ordered data sets, a multi-measure based on the dominance-based rough set is proposed, which considers not only certain information but also discernibility information. A greedy forward feature selection algorithm is designed on top of this multi-measure. Theoretical analysis and related experimental results verify the effectiveness and efficiency of the algorithm.
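The abstract does not state the dissertation's formal definitions, so the Python block below is only a minimal sketch, under loudly stated assumptions, of the kind of machinery items (1) and (3) describe; it is not the author's method. Assumed (not from the source): two objects are neighbors under a feature subset if every symbolic value is equal and every numerical value differs by at most a radius `delta`, while a missing value (`None`) is treated as compatible with anything; the neighborhood-dependency score over labeled objects stands in for the thesis's information granularity and extended information gain; the annotation rule (assign a label when all labeled objects in an enlarged neighborhood agree) is a hypothetical stand-in for the enlarged-neighborhood annotation algorithm. All function names are hypothetical.

```python
from typing import List, Optional, Sequence, Set

Value = Optional[object]  # None marks a missing value (assumption)


def related(x: Sequence[Value], y: Sequence[Value], feats: Sequence[int],
            kinds: Sequence[str], delta: float) -> bool:
    """Hypothetical hybrid neighborhood relation over the features in `feats`."""
    for a in feats:
        u, v = x[a], y[a]
        if u is None or v is None:
            continue  # assumption: missing values are compatible with anything
        if kinds[a] == "sym":
            if u != v:  # symbolic features must match exactly
                return False
        elif abs(u - v) > delta:  # numerical features must lie within delta
            return False
    return True


def granules(data, feats, kinds, delta) -> List[Set[int]]:
    """Neighborhood granule of every object, computed from scratch: O(n^2) checks."""
    n = len(data)
    return [{j for j in range(n) if related(data[i], data[j], feats, kinds, delta)}
            for i in range(n)]


def add_object(data, grans, new, feats, kinds, delta) -> None:
    """Incrementally update granules after appending one object: O(n) checks."""
    k = len(data)
    data.append(new)
    g_new = {k}  # every object is its own neighbor
    for j in range(k):
        if related(new, data[j], feats, kinds, delta):
            g_new.add(j)
            grans[j].add(k)  # the relation is symmetric, so j's granule gains k
    grans.append(g_new)


def dependency(grans, labels) -> float:
    """Stand-in measure: share of labeled objects whose labeled neighbors agree."""
    labeled = [i for i, y in enumerate(labels) if y is not None]
    if not labeled:
        return 0.0
    pos = sum(1 for i in labeled
              if all(labels[j] in (None, labels[i]) for j in grans[i]))
    return pos / len(labeled)


def greedy_forward(data, labels, kinds, delta) -> List[int]:
    """Greedy forward selection: add the best feature until the score stops rising."""
    selected: List[int] = []
    best = 0.0
    remaining = list(range(len(kinds)))
    while remaining:
        scores = {a: dependency(granules(data, selected + [a], kinds, delta), labels)
                  for a in remaining}
        a_best = max(scores, key=scores.get)  # ties go to the lowest index
        if scores[a_best] <= best:
            break  # no candidate improves the measure
        selected.append(a_best)
        remaining.remove(a_best)
        best = scores[a_best]
    return selected


def annotate(data, labels, feats, kinds, delta_big):
    """Hypothetical annotation: label an unlabeled object only when every labeled
    object in its enlarged neighborhood (radius delta_big) has the same label."""
    out = list(labels)
    for i, y in enumerate(labels):
        if y is not None:
            continue
        seen = {labels[j] for j in range(len(data))
                if labels[j] is not None
                and related(data[i], data[j], feats, kinds, delta_big)}
        if len(seen) == 1:
            out[i] = seen.pop()
    return out


if __name__ == "__main__":
    # Toy partially labeled hybrid data: one symbolic and two numerical features.
    data = [("a", 0.10, 0.9), ("a", 0.15, 0.1), ("b", 0.80, 0.5),
            ("b", 0.85, None), ("a", None, 0.4)]
    kinds = ["sym", "num", "num"]
    labels = [0, 0, 1, 1, None]  # the last object is unlabeled
    print(greedy_forward(data, labels, kinds, 0.2))       # [0]
    print(annotate(data, labels, [0], kinds, 0.3))        # [0, 0, 1, 1, 0]
```

The `add_object` function illustrates why incremental updating pays off when objects arrive: only the new object's granule is built from scratch and each existing granule changes by at most one member, so a single insertion costs O(n) relation checks instead of the O(n^2) needed to rebuild every granule.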
Keywords: Partially labeled hybrid data, Feature selection, Granular computing, Rough set, Incremental learning, Multi-measure