Research On Feature Selection Method Based On Neighborhood Rough Set

Posted on: 2019-01-27
Degree: Master
Type: Thesis
Country: China
Candidate: X L Zhang
GTID: 2428330548969533
Subject: Computer Science and Technology

Abstract/Summary:
Gene chip technology is an effective tool for studying gene expression profile data. By analyzing thousands of genes in these profiles, it has been widely used in medicine and other fields. The rapid growth of gene expression profile data has made the data large in scale and complex in content. This increases the dimensionality of the feature space and reduces the efficiency of learning algorithms, and it also introduces a large amount of redundant data that interferes with experimental results. The development of gene chip technology has had a great influence on cancer research. Microarray gene expression profile data have been widely used to identify cancer biomarkers or key genes, which has effectively advanced traditional histopathology and improved the accuracy of cancer diagnosis and classification. It has also strengthened the understanding of cancer etiology in support of discovering new therapies.

Gene expression profile data are high-dimensional, and traditional classification methods achieve poor accuracy on them. Therefore, feature construction and gene selection are applied to gene expression profile data to overcome the high-dimensionality problem. Feature selection, as applied to microarray gene expression profile data, is the process of selecting the smallest subset of informative genes. This smallest subset is the most predictive for the classification model, which allows the classifier to classify samples accurately. The goal of a feature selection algorithm is to minimize the feature space of the microarray data so that the most important attributes are selected and classification accuracy improves. With the rapid development of rough set theory and its applications, many rough set models have become effective tools for processing uncertain data and performing feature selection, rule extraction and knowledge discovery.

This thesis studies gene expression profile data from the perspective of optimizing feature selection. Based on the related concepts of neighborhood rough sets, it improves the classification accuracy of feature selection algorithms and reduces their running time, and it can effectively process some gene expression profile data. The main research contents are as follows:

(1) The information in the boundary region of a rough set is uncertain, yet it is very important and plays an important role in attribute reduction. To address the fuzzy boundaries of neighborhood rough sets, a feature selection method based on the dependence degree and a distance function is presented. First, based on the neighborhood rough set model, the concepts of neighborhood dependence and attribute necessity are introduced. Then, a distance function based on the mean of the upper and lower approximation sets is given for neighborhood decision systems. On this basis, a feature selection algorithm in neighborhood rough sets based on the dependence degree and the distance function is constructed and applied to feature selection on cancer gene data. Finally, different classifiers are used to test the algorithm. Experimental results show that the proposed method is effective and feasible. Compared with existing feature selection methods, it has better classification performance and can effectively handle uncertain information in the boundary region of the neighborhood rough set.

(2) Traditional clustering algorithms consider only the distance relationships among data and ignore the global distribution structure of the data. To address this, a feature selection method based on EK-medoids clustering and neighborhood distance is proposed. First, the effective distances between data samples are calculated using the sparse reconstruction method, and an effective-distance-based similarity matrix is constructed. Then, this similarity matrix is introduced into the K-medoids clustering algorithm to obtain new cluster centers. The resulting EK-medoids clustering algorithm can effectively cluster the original data sets. Finally, a neighborhood distance is investigated in the neighborhood rough set. According to the clustering results, an attribute importance measure based on the neighborhood distance is defined, and a feature selection algorithm based on EK-medoids clustering and neighborhood distance is designed using a heuristic search, which can further reduce the time complexity of the clustering algorithm. The experimental results show that the proposed algorithm not only improves the accuracy of the clustering results but also selects feature subsets with high classification accuracy.

(3) To address the slow convergence and long running time of the algorithm, a feature selection algorithm that uses ant colony clustering to optimize the neighborhood is proposed. First, the traditional ant colony algorithm is used to calculate the total deviation error from each sample to its corresponding cluster center, and a set of random numbers is generated to check the error and find the best path with the smallest deviation error. Pre-clustering is performed on the original data set, and feature selection is then performed according to the neighborhood distance of each cluster. Finally, clustering is performed again on the selected feature subset to verify its classification accuracy. The experimental results show that the proposed algorithm can select feature subsets with higher classification accuracy.
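
The neighborhood dependence used in contribution (1) can be illustrated with a minimal sketch. It is not the thesis's algorithm: it implements only the standard positive-region dependence of a neighborhood rough set with a greedy forward search, and it omits the distance function on the upper and lower approximations. The function names, the Euclidean metric, the radius `delta` and the stopping tolerance `tol` are all assumptions made for illustration.

```python
import numpy as np


def neighborhood_dependency(X, y, features, delta):
    """Fraction of samples whose delta-neighborhood is label-pure.

    Assumption: neighborhoods use Euclidean distance restricted to
    the columns listed in `features`.
    """
    sub = X[:, features]
    # Pairwise Euclidean distances between all samples (n x n).
    diff = sub[:, None, :] - sub[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    neighbors = dist <= delta              # each sample is its own neighbor
    same_label = y[:, None] == y[None, :]
    # A sample lies in the positive region (lower approximation of its
    # class) if every one of its neighbors shares its label.
    in_positive = np.all(~neighbors | same_label, axis=1)
    return float(in_positive.mean())


def greedy_select(X, y, delta=0.2, tol=1e-9):
    """Forward greedy search: add the feature that raises dependence most."""
    selected = []
    remaining = list(range(X.shape[1]))
    best = 0.0
    while remaining:
        scored = [
            (neighborhood_dependency(X, y, selected + [f], delta), f)
            for f in remaining
        ]
        # Highest dependence wins; ties go to the lowest feature index.
        score, f = max(scored, key=lambda t: (t[0], -t[1]))
        if score - best <= tol:            # no feature improves dependence
            break
        selected.append(f)
        remaining.remove(f)
        best = score
    return selected, best


if __name__ == "__main__":
    # Hypothetical toy data: column 0 separates the classes, column 1 does not.
    X = np.array([[0.0, 0.5], [0.1, 0.1], [0.9, 0.5], [1.0, 0.1]])
    y = np.array([0, 0, 1, 1])
    print(greedy_select(X, y, delta=0.2))
```

On this toy data the sketch keeps only column 0, because adding column 1 cannot raise the dependence any further. Samples that fall outside the positive region are the boundary-region samples, which is where the thesis's distance function is meant to contribute additional information.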
Keywords: Feature selection, Neighborhood distance, Neighborhood rough set, Cluster