Research on Stratified Feature Selection Algorithms for High-Dimensional Data

Posted on: 2020-12-14
Degree: Master
Type: Thesis
Country: China
Candidate: R J Chen
GTID: 2428330620958457
Subject: Software engineering
Abstract/Summary:
With the advent of the big data era, the volume and dimensionality of the data available to us are growing geometrically, which poses a great challenge for data analysis. At the same time, samples inevitably contain many irrelevant and redundant features, which brings about the "curse of dimensionality". This not only makes a learning model more prone to over-fitting, but also increases its time and space complexity. Feature selection, as an effective means of dimensionality reduction, plays an important role in data processing. This thesis focuses on the problem of feature redundancy in high-dimensional data. It studies how to identify feature groups efficiently and how to use the structure of those groups to select features. A stratified feature selection method is proposed. The method introduces class label information into the weighted co-clustering algorithm, yielding a subspace clustering algorithm. Based on the clustering results, a stratified feature weighting algorithm is proposed to rank the features. Building on the stratified feature selection method, the thesis then proposes a feature-weight-based method that learns the importance of features in order to simplify the model. Finally, observing that the top-ranked features within the same feature group may still be highly correlated, the thesis proposes a diversity constraint method to reduce the correlation between selected features. Extensive experiments show that the above three stratified feature selection methods can effectively select informative and diverse features...
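The general idea of grouping features first and then selecting across groups can be illustrated with a minimal sketch. This is not the thesis's algorithm: plain k-means on standardized feature columns stands in for the label-aware subspace co-clustering, absolute Pearson correlation with the label stands in for the stratified feature weighting, and the function names (group_features, relevance, stratified_select) are hypothetical. Taking one feature per group in turn is only a crude stand-in for the diversity constraint.

```python
import numpy as np


def group_features(X, n_groups, n_iter=20, seed=0):
    """Cluster feature columns with plain k-means.

    Assumption: a stand-in for the thesis's subspace co-clustering,
    which additionally uses class label information.
    """
    rng = np.random.default_rng(seed)
    Z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    F = Z.T  # one row per feature
    centers = F[rng.choice(F.shape[0], size=n_groups, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every feature to every center.
        dists = ((F[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for g in range(n_groups):
            if np.any(labels == g):
                centers[g] = F[labels == g].mean(axis=0)
    return labels


def relevance(X, y):
    """Absolute Pearson correlation of each feature with the label.

    Assumption: a simple proxy for the learned feature weights.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    return np.abs(Xc.T @ yc) / denom


def stratified_select(X, y, n_groups, k):
    """Pick up to k features by cycling over feature groups.

    Each pass takes the next-best feature from every group, so the
    selection is spread across groups instead of concentrating on
    one cluster of mutually redundant features.
    """
    labels = group_features(X, n_groups)
    scores = relevance(X, y)
    # Rank the features inside each group by decreasing relevance.
    queues = [
        sorted(np.flatnonzero(labels == g), key=lambda j: -scores[j])
        for g in range(n_groups)
    ]
    queues = [q for q in queues if q]  # drop empty groups
    # Visit groups in order of their best feature's score.
    queues.sort(key=lambda q: -scores[q[0]])
    selected, rank = [], 0
    while len(selected) < k and any(rank < len(q) for q in queues):
        for q in queues:
            if rank < len(q) and len(selected) < k:
                selected.append(int(q[rank]))
        rank += 1
    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 12))
    y = (X[:, 0] + X[:, 5] > 0).astype(float)
    print(stratified_select(X, y, n_groups=3, k=4))
```

The round-robin pass over groups is the design choice that matters here: a flat ranking by relevance alone could return several near-duplicate features from a single group, whereas the stratified pass draws from each group before returning to any of them.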
Keywords/Search Tags:Data Mining, Feature Selection, Supervised Learning, Co-clustering