
Feature Selection Based On Attention Mechanism: An Efficient Architecture For Massive Complex Data

Posted on: 2020-04-13
Degree: Master
Type: Thesis
Country: China
Candidate: D N Ge
Full Text: PDF
GTID: 2428330572961804
Subject: Engineering
Abstract/Summary:
Feature selection, an important research direction in the field of data mining, selects the most useful subset of features from the original feature set. It serves many functions: it can effectively mitigate the "curse of dimensionality," reduce model complexity, and make data more understandable and easier to analyze. However, with the rapid development of information technology, data acquisition has become ever easier, which has dramatically changed the forms of data that feature selection must handle: sample sizes keep growing, feature dimensionality rises rapidly, and the internal structure of data becomes increasingly complex. Existing feature selection methods, in both their computational models and their technical design, struggle to cope with these changes and suffer from many defects. How to perform efficient feature selection on massive, complex data has therefore become an important research topic.

Feature selection based on deep learning is a frontier of feature selection research. It is considered capable of processing massive data and of handling feature selection effectively under big data. However, current work in this field still struggles with feature selection under complex data: when the data change in complicated ways (for example, when noise interference is added, labeled samples decrease sharply, or sample features have a sequential structure), performance is often unstable or the methods fail entirely. This thesis considers the feature selection mechanism from a new perspective and, building on deep learning techniques, proposes a new feature selection architecture based on an attention model. The specific research work is as follows:

1) For large-sample data with noise interference, it is difficult for feature selection to balance computational complexity and performance. A new deep-learning-based feature selection method is developed that recasts the evaluation of feature importance as a problem of allocating attention across all feature dimensions, called attention-based feature selection (AFS). AFS consists of two loosely coupled modules: an attention module, which generates feature weights, and a learning module, which models the underlying problem. The core of the attention module is to use a binary classification model in each feature dimension to measure whether that feature should be selected, taking this as the assigned attention. Experiments are carried out on the MNIST dataset and on noisy versions of MNIST. The results show that AFS retains high accuracy and excellent redundancy removal in the presence of noise interference, with accuracy improvements of up to 9%. Its computational complexity also scales well to large datasets and can be further reduced through a model-reuse mechanism.

2) For the problem that feature selection easily overfits on small-sample data, an AFS method based on a hybrid strategy, called AFS-hybrid, is proposed. The method builds on AFS and improves it by incorporating existing feature selection methods. In a manner similar to adding training samples, simulated sample data are constructed from the weights generated by an existing feature selection method; the attention module is pre-trained on these data so that it first converges to a local optimum, and training then continues from that point on the real small-sample data, making it easier to converge to a good local optimum and thereby alleviating overfitting. Because it is built on the AFS framework, AFS-hybrid retains the original advantages, such as high robustness against noise interference and excellent redundancy removal. Experiments on the public small datasets Isolet-5 and Lung discrete show that AFS-hybrid significantly outperforms existing feature selection methods.

3) For feature selection on high-frequency time-series data, it is difficult to locate the time delay. An AFS method based on a multi-layer attention model, called AFS-multilayer, is proposed. To accurately measure the importance of time-series sample data along both the feature-parameter dimension and the feature-time dimension, an additional attention model corresponding to the time dimension is added to the AFS architecture. The parameter and time dimensions of each feature are thus considered jointly, yielding two attention values that are applied simultaneously to the corresponding feature dimensions, with back-propagation through the learning module adjusting the attention values. Experiments show that this layered design can accurately capture the time lag of the parameters of interest, and that AFS-multilayer achieves better localization results than other feature selection methods on a simple MISO industrial dataset.
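The core AFS mechanism described above — a per-feature two-class model whose "select" probability becomes the attention weight that rescales the input before it reaches the learning module — can be sketched as follows. This is a minimal illustrative forward pass only, not the thesis's implementation: the shared encoder, layer sizes, and all variable names (`AFSAttention`, `hidden`, etc.) are assumptions, and training of the heads by back-propagation from the learning module is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    # numerically stable softmax
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

class AFSAttention:
    """Sketch of an AFS attention module: for each feature dimension,
    a tiny two-class head scores "select" vs. "discard"; the softmax
    probability of "select" is the attention assigned to that feature."""

    def __init__(self, n_features, hidden=8):
        # shared encoder (n_features -> hidden), weights are illustrative
        self.W1 = rng.normal(0.0, 0.1, (n_features, hidden))
        self.b1 = np.zeros(hidden)
        # one (hidden -> 2) binary-classification head per feature
        self.W2 = rng.normal(0.0, 0.1, (n_features, hidden, 2))
        self.b2 = np.zeros((n_features, 2))

    def weights(self, X):
        h = np.tanh(X @ self.W1 + self.b1)                 # (batch, hidden)
        logits = np.einsum('bh,fhc->bfc', h, self.W2) + self.b2
        p = softmax(logits, axis=-1)                       # (batch, n_feat, 2)
        return p[..., 1]                                   # P(select) per feature

X = rng.normal(size=(4, 10))      # 4 samples, 10 feature dimensions
att = AFSAttention(n_features=10)
A = att.weights(X)                # attention weights in [0, 1]
X_weighted = X * A                # reweighted input fed to the learning module
```

In a full system, the learning module's loss would be back-propagated through `X_weighted` into the attention heads, and features with consistently low attention would be dropped — the "allocation across all feature dimensions" the abstract describes.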
Keywords/Search Tags: Feature Selection, Attention Mechanism, Deep Learning, Big Data