Research On Feature Selection Algorithm Based On Lasso And Mutual Information

Posted on: 2021-01-21  Degree: Master  Type: Thesis
Country: China  Candidate: J B Meng  Full Text: PDF
GTID: 2428330626460972  Subject: Statistical information technology
Abstract/Summary:
With the rapid development of computer and network technology, the wave of big data and artificial intelligence has followed, bringing larger volumes of more complex data. As this data accumulates, how to process it becomes an urgent problem. The multi-label learning framework was proposed because traditional single-label methods cannot meet practical needs, and with continued research, more and more methods based on multi-label learning have been put forward. Feature selection is effective against the high dimensionality of data. It operates on the original feature space, removes redundancy, and obtains a feature subset with superior performance. Selecting such a subset improves the classification performance of the classifier, reduces running time, and improves the computational efficiency of the algorithm. However, traditional feature selection algorithms can only handle static feature data, because they need the whole feature space before they can run and select the corresponding feature subset. In real situations, the feature space is often generated dynamically and changes in real time, so traditional algorithms cannot handle this kind of streaming feature selection. To address these problems, this thesis proposes two feature selection algorithms. The main contents are as follows.

1. To address the high computational cost of removing redundant features and selecting a feature subset in traditional feature selection algorithms, this thesis introduces the Lasso feature selection algorithm to quickly process high-dimensional data and select a feature subset. In addition, because traditional information entropy is not complementary and is complicated to calculate, fuzzy information entropy is introduced in its place to improve classification performance. Based on these two points, this thesis proposes a feature selection algorithm based on the Lasso algorithm and fuzzy mutual information, which experiments show to be effective.

2. Dynamic feature selection mainly focuses on how to reduce the data dimension. The usual selection criterion is "maximum correlation, minimum redundancy"; in practice, however, the feature space is usually high-dimensional and sparse, and the redundancy between features is relatively small. This thesis therefore takes the limiting case: it concentrates on selecting features that are highly correlated with the label space, ignores the redundancy between each feature and the selected feature subset, calculates the mutual information between each feature and the label space as the feature arrives, and selects the features whose mutual information exceeds a threshold to obtain the final feature subset. Based on this idea, a fast streaming feature selection algorithm based on mutual information is proposed. Experiments show that the algorithm not only saves running time but also improves the efficiency of classification.
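
The thresholding idea in the second contribution can be illustrated with a minimal sketch. This is not the thesis's algorithm: it assumes discrete feature values, a single label vector instead of a multi-label space, and a hand-picked threshold. The names `mutual_information`, `stream_select`, and the toy features are hypothetical and introduced only for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log(p(x,y) / (p(x) * p(y))), with counts turned into probabilities
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def stream_select(feature_stream, labels, threshold):
    """Keep each arriving feature whose MI with the labels exceeds the threshold.

    Redundancy between features is deliberately ignored, following the
    limiting-case idea described in the abstract.
    """
    selected = []
    for name, values in feature_stream:
        if mutual_information(values, labels) > threshold:
            selected.append(name)
    return selected


# Toy data (assumed): one binary label and three features arriving one at a time.
labels = [0, 0, 1, 1, 0, 1, 0, 1]
stream = [
    ("f_copy", [0, 0, 1, 1, 0, 1, 0, 1]),   # identical to the label
    ("f_const", [1, 1, 1, 1, 1, 1, 1, 1]),  # constant, carries no information
    ("f_indep", [0, 1, 0, 1, 1, 0, 0, 1]),  # balanced within each label class
]
selected = stream_select(iter(stream), labels, threshold=0.1)
print(selected)
```

Because each feature is scored once on arrival and never compared with previously selected features, the cost per feature does not grow with the size of the selected subset. That is the source of the speed-up the abstract describes, at the price of possibly keeping redundant features.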
Keywords/Search Tags: multi-label learning, feature selection, lasso algorithm, fuzzy mutual information, streaming feature selection