Research And Application Of The SPEC Feature Selection Algorithm Based On Correlation

Posted on: 2019-03-10
Degree: Master
Type: Thesis
Country: China
Candidate: Y Z Shi
Full Text: PDF
GTID: 2348330545455727
Subject: Electronics and Communications Engineering
Abstract/Summary:
With the development of machine learning and deep learning, learning models have matured. In recent years, data processing and feature extraction have become a focus of attention. Feature selection algorithms measure the quality of features from a statistical point of view, which helps in understanding the data itself and in extracting better features. This thesis optimizes feature selection algorithms so that they can be applied in more scenarios. The main work and contributions are as follows:

First, to address the SPEC feature selection algorithm's limited ability to compute correlation when the data contain numerical features, an improved SPEC algorithm based on the maximum information coefficient (SPECMIC) is proposed. SPECMIC uses the MIC to compute correlation, which strengthens its ability to measure the correlation of numerical features and makes it better suited to scenarios that include numerical features. Experiments on public data sets show that SPECMIC performs better on data sets with numerical features.

Second, an APP Usage Prediction model is built to analyze users' application usage. Analyzing the features with SPECMIC reduces the dimension of the optimal feature subset by 900 by removing redundant, irrelevant and noisy features, and the accuracy of the APP Usage Prediction model increases by 3%.

Third, to address the disordered ranking of the feature subset in the improved SPEC feature selection algorithm, a further improved algorithm (RSPEC) is proposed. RSPEC raises the importance of features with high correlation and high redundancy and lowers the importance of features with low correlation and low redundancy, which makes the feature ranking more reasonable. Experiments on public data sets show that RSPEC produces a better ordering of the feature subset.

Fourth, a black word detection model is constructed to extract the words needed by the malicious URL detection model. An analysis of the deficiencies of the existing TF-IDF keyword extraction algorithm in the malicious website detection model shows that extracting keywords with a feature selection algorithm works better. In addition, a comparison of the SPEC and RSPEC algorithms shows that RSPEC performs better.
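As a rough illustration of the kind of spectral scoring that SPEC-style feature selection performs, the sketch below ranks features by how smoothly they vary over a sample similarity graph. It is not the SPECMIC or RSPEC algorithm from the thesis: the abstract does not say how the MIC enters the computation, so this sketch uses a plain RBF similarity instead. The function name `spec_scores`, the parameter `sigma`, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def spec_scores(X, sigma=1.0):
    """Score each column of X against a sample similarity graph.

    Smaller scores mean the feature varies more smoothly over the graph.
    The RBF similarity used here is an assumption, not the thesis's MIC.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise squared Euclidean distances between samples (rows).
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    W = np.exp(-d2 / (2.0 * sigma ** 2))      # similarity matrix
    d_sqrt = np.sqrt(W.sum(axis=1))           # square roots of node degrees
    # Normalized graph Laplacian: I - D^{-1/2} W D^{-1/2}.
    L = np.eye(X.shape[0]) - W / np.outer(d_sqrt, d_sqrt)

    scores = []
    for j in range(X.shape[1]):
        f = d_sqrt * X[:, j]                  # degree-weighted feature vector
        norm = np.linalg.norm(f)
        if norm == 0.0:
            scores.append(np.inf)             # all-zero feature: no information
            continue
        f_hat = f / norm
        scores.append(float(f_hat @ L @ f_hat))
    return np.array(scores)

# Toy data (hypothetical): column 0 separates two groups of samples,
# column 1 alternates independently of the groups.
X_demo = np.array([[0, 0], [0, 1], [0, 0], [0, 1],
                   [10, 0], [10, 1], [10, 0], [10, 1]], dtype=float)
demo_scores = spec_scores(X_demo)
demo_ranking = np.argsort(demo_scores)        # best (smallest score) first
```

On this toy data the group-separating column receives the smaller score and is ranked first. Note that this simple score also rates constant features as perfectly smooth, so in practice features are usually centered or a normalized variant of the score is used.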
Keywords/Search Tags: Feature Selection, Maximum Information Coefficient, APP Usage Prediction, Malicious URL Detection Model