Research And Application Of Machine Learning Based Code Vulnerability Detection Mechanism

Posted on: 2019-04-04
Degree: Master
Type: Thesis
Country: China
Candidate: W Liu
GTID: 2348330569495768
Subject: Engineering
Abstract/Summary:
With the development of science and technology, software is used in more and more fields and people depend on it increasingly, so the accompanying software security issues have attracted growing attention. Software vulnerabilities are the primary security risk in software and a main factor affecting its stability and effectiveness. Predicting software vulnerabilities and defects has therefore become a top priority for solving security and stability problems, and it has attracted a large number of researchers. At the same time, advances in machine learning offer a new approach to vulnerability and defect prediction, and machine-learning-based code vulnerability prediction has become a research hotspot. This thesis studies code metrics and feature selection methods, the impact of class imbalance on code defect prediction, and transfer learning for software vulnerability prediction. The main research contents and results fall into the following three parts:

(1) Analysis of the impact of class imbalance on the performance of software vulnerability prediction models. A method for analyzing how class imbalance affects code defect prediction models is proposed. Using SMOTE oversampling and random undersampling, a family of datasets with different degrees of imbalance is constructed, and the prediction performance of different algorithms is compared across this family. The results show that random forest and k-nearest neighbor have relatively stable prediction performance on imbalanced datasets and can tolerate a certain degree of imbalance, with random forest being the most stable. The experiments also verify that handling imbalanced data with SMOTE and random undersampling improves model performance.

(2) Analysis of feature selection and metric methods for code vulnerabilities and defects. A feature selection and evaluation method for two different kinds of metric features is proposed, based on a feature selection process using kernel principal component analysis (KPCA). Different features are selected with different kernel functions, and their performance is then verified and evaluated through model training. Comparing the two metric suites and the effect of different kernel functions on each shows that the object-oriented Chidamber & Kemerer metrics outperform the process-oriented McCabe & Halstead metrics, and that the best results are achieved when the principal components are extracted with the linear kernel.

(3) Research on a feature-based transfer learning prediction model. A feature-based transfer learning method is proposed. Principal component analysis (PCA) is used to establish a correspondence between the features of different datasets, features are extracted at different dimensions, and the model's performance is compared across these dimensions. Experiments show that this PCA-based feature transfer learning method is feasible and can achieve better results.
Keywords/Search Tags: code vulnerability, software defect, class imbalance, feature transfer learning
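Part (1) builds a family of datasets with different imbalance levels using SMOTE and random undersampling. The following is a minimal NumPy sketch of both resampling steps. It assumes the vulnerable/defective class is labelled 1 and is the minority class. The function names `smote`, `random_undersample` and `rebalance` and the `ratio` parameter (target majority-to-minority ratio) are hypothetical and do not come from the thesis, and the random data only stands in for real metric vectors. Training and comparing classifiers such as random forest and k-nearest neighbor on each resampled dataset is not shown.

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples, each interpolated between a
    randomly chosen minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng() if rng is None else rng
    X_min = np.asarray(X_min, dtype=float)
    k = min(k, len(X_min) - 1)                     # at most n-1 other samples exist
    # Pairwise squared Euclidean distances among minority samples.
    d = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)                    # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                   # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[picked] - X_min[base])

def random_undersample(X_maj, n_keep, rng=None):
    """Keep n_keep majority samples chosen uniformly without replacement."""
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.choice(len(X_maj), size=n_keep, replace=False)
    return np.asarray(X_maj)[idx]

def rebalance(X, y, ratio, method="smote", k=5, rng=None):
    """Return (X, y) whose majority:minority ratio is about `ratio`
    (assumed <= the current ratio). Label 1 is the minority class."""
    rng = np.random.default_rng() if rng is None else rng
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    X_min, X_maj = X[y == 1], X[y == 0]
    if method == "smote":                          # grow the minority class
        n_new = max(0, int(round(len(X_maj) / ratio)) - len(X_min))
        X_min = np.vstack([X_min, smote(X_min, n_new, k, rng)])
    elif method == "undersample":                  # shrink the majority class
        n_keep = min(len(X_maj), int(round(ratio * len(X_min))))
        X_maj = random_undersample(X_maj, n_keep, rng)
    else:
        raise ValueError(f"unknown method: {method}")
    X_out = np.vstack([X_maj, X_min])
    y_out = np.concatenate([np.zeros(len(X_maj), dtype=int), np.ones(len(X_min), dtype=int)])
    return X_out, y_out

# Usage: placeholder data with a 9:1 imbalance (1 = vulnerable module).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = np.r_[np.zeros(180, dtype=int), np.ones(20, dtype=int)]
for ratio in (8, 4, 2, 1):                         # a family of imbalance levels
    Xr, yr = rebalance(X, y, ratio, method="undersample", rng=rng)
    print("undersample", ratio, np.bincount(yr))
Xb, yb = rebalance(X, y, 1.0, method="smote", rng=rng)
print("smote", np.bincount(yb))
```

For real experiments, the imbalanced-learn library provides maintained implementations of both steps (`SMOTE` and `RandomUnderSampler`).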
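Part (2) extracts principal components from each metric suite with kernel PCA under different kernel functions. The sketch below assumes NumPy and a module-by-metric matrix `X` (for instance CK metrics or McCabe/Halstead metrics). It centres the kernel matrix in feature space and projects the training samples onto the leading components, also returning each component's share of the total variance. The kernel names and the `gamma`, `degree` and `coef0` values are illustrative assumptions, not the thesis's settings.

```python
import numpy as np

def kernel_matrix(A, B, kernel="linear", gamma=1.0, degree=3, coef0=1.0):
    """Kernel values between the rows of A and the rows of B."""
    if kernel == "linear":
        return A @ B.T
    if kernel == "poly":
        return (gamma * (A @ B.T) + coef0) ** degree
    if kernel == "rbf":
        sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2 * A @ B.T
        return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off
    raise ValueError(f"unknown kernel: {kernel}")

def kernel_pca(X, n_components, kernel="linear", **params):
    """Project the training samples onto the top kernel principal components.
    Returns (projections, explained-variance ratio of those components)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    K = kernel_matrix(X, X, kernel, **params)
    ones = np.full((n, n), 1.0 / n)
    Kc = K - ones @ K - K @ ones + ones @ K @ ones   # centre in feature space
    vals, vecs = np.linalg.eigh(Kc)                  # eigenvalues in ascending order
    vals, vecs = vals[::-1], vecs[:, ::-1]           # largest first
    positive = np.clip(vals, 0.0, None)
    ratio = positive[:n_components] / positive.sum()
    lam = np.clip(vals[:n_components], 1e-12, None)
    # Projection of training sample i on component j is sqrt(lambda_j) * v_ij.
    return vecs[:, :n_components] * np.sqrt(lam), ratio

# Usage: compare kernels on a standardised placeholder metric matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
X = (X - X.mean(axis=0)) / X.std(axis=0)
for kernel, params in [("linear", {}), ("poly", {"gamma": 0.1, "degree": 2}), ("rbf", {"gamma": 0.1})]:
    Z, ratio = kernel_pca(X, 3, kernel, **params)
    print(kernel, Z.shape, ratio.round(3))
```

Each projected matrix `Z` would then be used to train and evaluate a prediction model, one per kernel. With the linear kernel, this reduces to ordinary PCA on the centred data. scikit-learn's `KernelPCA` implements the same idea and adds out-of-sample `transform`.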
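Part (3) uses PCA to establish a correspondence between the features of different datasets. One possible reading, assumed here and not confirmed by the abstract, works as follows. Fit a separate PCA on each project and treat the i-th source component as corresponding to the i-th target component. Then train on the source project, predict on the target project, and repeat for several dimensions k. The sign convention, the unit-variance rescaling and the nearest-centroid classifier are hypothetical choices made to keep the sketch short, and `make_project` only generates synthetic data.

```python
import numpy as np

def pca_fit(X, k):
    """Fit a k-component PCA on one project's metric matrix (rows = modules)."""
    X = np.asarray(X, dtype=float)
    mean, std = X.mean(axis=0), X.std(axis=0)
    std[std == 0] = 1.0                            # guard against constant metrics
    Z = (X - mean) / std
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    comps = vt[:k]
    # Assumed sign convention: flip each component so its loadings sum to >= 0,
    # so that "larger metric values" point the same way in every project.
    comps = comps * np.where(comps.sum(axis=1) < 0, -1.0, 1.0)[:, None]
    scale = s[:k] / np.sqrt(len(X))                # std of each component score
    return mean, std, comps, scale

def pca_transform(X, model):
    """Project X onto the fitted components, rescaled to unit variance."""
    mean, std, comps, scale = model
    return ((np.asarray(X, dtype=float) - mean) / std) @ comps.T / scale

def transfer_predict(X_src, y_src, X_tgt, k):
    """Train a nearest-centroid classifier on the source's first k components
    and apply it to the target's first k components."""
    Zs = pca_transform(X_src, pca_fit(X_src, k))
    Zt = pca_transform(X_tgt, pca_fit(X_tgt, k))
    classes = np.unique(y_src)
    centroids = np.stack([Zs[y_src == c].mean(axis=0) for c in classes])
    dist = ((Zt[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return classes[dist.argmin(axis=1)]

# Usage: two synthetic projects measured with different numbers of metrics.
rng = np.random.default_rng(1)

def make_project(n, n_metrics):
    size = rng.normal(size=n)                      # hidden "size/complexity" factor
    X = size[:, None] * rng.uniform(0.5, 1.5, n_metrics) + rng.normal(scale=0.5, size=(n, n_metrics))
    y = (size + rng.normal(scale=0.3, size=n) > 1.0).astype(int)
    return X, y

X_src, y_src = make_project(300, 6)                # source: six metrics
X_tgt, y_tgt = make_project(200, 4)                # target: a different metric set
for k in (1, 2, 3):                                # compare dimensions, as in part (3)
    acc = (transfer_predict(X_src, y_src, X_tgt, k) == y_tgt).mean()
    print(k, round(acc, 3))
```

Rescaling each component to unit variance keeps the two projects on a comparable scale. Without it, a classifier trained on the source components would see target scores of a different magnitude.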