
Security Bug Report Identification And Bug Localization Based On Deep Learning

Posted on: 2020-05-18
Degree: Master
Type: Thesis
Country: China
Candidate: P C Lu
GTID: 2428330590473263
Subject: Software engineering
Abstract/Summary:
In recent years, as software systems have grown in scale, software vulnerabilities have occurred more frequently and pose a great threat to software system security. Bug reports, an important part of software development and maintenance, have become an important source of information for detecting software vulnerabilities. Identifying the bug reports related to software vulnerabilities (denoted as security bug reports) and then locating the source files related to these reports can help software maintainers identify and fix vulnerabilities effectively and help safeguard software system security. However, current studies on security bug report identification suffer from class imbalance and noisy data, so there is still room to improve the performance of these methods. Moreover, the lexical gap between the natural language of bug reports and the programming language of source files has become the key limitation of current bug localization methods. To solve these problems, this paper proposes a novel security bug report identification method based on learning to rank and deep learning, and a bug localization method based on deep learning and information retrieval.

First, for security bug report identification, unlike previous keyword-based methods for filtering noisy data, this paper uses a filtering method based on text similarity, applying a learning-to-rank technique to filter out non-security bug reports that are highly similar to security bug reports. In addition, this paper uses the pre-trained deep learning model BERT as the classifier, because the properties of natural language learned during language-model pre-training can benefit the target task. We also propose a two-phase fine-tuning method, which first fine-tunes the BERT model on a large dataset and then fine-tunes it again on the target dataset. The two-phase fine-tuning method avoids the problem that a small dataset alone is insufficient to fully train the model.

Second, for bug localization, this paper combines deep learning and information retrieval. It first uses two BERT models to learn the relevance between the text of bug reports and, respectively, the comments and the identifiers of source files, which mitigates the lexical gap. It then uses a revised vector space model to calculate the text similarity between bug reports and source files, as a complement to the relevance scores of the BERT models. In addition, it computes a collaborative filtering score and a class name similarity between bug reports and source files. These five scores are then fed into a feature combination layer to obtain the final relevance score. Source files are ranked by their final relevance scores, and the Top-k source files are selected as the files related to the bug report.

Finally, we conduct experiments to evaluate both methods. The experimental results show that the proposed security bug report identification method outperforms the state-of-the-art method, and that the bug localization method achieves satisfactory performance on a large dataset.
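
The final ranking step of the bug localization method (five scores combined into one relevance score, then Top-k selection) can be illustrated with a minimal sketch. The abstract does not specify how the feature combination layer is implemented, so this sketch assumes a plain weighted sum; the feature names, the weights, and the file names are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical names for the five scores listed in the abstract.
FEATURES = (
    "bert_comments",      # BERT relevance: bug report vs. source comments
    "bert_identifiers",   # BERT relevance: bug report vs. identifiers
    "rvsm",               # revised vector space model text similarity
    "collab_filtering",   # collaborative filtering score
    "class_name_sim",     # class name similarity
)


def combine(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Assumed combination: a weighted sum of the five feature scores."""
    return sum(weights[name] * scores[name] for name in FEATURES)


def rank_top_k(
    candidates: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
    k: int,
) -> List[Tuple[str, float]]:
    """Score every source file and return the k highest-scoring ones."""
    scored = [(path, combine(feats, weights)) for path, feats in candidates.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Illustrative inputs only; not data from the thesis.
    uniform = {name: 0.2 for name in FEATURES}
    files = {
        "Util.java": {name: 0.1 for name in FEATURES},
        "Auth.java": {name: 0.9 for name in FEATURES},
        "Login.java": {name: 0.5 for name in FEATURES},
    }
    for path, _score in rank_top_k(files, uniform, k=2):
        print(path)  # Auth.java first, then Login.java
```

In a learned setting the weights would be fitted on training data rather than fixed by hand; the weighted sum here only stands in for whatever combination layer the thesis actually uses.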
Keywords/Search Tags: Deep learning, security bug report identification, bug localization, learning to rank, information retrieval