
Research On Bug-fix Submission Log Identification Method Based On Text And Patch Information

Posted on: 2022-10-31
Degree: Master
Type: Thesis
Country: China
Candidate: C K Wu
GTID: 2518306572460104
Subject: Software engineering
Abstract/Summary:
Understanding the maintenance activities performed in software repositories can help software practitioners reduce maintenance costs, make decisions about resource allocation, and improve effectiveness by planning ahead and preallocating resources for source code maintenance. In most software systems, bugs are tracked in the repository's issue tracking system, and code changes are merged into the source control repository as commits. These basic artifacts of software development (new bug reports and commits) are therefore a convenient place to detect bugs in real time. The goal of this project is to automatically and accurately classify the commits produced during software development as either bug-fix-related or non-bug-fix-related. This task is usually performed by classifying commit messages with keywords or hand-written rules, but such approaches have a high false-alarm rate. Machine-learning-based approaches avoid, to some extent, relying on security experts to write rules, but they still rely on humans to engineer features. In recent years, the results achieved by deep learning in various fields have pointed to new directions for classifying code commits.
To address the high false-alarm rate of commit-text classification and the underuse of patch code in commit identification, this thesis extracts text and code-segment information with natural language processing techniques, feeds the bimodal text-and-code input into the pre-trained CodeBERT model, and extracts the semantics of the text and patch information with BiLSTM and CNN networks. To make full use of the patch code, we mine bug-fix patterns, expressed as rich edit scripts, from several open-source projects. For each commit, we generate abstract syntax trees (ASTs) for the code, obtain edit scripts by computing the AST differences between the versions before and after the change, and match these edit scripts against the mined bug-fix patterns to obtain classification features. The pattern-matching results are then fed into the classifier together with the features extracted by deep learning.
To capture multiple features and the underlying distribution of the data, we construct an ensemble model to classify the commits. We compare the classification results of several machine learning classifiers, such as support vector machines, random forests, and nearest-neighbor classifiers, with those of the ensemble model on the dataset, and apply conformal prediction to the classification results so that they carry a high, quantified degree of confidence.
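The abstract does not name the differencing tool or the pattern format, so what follows is only a minimal Python sketch of turning a before/after code pair into pattern-match features. It deliberately swaps in a much cruder technique than real AST differencing (tools such as GumTree compute node-level edit scripts). It parses both versions with Python's standard `ast` module, treats the multiset of node types that appear only in the new version as a coarse "insert" script, and checks that script against a few patterns. The contents of `BUG_FIX_PATTERNS` and the helper names are hypothetical illustrations, not patterns mined in the thesis.

```python
import ast
from collections import Counter

# HYPOTHETICAL bug-fix patterns: each name maps to the AST node types that
# must all appear among the inserted nodes of a change. Real mined patterns
# (rich edit scripts) would also encode positions, parents, and values.
BUG_FIX_PATTERNS = {
    "add_null_check": {"If", "Compare", "Constant"},  # e.g. `if x is None:`
    "add_exception_handler": {"Try", "ExceptHandler"},
    "add_early_return": {"If", "Return"},
}


def node_types(source: str) -> Counter:
    """Count the AST node types occurring in a snippet of Python source."""
    return Counter(type(node).__name__ for node in ast.walk(ast.parse(source)))


def coarse_edit_script(before: str, after: str) -> dict:
    """Approximate an edit script as inserted/deleted node-type counts.

    NOT a tree-differencing algorithm: it only subtracts the node-type
    multisets of the two versions (Counter subtraction drops counts <= 0).
    """
    b, a = node_types(before), node_types(after)
    return {"insert": a - b, "delete": b - a}


def match_patterns(before: str, after: str) -> dict:
    """Return one binary feature per pattern: 1 if all its node types were inserted."""
    inserted = set(coarse_edit_script(before, after)["insert"])
    return {name: int(required <= inserted) for name, required in BUG_FIX_PATTERNS.items()}


if __name__ == "__main__":
    before = "def f(x):\n    return x.value\n"
    after = "def f(x):\n    if x is None:\n        return None\n    return x.value\n"
    print(match_patterns(before, after))
```

In the example, both `add_null_check` and `add_early_return` fire, because a bag of node types cannot tell which inserted `Return` belongs to which statement. That imprecision is why pattern matching is normally done on full edit scripts that keep node positions and parent relations.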
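The abstract does not say how the ensemble combines its members. One common choice is soft voting, which averages the class probabilities of the base classifiers. The pure-Python sketch below assumes that scheme and a binary label set. Its probability values are made up and stand in for the outputs of classifiers such as an SVM, a random forest, and a nearest-neighbor model. `soft_vote`, `predict`, and `LABELS` are hypothetical names.

```python
from typing import List, Optional, Sequence

# Assumed binary label set for commit classification.
LABELS = ("non-bug-fix", "bug-fix")


def soft_vote(model_probs: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted average of per-class probabilities from several base classifiers."""
    if weights is None:
        weights = [1.0] * len(model_probs)
    if len(weights) != len(model_probs):
        raise ValueError("one weight per model is required")
    total = sum(weights)
    n_classes = len(model_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, model_probs)) / total
        for c in range(n_classes)
    ]


def predict(model_probs: Sequence[Sequence[float]],
            weights: Optional[Sequence[float]] = None) -> str:
    """Return the label with the highest averaged probability."""
    avg = soft_vote(model_probs, weights)
    return LABELS[max(range(len(avg)), key=avg.__getitem__)]


if __name__ == "__main__":
    # Made-up [P(non-bug-fix), P(bug-fix)] outputs for one commit.
    svm, forest, knn = [0.3, 0.7], [0.6, 0.4], [0.2, 0.8]
    print(predict([svm, forest, knn]))  # prints "bug-fix"
```

Soft voting lets a confident member outweigh a hesitant one, unlike majority voting on hard labels. The per-model `weights` could be tuned on validation data. Stacking, which trains a meta-classifier on the members' outputs, is a common alternative.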
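Conformal prediction wraps any probabilistic classifier so that it outputs a set of labels with a coverage guarantee, rather than a single label. The plain-Python sketch below shows split (inductive) conformal prediction. It assumes the nonconformity score is one minus the probability assigned to the true class and that a held-out calibration set is available. The thesis may use a different score or variant, and `calibrate` and `prediction_set` are hypothetical names.

```python
import math
from typing import List, Sequence


def calibrate(cal_probs: Sequence[Sequence[float]],
              cal_labels: Sequence[int],
              alpha: float) -> float:
    """Return the conformal threshold q_hat from a held-out calibration set.

    Nonconformity score: 1 - probability the model gives the true class.
    """
    scores = sorted(1.0 - probs[y] for probs, y in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected quantile: the ceil((n + 1)(1 - alpha))-th smallest score.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha: accept every class
    return scores[k - 1]


def prediction_set(probs: Sequence[float], q_hat: float) -> List[int]:
    """Indices of all classes whose nonconformity score does not exceed q_hat."""
    return [c for c, p in enumerate(probs) if 1.0 - p <= q_hat]


if __name__ == "__main__":
    # Made-up calibration outputs [P(non-bug-fix), P(bug-fix)] with true labels.
    cal_probs = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
    cal_labels = [1, 0, 1]
    q_hat = calibrate(cal_probs, cal_labels, alpha=0.5)
    print(prediction_set([0.85, 0.15], q_hat))  # prints [0]
```

For commit classification, a set containing a single label is a confident prediction. A set containing both labels marks a commit that the model cannot separate at the chosen error rate `alpha`, and such a commit could be routed to manual review. If calibration and test commits are exchangeable, the sets contain the true label with probability at least `1 - alpha`.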
Keywords/Search Tags: bug fix pattern, user submission log identification, deep learning, pattern matching, ensemble learning, conformal predictions