Research On Commented-out Code Identification Methods Based On Deep Learning

Posted on: 2022-10-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y T Zhang
GTID: 2518306764494014
Subject: Automation Technology
Abstract/Summary:
Code comments are explanations and descriptions of the code. Well-written comments help software developers understand and maintain their code, but comments can also be misused, for example to disable code by commenting it out. Commented-out code contributes nothing to program execution, and stale commented-out code can actively mislead developers trying to understand the surrounding program. In addition, too much commented-out code in a project can interfere with the proper use of some code analysis tools. It is therefore necessary to detect commented-out code and prompt developers and maintainers to revise or remove it from the source code.

To help developers identify commented-out code in software projects, researchers have proposed a number of automatic and semi-automatic identification methods. Currently, the most popular approach in industry is detection by hand-written rules. However, it has been shown that hand-written rules struggle to cover all forms of commented-out code, so their accuracy and recall are not sufficient for practical use; moreover, writing and maintaining the rules is time-consuming and error-prone. In this paper, we analyze examples of commented-out code in code comments and propose a deep learning-based detection method to help developers identify commented-out code more efficiently. The main work of this paper is as follows.

(1) We propose Att-Bi-LSTM, a commented-out code detection method based on an attention mechanism and a bidirectional long short-term memory (Bi-LSTM) network. The method first preprocesses the code comments and performs lexical analysis on them to obtain a token sequence for each comment, then trains word vectors using the GloVe model, and subsequently feeds the trained word vectors into a bidirectional long short-term memory network to assign a higher [...]. Finally, the commented-out code classifier is constructed via the model output layer. We study the problem using the Python programming language as an example. Experiments on the public CodeSearchNet code dataset show that our method achieves a precision of 0.979 and an F1 value of 0.978 on the training set, a precision of 0.969 and an F1 value of 0.969 on the test set, and a precision of 0.967 and an F1 value of 0.967 on the validation set; in each case these are the highest precision and F1 value among the compared baseline models.

(2) To further improve the accuracy and robustness of the classifier, this paper proposes an adversarial sample generation method based on fuzz testing. The method guides the fuzzing process by analyzing information from the classifier's internal neurons, uses a mutation-based approach to generate an adversarial sample set for the classifier under test, and finally retrains the model on this adversarial sample set to improve its robustness and accuracy. Experiments show that the method improves the classification accuracy of the deep learning-based commented-out code classifier by 0.7% to 2%.

(3) This paper implements a commented-out code detection tool that identifies commented-out code in Python source code. The tool combines the deep learning model with a rule engine based on regular-expression matching, so as to maximize detection accuracy while keeping detection efficient.
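The preprocessing step and the rule-based baseline described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names (`extract_comments`, `lex_comment`, `looks_like_code`), the regular expression, and the use of Python's standard `tokenize` and `ast` modules are all assumptions chosen for illustration.

```python
import ast
import io
import re
import tokenize


def extract_comments(source: str) -> list[str]:
    """Return the text of every '#' comment in Python source, marker removed."""
    comments = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            comments.append(tok.string.lstrip("#").strip())
    return comments


def lex_comment(text: str) -> list[str]:
    """Lex comment text as if it were Python, yielding a token sequence.

    Hypothetical stand-in for the lexical-analysis step: the resulting
    tokens are what an embedding layer would consume.
    """
    tokens = []
    try:
        for tok in tokenize.generate_tokens(io.StringIO(text).readline):
            if tok.string.strip():  # skip NEWLINE/ENDMARKER and other empty tokens
                tokens.append(tok.string)
    except (tokenize.TokenError, SyntaxError):
        # Natural-language text may not lex cleanly; fall back to whitespace split.
        return text.split()
    return tokens


# Assumed hand-written rule: code-like punctuation or a leading keyword.
CODE_HINT = re.compile(
    r"[=()\[\]{}:.]|^(import|from|return|def|class|if|for|while)\b"
)


def looks_like_code(text: str) -> bool:
    """Rule baseline: the text has a code-like hint and parses as Python."""
    if not CODE_HINT.search(text):
        return False
    try:
        ast.parse(text)
    except SyntaxError:
        return False
    return True


if __name__ == "__main__":
    sample = (
        "total = 0\n"
        "# total = compute(items)\n"
        "# Compute the total price.\n"
        "print(total)\n"
    )
    for comment in extract_comments(sample):
        print(looks_like_code(comment), lex_comment(comment))
```

Rules of this kind misfire in both directions. A fragment such as `if x:` fails to parse on its own, while a prose note such as `see: docs` happens to be valid Python. This is consistent with the coverage problem the abstract attributes to hand-written rules, and it suggests why a learned classifier over the token sequence is used for the cases the rules cannot decide.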
Keywords/Search Tags: deep learning, commented-out code, code classification, fuzzing, robustness