
Research On Automatic Generation Of Code Comments

Posted on: 2022-03-26
Degree: Master
Type: Thesis
Country: China
Candidate: C A Niu
GTID: 2518306725984429
Subject: Master of Engineering
Abstract:
Code comments play an important role in software quality assurance: they improve the readability of code and make it easier to understand, reuse, and maintain. However, for a variety of reasons, developers sometimes omit necessary comments. Developers then spend a lot of time understanding the code during development and maintenance, which greatly reduces the efficiency of both. In recent years, many works have used machine learning techniques to generate code comments automatically. These methods extract semantic and structural information from the code and feed it into a neural network model that generates the corresponding comment, and they have achieved good results.

However, current code comment generation models still have several shortcomings. First, preprocessing may destroy the code structure, which leads to inconsistent information across the model's different inputs and makes the model learn poorly. Second, the sequences produced by SBT (structure-based traversal), a traversal method for the abstract syntax tree (AST), are too long, which slows down training. Third, because of the limitations of the sequence-to-sequence model, it cannot generate words outside its vocabulary (out-of-vocabulary words, or OOV words) in the comments. For example, identifiers such as variable names and method names that appear very few times in the source code are usually OOV words, yet without them the comments are difficult to understand.

To solve these problems, this thesis proposes a new code comment generation model, CodePtr. First, it adds an encoder for the complete source code sequence, which solves the problems of broken code structure and inconsistency between inputs. Second, it proposes a new AST traversal method, X-SBT, which shortens the traversal sequence to less than half the length produced by SBT. Finally, it introduces the Pointer-Generator Network, which automatically switches between generating a word and copying a word at each decoding step. In particular, when the model encounters an identifier that appears very few times in the input, it can copy that identifier directly to the output, which solves the problem of not being able to generate OOV words.

This thesis experimentally compares CodePtr with baseline models on a large dataset and analyzes the effectiveness of CodePtr's main components; the results demonstrate the effectiveness of both CodePtr as a whole and its main components. In addition, this thesis designs and develops an automatic code comment generation system, which is integrated into the development environment as an IntelliJ IDEA plugin.
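The abstract relies on two mechanisms that a small example can make concrete: why SBT sequences are long, and how a pointer-generator lets a rare identifier reach the output. The sketch below is illustrative only and makes several assumptions. The AST is a hypothetical, simplified tree with JavaParser-style labels, not the thesis's parser output; SBT follows the bracketed form popularized by DeepCom (Hu et al., 2018), with each bracket counted as its own token; and the mixing rule is the one from See et al. (2017), P(w) = p_gen · P_vocab(w) + (1 − p_gen) · Σ a_i over source positions holding w. The thesis's X-SBT format is not described in the abstract, so it is not reproduced here. The names `Node`, `preorder`, `sbt`, `pointer_generator_dist`, and the identifier `computeCrc32`, along with all probabilities, are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy AST node: a type label and its ordered children (hypothetical)."""
    label: str
    children: list[Node] = field(default_factory=list)


def preorder(node: Node) -> list[str]:
    """Plain pre-order traversal: one token per node, so the tree shape is lost."""
    tokens = [node.label]
    for child in node.children:
        tokens.extend(preorder(child))
    return tokens


def sbt(node: Node) -> list[str]:
    """SBT-style traversal: wrap each subtree as '(' label ... ')' label,
    so the tree can be rebuilt from the sequence at the cost of extra tokens."""
    tokens = ["(", node.label]
    for child in node.children:
        tokens.extend(sbt(child))
    tokens.extend([")", node.label])
    return tokens


def pointer_generator_dist(
    p_gen: float,
    vocab: list[str],
    p_vocab: list[float],
    src_tokens: list[str],
    attention: list[float],
) -> dict[str, float]:
    """Mix generation and copying (See et al., 2017):
    P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of attention where src_tokens[i] == w.
    Source tokens missing from `vocab` get an extra slot, so they can only be copied."""
    dist = dict.fromkeys(vocab, 0.0)
    for word, prob in zip(vocab, p_vocab):
        dist[word] += p_gen * prob
    for token, weight in zip(src_tokens, attention):
        dist[token] = dist.get(token, 0.0) + (1.0 - p_gen) * weight
    return dist


def param() -> Node:
    """Hypothetical subtree for one method parameter such as `int a`."""
    return Node("Parameter", [Node("PrimitiveType"), Node("SimpleName")])


# Hypothetical, simplified AST for: int add(int a, int b) { return a + b; }
method = Node("MethodDeclaration", [
    Node("PrimitiveType"),
    Node("SimpleName"),
    param(),
    param(),
    Node("BlockStmt", [
        Node("ReturnStmt", [
            Node("BinaryExpr", [Node("NameExpr"), Node("NameExpr")]),
        ]),
    ]),
])
print(len(preorder(method)), len(sbt(method)))  # 14 56

# Hypothetical decoder step: "computeCrc32" is a rare identifier outside the vocabulary.
vocab = ["<unk>", "returns", "the", "sum", "of", "and"]
p_vocab = [0.1, 0.3, 0.2, 0.2, 0.1, 0.1]
src_tokens = ["int", "computeCrc32", "byte", "data"]
attention = [0.1, 0.7, 0.1, 0.1]
dist = pointer_generator_dist(0.4, vocab, p_vocab, src_tokens, attention)
print(max(dist, key=dist.get))  # computeCrc32
```

Under this tokenization SBT spends four tokens per node against one for pre-order, which is the kind of length overhead the abstract attributes to SBT. In a trained model, p_gen, P_vocab, and the attention weights are produced by the decoder at every step rather than fixed by hand; the point of the toy numbers is only that a word absent from the vocabulary can still receive the highest probability through copying.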
Keywords: Software quality assurance, Code comments generation, Out-of-vocabulary word, Plugin