Research On Code Annotation Generation Method Based On Seq2seq Framework

Posted on: 2021-05-02  Degree: Master  Type: Thesis
Country: China  Candidate: W T Feng  Full Text: PDF
GTID: 2428330647958914  Subject: Computer Science and Technology
Abstract/Summary:
Code annotations improve program readability and help programmers maintain software efficiently, so generating them automatically is of great value. However, generating high-quality code comments is challenging for two reasons. First, code contains many encapsulated classes and functions, which hide much of the guiding information. Second, code has rich nested structure and complex call relationships, which make it harder to understand. In this thesis, we study and improve deep-learning-based algorithms for automatic code annotation generation, taking into account function description information, the code sequence, and the code structure. The main work of this thesis covers the following three aspects:

(1) We implement a benchmark model for code annotation generation based on the seq-to-seq framework. The benchmark model uses a bidirectional recurrent neural network to encode code fragments and learn the contextual information of the code, and then generates the corresponding annotation. The model uses an attention mechanism to assign a weight to each code token, and a decoding algorithm that samples from the probability distribution over the output vocabulary to generate the most likely sequence.

(2) We propose a code comment generation algorithm that incorporates function description information. Because existing approaches to code annotation generation make little use of function documentation, we propose two models that fuse code sequences with function description information collected from the Python standard library and third-party libraries. The first combines the function name in the code and the function's description into a new vector during encoding, and uses that vector to replace the representation of the function-name token. The second uses a pointer network to generate a series of pointers into the function description, selectively copies words from it that may appear in the output, and then generates the final natural-language annotation.

(3) We design and implement a code comment generation algorithm that incorporates code structure information. Because existing approaches make insufficient use of syntactic structure, we introduce the abstract syntax tree to enrich the semantic information of the code. We first implement a code comment generation algorithm based on the Tree-LSTM network, using Tree-LSTM to encode the code's abstract syntax tree. We then design a code annotation generation algorithm based on a bidirectional tree encoder, using BiTree-LSTM to encode the tree structure of the code so that the tree-structure information is exploited in both directions. In this model, we design a hybrid attention mechanism that integrates the code sequence representation and the structure representation to better generate the final natural-language annotation.

We ran a series of experiments on the public CoNaLa and Django datasets with the proposed models, which combine function description information and code structure information. The experimental results show that the improved method performs significantly better than the benchmark system.
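
The abstract does not give the equations behind the attention mechanism or the pointer-based copying, so the following is only a minimal sketch under stated assumptions: dot-product attention scores, and a pointer-generator-style blend in which a gate `p_gen` weighs generating a word from the vocabulary against copying a word from the function description. All function names, vectors, and probabilities here are hypothetical toy values, not the thesis's actual model.

```python
import math


def softmax(scores):
    """Turn raw scores into a probability distribution (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attention_weights(decoder_state, encoder_states):
    """Assumed dot-product attention: one weight per source token, summing to 1."""
    return softmax([dot(decoder_state, h) for h in encoder_states])


def mix_generate_and_copy(vocab_probs, copy_weights, description_tokens, p_gen):
    """Blend generation and copying (assumed pointer-generator-style mixture).

    With probability p_gen a word is generated from the vocabulary; otherwise
    a word is copied from the description according to the pointer weights.
    """
    final = {word: p_gen * p for word, p in vocab_probs.items()}
    for weight, token in zip(copy_weights, description_tokens):
        final[token] = final.get(token, 0.0) + (1.0 - p_gen) * weight
    return final


# Toy data (hypothetical): three code-token states and one decoder state.
code_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [0.5, 2.0]
weights = attention_weights(decoder_state, code_states)

# Toy function description and its (hypothetical) encoded states.
description_tokens = ["return", "sorted", "list"]
description_states = [[0.0, 1.0], [1.0, 0.0], [0.0, 0.5]]
copy_weights = attention_weights(decoder_state, description_states)

# Toy vocabulary distribution; "sorted" is deliberately out of vocabulary.
vocab_probs = {"return": 0.6, "a": 0.3, "list": 0.1}
final = mix_generate_and_copy(vocab_probs, copy_weights, description_tokens, p_gen=0.7)
best_word = max(final, key=final.get)
```

In this toy setup the word "sorted" has no vocabulary probability, yet it receives probability mass through the copy path, which is the motivation the abstract gives for pointing into the function description.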
Keywords/Search Tags: Code annotation, pointer network, abstract syntax tree, Tree-LSTM, BiTree-LSTM