
Research On Code Summarization Generation Method Based On Deep Learning

Posted on:2022-11-30
Degree:Master
Type:Thesis
Country:China
Candidate:N N Gao
Full Text:PDF
GTID:2518306758992039
Subject:Automation Technology
Abstract/Summary:
Code summaries are natural-language descriptions of code; high-quality summaries help developers understand and reuse software, reducing development time and labor costs. As deep learning penetrates ever deeper into natural language processing, applying deep learning techniques to the automatic generation of code summaries has gradually become a new research direction, one of great value for program comprehension, project maintenance, and code refactoring. By surveying existing techniques for automatic code summary generation, this thesis studies how to apply deep learning models to this task effectively.

Deep-learning-based automatic code summary generation faces one major problem that urgently needs solving: unlike weakly structured natural language, programming languages are strongly structured. How to make full use of the structural and semantic information of the code to generate a sufficient and complete summary is therefore the key issue. The most common solution parses the code text into an abstract syntax tree (AST), traverses it, and encodes the traversal with a recurrent-neural-network encoder; the resulting context vector of the code is fed to a decoder, which outputs the corresponding summary according to a probability distribution. However, the traditional encoder-decoder model struggles with the long-range dependency problem, and encoding the semantic and structural information of the program independently ignores the connection between the two.

To address the tendency of current Sequence-to-Sequence methods to overlook the relationship between code text and structural information, this study adopts the self-attention mechanism proposed in the Transformer model to solve the long-range dependency problem. The principle is that attention dynamically generates different weights to handle long sequences, so long-range dependencies can be captured accurately. To address the loss of relational information caused by encoding structural and semantic information separately, this study proposes converting the program into a data flow graph and using it as the model's input sequence, which fully captures the dependencies between the program's variables.

Combining these solutions, this study proposes CStrans, a Transformer-based code summary generation model, whose summary generation proceeds as follows. First, the code and summaries are preprocessed. In contrast to the common preprocessing approach of merely parsing code into an AST, this work extracts variables and the dependencies between them from the AST to construct a data flow during preprocessing, so that variables with the same name but different semantics can be distinguished through the data flow. The sequence containing both semantic and structural information is then encoded by the Transformer encoder, and the decoder generates the corresponding code summary. Taking DeepCom's Java dataset as experimental data, the top 10 items of the dataset were selected for repeated experimental verification of the proposed model. The results show that, compared with other code summary generation models, the proposed model achieves good performance on automatic code summary generation.
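The self-attention principle described above — dynamically generated weights over every position of a long sequence — can be sketched as scaled dot-product attention. This is the standard Transformer formulation, not the thesis's actual implementation; all names, shapes, and weights below are illustrative:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a token sequence X of shape (n, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # pairwise token affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
    return weights @ V                                    # context vector per position

rng = np.random.default_rng(0)
n, d = 5, 8
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 8)
```

Because every position attends directly to every other position, the path length between any two tokens is one attention step, which is why distant dependencies are captured without the decay an RNN suffers over long traversal sequences.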
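The preprocessing step — extracting variables and their dependencies from the AST to form a data flow — can be illustrated with a minimal def-use analysis. The thesis operates on Java ASTs; this sketch uses Python's `ast` module as a stand-in, and `data_flow_edges` is a hypothetical helper, not the thesis's code:

```python
import ast

def data_flow_edges(source):
    """Collect coarse def-use edges from a flat sequence of statements.
    An edge (i, j, v) means the variable v assigned at statement i is
    read at statement j — the dependencies a data flow graph records."""
    body = ast.parse(source).body
    last_def = {}   # variable name -> index of the statement that last assigned it
    edges = []
    for j, stmt in enumerate(body):
        # record reads first: a use refers to an *earlier* definition,
        # so `x = x + 1` links its read of x to the previous assignment
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
                if node.id in last_def:
                    edges.append((last_def[node.id], j, node.id))
        # then record writes
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
                last_def[node.id] = j
    return edges

code = "x = 1\ny = x + 2\nx = y * x\nprint(x)"
print(data_flow_edges(code))  # [(0, 1, 'x'), (1, 2, 'y'), (0, 2, 'x'), (2, 3, 'x')]
```

Note how the two assignments to `x` yield distinct definition sites (statements 0 and 2): this is exactly how a data flow distinguishes same-named variables with different semantics, as the abstract describes.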
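The final stage — the decoder emitting the summary token by token according to a probability distribution — can be sketched as a greedy decoding loop. Here `toy_step` is a dummy stand-in for the Transformer decoder's per-step softmax output, and the vocabulary is invented for illustration:

```python
import numpy as np

def greedy_decode(step_fn, bos, eos, max_len=20):
    """Greedy decoding: at each step append the highest-probability token.
    step_fn(prefix) returns a probability distribution over the vocabulary."""
    tokens = [bos]
    for _ in range(max_len):
        probs = step_fn(tokens)
        nxt = int(np.argmax(probs))
        tokens.append(nxt)
        if nxt == eos:
            break
    return tokens

# Toy "decoder": deterministically walks through a fixed summary, then emits EOS.
VOCAB = ["<bos>", "returns", "the", "sum", "<eos>"]

def toy_step(prefix):
    probs = np.zeros(len(VOCAB))
    probs[min(len(prefix), len(VOCAB) - 1)] = 1.0
    return probs

ids = greedy_decode(toy_step, bos=0, eos=4)
print([VOCAB[i] for i in ids])  # ['<bos>', 'returns', 'the', 'sum', '<eos>']
```

Real systems typically replace `argmax` with beam search to explore several candidate summaries in parallel, but the token-by-token sampling from a distribution is the same.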
Keywords/Search Tags:Deep Learning, Code Summary, Graph Transformer, Data Flow Graph, Abstract Syntax Tree