Structure-aware Graph Neural Network For Code Comment Generation

Posted on: 2022-11-25
Degree: Master
Type: Thesis
Country: China
Candidate: S Q Zhang
GTID: 2518306782452484
Subject: Automation Technology
Abstract/Summary:
Code comments describe the functionality of program code and play an important role in software development and project maintenance. In practice, however, writing comments by hand is time-consuming, so a large amount of code has missing or mismatched comments. Code comment generation aims to automatically transform structured program code into natural language comments that describe its function, which reduces the effort developers spend writing comments. Some research works represent program code directly as a source code sequence and employ recurrent neural networks as encoders to model that sequence. However, program code is structured data, and its structural information has an important impact on correctly understanding what the code does. Although these methods extract the sequential information of the program code to a certain extent, they cannot take its complex structural information into account. To extract structural information, other research works represent the program code as an abstract syntax tree and use LSTM or Tree-LSTM to encode it. However, the LSTM-based approaches cannot model the connections between nodes in the abstract syntax tree well, and the Tree-LSTM-based approaches focus on information propagation from leaf nodes to the root node while ignoring propagation in the reverse direction. Therefore, building on the encoding ideas of graph neural networks, this thesis conducts research on program understanding and achieves the following two research results:

(1) This thesis proposes a structure-aware graph neural network for code comment generation. The structure-aware graph encoder of this model takes graph representations of the program code as input, namely the grammatical dependency graph and the semantic dependency graph. The model then employs a hierarchy-based information propagation mechanism and a neighbor-based information propagation mechanism to encode the two dependency graphs. To obtain graph-level features, the graph aggregation network of this model adopts a bidirectional long short-term memory network and a max-pooling method to aggregate node features on the grammatical dependency graph and the semantic dependency graph, respectively. Finally, the initial input vector of the decoding stage is obtained by fusing the graph-level features of the grammatical dependency graph and the semantic dependency graph. Because the model uses a graph neural network as the underlying encoder and applies different information propagation mechanisms to the dependency graphs, it can better learn the complex dependencies in the program code.

(2) This thesis proposes a structure-aware hybrid encoding network for code comment generation. The encoder of this model consists of a sequential encoding layer, a grammatical structure encoding layer, and an aggregation encoding layer. The sequential encoding layer takes the sequential form of the program code as input and adopts a Transformer to capture the sequential information of the program code. The grammatical structure encoding layer takes the semantic dependency graph as input and adopts a graph attention network to extract the grammatical structure information of the program code. The aggregation encoding layer adopts a Transformer to aggregate the representations produced by the sequential encoding layer and the grammatical structure encoding layer. With this hybrid encoding method, the model can better learn the complex dependencies in the program code.

The experimental results show that the model achieves good performance on datasets in several programming languages and captures the complex structural information in the program code effectively. To verify the contribution of each component, this thesis conducts ablation studies on the different code comment generation models. Finally, this thesis presents case studies of the natural language comments generated for different programming languages to illustrate the effectiveness of the model.
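
The encode, aggregate, and fuse pipeline described in result (1) can be illustrated with a minimal sketch. This is not the thesis's implementation, and several parts are assumptions made only for illustration: the abstract does not give the update rule, so neighbor-based propagation is approximated here by averaging each node with its neighbors; max-pooling is used on both graphs, omitting the bidirectional LSTM aggregator; and fusion is shown as plain concatenation. The function names and the toy graph are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def propagate_neighbors(
    features: Dict[int, Vector], edges: List[Tuple[int, int]]
) -> Dict[int, Vector]:
    """One round of message passing over undirected edges.

    ASSUMPTION: each node takes the mean of itself and its neighbors.
    The abstract does not state the actual update rule.
    """
    neighbors: Dict[int, List[int]] = {node: [node] for node in features}
    for src, dst in edges:
        neighbors[src].append(dst)
        neighbors[dst].append(src)
    updated: Dict[int, Vector] = {}
    for node, group in neighbors.items():
        dim = len(features[node])
        updated[node] = [
            sum(features[n][i] for n in group) / len(group) for i in range(dim)
        ]
    return updated


def max_pool(features: Dict[int, Vector]) -> Vector:
    """Graph-level feature: element-wise maximum over all node vectors."""
    return [max(column) for column in zip(*features.values())]


def fuse(grammatical: Vector, semantic: Vector) -> Vector:
    """ASSUMPTION: fuse the two graph-level features by concatenation."""
    return grammatical + semantic


# Hypothetical toy input: three code tokens with 2-dimensional features,
# connected differently in the two dependency graphs.
node_features = {0: [1.0, 0.0], 1: [0.0, 2.0], 2: [4.0, 4.0]}
grammatical_edges = [(0, 1), (0, 2)]
semantic_edges = [(1, 2)]

grammatical_vec = max_pool(propagate_neighbors(node_features, grammatical_edges))
semantic_vec = max_pool(propagate_neighbors(node_features, semantic_edges))

# Stand-in for the initial input vector of the decoding stage.
decoder_init = fuse(grammatical_vec, semantic_vec)
print(decoder_init)  # [2.5, 2.0, 2.0, 3.0]
```

The same node features yield different graph-level vectors because the two edge sets differ. This is the reason for encoding the grammatical and semantic dependency graphs separately before fusing them.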
Keywords/Search Tags: code comment generation, structure-aware, graph neural network, natural language processing, abstract syntax tree