
Research On Code Summarization Based On Call Dependency

Posted on: 2024-01-01
Degree: Master
Type: Thesis
Country: China
Candidate: L Yuan
GTID: 2568307052996259
Subject: Electronic information
Abstract/Summary:
Code summarization aims to generate natural language comments for code, which helps developers understand and maintain it. Writing code summaries by hand is slow and laborious, so generating them automatically is worth studying. Many researchers are exploring how to generate high-quality code summaries, but most existing work focuses on the syntactic and semantic knowledge of code and overlooks the call dependencies between code units. This leaves a gap between research and real development: programs consist of functions and classes linked by call dependencies, yet this structure is largely neglected. The lack of a dataset with call dependencies is one obstacle. To address it, this paper constructs a new large-scale multilingual dataset with call dependencies, called Call Code. The new dataset brings new challenges. First, the average sample length exceeds 512, which raises the long-code problem. Second, the called functions have complex relationships, and making use of both low-level and high-level call relationships is itself a challenge. The main contributions are as follows.

1. To address the lack of a dataset with call dependencies, a new dataset is constructed. Call Code contains 614,652 examples and is built by applying static call dependency extraction to Python, Java, Go, and JavaScript source files.

2. To address the long-code problem, a Multi-Encoder-Decoder model called CDMT is proposed. The code snippets in the new dataset exceed the maximum input length of the Transformer. To learn the relationship between the main function and the functions it calls, we design a Multi-Encoder-Decoder model with three fusion methods. The model can also be fine-tuned from pre-trained models.

3. To address the problem of understanding the complex relationships between the main function and the functions it calls, a Transformer-based model called CDGSum, built on the call dependency subgraph, is proposed. A function may be called many times and may itself call other functions. To capture high-level call dependencies and call sequences, we present a model based on the call dependency subgraph. We use a gated graph neural network (GGNN) to learn the complex relationships between the main function and the called functions by passing and updating messages.

Finally, we conduct extensive experiments that show the superior performance of CDMT and CDGSum. The results verify the feasibility of using call dependencies, multiple encoders, and call dependency subgraphs.
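As an illustration of the static call dependency extraction named in contribution 1, the following is a minimal sketch for Python source only. It is not the thesis's actual pipeline, which the abstract does not describe. The sketch assumes that only direct calls by plain name to functions defined in the same file are resolved. The names SOURCE and extract_call_graph are hypothetical and chosen for this example.

```python
import ast

# Hypothetical sample file: main() calls clean() and load().
SOURCE = '''
def load(path):
    return open(path).read()

def clean(text):
    return text.strip()

def main(path):
    return clean(load(path))
'''


def extract_call_graph(source):
    """Map each top-level function to the locally defined functions it calls.

    Assumption: only calls of the form name(...) are resolved. Method calls
    such as obj.method(...) and calls to builtins or imports are ignored.
    """
    tree = ast.parse(source)
    # Names of the functions defined at the top level of this file.
    defined = {n.name for n in tree.body if isinstance(n, ast.FunctionDef)}
    graph = {}
    for node in tree.body:
        if not isinstance(node, ast.FunctionDef):
            continue
        callees = set()
        # Walk the function body and collect every call to a defined name.
        for sub in ast.walk(node):
            if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name):
                if sub.func.id in defined:
                    callees.add(sub.func.id)
        graph[node.name] = sorted(callees)
    return graph


if __name__ == "__main__":
    for caller, callees in extract_call_graph(SOURCE).items():
        print(caller, "->", callees)
```

In a dataset of this kind, each sample would presumably pair a main function with the source of the functions found this way. How Call Code actually assembles its samples is not stated in the abstract.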
Keywords/Search Tags:code summarization, call dependency, pre-trained model, natural language processing