
Java Code Summary via Knowledge Graph and Deep Learning

Posted on: 2021-01-09
Degree: Master
Type: Thesis
Country: China
Candidate: J Su
Full Text: PDF
GTID: 2428330611999979
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, as software systems have grown in scale and their versions have changed in response to demand, the ever-expanding code base has become a challenge for the programmers who develop and maintain them. Because code is complex and changeable, reading other people's code is difficult. A code summary provides a high-level natural language description of the functions a piece of code performs, which helps developers with software maintenance, code classification and code retrieval. However, almost all existing code summaries are written manually by developers. As software systems evolve, these summaries often become mismatched, incorrect or out of date, so developers have to spend a lot of time working out what the code does. Manual code summarization is also a heavy workload, so automatic code summarization is needed.

At present, in addition to the traditional template-based, information retrieval-based and probabilistic model-based methods, automatic code summary generation relies mainly on deep learning, in particular CNN and RNN networks. These models extract features from the code text and, following a probabilistic model, a decoder decodes the resulting feature vector to generate the code summary. However, these methods often treat the source code as plain text and ignore a great deal of code-related knowledge, so their effect is relatively limited. Such knowledge, for example the functional description of an API and the descriptions of problems related to that API, often reflects the function and explanation of the code. This knowledge is scattered across many different resources, and extracting it from different data sources and fusing it together is a challenge. Once the relevant code knowledge has been acquired, how to apply it to the code summary generation task to achieve better results also requires further research.

To address these problems, the main work of this thesis is as follows. First, the algorithm designed in this thesis extracts Java code-related knowledge from source code, official API documentation and Stack Overflow question-and-answer data, and builds the corresponding code knowledge base. For scalability, three entity recognition models are used to identify API entities on a manually labeled data set, and a data fusion method is designed to construct a Java code knowledge graph. Next, building on the completed Java code knowledge graph, the traditional Seq2Seq model is modified: we design a code summary model based on the code knowledge graph and a hybrid attention mechanism, and conduct experiments on open source data sets to verify the effectiveness of the model. Finally, to further improve the quality of the code summaries, we design and evaluate a code summary model based on transfer learning and the code knowledge graph.
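
As an illustration only, the idea of fusing API knowledge from several sources into one graph might be sketched as below. The abstract does not describe the actual fusion method, so everything here is an assumption: the helper names `normalize_api_name` and `fuse_sources`, the relation labels, the name-normalization rule and the toy records are all hypothetical and stand in for the thesis's real extraction and entity recognition pipeline.

```python
from collections import defaultdict


def normalize_api_name(name: str) -> str:
    """Map different spellings of an API mention to one key (assumed rule)."""
    name = name.strip().lower()
    if name.endswith("()"):
        name = name[:-2]
    return name


def fuse_sources(*sources):
    """Merge (api, relation, text) triples from several sources.

    Returns {api_name: {relation: [text, ...]}} with exact duplicates dropped.
    """
    graph = defaultdict(lambda: defaultdict(list))
    for triples in sources:
        for api, relation, text in triples:
            key = normalize_api_name(api)
            if text not in graph[key][relation]:
                graph[key][relation].append(text)
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {api: dict(relations) for api, relations in graph.items()}


# Hypothetical toy records standing in for the three data sources.
source_code = [("ArrayList.add()", "called_in", "UserService.register")]
api_docs = [
    ("ArrayList.add", "described_as",
     "Appends the specified element to the end of this list."),
]
stack_overflow = [
    ("arraylist.add", "discussed_in", "How to add an element to an ArrayList?"),
    ("ArrayList.add", "described_as",
     "Appends the specified element to the end of this list."),
]

graph = fuse_sources(source_code, api_docs, stack_overflow)
```

In this toy setup the three spellings of the same API collapse into a single node that carries a usage site, a documentation description and a related question. A real pipeline would need far more robust entity linking than lowercasing a name.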
Keywords/Search Tags:code summary, code knowledge graph, deep learning, hybrid attention mechanism, transfer learning