
Code Completion Based On Semantic Context

Posted on: 2022-08-19
Degree: Master
Type: Thesis
Country: China
Candidate: T T Wang
GTID: 2518306572460284
Subject: Software engineering
Abstract/Summary:
With the rapid development of computer technology, software is growing in both scale and complexity. As development demands increase, researchers have turned to automated software development to reduce the difficulty of building software and to shorten development cycles. In intelligent software development, code completion means that the development environment suggests the next likely code token, such as a method call or an object field, based on the code already present in the context. In recent years the application of deep learning has produced a large body of research in this direction and has advanced intelligent software development. Most existing work treats code as natural-language text and feeds it to sequence models from natural language processing (NLP). However, when the token to be generated is not in the dictionary, an NLP model emits the placeholder UNK instead, which is meaningless in a program. Moreover, compared with natural language, code carries important structural information, such as the execution order of statements and the program logic.

To address these problems, this thesis represents code with trees and graphs and proposes three methods: code completion based on a mixture network of abstract syntax trees (ASTs) and a pointer generator; code completion based on Code Property Graphs and a pointer generator network; and code completion based on Code Property Graphs and a graph neural network.

In the first method, code is represented as an AST, which is traversed to produce a sequence that an NLP model then learns from. To enrich the semantic and structural information of each node, the state of its parent node is used when learning the node's hidden state. To solve the UNK problem, a selector is proposed that copies a token from the preceding code in place of UNK. This mixture network reaches an accuracy of 77.221%, which is 4.92 percentage points higher than the LSTM baseline and 0.908 percentage points higher than the LSTM model with parent-node information.

The second method also considers the order in which statements execute and the conditions that must hold for a particular execution path to be taken, by adding the program's data-dependence and control-dependence information. Data dependence means that a defined variable is later used or changed; control dependence means that whether the current node's operation executes depends on the outcome of another node. Both kinds of dependence are abstracted as path sequences, and a pointer generator network learns from them to perform code completion. This model reaches an accuracy of 78.180%, which is 0.959 percentage points higher than the mixture network of ASTs and a pointer generator.

The third method uses a graph representation to capture data-dependence and control-dependence information. A gated graph neural network, which can learn from graphs with multiple edge types, is applied to the AST augmented with additional edges. The data-dependence information is refined by adding edges that represent how variables are used and computed. A subgraph is constructed for each node to be generated; the network learns a vector representation for every node and then for the whole graph, which turns code completion into a classification problem that recommends a token. This model reaches an accuracy of 41.655%, which is 5.922 percentage points higher than a model trained without the refined data-dependence information. Although its accuracy is lower than that of the first two methods, it is the first implementation of token-level code completion using graphs, which gives it research significance.
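The first method flattens an AST into a sequence while keeping each node's parent, so that the model can use the parent's hidden state. The abstract does not say which language, parser, or traversal order the thesis used. The sketch below is a minimal illustration that assumes Python source, the standard `ast` module, and a pre-order depth-first traversal. The function name `flatten_ast` and the `Type:value` label format are hypothetical.

```python
import ast


def flatten_ast(source):
    """Pre-order traversal of a Python AST (illustrative sketch only).

    Returns a list of (label, parent_index) pairs. The root's parent is -1.
    A label is the node type, plus the identifier or constant for names,
    attributes and literals, since those are the tokens a completion model
    has to predict.
    """
    tree = ast.parse(source)
    sequence = []  # (label, parent_index) in visiting order

    def label(node):
        if isinstance(node, ast.Name):
            return f"Name:{node.id}"
        if isinstance(node, ast.Attribute):
            return f"Attribute:{node.attr}"
        if isinstance(node, ast.Constant):
            return f"Constant:{node.value!r}"
        return type(node).__name__

    def visit(node, parent_index):
        index = len(sequence)
        sequence.append((label(node), parent_index))
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.expr_context):
                continue  # Load/Store/Del markers carry no token
            visit(child, index)

    visit(tree, -1)
    return sequence
```

For `x = a.b` this yields `Module`, `Assign`, `Name:x`, `Attribute:b`, `Name:a`, with parent indices `-1, 0, 1, 1, 3`. A sequence model that processes position `i` can then look up the hidden state stored at `parent_index` and combine it with its own recurrent state.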
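The selector in the first two methods lets the model copy a token from the preceding code instead of emitting UNK. The abstract does not give its exact form. The sketch below assumes the widely used soft pointer-generator mixture, in which the final probability of a token $w$ is $p_{gen} P_{vocab}(w) + (1 - p_{gen}) \sum_{i: x_i = w} a_i$, where $a_i$ are attention weights over context positions. The thesis's selector may instead be a hard switch. All function names and the example tokens are hypothetical.

```python
from collections import defaultdict

UNK = "<unk>"  # placeholder a vocabulary-only model emits for unknown tokens


def pointer_generator_distribution(p_vocab, attention, context_tokens, p_gen):
    """Mix a generation distribution with a copy distribution (sketch).

    p_vocab: dict token -> probability over the fixed vocabulary (includes UNK).
    attention: attention weights over context positions (summing to 1).
    context_tokens: the tokens at those positions in the preceding code.
    p_gen: probability of generating from the vocabulary instead of copying.
    Returns a distribution over the vocabulary extended with context tokens.
    """
    if len(attention) != len(context_tokens):
        raise ValueError("attention and context_tokens must have equal length")
    final = defaultdict(float)
    for token, prob in p_vocab.items():
        final[token] += p_gen * prob
    # A token that occurs several times in the context accumulates the
    # attention weight of every occurrence.
    for weight, token in zip(attention, context_tokens):
        final[token] += (1.0 - p_gen) * weight
    return dict(final)


def predict(p_vocab, attention, context_tokens, p_gen):
    """Return the most probable token under the mixed distribution."""
    dist = pointer_generator_distribution(p_vocab, attention, context_tokens, p_gen)
    return max(dist, key=dist.get)
```

With `p_vocab = {"get": 0.2, "size": 0.1, UNK: 0.7}`, attention `[0.1, 0.9]` over `["get", "myCounter"]` and `p_gen = 0.4`, the out-of-vocabulary identifier `myCounter` receives 0.54 and beats UNK (0.28). With `p_gen = 0.9`, UNK would still win. This is why, in the standard formulation, the network computes `p_gen` at every step rather than fixing it.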
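The second and third methods rely on data-dependence edges (a definition reaching a later use or change of the variable) and control-dependence edges (a statement whose execution depends on a condition). The abstract does not name the tooling used to build the Code Property Graphs. The sketch below is not that tooling. It is a minimal illustration over Python source with the standard `ast` module, under the following assumptions:

- Only simple statements plus `if`/`while` are handled.
- Definitions from both branches of an `if` are merged.
- Loop back edges are ignored, so dependences carried around a loop are missed.
- Each statement starts on its own line, and edges connect line numbers.

The function names are hypothetical.

```python
import ast


def _names(node, ctx_type):
    """Variable names inside `node` used with the given context (Load/Store)."""
    return {n.id for n in ast.walk(node)
            if isinstance(n, ast.Name) and isinstance(n.ctx, ctx_type)}


def dependence_edges(source):
    """Illustrative sketch: return (data_edges, control_edges).

    Data edge (a, b, var): the definition of `var` at line a may reach its use
    at line b. Control edge (a, b): whether line b runs depends on the
    condition at line a. Handles simple statements and if/while only; loop
    back edges are ignored.
    """
    data, control = set(), set()

    def merge(a, b):
        out = {var: set(lines) for var, lines in a.items()}
        for var, lines in b.items():
            out.setdefault(var, set()).update(lines)
        return out

    def use(var_names, line, defs):
        for var in var_names:
            for def_line in defs.get(var, ()):
                data.add((def_line, line, var))

    def run(stmts, defs, controller):
        # defs: variable -> set of lines whose definition reaches this point.
        defs = {var: set(lines) for var, lines in defs.items()}
        for stmt in stmts:
            if controller is not None:
                control.add((controller, stmt.lineno))
            if isinstance(stmt, (ast.If, ast.While)):
                use(_names(stmt.test, ast.Load), stmt.lineno, defs)
                after_body = run(stmt.body, defs, stmt.lineno)
                after_else = run(stmt.orelse, defs, stmt.lineno)
                defs = merge(after_body, after_else)  # either path may be taken
            else:
                uses = _names(stmt, ast.Load)
                if isinstance(stmt, ast.AugAssign):
                    uses |= _names(stmt.target, ast.Store)  # x += 1 also reads x
                use(uses, stmt.lineno, defs)
                for var in _names(stmt, ast.Store):
                    defs[var] = {stmt.lineno}  # new definition kills older ones
        return defs

    run(ast.parse(source).body, {}, None)
    return data, control
```

For the program `x = 1`, `y = 2`, `if x > 0:`, `    y = x + 1`, `z = y` (lines 1 to 5), line 5 receives data edges from both definitions of `y` (lines 2 and 4), because either may reach it. Line 4 is control-dependent on line 3.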
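The third method learns node vectors with a gated graph neural network, aggregates them into a graph vector, and classifies that vector into a token. The abstract gives no layer sizes, readout function, or training details. The sketch below therefore shows only the standard GGNN forward pass with these assumptions:

- One message matrix per edge type.
- A GRU-style state update.
- A plain sum readout.
- A softmax classifier.
- Random, untrained weights.

The edge-type names (`child`, `last_use`, `computed_from`) and the class `TinyGGNN` are hypothetical. Pure Python is used so the block has no dependencies.

```python
import math
import random


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def add(*vs):
    """Element-wise sum of equally long vectors."""
    return [sum(xs) for xs in zip(*vs)]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class TinyGGNN:
    """Illustrative GGNN: typed edges, GRU update, sum readout, softmax output."""

    def __init__(self, dim, edge_types, num_classes, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        # One message matrix per edge type (e.g. AST child, data dependence).
        self.w_edge = {t: rand_matrix(dim, dim, rng) for t in edge_types}
        # GRU weights: input (message) and recurrent (state) matrix per gate.
        self.w_z, self.u_z = rand_matrix(dim, dim, rng), rand_matrix(dim, dim, rng)
        self.w_r, self.u_r = rand_matrix(dim, dim, rng), rand_matrix(dim, dim, rng)
        self.w_h, self.u_h = rand_matrix(dim, dim, rng), rand_matrix(dim, dim, rng)
        self.w_out = rand_matrix(num_classes, dim, rng)

    def propagate(self, states, edges, steps):
        """Run `steps` rounds of message passing over (src, dst, type) edges."""
        for _ in range(steps):
            messages = [[0.0] * self.dim for _ in states]
            for src, dst, etype in edges:
                messages[dst] = add(messages[dst], matvec(self.w_edge[etype], states[src]))
            new_states = []
            for m, h in zip(messages, states):
                z = sigmoid(add(matvec(self.w_z, m), matvec(self.u_z, h)))  # update gate
                r = sigmoid(add(matvec(self.w_r, m), matvec(self.u_r, h)))  # reset gate
                rh = [ri * hi for ri, hi in zip(r, h)]
                cand = [math.tanh(x) for x in add(matvec(self.w_h, m), matvec(self.u_h, rh))]
                new_states.append([(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)])
            states = new_states
        return states

    def classify(self, states, edges, steps=2):
        """Return a probability for each candidate token class."""
        final = self.propagate(states, edges, steps)
        graph_vec = add(*final)  # sum readout over all node states
        logits = matvec(self.w_out, graph_vec)
        top = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(x - top) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]
```

A subgraph is passed in as initial node vectors plus typed, directed edges, for example `[(0, 1, "child"), (1, 2, "last_use")]`. The index of the largest probability is the recommended token class. GGNN implementations commonly also add a reverse edge type for each forward type so that information can flow in both directions.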
Keywords/Search Tags: Code Completion, Code Property Graphs, Data Dependence, Control Dependence, Graph Neural Network