
Research On Automatic Code Generation Approach Based On Tag-graph Embedding

Posted on: 2021-05-18; Degree: Master; Type: Thesis
Country: China; Candidate: H W Zhang
GTID: 2428330602964596; Subject: Engineering
Abstract/Summary:
With the rapid development of the information age, a wide variety of application software has entered people's work and daily life. Developing an application often costs programmers a great deal of time and effort, so automatic code generation technology emerged to improve the efficiency of software development. With existing techniques, however, generating executable program code from a natural language description remains a very challenging task. Artificial intelligence algorithms are now applied ever more widely, and deep learning in particular offers better options for code generation; many studies have applied deep learning frameworks to automatic code generation to improve the accuracy of the generated executable code. Because program code has a huge search space and a complex program structure, most existing models focus on searching the spatial or syntactic structure of the code while ignoring the global relationships among method calls. Moreover, the sequence learning models currently in wide use for automatic code generation suffer from long-term dependencies when faced with long natural language descriptions and long program code inputs, which reduces the effectiveness of their predictions.
To address these problems, this paper proposes a sequence learning model based on tag-graph embedding. Given a natural language description as input, the method combines tag-graph modeling with tag-graph embedding to fuse the global structure information of the program code into a tag graph and extract it as vector features of the graph's nodes. The model then generates program code from the natural language description by predicting these node vector features. This paper identifies the following challenges and proposes a solution to each:
(1) Insufficient representation of the global structure information of program code. A tag-graph modeling technique is proposed for this problem. All program code in the data set is analyzed in a unified way: the connections among code snippets are identified, the method names in each snippet are extracted, and the calling relationships between these method names are searched across the whole code base to build the representation. This establishes a tag graph of the entire program code base and improves the model's ability to express the global structure information of the program code.
(2) The lack of an effective means of embedding program structure information. A tag-graph embedding technique is used to solve this problem. Once the tag graph has been generated, the information of all its nodes is fused and node features are extracted, so that each node's feature incorporates the features of all its neighboring nodes. This preserves the timing of method calls and special constraint information, embeds the global structure information of the program more effectively, and thereby improves the accuracy of the subsequent sequence learning model.
(3) The long-term dependency problem that long program description statements cause in traditional sequence learning models. To this end, this paper proposes an attention mechanism and applies it to the sequence learning model to strengthen the model's memory of the input data and make its predictions more accurate.
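A minimal sketch of the tag-graph (label graph) modeling step in point (1), assuming the corpus consists of Python snippets: each function definition becomes a node, and every call inside its body adds a caller-to-callee edge, so a link between two snippets appears whenever one calls a method defined in the other. The names `build_tag_graph` and `_called_name` are hypothetical, and resolving calls by bare method name (ignoring receivers and imports) is a simplifying assumption, not the thesis's procedure.

```python
import ast


def _called_name(call):
    """Return the bare method name of a call: 'foo' for foo(...) or obj.foo(...)."""
    func = call.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return func.attr
    return None  # calls on subscripts, lambdas, etc. are skipped


def build_tag_graph(snippets):
    """Build a corpus-wide call graph: method name -> set of called method names.

    Every snippet is parsed; each function definition becomes a node and each
    call in its body (including nested functions) adds a caller -> callee edge.
    Callees are added as nodes too, even if no snippet defines them.
    """
    graph = {}
    for source in snippets:
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                callees = graph.setdefault(node.name, set())
                for inner in ast.walk(node):
                    if isinstance(inner, ast.Call):
                        name = _called_name(inner)
                        if name is not None:
                            callees.add(name)
                            graph.setdefault(name, set())
    return graph


corpus = [
    "def read_file(p):\n    return open(p).read()\n",
    "def count_words(p):\n    text = read_file(p)\n    return len(text.split())\n",
]
graph = build_tag_graph(corpus)
print(sorted(graph["count_words"]))  # ['len', 'read_file', 'split']
```

Here the edge from `count_words` to `read_file` joins two separate snippets, which is the kind of cross-snippet method-call relationship that a per-snippet syntax model would not see.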
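Point (2) can be read as message passing over the tag graph: each node's vector is mixed with the mean of its neighbours' vectors, so the result carries information from all adjacent methods. The sketch below assumes this mean-aggregation rule (a GraphSAGE-style choice swapped in for illustration, not necessarily the thesis's embedding), treats call edges as undirected, and takes the initial node vectors as given; `propagate`, `rounds`, and `alpha` are hypothetical names. It does not attempt to encode call order or the special constraints the abstract mentions.

```python
def propagate(graph, features, rounds=1, alpha=0.5):
    """Mean-aggregation message passing over an undirected view of a call graph.

    graph: node -> set of callee nodes (every node must also be a key of features).
    features: node -> list[float], all vectors of the same length.
    Each round, a node's new vector is alpha * own vector
    + (1 - alpha) * mean of its neighbours' vectors.
    """
    # Treat call edges as undirected so callers and callees inform each other.
    neighbours = {v: set() for v in features}
    for src, dsts in graph.items():
        for dst in dsts:
            neighbours[src].add(dst)
            neighbours[dst].add(src)

    h = {v: list(vec) for v, vec in features.items()}
    for _ in range(rounds):
        new_h = {}
        for v, vec in h.items():
            nbrs = neighbours[v]
            if not nbrs:
                new_h[v] = vec  # isolated node keeps its own features
                continue
            mean = [sum(h[u][i] for u in nbrs) / len(nbrs) for i in range(len(vec))]
            new_h[v] = [alpha * a + (1 - alpha) * b for a, b in zip(vec, mean)]
        h = new_h
    return h


graph = {"count_words": {"read_file"}, "read_file": set(), "len": set()}
features = {"count_words": [1.0, 0.0], "read_file": [0.0, 1.0], "len": [1.0, 1.0]}
out = propagate(graph, features)
print(out["count_words"])  # [0.5, 0.5]
```

Running more rounds lets information from methods two or more calls away reach a node, at the cost of smoothing neighbouring vectors toward each other.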
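The abstract says code is generated by predicting node vector features but does not state how a predicted vector is turned back into a method name. As an assumption for illustration, the sketch below maps each predicted vector to the tag-graph node whose embedding has the highest cosine similarity, so decoding reduces to a nearest-neighbour lookup. `cosine`, `nearest_node`, and `decode` are hypothetical helpers.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_node(predicted, node_vectors):
    """Return the node name whose embedding is most similar to the predicted vector."""
    return max(node_vectors, key=lambda name: cosine(predicted, node_vectors[name]))


def decode(predicted_sequence, node_vectors):
    """Map each vector in a decoder's output sequence to a method name."""
    return [nearest_node(vec, node_vectors) for vec in predicted_sequence]


node_vectors = {"read_file": [1.0, 0.0], "len": [0.0, 1.0]}
print(decode([[0.9, 0.1], [0.2, 0.8]], node_vectors))  # ['read_file', 'len']
```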
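For point (3), the abstract does not say which attention variant is used. Assuming dot-product (Luong-style) attention, one decoder step scores the decoder state against every encoder state, normalises the scores with a softmax, and takes the weighted sum of encoder states as a context vector. This lets the decoder look back at any input position directly instead of relying on a single fixed-length summary of a long description. `attend` is a hypothetical name.

```python
import math


def attend(decoder_state, encoder_states):
    """Dot-product attention for one decoder step; returns (weights, context).

    scores[i] = decoder_state . encoder_states[i]
    weights   = softmax(scores)
    context   = sum_i weights[i] * encoder_states[i]
    """
    scores = [sum(d * e for d, e in zip(decoder_state, enc)) for enc in encoder_states]
    m = max(scores)  # subtract the max before exp for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [x / total for x in exps]
    dim = len(decoder_state)
    context = [sum(w * enc[j] for w, enc in zip(weights, encoder_states)) for j in range(dim)]
    return weights, context


weights, context = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
print([round(w, 3) for w in weights])  # [0.422, 0.155, 0.422]
```

Encoder positions that align with the current decoder state receive larger weights, and the weights are recomputed at every decoding step.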
Keywords/Search Tags: Code Generation, Seq2Seq, Tag-Graph Embedding, Attention Mechanism