Research On Code Completion Based On Semantics And Syntax Combining Deep Learning

Posted on: 2022-03-31 | Degree: Master | Type: Thesis
Country: China | Candidate: S Q Fu
GTID: 2518306602975999 | Subject: Computer Science and Technology
Abstract/Summary:
As a method for generating programs automatically, code completion has received widespread attention from both academia and industry and is an important research topic in software engineering. Compared with traditional code completion techniques, deep-learning-based methods offer higher accuracy and more diverse forms of completion, and they have substantially changed automatic program generation. Existing deep-learning-based code completion techniques have achieved good results in program representation and model selection. However, they still have shortcomings: they cannot accurately predict out-of-vocabulary (OOV) words, and they lose long-distance dependency information. Moreover, existing research on representing and exploiting the semantic information in programs for code completion is insufficient, which limits completion accuracy.

To design a more effective code completion approach, this thesis studies these shortcomings and proposes solutions. By designing a word segmentation method for program identifiers and a better way to represent and remember program context, we propose BPE-TCN, a code completion method based on Byte Pair Encoding (BPE) and a Temporal Convolutional Network (TCN). To represent and predict OOV words, the model applies Byte Pair Encoding to the words that appear in a program, so that as many words as possible are covered by the vocabulary. To address the loss of dependency information over long distances, it uses a Temporal Convolutional Network as the encoder: dilated convolutions, whose filters skip over some input values, enlarge the model's receptive field so that it can draw on input information farther from the current step.

On this basis, this thesis defines a semantic information representation method by studying the relationship between a program's abstract syntax tree and its control dependencies, and proposes a semantic encoder structure that exploits this important semantic information to assist code completion. Combining it with the BPE-TCN method, this thesis proposes BTCN-LSTM, a code completion approach based on both semantics and syntax. To verify its effectiveness, we conducted an empirical study on a public Python dataset. The experimental results show that the BPE-TCN method improves accuracy by about 1.5% over existing methods. BTCN-LSTM, which combines semantic and syntactic information, improves accuracy further, for an overall improvement of about 5.2%.
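The abstract does not say exactly how BPE is applied to program identifiers. As a rough illustration of the general technique only, the minimal sketch below learns BPE merges from a toy list of Python identifiers and then segments an identifier that never appeared in that list. The toy corpus, the `</w>` end-of-word marker, and the helper names `learn_bpe`, `merge_pair`, and `segment` are illustrative assumptions, not the thesis's implementation.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]
END = "</w>"  # assumed end-of-word marker, as in common BPE implementations


def merge_pair(symbols: Tuple[str, ...], pair: Pair) -> Tuple[str, ...]:
    """Replace every adjacent occurrence of `pair` in `symbols` with the merged symbol."""
    out: List[str] = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words: Iterable[str], num_merges: int) -> List[Pair]:
    """Learn up to `num_merges` merge operations from a list of words (identifiers)."""
    # Each word starts as a sequence of single characters plus the end marker.
    vocab = Counter(tuple(w) + (END,) for w in words)
    merges: List[Pair] = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs: Counter = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Merge the most frequent pair (ties go to the pair counted first).
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab: Counter = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def segment(word: str, merges: List[Pair]) -> List[str]:
    """Split a (possibly unseen) word into subwords by replaying merges in learned order."""
    symbols = tuple(word) + (END,)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    corpus = ["get_name", "get_value", "get_item", "get_size"]  # toy data
    merges = learn_bpe(corpus, num_merges=3)
    print(segment("get_id", merges))  # ['get_', 'i', 'd', '</w>']
```

Because the prefix `get_` is frequent in the toy corpus, it becomes a single subword after three merges, so the unseen identifier `get_id` is represented as `get_`, `i`, `d` instead of a single unknown token. A practical vocabulary would be learned with many more merges over a large code corpus.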
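To make the receptive-field argument concrete, here is a minimal pure-Python sketch of a dilated causal 1-D convolution, the building block of a TCN. The helper names and the kernel convention (`kernel[i]` weights the input `i * dilation` steps back) are assumptions for illustration; the abstract does not give the thesis's layer configuration. For a plain stack with one convolution per level, kernel size k and dilations d_1, ..., d_L, the receptive field is 1 + (k - 1)(d_1 + ... + d_L); if each level holds two convolutions, as in the commonly used TCN residual block, the (k - 1) term is doubled.

```python
from typing import List, Sequence


def dilated_causal_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """y[t] = sum_i kernel[i] * x[t - i * dilation], with zero padding on the left.

    Only current and past inputs are used (causal), and the filter skips
    `dilation - 1` inputs between consecutive taps.
    """
    y: List[float] = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            src = t - i * dilation
            if src >= 0:  # positions before the sequence start count as zero
                acc += w * x[src]
        y.append(acc)
    return y


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field of a stack with one dilated causal convolution per dilation."""
    return 1 + (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    dilations = (1, 2, 4, 8)  # dilation doubles at each layer
    # Push a unit impulse at t = 0 through the stack with an all-ones kernel:
    # it reaches exactly the output steps whose input window contains t = 0.
    h = [1.0] + [0.0] * 19
    for d in dilations:
        h = dilated_causal_conv1d(h, [1.0, 1.0], d)
    print(receptive_field(2, dilations))  # 16
    print(h)  # sixteen 1.0 values followed by four 0.0 values
```

With only four layers of kernel size 2, each output step already sees 16 input steps. Doubling the dilation per layer makes this grow exponentially with depth, whereas an undilated stack of the same kernel size grows only linearly (L layers see L + 1 steps).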
Keywords/Search Tags: Code Completion, Byte Pair Encoding, Temporal Convolutional Network, Program Semantic Features