Research On Code Recommendation Based On Program Analysis And Neural Network Language Model

Posted on: 2019-06-17
Degree: Master
Type: Thesis
Country: China
Candidate: J N Zhang
GTID: 2438330548457840
Subject: Engineering
Abstract/Summary:
Large projects such as kernels, drivers, and third-party libraries each follow a code style and contain recurring patterns. In this thesis, we explore code recommendation based on natural language processing (NLP): the model takes the source file context as input, predicts the next token, and learns meaningful latent patterns. By representing code tokens as word vectors and applying NLP-based machine learning techniques, we can capture interesting patterns and predict code that the simple syntactic and semantic methods of traditional IDEs cannot. Our methods try to learn these grammar rules and patterns automatically.

Earlier work mostly targeted a specific language: first strongly typed languages such as Java and, more recently, weakly typed, dynamic languages such as JavaScript. We first build a prediction model that does not depend on the syntax or semantics of any particular language. It reaches an accuracy of 56.1% on the C code of the Linux kernel and 43.6% on Twisted, a networking library written in Python. We then take the features of Python into account, namely its weak typing and dynamic nature: we apply AST rules to a more authoritative open source data set to extract more representative tokens, pre-train with word2vec, and repeat the experiment. Accuracy improves over the previous experiment, reaching 56.3%.

The specific work is as follows:
1. Extract tokens in the NLP manner: simply remove the comments from the code and tokenize it directly. Construct word vectors as the neural network input and run the experiments. Evaluate the results with several important accuracy metrics and analyze some latent patterns.
2. Based on the characteristics of Python, choose a large open source data set and build ASTs to analyze the syntax and grammar of the code base. Then extract tokens that represent usage patterns, pre-train with word2vec, and test the model again.
3. Compare the two experiments. Accuracy improves after using the AST and word2vec. To explain the result in more detail, we measure the contribution of each context token to the prediction of the next token.
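The two token-extraction steps described in the abstract can be illustrated with a minimal sketch using only the Python standard library. This is not the thesis's implementation: the abstract does not specify its AST rules, so the node kinds kept in `ast_tokens`, the helper names `lexical_tokens`, `ast_tokens`, and `context_pairs`, and the context length are all assumptions made for illustration. The first experiment in the thesis was also language-agnostic, whereas this sketch relies on Python's own `tokenize` module and therefore only handles Python source.

```python
import ast
import io
import tokenize

# Token types dropped in this sketch: comments and pure layout tokens
# (an assumed filter, chosen for illustration).
_SKIP = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}


def lexical_tokens(source):
    """Step 1 (sketch): remove comments and tokenize the code directly."""
    readline = io.StringIO(source).readline
    return [tok.string for tok in tokenize.generate_tokens(readline)
            if tok.type not in _SKIP]


def ast_tokens(source):
    """Step 2 (sketch): walk the AST depth-first and keep a few node kinds.

    Which nodes to keep is a hypothetical rule set, not the thesis's rules.
    """
    tokens = []

    def visit(node):
        if isinstance(node, ast.Name):
            tokens.append("Name:" + node.id)
        elif isinstance(node, ast.Attribute):
            tokens.append("Attribute:" + node.attr)
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            tokens.append(type(node).__name__ + ":" + node.name)
        elif isinstance(node, ast.Call):
            tokens.append("Call")
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return tokens


def context_pairs(tokens, n):
    """Build (context, next_token) training pairs with a context of n tokens."""
    return [(tuple(tokens[i - n:i]), tokens[i]) for i in range(n, len(tokens))]


SAMPLE = "def f(x):\n    # add one\n    return g(x) + 1\n"

if __name__ == "__main__":
    print(lexical_tokens(SAMPLE))
    print(ast_tokens(SAMPLE))
    print(context_pairs(lexical_tokens(SAMPLE), 3)[:2])
```

In a pipeline like the one the abstract outlines, either token sequence would then be mapped to word vectors (pre-trained with word2vec in the second experiment) and fed to the neural network, with each pair from `context_pairs` serving as one next-token prediction example.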
Keywords/Search Tags: Big code, tokenization, program analysis, attention model, GRU, code recommendation