Research On Code Recommendation Based On Program Analysis And Neural Network Language Model

Posted on:2019-06-17

Degree:Master

Type:Thesis

Country:China

Candidate:J N Zhang

GTID:2438330548457840

Subject:Engineering

Abstract/Summary:

Large projects such as kernels, drivers, and third-party libraries each follow a code style and contain recurring patterns. In this thesis, we explore code recommendation based on natural language processing (NLP): the model takes the context in a source file as input, predicts the next token, and learns meaningful latent patterns. By representing code tokens as word vectors and applying NLP-based machine learning techniques, we can capture interesting patterns and predict code that the simple syntactic and semantic methods of traditional IDEs cannot. Our method tries to learn these grammar rules and patterns automatically.

Previous work mainly targeted a specific language, first strongly typed languages such as Java and more recently weakly typed, dynamic languages such as JavaScript. We first build a prediction model that does not depend on the syntax or semantics of any particular language. It reaches an accuracy of 56.1% on the C code of the Linux kernel and 43.6% on Twisted, a network library written in Python. We then take into account the weakly typed, dynamic nature of Python: we apply AST rules to a more authoritative open-source data set to extract more representative tokens, pre-train with word2vec, and repeat the experiment. Accuracy improves over the previous experiment, reaching 56.3%.

The specific work is as follows:

1. Extract tokens in the NLP manner: simply remove the comments from the code and tokenize it directly. Construct word vectors as the neural network input and run experiments. Evaluate the results with several important accuracy indicators and analyze some potential patterns.

2. Based on the characteristics of Python, choose a large open-source data set and build ASTs to analyze the syntax and grammar of the code base. Then extract tokens that represent usage patterns and pre-train with word2vec. Finally, test the model again.

3. Compare the two experiments: accuracy improves after using the AST and word2vec. To explain the result in more detail, we measure how much each token in the context contributes to predicting the next token.
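The two token-extraction strategies described in the abstract can be contrasted with a minimal sketch. This is not the thesis's actual pipeline: the function names `lexical_tokens` and `ast_tokens`, and the choice of which AST nodes become tokens, are assumptions made here for illustration. The sketch uses only Python's standard `tokenize` and `ast` modules and assumes Python 3.8 or later.

```python
import ast
import io
import tokenize
from typing import List

# Token types dropped in the lexical pass: comments and layout-only tokens.
_SKIP = {
    tokenize.COMMENT,
    tokenize.NL,
    tokenize.NEWLINE,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENDMARKER,
}


def lexical_tokens(source: str) -> List[str]:
    """First strategy (hypothetical helper): drop comments, tokenize directly."""
    reader = io.StringIO(source).readline
    return [
        tok.string
        for tok in tokenize.generate_tokens(reader)
        if tok.type not in _SKIP
    ]


def ast_tokens(source: str) -> List[str]:
    """Second strategy (hypothetical helper): walk the AST depth-first and
    emit identifiers plus statement/expression node type names.

    Which nodes count as 'representative' is an assumption of this sketch.
    """
    out: List[str] = []

    def visit(node: ast.AST) -> None:
        if isinstance(node, ast.Name):
            out.append(node.id)
        elif isinstance(node, ast.Attribute):
            # Emit the object first, then the attribute name, to keep source order.
            visit(node.value)
            out.append(node.attr)
            return
        elif isinstance(node, (ast.stmt, ast.expr)):
            out.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return out


if __name__ == "__main__":
    sample = "x = foo(1)  # call\n"
    print(lexical_tokens(sample))
    print(ast_tokens(sample))
```

In this sketch the lexical pass keeps every surface token, including punctuation, while the AST pass replaces punctuation with structural labels such as `Assign` and `Call`. Either token sequence could then be mapped to word vectors, for example by word2vec pre-training, before being fed to the prediction model.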

Keywords/Search Tags:

Big code, tokenize, program analysis, attention model, GRU, code recommendation

Related items

1	Research On Search Based Code Recommendation Techniques
2	A Code Automatic Evaluation Method For Students'Program Based On Code Format And Semantic Features
3	Research On Code Recommendation And Comment Generation With Context Information
4	Code Recommendation Of Student Program Based On Data Driver
5	Research On A Code Recommendation Tool For Big Code
6	Research On Program Code Classification Based On Deep Learning
7	The Research And Development Of Code Generation Based On The Analysis Of Semantic
8	Research On Code Block Completed Recommendation Algorithm Based On Variance Code Clone Search
9	Research On Code Snippet Recommendation Method Based On Code Statement Granularity Representation
10	JavaScript Code Recommendation Based On Program Analysis And Machine Learning