Research Of Code Assist Technology Based On Statistical Language Model And Static Analysis

Posted on: 2018-06-01; Degree: Master; Type: Thesis
Country: China; Candidate: J M Jiang; Full Text: PDF
GTID: 2348330515997930; Subject: Software engineering
Abstract/Summary:
As software systems grow larger, developers routinely rely on mature frameworks and libraries to improve development efficiency and quality. However, learning a massive number of APIs is hard even for experienced programmers. Researchers have proposed many techniques for code assist systems that help developers use these APIs, but most current code assist systems have shortcomings. Some researchers consider the majority of software to be "natural" and apply natural language processing methods to code assist. Natural language processing models have the advantage of efficiency. However, code differs from natural language in that it carries structural information. To exploit this structural information, other researchers have proposed graph-based models with high accuracy. Their methods have two drawbacks: the graphs that store the structural information take too much disk space, and the methods rely on graph matching, which makes them slow.

To combine the advantages of the natural language model and the graph-based model, we turn graphs into sequences. A control flow graph contains some of the structural information of a program. By extracting all method call sequences in a substructure of the control flow graph, we can represent the local structural information of the control flow graph as method call sequences. In this way, we can apply a statistical language model to method call sequences, in analogy with natural language.

Based on the above analysis, we propose a Program control flow graph based N-gram model (Pro-N-gram model) and implement a code assist system as an Eclipse plugin. First, we develop a technique to generate program-based N-grams of API methods (Pro-n-grams). By enumerating all statement types, we construct the control flow graph at the statement level. We then parse the statements that contain complicated method calls to generate the control flow graph at the method level. Next, we replace the method nodes in the control flow graph with their FQN (fully qualified name), resolved by the PPA (Partial Program Analysis) tool. We develop a depth-first search method to find all Pro-n-grams in the final method-level control flow graph and count them. Finally, considering the difference between a control flow graph and natural language, we apply a re-computation technique to ensure that the Pro-n-gram counts are consistent with a language model, and we propose the Program control flow graph based N-gram model (Pro-N-gram model), which can predict the method that fills a hole given multiple contexts.

The innovation highlights and main contributions are as follows. First, by using PPA to resolve FQNs, we eliminate the ambiguity between methods that share a name but come from different packages, and we prevent the method name dictionary from growing too large. Second, we propose a method to generate a method-level control flow graph and extract N-grams of method call sequences from it. These sequences summarize all possible execution sequences and present the structural information of the control flow in the form of sequences. Third, we propose the Pro-N-gram model by combining the statistical language model with static analysis. Using the N-grams of method calls generated from the control flow graph, and inspired by the N-gram language model, we arrive at the Pro-N-gram code assist model, which can predict the method name that fills a hole given multiple contexts.
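
The extraction and prediction steps described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the thesis's implementation: the function names `extract_ngrams` and `predict`, the dictionary representation of the method-level control flow graph, and the example graph itself are hypothetical. The score is a plain relative frequency summed over contexts; the re-computation technique mentioned above is not specified in the abstract and is not reproduced here.

```python
from collections import Counter, defaultdict


def extract_ngrams(cfg, labels, n):
    """Count every length-n method-call sequence along paths of the CFG.

    cfg maps a node id to its successor ids; labels maps a node id to the
    fully qualified name of the API method called at that node.
    (Both structures are assumptions made for this sketch.)
    """
    counts = Counter()
    nodes = set(cfg) | {s for succs in cfg.values() for s in succs}

    def dfs(node, path):
        path = path + (labels[node],)
        if len(path) == n:  # the depth bound also stops loops in the CFG
            counts[path] += 1
            return
        for succ in cfg.get(node, ()):
            dfs(succ, path)

    for start in nodes:  # a sequence may start at any node
        dfs(start, ())
    return counts


def predict(counts, contexts):
    """Rank candidate methods for a hole, given several preceding contexts.

    Each context is a tuple of n-1 method names. The score of a candidate is
    the sum over contexts of count(context + candidate) / count(context + *),
    an illustrative choice rather than the thesis's formula.
    """
    scores = defaultdict(float)
    for ctx in contexts:
        ctx = tuple(ctx)
        followers = {g[-1]: c for g, c in counts.items() if g[:-1] == ctx}
        total = sum(followers.values())
        for method, c in followers.items():
            scores[method] += c / total
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical method-level CFG: the file is created only if it does not exist.
labels = {
    0: "java.io.File.<init>",
    1: "java.io.File.exists",
    2: "java.io.File.createNewFile",
    3: "java.io.FileReader.<init>",
    4: "java.io.FileReader.read",
    5: "java.io.FileReader.close",
}
cfg = {0: [1], 1: [2, 3], 2: [3], 3: [4], 4: [5], 5: []}

bigrams = extract_ngrams(cfg, labels, 2)
ranked = predict(
    bigrams,
    [("java.io.File.exists",), ("java.io.File.createNewFile",)],
)
print(ranked[0][0])  # java.io.FileReader.<init>
```

Here the hole is reached along two paths, one through `exists` and one through `createNewFile`, so both contexts contribute to the score. Bounding the search depth at n keeps the enumeration finite even when the control flow graph contains loops.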
Keywords/Search Tags:Code Assist, Recommendation of API Elements, Statistic Language Model, Static Analysis, Control Flow Graph, Pro-N-gram