
Research And Implementation Of Automatic Code Summarization And Retrieval Technology For Open Source Reuse

Posted on: 2020-07-07
Degree: Master
Type: Thesis
Country: China
Candidate: B H Liu
GTID: 2518306548495884
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, with the rapid development of the open source ecosystem, a large number of open source software resources have appeared. These resources contain many high-quality code fragments and derived artifacts, such as code summaries and documentation, all of which are valuable. However, mining these high-quality resources from the massive body of open source software so that they can be reused in future software engineering still faces many challenges. This thesis argues that two things matter most in the big-data open source world: understanding code and locating code. It therefore presents the following research:

1. Static call dependency extraction based on the abstract syntax tree. The detailed implementation of a piece of code is inseparable from its call dependencies on other code snippets, and the call dependencies within a project are very important for understanding the project's overall function. This thesis therefore applies static scanning on top of the abstract syntax tree and designs a tool that automatically extracts the multi-level call dependencies of code. A statistical analysis of the call dependencies across different projects is then used to characterize call dependencies in Java projects. Finally, the thesis explains how call dependency sequences can aid code understanding.

2. Code summary generation based on call dependencies. Automatic code summarization aims to generate natural language summaries that describe the function of code fragments. In the big-data setting, machine learning and natural language translation techniques are used to generate code summaries. Recent research in this field optimizes the machine learning translation model by analyzing the structural and semantic features of code, making it better suited to translating code into summaries. However, these studies consider only the features within individual code snippets and ignore the call dependencies of code. Building on the self-developed call dependency extraction tool, this thesis integrates the call dependencies of code into the summary generation method. Experiments on large-scale data show that, compared with the latest code summary generation approaches, the proposed method improves the quality of the generated summaries.

3. Code retrieval based on code tags. Code retrieval aims to quickly and accurately locate relevant code fragments in massive code resources according to functional requirements. In current code retrieval research, semantic code retrieval engines based on machine learning have achieved great success. Following the idea of the code summary generation model, this thesis designs a code tag generation model for semantic code retrieval. Experiments show that code tags, as keywords of code semantics, are generated with high accuracy. A search engine based on code tags is compared with a current semantic code search engine, and the results show that the proposed method performs better.

Of these contributions, the static call dependency extraction tool based on the abstract syntax tree, together with the code summary generation model built on the call dependencies it extracts, helps developers understand code. The code retrieval technology based on code tags addresses the problem of locating code.
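
As a rough illustration of what multi-level call dependency extraction over an abstract syntax tree can look like, the following minimal sketch uses Python's standard `ast` module. It is not the tool described in the thesis, which targets Java projects; the sample source and the function names `direct_calls` and `multi_level_calls` are hypothetical and chosen only for this example. The sketch resolves only calls made through a plain name (such as `parse(...)`) and ignores method calls made through an attribute (such as `text.split()`).

```python
import ast

# Hypothetical sample source, used only to demonstrate the idea.
SOURCE = '''
def load(path):
    return open(path).read()

def parse(text):
    return text.split()

def summarize(path):
    words = parse(load(path))
    return len(words)
'''

def direct_calls(source):
    """Map each top-level function to the names it calls directly."""
    tree = ast.parse(source)
    calls = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            callees = set()
            for sub in ast.walk(node):
                # Only calls through a bare name are recorded (assumption).
                if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name):
                    callees.add(sub.func.id)
            calls[node.name] = callees
    return calls

def multi_level_calls(calls, root, max_depth=2):
    """Expand call dependencies level by level, starting from root."""
    levels = {}
    frontier = {root}
    seen = {root}
    for depth in range(1, max_depth + 1):
        reached = set()
        for fn in frontier:
            reached |= calls.get(fn, set())
        reached -= seen  # skip names already visited to avoid cycles
        if not reached:
            break
        levels[depth] = reached
        seen |= reached
        frontier = reached
    return levels

if __name__ == "__main__":
    graph = direct_calls(SOURCE)
    for depth, names in multi_level_calls(graph, "summarize").items():
        print(depth, sorted(names))
```

In this sketch, the first level lists the functions that `summarize` calls directly and the second level lists what those callees call in turn. A sequence built from such levels is one plausible way to supply call context to a summarization model, although the thesis abstract does not specify the exact representation it uses.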
Keywords/Search Tags: Open Source Software, Call Dependency, Code Summarization, Code Search, Artificial Intelligence