
Research And Implementation On Open Source Code Annotation And Retrieval Technology Based On Crowdsourcing Intelligence

Posted on: 2019-12-04
Degree: Master
Type: Thesis
Country: China
Candidate: J Yu
GTID: 2428330611993369
Subject: Computer Science and Technology
Abstract/Summary:
As software systems grow in scale and complexity, software development involves a great deal of repeated work. This repetition occurs in the analysis, design, coding, testing, and other stages of the development process. To reduce the cost of software development and improve its efficiency, software therefore needs to be reused at different levels. By degree of abstraction, reuse can be divided into code reuse, design reuse, analysis reuse, and testing reuse, of which code reuse is the most common in software development. At the same time, the continued growth of the open source community has provided massive resources for software reuse, which developers can exploit to improve development efficiency and reduce development cost. To reuse these resources during development, developers must first find them through retrieval. However, because there is a large semantic gap between natural language and code, existing code retrieval engines are generally ineffective. A fast and accurate code retrieval engine is therefore urgently needed to help developers locate code resources quickly. Drawing on the large body of reusable code in the GitHub open source community, this thesis studies two retrieval algorithms and designs and implements a prototype system for code annotation and retrieval based on crowd intelligence.

First, to better understand user intent, we propose a code retrieval algorithm based on expanding the user's natural language query. We train a word vector model through machine learning, identify the key points of the user's search through keyword extraction and word analysis, expand the user's query keywords, and then run code retrieval with the expanded query, so as to improve the recall and accuracy of code retrieval.

Second, to bridge the semantic gap between natural language and code, we propose a code retrieval algorithm based on code annotation. We first collect high-quality code annotations for software projects from the Codepedia platform and associate the annotations with code fragments. By extracting keywords from the annotations and matching them against the keywords in the user's query, the algorithm captures user intent and narrows the semantic gap, improving the recall and accuracy of retrieval. The Elasticsearch search engine is then used to improve retrieval efficiency.

Third, on the platform side, we build a code retrieval and reuse system based on crowd intelligence. Annotation work is distributed as crowdsourcing tasks in which developers and students participate: while reading high-quality open source code, they add high-quality annotations to it. To support this, we establish a set of mechanisms, including a user guidance mechanism, a code annotation mechanism, a user feedback mechanism, and an incentive mechanism.
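
The two retrieval ideas described in the abstract, query expansion through word vectors and matching query keywords against annotation keywords, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the thesis's implementation: the toy `VECTORS` table stands in for a trained word vector model, `SNIPPETS` stands in for annotated code collected from Codepedia, the similarity threshold is arbitrary, and an in-memory keyword overlap replaces the Elasticsearch index.

```python
import math

# Hypothetical toy word vectors; a real system would train these on a corpus.
VECTORS = {
    "sort":   [0.9, 0.1, 0.0],
    "order":  [0.8, 0.2, 0.1],
    "list":   [0.1, 0.9, 0.0],
    "array":  [0.2, 0.8, 0.1],
    "socket": [0.0, 0.1, 0.9],
}

# Hypothetical annotated snippets; annotations are plain natural language.
SNIPPETS = [
    {"code": "sorted(items)", "annotation": "order an array of items ascending"},
    {"code": "sock.connect(addr)", "annotation": "open a socket connection"},
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def expand_query(query, threshold=0.95):
    """Add every known word whose vector is close to a query term."""
    terms = query.lower().split()
    expanded = set(terms)
    for term in terms:
        if term not in VECTORS:
            continue  # unknown words are kept but cannot be expanded
        for other, vec in VECTORS.items():
            if cosine(VECTORS[term], vec) >= threshold:
                expanded.add(other)
    return expanded


def search(query):
    """Rank snippets by overlap between expanded query and annotation words."""
    terms = expand_query(query)
    scored = []
    for snippet in SNIPPETS:
        keywords = set(snippet["annotation"].lower().split())
        score = len(terms & keywords)
        if score:
            scored.append((score, snippet["code"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [code for _, code in scored]


if __name__ == "__main__":
    print(expand_query("sort list"))
    print(search("sort list"))
```

In this toy data the literal query "sort list" shares no word with either annotation, so the first snippet is found only because expansion adds "order" and "array". That is the recall gain the expansion step is meant to provide.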
Keywords/Search Tags: Software Reuse, Crowd Intelligence, Keyword Expansion, Code Annotation, Code Retrieval