Research And Implementation On Open Source Code Annotation And Retrieval Technology Based On Crowdsourcing Intelligence

Posted on:2019-12-04

Degree:Master

Type:Thesis

Country:China

Candidate:J Yu

Full Text:PDF

GTID:2428330611993369

Subject:Computer Science and Technology

Abstract/Summary:

As the scale and complexity of software systems increase day by day, a great deal of work in software development is repeated. This repetition occurs in analysis, design, coding, testing, and other stages of the development process. To reduce the cost of software development and improve its efficiency, software therefore needs to be reused at different levels. By degree of abstraction, reuse can be divided into code reuse, design reuse, analysis reuse, and testing reuse, of which code reuse is the most common in software development. At the same time, the continuing growth of the open source community has provided massive resources for software reuse, and making full use of these resources can improve development efficiency and reduce development cost. To reuse existing work during development, developers find the reusable resources they want through retrieval. However, because there is a huge semantic gap between natural language and code, existing code retrieval engines are generally ineffective. A fast and accurate code retrieval engine is therefore urgently needed to help developers locate code resources quickly. Based on the large number of reusable code resources in the GitHub open source community, this thesis studies two retrieval algorithms and designs and implements a prototype system for code annotation and retrieval based on crowd intelligence.

First, to better understand user intent, we propose a code retrieval algorithm based on expanding the user's natural language query. We train a word vector model through machine learning, identify the key points of the user's search through keyword extraction and word analysis, expand the user's query keywords, and then use the expanded query to carry out code retrieval, so as to improve the recall and accuracy of code retrieval.

Second, to bridge the semantic gap between natural language and code, we propose a code retrieval algorithm based on code annotation. We use the Codepedia platform to collect high-quality code annotations for software projects and associate these annotations with code fragments. Keywords extracted from the annotations are matched against the keywords in the user's query in order to understand user intent, bridge the semantic gap, and improve the recall and accuracy of retrieval. The Elasticsearch search engine is then used to improve the efficiency of code retrieval.

Third, at the platform level, we build a code retrieval and reuse system based on crowd intelligence. The system distributes its work in the form of crowdsourcing tasks so that developers and students can take part: while reading high-quality open source code, participants add high-quality annotations to it. To support this, we set up a series of mechanisms, including a user guidance mechanism, a code annotation mechanism, a user feedback mechanism, and an incentive mechanism.
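
The two retrieval ideas in the abstract, query expansion with word vectors and matching against annotation keywords, can be illustrated with a minimal Python sketch. It is not the thesis implementation: the word vectors, stopword list, snippet ids, annotations, and similarity threshold below are all hypothetical, and a plain in-memory keyword overlap stands in for the trained word vector model and the Elasticsearch index that the abstract mentions.

```python
import math
import re

# Hypothetical toy word vectors; the thesis trains a real word vector model.
TOY_VECTORS = {
    "read": (0.9, 0.1, 0.0),
    "load": (0.8, 0.2, 0.1),
    "file": (0.1, 0.9, 0.0),
    "document": (0.2, 0.8, 0.1),
    "sort": (0.0, 0.1, 0.9),
}

# Hypothetical crowd-written annotations attached to code snippets.
ANNOTATIONS = [
    ("snippet_a", "Load a document from disk into memory"),
    ("snippet_b", "Sort a list of integers in place"),
]

# Small illustrative stopword list (an assumption, not from the thesis).
STOPWORDS = {"a", "an", "the", "of", "in", "into", "from", "to", "how"}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def keywords(text):
    """Lowercase the text, split it into words, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def expand_query(query, threshold=0.9):
    """Add every vocabulary word whose vector is close to a query keyword."""
    terms = set(keywords(query))
    expanded = set(terms)
    for term in terms:
        if term not in TOY_VECTORS:
            continue
        for other, vec in TOY_VECTORS.items():
            if cosine(TOY_VECTORS[term], vec) >= threshold:
                expanded.add(other)
    return expanded


def search(query):
    """Rank snippets by how many expanded query terms their annotation shares."""
    terms = expand_query(query)
    scored = []
    for snippet_id, annotation in ANNOTATIONS:
        overlap = len(terms & set(keywords(annotation)))
        if overlap:
            scored.append((overlap, snippet_id))
    return [sid for _, sid in sorted(scored, key=lambda p: (-p[0], p[1]))]


if __name__ == "__main__":
    print(sorted(expand_query("how to read a file")))
    print(search("how to read a file"))
```

In this toy setup the query "how to read a file" shares no literal keyword with the annotation "Load a document from disk into memory", so unexpanded keyword matching would miss it. The expanded query adds "load" and "document", which lets the annotation match.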

Keywords/Search Tags:

Software Reuse, Crowd Intelligence, Keyword Expansion, Code Annotation, Code Retrieval

Related items

1	Research And Implementation Of Automatic Code Summarization And Retrieval Technology For Open Source Reuse
2	Research On The Attack And Defense Techniques Of Code Reuse
3	Change-History-based Automatically Fixing Of Code Internal Quality Issues
4	Research Of Annotation Positioning Technology In The Process Of Code Evolution In Git Repertory
5	A Research On Code-reuse Attacks And Detection Techniques
6	Research On Key Techniques Of Software Binary Code Reuse
7	A Code Description Semantics Vector Based Java Code Search
8	Research On Code Reuse Attack Protection Technique Based On Virtual Machine Monitor
9	A centralized object-relational database-based code services retrieval system tool for software reuse
10	Facilitating internet-scale code retrieval