
Study And Implementation Of Multidimensional Open Source Crowdsourcing Code Annotation Evaluation Method

Posted on: 2021-10-09
Degree: Master
Type: Thesis
Country: China
Candidate: R M Wang
Full Text: PDF
GTID: 2518306548494654
Subject: Software engineering
Abstract/Summary:
As a product of collaborative group innovation, the rapid development of open source software projects has accumulated massive high-quality resources, providing a solid foundation for learning and practicing software innovation. However, the rapid iteration and evolution of these projects also bring challenges to the retrieval and reuse of project resources. Most search engines and open source communities currently retrieve open source code through keyword search, but the keywords are mostly drawn from the code itself. As a result, when developers do not know how to implement a method, they cannot search for the snippet they want. If the quality of a project's comments is low, the comments offer little help, and users must spend a great deal of time analyzing the code themselves. Effectively evaluating and improving code comment quality is therefore an important way to improve code reuse, development efficiency, and software retrieval efficiency.

Based on excellent open source projects on GitHub and their code comments, we construct a comment quality evaluation method that combines code structure and comment semantics. In addition, we built CodePedia, an online code annotation platform, and organized a large-scale code annotation competition on it; through a carefully designed scoring mechanism, the quality evaluation method was used to assess the comments produced in the competition. The main contributions of this paper include the following three aspects.

First, a comment importance evaluation method based on code structure features. Relying on the well-defined structure and semantics of code, the method extracts code structure features and code semantic features from the context of the current line of code as the main basis for judging comment importance, and uses them to train a comment importance evaluation model.

Second, a multidimensional crowdsourced comment quality evaluation method based on code semantics, covering the readability, completeness, and accuracy of code comments. Accuracy is assessed with keywords extracted from the comments together with code syntax analysis. For readability, we build an N-gram language model and derive a readability formula from its perplexity. For completeness, we extract the keywords expected for each comment type according to the conventions of that type.

Third, on the application side, we built CodePedia, a reusable code retrieval system based on group intelligence. Through careful design of the game content, competition stages, and rating mechanism, we successfully hosted a national open source code annotation contest on the CodePedia platform. In addition, we designed an annotation processing pipeline based on clone detection and useless-comment filtering, and integrated expert scoring with the comment importance and comment quality evaluation methods.
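The first contribution trains a model on structural and semantic features drawn from the context of a code line to estimate how much that line needs a comment. The sketch below illustrates only the general idea under stated assumptions: the feature set and the logistic-regression classifier are hypothetical stand-ins, since the abstract does not specify the exact features or model used in the thesis.

```python
# Hypothetical sketch of a structure-based comment importance model.
# Features and classifier are illustrative assumptions, not the thesis's method.
import re
from sklearn.linear_model import LogisticRegression

CONTROL = re.compile(r"\b(if|for|while|switch|try|catch)\b")

def structural_features(line, context):
    """Simple structural features of a code line and its surrounding context."""
    return [
        len(line),                                    # line length
        line.count("(") + line.count("{"),            # syntactic density
        len(line) - len(line.lstrip()),               # indentation as a nesting proxy
        1 if CONTROL.search(line) else 0,             # control-flow statement on this line
        sum(1 for c in context if CONTROL.search(c)), # control flow in the context window
    ]

# Training data: (line, context window, label) triples, where the label marks
# whether the line deserves a comment; labels would come from well-commented code.
samples = [
    ("if (user == null) {", ["return cache.get(key);", "}"], 1),
    ("int i = 0;", ["for (int j = 0; j < n; j++) {"], 0),
]
X = [structural_features(line, ctx) for line, ctx, _ in samples]
y = [label for _, _, label in samples]

model = LogisticRegression().fit(X, y)
print(model.predict([structural_features("while (!queue.isEmpty()) {", [])]))
```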
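The readability dimension is derived from the perplexity of an N-gram language model over the comment text. A minimal sketch of this idea follows, assuming a bigram model with add-k smoothing and an illustrative mapping from perplexity to a score in (0, 1]; the thesis's exact formula, training corpus, and smoothing are not given in the abstract, so the functions train_bigram, perplexity, and readability below are hypothetical.

```python
import math
from collections import Counter

def train_bigram(corpus_sentences):
    """Count unigrams and bigrams over a list of tokenised comment sentences."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in corpus_sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def perplexity(tokens, unigrams, bigrams, vocab_size, k=1.0):
    """Perplexity of one comment under the bigram model with add-k smoothing."""
    padded = ["<s>"] + tokens + ["</s>"]
    log_prob = 0.0
    for prev, cur in zip(padded, padded[1:]):
        p = (bigrams[(prev, cur)] + k) / (unigrams[prev] + k * vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(padded) - 1))

def readability(tokens, unigrams, bigrams, vocab_size):
    """Map perplexity to (0, 1]: lower perplexity means a more readable comment."""
    return 1.0 / (1.0 + math.log(perplexity(tokens, unigrams, bigrams, vocab_size)))

# Example: train on existing high-quality comments, then score a new one.
corpus = [["returns", "the", "sum", "of", "two", "integers"],
          ["checks", "whether", "the", "given", "list", "is", "empty"]]
uni, bi = train_bigram(corpus)
vocab = len(set(t for s in corpus for t in s)) + 2  # include <s> and </s>
print(readability(["returns", "the", "sum"], uni, bi, vocab))
```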
Keywords/Search Tags:Code Comment, Code Comment Quality, Open Source, Crowdsourcing