Research On Contribution Review Of Crowdsourcing Collaborative Development

Posted on: 2018-03-11
Degree: Master
Type: Thesis
Country: China
Candidate: Z X Li
GTID: 2428330623450963
Subject: Computer Science and Technology
Abstract/Summary:
On GitHub, an increasing number of projects use the pull-based collaborative development model. This model has inspired the creativity and enthusiasm of crowd contributors: any developer can contribute code to a project they are interested in by submitting a pull request (PR) to the project's main repository. Because the programming skills of crowd contributors are uneven, each PR must go through a rigorous review process before it is accepted, in order to ensure the quality of the contributed code. Unlike the traditional code review model, contribution review in GitHub, which is integrated with the development process of PRs, is a socialized and lightweight review model. It is more transparent, open and free, and any community member can participate in the review and provide feedback. These characteristics facilitate interaction among participants during the review process and improve reviewers' efficiency, which has led to the model being widely adopted by GitHub projects. However, this kind of contribution review model also has problems. Because it lacks a unified coordination mechanism and strict execution steps and norms, the efficiency and quality of contribution review in GitHub are difficult to guarantee. Exploring efficient and reliable crowd contribution review technologies is therefore critical for collaborative software development. Based on crowd contribution data from GitHub, this thesis studies the efficiency and quality of crowd contribution review. The main work and contributions are summarized as follows:

Firstly, in terms of duplication in crowd contribution review, the problem of submitting duplicate PRs is first raised and an automated detection method is provided. For a newly submitted PR, we extract its textual information and change information, calculate the similarities between it and historical PRs, and return a list of the PRs most similar to it. With this list we expect to prevent repeated review of duplicate contributions and thereby avoid redundant review effort.

Secondly, in terms of the thoroughness of crowd contribution review, a model is proposed to automatically identify the review topics covered by the review process of a PR. We first construct a two-level hierarchical taxonomy, according to which we label a high-quality dataset of review comments. Our proposed two-stage hybrid classification model, trained on this dataset, can then automatically identify the review topics involved in each review process. This method shows what types of review a PR has undergone and helps reviewers determine whether the review process of a PR is thorough and make more targeted review comments.

Thirdly, in terms of the compatibility of crowd contribution review, we propose recommending cross-project reviewers for crowd contributions. First, we build a software relation database using the tagging data in Stack Overflow and the reference data among contributions in GitHub. Based on this relation database, we recommend the core members and active contributors of related projects to participate in reviewing the crowd contributions of a project. As a result, reviewers can understand the expectations of various stakeholders more broadly and make decisions that are more compatible with their needs.
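The duplicate-detection step in the first contribution (extract textual and change information, score a new PR against historical PRs, return the most similar ones) can be illustrated with a minimal sketch. The abstract does not state which similarity measures the thesis uses, so everything concrete here is an assumption made for illustration: cosine similarity over term counts for the text, Jaccard overlap of changed file paths for the change information, the equal weighting, and the hypothetical names `rank_similar_prs`, `text_weight`, and the `title`/`body`/`files`/`number` fields.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def jaccard(x, y):
    """Jaccard overlap between two collections of changed file paths."""
    x, y = set(x), set(y)
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def rank_similar_prs(new_pr, history, top_k=3, text_weight=0.5):
    """Return up to top_k (pr_number, score) pairs, most similar first.

    Assumption: the score is a weighted sum of text similarity and
    changed-file overlap; the thesis may combine these differently.
    """
    new_vec = Counter(tokenize(new_pr["title"] + " " + new_pr["body"]))
    scored = []
    for pr in history:
        vec = Counter(tokenize(pr["title"] + " " + pr["body"]))
        score = text_weight * cosine(new_vec, vec) + (
            1 - text_weight
        ) * jaccard(new_pr["files"], pr["files"])
        scored.append((score, pr["number"]))
    # Highest score first; ties are broken by the lower PR number.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [(number, round(score, 3)) for score, number in scored[:top_k]]
```

A reviewer-facing tool could call `rank_similar_prs(new_pr, history)` when a PR is opened and show the returned list as candidate duplicates. Whether a candidate is a true duplicate is still left to a human in this sketch.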
Keywords/Search Tags:Crowd collaborative development, Contribution review, Duplicate contribution detection, Review topic identification, Cross-project reviewer recommendation