Empirical Study On The Theories And Mechanisms Of Crowd-based Development For Open Source Communities

Posted on:2017-08-27

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Y Yu

GTID:1318330536467205

Subject:Software engineering

Abstract/Summary:

Open source has achieved great success through vigorous development over the last few decades. Compared with the traditional software development model, it represents a novel paradigm, called crowd-based software development, supported by the Internet-based computing environment. By integrating the pull-request mechanism with social media, developers can freely follow the open source projects they are interested in, submit contributions to any repository, and discuss the quality of code contributed by others. The large number of external contributors has demonstrated the power of this model, which opens a new era of decentralized software development. However, the high volume of incoming contributions from the crowd poses a serious challenge to project integrators in open source communities, who urgently need an effective and efficient aggregation process. In this thesis, we focus on research questions related to development productivity and software quality in the new context of crowd-based software development. By analyzing big data mined from GitHub, we seek a deeper understanding of the mechanisms of crowd collaboration and search for best practices in open source.

Firstly, to analyze the determinants of the pull-based development model, we build comprehensive models of contribution acceptance and evaluation latency using multi-level mixed-effects regression, based on novel metrics at 5 different levels: project maturity, technical factors of the code patch, social aspects, the evaluation workflow, and the continuous integration process. Our empirical findings in this part can help contributors and project integrators organize the pull-request process more effectively. This part also lays the foundation for our subsequent research.

Secondly, we conduct a quantitative study using large-scale historical data on process metrics and outcomes to discern the effects of one specific innovation in process automation: continuous integration (CI). We build separate models for core developers and external contributors using zero-inflated negative binomial regression, and investigate how CI affects team productivity and software quality. The empirical results show that, after adopting a CI service, a project could merge 20.5% more pull-requests submitted by core developers, and the rejection ratios of core developers and external contributors decrease by 42.3% and 26.1%, respectively. Thus, continuous integration can improve the productivity of project teams, who can integrate more code contributions. Meanwhile, CI helps core developers detect more bugs during development (a 48% increase), without an observable decrease in user-experienced quality.

Thirdly, we undertake a large-scale, fine-resolution study of the CI-enabled pull-request mechanism to better understand the nature of CI, the predictors of CI failures, and the relationship between CI failures and the eventual quality of code changes. First, we find that CI failures appear to be concentrated in a few files, just like normal bugs; in practice, projects and developers are likely to gain tangible improvements in CI productivity by focusing attention on these files. Second, a mature CI process, for example one with more customization and more test suites, is associated with better fault detection during the CI stage. Third, the use of CI in a pull-request does not necessarily mean that the code in that request is of good quality: code originating in pull-requests with initial CI failures (even if later repaired) has higher odds of being associated with an eventual quality problem. This finding suggests that code in such pull-requests warrants extra scrutiny.

Fourthly, we study reviewer recommendation in the new context of the pull-based development model, which facilitates the evaluation process by crowdsourcing it to a larger community of users. Our empirical studies first confirm that three traditional approaches from bug triaging and code review are feasible for recommending pull-request reviewers on GitHub. Furthermore, we propose a novel recommendation approach that mines comment networks to capture the common interests in social activities between contributors and reviewers. Finally, we combine the expertise factor with common interest to recommend appropriate reviewers for pull-requests; this mixed approach achieves the best recommendation performance.
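The mixed reviewer-recommendation idea (common interest from a comment network combined with an expertise factor) can be illustrated with a minimal sketch. This is not the thesis's implementation: the comment-count interest score, the file-overlap (Jaccard) expertise score, the linear weight `alpha`, and every function name here are hypothetical choices made only to show how two such signals might be mixed into one ranking.

```python
from collections import defaultdict


def build_comment_network(comments):
    """Hypothetical comment network: network[author][commenter] counts how many
    comments `commenter` has left on pull-requests opened by `author`."""
    network = defaultdict(lambda: defaultdict(int))
    for pr_author, commenter in comments:
        if pr_author != commenter:  # self-comments say nothing about shared interest
            network[pr_author][commenter] += 1
    return network


def interest_scores(network, author):
    """Assumed common-interest score in [0, 1]: each commenter's count divided
    by the count of the author's most frequent commenter."""
    edges = network.get(author, {})
    top = max(edges.values(), default=0)
    return {r: n / top for r, n in edges.items()} if top else {}


def expertise_scores(review_history, changed_files):
    """Assumed expertise score: Jaccard overlap between the files touched by
    the new pull-request and the files each reviewer has reviewed before."""
    new = set(changed_files)
    scores = {}
    for reviewer, files in review_history.items():
        union = new | files
        scores[reviewer] = len(new & files) / len(union) if union else 0.0
    return scores


def recommend(author, changed_files, network, review_history, alpha=0.5, k=3):
    """Rank candidates by alpha * expertise + (1 - alpha) * interest.
    `alpha` is a hypothetical weight, not a value reported in the thesis.
    Ties are broken alphabetically; the PR author is never recommended."""
    interest = interest_scores(network, author)
    expertise = expertise_scores(review_history, changed_files)
    candidates = (set(interest) | set(expertise)) - {author}

    def score(r):
        return alpha * expertise.get(r, 0.0) + (1 - alpha) * interest.get(r, 0.0)

    return sorted(candidates, key=lambda r: (-score(r), r))[:k]


if __name__ == "__main__":
    comments = [("alice", "bob"), ("alice", "bob"), ("alice", "carol"), ("dave", "carol")]
    history = {"bob": {"src/ci.py"}, "carol": {"src/ci.py", "docs/readme.md"}, "erin": {"src/core.py"}}
    print(recommend("alice", ["src/ci.py"], build_comment_network(comments), history, k=2))
```

A linear mix keeps the two signals interpretable: setting `alpha=1.0` reduces the sketch to a pure expertise ranking and `alpha=0.0` to a pure comment-network ranking, which makes it easy to compare each factor alone against the combination.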

Keywords/Search Tags:

Open Source Community, Crowd-based Software Development, Development Productivity, Software Quality, Pull-Request, Continuous Integration, Crowdsourcing, Reviewer Recommender

Related items

1	Research On The Inherent Mechanisms And Approaches Of Efficient Aggregation Of Crowd Contribution For Open Source Development Ecosystem
2	Research On Relationship Between Code Quality And Software Defects For Open Source Software
3	Research On The Behavior Of Open Source Platform Software Development Based On Supervised Learning
4	Design for quality: The case of open source software development
5	Recommendation Methods And Techniques For Crowd-based Software Development
6	Research On Software Crowdsourcing In Open Source Community
7	Research On Socio-Technical Congruence Metrics For Open Source Software Quality
8	Change-History-based Automatically Fixing Of Code Internal Quality Issues
9	From coding to community: Iteration, abstraction and open source software development
10	Research On Software Recommendation Method Based On Open Source Community And User Behavior