
Empirical Study On The Theories And Mechanisms Of Crowd-based Development For Open Source Communities

Posted on: 2017-08-27  Degree: Doctor  Type: Dissertation
Country: China  Candidate: Y Yu
GTID: 1318330536467205  Subject: Software engineering
Abstract/Summary:
Open source has achieved great success through vigorous development over the last few decades. Compared with the traditional software development model, it represents a novel paradigm, called crowd-based software development, that is supported by the Internet-based computing environment. By integrating the pull-request mechanism with social media, developers can freely follow the open source projects they are interested in, submit contributions to any repository, and discuss the quality of code written by other contributors. Under this model, the large number of external contributors has shown great power, opening a new era of decentralized software development. However, the high volume of incoming contributions from the crowd poses a serious challenge to project integrators in open source communities, who urgently need an effective and efficient aggregation process. In this thesis, we focus on research questions relating to development productivity and software quality in the new context of crowd-based software development. By analyzing large-scale data mined from GitHub, we seek a deeper understanding of the mechanisms of crowd collaboration and search for best practices from open source.

First, to analyze the determinants of the pull-based development model, we build comprehensive models of contribution acceptance and evaluation latency using multi-level mixed-effects regression, based on our novel metrics at five different levels: project maturity, technical factors of the code patch, social aspects, evaluation workflow, and the continuous integration process. The empirical findings of this part can help contributors and project integrators organize the pull-request process more effectively. This part also lays the foundation for the subsequent studies.

Second, we conduct a quantitative study using large-scale historical data on process metrics and outcomes to discern the effects of one specific innovation in process automation: continuous integration (CI). We build separate models for core developers and external contributors using zero-inflated negative binomial regression, and investigate how CI affects team productivity and software quality. The empirical results show that, by deploying a CI service, a project could merge 20.5% more pull-requests submitted by core developers, and the rejection ratios of core developers and external developers decrease by 42.3% and 26.1%, respectively. Thus, CI can improve the productivity of project teams, allowing them to integrate more code contributions. Meanwhile, CI helps core developers detect more bugs during programming (a 48% increase), without an observable decrease in user-experienced quality.

Third, we undertake a large-scale, fine-resolution study of the CI-enabled pull-request mechanism to better understand the nature of CI, the predictors of CI failures, and the relationship of CI failures to the eventual quality of code changes. We report three findings. (1) CI failures appear to be concentrated in a few files, just like normal bugs. In practice, projects and developers are likely to gain tangible improvements in CI productivity by focusing attention on these files. (2) A more mature CI process, for example one with more customization and more test suites, is associated with better fault detection during the CI stage. (3) The use of CI in a pull-request does not necessarily mean that the code in that request is of good quality. Code originating in pull-requests with initial CI failures (even if repaired) has higher odds of being associated with an eventual quality problem. This finding suggests that code in such pull-requests warrants extra scrutiny.

Fourth, we study reviewer recommendation in the new context of the pull-based development model, which facilitates the evaluation process by crowdsourcing it to a larger community of users. Our empirical studies first confirm that three traditional approaches from bug triaging and code review are feasible for pull-request reviewer recommendation on GitHub. Furthermore, we propose a novel recommendation approach that mines comment networks, which can capture common interests in social activities between contributors and reviewers. Finally, we combine the expertise factor with the common-interest factor to recommend appropriate reviewers for pull-requests using a mixed approach, which achieves the best recommendation performance.
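
The idea of combining an expertise factor with common interest mined from a comment network can be illustrated with a minimal Python sketch. This is not the dissertation's actual algorithm: the function names (`build_comment_network`, `normalize`, `recommend`), the comment-count edge weights, the max-normalization, and the mixing weight `alpha` are all assumptions chosen only for illustration.

```python
from collections import defaultdict


def build_comment_network(comments):
    """Build a weighted comment network (hypothetical representation).

    comments: iterable of (commenter, pull_request_author) pairs.
    Returns weights[author][commenter] = number of comments the commenter
    left on that author's pull-requests.
    """
    weights = defaultdict(lambda: defaultdict(int))
    for commenter, author in comments:
        if commenter != author:  # ignore self-comments
            weights[author][commenter] += 1
    return weights


def normalize(scores):
    """Scale scores into [0, 1] by dividing by the maximum (assumed scheme)."""
    top = max(scores.values(), default=0)
    return {name: (value / top if top else 0.0) for name, value in scores.items()}


def recommend(author, network, expertise, alpha=0.5, k=2):
    """Rank candidate reviewers for a new pull-request by `author`.

    network:   output of build_comment_network (common-interest signal)
    expertise: dict reviewer -> expertise score (assumed to be precomputed)
    alpha:     assumed weight of common interest versus expertise
    """
    interest = normalize(dict(network.get(author, {})))
    skill = normalize(expertise)
    candidates = (set(interest) | set(skill)) - {author}
    combined = {
        c: alpha * interest.get(c, 0.0) + (1 - alpha) * skill.get(c, 0.0)
        for c in candidates
    }
    # Highest combined score first; ties broken alphabetically.
    return sorted(combined, key=lambda c: (-combined[c], c))[:k]
```

With `alpha=1.0` the sketch ranks purely by past comment interactions, and with `alpha=0.0` purely by expertise; intermediate values mix the two signals in the spirit of the mixed approach described above.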
Keywords/Search Tags: Open Source Community, Crowd-based Software Development, Development Productivity, Software Quality, Pull-Request, Continuous Integration, Crowdsourcing, Reviewer Recommender