
Research On Learning Models For Three Prediction Problems In Open-source Software

Posted on: 2019-03-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Liu
GTID: 1368330596958503
Subject: Software engineering
Abstract/Summary:
Prediction problems in open-source software are among the most active research topics in software engineering and receive persistent attention from researchers and practitioners, particularly in the context of GitHub, the largest open-source community, which hosts 57 million open-source repositories. This research studies the source code and developer behavior logs of numerous open-source projects, covering activities such as project selection, code modification, and defect fixing. It uses software metrics to describe the participation states of developers, the change-proneness of code, and defective code, and it builds machine-learning-based prediction models that give developers automated decision tools, helping them improve development efficiency, advance project evolution, and control development cost.

The three prediction problems studied are: 1) Open-source software onboarding recommendation, which predicts the projects a developer can join successfully, helping developers search for projects and avoid wasting time and effort on project learning, code understanding, and so forth. 2) Software change-proneness prediction, which predicts the code files that will be modified in the next version, helping developers control development progress. 3) Software defect prediction, which predicts the code files that are defective, helping developers identify buggy code. These three problems address three closely related fundamental needs in open-source software. In the project creation phase, the onboarding recommendation model helps developers filter projects and accelerates the onboarding process. In the development phase, the change-proneness prediction model helps developers allocate development resources wisely and guides newcomers as they start to work. In the maintenance phase, the defect prediction model directs testing resources toward defective code and thereby helps ensure product quality. In summary, the content and novelty of this dissertation are as follows:

(1) For open-source software onboarding recommendation, this dissertation designs nine project features that represent developers' complex onboarding decision patterns from different perspectives, and proposes a list-wise ranking model, NNLRank. NNLRank is a neural network model that scores candidate projects for developer onboarding. The network is optimized with a list-wise learning-to-rank loss function and stochastic gradient descent; the derivation and working process of both are detailed in the dissertation. To verify the validity of the proposed model, we collect 2044 successful onboarding decisions from GHTorrent, a mirror of GitHub. Experimental results show that NNLRank significantly outperforms three standard learning-to-rank models (SVMRank, BPNet, SVM) and an existing project prediction model, LP.

(2) For software change-proneness prediction, the object of study is change-prone code. This research proposes a data-selection-based cross-project software change-proneness prediction model called SCP. The model addresses the instability of existing prediction models, which arises when the data distributions of the source (training) and target (testing) projects differ widely, by directly measuring the distance between the distributional characteristics of the two projects' data. The designed distribution measure uses the label information (i.e., change-proneness) in the data, and the unknown labels in the target project are estimated by a lightweight unsupervised approach. SCP is evaluated on 14 open-source projects collected from the Qualitas Corpus and compared with the state-of-the-art change-proneness model CLAMI+ and three related models, RCP, TCA+, and TDS. Experimental results indicate that SCP substantially outperforms the compared models in terms of prediction accuracy and cost-effectiveness.

(3) For software defect prediction, the object of study is defective code in open-source repositories. This research proposes a transfer-learning-based two-phase cross-project defect prediction model named TPTL. The model builds a source project estimator, composed of two regression models, that selects two source projects: one expected to yield high prediction accuracy and one expected to yield high cost-effectiveness. Two TCA+ transfer learning models are then built on the selected projects, which addresses the instability of the previously successful defect prediction model TCA+. TPTL is verified on 42 defect datasets extracted from PROMISE, and experimental results show substantial advantages for TPTL over the state-of-the-art defect model LT, a state-of-the-art transfer learning model Dycm, and three related models, TCA+_Rnd, TCA+_All, and TDS, in terms of both prediction accuracy and cost-effectiveness.

This dissertation works on three fundamental prediction problems in open-source software and proposes three improved learning models that enhance prediction accuracy and cost-effectiveness. These models aim to provide automation tools that support developers' decisions, reduce the time and effort those decisions consume, speed up project development, lower development cost, and enhance software quality.
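To make the list-wise ranking idea in contribution (1) concrete, the following is a minimal sketch, not the dissertation's NNLRank implementation. The abstract does not state which list-wise loss is used, so this sketch assumes a ListNet-style cross-entropy between top-one probability distributions. The function names (softmax, listwise_loss, rank_projects) and the example scores are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into a probability distribution (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def listwise_loss(predicted_scores, relevance_labels):
    """Cross-entropy between the top-one distributions of labels and predictions.

    Assumption: a ListNet-style loss, used here only as one common example of
    a list-wise learning-to-rank objective.
    """
    target = softmax(relevance_labels)
    predicted = softmax(predicted_scores)
    return -sum(t * math.log(p) for t, p in zip(target, predicted))


def rank_projects(project_names, predicted_scores):
    """Order candidate projects by descending predicted score."""
    order = sorted(
        range(len(project_names)),
        key=lambda i: predicted_scores[i],
        reverse=True,
    )
    return [project_names[i] for i in order]


if __name__ == "__main__":
    # Hypothetical candidate list: the developer actually joined the first project.
    labels = [1.0, 0.0, 0.0]
    print(listwise_loss([3.0, 1.0, 0.0], labels))  # joined project scored highest
    print(listwise_loss([0.0, 1.0, 3.0], labels))  # joined project scored lowest
    print(rank_projects(["proj_a", "proj_b", "proj_c"], [0.1, 0.9, 0.5]))
```

The loss is computed over the whole candidate list at once rather than over single items or pairs, which is what distinguishes a list-wise objective. In a trained model the predicted scores would come from a neural network over the project features and be updated by stochastic gradient descent.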
Keywords/Search Tags: open source software, learning model, software onboarding recommendation, software change-proneness prediction, software defect prediction