
Code Review Time Prediction Method Based On Hidden Markov Model

Posted on: 2021-05-15
Degree: Master
Type: Thesis
Country: China
Candidate: Z Pan
GTID: 2428330614465886
Subject: Software engineering
Abstract/Summary:
With the continuous development of open-source code bases and version control systems, many distributed version control systems have come into wide use. With its many open-source projects, GitHub has become the most popular such platform, with more than 9 million developer users, and its functionality keeps evolving, for example with issues and continuous integration. A pull request (PR) is the main way developers contribute code on GitHub: a developer opens a PR to ask for their code to be merged into the project's main branch. However, no one submits perfect code every time, and submitted changes usually contain at least some problems. Code review is therefore an important activity within a PR: reviewers look for deficiencies in the code the developer has submitted. Because code review must be done manually, the time it takes depends on the quality of the submitted code and the expertise of the reviewer, and long, time-consuming reviews can delay the development progress of the entire project.

Existing studies have used many initial attributes of a PR, such as its creation time and the size of the change, as the inputs of models that predict review duration. However, these methods consider only the initial attributes and ignore the temporal order of the activities that take place during the PR life cycle.

This thesis proposes a new method based on a hidden Markov model (HMM) and the time series of developer activities. Taking the chronological order of developer activities into account, the key activities in a PR are extracted to form a key activity sequence, and HMMs are used to classify these sequences. First, the complete activity history of each PR is collected and further filtered to identify the key activities. Some initial attributes, such as the number of commits and developer experience, are also considered: these attributes and the key activities are combined and arranged in the chronological order in which they occur to form the key activity sequence. The sequences are then divided into two classes at the median PR duration, and one HMM is trained on each class; the final duration of a new PR is inferred by comparing the probability of its sequence under the two HMMs. In addition, the prediction methods used by other researchers were replicated: many initial attributes were extracted and a gradient boosting (GB) model was used to predict PR duration.

Finally, both prediction methods were evaluated in experiments on five open-source projects on GitHub. The results show that the proposed model reaches a prediction accuracy of about 70% and an F-measure of about 75%, with a maximum F-measure of roughly 82%. Compared with the GB model, the HMM method scores slightly higher on these metrics. The results indicate that the method can effectively identify and predict, at an early stage, the duration of PRs awaiting review.
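The classification step described above (one HMM per duration class, then comparing how likely a new sequence is under each) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the activity vocabulary (`ACTIVITIES`), the two hidden states, the helper names (`label_by_median`, `predict_class`), and all model parameters are hypothetical toy values, and training (for example with the Baum-Welch algorithm) is assumed to have already produced the two parameter sets. The sequence likelihood P(sequence | model) is computed with the scaled forward algorithm.

```python
import math
from dataclasses import dataclass
from statistics import median

# Hypothetical activity vocabulary; the thesis's actual key activities are not listed here.
ACTIVITIES = {"commit": 0, "comment": 1, "approve": 2}


@dataclass
class DiscreteHMM:
    start: list[float]        # pi[i]   = P(first hidden state is i)
    trans: list[list[float]]  # A[i][j] = P(next state j | current state i)
    emit: list[list[float]]   # B[i][k] = P(observing activity k | state i)

    def log_likelihood(self, obs: list[int]) -> float:
        """Return log P(obs | model) using the scaled forward algorithm."""
        if not obs:
            raise ValueError("observation sequence must be non-empty")
        n = len(self.start)
        log_prob = 0.0
        alpha: list[float] = []
        for t, symbol in enumerate(obs):
            if t == 0:
                alpha = [self.start[i] * self.emit[i][symbol] for i in range(n)]
            else:
                alpha = [
                    sum(alpha[i] * self.trans[i][j] for i in range(n)) * self.emit[j][symbol]
                    for j in range(n)
                ]
            scale = sum(alpha)
            if scale == 0.0:
                return float("-inf")  # sequence is impossible under this model
            alpha = [a / scale for a in alpha]  # rescale to avoid numerical underflow
            log_prob += math.log(scale)  # log P(obs) = sum of log scaling factors
        return log_prob


def label_by_median(durations: list[float]) -> list[str]:
    """Split PRs into 'short' and 'long' classes at the median duration.

    Assumption: a PR exactly at the median is counted as 'short'.
    """
    m = median(durations)
    return ["short" if d <= m else "long" for d in durations]


def predict_class(obs: list[int], short_hmm: DiscreteHMM, long_hmm: DiscreteHMM) -> str:
    """Assign the class whose HMM gives the new sequence the higher likelihood."""
    if short_hmm.log_likelihood(obs) >= long_hmm.log_likelihood(obs):
        return "short"
    return "long"


# Toy parameters standing in for the two trained models (hypothetical values).
short_hmm = DiscreteHMM(
    start=[0.8, 0.2],
    trans=[[0.6, 0.4], [0.3, 0.7]],
    emit=[[0.6, 0.2, 0.2], [0.1, 0.2, 0.7]],
)
long_hmm = DiscreteHMM(
    start=[0.5, 0.5],
    trans=[[0.8, 0.2], [0.2, 0.8]],
    emit=[[0.2, 0.7, 0.1], [0.4, 0.5, 0.1]],
)

if __name__ == "__main__":
    print(label_by_median([2.0, 5.0, 30.0, 72.0]))  # class labels for four PR durations
    seq = [ACTIVITIES[a] for a in ["commit", "approve"]]
    print(predict_class(seq, short_hmm, long_hmm))
```

Comparing log-likelihoods rather than raw probabilities keeps the comparison numerically stable on long activity sequences, where the raw probability would underflow. With these toy parameters, a short "commit, approve" sequence is more likely under `short_hmm`, while a comment-heavy sequence such as `[1, 1, 1, 0, 1]` is more likely under `long_hmm`.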
Keywords/Search Tags: Code review, hidden Markov model, gradient boosting model, activity sequence