
Research On Commit Classification Based On Combined Features And Combined Classifiers

Posted on: 2021-02-02
Degree: Master
Type: Thesis
Country: China
Candidate: S T Wang
Full Text: PDF
GTID: 2428330614463858
Subject: Computer technology
Abstract/Summary:
In recent years, with the continuous development of the Internet industry, a growing number of developers have been carrying out their software development work on project hosting platforms such as GitHub, and more and more researchers have begun to study software development problems by analyzing the open source code repositories on GitHub. Examples include social diversity in software development teams, prediction of pull request duration, prediction of the relationship between commits and issues, and the impact of developer gender on the pull request acceptance rate. Among the many activities in a project's version control system, the commit is the most common. Understanding the activities performed during the software development process can improve developers' collaboration efficiency, help allocate resources effectively during software development and maintenance, and reduce unnecessary overhead. It also lets developers observe recent development progress more intuitively. Developers and maintainers can plan and allocate resources in advance to improve the efficiency of source code maintenance, reduce the number of uncertain events during development, and increase cost-effectiveness accordingly. Understanding commit activities first requires classifying them, so this thesis studies commit classification methods with the aim of improving classification performance. Commits are classified into three types: Corrective, Perfective, and Adaptive.

First, a commit classification method is proposed that fuses the text features of the commit message with the features of the source code changes and file changes. The text features of the commit message are extracted with the BERT deep learning model, and the source code change and file change features are obtained with the ChangeDistiller tool and by mining the local code repository. The two kinds of features, both represented as vectors, are then combined. Finally, an SVM classifier is trained on the combined features to classify commits.

Second, a commit classification method that combines multiple classifiers by confidence is proposed. One classifier, based on the BERT deep learning model, classifies commits according to the commit message; another, based on a deep neural network, classifies them according to the source code change and file change features. A confidence is then calculated and used to combine the two classifiers, and the combined result is taken as the final prediction.

Third, both methods are evaluated on a dataset of commit histories from open source projects. The experimental results show that the classifier based on combined features achieves an accuracy of 78% and a Kappa coefficient of 69.2%, while the classifier based on combined classifiers achieves an accuracy of 81% and a Kappa coefficient of 71.3%. Compared with the results of Levin et al., the classifier based on combined classifiers improves accuracy by 5% and the Kappa coefficient by 4.5%. The study shows that the context information of the commit message, together with the source code change and file change information of the commit, plays a vital role in the commit classification task.
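
The abstract does not say how the confidence is computed or how it combines the two classifiers. As a minimal illustration only, the Python sketch below assumes that each classifier outputs one probability per class, that its confidence is its highest class probability, and that the more confident classifier decides the label. The function names and the sample probabilities are hypothetical and are not taken from the thesis. The sketch also computes accuracy and Cohen's Kappa, the two metrics reported above.

```python
from collections import Counter

# The three commit classes named in the abstract.
LABELS = ("Corrective", "Perfective", "Adaptive")


def combine_by_confidence(msg_probs, change_probs):
    """Return the label chosen by the more confident classifier.

    Assumption (not from the thesis): confidence is the highest class
    probability a classifier outputs. Ties go to the message classifier.
    """
    chosen = msg_probs if max(msg_probs) >= max(change_probs) else change_probs
    return LABELS[chosen.index(max(chosen))]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's Kappa: agreement corrected for chance agreement."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_expected = sum(true_counts[c] * pred_counts[c] for c in LABELS) / (n * n)
    if p_expected == 1.0:
        # Degenerate case: every true and predicted label is the same class.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical per-class probabilities for four commits:
# (message classifier output, change-feature classifier output).
samples = [
    ([0.70, 0.20, 0.10], [0.40, 0.50, 0.10]),
    ([0.40, 0.35, 0.25], [0.10, 0.80, 0.10]),
    ([0.30, 0.30, 0.40], [0.20, 0.20, 0.60]),
    ([0.50, 0.30, 0.20], [0.10, 0.30, 0.60]),
]
y_true = ["Corrective", "Perfective", "Adaptive", "Corrective"]
y_pred = [combine_by_confidence(m, c) for m, c in samples]

print(y_pred)
print("accuracy:", accuracy(y_true, y_pred))
print("kappa:", round(cohen_kappa(y_true, y_pred), 3))
```

Choosing the more confident classifier is only one possible rule; a confidence-weighted average of the two probability vectors would be another, and the thesis may use a different scheme.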
Keywords/Search Tags: Commit Classification, Mining Software Repositories, Predictive Models, Natural Language Processing, Combined Classifiers