Research On Technical Debt Detection And Classification Methods Based On Code Comments

Posted on: 2021-01-27
Degree: Master
Type: Thesis
Country: China
Candidate: L Wang
Full Text: PDF
GTID: 2428330605481160
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, with the rapid development of information technology, computer software has been applied across many domains of society. However, because of resource, deadline, and business constraints, the code that developers deliver does not always follow good practice, which lowers code quality. Technical debt refers to the consequences of developers pursuing short-term gains (e.g., meeting a software release plan or budget) at the expense of long-term code quality. To help developers identify technical debt in software, researchers have proposed a variety of automatic and semi-automatic methods. At present, detecting technical debt through code comments is the most popular method. However, the effectiveness of these methods can still be improved, because self-admitted technical debt comments vary in length, account for a low proportion of comments, and are diverse in style. In addition, existing studies mainly focus on detecting technical debt rather than classifying it. In practice, developers usually need to further classify the detected technical debt, which is a very time-consuming task. To solve these problems, this thesis analyzes in depth the code comments that indicate technical debt, in order to help developers detect technical debt efficiently. The main contributions are as follows:

(1) To detect technical debt, this thesis proposes a new approach based on bidirectional long short-term memory networks with an attention mechanism. In this approach, we first preprocess the code comments to filter out noisy data. Then, we use the GloVe model to obtain word embeddings. Finally, a classifier based on bidirectional long short-term memory networks with an attention mechanism is built to automatically learn features from the encoded comments. When training the classifier, we adopt a balanced cross entropy loss function to address the class imbalance problem. Meanwhile, to improve the extensibility of the approach, we apply genetic algorithms to search for the optimal or near-optimal value of the balance factor of the balanced cross entropy function. We experimentally investigate the performance of the approach on a real dataset. Experimental results show that, on average, the approach achieves 81.75% precision, 72.24% recall, and 75.86% F1-score, outperforming the best baseline method by 8.52%, 5.24%, and 6.64%, respectively.

(2) To effectively identify different types of technical debt, this thesis proposes a new approach based on XGBoost that classifies self-admitted technical debt into multiple classes. In this approach, we first preprocess the code comments to filter out noisy data. Then, we adopt data augmentation to increase the number of samples in the small classes. In addition, we apply feature selection to the code comments. Finally, a classifier based on XGBoost is built. We experimentally investigate the performance of this approach on a real dataset. Experimental results show that, on average, the approach achieves 63.14% macro-averaged precision, 56.37% macro-averaged recall, and 56.25% macro-averaged F-measure, outperforming the best baseline method by 12.29%, 3.77%, and 6.46%, respectively.
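
The balanced cross entropy loss mentioned in contribution (1) can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the code below assumes the commonly used form in which a balance factor alpha weights the positive (technical debt) class and 1 - alpha weights the negative class. The function name `balanced_cross_entropy` and its parameters are hypothetical and chosen only for illustration; this is not the thesis's implementation.

```python
import math


def balanced_cross_entropy(y_true, p_pred, alpha, eps=1e-12):
    """Mean alpha-balanced binary cross entropy (assumed formulation).

    y_true: iterable of 0/1 labels (1 = self-admitted technical debt comment).
    p_pred: iterable of predicted probabilities for the positive class.
    alpha:  balance factor in (0, 1); weights positives by alpha and
            negatives by 1 - alpha.
    """
    y_true = list(y_true)
    p_pred = list(p_pred)
    if len(y_true) != len(p_pred) or not y_true:
        raise ValueError("y_true and p_pred must be non-empty and equal in length")

    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip probabilities so that log() never receives 0.
        p = min(max(p, eps), 1.0 - eps)
        if y == 1:
            total += -alpha * math.log(p)
        else:
            total += -(1.0 - alpha) * math.log(1.0 - p)
    return total / len(y_true)


# Toy example: one rare positive comment among three negatives.
labels = [1, 0, 0, 0]
probs = [0.3, 0.1, 0.1, 0.1]
loss_even = balanced_cross_entropy(labels, probs, alpha=0.5)
loss_weighted = balanced_cross_entropy(labels, probs, alpha=0.9)
```

With alpha above 0.5, the poorly predicted positive example contributes more to the loss than it does with even weighting, which is the effect a balance factor is meant to have on a minority class. In the thesis the value of the balance factor is searched with genetic algorithms; here it is simply passed in by hand.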
Keywords/Search Tags: Self-admitted Technical Debt, Deep Learning, Natural Language Processing, Long Short-Term Memory, XGBoost