Research On Technical Debt Detection And Classification Methods Based On Code Comments

Posted on:2021-01-27

Degree:Master

Type:Thesis

Country:China

Candidate:L Wang

Full Text:PDF

GTID:2428330605481160

Subject:Computer Science and Technology

Abstract/Summary:

In recent years, with the rapid development of information technology, computer software has been applied across many domains of society. However, because of constraints on resources, deadlines and business, the code that developers deliver is not always up to standard, which degrades code quality. Technical debt refers to the consequences of developers pursuing short-term gains (e.g., meeting a software release plan or budget) at the expense of long-term code quality. To help developers identify technical debt in software, researchers have proposed a variety of automatic and semi-automatic methods, and detecting technical debt from code comments is currently the most popular of them. However, the effectiveness of these methods still leaves room for improvement, because self-admitted technical debt (SATD) comments vary widely in length, make up only a small proportion of all comments, and are written in diverse styles. In addition, existing studies mainly focus on detecting technical debt rather than classifying it. In practice, developers usually need to further classify the detected technical debt, which is a very time-consuming task. To address these problems, this thesis analyzes in depth the code comments that indicate technical debt, in order to help developers detect technical debt efficiently. The main contributions are as follows:

(1) To detect technical debt, this thesis proposes a new approach based on bidirectional long short-term memory (BiLSTM) networks with an attention mechanism. We first preprocess the code comments to filter out noisy data. Then, we use the GloVe model to obtain word embeddings. Finally, we build a classifier based on a BiLSTM network with attention that automatically learns features from the encoded comments. When training the classifier, we adopt a balanced cross-entropy loss function to address the class imbalance problem. Meanwhile, to make the approach more extensible, we apply a genetic algorithm to search for an optimal or near-optimal value of the balance factor in the balanced cross-entropy function. We evaluate the approach experimentally on a real dataset. The results show that it achieves, on average, 81.75% precision, 72.24% recall and a 75.86% F1-score, outperforming the best baseline method by 8.52%, 5.24% and 6.64%, respectively.

(2) To effectively identify different types of technical debt, this thesis proposes a new approach based on XGBoost that classifies self-admitted technical debt into multiple classes. We first preprocess the code comments to filter out noisy data. Then, we apply data augmentation to increase the number of samples in the minority class. Next, we apply feature selection when processing the code comments. Finally, we build a classifier based on XGBoost. We evaluate the XGBoost-based approach experimentally on a real dataset. The results show that it achieves, on average, 63.14% macro-averaged precision, 56.37% macro-averaged recall and a 56.25% macro-averaged F-measure, outperforming the best baseline method by 12.29%, 3.77% and 6.46%, respectively.
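The training objective in contribution (1) can be sketched as follows. This is a minimal sketch, assuming the common alpha-balanced form of binary cross-entropy, L = -(1/N) * sum_i [alpha * y_i * log(p_i) + (1 - alpha) * (1 - y_i) * log(1 - p_i)], where alpha weights the positive (SATD) class and 1 - alpha the non-SATD class. The abstract does not give the exact formula. The genetic-algorithm operators (tournament selection, blend crossover, Gaussian mutation, elitism) and the names `balanced_cross_entropy` and `genetic_search` are illustrative choices, not the thesis's implementation. The fitness function in the usage lines is a hypothetical stand-in for validation F1.

```python
import math
import random


def balanced_cross_entropy(y_true, p_pred, alpha, eps=1e-7):
    """Mean alpha-balanced binary cross-entropy (assumed form).

    alpha weights errors on the positive (SATD) class and 1 - alpha
    weights errors on the negative (non-SATD) class.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        if y == 1:
            total += -alpha * math.log(p)
        else:
            total += -(1.0 - alpha) * math.log(1.0 - p)
    return total / len(y_true)


def genetic_search(fitness, pop_size=20, generations=50, lo=0.01, hi=0.99,
                   mutation_sigma=0.05, seed=0):
    """Search [lo, hi] for the balance factor that maximises fitness(alpha)."""
    rng = random.Random(seed)
    population = [rng.uniform(lo, hi) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = ranked[:2]  # elitism: carry the two best over unchanged
        while len(next_gen) < pop_size:
            # Tournament selection: each parent is the best of 3 random individuals.
            a = max(rng.sample(ranked, 3), key=fitness)
            b = max(rng.sample(ranked, 3), key=fitness)
            w = rng.random()
            child = w * a + (1.0 - w) * b            # blend crossover
            child += rng.gauss(0.0, mutation_sigma)  # Gaussian mutation
            next_gen.append(min(max(child, lo), hi))  # keep alpha in range
        population = next_gen
    return max(population, key=fitness)


if __name__ == "__main__":
    # Hypothetical fitness: in the thesis setting this would be the validation
    # F1 of a BiLSTM-attention classifier trained with the given alpha.
    best_alpha = genetic_search(lambda a: -(a - 0.7) ** 2)
    print(round(best_alpha, 2))
```

In a real run, each fitness evaluation would mean training and validating the classifier. The population size and generation count would therefore have to be kept small. The values above are arbitrary.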

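The macro-averaged metrics reported for contribution (2) are computed by taking precision, recall and F-measure per class and then averaging them without weights. As a result, rare technical-debt types count as much as frequent ones. This is a minimal sketch, assuming macro F-measure is the mean of per-class F1 scores. The abstract does not say which of the two common conventions is used; the other one takes the harmonic mean of macro precision and macro recall. The function name `macro_scores` and the class labels in the example are hypothetical.

```python
def macro_scores(y_true, y_pred, labels=None):
    """Return (macro precision, macro recall, macro F1) for multi-class labels.

    Macro F1 is taken here as the unweighted mean of per-class F1 scores.
    A class with no predictions (or no true samples) contributes 0.
    """
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


if __name__ == "__main__":
    # Illustrative SATD type labels, not taken from the thesis dataset.
    y_true = ["design", "design", "requirement", "test"]
    y_pred = ["design", "requirement", "requirement", "design"]
    print(macro_scores(y_true, y_pred))  # roughly (0.333, 0.5, 0.389)
```

In this example the "test" class is never predicted, so its zero scores pull all three macro averages down. This is exactly the sensitivity to minority classes that motivates reporting macro rather than micro averages.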
Keywords/Search Tags:

Self-admitted Technical Debt, Deep Learning, Natural Language Processing, Long Short-Term Memory, XGBoost

Related items

1	Research On Natural Language Syntactic Parsing Based On Deep Learning
2	Intelligent Device Text Classification Method Based On Natural Language Processing
3	Dependency Parsing Research Model Based On Deep Learning
4	Machine Learning-based Financial Analysis
5	Research Of Dialogue Generation Method Based On LSTM Neural Network
6	Research On Multi-Round Dialogue System Based On Deep Reinforcement Learning
7	Research On Chinese Word Segmentation Based On Deep Learning
8	Applied Study On Chinese Word Segmentation Based On Deep Learning
9	Study Of Software Self-admitted Technical Debt Predictive Approach Based On LDA And Cross Oversampling
10	Research On Chinese Word Segmentation Based On Deep Learning