Study Of Software Self-admitted Technical Debt Predictive Approach Based On LDA And Cross Oversampling

Posted on: 2021-02-16
Degree: Master
Type: Thesis
Country: China
Candidate: C Huang
Full Text: PDF
GTID: 2428330611996874
Subject: Computer technology
Abstract/Summary:
In recent years, as software engineering has developed and software systems have grown more complex, self-admitted technical debt has attracted considerable attention from industry and academia. Self-admitted technical debt arises when, at any point in the software development lifecycle, developers deliberately take shortcuts to complete an implementation as quickly as possible in pursuit of short-term project benefits. This compromise can lead developers to commit code that is imperfect, requires rework, generates errors, or is only a temporary solution. After years of research, researchers have proposed models and algorithms for identifying self-admitted technical debt, but some of the recognition patterns are extracted by hand, and the class imbalance problem is not considered. To address these problems, and with identification performance as the starting point, this thesis studies two issues: the extraction of words that distinguish self-admitted technical debt, and class imbalance. The main contributions of this thesis are the following two:

1) In earlier work on identification models, the identification patterns for self-admitted technical debt were chosen by simple manual selection, and only 62 patterns were picked out. To address this, the thesis proposes using the LDA (Latent Dirichlet Allocation) algorithm to extract the distinguishing words that identify self-admitted technical debt. The results show that, compared with manual extraction, LDA efficiently uncovers additional hidden distinguishing words.

2) In traditional binary or multi-class imbalanced classification problems, the predictions of traditional classifiers tend to favour the majority class, which leads to poor prediction performance on the minority class. To address this, the thesis proposes a cross-oversampling method. By splitting up samples in the minority class, the algorithm constructs a certain proportion of virtual samples to increase the number of minority-class samples, thus effectively expanding the self-admitted technical debt data. Feature selection is then used to construct multiple classifiers to identify self-admitted technical debt. The experimental results indicate that, compared with prior methods, this algorithm not only expands the identification patterns of self-admitted technical debt but also improves identification performance to a certain extent.

Through the LDA and cross-oversampling methods, the set of distinguishing words for self-admitted technical debt is effectively expanded, the class imbalance is mitigated, and the algorithm's recognition ability is improved to a certain extent.
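The abstract does not say how minority samples are split and recombined, so the following is only a minimal sketch of one possible reading of cross-oversampling. It assumes each comment is a list of tokens and that a virtual sample is built by joining the head of one minority comment to the tail of another. The function name `cross_oversample`, the `ratio` parameter, and the example comments are hypothetical and are not taken from the thesis.

```python
import random


def cross_oversample(minority, ratio=0.5, seed=0):
    """Return the minority samples plus synthetic ones built by recombination.

    Assumption (not from the thesis): each sample is a list of tokens, and a
    virtual sample is the head of one parent joined to the tail of another.
    `ratio` is the number of virtual samples relative to len(minority).
    """
    if len(minority) < 2:
        return list(minority)
    rng = random.Random(seed)
    n_virtual = int(len(minority) * ratio)
    virtual = []
    for _ in range(n_virtual):
        a, b = rng.sample(minority, 2)   # pick two distinct parent samples
        cut_a = rng.randint(0, len(a))   # split point in the first parent
        cut_b = rng.randint(0, len(b))   # split point in the second parent
        child = a[:cut_a] + b[cut_b:]
        if not child:                    # fall back to a copy rather than an empty sample
            child = list(a)
        virtual.append(child)
    return list(minority) + virtual


# Hypothetical minority-class comments, already tokenized.
satd_comments = [
    "todo fix this hack later".split(),
    "temporary workaround remove when possible".split(),
    "fixme this is ugly".split(),
    "hack need to refactor".split(),
]
augmented = cross_oversample(satd_comments, ratio=1.0, seed=42)
print(len(satd_comments), "->", len(augmented))  # 4 -> 8
```

With `ratio=1.0` the minority class doubles in size. In practice the ratio would be tuned so that the minority class approaches the size of the majority class before feature selection and classifier training.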
Keywords/Search Tags:Self-admitted technical debt, Software engineering, Class imbalance, Cross oversampling, Feature selection