Research And Application Of Software Defect Prediction Based On Latent Dirichlet Allocation

Posted on: 2016-03-08
Degree: Master
Type: Thesis
Country: China
Candidate: H L Liu
Full Text: PDF
GTID: 2308330479985395
Subject: Software engineering
Abstract/Summary:
A software defect is an error state in a system that affects the normal operation of the software. Researchers have proposed many defect prediction methods based on software size, complexity, and programming language, and these methods have achieved some success. However, problems remain in software defect prediction, for example in analyzing how the semantic information in software relates to its defects. This thesis presents a topic-model-based method for predicting software defects that combines the semantic information of the software with its size. The objects of study are software source code stored in a version control system (such as Git) and defect information collected from Bugzilla. A topic model, Latent Dirichlet Allocation (LDA), is used to mine semantic information from the source code. The main contributions are as follows:
① For data extraction and processing, we implement a preprocessing tool based on Lucene, which we call PRETREATMENT. PRETREATMENT performs stemming, identifier extraction, and other operations.
② To address the problem of synonymy and polysemy in traditional prediction work, we propose a software defect prediction method based on semantic information mining, and we implement the core algorithm in Eclipse.
③ In applying LDA to prediction, we make three improvements. First, we keep the comments in the source code, because annotation information often plays a decisive role during software development. Second, we define the topic failure density, which combines semantic information with failures. Third, we explore the word-topic distribution and define a similarity matrix.
④ We evaluate the ability of the proposed model to predict component failures on three major open source projects, comparing its predictions against the actual data from Bugzilla. The results show that our predictor, which is based on topic similarity, predicts component failures well.
⑤ We implement our component failure prediction method as a plug-in for Sonar and present the plug-in's interface.
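The identifier extraction in ① and the topic failure density and similarity matrix in ③ can be illustrated with a minimal sketch. The abstract does not give the formulas or the tool's interface, so everything below is an assumption made for illustration: the function names are hypothetical, the topic failure density is taken to be the sum over files of topic weight times failures per line of code, and the similarity matrix is taken to be the cosine similarity between rows of the topic-word distribution. The LDA output (doc_topic, topic_word) is assumed to be already computed by some LDA implementation.

```python
import math
import re

# Matches one word inside a camelCase identifier: an acronym run that is
# followed by a capitalized word, a (possibly capitalized) lowercase word,
# or a trailing acronym run.
_WORD = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+")


def split_identifier(identifier):
    """Split an identifier into lowercase words (hypothetical helper).

    Handles snake_case and camelCase; digits and punctuation are dropped.
    """
    words = []
    for part in re.split(r"[^A-Za-z]+", identifier):
        words.extend(m.group(0).lower() for m in _WORD.finditer(part))
    return words


def topic_failure_density(doc_topic, failures, loc):
    """Assumed definition: density[k] = sum_d doc_topic[d][k] * failures[d] / loc[d].

    doc_topic[d][k] is the weight of topic k in file d (from LDA),
    failures[d] is the number of reported failures for file d,
    loc[d] is the size of file d in lines of code.
    """
    density = [0.0] * len(doc_topic[0])
    for theta, bugs, lines in zip(doc_topic, failures, loc):
        file_density = bugs / lines if lines else 0.0
        for k, weight in enumerate(theta):
            density[k] += weight * file_density
    return density


def topic_similarity_matrix(topic_word):
    """Assumed definition: cosine similarity between topic-word rows.

    topic_word[k][w] is the probability of word w under topic k (from LDA).
    """
    norms = [math.sqrt(sum(p * p for p in row)) for row in topic_word]
    matrix = []
    for row_a, norm_a in zip(topic_word, norms):
        sims = []
        for row_b, norm_b in zip(topic_word, norms):
            dot = sum(a * b for a, b in zip(row_a, row_b))
            sims.append(dot / (norm_a * norm_b) if norm_a and norm_b else 0.0)
        matrix.append(sims)
    return matrix
```

Under these assumptions, a topic that dominates files with many failures per line receives a high density, and a component whose topics are similar to high-density topics could be flagged as failure-prone. The thesis's actual definitions may differ.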
Keywords/Search Tags: software engineering, defect prediction, topic model, LDA