Research On Key Technologies Of Defect Analysis In Software Engineering

Posted on: 2019-07-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Zhang
Full Text: PDF
GTID: 1368330548977399
Subject: Computer Science and Technology
Abstract/Summary:
Bugs are inevitable in modern software engineering and have become a key factor affecting the efficiency and quality of software development. Software defect analysis is one of the most important activities in software maintenance, and how to find and fix defects as early as possible has become a research hotspot in the field of software engineering. Software defect prediction identifies the software components that are more likely to be buggy, based on the past history of classes, methods, or other code elements, to help developers find defects early. For bugs that have already been submitted to an issue tracking system, bug report management helps developers fix them as soon as possible.

Although many studies have focused on software defect analysis, the field still faces several difficulties and challenges: 1) existing defect prediction techniques cannot solve the cross-project defect prediction problem well; 2) the performance of existing software vulnerability prediction techniques is not good enough for use in practice; 3) existing bug report localization techniques cannot analyze the deep semantic relationships between source code and bug reports, which reduces localization accuracy; 4) there are numerous duplicate or similar bugs, but information about the bugs that developers frequently encounter is hard to obtain.

To address these problems, this thesis focuses on the key technologies of mining frequently encountered bugs, defect prediction, and bug report localization, and proposes a series of automated techniques that provide software developers with high-performance tools and optimize the software defect discovery and repair process. The contributions of this thesis are summarized as follows:

1. To solve the cross-project software defect prediction problem, we empirically investigated the performance of 7 composite algorithms that integrate multiple machine learning classifiers: average voting (Ave), Max, CODEPLogistic, BaggingJ48, BaggingNaive, BoostingJ48, BoostingNaive, and Random Forest (RF), and compared them with the state-of-the-art method CODEP proposed by Panichella et al. and LR proposed by Zimmermann et al. The experimental results show that Max and BaggingJ48 perform better than the others.

2. Software vulnerabilities are a kind of software defect that may affect software security. We propose a novel approach, VULPREDICTOR, to predict vulnerable files; it combines software metrics and text mining to build a composite prediction model. The experimental results show that VULPREDICTOR performs better than the state-of-the-art approaches proposed by Walden et al. and than its own 6 underlying classifiers. Moreover, VULPREDICTOR also performs better than Max and BaggingJ48, which outperformed the other composite algorithms in the empirical study. We then investigate the performance of the state-of-the-art machine learning algorithm Factorization Machines (FM) in vulnerability prediction, and the experimental results show that FM and VULPREDICTOR perform similarly on this task.

3. We propose a multi-abstraction bug report localization technique named MULAB. MULAB extracts the textual information in source code and bug reports and represents it at multiple abstraction levels. It leverages multiple topic models to capture representations of documents at different abstraction levels and computes the similarity between a concern and a code unit. We propose 12 variants of MULAB that use different data fusion methods. Experimental results show that our best variant of the multi-abstraction approach outperforms PR, a state-of-the-art approach proposed by Scanniello et al., by a substantial margin.

4. We introduce the notion of frequently encountered bugs (FEBugs): software bugs that are often encountered in the development process and may affect many developers. We propose a novel model named RFEB, which analyzes numerous Stack Overflow posts about software defects and uses an iterative query refinement technique to mine the top N FEBugs for a given domain of interest. We evaluate RFEB on 10 domains and demonstrate that it performs better than Stack Overflow's search engine in terms of NDCG@10.
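
The ranking metric used in the RFEB evaluation, NDCG at a cutoff of 10, can be illustrated with a minimal sketch. The abstract does not say which NDCG variant the thesis uses, so the choices here are assumptions: gains are linear in the relevance grade, the discount is log2(rank + 1), and the ideal ranking is obtained by sorting the same judged list. The function names are hypothetical.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k results.

    Assumption: linear gain and a log2(rank + 1) discount, with ranks
    starting at 1 (enumerate starts at 0, hence the "+ 2").
    """
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG of the given ranking divided by the DCG of the ideal ranking.

    Assumption: the ideal ranking is the same judged list sorted by
    descending relevance. Returns 0.0 when no result is relevant.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance grades for the top results returned for one domain.
ranking = [1, 2, 3]
score = ndcg_at_k(ranking, k=10)
```

A ranking that already lists the most relevant posts first scores 1.0, and lower scores mean that relevant posts were pushed further down the list.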
Keywords/Search Tags: Software Defect, Defect Prediction, Frequently Encountered Bugs, Vulnerability Prediction, Bug Report Localization, Bug Fixing