
Automatic Classification And Comprehension Techniques Of Software Behaviors

Posted on: 2020-05-16, Degree: Doctor, Type: Dissertation
Country: China, Candidate: Y Feng, Full Text: PDF
GTID: 1368330605950420, Subject: Software engineering
Abstract/Summary:
Properly analyzing and understanding software behavior is a prerequisite for completing many software maintenance and development tasks. However, the sheer size and complexity of modern software make software behavior analysis time-consuming and difficult. To address this problem, we propose a series of techniques for the automatic classification and comprehension of software behavior based on program execution traces. The goal is to help programmers understand program behavior more accurately and thereby improve the efficiency of debugging and software maintenance. Our work consists of two parts: a multi-label failure report classification technique based on program execution traces, and a hierarchical program execution trace abstraction technique.

In practice, when a program fails, a crash report containing an execution trace is sent to the software vendor for diagnosis. Failure reports typically contain a large amount of execution trace data that is difficult to understand, and they may be categorized differently under different criteria, so manually analyzing failure reports and classifying them into report buckets is time-consuming and difficult. To improve the efficiency of failure report classification and to help developers understand program behavior, software engineering researchers have proposed many program execution trace classification techniques over the past decades. However, the existing techniques are built upon the single-label assumption, that is, each failure report is labeled with exactly one fault type. Our experimental results show that the single-label assumption is often unrealistic: in practice, inherent characteristics of software behavior, such as multiple faults that contribute to a failure and interactions between faults, may reduce the effectiveness of these techniques. In this paper, we relax this assumption and empirically investigate the performance of the resulting multi-label approaches on the failure classification task under different application settings. We conducted experiments using eight classification techniques on five subject programs with more than 8,000 faulty versions to investigate how each technique accounts for the intricacies of software behavior. Our experimental results show that multi-label techniques provide improved accuracy over single-label techniques. We also evaluated the efficiency of the training and prediction phases of each technique, and we offer guidance on the applicability of each technique to different usage contexts.

Further, we propose a composite algorithm named MLL-GA, which combines various multi-label learning algorithms by leveraging a genetic algorithm (GA). In total, we combine 12 multi-label learning algorithms, such as binary relevance (BR), random k-labelsets (RAKEL), ensembles of classifier chains (ECC), and ML-KNN. To evaluate the effectiveness of MLL-GA, we perform experiments on 6 open source programs, i.e., tcas, printtokens, printtokens, replace, flex, and grep. We show that MLL-GA achieves average F-measures of 0.6078 to 0.8665. The experimental results show that, on average across the 6 datasets, multi-label classification techniques are more effective than single-label classification techniques. We also investigate the efficiency of the different classification techniques and, based on the results, present actionable suggestions for applying these techniques in different scenarios.

Building on the categorized crash reports, and in order to help developers understand incorrect software behavior, we propose a hierarchical program execution trace abstraction approach. Our approach abstracts software behavior into functionality phases at different levels and at different granularities. It also maintains the relationships between functionality phases across the whole program execution trace. The approach builds multi-level abstractions and identifies frequent behaviors at each level based on the input execution traces, and then labels phases within individual execution traces according to the identified major functional behaviors of the system. To validate our modeling approach, we developed the prototype tool SAGE and conducted a case study on a large-scale subject program, JAVAC, to demonstrate the effectiveness of the mining result. In the case study, SAGE shows how our method can label functionality phases at different granularities from a large number of method invocation events in the program execution trace. Furthermore, we analyzed the behavior-analysis ability and the efficiency of SAGE through quantitative research. The results show that our approach can present users with a high-level, comprehensible abstraction of execution behavior with 70% accuracy.
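To make the multi-label setting concrete, the following is a minimal sketch of binary relevance (BR), one of the multi-label learners named in the abstract, together with an example-based F-measure. It is not the dissertation's implementation: the nearest-centroid base classifier, the toy feature vectors (imagined here as per-method invocation counts extracted from a trace), the fault labels F1 and F2, and all function names are assumptions chosen only for illustration.

```python
# Illustrative sketch only: binary relevance trains one independent binary
# classifier per fault label, so a failure report can receive several labels.
# The nearest-centroid base learner and the toy data below are assumptions.

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_binary_relevance(traces, label_sets):
    """Build, for every label, a (positive centroid, negative centroid) pair."""
    models = {}
    for label in sorted(set().union(*label_sets)):
        pos = [t for t, ls in zip(traces, label_sets) if label in ls]
        neg = [t for t, ls in zip(traces, label_sets) if label not in ls]
        # neg may be empty if every training trace carries this label.
        models[label] = (centroid(pos), centroid(neg) if neg else None)
    return models

def predict(models, trace):
    """Return every label whose positive centroid is closer to the trace."""
    predicted = set()
    for label, (pos_c, neg_c) in models.items():
        if neg_c is None or sq_dist(trace, pos_c) < sq_dist(trace, neg_c):
            predicted.add(label)
    return predicted

def f_measure(predicted, actual):
    """Example-based F-measure for one report: 2|P & A| / (|P| + |A|)."""
    if not predicted and not actual:
        return 1.0
    return 2 * len(predicted & actual) / (len(predicted) + len(actual))

# Hypothetical training data: each vector counts calls to three methods.
TRACES = [[3, 0, 0], [4, 0, 1], [0, 3, 0], [0, 4, 1], [3, 3, 0]]
LABELS = [{"F1"}, {"F1"}, {"F2"}, {"F2"}, {"F1", "F2"}]

if __name__ == "__main__":
    models = train_binary_relevance(TRACES, LABELS)
    guess = predict(models, [4, 4, 0])
    print(sorted(guess), f_measure(guess, {"F1", "F2"}))
```

A single-label classifier would have to choose one of F1 or F2 for the trace `[4, 4, 0]`, whereas the binary relevance decomposition can report both, which is the situation the abstract describes for failures caused by multiple interacting faults.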
Keywords/Search Tags:Software Behavior, Program Execution Trace Analysis, Failure Report Classification, Program Comprehension