
Research On Software Multi-feature Defect Location Method Based On Information Retrieval

Posted on: 2022-04-15
Degree: Master
Type: Thesis
Country: China
Candidate: X W Li
Full Text: PDF
GTID: 2518306524490804
Subject: Master of Engineering
Abstract/Summary:
During software development, users or testers submit a defect report describing the problem after they discover abnormal behavior in the software under test. Developers must analyze the report carefully and inspect many source files to find the cause, which takes considerable time and effort. To improve the efficiency and productivity of the whole software team, researchers have proposed automated defect location methods and tools that identify the source code files containing the defect. Defect location based on information retrieval extracts basic information from the defect report and the source code, computes the similarity between the report and each source file, ranks the source files, and thereby automatically recommends the files likely to contain the defect. However, the performance of these techniques still leaves considerable room for improvement, and they remain difficult to popularize and apply in practice. In response, this thesis takes defect reports and source code as its research objects and, from the perspectives of improving the query source and applying feature analysis, studies a multi-feature defect location method based on information retrieval in order to improve defect location performance. The main research contents and results are as follows.

First, because current defect location based on information retrieval suffers from many bad queries, an automatic query reconstruction method for bad queries is proposed, building on existing classification-based reconstruction methods. During reconstruction, text attachments are selected for expansion, which alleviates the problem of an excessively large number of bad queries. A verb-object phrase filtering method based on two heuristic rules is then used to reduce words that may become noise during query reconstruction, which improves the input query source and the quality of the bad queries derived from defect reports.

Second, to address the language mismatch between defect reports written in natural language and source code written in a programming language, feature analysis is used to improve the defect location model. Features are extracted from several aspects, including text similarity, defect tendency, collaborative filtering, and version variation, in combination with the improved vector space model rVSM. A multi-layer perceptron links the text similarity between defect reports and source code files with high-level abstract concepts to obtain a latent similarity, which is used to rank and recommend suspicious source code files.

Third, to address the semantic mismatch and insufficient representation in existing research, a defect location solution based on word embedding and a multi-scale convolutional neural network (MCNN) is proposed. First, a word2vec model pre-trained on a wiki corpus and the MCNN are used to extract semantic features automatically; multi-scale convolution kernels of non-fixed size address the insufficient extraction of deep text features. Then, structural features of the source code are extracted from the abstract syntax tree with the MCNN, to avoid losing the syntactic and structural features of the source code. A neural network fuses the multiple similarity features to obtain a similarity measure between the defect report and the source code. In addition, the Focal loss function is introduced to address the class imbalance in defect location. In the experiments, the proposed model achieves relatively better software defect location performance than the other models compared.
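The basic retrieval step named in the abstract (extract terms from the defect report and the source files, compute a similarity, sort the files) can be illustrated with a minimal sketch. This is not the thesis's method: it uses plain TF-IDF weighting with cosine similarity rather than rVSM or the neural models, and the function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_files`), the camelCase splitting rule, and the sample file names are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split camelCase identifiers, lowercase, and keep alphabetic tokens."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return re.findall(r"[a-z]+", spaced.lower())


def tfidf_vectors(token_lists):
    """Build one sparse TF-IDF vector (dict: term -> weight) per document."""
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    # Smoothed IDF so that no term receives a zero or negative weight.
    idf = {t: math.log((n_docs + 1) / (df + 1)) + 1.0 for t, df in doc_freq.items()}
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vectors.append({t: (1.0 + math.log(f)) * idf[t] for t, f in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_files(report, files):
    """Return (file name, score) pairs, most similar to the report first.

    Assumption: the report is included when computing IDF, a simplification
    that keeps the sketch short.
    """
    names = list(files)
    docs = [tokenize(files[name]) for name in names] + [tokenize(report)]
    vectors = tfidf_vectors(docs)
    query = vectors[-1]
    scored = [(name, cosine(query, vec)) for name, vec in zip(names, vectors[:-1])]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical two-file project and defect report, for illustration only.
files = {
    "LoginDialog.java": "class LoginDialog { void showPasswordError() { } }",
    "ImageCache.java": "class ImageCache { void evictOldImages() { } }",
}
report = "Login dialog crashes when the password is wrong"

for name, score in rank_files(report, files):
    print(f"{name}: {score:.3f}")
```

In this toy input the report shares the terms "login", "dialog", and "password" with `LoginDialog.java` and no terms with `ImageCache.java`, so the former ranks first. A purely lexical score like this one is exactly where the language mismatch described in the abstract shows up, which is the motivation given for the additional features and learned models.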
Keywords/Search Tags:information retrieval, software defect location, vector space model, word embedding, convolutional neural network