Research On Code Clone Detection And Clone Bug Finding

Posted on: 2020-11-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: P C Wang
Full Text: PDF
GTID: 1368330575466584
Subject: Computer software and theory
Abstract/Summary:
Copying code and then pasting it with edits is common practice in software development. Although this practice saves time and improves development efficiency, it produces a large amount of identical or similar code, called code clones. Code clones not only cause redundancy in a software system, leading to maintenance problems, but also affect software quality by introducing and propagating bugs. Existing code clone detection work mainly focuses on identifying identical or nearly identical clones, which leaves clone analysis and bug screening incomplete. This dissertation carries out research on the following three aspects, and its main contents and contributions are:

(1) Research on an Algorithm for Large-gap Code Clone Detection

When the difference between cloned code fragments is large, we call the clone a large-gap clone. Large-gap clones better reflect the differences between two similar pieces of code, and they reflect how code is extended, for example through changes and improvements. They also make code clone analysis more comprehensive and help to filter and discover clone-related bugs. However, such clones are more difficult to detect. Existing text-based and token-based methods cannot detect them, and tree-based and PDG-based methods degrade when gaps are large. To address this issue, we propose a novel algorithm for large-gap code clone detection and implement it as a tool, CCAligner, which detects large-gap clones efficiently. Unlike existing approaches, we use continuous code fragments, rather than tokens, as the basic unit for matching. We further design the e-mismatch index to enhance the detection of clone gaps and to improve the speed and accuracy of detection. Furthermore, by using an asymmetric similarity coefficient, our approach is better suited to measuring clones with large gaps. Our experiments show that CCAligner outperforms the most popular detectors in detecting large-gap clones. For general clone detection, CCAligner remains competitive with the best clone detectors in terms of execution time, scalability, recall and precision.

(2) Clone Metric Analysis

The number of code clones in software systems is huge, which makes them hard for software maintainers to understand and manage. Metrics are needed to guide the screening of interesting code clones from a large number of detection results. With our large-gap clone detector CCAligner, we can find more large-gap clones and conduct a more comprehensive code clone analysis. Building on CCAligner, we therefore carry out the measurement and analysis of code clones. First, after communicating with the software industry and taking its actual needs into account, we are the first to propose three metrics for measuring clones: clone strength, clone frequency and clone distance, which reflect the degree and distribution of clones. A degree of cloning that is too high and a distribution that is too concentrated are a software bad smell. Next, we design the corresponding algorithms to measure each metric and integrate them into CCAligner as our clone analysis tool. Finally, we conduct an empirical study to test the effectiveness of this tool and to explore the distribution of code clones in real software systems. The study covers different software systems as well as different versions of the same software. Our experiments show that, across different software systems, software of higher quality (with a long development time) is significantly better than other software on all three metrics. In addition, as the same software evolves, the three metrics show a steadily improving trend, which means that the quality of the software is constantly improving. Our clone metric analysis tool can effectively analyze clone distribution and software quality from the perspective of code clones.

(3) Research on Clone-related Bug Detection

Since code clones lead to the introduction and propagation of bugs, clone-related bug detection can help identify problems caused by clones in software systems. Clone-related bugs are diverse, and existing approaches mainly rely on manually designed features to detect them. These approaches can detect only a few bug patterns and cannot detect unknown patterns. To overcome these issues, we propose a deep learning approach, DeepCbd, which automatically learns as many features as possible by itself. It also applies the gapped clone algorithm to extract gap code as input. Since bugs can be reflected at the syntactic and lexical levels of code, we use AST and token representations, respectively, as the input of the model. For the AST, we use Tree-LSTM, which suits the tree structure well; for tokens, we design an attention mechanism to enhance the learning effect. Moreover, we concatenate each learned code feature representation with the representation of its gap code, obtained with the gapped clone detector CCAligner, which further enhances the ability to detect clone-related bugs. The experiments show that DeepCbd outperforms other baseline machine learning and deep learning approaches, with an accuracy of 93.46%. Compared with existing related work, DeepCbd can identify most of the previously found bug patterns and also identifies three newly found bug patterns. Moreover, through cross-validation and generalization verification experiments, we verify that our approach has good generalization ability and practicability.

In summary, the whole work of this paper can be divided into two aspects: code clone detection and clone bug finding. In code clone detection, we have carried out research on large-gap clone detection and clone metric analysis.
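
The matching idea in part (1) can be illustrated with a small sketch. It is not the CCAligner implementation: the abstract does not give the exact definitions, so the window length `q`, the mismatch budget `e`, the helper names, and the choice of taking the larger of two directional coverage ratios as the "asymmetric" similarity are all assumptions made for illustration. Windows of `q` consecutive lines serve as the matching unit, and two windows are treated as a match when they differ in at most `e` line positions.

```python
from itertools import combinations


def windows(lines, q):
    """Return every run of q consecutive lines (lines assumed already normalized)."""
    return [tuple(lines[i:i + q]) for i in range(len(lines) - q + 1)]


def mismatch_keys(window, e):
    """Build one key per way of masking e positions of the window.

    Two windows of equal length share a key exactly when they differ
    in at most e positions. Requires e <= len(window).
    """
    positions = range(len(window))
    return {
        tuple(None if i in masked else line for i, line in enumerate(window))
        for masked in combinations(positions, e)
    }


def coverage(src, dst, q=3, e=1):
    """Fraction of src's windows that match some window of dst (directional)."""
    src_windows = windows(src, q)
    if not src_windows:
        return 0.0
    index = set()  # hypothetical stand-in for an e-mismatch index over dst
    for w in windows(dst, q):
        index |= mismatch_keys(w, e)
    hits = sum(1 for w in src_windows if not mismatch_keys(w, e).isdisjoint(index))
    return hits / len(src_windows)


def asymmetric_similarity(a, b, q=3, e=1):
    """Assumed coefficient: the larger of the two directional coverages."""
    return max(coverage(a, b, q, e), coverage(b, a, q, e))


small = ["a", "b", "c", "d"]
large = small + ["e", "f", "g", "h"]   # small fragment plus a large extension
edited = ["a", "X", "c", "d"]          # one line changed inside the fragment
```

With `q=3` and `e=0`, `coverage(small, large)` is 1.0 while `coverage(large, small)` is only 1/3, so a measure normalized by one side still reports the extended fragment as a clone where a symmetric ratio would not. Allowing `e=1` lets `small` match `edited` even though no window is identical.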
Keywords/Search Tags:Code Clone Detection, Large-gap Clone, Clone Evaluation, Clone Metric, Code Clone Analysis, Empirical Studies, Clone-related Bug, Deep Learning