
Study On Detection Algorithms For Variance Code Clone And Its Applications

Posted on: 2022-07-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M Wu
Full Text: PDF
GTID: 1488306323963619
Subject: Computer software and theory
Abstract/Summary:
Code clones are very common in software development. Programmers often copy, paste, and modify code fragments to code faster, and this produces code clones. Among these, variance code clones, which contain many modifications, are ubiquitous in software code. Detecting variance code clones supports many software engineering tasks, such as software maintenance, clone analysis, code completion, evolution analysis, and bug detection. However, most existing clone detectors target identical or very similar clones and have difficulty detecting variance code clones. This dissertation studies algorithms for variance clone detection and their applications. The main contents and innovations are as follows:

1. Detection algorithm for variance code clones

Among the existing clone detectors that can detect variance clones, the methods based on text and tokens are efficient, but their recall on variance clones is low and they can hardly detect large-variance clones. The methods based on syntax trees or dependency graphs detect variance clones well, but their computational complexity is high. The effectiveness of methods based on machine learning usually depends on the training datasets. This dissertation proposes a new detection algorithm for variance code clones named LVMapper, based on the seed-and-extend method from bioinformatics. It introduces dynamic thresholds, a heuristic verification algorithm, and a seed index, which are effective for detecting clones with scattered modifications and suitable for detecting clones in large-scale datasets. Experiments on eight open source projects show that LVMapper detects large-variance clones much more effectively than state-of-the-art clone detectors such as SourcererCC, CCAligner, and Oreo. Experimental results on the widely used benchmark dataset BigCloneBench show that LVMapper has comparable recall and precision for Type-1 to Type-3 clones. In addition, LVMapper scales to a dataset of 250M lines of code.

2. Analysis of variance code clones

Using the variance clone detector LVMapper on a code corpus increases the amount of clone data remarkably, so analyzing large-scale clone data becomes challenging. Existing clone-analysis metrics are fragmentary and are not based on variance clone data. Metrics designed for large-scale variance clone data are needed in order to improve software quality and maintenance. Based on the variance clone detector, this dissertation proposes metrics for variance clones from three perspectives: clone degree, clone structure, and clone evolution. An analysis covering different sizes (from 60K to 2.8M lines of code), different programming languages (Java and C), and three versions of open source software finds that the metrics reflect the quality of software systems from different perspectives. The analysis tool is integrated into the variance clone detector LVMapper, which makes the detection and analysis of variance clones more efficient and can provide guidance and reference for software development.

3. Code completion algorithm based on variance code clones

Programming efficiency is important for software development. However, most code completion tools can only complete a single line of code. A few tools, such as Aroma proposed by Facebook, can complete code fragments, but it is difficult and costly for them to fully extract the structure and features of code. Based on the variance clone detection algorithm LVMapper, this dissertation proposes a new code completion algorithm named CodeWeaver. Programmers only need to write the key code or a code skeleton; the algorithm then searches for candidate code fragments and provides the complete code automatically. This can significantly improve the efficiency and quality of software development. Experiments show that, using code snippets that exist in the code corpus, the Top-10 recall of CodeWeaver is more than 96%. Using code snippets from a coding Q&A website, CodeWeaver provides useful completions for more than 93% of inputs, and the average time for one completion is 0.58s. On the dataset used in previous studies, CodeWeaver also performs best compared with existing tools such as Aroma.

Overall, the three pieces of work above fall into two parts: the first is the study of the variance clone detection algorithm, and the second is the applications of variance clone detection. The first two pieces of work were applied to a project with Huawei for software development and optimization.
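The seed-and-extend idea behind the detection algorithm can be illustrated with a minimal Python sketch. This is not LVMapper itself: the function names, the seed length `k`, and the fixed thresholds `min_shared` and `min_similarity` are assumptions made for illustration, and a plain longest-common-subsequence check stands in for the dissertation's heuristic verification and dynamic thresholds, whose details the abstract does not give.

```python
from collections import defaultdict


def build_seed_index(tokens, k):
    """Map each k-token window (a seed) to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(tokens) - k + 1):
        index[tuple(tokens[i:i + k])].append(i)
    return index


def shared_seed_count(query, index, k):
    """Count distinct seeds of the query that also occur in the index."""
    seeds = {tuple(query[i:i + k]) for i in range(len(query) - k + 1)}
    return sum(1 for seed in seeds if seed in index)


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def is_clone_candidate(a, b, k=3, min_shared=2, min_similarity=0.5):
    """Seed step: cheap filter on shared seeds. Extend step: LCS similarity.

    The thresholds are fixed, hypothetical values chosen for this sketch.
    """
    index = build_seed_index(b, k)
    if shared_seed_count(a, index, k) < min_shared:
        return False  # too few shared seeds, skip the expensive comparison
    return lcs_length(a, b) / max(len(a), len(b)) >= min_similarity


if __name__ == "__main__":
    original = "int i = 0 ; while ( i < n ) { sum += i ; i ++ ; }".split()
    modified = ("int i = 0 ; log ( i ) ; while ( i < n ) "
                "{ total += i ; i ++ ; }").split()
    print(is_clone_candidate(original, modified))
```

The seed filter keeps the costly pairwise comparison away from fragment pairs that share almost nothing, which is the property that makes this family of methods attractive for large corpora; tolerating gaps in the verification step is what lets scattered modifications still match.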
Keywords/Search Tags:variance code clone, code clone detection, clone analysis, code completion