Research On Tree-based Clone Detection In Web Application

Posted on: 2015-04-17
Degree: Master
Type: Thesis
Country: China
Candidate: C Q Li
Full Text: PDF
GTID: 2428330488999751
Subject: Computer Science and Technology
Abstract/Summary:
Code clones, also called duplicate code, are regarded as a bad smell in software. During the software development cycle, code clones introduced by programmers can increase maintenance cost. For traditional software, researchers have conducted extensive studies on the existence and detection of code clones, which account for 13%-20% of the total source code. With the development of Web technology, Web applications have been widely developed and used. Compared to traditional software, Web applications introduce code clones with higher probability. However, researchers rarely perform code clone detection on Web applications, and when they do they use only hash-code-based methods, which suggests that many clones may be missed. In this paper, we present the design and implementation of a new tree-based code clone detection method and implement it as a tool, TCD. TCD uses characteristic vectors to represent the subtrees of source code in order to reduce the tree-matching cost. It also proposes a dimension reduction method to decrease the computational cost incurred by high-dimensional characteristic vectors. We further leverage random kd forests to build an index over the characteristic vectors and use kNN search to find nearest neighbors, which correspond to code clones. In the statistical analysis of Web applications, we use 14 of the most popular Web applications as our test set. We find that all 14 Web applications have the code clone problem, and that the clone rate of a Web application may remain at the same level as its version changes. Furthermore, code clones also exist across two different Web applications. Extensive experiments show that, in most cases, TCD can detect three types of clones efficiently. Moreover, TCD has advantages in speed, precision, and quantity of detected clones compared with other tree-based code clone detection tools.
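The characteristic-vector idea in the abstract can be illustrated with a minimal sketch. This is not TCD itself: it assumes Python's standard `ast` module as a stand-in for a Web-language parse tree, counts node types per function subtree, and uses a brute-force nearest-neighbor scan in place of the random kd forest index. The dimension reduction step is omitted, and all function names here are hypothetical.

```python
import ast
from collections import Counter


def characteristic_vector(node):
    """Count how often each AST node type occurs in the subtree rooted at node."""
    return Counter(type(n).__name__ for n in ast.walk(node))


def distance(u, v):
    """Squared Euclidean distance between two sparse count vectors."""
    return sum((u[k] - v[k]) ** 2 for k in set(u) | set(v))


def function_vectors(source):
    """Map each function name in the source to its characteristic vector."""
    tree = ast.parse(source)
    return {
        n.name: characteristic_vector(n)
        for n in ast.walk(tree)
        if isinstance(n, ast.FunctionDef)
    }


def nearest_neighbor(name, vectors):
    """Brute-force 1-NN search (an assumption; the thesis indexes with kd forests)."""
    target = vectors[name]
    others = (other for other in vectors if other != name)
    return min(others, key=lambda other: distance(target, vectors[other]))


# Hypothetical input: two functions that differ only in identifier names,
# plus one structurally unrelated function.
SOURCE = '''
def total(xs):
    s = 0
    for x in xs:
        s = s + x
    return s

def summed(values):
    acc = 0
    for v in values:
        acc = acc + v
    return acc

def greet(name):
    return "hi " + name
'''

vectors = function_vectors(SOURCE)
closest = nearest_neighbor("total", vectors)
print(closest, distance(vectors["total"], vectors[closest]))
```

Because the vectors count node types rather than identifier names, the two renamed functions map to the same point, so a distance threshold near zero would flag them as a clone pair. A real index such as a kd forest would replace the linear scan once the number of subtrees grows large.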
Keywords/Search Tags: Web application, Code clone, Parse tree, kd tree, Dimensionality reduction