| Vulnerabilities introduced by reusing third-party code or shared code logic are called code clone vulnerabilities. Research shows that code clone vulnerabilities are widespread in real-world software and pose a serious threat to system security, so it is crucial to discover and patch them as early as possible. However, because vulnerable code and its repaired version are highly similar, existing code clone vulnerability detection methods, represented by ReDeBug and VUDDY, cannot distinguish the two effectively, resulting in a large number of false positives. In addition, developers often modify code to suit their needs when reusing it, so the cloned code differs syntactically from the original. Existing vulnerability discovery methods that operate at block-level or function-level granularity cannot identify code clone vulnerabilities that differ significantly from the original vulnerability, resulting in false negatives. To solve these problems, this thesis proposes a fine-grained clone vulnerability discovery approach based on code similarity analysis. The approach uses security patches to add vulnerability fix information to the vulnerability fingerprint as one of its components, and during detection it treats target functions that match the fix information as repaired code clones, thus accurately distinguishing vulnerable code clones from repaired ones. The approach also works at line-level granularity: when constructing the vulnerability fingerprint, it focuses on the lines of code associated with the vulnerability rather than on the entire function. It further uses a fuzzy matching algorithm to filter out certain irrelevant modifications in the cloned code, so that clone vulnerabilities that differ to some degree from the original ones can still be identified. Specifically, the approach
first defines an enhanced vulnerability fingerprint consisting of three components that are strongly related to vulnerability formation and repair: the vulnerable source lines component, the vulnerable context lines component, and the vulnerability fix hunk component. This thesis then proposes a triple matching algorithm that compares the target fingerprint with each of the three components of the vulnerability fingerprint. A target function is regarded as a potential clone vulnerability when it contains the vulnerable source lines component, its similarity to the vulnerable context lines component exceeds a certain threshold, and it does not contain any vulnerability fix hunk. To evaluate the approach, this thesis implements it as the tool VCCD (Vulnerable Code Clone Discoverer) and compares VCCD with two baseline methods, ReDeBug and VUDDY, on ten popular real-world open-source projects from different domains. The experimental results show that VCCD achieves the best F1-score, 89%, within acceptable time, while the F1-scores of ReDeBug and VUDDY are 75% and 79%, respectively. |
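
The triple matching rule described in the abstract can be illustrated with a minimal Python sketch. This is not VCCD's implementation: the names `VulnFingerprint`, `normalize`, `contains_hunk`, and `is_potential_clone` are hypothetical, the whitespace-stripping normalization is an assumed stand-in for the fuzzy matching step, the context similarity is assumed to be the fraction of context lines found in the target, and the threshold value `0.6` is an arbitrary placeholder.

```python
from dataclasses import dataclass, field


def normalize(line: str) -> str:
    """Remove all whitespace so pure formatting changes do not affect matching.

    Assumed normalization; the abstract does not specify the fuzzy matching rules.
    """
    return "".join(line.split())


@dataclass
class VulnFingerprint:
    """Hypothetical container for the three fingerprint components."""
    source_lines: list[str]    # vulnerable source lines component
    context_lines: list[str]   # vulnerable context lines component
    fix_hunks: list[list[str]] = field(default_factory=list)  # vulnerability fix hunks


def contains_hunk(target: list[str], hunk: list[str]) -> bool:
    """Return True if `hunk` occurs as a contiguous run of lines in `target`."""
    n = len(hunk)
    if n == 0:
        return False
    return any(target[i:i + n] == hunk for i in range(len(target) - n + 1))


def is_potential_clone(func_lines: list[str], fp: VulnFingerprint,
                       threshold: float = 0.6) -> bool:
    """Apply the three checks; `threshold` is an assumed placeholder value."""
    target = [normalize(l) for l in func_lines if normalize(l)]
    target_set = set(target)

    # Check 1: every vulnerable source line must appear in the target function.
    if not all(normalize(l) in target_set for l in fp.source_lines):
        return False

    # Check 2: similarity to the context lines must exceed the threshold
    # (assumed measure: fraction of context lines present in the target).
    context = [normalize(l) for l in fp.context_lines]
    if context:
        similarity = sum(l in target_set for l in context) / len(context)
        if similarity <= threshold:
            return False

    # Check 3: the target must not contain any fix hunk; otherwise it is
    # treated as an already repaired clone.
    for hunk in fp.fix_hunks:
        if contains_hunk(target, [normalize(l) for l in hunk]):
            return False
    return True
```

Under these assumptions, a function that keeps the vulnerable `memcpy` call and its context but lacks the patch's bounds check is reported, while a copy that already includes the fix hunk is skipped. The third check is what separates repaired clones from vulnerable ones, which is the source of the false positives the abstract attributes to methods that ignore fix information.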