An Approach To Analyzing And Presenting The Differences Between Lines In A Hunk

Posted on: 2022-04-07    Degree: Master    Type: Thesis
Country: China    Candidate: X Gong
GTID: 2518306323460444    Subject: Software engineering
Abstract/Summary:
In software development and maintenance, a large number of code changes are submitted to version management platforms every day. Developers need to read and understand these changes, yet identifying code differences manually is becoming increasingly difficult: software functionality is growing more diverse and software architectures are growing larger. Automatically comparing and analyzing the differences in code changes would separate those differences from the surrounding source code, making the changes easier to read and understand. It would also help developers study problems related to code changes and understand how software evolves.

Currently, developers mainly try to understand code changes by reading the set of hunks produced by a text-based diff tool. Some tools additionally perform a secondary differential analysis on each hunk and show the result in a side-by-side view, so that users can see the changes inside a hunk. However, the results of existing secondary differencing tools commonly suffer from statement mismatch: deleted and added lines in a hunk are paired incorrectly, and tokens within a matched statement are split apart, which makes it harder to understand what actually changed.

This thesis first investigates how code-difference mismatches are distributed, showing that the problem is widespread in secondary differential analysis. It then analyzes the causes of the mismatches and proposes an improved algorithm. Finally, the results are presented in a display tool built as an Eclipse plug-in, and the algorithm is verified experimentally on open-source projects.

The algorithm is based on lightweight syntax analysis. First, a coarse-grained text differencing algorithm is applied to two versions of a source file to obtain the set of change hunks. Then, the text lines in each hunk are mapped to statement lines, and each statement line is converted into a token sequence. Next, all deleted statements are matched against all added statements by similarity, and the longest common subsequence (LCS) algorithm is used to obtain the differing tokens within each matched statement pair. Both the similarity matching between statements and the differencing within a statement take a single word as the minimum unit of comparison.

The algorithm is implemented in Java, and its results are displayed in a custom Eclipse display plug-in. Experiments on five open-source projects show that the algorithm effectively overcomes both the mismatch between statement lines in a hunk and the token-splitting problem. Variance analysis shows that the overall accuracy of the algorithm is between 83% and 87%, compared with 70% to 78% for KDiff3 and 71% to 76% for Beyond Compare 4.
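The statement-matching and intra-statement differencing steps described above can be sketched in Java. This is a minimal illustration under stated assumptions, not the author's implementation. It starts from the deleted and added statement lines of a single hunk, so the coarse-grained text diff and the line-to-statement mapping are assumed to have already run. The tokenizer regex, the similarity measure 2·|LCS|/(|a|+|b|), the 0.5 threshold, the greedy best-pair-first matching strategy, and all names (`HunkDiffSketch`, `matchStatements`, `changedTokens`) are hypothetical choices made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of secondary differencing inside one hunk: tokenize
// statements, pair deleted/added statements by similarity, then diff each pair.
public class HunkDiffSketch {

    // Assumption: a "word" is a run of letters, digits or '_'; every other
    // non-whitespace character becomes a token of its own.
    private static final Pattern TOKEN = Pattern.compile("\\w+|\\S");

    public static List<String> tokenize(String statement) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(statement);
        while (m.find()) tokens.add(m.group());
        return tokens;
    }

    // dp[i][j] = LCS length of a[i..] and b[j..] (suffix form, so the
    // traceback in changedTokens can walk forward from dp[0][0]).
    static int[][] lcsTable(List<String> a, List<String> b) {
        int[][] dp = new int[a.size() + 1][b.size() + 1];
        for (int i = a.size() - 1; i >= 0; i--) {
            for (int j = b.size() - 1; j >= 0; j--) {
                dp[i][j] = a.get(i).equals(b.get(j))
                        ? dp[i + 1][j + 1] + 1
                        : Math.max(dp[i + 1][j], dp[i][j + 1]);
            }
        }
        return dp;
    }

    // Assumed similarity measure: 2 * |LCS| / (|a| + |b|), in [0, 1].
    public static double similarity(List<String> a, List<String> b) {
        int total = a.size() + b.size();
        return total == 0 ? 1.0 : 2.0 * lcsTable(a, b)[0][0] / total;
    }

    // Greedy best-pair-first matching over all (deleted, added) pairs at or above
    // the threshold; each statement is used at most once. match[i] is the index of
    // the added statement paired with deleted statement i, or -1 if unmatched.
    public static int[] matchStatements(List<String> deleted, List<String> added, double threshold) {
        int[] match = new int[deleted.size()];
        Arrays.fill(match, -1);
        boolean[] used = new boolean[added.size()];
        List<double[]> pairs = new ArrayList<>(); // each entry: {similarity, i, j}
        for (int i = 0; i < deleted.size(); i++) {
            for (int j = 0; j < added.size(); j++) {
                double s = similarity(tokenize(deleted.get(i)), tokenize(added.get(j)));
                if (s >= threshold) pairs.add(new double[] {s, i, j});
            }
        }
        pairs.sort((p, q) -> Double.compare(q[0], p[0])); // stable sort: ties keep (i, j) order
        for (double[] p : pairs) {
            int i = (int) p[1], j = (int) p[2];
            if (match[i] == -1 && !used[j]) {
                match[i] = j;
                used[j] = true;
            }
        }
        return match;
    }

    // LCS traceback: "-tok" = token only in the old statement, "+tok" = token
    // only in the new statement; common tokens are omitted.
    public static List<String> changedTokens(List<String> a, List<String> b) {
        int[][] dp = lcsTable(a, b);
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            if (a.get(i).equals(b.get(j))) {
                i++;
                j++;
            } else if (dp[i + 1][j] >= dp[i][j + 1]) {
                out.add("-" + a.get(i++));
            } else {
                out.add("+" + b.get(j++));
            }
        }
        while (i < a.size()) out.add("-" + a.get(i++));
        while (j < b.size()) out.add("+" + b.get(j++));
        return out;
    }

    public static void main(String[] args) {
        List<String> deleted = List.of("int count = 0;", "return count;");
        List<String> added = List.of("log(\"done\");", "return total;", "long count = 0;");
        int[] match = matchStatements(deleted, added, 0.5);
        for (int i = 0; i < deleted.size(); i++) {
            if (match[i] < 0) continue; // unmatched: would be shown as a whole-line deletion
            String a = deleted.get(i), b = added.get(match[i]);
            System.out.println(a + "  <->  " + b + "  " + changedTokens(tokenize(a), tokenize(b)));
        }
    }
}
```

In `main`, pairing lines by position would line up `int count = 0;` with `log("done");`. The similarity matching instead pairs it with `long count = 0;` and reports `[-int, +long]`, and pairs `return count;` with `return total;`, reporting `[-count, +total]`. Because whole words are the unit of comparison, `count` replaced by `total` shows up as one replaced token rather than as scattered character-level edits. Matching all pairs best-first (rather than line by line) is one simple way to avoid the positional mismatch the abstract describes; the thesis may use a different strategy.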
Keywords/Search Tags: Code change, code change differentiation analysis, secondary differentiation analysis, token