An Approach To Analyzing And Presenting The Differences Between Lines In A Hunk

Posted on: 2022-04-07

Degree: Master

Type: Thesis

Country: China

Candidate: X Gong

Full Text: PDF

GTID: 2518306323460444

Subject: Software engineering

Abstract/Summary:


During software development and maintenance, a large number of code changes are submitted to version management platforms every day. Reading and understanding these changes is a necessary part of development, yet identifying code differences manually is becoming increasingly difficult as functional requirements diversify and software architectures grow. Automatically comparing and analyzing code changes makes it possible to separate the differences from the surrounding source code, which helps developers read and understand the changes, study problems related to code changes, and understand how software evolves.

At present, code changes are mostly understood manually on the basis of the hunk set produced by a text-based differencing tool. Some tools also perform a secondary differencing analysis on each hunk and show the result in a side-by-side view so that users can inspect the changes within the hunk. However, the results of existing secondary differencing tools commonly suffer from statement mismatch, that is, improper matching between the deleted and added lines of a hunk and splitting of tokens within matched statements, which hinders an accurate understanding of the change.

This thesis first investigates the distribution of the mismatch problem, showing that it is widespread in secondary differencing analysis. It then analyzes the causes of the mismatch and proposes an improved algorithm. Finally, the results are presented in a display tool built as an Eclipse plug-in, and the algorithm is validated by experiments on open source projects.

The algorithm is based on lightweight syntax analysis. First, a coarse-grained text differencing algorithm is applied to the two versions of a source file to obtain its set of change hunks. Then, the text lines in each hunk are mapped to statement lines, and each statement line is recognized as a sequence of tokens. Next, all deleted and added statements are matched by similarity, and the longest common subsequence algorithm is used to obtain the differing tokens within each matched pair of statements. In both the similarity matching between statements and the differencing within statements, a single word is the minimum unit of comparison.

The algorithm is currently implemented in Java, and the results are displayed in a custom-built Eclipse plug-in. Experiments on five open source projects show that the algorithm effectively overcomes the mismatch between statements and lines within hunks as well as the token-splitting problem. The difference analysis results show that the overall accuracy of the algorithm is between 83% and 87%, compared with 70% to 78% for KDiff3 and 71% to 76% for Beyond Compare 4.
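
The pipeline described in the abstract (tokenize each statement, match deleted and added statements by similarity, then take a token-level longest common subsequence within each matched pair) can be illustrated with a small sketch. This is not the thesis's implementation, which is written in Java and whose details are not given here. The regular-expression tokenizer, the similarity score (twice the LCS length divided by the total token count), the 0.5 threshold, and the greedy best-first matching are all assumptions chosen for illustration, as are the function names.

```python
import re


def tokenize(statement):
    # Assumed tokenizer: a run of word characters is one token (so "0L" or
    # "count" stays whole); every other non-space character is its own token.
    return re.findall(r"\w+|[^\w\s]", statement)


def lcs_table(a, b):
    # dp[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    return dp


def similarity(a, b):
    # Assumed score in [0, 1]: shared tokens (via LCS) over total tokens.
    if not a and not b:
        return 1.0
    return 2 * lcs_table(a, b)[0][0] / (len(a) + len(b))


def match_statements(deleted, added, threshold=0.5):
    # Assumed strategy: greedily pair the most similar remaining statements,
    # ignoring pairs whose score falls below the (hypothetical) threshold.
    del_tokens = [tokenize(s) for s in deleted]
    add_tokens = [tokenize(s) for s in added]
    candidates = sorted(
        ((similarity(d, a), i, j)
         for i, d in enumerate(del_tokens)
         for j, a in enumerate(add_tokens)),
        key=lambda c: (-c[0], c[1], c[2]),
    )
    used_deleted, used_added, pairs = set(), set(), []
    for score, i, j in candidates:
        if score < threshold:
            break
        if i in used_deleted or j in used_added:
            continue
        used_deleted.add(i)
        used_added.add(j)
        pairs.append((i, j))
    return sorted(pairs)


def token_diff(a, b):
    # Walk the LCS table to label each token as kept "=", deleted "-",
    # or added "+".
    dp = lcs_table(a, b)
    i = j = 0
    ops = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            ops.append(("=", a[i]))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            ops.append(("-", a[i]))
            i += 1
        else:
            ops.append(("+", b[j]))
            j += 1
    ops.extend(("-", t) for t in a[i:])
    ops.extend(("+", t) for t in b[j:])
    return ops


# Usage: a made-up hunk with two deleted lines and three added lines.
deleted = ["int count = 0;", "return total;"]
added = ["log(total);", "long count = 0L;", "return total + 1;"]
for i, j in match_statements(deleted, added):
    print(deleted[i], "->", added[j])
    print(token_diff(tokenize(deleted[i]), tokenize(added[j])))
```

In this made-up hunk, pairing lines by position would set "int count = 0;" against "log(total);", whereas similarity matching pairs it with "long count = 0L;". Because tokens are compared as whole words, "0" and "0L" are reported as one deleted and one added token rather than being split into a shared "0" and an inserted "L".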

Keywords/Search Tags:

Code change, code change differentiation analysis, secondary differentiation analysis, token


Related items

1	Research On Program Change Impact Analysis Based On Code Text Analysis
2	Techniques Of Evaluating Software Evolution Based On Code Change Detection
3	Research And Implementation Of Analysis Method Of Program Change Based On Source Code
4	Pyreview: A Python Source Code Analysis Tool Based On Abstract Syntax Tree Differencing Algorithm
5	An Incremental Approach To Intermediate Representation Generation Based On Code Change Analysis
6	Code Bad Smell Detection Using Software Evolutionary Data Mining
7	Static IO Analysis In DFT/ADG Software
8	Change Impact Analysis And Its Applications For Aspect-Oriented Programs
9	Studying On Programming Differentiation Strategies Of Television Channels
10	The Research On Implement Technique Of Automatic Differentiation Tools