Research On Code Merge Conflicts Based On Pre-trained Model And Text Matching

Posted on: 2024-01-28
Degree: Master
Type: Thesis
Country: China
Candidate: H B Li
Full Text: PDF
GTID: 2568307136489174
Subject: Software engineering
Abstract/Summary:
With the rapid development of network technology and the constantly changing software industry, remote collaboration and shared code repositories are becoming increasingly common, which creates new needs such as conflict resolution in collaborative development. Developers taking part in collaborative development fix bugs and add features by submitting commits and merges. However, because different developers work on different branches, merge conflicts arise when those branches are merged, and they are especially common in large collaborative projects. Because of the nature of collaborative development, developers may encounter the same conflicts repeatedly, and detecting and resolving them again reduces productivity and increases workload. Collaborative development tools currently lack automated detection of duplicate merge conflicts, so duplicates have to be found manually, which is difficult. Detecting duplicate conflicts is therefore of great significance to the software engineering industry. The main research contents of this thesis are as follows.

First, a code merge conflict matching method based on the CodeBERT pre-trained model is proposed. The merge conflict data, consisting of the code from the current branch, the code from the other branch, and the code from the ancestor node, is converted into vectors, and merge conflicts are matched by measuring the similarity between those vectors. The results show that the CodeBERT pre-training method and the traditional code clone detection method reach an average of 74.88%, 73.29%, and 74.07%, respectively. Compared with the traditional code clone detection algorithm, the CodeBERT-based method improves accuracy and recall, and the F1 value increases by 10.3%. Across the parameter combinations tested, the Simhash algorithm achieves its best matching performance with a Hamming distance of 4 and a fingerprint signature length of 64.

Second, a duplicate conflict resolution method based on text matching is proposed. The method first traverses the merge nodes that contain historical conflict information, then uses a graph traversal algorithm to find the content merged from the different branches, applies text matching to detect the conflict information of the collaborative development project, and saves it in a database. Later, when the project encounters a merge conflict, the database of historical conflicts is queried first. If a match is found, the matching historical conflict information is returned to the developer, which improves development efficiency. In this way, the method uses historical conflict information to help developers resolve current conflicts.
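The Simhash matching and the history lookup described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract gives only the two parameters (a Hamming distance of 4 and a 64-bit fingerprint). The tokenizer, the unweighted per-token vote, the use of BLAKE2b as the token hash, the helper names (`conflict_text`, `simhash`, `find_duplicate`), and the in-memory list standing in for the database are all assumptions made for illustration.

```python
import hashlib
import re

FINGERPRINT_BITS = 64  # fingerprint length reported in the abstract
MAX_HAMMING = 4        # Hamming distance reported in the abstract


def conflict_text(ours, theirs, base):
    """Hypothetical helper: join the three sides of a conflict into one text."""
    return "\n".join((ours, theirs, base))


def tokenize(text):
    """Assumed tokenizer: identifiers, numbers, and single symbols; whitespace is dropped."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", text)


def simhash(text, bits=FINGERPRINT_BITS):
    """Simhash fingerprint: each token's hash votes +1/-1 on every bit position.

    bits must be a multiple of 8 (at most 512) because of the BLAKE2b digest size.
    """
    votes = [0] * bits
    for token in tokenize(text):
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=bits // 8).digest()
        value = int.from_bytes(digest, "big")
        for i in range(bits):
            votes[i] += 1 if (value >> i) & 1 else -1
    # A bit is set in the fingerprint when the positive votes win.
    return sum(1 << i for i in range(bits) if votes[i] > 0)


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def find_duplicate(text, history, max_distance=MAX_HAMMING):
    """Return the stored resolution of the closest historical conflict, or None.

    history is a list of (fingerprint, resolution) pairs; a real system
    would query a database instead of scanning a list.
    """
    fingerprint = simhash(text)
    best = None
    for stored_fingerprint, resolution in history:
        distance = hamming(fingerprint, stored_fingerprint)
        if distance <= max_distance and (best is None or distance < best[0]):
            best = (distance, resolution)
    return best[1] if best else None


# Usage: record one resolved conflict, then look up a reformatted copy of it.
old = conflict_text("int x = 1;", "int x = 2;", "int x = 0;")
history = [(simhash(old), "keep ours")]
new = conflict_text("int  x=1;", "int x  =  2;", "int x=0;")
match = find_duplicate(new, history)
```

Because the assumed tokenizer discards whitespace, the reformatted conflict produces the same fingerprint as the stored one and the lookup returns the stored resolution. A linear scan is enough for a sketch; a larger history would need an index over the fingerprints.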
Keywords/Search Tags:Code conflicts, Duplicate conflicts, Similarity calculations, Text matching