| In an era of increasingly collaborative work, GitHub has become the first choice for many developers and teams. Participants submit Pull Requests to add features or fix bugs in a project, and the project's core reviewers then decide whether to merge, modify, or reject each one. Popular GitHub projects receive a very large number of Pull Requests, and because the work is distributed, different developers may submit Pull Requests for the same problem, which increases the follow-up workload of both reviewers and developers. GitHub currently provides no automated tool for detecting duplicated Pull Requests, and detecting them manually is very difficult, so studying duplicated Pull Request detection is of great significance. This thesis proposes a method for detecting duplicated Pull Requests based on a graph neural network. The method first calculates the similarity of the title, description, commit information, changed code, and code change location of Pull Requests, then detects duplicated Pull Requests from these similarities using the AdaBoost algorithm, and finally is compared with the traditional method. Changed-code similarity is calculated by a graph neural network model. To better capture semantic similarity, the method constructs an abstract syntax tree (AST)-based graph representation of programs by adding to the AST several types of edges that represent control flow and data flow. The method also combines the AST with diff information, that is, the code changes between versions obtained through the git diff command, to rewrite the changed code files at the method level. This preprocessing is carried out before the files are fed into the graph neural network model, to improve the correctness and soundness of the study. The experiments use a Python project and a Java project. For the Python project, Recall-rate@20 is about 22% higher than that of the existing method; for the Java project, where the graph neural network model and the preprocessing are applied, Recall-rate@20 is about 25% higher than that of the existing method. In the feature importance study, removing the changed code reduces accuracy by 30%-45%, which leads to the conclusion that the changed code is the most important factor. |
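
The abstract reports results as Recall-rate@20 but does not define the metric. The sketch below assumes the definition commonly used in duplicate-detection work: the fraction of query Pull Requests whose known duplicate appears among the top k ranked candidates. The function name `recall_rate_at_k`, the PR identifiers, and the toy rankings are hypothetical and are not taken from the thesis.

```python
from typing import Dict, List


def recall_rate_at_k(rankings: Dict[str, List[str]],
                     true_duplicate: Dict[str, str],
                     k: int = 20) -> float:
    """Fraction of query PRs whose known duplicate is in the top-k candidates.

    rankings: query PR id -> candidate PR ids, most similar first.
    true_duplicate: query PR id -> id of the PR it is known to duplicate.
    Assumed definition; the thesis may compute the metric differently.
    """
    if not true_duplicate:
        return 0.0
    hits = 0
    for query, target in true_duplicate.items():
        # A query with no ranking counts as a miss.
        top_k = rankings.get(query, [])[:k]
        if target in top_k:
            hits += 1
    return hits / len(true_duplicate)


# Hypothetical toy data, made up purely for illustration.
rankings = {
    "pr-101": ["pr-17", "pr-42", "pr-8"],
    "pr-102": ["pr-5", "pr-9", "pr-77"],
}
true_duplicate = {"pr-101": "pr-42", "pr-102": "pr-63"}

score = recall_rate_at_k(rankings, true_duplicate, k=2)
print(f"Recall-rate@2 = {score:.2f}")
```

In the setting the abstract describes, the candidate ranking for each query would presumably come from the classifier's output over the similarity features, with k set to 20.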