| The rapid development of computer technology has significantly advanced research and production across many industries, and computer programming is used in production ever more frequently as an indispensable tool. To meet the need to generate and maintain large numbers of computer programs, code intelligence has emerged. Automatic program repair is an important branch of this field: its purpose is to have the computer repair syntactic or semantic errors in code automatically, without human intervention. With the development of deep learning, researchers have exploited the relationship between natural language and programming languages to address automatic program repair with natural language processing methods. Most existing studies, however, neither mine the hierarchical structure of code nor extract its syntax rules. In addition, grammatical errors in model output are one of the factors that hinder progress on the task. These problems are both key research directions for automatic program repair and its main difficulties. To address them, the main work of this paper is as follows. First, because existing deep learning models learn only sequence information and do not model the hierarchical information contained in code, this paper mines the hierarchical information of code structure on top of a sequence-to-sequence model. The model uses grammar rules as its basic elements, extracts the abstract syntax tree (AST) structure, and captures the depth and distance information of AST nodes through position vectors and a self-attention mechanism based on node distance. Experiments on open-source datasets show that the model that mines hierarchical code structure outperforms models that consider only sequence information. Second, because existing deep learning models do not understand the grammatical rules contained in code, a model that encodes both vocabulary and rules is proposed. On the one hand, we try to integrate word vector information into the grammar vectors; on the other hand, we use two encoders to encode words and rules separately. During encoding, we use cross-attention based on the relationship between words and rules, and on the decoder side different attention strategies are used. Experiments on open-source datasets show that the model that encodes both words and rules outperforms models that encode only words or only rules. Third, because the output of existing models may not conform to the syntax rules, a beam search algorithm based on the abstract syntax tree is proposed. The nodes generated so far are used to construct an AST, which serves as a reference that constrains the prediction of the next node. In addition, to address the shortcomings of the XMatch metric, new evaluation metrics are proposed along the grammatical and structural dimensions. Experiments show that the new AST-based algorithm can avoid syntax errors and improve model performance. |
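
To make the depth and distance information of AST nodes concrete, the following is a minimal sketch of how such quantities could be read off a syntax tree. It is illustrative only and is not the model described above: it uses Python's standard `ast` module as a stand-in parser, and the helper names `index_tree` and `tree_distance` are hypothetical. Depth is taken as the number of edges from the root, and distance as the number of edges on the path between two nodes.

```python
import ast


def index_tree(source):
    """Parse source and return pre-order node labels, depths, and parent indices.

    Hypothetical helper: the root has depth 0 and parent index -1.
    """
    labels, depths, parents = [], [], []

    def visit(node, depth, parent):
        idx = len(labels)
        labels.append(type(node).__name__)
        depths.append(depth)
        parents.append(parent)
        for child in ast.iter_child_nodes(node):
            visit(child, depth + 1, idx)

    visit(ast.parse(source), 0, -1)
    return labels, depths, parents


def tree_distance(i, j, depths, parents):
    """Return the number of edges on the tree path between nodes i and j."""
    dist = 0
    # Lift the deeper node until both nodes sit at the same depth.
    while depths[i] > depths[j]:
        i = parents[i]
        dist += 1
    while depths[j] > depths[i]:
        j = parents[j]
        dist += 1
    # Climb both nodes together until they meet at the common ancestor.
    while i != j:
        i, j = parents[i], parents[j]
        dist += 2
    return dist


if __name__ == "__main__":
    labels, depths, parents = index_tree("x = 1")
    for label, depth in zip(labels, depths):
        print("  " * depth + label)
```

In a model of the kind summarized above, a depth list like this could feed a position embedding, and the pairwise distances could bias self-attention scores. How the thesis actually encodes them is not specified here.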