
A Programming Syntax Error Correction Model Based On Transfer Learning

Posted on: 2021-01-23
Degree: Master
Type: Thesis
Country: China
Candidate: F Li
Full Text: PDF
GTID: 2428330647451034
Subject: Software engineering
Abstract/Summary:
Automated correction of programming language syntax errors is an active research topic. Existing methods combine context-free grammars with deep learning and have achieved good results on syntax error correction tasks. However, the correction models have become large and complex, which makes the neural networks slow to train. Training large models also requires a large amount of erroneous code annotated with repair actions, and there is currently no suitable method for automatically annotating the syntax errors in a piece of erroneous code, so few real datasets are available. Transfer learning can carry labeled data or knowledge structures from related fields over to a target field to improve learning there. Pre-trained models are a common form of transfer learning and are widely used in natural language processing and computer vision. This thesis proposes a program syntax error correction model based on transfer learning. Its main contributions are:

1. To address the scarcity of available datasets, slow training, and poor generalization in syntax error correction, this thesis proposes a code pre-training model. The pre-training model consists of two parts, a generator and an encoder. The generator is a masked language model that filters out simple syntax errors in the code and generates complex errors that it has difficulty solving itself. The encoder is a code encoding model that treats the code as a combination of a syntax tree and a token sequence, so that it encodes both the token-level and the structural information of the code. The encoding layer of the encoder can be transferred to other code-related tasks for fine-tuning, so the programming language knowledge it has learned is shared between the source model and the target model. Multiple pre-training methods were tested on the DeepFix dataset, and the results demonstrate the effectiveness of the code pre-training model.

2. To further verify the effectiveness of the code pre-training model, this thesis proposes a program syntax error correction model based on transfer learning, which reuses the encoding layer of the pre-trained encoder. Output layers for the fixing action are added on top: they predict the position of the syntax error and generate a repair token from the vocabulary and the input sequence. The model fixes multiple errors in the code one at a time, iteratively. It is trained and tested on the DeepFix dataset. Compared with the model without pre-training, the pre-trained syntax error correction model achieved 59.51% error correction accuracy, resolved 56.02% of the error messages in the test dataset, and improved the rate of fixing structural errors in code to 44.70%.
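
The iterative repair procedure described in the abstract (predict one error position and one repair token, apply the fix, repeat) can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The names `first_error` and `iterative_repair`, the `(position, token)` fix format, and the `max_steps` limit are all assumptions made for illustration, and a simple bracket-balance checker stands in for both the neural predictor and the compiler.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical fix format: insert `token` before index `position`.
Fix = Tuple[int, str]

PAIRS = {"(": ")", "{": "}", "[": "]"}
OPENERS = {closer: opener for opener, closer in PAIRS.items()}


def first_error(tokens: List[str]) -> Optional[Fix]:
    """Toy stand-in for the learned predictor and the compiler check.

    Returns None when all brackets are balanced, otherwise a single
    insertion that repairs the first bracket problem found.
    """
    stack: List[str] = []
    for i, tok in enumerate(tokens):
        if tok in PAIRS:
            stack.append(tok)
        elif tok in OPENERS:
            if stack and PAIRS[stack[-1]] == tok:
                stack.pop()
            else:
                # Unmatched closer: propose inserting its opener before it.
                return (i, OPENERS[tok])
    if stack:
        # Unclosed opener: propose appending the matching closer.
        return (len(tokens), PAIRS[stack[-1]])
    return None


def iterative_repair(
    tokens: List[str],
    predict_fix: Callable[[List[str]], Optional[Fix]],
    max_steps: int = 10,
) -> Tuple[List[str], bool]:
    """Apply one predicted fix per step until no error remains.

    Returns the (possibly repaired) tokens and whether they are error-free.
    """
    tokens = list(tokens)  # do not mutate the caller's list
    for _ in range(max_steps):
        fix = predict_fix(tokens)
        if fix is None:
            return tokens, True
        position, token = fix
        tokens.insert(position, token)
    return tokens, predict_fix(tokens) is None


broken = ["int", "main", "(", ")", "{", "return", "0", ";"]
repaired, ok = iterative_repair(broken, first_error)
print(" ".join(repaired), ok)
```

Fixing one error per step keeps each prediction small (a position and a token). The step limit guards against a predictor that never converges. In a real system the predictor would be the fine-tuned model and the stopping test would come from a compiler.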
Keywords/Search Tags: Deep Learning, Transfer Learning, Pre-training Model, Program Syntax Error Correction