The cross-task transfer capability of pre-trained language models has brought natural language processing into the "pre-training" era, and research on pre-trained language models has become a hot topic in recent years. However, existing pre-trained language models have a huge number of parameters, and it is difficult and costly for ordinary researchers to build such a model from scratch. How to reduce the cost of model training without degrading model performance has therefore become a major challenge. Researchers have considered organically combining knowledge graphs with pre-trained language models, injecting knowledge into the model to improve its reasoning ability. However, the two cannot be integrated directly, and injecting external knowledge into a pre-trained language model introduces knowledge noise. In this thesis, we propose a pre-trained language model based on knowledge injection. The main work and innovations of this thesis include the following three parts:

(1) A double-encoder pre-training method based on SM-RTD. Aiming at the huge training cost of pre-trained language models, a double-encoder pre-training method based on SM-RTD (span mask and replaced token detection) is proposed. First, structured pruning is used to prune the layers of the BERT encoder, and the pruned encoders are connected in series to build a double-encoder architecture, which effectively reduces model size and saves cost for subsequent pre-training. Second, the serial SM-RTD pre-training tasks are designed on this architecture: in the serial encoder module, encoder1 performs the span mask pre-training task and encoder2 performs the replaced token detection pre-training task, which accelerates the convergence of pre-training. Finally, a large-scale corpus is preprocessed and used as the training dataset; the model fully learns lexical, syntactic, and semantic knowledge from a large number of text sequences, realizing implicit knowledge injection and providing a basis for transfer learning on downstream tasks.
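The serial double-encoder idea can be illustrated with a minimal PyTorch sketch. This is not the thesis implementation: the layer counts, hidden sizes, and the way encoder1's span predictions are used to build the corrupted input for encoder2 are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SmallEncoder(nn.Module):
    """A pruned BERT-style encoder with fewer layers than the original 12."""
    def __init__(self, vocab_size=30522, hidden=256, layers=4, heads=4, max_len=128):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, hidden)
        self.pos = nn.Embedding(max_len, hidden)
        block = nn.TransformerEncoderLayer(hidden, heads, 4 * hidden, batch_first=True)
        self.blocks = nn.TransformerEncoder(block, num_layers=layers)

    def forward(self, ids):
        positions = torch.arange(ids.size(1), device=ids.device).unsqueeze(0)
        return self.blocks(self.tok(ids) + self.pos(positions))

class DoubleEncoderSMRTD(nn.Module):
    """encoder1: span mask prediction; encoder2: replaced token detection, in series."""
    def __init__(self, vocab_size=30522, hidden=256):
        super().__init__()
        self.encoder1 = SmallEncoder(vocab_size, hidden)
        self.encoder2 = SmallEncoder(vocab_size, hidden)
        self.sm_head = nn.Linear(hidden, vocab_size)  # predicts tokens for masked spans
        self.rtd_head = nn.Linear(hidden, 1)          # per-token "was this token replaced?" logit

    def forward(self, masked_ids, span_mask):
        # masked_ids: input with spans masked out; span_mask: True at masked positions.
        sm_logits = self.sm_head(self.encoder1(masked_ids))               # span mask task
        predicted = sm_logits.argmax(dim=-1).detach()
        corrupted = torch.where(span_mask, predicted, masked_ids)         # fill spans with encoder1's guesses
        rtd_logits = self.rtd_head(self.encoder2(corrupted)).squeeze(-1)  # replaced token detection task
        return sm_logits, rtd_logits

# Toy usage: a batch of two 16-token sequences with one masked span each.
ids = torch.randint(0, 30522, (2, 16))
span_mask = torch.zeros(2, 16, dtype=torch.bool)
span_mask[:, 3:6] = True
sm_logits, rtd_logits = DoubleEncoderSMRTD()(ids, span_mask)
```

In such a setup, the span mask loss would be computed from sm_logits at the masked positions, and the detection loss from rtd_logits against the positions where encoder1's guess differs from the original token.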
(2) Fine-tuning encoder2 with text classification and regression tasks. To address the inconsistency between the input text data of the pre-training and fine-tuning stages, small fine-tuning datasets are used for supervised fine-tuning and validation in the fine-tuning stage. First, a single-encoder architecture is built, and the encoder module that performed the replaced token detection pre-training task (encoder2) is selected as the main encoding module for the fine-tuning stage. Then, the model is fine-tuned on four datasets covering single-sentence classification, multi-sentence classification, and multi-sentence regression tasks, which requires updating only a small number of parameters and resolves the inconsistency between pre-training and fine-tuning inputs. Compared with current mainstream models of the same parameter magnitude, the fine-tuned model improves accuracy on the target downstream tasks, and its overall performance is better than that of the comparison models.

(3) A pre-trained language model based on knowledge injection. The common-sense knowledge of the objective world contained in knowledge graph triples is integrated with the fine-tuning datasets, and the knowledge is explicitly injected into the model to achieve better performance and higher evaluation metrics on the downstream tasks. First, the id2word method is proposed to process the knowledge graph and complete the mapping from entity_id to string. Second, a named entity recognition tool is used to extract entities from the text sequence and link them to knowledge graph entities, after which the external entity knowledge is integrated with the original text sequence to form a knowledge-enriched input sequence. Finally, the position encoding method is updated, and a K-self-attention algorithm based on correlation coefficients between tokens is proposed to effectively alleviate the knowledge noise caused by knowledge injection. Experimental results show that knowledge injection helps the model improve prediction accuracy on downstream tasks.
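As an illustration of the K-self-attention idea in part (3), the following minimal sketch modulates standard scaled dot-product attention with a token-token correlation matrix, so that injected knowledge tokens mainly influence the entity tokens they are linked to. The construction of the correlation matrix and the exact way it enters the attention scores are illustrative assumptions, not the thesis's definition.

```python
import math
import torch
import torch.nn.functional as F

def k_self_attention(q, k, v, corr):
    """q, k, v: (batch, seq, dim); corr: (batch, seq, seq) correlation coefficients in [0, 1]."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    # Token pairs with zero correlation must not attend to each other at all.
    scores = scores.masked_fill(corr == 0, float("-inf"))
    # Down-weight weakly correlated pairs before the softmax.
    scores = scores + torch.log(corr.clamp(min=1e-9))
    return F.softmax(scores, dim=-1) @ v

# Toy usage: tokens 0-2 are the original sentence, tokens 3-4 are injected triple tokens
# that should only weakly interact with the sentence tokens.
q = k = v = torch.randn(1, 5, 8)
corr = torch.ones(1, 5, 5)
corr[:, :3, 3:] = 0.2
corr[:, 3:, :3] = 0.2
out = k_self_attention(q, k, v, corr)  # shape (1, 5, 8)
```

In this form, a correlation of 1 leaves an attention score unchanged, a small correlation suppresses it, and a correlation of 0 removes the interaction entirely, which is one way the noise from injected knowledge tokens can be kept away from unrelated parts of the sentence.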