
Research On Chinese Text Error Correction Method Based On Deep Learning

Posted on: 2022-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y Yuan
Full Text: PDF
GTID: 2518306494971259
Subject: Computer technology
Abstract/Summary:
In daily life, text errors can be seen everywhere. They mainly fall into two kinds: homophone errors and homonym errors. Text errors lead to incorrect characters on web pages, which impairs reading, and to inaccurate speech recognition, so how to correct erroneous text reasonably is an important problem to be solved. Among existing technologies, statistics-based error correction methods are the most mature, such as the Black Horse proofreading system. With the development of deep learning, however, error correction methods based on deep learning are springing up; the proofreading system developed by KDDI and HUST is one of them. In the field of natural language processing, many scholars have put forward different text error correction schemes. These have achieved certain results but often fall short of satisfactory performance, and so far there is no mature text error correction solution.

With the development of deep learning technology, the text error correction task has also begun to incorporate deep learning techniques. The most commonly used models are sequence-to-sequence models, which recast text error correction as a text generation task and exploit the strong fitting ability of deep learning to improve correction performance during training. However, using a sequence model alone is often ineffective, and it needs to be improved. This paper makes attempts in the following two aspects:

(1) Optimizing the structure of two sequence-to-sequence models, LSTM and Transformer. The LSTM model is good at extracting time-series features such as textual information, while the Transformer model relies entirely on the attention mechanism, which avoids the vanishing and exploding gradients that RNN models suffer when encoding longer sequences, and can therefore handle longer texts. The Transformer's structure also supports parallel computation and is more computationally efficient. On the basis of these two models, this paper optimizes the structures related to error correction in various aspects and designs several sets of controlled experiments to verify the effectiveness of the model under each optimization scheme.

(2) Optimizing the structure and training method of the pre-trained model BERT. This paper takes Soft-Masked BERT as the base model and optimizes its structure. In addition, the specific training method of the BERT model is analyzed, and the model is modified to better suit text error correction, taking the requirements of the error correction task as the main goal. Finally, different experiments are designed to verify the rationality and effectiveness of the model structure and the improved training method.
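The sequence-to-sequence formulation in (1) — encode the noisy sentence, then generate the corrected one token by token — can be sketched as below. This is a minimal illustration assuming PyTorch; the toy vocabulary, model sizes, and class name are illustrative and not taken from the thesis, which uses Chinese-character inputs and its own architecture variants.

```python
import torch
import torch.nn as nn

# Hypothetical toy vocabulary; a real corrector would use a Chinese character vocab.
VOCAB = ["<pad>", "<bos>", "<eos>"] + list("abcdefg")
stoi = {c: i for i, c in enumerate(VOCAB)}

class Seq2SeqCorrector(nn.Module):
    """Treats error correction as generation: the encoder reads the noisy
    sentence, the decoder emits the corrected sentence token by token."""
    def __init__(self, vocab_size, d_model=32, nhead=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=2, num_decoder_layers=2,
            dim_feedforward=64, batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_ids):
        src = self.embed(src_ids)
        tgt = self.embed(tgt_ids)
        # Causal mask so the decoder cannot peek at future corrected tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(tgt_ids.size(1))
        hidden = self.transformer(src, tgt, tgt_mask=mask)
        return self.out(hidden)  # (batch, tgt_len, vocab)

model = Seq2SeqCorrector(len(VOCAB))
src = torch.tensor([[stoi["a"], stoi["b"], stoi["c"]]])      # "noisy" input
tgt = torch.tensor([[stoi["<bos>"], stoi["a"], stoi["b"]]])  # shifted target
logits = model(src, tgt)
```

At training time the decoder input is the gold corrected sentence shifted right (teacher forcing); at inference the model decodes autoregressively from `<bos>`.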
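The soft-masking mechanism behind the base model in (2), Soft-Masked BERT, can be sketched as follows: a light detection network predicts an error probability p_i for each token, and the corrector then sees a soft interpolation between the token embedding and the [MASK] embedding, e'_i = p_i · e_mask + (1 − p_i) · e_i. This sketch assumes PyTorch and substitutes a plain embedding table and BiGRU for the real BERT stack; all sizes and names are illustrative, not the thesis's configuration.

```python
import torch
import torch.nn as nn

class SoftMaskDetector(nn.Module):
    """Soft-masking sketch: a BiGRU detection network scores each token's
    probability of being an error, then blends its embedding toward [MASK]
    so the downstream corrector focuses on suspicious positions."""
    def __init__(self, vocab_size=100, d_model=32, mask_id=0):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.mask_id = mask_id  # index of the [MASK] token in the vocab
        self.detector = nn.GRU(d_model, d_model // 2,
                               batch_first=True, bidirectional=True)
        self.prob = nn.Linear(d_model, 1)

    def forward(self, ids):
        e = self.embed(ids)                       # (B, L, D) token embeddings
        h, _ = self.detector(e)                   # BiGRU detection network
        p = torch.sigmoid(self.prob(h))           # (B, L, 1) error probability
        e_mask = self.embed.weight[self.mask_id]  # [MASK] embedding, shape (D,)
        # Soft masking: e'_i = p_i * e_mask + (1 - p_i) * e_i
        soft = p * e_mask + (1 - p) * e
        return soft, p.squeeze(-1)

m = SoftMaskDetector()
ids = torch.randint(1, 100, (1, 5))
soft, p = m(ids)
```

The soft-masked embeddings `soft` would then be fed to the BERT-based correction network, which is trained jointly with the detector.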
Keywords/Search Tags:text correction, LSTM, Transformer, deep learning, BERT