Research On Word Order Correction Method Based On Chinese Text

Posted on: 2024-07-10
Degree: Master
Type: Thesis
Country: China
Candidate: G H Zhao
Full Text: PDF
GTID: 2558307067468204
Subject: Computer software and theory
Abstract/Summary:
With the diversification of international exchanges, Chinese is becoming more and more attractive to foreigners. Since Chinese word order differs greatly from that of other languages, many learners of Chinese find it difficult to master its word order rules. Word order correction of Chinese texts has therefore become increasingly important. Chinese error correction is usually divided into two categories: spelling error correction and grammatical error correction, the latter covering redundant words, missing words, word selection and word order (disorder) errors. Word order errors are among the most difficult grammatical errors to correct, yet most researchers correct grammatical errors as a whole and do not study word order correction in depth. This thesis therefore focuses on correcting word order errors in Chinese text, and approaches the problem from two directions: classical neural networks and large-scale pre-trained language models (PLMs). The main research of this thesis is as follows:

(1) A word order correction model based on classical neural networks is constructed. Manual feature extraction easily loses information, whereas classical neural network models perform well at feature extraction. In this thesis, three classical neural networks (Bi-GRU, CNN and Transformer) are used to correct word order errors as a sequence-to-sequence task. The experimental results show that all three models achieve good performance in word order correction. Compared with the Bi-GRU and CNN models, however, the Transformer-based word order correction model performs better, with improvements of 5.75% in accuracy and 3.58% in F-score.

(2) A word order error correction model based on the BERT pre-trained language model is constructed. Word order errors involve a large amount of contextual semantic knowledge that is hard to model effectively, so this thesis captures deep semantic features with the BERT pre-trained language model. Because the character-based BERT error correction model is not well suited to word order correction of Chinese text, this thesis introduces Pretokenize word segmentation and proposes a word order correction model based on Word-Base BERT. The experimental results show that, compared with the basic BERT, the proposed Word-Base BERT model improves recall and F-score by 16.31% and 7.5% respectively.

(3) A word order error correction model based on the RoBERTa pre-trained language model is constructed. In this thesis, RoBERTa is used as the basic framework of the sequence-to-sequence word order error correction task, GPT-2 is introduced as the decoder of the RoBERTa sequence-to-sequence error correction model, and a word order error correction model based on RoBERTa-GPT-2 is proposed. The experimental results show that, compared with the single RoBERTa model, the proposed RoBERTa-GPT-2 model improves recall by 3.18%. Finally, to examine the influence of the mask strategy on the RoBERTa word order error correction model, this thesis introduces a non-dynamic mask mechanism and proposes a word order error correction model based on RoBERTa-Nomask. The experimental results show that the dynamic mask strategy has important value in improving the performance of the word order error correction model.
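The contrast between a dynamic and a non-dynamic mask strategy can be illustrated with a small sketch. It is not the thesis's implementation: it assumes that "non-dynamic" means a mask chosen once and reused in every epoch (often called static masking), while a dynamic mask is re-sampled each time the sentence is seen. The function names, the 15% mask rate and the example sentence are all hypothetical choices made for illustration.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, rate, rng):
    """Replace about `rate` of the tokens (at least one) with MASK."""
    n = max(1, round(len(tokens) * rate))
    positions = set(rng.sample(range(len(tokens)), n))
    return [MASK if i in positions else tok for i, tok in enumerate(tokens)]


def static_epochs(tokens, epochs, rate=0.15, seed=0):
    """Assumed 'non-dynamic' strategy: mask once, reuse it every epoch."""
    masked = mask_tokens(tokens, rate, random.Random(seed))
    return [list(masked) for _ in range(epochs)]


def dynamic_epochs(tokens, epochs, rate=0.15, seed=0):
    """Dynamic strategy: draw a fresh mask for every epoch."""
    rng = random.Random(seed)
    return [mask_tokens(tokens, rate, rng) for _ in range(epochs)]


if __name__ == "__main__":
    # Hypothetical word-segmented sentence, one token per word.
    sentence = ["我", "昨天", "在", "图书馆", "看", "书"]
    for name, fn in (("static", static_epochs), ("dynamic", dynamic_epochs)):
        print(name)
        for epoch in fn(sentence, epochs=3):
            print("  ", " ".join(epoch))
```

Under this assumption, the static variant shows the model the same masked positions in every epoch, whereas the dynamic variant can expose different positions over the course of training.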
Keywords/Search Tags: text correction, grammar correction, word order correction, neural network, pre-trained language model