Research On Optimization Of Chinese Text Error Correction Algorithm

Posted on:2021-05-08

Degree:Master

Type:Thesis

Country:China

Candidate:C Y Liu

GTID:2428330602987137

Subject:Software engineering

Abstract/Summary:

With the rapid development of computers and the Internet, electronic documents are used more and more frequently in work and daily life, and traditional manual proofreading can no longer keep up with demand. Chinese text error correction checks Chinese text for errors and corrects them. The technology has wide practical value, with applications in keyboard input methods, document editing, search engines, and speech recognition, so it is regarded as an important topic in Chinese natural language processing.

After reviewing error correction research in China and abroad, this thesis studies word errors and semantic errors. For word error correction, it improves the traditional sequence labeling approach and proposes CSC-BiLSTM-CRF, an algorithm based on sequence labeling that divides the task into two parts: error detection and error correction. First, the target word is checked for errors using context word vectors; then, according to the sequence labeling output, suspicious words are replaced with candidates from the confusion set; finally, the best candidate word is selected by probability statistics.

For semantic error correction, the thesis proposes a DAE-Decoder algorithm, which divides the task into two parts: encoding and decoding. It is pre-trained based on BERT and, following the masked language model (MLM) objective, generates for each character of the input text a set of replacement characters as candidates. The decoder then selects the correct character from these candidates according to character similarity and contextual suitability.

Based on an analysis of the advantages and disadvantages of the CSC-BiLSTM-CRF and DAE-Decoder algorithms, the thesis proposes a hybrid algorithm. Experimental verification and analysis show that the hybrid algorithm improves both accuracy and recall, which demonstrates its feasibility and superiority. The algorithm is also more versatile and can be used to correct errors in different corpora. The work provides a reference for research on Chinese text error correction algorithms and is also relevant to research in related NLP fields.
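
The detect, replace, and select steps described for word error correction can be illustrated with a small sketch. This is not the thesis's implementation: the detector (a BiLSTM-CRF in the thesis) is stood in for by precomputed per-character error flags, and the confusion set, the bigram counts, and the names `CONFUSION_SET`, `BIGRAM_COUNTS`, `bigram_score`, and `correct` are all hypothetical, chosen only for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical confusion set: each character maps to characters it is
# commonly mistaken for. A real confusion set would be far larger.
CONFUSION_SET: Dict[str, List[str]] = {
    "以": ["已", "一"],
    "在": ["再"],
}

# Hypothetical bigram counts standing in for "probability statistics".
BIGRAM_COUNTS: Dict[Tuple[str, str], int] = {
    ("已", "经"): 50,
    ("以", "经"): 1,
}


def bigram_score(chars: Sequence[str], i: int, candidate: str) -> int:
    """Score a candidate at position i by its bigram counts with both neighbours."""
    left = BIGRAM_COUNTS.get((chars[i - 1], candidate), 0) if i > 0 else 0
    right = (
        BIGRAM_COUNTS.get((candidate, chars[i + 1]), 0)
        if i + 1 < len(chars)
        else 0
    )
    return left + right


def correct(
    text: str,
    error_flags: Sequence[bool],
    score: Callable[[Sequence[str], int, str], int],
) -> str:
    """Replace each flagged character with its best-scoring candidate.

    error_flags plays the role of the sequence labeling output: True marks
    a character the detector considers suspicious.
    """
    chars = list(text)
    for i, flagged in enumerate(error_flags):
        if not flagged:
            continue
        # The original character competes with its confusion-set entries,
        # so a false alarm from the detector can leave the text unchanged.
        candidates = [chars[i]] + CONFUSION_SET.get(chars[i], [])
        chars[i] = max(candidates, key=lambda c: score(chars, i, c))
    return "".join(chars)


flags = [False, True, False, False, False]
print(correct("我以经到了", flags, bigram_score))
```

Keeping the original character among the candidates is a design choice of this sketch, so that a wrongly flagged character survives when no replacement scores higher; whether the thesis does the same is not stated in the abstract.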

Keywords/Search Tags:

Chinese text error correction, CSC-BiLSTM-CRF, DAE-Decoder, Hybrid algorithm

Related items

1	Research On Text Error Correction Algorithm Based On Deep Learning
2	Research And Application Of Chinese Text Error Correction Methods For Various Error Type
3	Research And Application Of Chinese Text Error Correction Method
4	Research On Chinese Text Real-Word Error Automatic Detection And Correction Algorithm
5	Research On Chinese Text Error Correction For Different Error Types
6	Research On Chinese Text Error Correction
7	Chinese Picture Text Extraction And Error Correction Based On Deep Learning
8	Research On Error Correction Method Of Chinese Short Text Based On BERT
9	Chinese Research On The Method Of Text Error Detection And Error Correction Under Speech Transcription
10	Efficient Modified Berlekamp-Massey Algorithm And Forward Error Correction Decoder Designs