Research And Application Of Chinese Spelling Correction Based On Masking Characters And Iterative Inference

Posted on:2024-09-10

Degree:Master

Type:Thesis

Country:China

Candidate:L L Zhang

GTID:2568306944458914

Subject:Computer technology

Abstract/Summary:

With the development of the Internet, people communicate ever more closely online, and the number of errors in text has risen sharply. These errors greatly reduce the effectiveness of information dissemination. The goal of the Chinese spelling correction task is to detect and correct spelling errors in Chinese text, which has wide practical value. Spelling mistakes are common in practice and have a large impact on many downstream natural language processing tasks, such as search optimization and machine translation. Mainstream research currently faces three pressing problems. First, when a model uses a confusion set, its performance depends heavily on the quality of that confusion set. Second, when a sentence contains multiple wrong characters, the characters to be corrected are affected by the noisy context, which leads to poor performance. Third, when candidate characters and context words are relatively common, mainstream methods may change characters that were originally correct, resulting in over-correction. To address these problems, we propose a Chinese spelling correction method based on masking characters and iterative inference. Our contributions are as follows.

First, we optimize the Chinese spelling correction algorithm based on masking characters. To address the limited confusion set, we design an error-driven confusion set masking method: in the pre-training stage, the characters that the model fails to correct are added to the confusion set, and the confusion set is updated iteratively, so that the model builds on what it has previously learned to recognize and keeps improving. To address noisy context, we design a multi-character masking method based on the noise distribution: input text containing noise is constructed in the pre-training stage to make the model more robust to noisy contexts. To address over-correction, we design a threshold-based error correction optimization method, which widens the selection difference between the candidate character and the original character in the model and improves the reliability of correction. Experiments verify the effectiveness of the method.

Second, we optimize the Chinese spelling correction algorithm based on iterative inference. An iterative correction method is added in the inference stage: in each iteration only the character with the highest probability is corrected, and the corrected result is used as the input to the next iteration, so that the model corrects as many of the wrong characters in the sentence as possible during inference. Experiments verify the effectiveness of the method.

Third, we design and develop a Chinese spelling error correction system, which includes three core functions (text correction, image recognition correction, and document correction) and two basic functions (system management and data management). A series of system tests shows that the system's functions meet the design requirements and that it runs stably.
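
The inference procedure described in the abstract (correct only the single most probable character per iteration, feed the result back in, and accept a candidate only when it beats the original character by a margin) can be illustrated with a minimal sketch. This is not the thesis implementation. The names `iterative_correct`, `toy_predict`, `CONFIDENT_FIXES`, and `TEMPTING_SWAPS`, the probability values, and the default threshold of 0.3 are all hypothetical, and a hand-written lookup table stands in for the pre-trained model.

```python
from typing import Callable, List, Tuple

# One prediction per input position:
# (candidate character, candidate probability, probability of the original character)
Prediction = Tuple[str, float, float]
Predictor = Callable[[str], List[Prediction]]


def iterative_correct(sentence: str, predict: Predictor,
                      threshold: float = 0.3, max_iters: int = 10) -> str:
    """Correct at most one character per iteration and re-run the predictor."""
    chars = list(sentence)
    for _ in range(max_iters):
        preds = predict("".join(chars))
        best_pos, best_prob = -1, 0.0
        for pos, (cand, p_cand, p_orig) in enumerate(preds):
            if cand == chars[pos]:
                continue  # the predictor keeps the original character
            if p_cand - p_orig <= threshold:
                continue  # margin too small: treat as likely over-correction
            if p_cand > best_prob:
                best_pos, best_prob = pos, p_cand
        if best_pos < 0:
            break  # nothing left that clears the threshold
        # Apply only the single most confident correction, then iterate.
        chars[best_pos] = preds[best_pos][0]
    return "".join(chars)


# Hypothetical stand-in for a trained model (assumption, for illustration only).
CONFIDENT_FIXES = {"换": "欢", "平": "苹"}  # clear errors, large margin
TEMPTING_SWAPS = {"吃": "喝"}               # common alternative, small margin


def toy_predict(text: str) -> List[Prediction]:
    out: List[Prediction] = []
    for ch in text:
        if ch in CONFIDENT_FIXES:
            out.append((CONFIDENT_FIXES[ch], 0.90, 0.05))
        elif ch in TEMPTING_SWAPS:
            out.append((TEMPTING_SWAPS[ch], 0.50, 0.45))
        else:
            out.append((ch, 0.95, 0.95))
    return out


if __name__ == "__main__":
    print(iterative_correct("我喜换吃平果", toy_predict))
    print(iterative_correct("我喜换吃平果", toy_predict, threshold=0.0))
```

With the default threshold, the two clear errors are fixed one per iteration and the correct character 吃 is left alone. With `threshold=0.0`, the low-margin candidate 喝 is also accepted, which is the kind of over-correction the threshold is meant to suppress.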

Keywords/Search Tags:

Chinese spelling correction, pre-training, masking characters, iterative inference

Related items

1	Chinese Spelling Error Correction Algorithm Incorporating Multimodal Semantic Features And Applications
2	Research And Application Of Chinese Spelling Correction Technology Incorporating Phonology And Glyph Features
3	Research On Error Correction Method Of Chinese Short Text Based On BERT
4	Research Of Chinese Spelling Correction System Based On Multimodal Language Model
5	Research On Deep Learning Error Correction Method Of Chinese Text
6	Optimization And Implementation Of Chinese Spelling Error Detection And Correction Algorithm
7	Research On Chinese Error Correction Based On Pronunciation And Glyph
8	Chinese Spelling Correction Research In Search Engines Based On Statistical Model
9	Research On Connected Characters Recognition For Handwritten Checks And Its Application
10	Research On Chinese Spelling Error Correction Model Based On Deep Learning