Research On Word Error Correction Methods Of Chinese Text

Posted on: 2021-03-21
Degree: Master
Type: Thesis
Country: China
Candidate: L Zhang
Full Text: PDF
GTID: 2428330629488459
Subject: Software engineering
Abstract/Summary:
As China's economy grows and its international competitiveness improves, the Chinese language is attracting more and more foreign learners. This thesis studies the correction of Chinese text written by foreign learners of Chinese. Word error correction in Chinese text helps keep learners accurate as they study and communicate. It is a key technique for predicting whether a text contains errors and for selecting the right replacement words, and it is an important topic in Chinese natural language processing research. The thesis aims to help learners correct their mistakes during learning and, at the same time, to reduce the guidance workload of Chinese teachers. To make model construction easier, it divides Chinese text error correction into two tasks, Chinese spelling error correction and Chinese grammatical error correction, and builds a model for each.

For spelling errors, a detailed investigation first summarizes the causes and categories of spelling problems in text. Based on the N-gram language model, the text is segmented into word-based N-grams and their probabilities are estimated from counts. Confusion sets and dynamic programming are introduced to improve the efficiency of error correction, and smoothing is used to deal with data sparsity. The thesis proposes an algorithm that combines Chinese word segmentation with a bigram model and a trigram model, and builds a combined N-gram-based Chinese spelling error correction model.

For grammatical errors, the thesis divides the problems into four categories: redundant words, missing words, word selection errors, and word order errors. Because the traditional statistical N-gram language model cannot cope with unregistered (out-of-vocabulary) adjacent words or with long-distance grammatical errors, the thesis applies a neural language model, the Bidirectional Long Short-Term Memory network (BiLSTM), to model sentences and evaluate their correctness using context from both directions. In addition, a conditional random field (CRF) is introduced for sequence labeling, and part-of-speech features combined with word vectors are fed to the embedding layer as the BiLSTM input. The resulting BiLSTM-CRF Chinese grammatical error correction model further improves error correction performance. Experimental results on the development evaluation data set show that the N-gram-based Chinese spelling correction model and the BiLSTM-CRF-based Chinese grammatical error correction model proposed in this thesis both achieve good results on their respective error types.
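The spelling-correction idea described in the abstract (bigram probabilities, smoothing, confusion sets, and a dynamic-programming search) can be illustrated with a minimal sketch. This is not the thesis's implementation: the toy corpus, the confusion set entries, the function names (`train_bigram`, `log_prob`, `correct`), and the choice of add-one smoothing are all assumptions made for illustration, and the trigram component and the word segmentation step are left out.

```python
import math
from collections import Counter

# Toy corpus, already segmented into words (hypothetical data).
CORPUS = [
    ["我", "喜欢", "学习", "中文"],
    ["我", "喜欢", "学习", "汉语"],
    ["他", "喜欢", "中文"],
]

# Hypothetical confusion set: a misspelled word maps to likely intended words.
CONFUSION = {"学系": ["学习"], "西欢": ["喜欢"]}


def train_bigram(corpus):
    """Count unigrams and bigrams over sentences padded with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(prev, word, unigrams, bigrams):
    """Add-one (Laplace) smoothed log P(word | prev); an assumed smoothing choice."""
    vocab_size = len(unigrams)
    return math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))


def correct(sentence, unigrams, bigrams, confusion):
    """Pick the most probable word sequence with a Viterbi-style search.

    Each position may keep the original word or use one of its confusion
    set candidates; only the best-scoring path per candidate is kept.
    """
    best = {"<s>": (0.0, [])}  # last word -> (log score, path so far)
    for word in sentence:
        candidates = [word] + confusion.get(word, [])
        step = {}
        for cand in candidates:
            score, path = max(
                ((s + log_prob(prev, cand, unigrams, bigrams), p)
                 for prev, (s, p) in best.items()),
                key=lambda item: item[0],
            )
            step[cand] = (score, path + [cand])
        best = step
    # Close the sentence with the end marker and return the best full path.
    _, path = max(
        ((s + log_prob(prev, "</s>", unigrams, bigrams), p)
         for prev, (s, p) in best.items()),
        key=lambda item: item[0],
    )
    return path


if __name__ == "__main__":
    uni, bi = train_bigram(CORPUS)
    print(correct(["我", "西欢", "学系", "中文"], uni, bi, CONFUSION))
```

Restricting the candidates at each position to the confusion set keeps the search small, and the dynamic-programming step avoids scoring every combination of candidates separately. A real system would need a far larger corpus and a smoothing method better suited to sparse data than add-one.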
Keywords/Search Tags: Spelling correction, Grammatical error correction, Confusion set, N-gram, BiLSTM-CRF