Research On Text Proofreading Method Based On Deep Learning

Posted on:2021-05-13

Degree:Master

Type:Thesis

Country:China

Candidate:B Wang

GTID:2428330611480623

Subject:Computer science and technology

Abstract/Summary:

With the rapid development of the Internet, the amount of text data on the network has grown rapidly while the quality of that text has declined. Manual proofreading has long been unable to keep up with this volume of work, which led to the emergence of automatic text proofreading technology. This technology can speed up publishing, reduce errors in the large numbers of electronic documents that enterprises need to archive, and assist teachers in reviewing test papers and finding spelling errors.

Traditional text proofreading methods based on statistics and rules have several problems. On the one hand, formulating rules requires rich experience and high labor costs, and such pipeline-based models easily accumulate errors because of the noise generated by word segmentation. On the other hand, existing methods use only the feature information of characters or of words, and do not effectively combine the three kinds of feature information: characters, words, and pinyin. To address these problems, this paper proposes BLSTM-CRF, a deep learning-based sequence labeling model. It requires no manual intervention, which saves labor costs, and it works at character granularity to avoid the noise introduced by word segmentation. In addition, the BLSTM-CRF model is improved to address the inefficient use of multiple features: a lattice LSTM and a gating mechanism are used to fuse the character, word, and pinyin features effectively.

The main content of this paper is divided into two aspects:

(1) This paper proposes BLSTM-CRF, a neural network architecture for Chinese spell checking that combines a bidirectional long short-term memory network with a conditional random field model. First, it is a true end-to-end model that does not rely on task-specific resources, feature engineering, or data preprocessing. Second, by using character-level vector input, it avoids introducing word segmentation noise. Experiments on news and novel datasets show that the model's F1 score is greatly improved over the baseline model on both test sets.

(2) This paper proposes a novel spell checking model, FL-LSTM-CRF, which combines character, word, and pinyin features to make full use of potential information. The experimental results on the SIGHAN dataset demonstrate the feasibility of the end-to-end framework for spelling error checking, and verify that fusing character, word, and pinyin feature information is effective for error detection tasks. With the same external resources, the FL-LSTM-CRF model is significantly better than other models.
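
To make the sequence labeling framing concrete, the following is a minimal sketch of how a sentence might be turned into per-character labels for error detection. It is not taken from the thesis: the tag names "O" and "E", the helper names `label_characters` and `error_positions`, and the example sentence are all assumptions chosen for illustration, and the sketch only covers same-length substitution errors. A model such as BLSTM-CRF would be trained to predict these tags from the observed characters alone.

```python
from typing import List, Tuple


def label_characters(observed: str, reference: str) -> List[Tuple[str, str]]:
    """Pair each observed character with a tag (hypothetical scheme).

    "O" means the character matches the reference, "E" means it differs.
    Working per character means no word segmentation step is needed.
    """
    if len(observed) != len(reference):
        raise ValueError("this sketch only handles same-length substitutions")
    return [(ch, "O" if ch == ref else "E") for ch, ref in zip(observed, reference)]


def error_positions(tagged: List[Tuple[str, str]]) -> List[int]:
    """Return the indices of characters tagged as errors."""
    return [i for i, (_, tag) in enumerate(tagged) if tag == "E"]


# Illustrative example (not from the thesis data): "心" written in place of "兴".
observed = "我今天很高心"
reference = "我今天很高兴"
tagged = label_characters(observed, reference)
positions = error_positions(tagged)
```

Here `tagged` is the kind of training target a character-level tagger could learn from, and `positions` lists where the detected errors sit in the sentence.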

Keywords/Search Tags:

Chinese text proofreading, deep learning, sequence labeling, multi feature fusion

Related items

1	Research On Chinese Text Proofreading Method Based On Deep Learning
2	Research And Implementation Of Chinese Text Automatic Proofreading Based On Deep Learning
3	Research On The Proofreading Method Of Chinese Typos Based On Sequence Labeling Mode
4	Deep Learning Based Text Sequence Recognition System
5	Chinese Text Based On Statistical Observations Proofing System Design And Implementation
6	The Research Of Chinese Automatic Question Answering And Proofreading Based On Deep Learning
7	Research On Automatic Proofreading Method Of OCR Recognition Results
8	Deep Network Model Based On Feature Layer Fusion Of Visual Information And Linguistic Information For Handwritten Chinese Text Recognition
9	Research On Automatic Generation Technology Of Chinese Text Proofreading Corpora
10	Research On Text Causality Extraction Based On Deep Learning And Sequence Labeling