
Automatic Proofreading Of English Text With Rich Information

Posted on: 2022-04-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z H Liu
GTID: 1488306746956639
Subject: Computer Science and Technology
Abstract/Summary:
Text proofreading is an important step in publishing. It provides text review for individuals, enterprises, and government departments, ensuring that published articles are grammatically and semantically correct and preventing the spread of misinformation. However, proofreading is meticulous work, and manual proofreading suffers from problems such as omissions and low efficiency. How to proofread text automatically at both the grammatical and the semantic level is therefore an important research problem in the NLP community. This work addresses two core tasks, grammatical error correction and fact verification, to proofread English text automatically with pre-trained language models. It integrates rich information, including linguistic knowledge, world knowledge, and domain-specific knowledge, to help the proofreading models detect grammatical and factual errors in text. To address the problems of automatic English proofreading with rich information, this work systematically carries out the following four studies.

First, this work uses grammatical error correction models to generate correction evidence for grammatical error detection models. It compares general-purpose language model pre-training methods with different pre-training strategies designed for grammatical error correction, and determines the optimal pre-training strategy for grammatical error correction models. It then improves these models further by filtering noise out of the training corpus before training. Finally, the well-trained grammatical error correction model produces several correction results via beam search decoding. These results annotate potential grammatical errors and serve as evidence for the grammatical error detection model.

To integrate proofreading evidence from the grammatical error correction model, a world knowledge base, and a domain-specific knowledge base, this work proposes two models that fuse multiple pieces of proofreading evidence for text error detection, one for each proofreading task. The first is a grammatical error detection model that uses multiple grammatical error correction results. The second is a fine-grained fact verification model with multi-evidence reasoning. The two models reflect the different characteristics of proofreading at the grammatical and semantic levels. Each uses its own method to extract proofreading clues from rich information, and these clues help pre-trained language models detect errors in text. In addition, the grammatical error detection model can further improve grammatical error correction models through quality estimation.

To address fact verification in a specific domain, this work proposes an enhanced pre-trained language model with stronger language modeling and text reasoning ability in that domain. It introduces two continued-training strategies that train language models on domain-specific data. These strategies help the models learn the semantics of words in the domain and improve fact verification performance there.
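The abstract says the grammatical error correction model supplies several beam-search corrections that annotate potential errors for the grammatical error detection model. It does not describe how that annotation is computed. Below is a minimal sketch, not the dissertation's method, built on these assumptions:

- Sentences are already tokenized.
- Python's standard `difflib.SequenceMatcher` aligns each hypothesis with the source sentence.
- Each source token is scored by the fraction of hypotheses that edit it.

The function name `edit_evidence` and this voting scheme are hypothetical illustrations.

```python
from difflib import SequenceMatcher
from typing import List


def edit_evidence(source: List[str], hypotheses: List[List[str]]) -> List[float]:
    """Hypothetical helper: fraction of correction hypotheses that edit each source token.

    A source token counts as edited when it lies inside a 'replace' or 'delete'
    opcode, or when an 'insert' opcode is attached to it (see below).
    """
    votes = [0] * len(source)
    for hyp in hypotheses:
        edited = [False] * len(source)
        matcher = SequenceMatcher(a=source, b=hyp, autojunk=False)
        for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
            if tag in ("replace", "delete"):
                for i in range(i1, i2):
                    edited[i] = True
            elif tag == "insert" and source:
                # Attach an insertion to the token that follows it,
                # or to the last token when inserting at the end.
                edited[min(i1, len(source) - 1)] = True
        for i, flag in enumerate(edited):
            votes[i] += int(flag)
    n = max(len(hypotheses), 1)  # avoid division by zero for an empty beam
    return [v / n for v in votes]


if __name__ == "__main__":
    src = "He go to school yesterday .".split()
    beams = [
        "He went to school yesterday .".split(),
        "He goes to school yesterday .".split(),
        "He go to school yesterday .".split(),  # an unchanged hypothesis
    ]
    print(edit_evidence(src, beams))
```

In this toy input only `go` gets a nonzero score, because two of the three hypotheses change it. A detector could consume such scores as an extra per-token feature. How the dissertation actually fuses the corrections is not stated in the abstract.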
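The abstract also states that the grammatical error detection model can improve grammatical error correction through quality estimation, without giving details. One common pattern is to rerank the correction model's n-best list. Each hypothesis's log-probability is combined with a detector-based quality score. The sketch below shows that pattern purely as an assumption, with these hypothetical pieces:

- `Detector` is a callable that returns per-token error probabilities.
- `toy_detector` stands in for a trained detector.
- The `weight` parameter and the token-independence assumption behind the sentence score are illustrative choices.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical detector interface: token list -> per-token error probabilities.
Detector = Callable[[List[str]], List[float]]


def rerank(
    nbest: Sequence[Tuple[List[str], float]],
    detector: Detector,
    weight: float = 1.0,
) -> List[Tuple[List[str], float]]:
    """Rerank (tokens, correction log-prob) pairs by adding a weighted quality score."""
    rescored = []
    for tokens, gec_logprob in nbest:
        error_probs = detector(tokens)
        # Sentence-level quality estimate: log-probability that every token is
        # correct, treating tokens as independent (a simplifying assumption).
        qe = sum(math.log(max(1.0 - p, 1e-9)) for p in error_probs)
        rescored.append((tokens, gec_logprob + weight * qe))
    # Highest combined score first.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


def toy_detector(tokens: List[str]) -> List[float]:
    """Stand-in detector for the demo: treats present-tense 'go'/'goes' as likely errors."""
    return [0.8 if t in {"go", "goes"} else 0.05 for t in tokens]


if __name__ == "__main__":
    nbest = [
        ("He go to school yesterday .".split(), -1.0),
        ("He went to school yesterday .".split(), -1.5),
        ("He goes to school yesterday .".split(), -1.2),
    ]
    for tokens, score in rerank(nbest, toy_detector):
        print(f"{score:.3f}  {' '.join(tokens)}")
```

With the toy detector, the `went` hypothesis moves to the top even though the correction model alone scored it lowest. Setting `weight=0.0` recovers the original beam order.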
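For the domain-specific fact verification study, the abstract mentions two continued-training strategies on domain data but does not name their training objectives. Masked language modeling, as used in BERT, is one common objective for continued pre-training. The sketch below shows its usual scheme: about 15% of tokens are selected for prediction, and of those, 80% are masked, 10% are replaced with a random token, and 10% are kept. It is an assumption for illustration, not either of the dissertation's strategies. The `[MASK]` string, the function name `mask_tokens`, and the invented example sentence are placeholders.

```python
import random
from typing import List, Optional, Sequence, Tuple

MASK = "[MASK]"  # placeholder mask token; real tokenizers define their own


def mask_tokens(
    tokens: Sequence[str],
    vocab: Sequence[str],
    rng: random.Random,
    mask_prob: float = 0.15,
) -> Tuple[List[str], List[Optional[str]]]:
    """BERT-style masking: returns (model inputs, labels); None labels are ignored in the loss."""
    inputs = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # token not selected for prediction
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK  # 80%: replace with the mask token
        elif r < 0.9:
            inputs[i] = rng.choice(vocab)  # 10%: replace with a random vocabulary token
        # remaining 10%: leave the original token in place
    return inputs, labels


if __name__ == "__main__":
    # Invented domain-style sentence, for illustration only.
    sentence = "the patient was prescribed metformin for type 2 diabetes".split()
    vocab = sorted(set(sentence))
    inputs, labels = mask_tokens(sentence, vocab, random.Random(0))
    print(inputs)
    print(labels)
```

Continued training would repeatedly apply such masking to domain text and train the language model to predict the labeled positions.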
Keywords/Search Tags: Text Proofreading, Rich Information, Grammatical Error Correction, Grammatical Error Detection, Fact Verification