
Research On Intelligent Error Correction And Quality Estimation Methods Of Texts Oriented To Judicial Documents

Posted on: 2022-08-13
Degree: Master
Type: Thesis
Country: China
Candidate: M J Bai
GTID: 2506306572460144
Subject: Software engineering
Abstract/Summary:
To address grammatical error correction and text quality estimation for judicial documents, this thesis investigates several methods and evaluates them experimentally. After surveying the current state of research and the common solutions in the field, and taking the characteristics of judicial documents into account, it proposes two approaches to text error correction for judicial documents: a rule-based and language model-based method, and a deep model-based method. For text quality estimation, it proposes a method based on semantic understanding. Experiments show that these methods correct grammatical errors in judicial documents more effectively and estimate their text quality accurately and reasonably. Specifically, the rule-based and language model-based method and an error correction method that incorporates a generative adversarial framework are compared with mainstream methods to demonstrate their effectiveness. The quality estimation method based on semantic understanding is supplemented by structured processing of the documents, so that the text quality of judicial documents can be estimated better.

For grammatical error correction, both rule-based and neural network-based methods have proven effective, but each is good at correcting different types of errors. To maximize correction performance, many strong text error correction methods classify errors and handle each class in turn. The method in this thesis follows the same idea: rule-based and language model-based methods first correct shallow errors such as typos, and deep model methods then resolve deeper grammatical errors. Correcting the typos in a text before applying the deep model also further improves the deep model's correction performance.

The typo correction step draws on the rule-based and language model-based method in pycorrector and optimizes it in several respects, so that typo correction for judicial documents is further improved. So that the rules learned by the language model better match the characteristics of judicial document text, a BiLSTM+CRF entity recognition method is first used to build a judicial document vocabulary. Shallow errors are then corrected with judicial-oriented confusion dictionary matching, phonetically similar dictionary matching, an edit distance method, and an n-gram language model-based method, which paves the way for correcting deep grammatical errors.

For deep grammatical errors, considering labor costs and other issues, the thesis treats text error correction as a monolingual translation task: a machine translation model translates the text to be corrected into correct text, and the model is trained on a constructed "error-correction" corpus of judicial documents. Error correction is first implemented with an LSTM+Attention model and a self-attention-based Transformer model. To address the exposure bias of these two models, a generative adversarial model is then introduced. Following the idea of SeqGAN, Monte Carlo tree search and the policy gradient method are used to overcome the difficulty of back-propagating gradients when a generative adversarial model is applied to discrete text data. Experiments show that using the self-attention-based Transformer as the generator of the generative adversarial model outperforms the other methods on the task of correcting judicial document text.

To help improve the text quality of judicial documents, the thesis proposes a text quality estimation method based on a deep model. So that the textual characteristics of judicial documents are included among the quality estimation factors, the pre-trained model BERT is used to extract word vectors that carry semantic information from judicial document text, and deep learning is used to implement quality estimation based on semantic understanding. In addition, the judicial documents are processed into a structured form and matched against templates to assist quality estimation. Experiments demonstrate the effectiveness of this method for text quality estimation.
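
The shallow-error stage described in the abstract (confusion dictionary matching, edit distance candidates, and n-gram language model scoring) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation and not pycorrector's API: the toy corpus, the confusion dictionary, and every function name are hypothetical, English words stand in for Chinese text, and the phonetic matching and BiLSTM+CRF vocabulary steps are omitted.

```python
import math
from collections import Counter

# Toy training corpus (hypothetical); the thesis trains on judicial documents.
CORPUS = [
    "the court finds the defendant guilty",
    "the court rejects the appeal",
    "the defendant filed an appeal",
]

# Hypothetical confusion dictionary: known wrong form -> correct form.
CONFUSION = {"defendent": "defendant"}


def train_bigram(sentences):
    """Count unigrams and bigrams, with sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(tokens, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a token sequence."""
    vocab_size = len(unigrams)
    padded = ["<s>"] + tokens + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in zip(padded, padded[1:])
    )


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def correct(sentence, unigrams, bigrams, max_dist=2):
    """Apply the confusion dictionary, then fix out-of-vocabulary tokens."""
    # Step 1: direct replacement of known confusions.
    tokens = [CONFUSION.get(t, t) for t in sentence.split()]
    vocab = [w for w in unigrams if w not in ("<s>", "</s>")]
    for i, tok in enumerate(tokens):
        if tok in unigrams:
            continue  # in-vocabulary tokens are left alone in this sketch
        # Step 2: candidates are vocabulary words within max_dist edits.
        candidates = [w for w in vocab if edit_distance(tok, w) <= max_dist]
        if candidates:
            # Step 3: keep the candidate whose sentence the bigram model prefers.
            tokens[i] = max(
                candidates,
                key=lambda w: log_prob(
                    tokens[:i] + [w] + tokens[i + 1:], unigrams, bigrams
                ),
            )
    return " ".join(tokens)


if __name__ == "__main__":
    unigrams, bigrams = train_bigram(CORPUS)
    print(correct("the cuort finds the defendent guilty", unigrams, bigrams))
```

In this sketch the confusion dictionary handles known substitutions directly, while the language model only arbitrates among edit-distance candidates for tokens outside the vocabulary. A realistic system would also have to detect errors among in-vocabulary tokens, which this example does not attempt.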
Keywords/Search Tags: Judicial documents, text error correction, attention mechanism, generative adversarial model, quality estimation