Research On Chinese Text Error Detection Based On External Knowledge

Posted on: 2022-12-23
Degree: Master
Type: Thesis
Country: China
Candidate: J D Li
GTID: 2518306788956619
Subject: Journalism and Media
Abstract/Summary:
Chinese text error detection is an important natural language processing task with a wide range of application scenarios. As the volume of text grows rapidly, relying solely on manual proofreading is time-consuming, laborious and inefficient. Researchers have therefore turned to deep learning to detect text errors and have achieved good results. However, most current research focuses on designing high-performance error detection models tailored to the characteristics of the text to be checked. It improves performance by increasing model complexity and training on large amounts of data, while ignoring the potential role of knowledge in improving model performance. To address this problem, this thesis studies methods for fusing external knowledge with the model, based on Chinese word splitting knowledge (which captures the glyph structure of characters) and sememe knowledge, and improves the model's error detection performance by introducing this external knowledge. The thesis focuses on the following two aspects:

(1) A knowledge fusion method for the LSTM model is proposed, which uses word splitting and sememe knowledge to assist error detection. At the input end of the model, a lattice structure feeds the word splitting knowledge of the text's characters and the sememe knowledge of its segmented words into the model. The word splitting knowledge can reflect differences between characters in terms of their clustering characteristics, and the sememes can enhance the semantic representation of words. The model uses an averaging strategy to further integrate the two kinds of knowledge for error detection.

(2) A knowledge fusion method for the Transformer model is proposed, which uses word splitting and sememe knowledge to build a knowledge matrix that guides the model in detecting errors. Because LSTM is weak at expressing dependencies between words, the Transformer model, which has the attention mechanism at its core, is selected to replace LSTM for error detection. The model first constructs an external knowledge matrix based on word splitting and sememes. Then, through a modified attention mechanism, the external knowledge is integrated into the core structure of the model for error detection.

Experiments on the Tencent and SIGHAN-2015 datasets show that integrating external knowledge effectively improves the error detection performance of both the LSTM and the Transformer model. After combining word splitting and sememe knowledge, the overall error detection performance of the LSTM model and the Transformer model improves by 4.81% and 4.39%, respectively. Case analysis shows that, because the glyph structure of characters is extracted, the model is more sensitive to wrong characters with visually similar glyphs and detects them better. Sememe knowledge represents words through a limited set of basic words, which enhances the semantic representation of words and strengthens the error detection ability of the model.
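The two fusion ideas described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names `average_fuse` and `knowledge_attention` are hypothetical, and adding the knowledge matrix to the raw attention scores is only one plausible way to modify attention, assumed here for illustration. The abstract does not state how the matrix is built or applied.

```python
import math


def average_fuse(split_vec, sememe_vec):
    """Element-wise mean of two equally sized knowledge vectors.

    Assumption: both kinds of knowledge are already embedded as vectors
    of the same length; the thesis only says an averaging strategy is used.
    """
    if len(split_vec) != len(sememe_vec):
        raise ValueError("knowledge vectors must have the same length")
    return [(a + b) / 2.0 for a, b in zip(split_vec, sememe_vec)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def knowledge_attention(queries, keys, values, knowledge):
    """Scaled dot-product attention with an additive knowledge bias.

    knowledge[i][j] is a hypothetical score added to the raw attention
    score between positions i and j before the softmax.
    Returns (outputs, weights).
    """
    dim = len(keys[0])
    outputs, all_weights = [], []
    for i, query in enumerate(queries):
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            + knowledge[i][j]
            for j, key in enumerate(keys)
        ]
        weights = softmax(scores)
        all_weights.append(weights)
        # Weighted sum of the value vectors, one output column at a time.
        outputs.append([
            sum(w * value[c] for w, value in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return outputs, all_weights


# Usage: two positions with identical keys; the knowledge bias alone
# decides which position receives more attention.
fused = average_fuse([1.0, 3.0], [3.0, 5.0])  # [2.0, 4.0]
keys = [[1.0, 0.0], [1.0, 0.0]]
bias = [[0.0, 2.0], [0.0, 0.0]]
_, weights = knowledge_attention(keys, keys, keys, bias)
```

In this sketch a zero knowledge matrix reduces to ordinary scaled dot-product attention, so the external knowledge acts purely as a prior on which positions attend to each other.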
Keywords/Search Tags: Text Error Detection, External Knowledge, Chinese Word Splitting, Sememe, Knowledge Matrix, Transformer Model