
Research On Chinese Text Error Correction Based On N-gram And Dependency Parsing

Posted on: 2021-01-26; Degree: Master; Type: Thesis
Country: China; Candidate: S P Yuan; Full Text: PDF
GTID: 2518306461970299; Subject: Computer Science and Technology
Abstract/Summary:
This thesis applies natural language processing techniques to Chinese text error correction. It analyzes the general error types found in Chinese text and, taking the characteristics of news text into account, limits the research scope to short-range and long-range errors caused by homophone replacement. For these two error types, the thesis studies both error detection and error correction, as follows.

First, an error checking and correction method based on the n-gram model is proposed. The method has two stages: errors are detected by combining 2-gram and 3-gram models to obtain a set of short-range errors, and a 3-gram model is then used to correct them. Tested on real text, within the scope of short-range error checking and correction, the method reaches an error-checking recall of 83.1%, an error-checking precision of 41.5%, an F-score of 55.4%, and an error-correction rate of 78.1%. Compared with the 2-gram model and the 3-gram model used alone, error-checking precision increases by 7.2% and 8.2% respectively, and the F-score increases by 6.3% and 8.2% respectively. In terms of time consumption, the method is better than the 2-gram model and equivalent to the 3-gram model.

Second, the thesis uses dependency parsing to analyze the grammar of the entire sentence and obtain dependency pairs. Further screening of these dependency pairs yields a collocation knowledge base, which can effectively detect long-range errors. The collocation knowledge base and mutual information are then used for long-range error correction in Chinese text. Tested on real text, within the scope of long-range error checking and correction, the method reaches an error-detection recall of 74.7%, an error-detection precision of 35.3%, an F-score of 47.9%, and an error-correction rate of 59.1%. Compared with the CPH method, error-checking recall increases by 10.5%, error-checking precision increases by 9.8%, the F-score increases by 11.4%, and the error-correction rate increases by 4.7%. In terms of time consumption, the method is better than the CPH method in both error checking and error correction.

Overall, the method is also compared with the Baidu AI open platform, cloud error checking, the JCJC typos detection platform, and Microsoft Word 2010. It achieves better results and has research and application value.
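The F-scores reported above are consistent with the balanced F-score (the harmonic mean of precision and recall), if the "accuracy rate" figures are read as precision. The following is a minimal sketch of that check; the function name `f_score` is illustrative and is not taken from the thesis, and the assumption that the thesis uses the balanced F1 form is inferred only from the reported numbers.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall.

    Assumption: the thesis reports F1; this is inferred from the
    figures in the abstract, not stated there explicitly.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Short-range (n-gram) method: precision 41.5%, recall 83.1%
short_range_f = f_score(0.415, 0.831)

# Long-range (dependency parsing) method: precision 35.3%, recall 74.7%
long_range_f = f_score(0.353, 0.747)

print(f"short-range F-score: {short_range_f * 100:.1f}%")  # 55.4%
print(f"long-range F-score:  {long_range_f * 100:.1f}%")   # 47.9%
```

Both values match the 55.4% and 47.9% given in the abstract. The low precision relative to recall in both cases means that each method flags many correct spans as errors while missing comparatively few real ones.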
Keywords/Search Tags: N-gram model, Dependency parsing, Error detection, Error correction, Chinese text