
Optimization And Implementation Of Chinese Spelling Error Detection And Correction Algorithm

Posted on: 2020-08-11
Degree: Master
Type: Thesis
Country: China
Candidate: S L Zhang
GTID: 2428330590450618
Subject: Software engineering
Abstract/Summary:
With the development of computers, more and more industries rely on them for communication and collaboration at work and in daily life. Chinese spelling error detection and correction, which checks Chinese text for spelling errors and suggests corrections, is an important technology for ensuring that such communication is correct. It is an important topic in Chinese natural language processing research and has a wide range of applications, such as document editing tools, handwriting recognition, search engines, and question answering systems. Chinese natural language processing research started relatively late, and the complex features of Chinese (such as polyphonic characters, visually similar characters, and the absence of explicit word boundaries between characters) make Chinese spelling error detection and correction particularly difficult.

After a detailed investigation, the author summarizes the types of Chinese spelling errors and their causes. This thesis proposes a Chinese spelling error detection method based on character-based N-gram probability and a Chinese spelling error correction method based on a weighted noisy channel model, and designs and implements a framework for spelling error detection and correction to support further research. The detection method combines probability statistics of N-gram strings, collected from a Chinese text corpus, with a confusion set. The correction method combines the noisy channel model, Chinese character frequency, and the minimum edit distance between the pinyin of Chinese characters. To find the optimal parameter set and speed up decoding, the thesis implements an N-gram language model framework with various smoothing algorithms, together with a Beam Search algorithm.

Experimental results on the same test set show that the detection method based on character-based N-gram probability performs better in false positive rate, accuracy, and precision, and that the correction method based on the weighted noisy channel model performs better in accuracy and recall. This work contributes to research on Chinese spelling error detection and correction.
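A minimal sketch of how character-based N-gram probability and a confusion set might work together for detection, assuming a character bigram model with add-k smoothing (one of many possible smoothing choices; the abstract does not say which one was used). The toy corpus, the confusion set `CONFUSION`, the `margin` threshold, and the helpers `CharBigramLM`, `local_score`, and `detect` are hypothetical illustrations, not the thesis's implementation. A character is flagged when a confusable alternative fits its left and right neighbours much better than the character itself.

```python
import math
from collections import Counter

# Hypothetical toy corpus and confusion set, for illustration only.
CORPUS = ["我们今天去学校", "他在学校学习", "今天天气很好", "我们在家学习"]
CONFUSION = {"效": ["校", "较"], "校": ["效", "较"], "天": ["夭"], "学": ["觉"]}


class CharBigramLM:
    """Character bigram model with add-k smoothing."""

    def __init__(self, sentences, k=0.1):
        self.k = k
        self.unigrams = Counter()
        self.bigrams = Counter()
        for s in sentences:
            chars = ["<s>"] + list(s) + ["</s>"]
            self.unigrams.update(chars)
            self.bigrams.update(zip(chars, chars[1:]))
        self.vocab_size = len(self.unigrams)

    def prob(self, prev, cur):
        # Smoothed P(cur | prev); unseen pairs still get a small non-zero probability.
        return (self.bigrams[(prev, cur)] + self.k) / (
            self.unigrams[prev] + self.k * self.vocab_size
        )


def local_score(lm, chars, i, c):
    # log P(c | left neighbour) + log P(right neighbour | c)
    return math.log(lm.prob(chars[i - 1], c)) + math.log(lm.prob(c, chars[i + 1]))


def detect(lm, sentence, confusion, margin=1.0):
    """Return (index, observed char, suggested char) for suspicious positions."""
    chars = ["<s>"] + list(sentence) + ["</s>"]
    suspects = []
    for i in range(1, len(chars) - 1):
        observed = chars[i]
        observed_score = local_score(lm, chars, i, observed)
        best, best_score = observed, observed_score
        for alt in confusion.get(observed, []):
            alt_score = local_score(lm, chars, i, alt)
            if alt_score > best_score:
                best, best_score = alt, alt_score
        # Flag only when an alternative is clearly more probable in context.
        if best != observed and best_score - observed_score > margin:
            suspects.append((i - 1, observed, best))
    return suspects
```

With this toy data, `detect(CharBigramLM(CORPUS), "我们今天去学效", CONFUSION)` flags index 6 (效) and suggests 校, while the correct sentence "我们今天去学校" produces no flags. Comparing against confusion-set alternatives, rather than thresholding raw probabilities alone, is one way to keep rare but correct characters from being flagged.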
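The weighted noisy channel correction can be sketched as a weighted sum of three log-space terms: a language model term, a character-frequency prior, and a channel term derived from the edit distance between pinyin strings. How the thesis actually combines these signals is not stated in the abstract, so this log-linear form is an assumption; the weights `w_lm`, `w_freq`, `w_py`, the toy `PINYIN` table, the confusion set, and the functions `score` and `correct` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy resources; a real system would use a large corpus and a full pinyin lexicon.
CORPUS = ["我们今天去学校", "他在学校学习", "学校的教学效果很好"]
PINYIN = {"校": "xiao", "效": "xiao", "较": "jiao", "交": "jiao"}
CONFUSION = {"效": ["效", "校", "较", "交"]}  # candidates include the observed character

CHAR_FREQ = Counter(ch for s in CORPUS for ch in s)
TOTAL_CHARS = sum(CHAR_FREQ.values())
VOCAB = len(CHAR_FREQ) + 2  # characters plus start/end symbols
BIGRAMS = Counter()
LEFT = Counter()
for s in CORPUS:
    padded = "^" + s + "$"  # "^" and "$" mark sentence start and end
    for a, b in zip(padded, padded[1:]):
        BIGRAMS[(a, b)] += 1
        LEFT[a] += 1


def edit_distance(a, b):
    """Levenshtein distance between two strings (here: pinyin syllables)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def bigram_logp(prev, cur, k=0.1):
    # Add-k smoothed log P(cur | prev)
    return math.log((BIGRAMS[(prev, cur)] + k) / (LEFT[prev] + k * VOCAB))


def score(sentence, i, cand, w_lm=1.0, w_freq=0.3, w_py=1.0):
    """Weighted sum of LM fit, character-frequency prior, and pinyin channel term."""
    left = sentence[i - 1] if i > 0 else "^"
    right = sentence[i + 1] if i + 1 < len(sentence) else "$"
    lm = bigram_logp(left, cand) + bigram_logp(cand, right)
    freq = math.log((CHAR_FREQ[cand] + 1) / (TOTAL_CHARS + VOCAB))
    # Channel: assume log P(observed | cand) falls linearly with pinyin edit distance.
    channel = -edit_distance(PINYIN.get(sentence[i], ""), PINYIN.get(cand, ""))
    return w_lm * lm + w_freq * freq + w_py * channel


def correct(sentence, i):
    """Replace sentence[i] with its highest-scoring candidate."""
    candidates = CONFUSION.get(sentence[i], [sentence[i]])
    best = max(candidates, key=lambda c: score(sentence, i, c))
    return sentence[:i] + best + sentence[i + 1:]
```

Here `correct("我们今天去学效", 6)` returns "我们今天去学校": 校 shares the pinyin "xiao" with 效 and fits the context far better. Searching over weights such as `w_lm`, `w_freq`, and `w_py` is one plausible reading of the parameter search the abstract mentions.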
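Decoding a whole sentence, where several positions may each have candidate replacements, can be sketched as a beam search over character hypotheses. This sketch assumes a character bigram language model with add-k smoothing and uses a fixed, hypothetical substitution penalty `CHANGE_PENALTY` as a stand-in for a full channel model; `beam_width`, the toy corpus, and the confusion set are illustrative only. Keeping just the `beam_width` best partial hypotheses at each step bounds the work per character, instead of enumerating every combination of candidates.

```python
import math
from collections import Counter

# Hypothetical toy data: a small corpus and a confusion set mapping
# an observed character to possible intended characters.
CORPUS = ["我们今天去学校", "他在学校学习", "今天天气很好"]
CONFUSION = {"效": ["校"], "夭": ["天"]}
CHANGE_PENALTY = 2.0  # hypothetical log-space cost of substituting a character

BIGRAMS = Counter()
LEFT = Counter()
for s in CORPUS:
    padded = "^" + s + "$"  # "^" and "$" mark sentence start and end
    for a, b in zip(padded, padded[1:]):
        BIGRAMS[(a, b)] += 1
        LEFT[a] += 1
VOCAB = len(set("".join(CORPUS))) + 2


def logp(prev, cur, k=0.1):
    # Add-k smoothed log P(cur | prev)
    return math.log((BIGRAMS[(prev, cur)] + k) / (LEFT[prev] + k * VOCAB))


def beam_search(sentence, beam_width=3):
    """Return the best-scoring corrected sentence under the bigram model."""
    beam = [(0.0, "")]  # (accumulated log score, text decoded so far)
    for ch in sentence:
        options = [ch] + CONFUSION.get(ch, [])
        expanded = []
        for score, text in beam:
            prev = text[-1] if text else "^"
            for cand in options:
                penalty = 0.0 if cand == ch else CHANGE_PENALTY
                expanded.append((score + logp(prev, cand) - penalty, text + cand))
        # Prune: keep only the beam_width best partial hypotheses.
        beam = sorted(expanded, key=lambda h: h[0], reverse=True)[:beam_width]
    # Add the end-of-sentence transition before picking the winner.
    return max(beam, key=lambda h: h[0] + logp(h[1][-1] if h[1] else "^", "$"))[1]
```

With the toy data, `beam_search("我们今夭去学效")` returns "我们今天去学校", and an already-correct sentence such as "我们今天去学校" passes through unchanged because none of its characters has candidates. The substitution penalty keeps the decoder from rewriting characters unless the language model gain outweighs it.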
Keywords/Search Tags: Spelling error correction, Noisy channel model, N-gram language model, Chinese word segmentation, Decoding algorithm