Design and Implementation of a Statistics-Based Proofreading System for Chinese Text

Posted on: 2015-01-04   Degree: Master   Type: Thesis
Country: China   Candidate: B L Li
GTID: 2268330431957442   Subject: Computer technology
Abstract/Summary:
With the rapid development of Web 2.0 technology and social media in recent years, a large number of opinionated texts have appeared on the Internet. Compared with regular news texts, opinionated texts are written in a freer style and thus contain many ill-formed elements, such as misspelled words, punctuation errors, and homophone substitutions. How to correct these ill-formed elements is therefore a major challenge for opinionated text analysis.

Based on an analysis of the characteristics of Chinese opinionated texts, this paper explores the proofreading of such texts from the perspectives of misspelled word correction and punctuation correction within a statistical framework. The aim is to improve the readability and normativity of the text and thereby lay a good foundation for subsequent opinion mining. Specifically, our research concerns the following two aspects:

For the punctuation errors that exist in opinionated texts, we designed and implemented a three-stage punctuation correction system based on position prediction, built on CRF sequence labeling that integrates multi-level language features. Our experiments verified that introducing the original punctuation can improve punctuation error correction performance.

For the misspelled words that exist in opinionated texts, this paper analyzes the potential relationship between misspelled words and their correct forms, focusing on how misspelled words arise during character entry. On this basis, we propose a method for Chinese misspelled word correction based on Hanzi-Pinyin-Hanzi conversion. Experimental results show that exploiting the pronunciation features of misspelled words has a positive effect on misspelled word proofreading.
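
The Hanzi-Pinyin-Hanzi idea can be illustrated with a minimal sketch. The abstract does not describe the actual model, so everything below is an assumption made for illustration: the tiny `CHAR_TO_PINYIN` table, the `WORD_FREQ` counts standing in for a statistical language model, and the function names are all hypothetical. The sketch converts a suspect word to toneless pinyin, enumerates every character string with the same pronunciation, and keeps the most frequent known word.

```python
from itertools import product

# Hypothetical toneless character-to-pinyin table (not from the thesis).
CHAR_TO_PINYIN = {
    "在": "zai", "再": "zai",
    "见": "jian", "件": "jian",
    "天": "tian", "气": "qi", "汽": "qi",
}

# Hypothetical word counts standing in for a statistical language model.
WORD_FREQ = {"再见": 120, "天气": 300}


def to_pinyin(word):
    """Hanzi -> Pinyin: return a syllable tuple, or None if a character is unknown."""
    syllables = []
    for ch in word:
        if ch not in CHAR_TO_PINYIN:
            return None
        syllables.append(CHAR_TO_PINYIN[ch])
    return tuple(syllables)


def candidates(syllables):
    """Pinyin -> Hanzi: enumerate every character string with this pronunciation."""
    per_syllable = [
        [ch for ch, py in CHAR_TO_PINYIN.items() if py == s] for s in syllables
    ]
    return ["".join(chars) for chars in product(*per_syllable)]


def correct(word):
    """Return the most frequent known homophone, or the input if none is found."""
    if word in WORD_FREQ:
        return word  # already a known word, leave it alone
    syllables = to_pinyin(word)
    if syllables is None:
        return word  # cannot convert, so make no correction
    known = [c for c in candidates(syllables) if c in WORD_FREQ]
    if not known:
        return word
    return max(known, key=WORD_FREQ.get)
```

With these toy tables, `correct("在见")` maps the homophone error back to `再见`. A realistic system would likely score candidates in sentence context rather than by isolated word frequency; that refinement is outside this sketch.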
Keywords/Search Tags: opinion mining, misspelled word proofreading, punctuation correction, CRF sequence labeling