
Design And Implementation Of Chinese Text Automatic Proofreading System

Posted on: 2018-04-14
Degree: Master
Type: Thesis
Country: China
Candidate: T Zhang
Full Text: PDF
GTID: 2348330515471167
Subject: Software engineering
Abstract/Summary:
The rapid growth of Internet data has brought people a wealth of information, but it has also lowered the quality of information on the Internet. Although the news publishing and television broadcasting industries have high requirements for text quality, review in these industries is still done manually, which leaves errors in Chinese words, pinyin, numbers, punctuation, and so on. Furthermore, as the volume of Chinese text on the Internet increases rapidly, errors accumulate, which reduces the value of the text and adds to the burden of manual correction. At the same time, the variety of carriers for Chinese text greatly increases the difficulty of manual review. Because much existing proofreading software has difficulty handling text in varied forms, formats, and carriers, researching and developing automatic proofreading methods and systems for Chinese text is of practical significance.

Against this background, this thesis develops an automatic proofreading system for common mistakes in Chinese text. The work covers the following aspects:

1. A requirements analysis of the Chinese text automatic proofreading system is carried out, covering its user, business, and functional requirements. The MVC framework is used to design the system architecture, and the functions of each layer are analyzed in detail. The specific functions of the proofreading service are also analyzed and designed, including word, punctuation, number, and pinyin proofreading.

2. Methods for word, punctuation, number, and pinyin proofreading are studied. First, Conditional Random Fields (CRF) are combined with word segmentation to recognize named entities in the text, and named entity linking is used to proofread entity names. In addition, a trie is built to proofread common and sensitive words. Second, a rule base is constructed to proofread punctuation and numbers. Third, for pinyin proofreading, a toolkit is used to obtain the correct pinyin of a word, which is then compared with the pinyin marked for that word in the text; if the two differ, the marked pinyin in the original text is corrected.

3. To implement the system on the B/S (browser/server) model, the Spring MVC framework is used to build the Web framework, which includes the configuration of Spring MVC and the programming of the front-end JSP pages and the controllers.

The system developed in this thesis can correct common mistakes in Chinese text, including word, punctuation, number, and pinyin errors. In addition, it supports two proofreading modes, online and offline: users can enter short text on a Web page for online proofreading, or upload Word files to the server for offline proofreading.
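
The trie lookup mentioned in the abstract can be illustrated with a short sketch. This is not the thesis's implementation (the abstract describes a Java/Spring MVC system and gives no code); it is a minimal Python example assuming a plain list of words to be flagged and a longest-match scan. The class names `TrieNode` and `Trie`, the method `find_all`, and the sample word list are all hypothetical.

```python
class TrieNode:
    """One node of the trie: child nodes keyed by character."""

    def __init__(self):
        self.children = {}
        self.terminal = False  # True if a listed word ends at this node


class Trie:
    """Hypothetical word list stored as a trie (an assumption, not the thesis's code)."""

    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        """Add one word, creating nodes along its character path as needed."""
        if not word:
            return  # ignore empty entries
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.terminal = True

    def find_all(self, text):
        """Scan text left to right and return (start_index, word) pairs.

        At each position the longest listed word starting there is taken,
        and scanning resumes after it, so reported hits never overlap.
        """
        hits = []
        i = 0
        while i < len(text):
            node = self.root
            longest = 0
            for j in range(i, len(text)):
                node = node.children.get(text[j])
                if node is None:
                    break
                if node.terminal:
                    longest = j - i + 1
            if longest:
                hits.append((i, text[i:i + longest]))
                i += longest
            else:
                i += 1
        return hits


# Illustrative usage with a made-up word list.
sensitive = Trie(["敏感", "敏感词", "错别字"])
for start, word in sensitive.find_all("这是敏感词和错别字"):
    print(start, word)
```

Because words that share a prefix share a path in the trie, each text position is checked against the whole word list in a single walk, which is the usual reason for choosing a trie over testing each word separately.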
Keywords/Search Tags: Text proofreading, Named entity recognition, Named entity linking, Trie tree, Spring MVC