
Research And Implementation Of Standard Information Proofreading System Based On Conditional Random Field

Posted on: 2023-01-07
Degree: Master
Type: Thesis
Country: China
Candidate: Q X Zhou
Full Text: PDF
GTID: 2568306917479244
Subject: Engineering
Abstract/Summary:
With the rapid development of the Internet, the volume of data generated online is growing exponentially, and big data technology has been widely applied and developed in recent years. Behind this mass of data, however, more and more low-quality data is emerging. Taking Chinese text as an example, many texts on the Internet contain errors in easily confused words, numerical symbols and punctuation marks, which greatly reduce their reference value. Such texts therefore need to be reviewed to ensure their correctness. Traditional manual proofreading is time-consuming and laborious, and it cannot guarantee accuracy. Against this background, demand for intelligent text proofreading tools is growing. Most proofreading tools in the industry today focus on textual content such as error-prone words and punctuation marks, and ignore the effect that the correctness of data quoted in a text has on the value of that text. The standard information quoted in papers and periodicals is one such case. Standards stipulate and constrain the development conditions of all walks of life, so misquoted standard information not only hinders reading the text and using the standard, but also undermines the authority of the article. Based on this research background and these practical needs, this thesis implements a standard information proofreading system based on a conditional random field (CRF) segmentation model through the research, design and experimental evaluation of a standard proofreading algorithm, and addresses the problem of proofreading the correctness of quoted data in Chinese text. The research content is divided into the following two aspects.

(1) The main methods of Chinese text segmentation are studied and a standard proofreading algorithm is designed. Various text segmentation models are compared to determine the advantages and characteristics of the CRF segmentation model in processing Chinese text; a model based on the CRF algorithm is built and trained, and the training results are evaluated experimentally to determine its feasibility. Based on the common formats of standard information, the standard proofreading algorithm is designed, a standard information matching method based on a standard dictionary library is proposed, and a standard retrieval table is established to optimize the algorithm and its performance according to the missing data. Extensive experiments and tests show that the algorithm effectively reduces false positives and false negatives in the proofreading results and significantly improves proofreading performance.

(2) A standard information proofreading system based on CRF is designed and implemented. In the standard proofreading module, a Word structured preprocessing module and an errata-based proofreading result display module are added, and error positions are highlighted in the original text. In the errata module, innovative functions such as automatic modification and result export are added, further improving the user experience of the system and the automation of the proofreading process. Experimental comparison and test analysis against the main review systems in the academic and engineering fields demonstrate the system's superior review performance and further confirm the feasibility of the review method.
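
The dictionary-based standard matching mentioned in aspect (1) can be pictured with a minimal Python sketch. This is not the thesis's algorithm, which relies on CRF segmentation and a standard retrieval table; it only illustrates the general idea of checking cited standards against a dictionary. The citation shape (a `GB` or `GB/T` prefix, a number, a hyphen and a four-digit year), the names `STANDARD_DICT`, `STANDARD_RE` and `check_standards`, and the dictionary entries are all assumptions made for illustration.

```python
import re

# Hypothetical standard dictionary: standard number -> known publication years.
# The entries are illustrative only, not an authoritative list.
STANDARD_DICT = {
    "GB/T 7714": {"2005", "2015"},
    "GB/T 1.1": {"2009", "2020"},
}

# Assumed citation shape: prefix, number, hyphen, four-digit year.
# Lookarounds are used instead of \b because Chinese characters count as
# word characters in Python and would block a \b match next to "GB".
STANDARD_RE = re.compile(
    r"(?<![A-Za-z])(GB(?:/T)?)\s*(\d+(?:\.\d+)*)\s*-\s*(\d{4})(?!\d)"
)


def check_standards(text):
    """Return (citation, start offset, status) for each standard-like citation."""
    results = []
    for match in STANDARD_RE.finditer(text):
        prefix, number, year = match.groups()
        key = f"{prefix} {number}"
        if key not in STANDARD_DICT:
            status = "unknown standard"
        elif year not in STANDARD_DICT[key]:
            status = "year mismatch"
        else:
            status = "ok"
        # The start offset lets a caller highlight the error position.
        results.append((match.group(0), match.start(), status))
    return results
```

Calling `check_standards` on a passage returns one tuple per citation found, so a display layer could highlight each flagged offset in the original text. A real system would need a far larger dictionary and more citation formats than this sketch assumes.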
Keywords/Search Tags: Conditional Random Fields (CRF), Chinese Word Segmentation, Standard Information, Text Proofreading, Review System