Research On Lexical Analysis Of Ancient Books Based On Deep Learning

Posted on: 2019-09-29
Degree: Master
Type: Thesis
Country: China
Candidate: C M Li
Full Text: PDF
GTID: 2435330548980587
Subject: Linguistics and Applied Linguistics
Abstract/Summary:
With the arrival of "artificial intelligence", the information processing of ancient books has attracted more and more attention. Lexical analysis is the basis of this processing; it covers automatic word segmentation, part-of-speech (POS) tagging and named entity recognition (NER). Deeper research is hard to carry out without high-precision lexical analysis, yet the lexical analysis of ancient books has long been limited by the available technical methods and annotated corpora, so progress has been slower than for modern Chinese. This thesis applies deep learning to the automatic sentence segmentation and lexical analysis of ancient books and then builds an automatic annotation platform for them. The work has four parts.

First, the LSTM-CRF model is applied to the lexical analysis of ancient books. Embedding vectors are trained on Si Ku Quan Shu (WenyuanGe edition), which covers more ancient characters and so improves the model. With the first ten volumes of ZuoZhuan as the training set and the last two volumes as the test set, an integrated lexical analysis method produces segmentation, POS tags and named entities in a single pass. The F-scores for segmentation, POS tagging, person name recognition and location name recognition are 94.81%, 90.21%, 82.79% and 82.49%, respectively, and 10-fold cross-validation further confirms the effect of the model. A further experiment found that NER accuracy improves when the POS information is removed.

Second, a dictionary database of ancient Chinese proper names was built by collating and integrating data from many ancient books. Following previous research, word circulation is used to reduce mismatches caused by frequently used characters, and the trie structure was improved so that entries can be matched quickly. Combined with the neural network, the dictionary database both improves NER accuracy and provides interpretations of the entries to proofreaders. Experiments showed that NER accuracy improved when the dictionary was combined with the neural network.

Third, an online automatic annotation system for ancient books was developed by integrating J2EE with TensorFlow. J2EE provides the MVC-based system framework, and the tagging module communicates with the dictionary database and with TensorFlow, where the lexical analysis model is deployed. These modules are fully decoupled from one another.

Fourth, during lexical analysis it was found that a large part of the ancient books have no punctuation, whereas the input to the neural network model must be sentences. Building on the integrated lexical analysis, the thesis therefore studies automatic sentence segmentation with neural networks and, after summarizing the shortcomings of previous studies, proposes a new segmentation method. Punctuation with a convolutional neural network proves effective: with the "Twenty-Four Histories" as the training set and "Romance of the Three Kingdoms" as the test set, the F-score for punctuation is 86.69%.

To summarize, this study uses deep learning to solve the punctuation and lexical analysis problems of ancient books, integrates dictionary resources to improve NER accuracy while also providing interpretations, and finally forms a system of practical value.
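The dictionary lookup described in the second part can be illustrated with a plain trie and forward longest matching. This is only a minimal sketch: the abstract does not specify how the thesis improved the trie or how word circulation filters candidates, so neither is modeled here. The class names, the `find_entries` helper and the sample glosses are hypothetical and chosen for illustration.

```python
class TrieNode:
    """One character position in the dictionary trie."""

    def __init__(self):
        self.children = {}   # maps a character to the next TrieNode
        self.gloss = None    # set only where a dictionary entry ends


class Trie:
    """Plain character trie (not the improved variant from the thesis)."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, word, gloss):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.gloss = gloss

    def longest_match(self, text, start):
        """Return (word, gloss) for the longest entry beginning at start, or None."""
        node = self.root
        best = None
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break
            if node.gloss is not None:
                best = (text[start:i + 1], node.gloss)
        return best


def find_entries(trie, text):
    """Scan text left to right, keeping the longest dictionary match at each position."""
    results = []
    i = 0
    while i < len(text):
        match = trie.longest_match(text, i)
        if match is None:
            i += 1
            continue
        word, gloss = match
        results.append((i, word, gloss))
        i += len(word)
    return results


if __name__ == "__main__":
    trie = Trie()
    # Hypothetical entries; the glosses stand in for dictionary interpretations.
    trie.insert("鄭", "state name")
    trie.insert("鄭伯", "person name")
    trie.insert("鄢", "place name")
    for position, word, gloss in find_entries(trie, "鄭伯克段于鄢"):
        print(position, word, gloss)
```

Longest matching prefers "鄭伯" over the shorter entry "鄭" when both begin at the same position, which is one simple way to limit mismatches from single frequently used characters. The stored gloss is what such a system could show to a proofreader alongside the recognized entity.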
Keywords/Search Tags:ancient books, lexical analysis, punctuation, deep learning