A Study of Sentence Alignment Between the Shijing and Its Commentaries and Annotations

Posted on: 2019-07-10
Degree: Master
Type: Thesis
Country: China
Candidate: S S Wang
Full Text: PDF
GTID: 2428330602470059
Subject: Library science
Abstract/Summary:
Many commentaries on the Shijing (Book of Songs) exist, but the most complete is the one in the Commentaries on the Thirteen Classics, which holds the leading place in the traditional classification of texts. This thesis therefore takes the Mao Shi Zheng Yi, in Ruan Yuan's block-printed edition of the Thirteen Classics commentaries, as its object of study within the commentary literature on the Book of Songs, and uses the Mao Shi Zheng Yi and the Mao Shi Index as its resources. Drawing on earlier studies of pre-Qin classics such as the Mencius, the Analects of Confucius, and Master Zuo's Spring and Autumn Annals, it processes the Shijing at a deeper level by exploring automatic word segmentation based on the domain word list in the Mao Shi Index. It then uses regular expression matching and similarity calculation to align the sentences of the Book of Songs with those of its commentary, the Mao Shi Zheng Yi, and obtains good alignment results. The thesis studies sentence alignment from the following three aspects.

First, the thesis analyzes the textual structure and the exegetical language of the Mao Shi Zheng Yi. It summarizes the features by which the text is classified, which provides a basis for alignment at the article and chapter level. In the Mao Shi Zheng Yi, different objects and contents of exegesis are generally introduced by different specific terms. The thesis summarizes the characteristics and regularities of exegetical terms in five aspects (terms, formats, contents, methods, and styles) and derives the rules of exegesis from them. These rules are refined into regular expressions, and an exegetical term pattern library is built to support the sentence alignment experiments.

Secondly, the thesis studies automatic word segmentation of the Book of Songs with conditional random fields, a machine learning method. Based on a manually segmented corpus of the Book of Songs, it combines the Guang Yun with statistical analysis to obtain 23 sets of feature templates that fuse different kinds of feature knowledge, and trains a segmentation model for each. The performance of each word segmentation model is tested. The character feature is found to have the greatest impact on segmentation of the Shijing, and the best segmentation model reaches an F-score (the harmonic mean of precision and recall) of 97.42%.

Thirdly, the thesis studies the sentence alignment of the Shijing and the Mao Shi Zheng Yi by combining regular expression matching and similarity calculation. The regular expression matching experiment uses Python to run multiple rounds of pattern matching over the experimental corpus with the exegetical term pattern library, testing the result step by step to improve the matching. Gensim is a natural language processing library that can automatically extract semantic information from text. Using Gensim with TF-IDF weighted term vectors, the thesis builds an LSI topic model for similarity calculation and combines it with the regular expression matching results to explore the sentence alignment of the Shijing and the Mao Shi Zheng Yi. With regular expression matching on exegetical terms, the highest precision P reaches 86.92% and the F-score reaches 84.61%; this method combines textual structure features with the linguistic features of the exegetical terms and gives the better experimental result. The Gensim similarity method draws on semantic features, and its sentence matching results are also satisfactory: the accuracy of matching between the classic's notes and the commentaries is 89.92%, the accuracy of matching between the classic and the notes is 86.14%, and the F-score reaches 85.6%. These results show that both regular expression matching and Gensim similarity matching are effective for sentence alignment. The two methods approach the alignment of the Shijing and the Mao Shi Zheng Yi from different perspectives and can be combined in subsequent research.

The thesis makes two innovative contributions. First, it summarizes the textual structure features and the linguistic features of the exegetical terms of the Mao Shi Zheng Yi, building a relatively complete description of the text structure and an exegetical term pattern library. Both can be implemented on a computer, which makes them a convenient basis for later research. Secondly, it explores a method for automatic segmentation of the Book of Songs that uses multiple word lists from different angles, and obtains a segmented corpus of the Book of Songs that incorporates the expert vocabulary knowledge of the Mao Shi Index. The sentence alignment itself, which combines regular expression matching based on the pattern library with similarity calculation, obtains good alignment results.

The thesis still has some shortcomings. First, although the exegetical term pattern library was made as complete as possible, it can be improved further by covering a wider range of Shijing annotations. Secondly, the sentence alignment experiment does not separate in detail the opinions of the various experts quoted in the commentaries of the Mao Shi Zheng Yi. Subsequent studies can analyze and summarize the expressions used by each expert.
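
The alignment step described in the abstract, where a commentary sentence is first labeled by its opening exegetical marker and then paired with the most similar line of the classic, can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the thesis's implementation: the two markers (箋云, 正義曰) and the layer names are chosen for illustration and are not taken from the thesis's pattern library, the commentary strings are toy sentences rather than verified quotations, and similarity is plain character-level TF-IDF cosine in pure Python rather than the Gensim LSI model the thesis builds.

```python
import math
import re
from collections import Counter

# Assumed, illustrative markers; the thesis's actual pattern library is not
# reproduced in the abstract.
LAYER_PATTERNS = {
    "jian": re.compile(r"^箋云"),    # hypothetical label for notes opening with 箋云
    "shu": re.compile(r"^正義曰"),   # hypothetical label for commentary opening with 正義曰
}


def classify_layer(sentence):
    """Return the label of the first marker that opens the sentence, else 'other'."""
    for layer, pattern in LAYER_PATTERNS.items():
        if pattern.match(sentence):
            return layer
    return "other"


def char_tokens(text):
    """Split text into single characters, dropping punctuation and whitespace."""
    return [ch for ch in text if ch.isalnum()]


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per document, using smoothed IDF."""
    counts_per_doc = [Counter(char_tokens(d)) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter()
    for counts in counts_per_doc:
        doc_freq.update(counts.keys())
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in counts.items()} for counts in counts_per_doc]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def align(classic_lines, notes):
    """Pair each note with the index of the most similar classic line.

    Returns a list of (note, layer, best_index, best_score) tuples.
    """
    vectors = tfidf_vectors(classic_lines + notes)
    classic_vecs = vectors[: len(classic_lines)]
    note_vecs = vectors[len(classic_lines):]
    aligned = []
    for note, note_vec in zip(notes, note_vecs):
        scores = [cosine(note_vec, cv) for cv in classic_vecs]
        best = max(range(len(scores)), key=scores.__getitem__)
        aligned.append((note, classify_layer(note), best, scores[best]))
    return aligned


if __name__ == "__main__":
    classic = ["關關雎鳩，在河之洲。", "窈窕淑女，君子好逑。"]
    # Toy commentary sentences written for this sketch.
    notes = ["箋云雎鳩在河之洲者", "正義曰窈窕淑女者"]
    for note, layer, index, score in align(classic, notes):
        print(layer, index, round(score, 3), note)
```

Treating each character as a token sidesteps word segmentation in this sketch; a fuller pipeline would presumably feed in the segmented text from the conditional random fields step and replace the cosine scoring with the LSI similarity.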
Keywords/Search Tags: Shijing, Mao Shi Zheng Yi, sentence alignment, regular expression, similarity calculation