A Study On The Mining Of Entity Knowledge In Ancient Chinese Classics | Posted on:2019-02-20 | Degree:Doctor | Type:Dissertation | Country:China | Candidate:L Liu | GTID:1485305447455404 | Subject:Library and file management | Abstract/Summary: |

Ancient Chinese classics are a valuable part of the Chinese people's spiritual heritage and carry the culture and memory of a people with several thousand years of history. With the development of information technology, modern research on ancient Chinese classics has a promising future. From the digitization of the classics to word segmentation and POS tagging of ancient Chinese, this research has grown deeper and wider, and the mining of entity knowledge now plays an important role in it. This dissertation proposes a research scheme for mining entity knowledge from ancient Chinese classics. Using the Index on Spring and Autumn Annals and the Three Commentaries as the corpus, it carries out a series of studies: defining an entity guideline, manually tagging entities, mining entity knowledge through statistical analysis, entity recognition, entity disambiguation, and building an entity knowledge database. The entity guideline serves as the guide for the later steps. Entity knowledge is obtained through tagging, recognition and disambiguation; more detailed knowledge is acquired through statistical analysis; and all entity knowledge mined in the study is stored in structured form in a database.

Following this scheme, the dissertation has four main parts. Chapter 3 proposes an entity guideline. Chapter 4 discusses methods of manual tagging and presents a statistical analysis of the tagged corpus. Chapter 5 reports several entity recognition experiments based on different models and features. Chapter 6 discusses the two main kinds of entity ambiguity and proposes a rule-based method of entity disambiguation; the main components of an entity knowledge database and the construction of the database for this study are also discussed in that chapter. The main contents of these four parts are summarized below.

(1) Construction of an entity guideline. The guideline defines the entity and the entity categories. There are three kinds of entities: Person, Location and Time. The internal components of an entity are then studied in depth. All construction rules for entities are illustrated with tagged sentences as examples.

(2) Manual entity tagging and statistical analysis of the tagged corpus. The chapter proposes three semi-automatic methods of entity tagging and disambiguation. The distributions of entities, entity types and entity components are then calculated. The most frequent person, location and time entities in the corpus are also presented.

(3) Entity recognition. The particular characteristics of entity recognition in ancient Chinese classics are discussed first. Three machine learning models (HMM, Maximum Entropy and CRF) with four different features are applied to entity recognition. The results show that the CRF model with the integrated features performs best.

(4) Entity disambiguation and construction of the entity knowledge database. Ambiguous entities fall into two main types, each of which is discussed in detail. A rule-based method of entity disambiguation is then proposed. Finally, the method of constructing an entity knowledge database is presented. The database contains all the structured entity knowledge obtained throughout this study.

This dissertation has presented a research scheme for mining entity knowledge from ancient Chinese classics, and each main part of the scheme is studied in depth. The study is innovative in several respects, and the author will continue this work and expand on these topics in follow-up research.
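As a rough illustration of how tagged text can feed the statistical analysis of entity types described above, the following minimal sketch collects entity spans from character-level tags and counts them by type. It is not the dissertation's method or annotation format: the BIO tagging scheme, the tag names `PER`, `LOC` and `TIME`, the function name `extract_entities`, and the tags assigned to the sample sentence are all assumptions made for this example.

```python
from collections import Counter


def extract_entities(chars, tags):
    """Collect (text, type) spans from character-level BIO tags.

    Assumed scheme (hypothetical, not from the dissertation):
    B-X starts an entity of type X, I-X continues it, O is outside.
    An I-X tag that does not continue an open entity of type X is
    treated like O and closes any open entity.
    """
    entities = []
    current_chars, current_type = [], None
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            # A new entity begins; close the previous one if it is open.
            if current_chars:
                entities.append(("".join(current_chars), current_type))
            current_chars, current_type = [ch], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continue the currently open entity.
            current_chars.append(ch)
        else:
            # Outside any entity (or an inconsistent I- tag).
            if current_chars:
                entities.append(("".join(current_chars), current_type))
            current_chars, current_type = [], None
    if current_chars:
        entities.append(("".join(current_chars), current_type))
    return entities


# Sample sentence with punctuation removed; the tags are an
# illustrative guess, not the dissertation's annotation.
chars = list("三月公及邾儀父盟于蔑")
tags = ["B-TIME", "I-TIME", "B-PER", "O", "B-PER",
        "I-PER", "I-PER", "O", "O", "B-LOC"]

entities = extract_entities(chars, tags)
type_counts = Counter(entity_type for _, entity_type in entities)
entity_counts = Counter(entities)
```

Over a whole tagged corpus, `type_counts` would give the distribution of entity types and `entity_counts.most_common(n)` would list the most frequent entities, which is the kind of summary the abstract attributes to Chapter 4.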
| Keywords/Search Tags: | ancient Chinese classics, mining of entity knowledge, ancient Chinese names, ancient Chinese information processing, entity tagging, entity recognition, entity disambiguation, entity knowledge database |