Establishment And Study Of Cultural Relics Digital Protection Thesaurus

Posted on: 2018-01-30 | Degree: Master | Type: Thesis
Country: China | Candidate: W Luo | Full Text: PDF
GTID: 2348330518994127 | Subject: Computer technology
Abstract/Summary:
Digital protection is currently an important way to preserve cultural relics. The digitization process produces a large amount of data, which poses challenges for information retrieval in the cultural relics field. Constructing a thesaurus for this field is therefore of great significance: it serves as a standardized retrieval language tool for the protection of cultural relics and for research units studying cultural relics information resources. The main work of this thesis is to build a "cultural relics digital protection thesaurus" by means of computer technology. The research covers the following aspects:

1) By surveying how important subject vocabularies and thesauri in the cultural relics digital protection field have been compiled in China and abroad, and by analyzing their construction process, cataloging rules, word structure and compilation techniques, this thesis proposes a system framework for compiling the thesaurus.

2) Constructing the main table, the word family table, and the English and Chinese index tables. First, a corpus of cultural relics texts is built with manually assigned classification labels. The corpus is segmented with a word segmentation system, and keywords of journal and magazine articles are retrieved from the web; together these yield the preliminary candidate lexicon. The formal subject terms are then selected from it according to a variety of norms. Next, based on a study of algorithms for semantic relations between words, synonymy between phrases is obtained by combining the HowNet-based semantic similarity algorithm, a semantic similarity algorithm based on a synonym thesaurus, and pattern matching. The hierarchical relationships between terms are obtained with the rear-consistent method (terms that share the same ending) and a literal similarity algorithm, and the associative relationships are obtained with the Dice measure and the literature co-occurrence method. Finally, these three types of inter-word relations are used to construct the main table, the word family table, and the English and Chinese index tables.

3) Based on a study of the Word2vec word clustering algorithm, word clustering is applied to the cultural relics classification system to cluster its categories automatically, and the result is used to build the category table.

Through the work above, this thesis completes the construction of the whole thesaurus, comprising the main table, the word family table, the category table, and the English and Chinese index tables, with 2605 words in total. The thesaurus can be used not only for indexing and recording literature resources, but also for normalizing and unifying terminology in the field of cultural relics.
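The abstract names two relation-extraction steps without giving formulas: the Dice measure over literature co-occurrence (associative relations) and the rear-consistent method (hierarchical relations). The sketch below is one plausible reading of both and is not taken from the thesis. It assumes that each document is already reduced to a list of subject terms, that Dice is computed on document frequencies, and that "rear consistent" means a longer term ending with a shorter vocabulary term is a narrower-term candidate. The function names `dice_scores` and `suffix_hierarchy` and the sample terms are hypothetical.

```python
from collections import Counter
from itertools import combinations


def dice_scores(documents):
    """Return {(term_a, term_b): dice} for every pair that co-occurs in a document.

    Assumed definition: Dice(a, b) = 2 * df(a, b) / (df(a) + df(b)),
    where df counts the documents containing the term(s).
    """
    term_df = Counter()
    pair_df = Counter()
    for doc in documents:
        terms = sorted(set(doc))  # count each term once per document
        term_df.update(terms)
        pair_df.update(combinations(terms, 2))  # pairs are alphabetically ordered
    return {
        pair: 2 * n / (term_df[pair[0]] + term_df[pair[1]])
        for pair, n in pair_df.items()
    }


def suffix_hierarchy(terms):
    """Map each term to the longest other vocabulary term that it ends with.

    Assumed reading of the rear-consistent method: the shorter term is a
    broader-term candidate for the longer term that ends with it.
    """
    vocab = set(terms)
    parents = {}
    for term in vocab:
        candidates = [t for t in vocab if t != term and term.endswith(t)]
        if candidates:
            parents[term] = max(candidates, key=len)
    return parents


# Illustrative data only (not from the thesis corpus).
docs = [
    ["bronze", "ding", "casting"],
    ["bronze", "ding"],
    ["bronze", "casting"],
    ["porcelain", "glaze"],
]
scores = dice_scores(docs)
parents = suffix_hierarchy(["瓷器", "青花瓷器", "鼎", "铜鼎", "青铜鼎"])
```

In this toy data, "bronze" appears in three documents, "ding" in two, and they co-occur in two, so their Dice score is 2 * 2 / (3 + 2) = 0.8. A real pipeline would apply a threshold to the scores before accepting a pair as an associative relation, and would have suffix-derived parents reviewed, since a shared ending does not always indicate a genuine broader/narrower relation.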
Keywords/Search Tags:cultural relics digital, thesaurus, automatic establishment, semantic relations, word clustering