Research On Key Technologies Of Gesar Epic Named Entity Recognition Based On Deep Learning

Posted on: 2022-12-08
Degree: Master
Type: Thesis
Country: China
Candidate: K Y Huan
GTID: 2518306752993269
Subject: Automation Technology
Abstract/Summary:
Language and writing are the forms of knowledge best suited to exploring human wisdom. They carry the heritage of ancient civilizations and are core resources for today's natural language processing and knowledge graphs. Natural language processing and knowledge graphs are key to realizing artificial intelligence and are the jewel in its crown. Breakthroughs in these areas will greatly promote attention to, and application of, artificial intelligence across academia and industry. In recent years, supported by big data and deep learning, natural language processing and knowledge graph technology have developed rapidly, and people urgently need to obtain the information they want, or mine valuable knowledge from vast amounts of data, quickly and accurately. Named entity recognition emerged to meet this need. Named entity recognition for English, Chinese and other languages has achieved fruitful results and is widely used. Research on Tibetan named entity recognition, however, is still at a preliminary stage, with only a decade of history. Such research has important theoretical significance and broad application value for automatically discovering the linguistic characteristics of Tibetan and for studying the Tibetan language with deep learning.

To address domain-specific named entity recognition, a problem that urgently needs solving in Tibetan information processing at the present stage, this thesis collects and organizes the Gesar Epic, a work of Tibetan literature rich in named entities, as the experimental corpus, formulates a named entity annotation specification for the Gesar Epic, and designs a visualization system for Gesar Epic named entity recognition. On this basis, focusing on how to combine deep learning with the linguistic features of the epic to improve Tibetan named entity recognition performance, the thesis studies in turn a deep-learning-based entity extraction method for the Gesar Epic, an entity boundary algorithm, and the automatic annotation and comparative analysis of entities in the History of Four Demons. The main work includes:

(1) Gesar Epic data preprocessing
Given the lack of domain-specific Tibetan named entity corpora, this thesis analyzes the linguistic characteristics of the Gesar Epic, selects it as the corpus, and proposes a data preprocessing method that takes the Tibetan syllable as the basic unit. To standardize Tibetan named entities, it studies how Tibetan named entities are composed, analyzes and summarizes the characteristics of named entities in the Gesar Epic, and defines six entity types: person names, place names, organization names, sacred animal mounts, weapons and armor, and everyday utensils. The specification covers essentially all meaningful entities and entity mentions in the Gesar Epic.

(2) Named entity recognition for the Gesar Epic based on deep learning
In view of the characteristics of named entities in the Gesar Epic and the difficulty of recognizing them, a post-processing step for entity extraction and a syllable-level entity boundary recognition algorithm are designed. Alongside named entity recognition methods based on the BERT model and on the bidirectional LSTM-CRF model, a bidirectional LSTM-CRF method that integrates BERT syllable embeddings is proposed to address the overly simple syllable vector representations used in Gesar Epic and Tibetan named entity recognition. This method can learn knowledge at different levels, such as syllables, words and language models, from the Gesar Epic corpus, which helps named entity recognition and analysis.

(3) Experimental analysis and system implementation
To evaluate recognition performance explicitly, four classical entity extraction models based on Tibetan syllables are tested on the same data set: TS-BILSTM-CRF, TS-BILSTM, BERT-BILSTM and BERT-BILSTM-CRF. Experiments show that the combination of bidirectional LSTM-CRF and BERT Tibetan syllable embeddings scores 2.75 percentage points higher than BERT-BILSTM, 6.12 percentage points higher than TS-BILSTM-CRF and 13.92 percentage points higher than TS-BILSTM. This verifies that the Gesar Epic named entity recognition method integrating BERT syllable embeddings with bidirectional LSTM-CRF effectively improves recognition performance and outperforms the existing entity recognition models. To further demonstrate the recognition results, a named entity recognition system for the Gesar Epic is designed. Its main functions are sentence-level text segmentation, syllable-level text representation, named entity recognition, and statistical analysis of entity types and frequencies.

(4) Automatic entity annotation and comparative analysis
Because manual corpus annotation is time-consuming and labor-intensive, an automatic entity annotation method is proposed that labels the target corpus according to the characteristics of the training data. Then, to reflect the word usage, distribution and counts of named entities in the Gesar Epic, a visualization system is designed and implemented for comparative entity analysis. The analysis concludes that the ratio of punctuation-delimited sentences to entities in the Gesar Epic is 3:1 and that the number of complete sentences equals the number of entities, which shows that the Gesar Epic is a particularly rich named entity data set within Tibetan literature. The thesis therefore summarizes the differences and patterns of change in the Gesar Epic at the level of named entity recognition, and shows that named entities are part of the epic's charm.

In short, this thesis carries out research on the key technologies of domain-specific Tibetan named entity recognition for the first time. Based on the challenges facing Tibetan named entity recognition and the characteristics of the Gesar Epic, it studies in depth the construction of a Gesar Epic named entity corpus, entity acquisition, type classification, the entity boundary algorithm, the implementation of the visualization system, automatic entity annotation, and comparative analysis. The research yields meaningful conclusions and results, brings named entity recognition performance on the Gesar Epic to a usable level, and provides basic support for higher-level Tibetan language processing tasks. It is hoped that this study will benefit the fields of Tibetan natural language processing and knowledge graphs.
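
For readers unfamiliar with syllable-level processing, the preprocessing and entity boundary steps named in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation. It assumes that syllables are separated by the tsheg mark (U+0F0B), that clauses end with the shad mark (U+0F0D), and that a tagger emits one BIO tag per syllable. The function names, the sample tags, and the labels PER and LOC are hypothetical and chosen only for illustration.

```python
TSHEG = "\u0f0b"  # Tibetan syllable delimiter (tsheg)
SHAD = "\u0f0d"   # Tibetan clause/sentence mark (shad)


def split_syllables(text):
    """Split Tibetan text into syllables, dropping tsheg, shad and whitespace."""
    syllables = []
    for clause in text.split(SHAD):
        for syllable in clause.split(TSHEG):
            syllable = syllable.strip()
            if syllable:
                syllables.append(syllable)
    return syllables


def decode_entities(syllables, tags):
    """Recover (type, start, end, text) spans from syllable-level BIO tags.

    `end` is exclusive. An I- tag with no open span of the same type is
    treated as the start of a new span (a lenient repair, assumed here).
    """
    if len(syllables) != len(tags):
        raise ValueError("syllables and tags must have the same length")
    spans = []
    start, etype = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == etype
        if start is not None and not continues:
            spans.append((etype, start, i, TSHEG.join(syllables[start:i])))
            start, etype = None, None
        if prefix == "B" or (prefix == "I" and start is None):
            start, etype = i, label
    return spans


# Hypothetical example sentence and hand-written tags (not model output).
sample = "གེ་སར་རྒྱལ་པོ་གླིང་ལ་བཞུགས།"
syllables = split_syllables(sample)
tags = ["B-PER", "I-PER", "I-PER", "I-PER", "B-LOC", "O", "O"]
spans = decode_entities(syllables, tags)
for etype, start, end, text in spans:
    print(etype, start, end, text)
```

Working at the syllable level sidesteps the need for a Tibetan word segmenter before tagging, at the cost of a boundary-decoding step like `decode_entities` to turn per-syllable tags back into whole entities.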
Keywords/Search Tags: Deep learning, Gesar Epic, Named entity recognition, BERT