Mongolian Named Entity Recognition

Posted on: 2019-06-13  Degree: Doctor  Type: Dissertation
Country: China  Candidate: W H Wang  Full Text: PDF
GTID: 1368330596956126  Subject: Computer application technology
Abstract/Summary:
Mongolian named entity recognition is the task of identifying and classifying proper names in Mongolian text. It is one of the fundamental tasks in natural language processing and can improve the performance of machine translation, information retrieval, information extraction and machine comprehension. More importantly, it is a key component for building a knowledge graph or a question answering system. As an agglutinative language, Mongolian has a complex morphological structure. Research on Mongolian named entity recognition is still at an initial stage and related work is very limited, which has held back the overall development of Mongolian language processing. We therefore conducted research on named entity recognition for Mongolian, with the aim of bringing related Mongolian research to a new level.

In this thesis, we built the first manually annotated corpus for Mongolian named entity recognition, since no manual annotation rules or manually annotated data sets for Mongolian named entities existed. We drew up the annotation rules, with reference to the annotation rules of other languages, and built the annotation platform. As a result, this corpus is currently the largest corpus for Mongolian named entities. With this corpus, we addressed four key problems in recognizing Mongolian named entities: how to improve the performance of a Mongolian named entity recognition system with rich features; how to learn morpheme representations automatically from a corpus; how to incorporate knowledge from other similar tasks; and how to transfer knowledge from other languages. Addressing these problems could promote natural language processing research on Mongolian. The main contributions of this work are as follows:

(1) Given the complex morphological structure of Mongolian, we proposed a method that performs Mongolian named entity recognition with rich features using suffix segmentation. The features comprise context features, morphological features, semantic features and syllable features. Unlike an English word, a Mongolian word is usually formed by adding several suffixes to a stem, so we segmented each suffix as a new unit to train a Conditional Random Field (CRF) classifier. The experimental results show that segmenting each suffix into an individual token achieves better results than deleting suffixes or using the suffixes as features. The system based on suffix segmentation with the optimal feature combination yields the benchmark result on this corpus.

(2) To reduce the dependence on feature engineering, we presented a new Mongolian named entity recognition approach using a recurrent neural network. The network takes morpheme representations as input, which are learned from a large-scale unannotated corpus. On top of it, a CRF layer jointly decodes the best label sequence. This method can learn the semantic relationships between morphemes and the dependencies between labels. The experimental results show that feeding morpheme representations into the neural network instead of word vectors improves the performance of Mongolian named entity recognition. Additionally, the joint decoding layer learns the relationships between tags, which improves the whole system.

(3) We improved the recurrent neural network model by incorporating knowledge from Mongolian characters and a morpheme language model. The character representation can learn the semantic knowledge within a morpheme, and the auxiliary language model loss can learn the morpheme context. Experimental results show that the added character embeddings and the language model loss function improve system performance.

(4) Cyrillic Mongolian is the native language of Mongolia and has the same grammar as classical Mongolian and a similar pronunciation. To further improve the performance of the classical Mongolian named entity recognition system, it is necessary to transfer knowledge from other languages, especially the related Cyrillic Mongolian. We therefore transferred the knowledge acquired from a Cyrillic Mongolian named entity recognition system through shared neural network parameters or language knowledge. The experimental results show that the additional knowledge benefits the classifier.

To conclude, our work makes Mongolian named entity recognition practical and lays a solid foundation for other Mongolian information processing tasks. It would also benefit the development of artificial intelligence and big data in the minority regions of China. More importantly, our work could also inspire research on other agglutinative languages.
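
The three suffix-handling strategies compared in contribution (1) can be illustrated with a minimal Python sketch. This is not the thesis implementation. The `-` delimiter between a stem and its suffixes, the placeholder words, and the rule that a suffix token inherits an `I-` label from its stem are all assumptions made for illustration; the abstract does not state how labels were assigned to segmented suffixes.

```python
from typing import Dict, List, Tuple

# Hypothetical delimiter marking the boundary between a stem and each suffix.
SUFFIX_SEP = "-"


def segment_suffixes(tokens: List[str], labels: List[str]) -> Tuple[List[str], List[str]]:
    """Strategy A: turn every suffix into its own token and extend the BIO labels."""
    out_tokens: List[str] = []
    out_labels: List[str] = []
    for token, label in zip(tokens, labels):
        stem, *suffixes = token.split(SUFFIX_SEP)
        out_tokens.append(stem)
        out_labels.append(label)
        # Assumption: a suffix continues the entity of its stem (I-TYPE),
        # or stays outside any entity (O) when the stem is outside.
        suffix_label = "O" if label == "O" else "I-" + label[2:]
        for suffix in suffixes:
            out_tokens.append(SUFFIX_SEP + suffix)
            out_labels.append(suffix_label)
    return out_tokens, out_labels


def delete_suffixes(tokens: List[str]) -> List[str]:
    """Strategy B: keep only the stem of every word."""
    return [token.split(SUFFIX_SEP)[0] for token in tokens]


def suffixes_as_features(tokens: List[str]) -> List[Dict[str, str]]:
    """Strategy C: keep one token per word and expose the suffixes as a feature."""
    features = []
    for token in tokens:
        stem, *suffixes = token.split(SUFFIX_SEP)
        features.append({"stem": stem, "suffix": "+".join(suffixes) or "NONE"})
    return features


if __name__ == "__main__":
    # Placeholder words, not real Mongolian: one entity word with two suffixes.
    words = ["STEM1-sfxA-sfxB", "STEM2"]
    tags = ["B-PER", "O"]
    print(segment_suffixes(words, tags))
    print(delete_suffixes(words))
    print(suffixes_as_features(words))
```

Under strategy A the sequence fed to the CRF becomes longer, but each suffix gets its own position in the sequence, so context features can refer to it directly. Strategies B and C keep one position per word and either discard the suffix information or attach it as a feature on the word.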
Keywords/Search Tags:Information Processing for Mongolian, Named Entity Recognition, Representation Learning, Recurrent Neural Network