Mongolian Named Entity Recognition

Posted on:2019-06-13

Degree:Doctor

Type:Dissertation

Country:China

Candidate:W H Wang

Full Text:PDF

GTID:1368330596956126

Subject:Computer application technology

Abstract/Summary:

Mongolian named entity recognition is the task of identifying and classifying proper names in Mongolian text. Named entity recognition is one of the fundamental tasks in natural language processing. It can improve the performance of machine translation, information retrieval, information extraction, and machine reading comprehension. More importantly, it is a key component in building a knowledge graph or a question answering system. As an agglutinative language, Mongolian has a complex morphological structure. Research on Mongolian named entity recognition is still at an early stage, and related work on Mongolian remains very limited. This has held back the development of Mongolian language processing as a whole. We therefore conducted research on Mongolian named entity recognition, with the aim of bringing related Mongolian research to a new level.

Because no manually annotated named entity guidelines or data sets existed for Mongolian, we built the first manually annotated corpus for Mongolian named entity recognition. We drew up the annotation guidelines, with reference to the annotation guidelines of other languages, and built an annotation platform. This corpus is currently the largest corpus of Mongolian named entities.

With this corpus, we addressed four key problems in recognizing Mongolian named entities: how to improve the performance of a Mongolian named entity recognition system with rich features; how to learn morpheme representations automatically from a corpus; how to incorporate knowledge from other, similar tasks; and how to transfer knowledge from other languages. Solving these problems can promote natural language processing research on Mongolian. The main contributions of this work are as follows:

(1) Given the complex morphological structure of Mongolian, we proposed a method that recognizes Mongolian named entities with rich features using suffix segmentation. The feature set includes context features, morphological features, semantic features, and syllable features. In contrast to English, a Mongolian word is usually formed by attaching several suffixes to a stem. We therefore segmented each suffix as a separate unit to train a Conditional Random Field (CRF) classifier. The experimental results show that segmenting each suffix into an individual token achieves better results than deleting suffixes or using suffixes as features. The system based on suffix segmentation with the optimal feature combination establishes a benchmark result on this corpus.

(2) To reduce the dependence on feature engineering, we presented a new Mongolian named entity recognition approach based on a recurrent neural network. The network takes as input morpheme representations learned from a large-scale unannotated corpus, and a CRF layer on top of it jointly decodes the best label sequence. This method can learn the semantic relationships between morphemes and the dependencies between labels. The experimental results show that feeding morpheme representations, rather than word vectors, into the neural network improves the performance of Mongolian named entity recognition. In addition, the joint decoding layer learns the relationships between tags, which improves the performance of the whole system.

(3) We improved the recurrent neural network model by incorporating knowledge from Mongolian characters and a morpheme language model. The character representation captures semantic knowledge within a morpheme, and the auxiliary language-model loss captures the context of each morpheme. Experimental results show that adding the character embeddings and the language-model loss improves system performance.

(4) Cyrillic Mongolian is the native language of Mongolia; it has the same grammar as classical Mongolian and a similar pronunciation. To further improve the classical Mongolian named entity recognition system, it is useful to transfer knowledge from other languages, especially the closely related Cyrillic Mongolian. We therefore transferred the knowledge acquired by a Cyrillic Mongolian named entity recognition system, either through shared neural network parameters or through language knowledge. The experimental results show that the additional knowledge benefits the classifier.

In conclusion, our work makes Mongolian named entity recognition practical and lays a solid foundation for other Mongolian information processing tasks. This work should also benefit the development of artificial intelligence and big data in the ethnic minority regions of China. More importantly, it may inspire research on other agglutinative languages.
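The suffix segmentation behind contribution (1) can be sketched as follows. This is a minimal illustration, not the thesis's preprocessing: it assumes suffixes are separated from stems by U+202F NARROW NO-BREAK SPACE, as is conventional for many case suffixes in Unicode traditional Mongolian, and it ignores suffixes written directly attached to the stem. The stem keeps the word's BIO label, while each suffix token receives a configurable label (`O` by default, an assumption). The hypothetical `token_features` helper shows only simple context and character-affix features of the kind a CRF toolkit consumes; the semantic and syllable features are not reproduced. Latin transliterations stand in for Mongolian script.

```python
# Hypothetical sketch: split suffixes off Mongolian words and build CRF features.
NNBSP = "\u202f"  # NARROW NO-BREAK SPACE, assumed to separate stem and suffixes


def segment_suffixes(words, labels, suffix_label="O"):
    """Turn each word into a stem token plus one token per suffix.

    The stem keeps the word's original BIO label; every suffix token gets
    `suffix_label` (an assumed convention, not taken from the thesis).
    """
    tokens, token_labels = [], []
    for word, label in zip(words, labels):
        stem, *suffixes = word.split(NNBSP)
        tokens.append(stem)
        token_labels.append(label)
        for suffix in suffixes:
            tokens.append(suffix)
            token_labels.append(suffix_label)
    return tokens, token_labels


def token_features(tokens, i):
    """Context and character-affix features for token i (illustrative only)."""
    token = tokens[i]
    return {
        "token": token,
        "prefix2": token[:2],
        "suffix2": token[-2:],
        "prev": tokens[i - 1] if i > 0 else "<BOS>",
        "next": tokens[i + 1] if i + 1 < len(tokens) else "<EOS>",
    }


# Latin transliteration used as a stand-in for Mongolian script.
words = ["ulaanbaatar" + NNBSP + "aas", "irsen"]
labels = ["B-LOC", "O"]
tokens, token_labels = segment_suffixes(words, labels)
print(tokens)        # ['ulaanbaatar', 'aas', 'irsen']
print(token_labels)  # ['B-LOC', 'O', 'O']
features = [token_features(tokens, i) for i in range(len(tokens))]
```

Tagging suffixes `O` keeps case endings out of entity spans, but when a suffixed word sits inside a multi-word entity it interrupts the B/I chain; tagging such suffixes with the entity's `I-` label is one way to avoid that.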
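The joint decoding performed by the CRF layer in contribution (2) is usually done with the Viterbi algorithm. The sketch below is a generic linear-chain CRF decoder in plain Python, not the thesis's code: it assumes the recurrent network has already produced a score for every tag at every position (`emissions`) and that the CRF has learned tag-to-tag `transitions` plus `start` and `end` scores. All names and numbers are illustrative.

```python
def viterbi_decode(emissions, transitions, start, end):
    """Return (best_score, best_tag_indices) for a linear-chain CRF.

    emissions[t][j]   score of tag j at position t (from the RNN)
    transitions[i][j] score of tag i being followed by tag j
    start[j], end[j]  scores for a sequence starting / ending with tag j
    """
    n_tags = len(start)
    # Best score of any path ending in each tag at position 0.
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag i to precede tag j at position t.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    final = [score[j] + end[j] for j in range(n_tags)]
    best_last = max(range(n_tags), key=lambda j: final[j])
    # Follow back-pointers from the last position to the first.
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return final[best_last], path


TAGS = ["O", "B-LOC", "I-LOC"]
NEG = -10.0  # large penalty for transitions that violate the BIO scheme
transitions = [
    [0.0, 0.0, NEG],  # from O: O -> I-LOC is invalid
    [0.0, 0.0, 0.0],  # from B-LOC
    [0.0, 0.0, 0.0],  # from I-LOC
]
start = [0.0, 0.0, NEG]  # a sentence cannot start with I-LOC
end = [0.0, 0.0, 0.0]
emissions = [
    [1.0, 0.5, 0.0],  # position 0: O scores slightly higher than B-LOC
    [0.0, 0.0, 2.0],  # position 1: I-LOC scores highest
]
best_score, path = viterbi_decode(emissions, transitions, start, end)
print(best_score, [TAGS[k] for k in path])  # 2.5 ['B-LOC', 'I-LOC']
```

Choosing the best tag at each position independently would give `O` followed by `I-LOC`, an invalid sequence. The penalized `O` to `I-LOC` transition lets Viterbi prefer `B-LOC I-LOC`, which is how a decoding layer exploits dependencies between labels.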
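Contribution (3) adds a language-model auxiliary loss to the tagger. A common way to do this, assumed here rather than confirmed by the abstract, is to add a weighted language-model loss to the CRF negative log-likelihood. The plain-Python sketch below computes the CRF loss with the forward algorithm, a next-morpheme cross-entropy, and their weighted sum. The weight `lam` and all function names are hypothetical, and a real system would compute these quantities with an autodiff framework over the network's outputs.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def crf_nll(emissions, transitions, start, end, tags):
    """Negative log-likelihood of the gold tag sequence under a linear-chain CRF."""
    n_tags = len(start)
    # Unnormalized score of the gold path.
    gold = start[tags[0]] + emissions[0][tags[0]]
    for t in range(1, len(tags)):
        gold += transitions[tags[t - 1]][tags[t]] + emissions[t][tags[t]]
    gold += end[tags[-1]]
    # Forward algorithm: alpha[j] = log-sum of scores of all paths ending in tag j.
    alpha = [start[j] + emissions[0][j] for j in range(n_tags)]
    for t in range(1, len(emissions)):
        alpha = [
            logsumexp([alpha[i] + transitions[i][j] for i in range(n_tags)]) + emissions[t][j]
            for j in range(n_tags)
        ]
    log_partition = logsumexp([alpha[j] + end[j] for j in range(n_tags)])
    return log_partition - gold


def lm_loss(log_probs, next_ids):
    """Mean cross-entropy of predicting each next morpheme id.

    log_probs[t] holds log-probabilities over the morpheme vocabulary at position t.
    """
    return -sum(lp[k] for lp, k in zip(log_probs, next_ids)) / len(next_ids)


def joint_loss(crf_loss, aux_lm_loss, lam=0.1):
    """Tagging loss plus a weighted auxiliary language-model loss (lam is hypothetical)."""
    return crf_loss + lam * aux_lm_loss


# With all-zero scores, each of the 2 * 2 tag paths is equally likely,
# so the gold path has probability 1/4 and the CRF loss is log(4).
zeros = [[0.0, 0.0], [0.0, 0.0]]
tagging = crf_nll(zeros, zeros, [0.0, 0.0], [0.0, 0.0], tags=[0, 1])
language = lm_loss([[math.log(0.5), math.log(0.5)]], [0])  # log(2)
print(joint_loss(tagging, language, lam=0.5))  # about log(4) + 0.5 * log(2)
```

Only the tagging loss is needed at prediction time; the language-model term acts purely as a training signal, which is why its weight is usually kept small.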

Keywords/Search Tags:

Information Processing for Mongolian, Named Entity Recognition, Representation Learning, Recurrent Neural Network

Related items

1	Domain Adaptation Research And Application Of Named Entity Recognition
2	Research On Chinese Named Entity Recognition Based On Deep Learning
3	Research On Named Entity Recognition For Chinese Weibo Text
4	Research On Chinese Named Entity Recognition Based On Deep Learning
5	Named Entity Recognition In The Food Of Online Healthy Texts Based On Deep Learning
6	Domain Named Entity Recognition Method Based On Recurrent Neural Network
7	Research And Application On Named Entity Recognition Based On LSTM
8	Research Of Entity Named Recognition Based On Neural Network
9	Research On Named Entity Recognition Of Chinese Image Reports Based On Recurrent Neural Networks
10	Research On Named Entity Recognition Based On Neural Network Ensemble