Study On Uyghur Named Entity Recognition And Related Problems

Posted on:2019-01-19

Degree:Doctor

Type:Dissertation

Country:China

Candidate:H M T M M T Mai

GTID:1528305651965769

Subject:Computer application technology

Abstract/Summary:

Named entity recognition (NER) is a classic problem in natural language processing (NLP). It identifies specific entities in text, such as person names, place names, organization names and other proper nouns. Because of the distinctive lexical and linguistic features of Uyghur, techniques developed for English and Chinese NER cannot be applied to Uyghur directly. At present there is no publicly available Uyghur corpus tagged with named entities, so this dissertation constructs one by manual annotation. Building on an in-depth analysis of the grammatical and semantic features of Uyghur named entities, and given the strong performance of conditional random fields (CRFs) on sequence labelling tasks, we first use a CRF model to study Uyghur NER. In designing the feature templates, we use words, syllables, part-of-speech (POS) tags and distributed vector representations, and analyze how each of them influences NER. Second, we use deep learning to study Uyghur NER further, using character embeddings and syllable embeddings to improve system performance. Finally, we use the NER results to propose a word-vector-based method for automatically extracting cross-language named entity translation pairs. The main contributions are as follows:

1. Construction of a Uyghur named entity tagged corpus. Using existing bilingual resources and Chinese NER results, we construct a Uyghur Named Entity Corpus (UNEC), comprising a person-name tagged corpus, a location-name tagged corpus, an organization-name tagged corpus, and an integrated corpus covering person, place and organization names. This work fills the current gap in named-entity-tagged Uyghur corpora and provides open data resources for Uyghur NLP research.

2. Uyghur part-of-speech (POS) tagging. We use a bidirectional long short-term memory network with a CRF layer (BI-LSTM-CRF) for Uyghur POS tagging, and propose a method that combines character embeddings, word embeddings, syllable features and suffix features to further improve tagging performance. The resulting POS tagging system is fast and effective, and in comparative experiments it outperformed all known methods.

3. A Uyghur NER method based on CRFs and unsupervised feature extraction. We propose methods for extracting syllable features and similar-word features, which improve the performance of Uyghur NER. The proposed syllable feature can almost replace stem and affix features. The similar-word feature, extracted from large-scale unlabeled corpora to capture the semantic and syntactic information of words, achieves nearly the same recognition performance as lexical features and, in some recognition tasks, even outperforms morphological and dictionary features. The proposed feature extraction method greatly reduces the cost of hand-engineered features and improves the performance of Uyghur NER.

4. Syllable embeddings for neural Uyghur NER. Because Uyghur contains many transliterated named entities and its syllable structure is relatively distinctive, we propose a syllable embedding for the BI-LSTM-CRF model and carry out a comprehensive study of neural-network-based Uyghur NER, verifying the effectiveness of syllable-based word representations. We further study how different word representations affect Uyghur NER in the deep learning setting, and mitigate the problems of data sparseness, tagging of unknown words, and manual feature construction.

5. Cross-language named entity translation pair extraction based on bilingual word vectors and NER. Building on the Uyghur NER results, we propose a word-vector-based method for extracting multilingual named entity equivalent pairs. After running NER separately on each side of bilingually aligned sentences, we merge the paired sentences to train bilingual word vectors, and then extract equivalent entity translation pairs using different strategies.

6. A Uyghur NLP web service platform. Based on the results of this dissertation, we built a web service platform for Uyghur natural language processing. Its main services include Uyghur POS tagging (with a selectable tagging depth of 15, 25 or 64 tags), named entity recognition, tokenization, syllabification and sentence boundary detection.
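The abstract does not spell out the CRF feature templates, so the following is only a minimal sketch, assuming per-token feature dictionaries of the kind CRFsuite-based taggers commonly consume. It combines the word, syllable and POS features named above over a ±1 window, plus an optional word-vector cluster ID standing in for the "distributed vector representation" feature. The function names, feature keys, tag labels ("N", "V") and the cluster mapping are all hypothetical.

```python
from typing import Dict, List, Optional


def token_features(
    i: int,
    words: List[str],
    pos_tags: List[str],
    syllables: List[List[str]],
    clusters: Optional[Dict[str, int]] = None,
) -> Dict[str, str]:
    """Hypothetical CRFsuite-style feature dict for token i over a +/-1 window."""
    feats: Dict[str, str] = {"bias": "1"}
    for offset in (-1, 0, 1):
        j = i + offset
        prefix = f"{offset:+d}:"  # "-1:", "+0:", "+1:"
        if j < 0 or j >= len(words):
            # Sentence-boundary marker instead of word features.
            feats[prefix + "pad"] = "BOS" if j < 0 else "EOS"
            continue
        word = words[j].lower()
        feats[prefix + "word"] = word
        feats[prefix + "pos"] = pos_tags[j]
        # First and last syllables; the last one may carry a case suffix
        # (e.g. dative -ge), the first may help with transliterated names.
        feats[prefix + "syl_first"] = syllables[j][0]
        feats[prefix + "syl_last"] = syllables[j][-1]
        if clusters is not None:
            # Assumed word-vector cluster ID; -1 marks words without a vector.
            feats[prefix + "cluster"] = str(clusters.get(word, -1))
    return feats


def sentence_features(
    words: List[str],
    pos_tags: List[str],
    syllables: List[List[str]],
    clusters: Optional[Dict[str, int]] = None,
) -> List[Dict[str, str]]:
    return [token_features(i, words, pos_tags, syllables, clusters) for i in range(len(words))]


words = ["Abdulla", "Ürümchige", "bardi"]  # "Abdulla went to Ürümchi"
pos_tags = ["N", "N", "V"]                 # hypothetical tag labels
syllables = [["ab", "dul", "la"], ["ü", "rüm", "chi", "ge"], ["bar", "di"]]
X = sentence_features(words, pos_tags, syllables, clusters={"bardi": 7})
print(X[1]["+0:syl_last"], X[1]["+1:cluster"])  # ge 7
```

The resulting list of dicts, paired with a BIO label sequence per sentence, is the usual input shape for a linear-chain CRF trainer. Here the syllables are supplied by hand.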
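Syllables serve both as a CRF feature and as a platform service (syllabification). The abstract does not give the segmentation rules, so this is only a simplified rule-based sketch for Uyghur Latin script (ULY). It assumes lowercase ULY input without apostrophes, treats the digraphs ch, gh, ng, sh, zh as single consonants, and gives every syllable one vowel with at most one onset consonant. The exception is a word-initial cluster, as in loanwords, which stays with the first syllable. Because the ULY apostrophe convention is ignored, an n'g sequence would be misread as the digraph ng.

```python
from typing import List

# ULY vowels; in this sketch every syllable has exactly one vowel.
VOWELS = set("aeéioöuü")
# ULY consonant digraphs, each treated as a single consonant.
DIGRAPHS = ("ch", "gh", "ng", "sh", "zh")


def segments(word: str) -> List[str]:
    """Split a lowercase ULY word into letters, keeping digraphs together."""
    segs: List[str] = []
    i = 0
    while i < len(word):
        if word[i:i + 2] in DIGRAPHS:
            segs.append(word[i:i + 2])
            i += 2
        else:
            segs.append(word[i])
            i += 1
    return segs


def syllabify(word: str) -> List[str]:
    """Simplified syllabification: one vowel per syllable, at most one onset
    consonant, except a word-initial cluster that stays with the first syllable."""
    segs = segments(word.lower())
    nuclei = [k for k, s in enumerate(segs) if s in VOWELS]
    if not nuclei:
        return [word.lower()]  # no vowel: keep the whole word as one unit
    starts = [0]
    for prev, nxt in zip(nuclei, nuclei[1:]):
        # If consonants sit between two vowels, the last one opens the next
        # syllable; adjacent vowels are split directly between them.
        starts.append(nxt - 1 if nxt - prev > 1 else nxt)
    ends = starts[1:] + [len(segs)]
    return ["".join(segs[a:b]) for a, b in zip(starts, ends)]


print(syllabify("Ürümchige"))  # ['ü', 'rüm', 'chi', 'ge']
print(syllabify("mektep"))     # ['mek', 'tep']
```

A rule-based splitter like this needs no labeled data, which fits the abstract's emphasis on features that avoid manual engineering. Real text would also need handling for apostrophes and Arabic-script input.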
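One way a syllable-based word representation can ease the unknown-word problem is to concatenate a word embedding with a vector composed from the word's syllable embeddings. An inflected form missing from the word vocabulary then still gets an informative vector. The sketch below is a minimal illustration under stated assumptions: mean pooling stands in for whatever composition the dissertation actually used (a BiLSTM or CNN over syllables would be typical alternatives), out-of-vocabulary words get a zero word vector, and the toy embeddings are made up.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean_pool(vectors: Sequence[Vector], dim: int) -> Vector:
    """Element-wise mean; zeros if there is nothing to pool."""
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def word_representation(
    word: str,
    syllables: Sequence[str],
    word_emb: Dict[str, Vector],
    syl_emb: Dict[str, Vector],
    word_dim: int,
    syl_dim: int,
) -> Vector:
    """Concatenate [word embedding ; mean of syllable embeddings] (assumed scheme)."""
    # Word-level part: zeros for out-of-vocabulary words (a simplifying assumption;
    # many systems use a trained <UNK> vector instead).
    w = word_emb.get(word, [0.0] * word_dim)
    # Syllable-level part: unknown syllables are skipped.
    known = [syl_emb[s] for s in syllables if s in syl_emb]
    return list(w) + mean_pool(known, syl_dim)


word_emb = {"ürümchi": [1.0, 0.0]}  # toy vectors
syl_emb = {"ü": [1.0, 0.0], "rüm": [0.0, 1.0], "chi": [1.0, 1.0], "ge": [0.0, 0.0]}
# "ürümchige" (dative form) is absent from word_emb, but its syllables are known.
print(word_representation("ürümchige", ["ü", "rüm", "chi", "ge"],
                          word_emb, syl_emb, 2, 2))  # [0.0, 0.0, 0.5, 0.5]
```

In a BI-LSTM-CRF tagger, vectors like these would be the per-token inputs to the BiLSTM layer.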
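The abstract says entity translation pairs are extracted from bilingual word vectors "using different strategies" without naming them. The following sketch shows one plausible strategy, offered as an assumption rather than the dissertation's method. Within one aligned sentence pair, each Uyghur entity is matched to the Chinese entity of the same type whose vector has the highest cosine similarity, and the match is kept only above a threshold. It also assumes each multiword entity was joined into a single token before vector training, so one vector per entity exists. The merge step, the threshold value and the toy vectors are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Entity = Tuple[str, str]  # (surface string, type such as "PER", "LOC", "ORG")


def merge_pair(uy_tokens: Sequence[str], zh_tokens: Sequence[str]) -> List[str]:
    """One simple merging choice: concatenate an aligned pair into one training line."""
    return list(uy_tokens) + list(zh_tokens)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def extract_pairs(
    uy_entities: Sequence[Entity],
    zh_entities: Sequence[Entity],
    vectors: Dict[str, Sequence[float]],
    threshold: float = 0.5,
) -> List[Tuple[str, str, float]]:
    """Type-constrained best-cosine matching within one aligned sentence pair."""
    pairs: List[Tuple[str, str, float]] = []
    for uy, uy_type in uy_entities:
        if uy not in vectors:
            continue
        candidates = [(zh, cosine(vectors[uy], vectors[zh]))
                      for zh, zh_type in zh_entities
                      if zh_type == uy_type and zh in vectors]
        if not candidates:
            continue
        best, score = max(candidates, key=lambda c: c[1])
        if score >= threshold:
            pairs.append((uy, best, score))
    return pairs


vectors = {  # toy bilingual vectors
    "abdulla": [1.0, 0.1], "阿不都拉": [0.9, 0.2],
    "ürümchi": [0.1, 1.0], "乌鲁木齐": [0.2, 0.9],
}
uy = [("abdulla", "PER"), ("ürümchi", "LOC")]
zh = [("阿不都拉", "PER"), ("乌鲁木齐", "LOC")]
print([(u, z) for u, z, _ in extract_pairs(uy, zh, vectors)])
# [('abdulla', '阿不都拉'), ('ürümchi', '乌鲁木齐')]
```

Restricting candidates to the same sentence pair and the same entity type keeps the search small. Other strategies, such as requiring mutual best matches, trade recall for precision.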

Keywords/Search Tags:

Uyghur Language, Named Entity Recognition, Neural Network, Extraction of Named Entity Translation Equivalents, POS Tagging, Unsupervised Feature Extraction, Syllable-Embedding, Character-Embedding

Related items

1	Joint Extraction Of Named Entity Recognition And Entity Relationship Based On Neural Network
2	Research On Named Entity Extraction Method For Symptom Phenotype
3	Research On Extraction Of Named Entity Translation Equivalents From Comparable Corpus
4	The Research And Implementation Of Named Entity Recognition For Chinese Social Media
5	Research On Named Entity Recognition Relation Extraction And Recommendation Algorithm In Chinese Tourism
6	Research On A Resume-oriented Chinese-Uyghur Machine Translation System
7	The Methods And Researches Into Construct Chinese-Japanese Named Entity Translation Equivalents
8	The Field Of Music, A Combination Of Rules And Statistical Named Entity Recognition
9	The Research Of Chinese Named Entity Recognition And Its Relation Extraction
10	Research On Part Of Speech Tagging System Of Pre-Qin Classics Oriented To Entity Extraction