Research On Chinese Name Entity Recognition Algorithm

Posted on: 2018-09-16   Degree: Master   Type: Thesis
Country: China   Candidate: Z N Xie
GTID: 2348330515459744   Subject: Computer Science and Technology
Abstract/Summary:
Named Entity Recognition (NER) is the task of recognizing specified types of entities, mainly names of persons, locations, and organizations. It is an important technique for converting unstructured data into structured data and a key step toward enabling computers to understand text. It is also a basic task underlying many NLP applications, such as information extraction, sentiment analysis, and question answering systems, so the study of NER is of great importance. However, owing to characteristics of the Chinese language itself, Chinese NER faces several difficulties. (1) Chinese NER techniques commonly rely on a single model, and each such model has its own characteristics and limitations. (2) Chinese NER is commonly based on word sequences, which requires Chinese Word Segmentation (CWS); as a result, the performance of Chinese NER usually depends on the accuracy of CWS.
The content and work of this paper mainly include the following parts.
(1) Related work on NER is reviewed. We summarize and implement the main NER methods and analyze and compare their advantages and disadvantages, which provides ideas for the subsequent work of this paper.
(2) To overcome the limitations of single models, this paper combines several models and uses multi-task learning. The resulting method, BiLSTM-CRF-MTL, effectively addresses the weaknesses of single models and uses several related tasks to learn features without laborious feature engineering.
(3) To address the problems of word-sequence-based methods, this paper adopts a character-sequence-based method to recognize named entities, introducing word vectors based on an external corpus and on new word identification. In addition, the confidence of CWS, which is based on keyword extraction, is used as a feature to reduce the noise caused by inaccurate CWS.
(4) To learn context features and alleviate the shortage of labeled data, we propose a method that generates new samples by replacing entities.
This paper uses the 1998 People's Daily corpus for Chinese NER evaluation. We compare multiple single-model methods and methods from related papers. In the experiments, our method achieves an average F1 of 88.79%, substantially exceeding the other methods.
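In a BiLSTM-CRF tagger such as the BiLSTM-CRF-MTL model in point (2), the BiLSTM typically produces a score for every tag at every character, and a linear-chain CRF layer picks the best overall tag sequence. The abstract does not describe the decoding step, so the following is a generic sketch of standard Viterbi decoding, not the thesis's implementation. The emission and transition scores are made-up numbers rather than outputs of a trained model, and the tag set O/B-PER/I-PER is an assumption chosen for illustration.

```python
from typing import List, Tuple


def viterbi_decode(
    emissions: List[List[float]],
    transitions: List[List[float]],
) -> Tuple[List[int], float]:
    """Best tag path for a linear-chain CRF (generic sketch).

    emissions[t][j]   -- score of tag j at position t (e.g. from a BiLSTM)
    transitions[i][j] -- score of moving from tag i to tag j
    """
    n_tags = len(transitions)
    # score[j]: best score of any partial path ending in tag j
    score = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score: List[float] = []
        ptrs: List[int] = []
        for j in range(n_tags):
            # best previous tag i for current tag j
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptrs.append(best_i)
        score = new_score
        backpointers.append(ptrs)
    # trace back from the best final tag
    best_last = max(range(n_tags), key=lambda j: score[j])
    path = [best_last]
    for ptrs in reversed(backpointers):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path, score[best_last]


# Toy example with hypothetical scores for the sentence 张三说 ("Zhang San said").
TAGS = ["O", "B-PER", "I-PER"]
transitions = [
    [0.0, 0.0, -1e4],  # from O: O -> I-PER effectively forbidden
    [0.0, 0.0, 1.0],   # from B-PER
    [0.0, 0.0, 0.5],   # from I-PER
]
emissions = [
    [0.1, 2.0, 1.5],  # 张
    [0.5, 0.2, 1.0],  # 三
    [2.0, 0.1, 0.3],  # 说
]
path, best = viterbi_decode(emissions, transitions)
print([TAGS[k] for k in path], best)  # ['B-PER', 'I-PER', 'O'] 6.0
```

The large negative transition score is how a CRF can rule out invalid label sequences such as O followed by I-PER, which a per-character classifier alone cannot enforce.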
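Point (3) tags characters rather than words while still using segmentation information as features. The abstract does not say how word vectors and CWS confidence are attached to characters, so the sketch below shows one plausible scheme, stated as an assumption: each character receives its position tag (B/I/E/S) inside its segmented word, the containing word (which a model could use to look up a pre-trained word vector), and a confidence value for that word's segmentation. The function name `char_features` and the confidence numbers are hypothetical; the thesis derives confidence from keyword extraction, which is not modeled here.

```python
from typing import Dict, List


def char_features(words: List[str], confidences: List[float]) -> List[Dict[str, object]]:
    """Expand word-level CWS output into one feature dict per character.

    Hypothetical sketch: `confidences` holds one CWS confidence per word;
    how those values are computed is outside the scope of this example.
    """
    if len(words) != len(confidences):
        raise ValueError("need one confidence score per word")
    feats: List[Dict[str, object]] = []
    for word, conf in zip(words, confidences):
        for k, ch in enumerate(word):
            if len(word) == 1:
                pos = "S"  # single-character word
            elif k == 0:
                pos = "B"  # first character of a multi-character word
            elif k == len(word) - 1:
                pos = "E"  # last character
            else:
                pos = "I"  # inside a word of three or more characters
            feats.append({"char": ch, "seg": pos, "word": word, "cws_conf": conf})
    return feats


# Segmenter output for 北京大学的学生 ("students of Peking University") in which the
# organization name has been split into 北京 / 大学; confidences are made up.
words = ["北京", "大学", "的", "学生"]
confidences = [0.6, 0.6, 0.95, 0.9]
feats = char_features(words, confidences)
for f in feats:
    print(f["char"], f["seg"], f["word"], f["cws_conf"])
```

Because the NER labels are assigned per character, a wrong segmentation only shifts the feature values; a low confidence score gives the model a way to discount such boundaries instead of being forced to follow them.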
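Point (4) creates new training samples by replacing entities. The abstract gives no further detail, so the following is a minimal sketch under the assumption that each entity in a BIO-tagged character sequence is swapped for a randomly chosen entity of the same type from a lexicon, with tags rewritten to match the new length. The helpers `bio_spans` and `replace_entities` and the lexicon contents are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def bio_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Return (start, end, type) entity spans from BIO tags; `end` is exclusive."""
    spans: List[Tuple[int, int, str]] = []
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        if tag.startswith("I-") and start is not None and tag[2:] == etype:
            continue  # current entity continues
        if start is not None:
            spans.append((start, i, etype))
            start, etype = None, None
        if tag.startswith(("B-", "I-")):  # a stray I- also opens a new entity
            start, etype = i, tag[2:]
    return spans


def replace_entities(
    chars: List[str],
    tags: List[str],
    lexicon: Dict[str, List[str]],  # entity type -> non-empty replacement strings
    rng: random.Random,
) -> Tuple[List[str], List[str]]:
    """Build a new labeled sample by swapping each entity for a same-type entity."""
    new_chars: List[str] = []
    new_tags: List[str] = []
    prev = 0
    for start, end, etype in bio_spans(tags):
        # copy the context before this entity unchanged
        new_chars.extend(chars[prev:start])
        new_tags.extend(tags[prev:start])
        candidates = lexicon.get(etype)
        repl = rng.choice(candidates) if candidates else "".join(chars[start:end])
        new_chars.extend(repl)  # one list element per character
        new_tags.extend(["B-" + etype] + ["I-" + etype] * (len(repl) - 1))
        prev = end
    new_chars.extend(chars[prev:])
    new_tags.extend(tags[prev:])
    return new_chars, new_tags


# 张三在北京工作 ("Zhang San works in Beijing"); lexicon entries are made up.
chars = list("张三在北京工作")
tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O", "O"]
lexicon = {"PER": ["李四", "王小明"], "LOC": ["上海"]}
new_chars, new_tags = replace_entities(chars, tags, lexicon, random.Random(0))
print("".join(new_chars), new_tags)
```

The context characters around each entity are kept, so every generated sample shows the model the same context with a different entity string, which matches the stated aim of learning context features from limited labeled data.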
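The reported result is an F1 score. NER output is commonly scored at the entity level, where a predicted entity counts as correct only if both its boundaries and its type match the gold annotation. A minimal sketch of that computation follows. The abstract does not state how its average F1 is computed (for example, over entity types or over runs), so this pooled (micro-averaged) version is an assumption, and `bio_spans` and `entity_prf` are hypothetical names.

```python
from typing import List, Tuple


def bio_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Return (start, end, type) entity spans from BIO tags; `end` is exclusive."""
    spans: List[Tuple[int, int, str]] = []
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        if tag.startswith("I-") and start is not None and tag[2:] == etype:
            continue
        if start is not None:
            spans.append((start, i, etype))
            start, etype = None, None
        if tag.startswith(("B-", "I-")):
            start, etype = i, tag[2:]
    return spans


def entity_prf(
    gold_seqs: List[List[str]], pred_seqs: List[List[str]]
) -> Tuple[float, float, float]:
    """Exact-match entity precision, recall and F1, pooled over all sentences."""
    n_gold = n_pred = n_correct = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = set(bio_spans(gold)), set(bio_spans(pred))
        n_gold += len(g)
        n_pred += len(p)
        n_correct += len(g & p)  # same start, end and type
    precision = n_correct / n_pred if n_pred else 0.0
    recall = n_correct / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# The prediction finds the person but truncates the location, so only 1 of 2 matches.
gold = [["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O", "O"]]
pred = [["B-PER", "I-PER", "O", "B-LOC", "O", "O", "O"]]
p, r, f = entity_prf(gold, pred)
print(p, r, f)  # 0.5 0.5 0.5
```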
Keywords/Search Tags: Chinese Named Entity Recognition, hybrid model, multi-task learning, word vector