Multi-task Learning For Chinese Named Entity Recognition

Posted on: 2019-04-26
Degree: Master
Type: Thesis
Country: China
Candidate: Q Zhang
Full Text: PDF
GTID: 2428330611493303
Subject: Computer Science and Technology
Abstract/Summary:
Named entity recognition has long been a fundamental and challenging task in natural language processing. Named entity recognition and mention detection is the task of identifying entities (named and/or nominal) in raw text and classifying each detected entity into one of a set of predefined categories such as person, organization, and location. It is the foundation of many advanced natural language processing tasks, such as relation extraction, question answering, automatic summarization, information retrieval, and knowledge base construction, and is therefore of considerable practical significance. The main research work of this thesis is as follows:

1) Chinese strokes for character embedding: This thesis builds on the word2vec model and improves Chinese character embeddings by mining the semantic and morphological features within words. Inspired by English subword models, we decompose Chinese characters into finer-grained stroke sequences. N-grams and an LSTM are used to capture the internal structural features of Chinese characters and to improve the model's ability to represent ideograms. The model is evaluated on Chinese Wikipedia and Chinese electronic medical record datasets. The results show that the character embeddings trained by the model outperform those of mainstream models such as word2vec, GloVe, and CWE.

2) Multi-task learning for Chinese named entity recognition: Research on named entity recognition based on multi-task learning is not yet mature. We design a hierarchical multi-task learning model that uses Chinese word segmentation as an auxiliary task to improve entity prediction accuracy. We further add a language model objective as an auxiliary task during named entity recognition training: two output layers are added for each input, one predicting the previous character and one predicting the next character. This allows the model to learn more semantic features without additional training samples. In addition, we introduce an attention mechanism into the named entity recognition model that focuses the model's attention on the entity and its surrounding context, so that the model pays more attention to local features of the sequence, further improving its prediction accuracy.

Finally, this thesis designs a unified Chinese entity recognition framework based on the above methods. Experiments on Chinese electronic medical record and Chinese social media datasets verify its effectiveness. The results demonstrate that the unified framework further improves named entity recognition performance. The model achieves a strict F1 of 90.65% on the Chinese electronic medical record dataset (CCKS-NER 2017), 1.70% higher than the baseline model and, to the best of the author's knowledge, the best result on this dataset.
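The auxiliary language model objective described in the abstract (predicting the previous and the next character at each input position) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the padding symbols, the function names `language_model_targets` and `joint_loss`, the weighted-sum form of the loss, and the weight `aux_weight` are all assumptions made for illustration.

```python
from typing import List, Tuple

BOS = "<s>"   # hypothetical symbol meaning "no previous character"
EOS = "</s>"  # hypothetical symbol meaning "no next character"


def language_model_targets(chars: List[str]) -> Tuple[List[str], List[str]]:
    """Return, for each position, the previous and the next character.

    These are the labels the two extra output layers would be trained on.
    """
    if not chars:
        return [], []
    prev_targets = [BOS] + chars[:-1]   # shift right by one position
    next_targets = chars[1:] + [EOS]    # shift left by one position
    return prev_targets, next_targets


def joint_loss(ner_loss: float, prev_loss: float, next_loss: float,
               aux_weight: float = 0.1) -> float:
    """Combine the main NER loss with the two auxiliary losses.

    A weighted sum is one common choice; aux_weight is an assumed
    hyperparameter, not a value reported in the thesis.
    """
    return ner_loss + aux_weight * (prev_loss + next_loss)


# Example sentence split into characters (illustrative input only).
sentence = list("患者头痛")
prev_t, next_t = language_model_targets(sentence)
for ch, p, n in zip(sentence, prev_t, next_t):
    print(ch, "previous:", p, "next:", n)
```

Because the auxiliary labels are derived from the input sequence itself, no extra annotation is needed, which is consistent with the abstract's point that the objective adds training signal without additional training samples.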
Keywords/Search Tags:Natural language processing, Named entity recognition, Multi-task learning, Attention