
Domain Named Entity Recognition Method Based On Recurrent Neural Network

Posted on: 2019-05-29
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Sun
Full Text: PDF
GTID: 2428330596966398
Subject: Software engineering
Abstract/Summary:
In the information age, people tend to search for information and get answers from the internet. A traditional search engine typically returns relevant documents, rather than exact answers, in response to the questions people enter. Question answering systems based on knowledge graphs have therefore been under active development. As a fundamental and vital step in knowledge graph construction, named entity recognition (NER) has been widely researched. However, the features and entity types of the same text may vary across domains, which makes domain NER harder. In recent years, neural networks have shown great potential in many fields, and the recurrent neural network (RNN) has become a popular and effective method for NER. An RNN needs plenty of labeled data, however, and the lack of labeled data is a difficult problem. Building on current NER methods, this thesis addresses the setting in which very little tagged data is available and proposes an improved RNN model named enhanced RNN (ERNN). An instance transfer method and a co-training strategy are also adopted. Focusing on the situation where labeled data is severely lacking, the main work of this thesis includes:

(1) This thesis studies and analyzes the NER performance of statistical probability models and the RNN model. The experiments are conducted on the ATIS dataset published by DARPA and on a Chinese literature dataset published in a prior paper, and different data sizes are used for supervised learning. The results show that, for NER, the RNN model is much better than the statistical probability models, with the highest improvement reaching 39.72%. The experiments also show that good performance can be achieved with a small dataset.

(2) This thesis applies a co-training strategy to the RNN and statistical probability models and modifies the RNN's activation function. Performance is compared before and after modifying the function. The experimental results indicate that the new activation function improves recognition performance to some degree. In addition, when the training data is much smaller than the test set, good results can be obtained by leveraging the unlabeled data.

(3) Aiming at the domain NER task, this thesis improves the structure of the neural network described in (2). By adding an additional layer and adopting instance transfer, the source domain data is exploited in two ways. In addition, this thesis proposes two transfer strategies. In the experiments, the People's Daily and Sogou news corpora are used as source domain data, while high school news test data is used as target domain data. This thesis compares the experimental results across different source domains and before and after modifying the RNN model, and also compares training with and without the help of source domain data. The results show that the enhanced RNN model proposed by this thesis achieves a 2.06% relative improvement in F1 score (from 0.9212 to 0.9402).
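The co-training strategy mentioned in (2) can be illustrated with a minimal sketch. It is not the thesis's implementation: the two toy count-based taggers stand in for the RNN and the statistical probability model, and the names `CountTagger` and `co_train`, the confidence threshold, and the example sentences are all hypothetical. The sketch only shows the general idea that each model labels unlabeled examples it is confident about and hands them to the other model as extra training data.

```python
from collections import Counter, defaultdict


class CountTagger:
    """Toy stand-in for a real tagger: predicts the most frequent
    label seen for one feature (its 'view') of the input."""

    def __init__(self, feature_fn):
        self.feature_fn = feature_fn
        self.counts = defaultdict(Counter)

    def fit(self, examples):
        # Retrain from scratch on (input, label) pairs.
        self.counts = defaultdict(Counter)
        for x, y in examples:
            self.counts[self.feature_fn(x)][y] += 1
        return self

    def predict(self, x):
        # Returns (label, confidence); (None, 0.0) for unseen features.
        seen = self.counts.get(self.feature_fn(x))
        if not seen:
            return None, 0.0
        label, n = seen.most_common(1)[0]
        return label, n / sum(seen.values())


def co_train(model_a, model_b, labeled, unlabeled, rounds=3, threshold=0.9):
    """Hypothetical co-training loop: confident predictions from one
    model become training examples for the other."""
    train_a, train_b = list(labeled), list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        model_a.fit(train_a)
        model_b.fit(train_b)
        remaining = []
        for x in pool:
            label_a, conf_a = model_a.predict(x)
            label_b, conf_b = model_b.predict(x)
            if conf_a >= threshold and conf_a >= conf_b:
                train_b.append((x, label_a))  # A teaches B
            elif conf_b >= threshold:
                train_a.append((x, label_b))  # B teaches A
            else:
                remaining.append(x)  # neither model is confident yet
        if len(remaining) == len(pool):
            break  # nothing new was labeled in this round
        pool = remaining
    model_a.fit(train_a)
    model_b.fit(train_b)
    return model_a, model_b


# Each input is (previous_word, word). View A sees the word itself,
# view B sees only the preceding word.
labeled = [(("to", "Boston"), "CITY"),
           (("from", "Denver"), "CITY"),
           (("a", "flight"), "O")]
unlabeled = [("to", "Dallas"), ("from", "Dallas"),
             ("a", "ticket"), ("near", "Boston")]

model_a, model_b = co_train(CountTagger(lambda x: x[1]),
                            CountTagger(lambda x: x[0]),
                            labeled, unlabeled)
```

In this toy setup, `model_a` never sees "Dallas" in the labeled data and only learns it from examples that `model_b` labeled from context, while `model_b` learns the context word "near" from `model_a`. A real system would replace the toy taggers with sequence models and would likely use a more careful selection rule than a fixed threshold.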
Keywords/Search Tags:knowledge graph, recurrent neural network, domain named entity recognition, co-training, instance transfer