The Method Of Nested Named Entity Recognition In Microblog

Posted on: 2018-08-06
Degree: Master
Type: Thesis
Country: China
Candidate: W Rao
GTID: 2348330518473165
Subject: Computer software and theory
Abstract/Summary:
With the continuing development of China's Internet infrastructure, the network has gradually penetrated all walks of life and every aspect of people's lives. A variety of Internet applications has brought great convenience, and many users rely on the Internet for social entertainment, shopping, and more. Online social networks such as Twitter and Sina Microblog have become an important medium for communicating and accessing information. People share text, pictures, video, and other content through them, which in turn generates massive amounts of social data. How to extract useful information from this data, and how to use it to promote social development and progress, is particularly important and is a great challenge in the data mining field.

Nested named entity recognition, a major component of the named entity recognition task, is one of the most basic and core technologies in many research areas (such as question answering systems, knowledge graphs, and artificial intelligence), and its results are widely applied in practice. The complexity of Chinese leads to more nested named entities in text. Existing named entity recognition methods identify basic named entities with relatively simple structure well, but they have difficulty identifying nested named entities with complex structure completely and accurately, and they focus on regular text. For nested named entity recognition in microblogs, this thesis analyzes the hierarchical structure of nested named entities, divides the recognition task into two parts, and combines a carefully constructed feature database, an external semantic knowledge base, and feature templates with cascaded conditional random fields (CRFs) to recognize nested named entities. The main research contents are as follows.

1. After analyzing the structure of nested named entities, this thesis gives five types of hierarchical structure features and argues that nested named entity recognition is better performed after simple basic named entity recognition. It therefore presents a nested named entity recognition method based on cascaded conditional random fields, in which the identification task is divided into two stages. First, basic named entities with relatively simple structure, such as person names, location names, and organization names, are recognized by one CRF model. The result is then passed to a second CRF model, where it supports that model's decisions in identifying nested named entities.

2. This thesis constructs a nested named entity feature database and proposes an algorithm, based on part-of-speech tagging, for automatically extracting nested named entity feature words. It also puts forward an improved method for feature word importance. In addition, an external semantic knowledge base is constructed manually so that the model has abundant information for recognition.

3. This thesis constructs a set of suitable feature templates for the low-level and high-level recognition models in the cascaded conditional random fields, and proposes a feature template selection strategy based on feature word importance to improve the overall recognition performance of the model.

4. Finally, this thesis compares the effects of different window sizes on precision, recall, and F-value in the recognition process, and selects an appropriate window size for the recognition model. The experimental results show that the proposed method of nested named entity recognition based on cascaded conditional random fields achieves higher precision, recall, and F-value, and is more stable, than other existing methods.
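As a rough illustration of the ideas summarized above, the following Python sketch shows how the output of a low-level tagger could be exposed as features to a high-level tagger within a context window, and how precision, recall, and F-value can be computed over entity spans. It is only a sketch under stated assumptions: BIO-style tags are assumed, the function names (`window_features`, `bio_to_spans`, `precision_recall_f`) and the example sentence are hypothetical, and the CRF models themselves are not implemented. It is not the thesis's actual feature template or code.

```python
from typing import Dict, List, Set, Tuple

# An entity span: (start index, end index exclusive, entity type).
Span = Tuple[int, int, str]


def window_features(tokens: List[str], low_tags: List[str],
                    i: int, window: int) -> Dict[str, str]:
    """Features for position i, built from the tokens and the low-level
    tags found within +/- `window` positions (hypothetical template)."""
    feats: Dict[str, str] = {}
    for offset in range(-window, window + 1):
        j = i + offset
        inside = 0 <= j < len(tokens)
        feats[f"tok[{offset}]"] = tokens[j] if inside else "<PAD>"
        # The low-level model's output becomes an input feature for the
        # high-level model; this is the "cascade" link.
        feats[f"low[{offset}]"] = low_tags[j] if inside else "<PAD>"
    return feats


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of entity spans.
    An I- tag that does not continue an open span is ignored."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        if start is not None and (
            tag == "O" or tag.startswith("B-") or tag[2:] != etype
        ):
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def precision_recall_f(gold: Set[Span],
                       pred: Set[Span]) -> Tuple[float, float, float]:
    """Span-level precision, recall, and F-value (exact-match spans)."""
    correct = len(gold & pred)
    p = correct / len(pred) if pred else 0.0
    r = correct / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Illustrative example (not from the thesis): a location nested inside a
# longer organization name.
tokens = ["北京", "大学", "图书馆"]
low_tags = ["B-LOC", "O", "O"]  # assumed output of the low-level tagger
features = [window_features(tokens, low_tags, i, window=1)
            for i in range(len(tokens))]
```

Varying the `window` argument changes how much context each position sees, which is the kind of setting the window-size comparison in item 4 is concerned with.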
Keywords/Search Tags: Microblog, Nested Named Entity, Cascaded Conditional Random Fields, Feature Database, Named Entity Recognition