
Research On Classifying Entity Mentions In Natural Language

Posted on: 2019-12-04
Degree: Master
Type: Thesis
Country: China
Candidate: K Y Cui
Full Text: PDF
GTID: 2428330545953682
Subject: Computer Science and Technology
Abstract/Summary:
An entity mention in natural language is a phrase in a sentence that refers to a subject or concept with a distinct and independent existence. Inferring the semantic types of the entity mentions in a sentence is a necessary yet challenging task. One entity mention may occur in different contexts and may therefore belong to different semantic types. For instance, Apple has different semantic types, such as Fruit, Mobile Phone, and Movie, in different sentences. What makes things more difficult is that these three types share no semantic similarity with one another.

In the age of the internet, a huge amount of data is produced every day. With the growth of data and the improvement of storage systems, people can obtain all kinds of information from different sources. It is therefore important for researchers and engineers to work out how to analyze the data, extract useful information, and put the data to profitable use. Entity mention classification is a vital step toward understanding natural language. To understand and utilize this huge amount of language data, some prior work has focused on the entity mention classification task. However, most existing methods employ a very coarse-grained type taxonomy, which is too general and not precise enough for many other NLP (Natural Language Processing) tasks. Moreover, the performance of those methods drops sharply when the type taxonomy is extended to a fine-grained one with several hundred types. Since many other NLP tasks build on the classification task, a drop in classification performance leads to a drop in the performance of those tasks.

In this thesis, we introduce a hybrid neural network model for type classification of entity mentions with a fine-grained taxonomy. The model has four components: the entity mention component, the context component, the relation component, and the already-known-type component. These extract features from the target entity mention, the context, the relations, and the already known types of the entity mentions in the surrounding context, respectively. The features learned by the four components are concatenated and fed into a logistic layer to predict the type distribution.

We carried out extensive experiments to evaluate the proposed model. Experimental results showed that our model achieved state-of-the-art performance on the FIGER dataset. Moreover, we extracted larger datasets from Wikipedia and DBpedia. On the larger datasets, our model achieved performance comparable to the state-of-the-art methods with the coarse-grained type taxonomy, but performed much better than those methods with the fine-grained type taxonomy in terms of micro-F1, macro-F1, and weighted-F1.

The contributions of this thesis are summarized as follows. We introduce a novel unsupervised method to extract the context entity mentions by exploring anchor links in similar sentences. We propose a hybrid neural model to quantify and leverage entity mention relations to improve the performance of type classification. We utilize the already known types of entity mentions in the surrounding context, which enables the model to generalize better to uncommon or unseen entity mentions. We carried out extensive experiments. With the coarse-grained taxonomy, REM is comparable to the state-of-the-art methods, but with the fine-grained taxonomy, REM outperforms these methods. TEM significantly outperforms both REM and the state-of-the-art methods.
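As a rough illustration of the final prediction step described in the abstract (four feature vectors concatenated and passed to a logistic layer that outputs a type distribution), the following is a minimal sketch in plain Python. It is not the thesis implementation: the feature values, vector sizes, random weights, the function name `predict_type_distribution`, and the choice of a softmax output are all assumptions made for illustration. The abstract does not say how each component computes its features, nor whether the output layer is a softmax or a set of per-type sigmoids.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_type_distribution(mention, context, relation, known_types, weights, bias):
    """Concatenate the four component features and apply one linear layer plus softmax.

    All arguments are hypothetical stand-ins: each feature list represents the
    output of one of the four components named in the abstract.
    """
    features = mention + context + relation + known_types  # list concatenation
    logits = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


# Toy inputs (made up): three candidate types taken from the "Apple" example.
type_names = ["Fruit", "Mobile Phone", "Movie"]
mention_feat = [0.2, -0.1]
context_feat = [0.5, 0.3, -0.4]
relation_feat = [0.1]
known_type_feat = [1.0, 0.0]

random.seed(0)
feature_dim = (
    len(mention_feat) + len(context_feat) + len(relation_feat) + len(known_type_feat)
)
# Untrained random weights; a real model would learn these from data.
weights = [[random.uniform(-1.0, 1.0) for _ in range(feature_dim)] for _ in type_names]
bias = [0.0] * len(type_names)

distribution = predict_type_distribution(
    mention_feat, context_feat, relation_feat, known_type_feat, weights, bias
)
for name, prob in zip(type_names, distribution):
    print(f"{name}: {prob:.3f}")
```

Because the weights here are random rather than trained, the printed probabilities carry no meaning; the sketch only shows how the concatenated features map to one probability per type.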
Keywords/Search Tags:Entity Mention Classification, Entity Mention Relation, Fine-grained Taxonomy