
Research On Graph Based Named Entity Disambiguation

Posted on: 2016-12-23
Degree: Master
Type: Thesis
Country: China
Candidate: G Yang
GTID: 2308330479990116
Subject: Computer Science and Technology
Abstract/Summary:
With the popularization of Web 2.0, more and more Internet users take part in the Internet as creators of web documents, and the rapid expansion of the Internet has produced a large volume of such documents. Valuable information can be discovered by analyzing the natural language in these documents with natural language processing techniques. However, because the documents are written by ordinary Internet users, their quality is uneven. Because natural language is diverse and ambiguous, one entity may have various expressions, and a single entity string may stand for different entities in different contexts. For a computer to understand natural language text correctly, resolving the ambiguity of named entities is especially important. In our research we use a graph-based method to solve this problem. The research contents of this thesis are as follows:

First, we preprocess the knowledge base. The graph-based method uses the relationships between entities to expand candidate entities, so the quality of the entity triples in the knowledge base directly affects disambiguation performance, which makes the preprocessing stage very important. We treat this stage in two parts: the form of the RDF triples and the filtering of triples. For the RDF form, we simplify the expression of entities and handle abnormal encodings. For the datasets, we analyze the features of the different datasets and then filter the triples. Preprocessing removes data that is useless for disambiguation and reduces noise. The processed knowledge base is used in the subsequent tasks.

The main task of named entity disambiguation consists of candidate entity generation and disambiguation. We generate candidate entities first. Candidate generation draws on two kinds of methods: string-similarity-based and knowledge-base-based.
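As a rough illustration of the two candidate generation routes named above, the following Python sketch combines a string-similarity lookup with a prior-knowledge lookup. It is not the thesis's implementation: the function names, the toy entity list, the surface-form counts, and the similarity threshold are all hypothetical, and `difflib.SequenceMatcher` from the standard library stands in for whatever similarity measure the author actually used.

```python
from difflib import SequenceMatcher

# Hypothetical toy data, not taken from the thesis: entity names from a
# knowledge base, and surface-form -> entity counts used as prior knowledge.
ENTITY_NAMES = ["Michael_Jordan", "Michael_I._Jordan", "Jordan", "Air_Jordan"]
PRIOR_COUNTS = {
    "jordan": {"Jordan": 60, "Michael_Jordan": 35, "Air_Jordan": 5},
}


def normalize(name):
    """Lower-case a name and turn underscores into spaces."""
    return name.replace("_", " ").lower()


def similarity_candidates(mention, names, threshold=0.6):
    """Return names whose similarity to the mention reaches the threshold,
    most similar first. The threshold value is an assumption."""
    m = normalize(mention)
    scored = [(SequenceMatcher(None, m, normalize(n)).ratio(), n) for n in names]
    return [n for score, n in sorted(scored, reverse=True) if score >= threshold]


def prior_candidates(mention, prior_counts, top_k=3):
    """Return the entities most often associated with this surface form."""
    counts = prior_counts.get(normalize(mention), {})
    return sorted(counts, key=counts.get, reverse=True)[:top_k]


def generate_candidates(mention, names=ENTITY_NAMES, prior_counts=PRIOR_COUNTS):
    """Merge both candidate sources, keeping first-seen order."""
    merged = []
    for cand in prior_candidates(mention, prior_counts) + similarity_candidates(mention, names):
        if cand not in merged:
            merged.append(cand)
    return merged
```

Calling `generate_candidates("Jordan")` merges the prior-ranked entities with the string-similar ones. Listing the prior-based candidates first is a design choice of this sketch, not something the abstract specifies.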
The string-similarity method comprises candidate expansion and candidate filtering. The knowledge-base-based method generates candidates by heuristics and by prior knowledge. We compare the effectiveness of the different methods and finally combine them.

After candidate generation, the next task is named entity disambiguation. First, we expand the candidate entities using the relationships between entities in the knowledge base, so that a graph connecting all candidate entities emerges. We then run a link analysis algorithm on this graph, which assigns a score to each candidate entity. The expansion strategies cover the relationships to use, the path length between candidate entities, and the context scale. The strategies for the graph-based disambiguation algorithm are to choose the target for each mention successively or for all mentions collectively. We compare these strategies. We also optimize the graph-based method to improve its efficiency. We choose the best strategy and compare it with the baseline system.
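The graph-based step described above can be sketched in Python as follows. The abstract does not name the link analysis algorithm, so plain PageRank is used here as an assumption. The graph also links candidates only through direct triples (path length 1) and resolves all mentions collectively. The function names and the toy triples are hypothetical.

```python
def build_graph(candidates, triples):
    """Link candidate entities that co-occur in a knowledge-base triple.

    candidates: dict mapping each mention to its non-empty candidate list.
    triples: iterable of (subject, predicate, object) tuples.
    """
    nodes = {c for cands in candidates.values() for c in cands}
    neighbors = {v: set() for v in nodes}
    for subj, _pred, obj in triples:
        if subj in nodes and obj in nodes and subj != obj:
            neighbors[subj].add(obj)
            neighbors[obj].add(subj)
    return neighbors


def pagerank(neighbors, damping=0.85, iterations=50):
    """Power-iteration PageRank over an undirected adjacency dict."""
    nodes = list(neighbors)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by isolated nodes is spread uniformly over all nodes.
        dangling = sum(rank[v] for v in nodes if not neighbors[v])
        new_rank = {}
        for v in nodes:
            incoming = sum(rank[u] / len(neighbors[u]) for u in neighbors[v])
            new_rank[v] = (1 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank
    return rank


def disambiguate(candidates, triples):
    """Pick, for every mention at once, its highest-scoring candidate."""
    scores = pagerank(build_graph(candidates, triples))
    return {m: max(cands, key=scores.get) for m, cands in candidates.items()}


# Hypothetical toy input: two mentions from one document.
candidates = {
    "Jordan": ["Jordan", "Michael_Jordan"],
    "Bulls": ["Chicago_Bulls", "Bull"],
}
triples = [
    ("Michael_Jordan", "team", "Chicago_Bulls"),
    ("Jordan", "capital", "Amman"),  # ignored: Amman is not a candidate
]
result = disambiguate(candidates, triples)
```

In this toy input only `Michael_Jordan` and `Chicago_Bulls` are connected, so they reinforce each other and outrank the isolated candidates. A successive strategy would instead fix one mention at a time and rebuild or rescore the graph after each choice.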
Keywords/Search Tags: named entity disambiguation, graph based method, DBpedia, knowledge base