Research And Application Of The Chinese Organization Names Recognition And Disambiguation

Posted on: 2017-01-10
Degree: Master
Type: Thesis
Country: China
Candidate: X M Xiang
GTID: 2308330485470213
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, it has become a major carrier of knowledge in the information society and gathers a vast amount of text. Extracting knowledge from this text depends on automatic information processing, and named entity recognition (NER) is an important research topic within it. NER identifies entities with a specific meaning in text, mainly person names, place names, and organization names; of these, organization names are the most difficult to recognize. Even after named entities have been identified, however, homonyms and synonyms pose new challenges. To distinguish between different organizations that share a name, this thesis builds on the recognition results and further improves the results of named entity disambiguation, the task of determining the correct meaning of an entity mention in its context. The thesis therefore focuses on named entity recognition and its application to named entity disambiguation.

Much work has already been done on named entity recognition and disambiguation. Although accuracy has improved, the following problems remain. First, named entity recognition is an important component of named entity disambiguation, so the quality of recognition directly affects disambiguation. Second, named entities include a large number of abbreviations, and because abbreviations are formed in complex ways, they are relatively difficult to identify. Finally, current disambiguation methods consider only the context of a mention and ignore the relationships between entities, which limits disambiguation accuracy.

Starting from named entity recognition, this thesis therefore aims to improve the results of named entity disambiguation.
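As an illustration of the context-only disambiguation criticized above, the following minimal Python sketch scores each candidate entity by the bag-of-words cosine similarity between the mention's context and a description of the candidate, ignoring any relations between entities. It is not the thesis's implementation; the function names, candidate ids, and description tokens are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(count * b[tok] for tok, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def disambiguate_by_context(context_tokens, candidates):
    """Return the candidate id whose description best matches the mention context.

    candidates: dict mapping candidate entity id -> list of description tokens.
    Only the mention's own context is used; relations between entities are ignored.
    """
    ctx = Counter(context_tokens)
    return max(candidates, key=lambda cid: cosine(ctx, Counter(candidates[cid])))


if __name__ == "__main__":
    candidates = {
        "Apple_Inc": ["iphone", "mac", "company"],  # hypothetical entries
        "Apple_Records": ["music", "label", "beatles"],
    }
    print(disambiguate_by_context(["iphone", "released", "today"], candidates))  # prints Apple_Inc
```

Because each mention is scored in isolation, two mentions in the same document cannot inform each other, which is the limitation the thesis sets out to address.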
The specific work and contributions of this thesis are as follows.

First, we propose a new method for recognizing organization names based on conditional random fields (CRFs). For results that exceed a threshold (critical value), templates are additionally applied during recognition, which improves recognition accuracy. Compared with general recognition algorithms, this thesis places more emphasis on the design of the feature templates; comparative experiments and experiments with different window sizes demonstrate the effectiveness of the method.

Second, in the process of recognizing organization names, we add an algorithm based on rules and semantic edit distance. Based on existing data sets, we define rules for full names and abbreviations. For organization names that have already been recognized, we apply semantic edit distance to match full names with their abbreviations, which improves the recall of organization name recognition. Experiments show that this method achieves higher accuracy than previous methods.

Finally, building on full-name and abbreviation recognition, we extend an existing statistical entity disambiguation algorithm with the full-name/abbreviation identification method. This substantially improves disambiguation accuracy and overcomes a limitation of existing methods, which consider only contextual features and ignore the relationships between entities.
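The abstract does not list the thesis's actual feature templates. As a hedged sketch of what window-based templates for a character-level CRF typically look like, the Python code below generates, for each character, unigram features `C[k]` and adjacent-character bigram features `C[k]C[k+1]` for every offset `k` within a symmetric window, similar in spirit to CRF++ `%x[row,col]` unigram templates. The feature names, the `<PAD>` symbol, and the default `window` are assumptions; varying `window` corresponds to the window-size experiments described above.

```python
PAD = "<PAD>"  # assumed padding symbol for positions outside the sentence


def char_at(sentence: str, j: int) -> str:
    """Return the character at position j, or PAD if j is out of range."""
    return sentence[j] if 0 <= j < len(sentence) else PAD


def char_features(sentence: str, i: int, window: int = 2) -> dict:
    """Window-based template features for the character at position i."""
    feats = {"bias": "1"}
    # Unigram templates: the character at each offset in [-window, window].
    for off in range(-window, window + 1):
        feats[f"C[{off}]"] = char_at(sentence, i + off)
    # Bigram templates: adjacent character pairs inside the window.
    for off in range(-window, window):
        feats[f"C[{off}]C[{off + 1}]"] = char_at(sentence, i + off) + char_at(sentence, i + off + 1)
    return feats


def sentence_features(sentence: str, window: int = 2) -> list:
    """One feature dict per character, i.e. one CRF observation sequence."""
    return [char_features(sentence, i, window) for i in range(len(sentence))]


if __name__ == "__main__":
    for feats in sentence_features("北京大学", window=1):
        print(feats)
```

A larger window lets a character "see" more context, such as an organization suffix like 大学 a few characters away, at the cost of a larger and sparser feature set. The per-character dicts would then be paired with per-character labels (for example, a BIO-style tag set, which is also an assumption here) for CRF training.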
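The abstract names full-name/abbreviation rules and a semantic edit distance without giving their definitions. The sketch below shows one hypothetical way to combine the two, not the thesis's own definitions. A rule first requires the candidate abbreviation to be shorter than the full name, to share its first character, and to be an in-order subsequence of it. A weighted edit distance then charges less for dropping characters from generic organization-type words; the `LIGHT_CHARS` set stands in for the unspecified "semantic" weighting. The pair is accepted if the distance, normalized by the full name's length, is at most an assumed threshold of 0.5.

```python
# Hypothetical set of characters, taken from generic organization-type words such
# as 大学, 学院, 公司 and 有限, that are treated as cheap to drop (an assumption).
LIGHT_CHARS = set("学院司限")


def char_cost(ch: str) -> float:
    """Cost of dropping (or inserting) a single character."""
    return 0.5 if ch in LIGHT_CHARS else 1.0


def weighted_distance(short: str, full: str) -> float:
    """Weighted Levenshtein distance with per-character insertion/deletion costs."""
    n, m = len(short), len(full)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + char_cost(short[i - 1])
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + char_cost(full[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if short[i - 1] == full[j - 1] else 1.0
            d[i][j] = min(
                d[i - 1][j] + char_cost(short[i - 1]),  # character of short not in full
                d[i][j - 1] + char_cost(full[j - 1]),   # character of full dropped
                d[i - 1][j - 1] + sub,                   # match or substitution
            )
    return d[n][m]


def is_subsequence(short: str, full: str) -> bool:
    """True if the characters of short occur in full in the same order."""
    it = iter(full)
    return all(ch in it for ch in short)


def is_abbreviation(short: str, full: str, threshold: float = 0.5) -> bool:
    """Rule check followed by a normalized weighted-edit-distance check."""
    if not short or len(short) >= len(full) or short[0] != full[0]:
        return False
    if not is_subsequence(short, full):
        return False
    return weighted_distance(short, full) / len(full) <= threshold


if __name__ == "__main__":
    print(is_abbreviation("北大", "北京大学"))      # True
    print(is_abbreviation("中科院", "中国科学院"))  # True
    print(is_abbreviation("北大", "北京理工大学"))  # False
```

With these assumed weights, 北大 is rejected for 北京理工大学 even though it passes the subsequence rule, because too many content characters (京, 理, 工) would have to be dropped; this is the kind of case where a distance check can complement pure rules.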
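The keywords mention a random walk, and the last contribution adds relations between entities to a context-based method. One standard way to combine the two is a random walk with restart over a graph of candidate entities: the restart distribution is each candidate's context score (for example, a bag-of-words similarity), and edges carry relatedness between candidates, which could come from full-name/abbreviation matches or co-occurrence. The abstract does not say whether the thesis uses exactly this formulation, so treat the sketch as an assumption; the restart probability `alpha`, the iteration count, and all ids and weights in the example are hypothetical.

```python
def random_walk_with_restart(nodes, edges, restart, alpha=0.15, iters=50):
    """Score nodes by a random walk with restart (personalized PageRank).

    nodes:   list of unique node ids
    edges:   dict mapping (u, v) -> non-negative weight; treated as undirected,
             and both endpoints must be in `nodes`
    restart: dict mapping node -> non-negative prior (e.g. a context score)
    alpha:   probability of jumping back to the restart distribution each step
    """
    nbrs = {n: {} for n in nodes}
    for (u, v), w in edges.items():
        nbrs[u][v] = nbrs[u].get(v, 0.0) + w
        nbrs[v][u] = nbrs[v].get(u, 0.0) + w

    # Normalize the priors into a probability distribution (uniform if all zero).
    total = sum(restart.get(n, 0.0) for n in nodes)
    r = {n: restart.get(n, 0.0) / total if total else 1.0 / len(nodes) for n in nodes}

    p = dict(r)
    for _ in range(iters):
        nxt = {n: alpha * r[n] for n in nodes}
        for u in nodes:
            out = sum(nbrs[u].values())
            if out == 0:
                # Dangling node: its walk mass returns to the restart distribution.
                for n in nodes:
                    nxt[n] += (1 - alpha) * p[u] * r[n]
            else:
                for v, w in nbrs[u].items():
                    nxt[v] += (1 - alpha) * p[u] * w / out
        p = nxt
    return p


def disambiguate_jointly(mentions, edges, context_scores, alpha=0.15):
    """Pick, for each mention, the candidate with the highest walk score."""
    nodes = list(dict.fromkeys(c for cands in mentions.values() for c in cands))
    scores = random_walk_with_restart(nodes, edges, context_scores, alpha)
    return {m: max(cands, key=scores.get) for m, cands in mentions.items()}


if __name__ == "__main__":
    mentions = {"Apple": ["Apple_Inc", "Apple_Records"], "Foxconn": ["Foxconn"]}
    edges = {("Apple_Inc", "Foxconn"): 1.0}  # hypothetical relatedness
    context = {"Apple_Inc": 0.4, "Apple_Records": 0.6, "Foxconn": 1.0}
    print(disambiguate_jointly(mentions, edges, context))
```

In the example, context alone would prefer the hypothetical `Apple_Records` (0.6 vs. 0.4), but the relation edge to the unambiguous `Foxconn` mention shifts the walk's mass to `Apple_Inc`. With no edges, every node is dangling and the scores reduce to the normalized context scores, so the method falls back to context-only disambiguation.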
Keywords/Search Tags:Named Entity Recognition, Named Entity Disambiguation, Conditional Random Field, Rules and Semantic Edit Distance, Bag of words, Random Walk