The Research On The Algorithm Of Chinese Name Disambiguation

Posted on: 2017-01-12
Degree: Master
Type: Thesis
Country: China
Candidate: H Wan
GTID: 2428330590968180
Subject: Computer technology
Abstract/Summary:
Name ambiguity is a phenomenon in which the identity of the entity that a name refers to is uncertain, and it is an important problem in natural language processing. With the development of Internet technology and the arrival of the big data era, more and more Internet applications have entered people's lives. Personal-name entities play a crucial role in many of these applications, including search engines, social networks, and the construction of name knowledge bases. Because today's applications aim to provide people-oriented, personalized services, how to effectively eliminate the ambiguity caused by shared names has become an important research topic in China and abroad, and the disambiguation of Chinese names in particular remains a major challenge. This thesis therefore studies algorithms and models for eliminating name ambiguity. The main idea is to extract features from the texts that contain the name: keywords are extracted from each text, and the similarity between texts is computed from those keywords in order to resolve the ambiguity. Specifically, the TF-IDF algorithm is used to extract weighted keywords from each text, and a feature vector is built for each text from them. The similarity of two feature vectors is then calculated with the cosine formula for vectors, and the resulting angle between the vectors is used to judge whether two occurrences of a name refer to the same person. The algorithms in the experiments are designed progressively, from simple to complex, and improvements are proposed after analyzing the characteristics of each algorithm. These include a multi-feature fused vector set and normalization of the feature vector set; other auxiliary features of the text are also fused into the name disambiguation algorithm as an extensible complement. The experimental results show that comparing the feature vectors generated from the texts with the cosine similarity algorithm achieves name disambiguation effectively. Directions for future improvement are also put forward: the influence of the semantic features of the textual context can be taken into account to further improve the name disambiguation algorithm.
Keywords/Search Tags:Name disambiguation, Keyword feature, TF-IDF algorithm, Cosine similarity formula, Auxiliary feature
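
The core pipeline described in the abstract (TF-IDF keyword weights, one feature vector per text, cosine similarity between vectors) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names `tf_idf_vectors` and `cosine_similarity`, the sample token lists, and the plain `tf * log(N / df)` weighting are all assumptions made for illustration. It also assumes the texts are already tokenized; Chinese text would first need word segmentation, which is not shown here.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Build one sparse {term: weight} vector per tokenized document.

    Assumed weighting: (term count / document length) * log(N / document frequency).
    """
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * math.log(n / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical texts that all mention the same personal name.
texts = [
    ["professor", "university", "algorithm", "paper"],
    ["professor", "university", "lecture", "paper"],
    ["striker", "football", "goal", "league"],
]
vectors = tf_idf_vectors(texts)
same_context = cosine_similarity(vectors[0], vectors[1])
different_context = cosine_similarity(vectors[0], vectors[2])
print(same_context > different_context)
```

In a setup like this, a higher cosine value suggests that two texts describe the same person, and a decision threshold would have to be chosen experimentally. The multi-feature fusion and normalization steps mentioned in the abstract are not reproduced here, since the abstract does not specify how they are computed.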