The Research On The Algorithm Of Chinese Name Disambiguation

Posted on:2017-01-12

Degree:Master

Type:Thesis

Country:China

Candidate:H Wan

GTID:2428330590968180

Subject:Computer technology

Abstract/Summary:

Name ambiguity is the phenomenon in which the identity of the entity a name refers to is uncertain, and it is an important problem in natural language processing. With the development of Internet technology and the arrival of the big data era, more and more Internet applications have entered people's lives. Personal-name entities play a crucial role in many of these applications, including search engines, social networks, and the construction of name knowledge bases. Because applications today aim to provide people-oriented, personalized services, how to effectively eliminate the ambiguity caused by shared names has become an important research topic in China and abroad. Chinese name disambiguation in particular still faces major challenges. This thesis therefore studies algorithms and models for eliminating that ambiguity.

The main idea of the name disambiguation algorithm is to extract text features from the texts containing the keywords, and to use those keywords to measure the similarity between texts, thereby eliminating the name ambiguity. Specifically, the TF-IDF algorithm is used to extract weighted keywords from each text, and a feature vector is then built for each text. The similarity of two feature vectors is computed with the cosine formula for vectors in the feature space, and the resulting angle between the vectors is used to judge the ambiguity of the names. The experiments proceed from simple to more complex algorithm designs, and improvements are proposed after analyzing the characteristics of the algorithm. These include fusing multiple features into the vector set and normalizing the feature vector set. Other auxiliary features of the text are also fused into the name disambiguation algorithm, forming an extensible complement.

The experimental results show that comparing the feature vectors generated from the texts with the cosine similarity algorithm achieves name disambiguation effectively. The thesis also proposes a direction for future improvement: incorporating the contextual semantic features of the text into the name disambiguation algorithm.
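The pipeline described in the abstract (TF-IDF keyword weights, one feature vector per text, cosine similarity between vectors) can be illustrated with a minimal sketch. This is not the thesis's implementation. The function names `tfidf_vectors` and `cosine`, the smoothed IDF formula, and the toy documents are all assumptions made for illustration, and the texts are assumed to be already segmented into tokens (real Chinese text would need a word segmenter first).

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse {term: weight} TF-IDF vector per tokenized document.

    `docs` is a list of token lists. The smoothed IDF used here is an
    assumption for this sketch, not the weighting used in the thesis.
    """
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_u * norm_v)


# Hypothetical toy data: three texts that mention the same personal name.
docs = [
    ["wang", "wei", "basketball", "team", "coach", "season"],
    ["wang", "wei", "basketball", "league", "season", "score"],
    ["wang", "wei", "professor", "university", "physics", "lecture"],
]
vectors = tfidf_vectors(docs)
sim_sports = cosine(vectors[0], vectors[1])  # two sports-related texts
sim_mixed = cosine(vectors[0], vectors[2])   # sports text vs. academic text
print(sim_sports > sim_mixed)
```

In this toy setup the two sports texts share more weighted keywords than the sports text and the academic text, so their cosine similarity is higher. A disambiguation step could then group texts whose similarity exceeds a chosen threshold. The threshold value would have to be tuned on real data and is not given in the abstract.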

Keywords/Search Tags:

Name disambiguation, Keyword feature, TF-IDF algorithm, Cosine similarity formula, Auxiliary feature

Related items

1	Research On Similarity Related Problems In Collaborative Filtering Algorithms
2	Similarity Computing Of Scientific And Technical Documents Based On Texts And Formulas
3	Research On Kazakh Syntactic Parsing Auxiliary Feature Extraction
4	A Non-negative Matrix Factorization Clustering Algorithm Based On L_2,1/2 Sparse Constraint And Cosine Similarity
5	Research Of Word Sense Disambiguation Based On Hybird Features And Rules
6	Mathematical Formula Feature Extraction And Locating In Chinese Scanned Printed Document
7	Research On Image Registration Algorithm Based On SURF And KAZE
8	Research On Feature Enhancement Methods For Entity Disambiguation
9	Research On Multi Feature Based Extract Text Keyword Algorithm
10	Research On Word Sense Disambiguation And Keyword Expansion In Question Answering System