The Study Of Chinese Entity Linking With Word2Vec

Posted on: 2017-02-24
Degree: Master
Type: Thesis
Country: China
Candidate: D C Huang
Full Text: PDF
GTID: 2428330569998644
Subject: Computer Science and Technology
Abstract/Summary:
Named entity linking is a fundamental technology in information processing, and it can be applied almost anywhere. In chart analysis, you need to understand what object is being discussed; in a Q&A system, you need to understand what the user is asking; in text classification, you need to understand which concepts the article discusses, and so on. Only once entity linking has been done can the subsequent analysis, prediction, and similar steps proceed. Entity linking is important precisely because it is this foundation.

The main idea of entity linking is to first delimit the range of entities that a name may refer to, and then select one entity from that range as the target entity. The step of delimiting the range is called candidate entity set generation, and the step of selecting the target entity is called entity disambiguation. The two main methods for generating the candidate entity set are dictionary lookup and literal similarity calculation. Constructing dictionaries requires a great deal of manual statistical work, and dictionary updates tend to lag behind the appearance of new words, while the method based on literal similarity is not accurate. This thesis proposes a set of algorithms for dictionary construction and literal similarity to improve candidate entity set generation.

For entity disambiguation, that is, determining the target entity from the candidate entity set, the main methods are popularity, topic similarity, label similarity, and text similarity, among others. The popularity method is ineffective on its own, because it gives the same result regardless of context; popularity is nevertheless a meaningful reference, so this thesis moves it into the candidate entity set generation stage. Topic similarity calculation can determine the probability that two entities belong to the same topic, but it is computationally intensive. Label similarity calculates the similarity between tags, so it requires a label for each entity; not all entities have labels, so the method is not universal, although where every entity does have a label it can help improve the accuracy of entity linking. Text similarity is calculated with common text similarity algorithms; however, the resulting similarity values are very small, so text similarity does not provide a useful reference in the entity linking process.
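The two-stage structure described in the abstract, candidate entity set generation followed by entity disambiguation, can be illustrated with a minimal Python sketch. This is not the thesis's algorithm. The alias dictionary, the knowledge base entries, the popularity values, the 0.7 threshold, and all function names are hypothetical. `difflib.SequenceMatcher` stands in for whatever literal similarity measure the thesis uses, and a simple context-word overlap stands in for the disambiguation score, which the abstract does not specify.

```python
from difflib import SequenceMatcher

# Hypothetical alias dictionary (mention -> known entity names).
ALIAS_DICT = {"apple": ["Apple Inc.", "Apple (fruit)"]}

# Hypothetical knowledge base: a popularity score and a few context words per entity.
KB = {
    "Apple Inc.": {"popularity": 0.9, "context": {"iphone", "company", "technology"}},
    "Apple (fruit)": {"popularity": 0.6, "context": {"fruit", "tree", "eat"}},
    "Appleton": {"popularity": 0.2, "context": {"city", "wisconsin"}},
}


def literal_similarity(a, b):
    """Character-level similarity in [0, 1]; a stand-in for the thesis's measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def generate_candidates(mention, threshold=0.7):
    """Stage 1: dictionary lookup plus literal matching, ordered by popularity."""
    candidates = set(ALIAS_DICT.get(mention.lower(), []))
    for name in KB:
        if literal_similarity(mention, name) >= threshold:
            candidates.add(name)
    return sorted(candidates, key=lambda e: KB[e]["popularity"], reverse=True)


def disambiguate(mention, context_words, threshold=0.7):
    """Stage 2: pick the candidate whose context words best overlap the mention's."""
    candidates = generate_candidates(mention, threshold)
    if not candidates:
        return None
    words = {w.lower() for w in context_words}
    # max() returns the first maximal item, so ties fall back to popularity order.
    return max(candidates, key=lambda e: len(KB[e]["context"] & words))


print(generate_candidates("apple"))
print(disambiguate("apple", ["I", "eat", "fruit"]))
```

In this toy data, "Appleton" enters the candidate set purely through string similarity, which illustrates the imprecision the abstract attributes to literal matching. Popularity only orders the candidates and breaks ties; it does not decide the result by itself.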
Keywords/Search Tags:entity linking, entity disambiguation, Baidu Encyclopedia, Word2vec, Chinese microblog