The Study On Cross-document Chinese Person Name Disambiguation With Coreference Resolution

Posted on: 2014-08-10; Degree: Master; Type: Thesis
Country: China; Candidate: J Liu
GTID: 2298330422990408; Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, obtaining needed information effectively from explosively growing data has become a central goal of information retrieval research, and person-name-oriented search plays an important role in it. In the Chinese Internet environment, however, many people share the same name, which makes person-name-oriented search difficult. Person name disambiguation has therefore become an important topic in information retrieval research.

Observations show that person name ambiguity is attributable to in-document co-reference ambiguity and cross-document entity ambiguity. Chinese person name disambiguation therefore comprises in-document co-reference resolution and cross-document person name disambiguation. Existing work on Chinese co-reference resolution falls into rule-based and statistics-based approaches. Rule-based approaches achieve good precision, but their portability is unsatisfactory. Statistics-based approaches normally achieve a good balance between precision and recall, but their performance relies heavily on the training data. Among cross-document person name disambiguation approaches, those based on contextual features of person names hit a performance bottleneck owing to the lack of needed knowledge, while those based on external knowledge, such as social networks, are difficult to improve further because the external knowledge is limited.

This study investigated several techniques to improve the performance of Chinese person name disambiguation. Firstly, a method for improving Chinese person name recognition is developed, which incorporates the constitutive rules and occurrence features of person names. Secondly, for in-document co-reference resolution, a method that combines Chinese linguistic rules with machine learning is investigated to determine whether a candidate noun phrase pair is co-referent. The method achieved an official score of 0.651 on the CoNLL 2012 Chinese co-reference resolution dataset. Thirdly, after applying the co-reference resolution method to identify the accurate context of person names, a cross-document Chinese person name disambiguation method that leverages encyclopedia knowledge and uses Internet verification is proposed. This method achieved 82.4% precision and 83.4% recall on the CIPS-SIGHAN 2012 Chinese person name disambiguation dataset.

The contributions of this study may be summarized as follows. Firstly, a Chinese co-reference resolution method combining rule-based and statistics-based techniques is developed. This method ranked 4th in the world and 2nd in China in the CoNLL 2012 Chinese co-reference resolution evaluation. Secondly, a cross-document Chinese person name disambiguation method is investigated in which encyclopedia knowledge is leveraged to address incomplete descriptive information, so that the similarity between entities can be estimated accurately. This helps improve the precision of person name disambiguation. Furthermore, Internet verification is adopted to reduce the influence of missing entity information; as a result, recall is improved. Thirdly, the proposed combination of in-document co-reference resolution and cross-document person name disambiguation has shown good performance.
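
To make the cross-document step concrete, the following is a minimal sketch of grouping documents that mention the same name by the similarity of their contexts. It is not the method of the thesis: the encyclopedia knowledge and Internet verification described above are replaced here by a plain Jaccard overlap of context words, and the function names, the threshold of 0.3, and the toy documents are all hypothetical.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap between two sets of context words."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_mentions(contexts: Dict[str, Set[str]],
                     threshold: float = 0.3) -> List[List[str]]:
    """Greedy single-pass clustering (an illustrative assumption).

    Each document joins the first cluster whose pooled context words
    overlap with its own by at least `threshold`; otherwise it starts
    a new cluster, i.e. a new candidate person.
    """
    clusters: List[List[str]] = []
    pooled: List[Set[str]] = []  # union of context words per cluster
    for doc_id, words in contexts.items():
        for i, pool in enumerate(pooled):
            if jaccard(words, pool) >= threshold:
                clusters[i].append(doc_id)
                pooled[i] = pool | words
                break
        else:
            clusters.append([doc_id])
            pooled.append(set(words))
    return clusters


# Hypothetical context words for four documents mentioning one name.
contexts = {
    "doc1": {"professor", "university", "physics", "lecture"},
    "doc2": {"striker", "football", "league", "goal"},
    "doc3": {"physics", "professor", "paper", "university"},
    "doc4": {"goal", "football", "club", "transfer"},
}
clusters = cluster_mentions(contexts)
print(clusters)
```

In a pipeline like the one the abstract describes, the context sets would presumably come from the in-document co-reference step, so that words attached to pronouns and aliases of the same person are also counted as that person's context.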
Keywords/Search Tags: person name disambiguation, co-reference resolution, encyclopedia knowledge, Internet verification