
The Multi-strategic Research Of Chinese Weibo Entity And Wikipedia Entry Linking

Posted on: 2016-04-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Guo
GTID: 2308330461967814
Subject: Computer application technology
Abstract/Summary:
In recent years, with the rise of Web 2.0 technology and the Internet industry, social networks have developed at an unprecedented pace and given rise to a new type of social networking platform, Weibo, whose user base and data volume have grown sharply. Web 2.0 technology has also driven the rapid development of Internet encyclopedias. How to use social media content to construct and expand knowledge bases has recently become a research hot spot, and the ambiguity of the entities to be added is a key difficulty in this area. Entity linking is an important way to address this problem. Given the characteristics of Chinese Weibo content, such as brevity and casual, nonstandard language, this thesis proposes a multi-strategy approach to Chinese Weibo entity disambiguation.
Chinese Weibo entity and Wikipedia entry linking matches the named entities in Weibo content to encyclopedia entries in a knowledge base, with the goal of accurately linking each Weibo entity to the correct Wikipedia entry. The task belongs to named entity disambiguation, an active topic in natural language processing and an indispensable foundation for research in the field. Improving the disambiguation accuracy of Chinese Weibo entity linking helps build and expand online encyclopedia knowledge bases and contributes to the performance of general-purpose natural language processing systems.
This thesis takes the evaluation task of the CCF Conference on Natural Language Processing & Chinese Computing as its main research content. A web crawler is used to collect microblog content and Wikipedia page information, from which an entity mapping table is built and a Wikipedia entry knowledge base is organized.
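The entity mapping table can be pictured as a dictionary from surface forms (mentions, aliases, redirect titles) to the knowledge base entries they may refer to; a mention with no match is treated as NIL. The sketch below is a minimal illustration under stated assumptions, not the thesis's implementation: the (alias, entry) pairs are hypothetical examples of what might be extracted from crawled Wikipedia redirect or disambiguation pages, and `normalize`, `build_mapping_table`, and `candidates` are illustrative names.

```python
from collections import defaultdict


def normalize(mention: str) -> str:
    """Hypothetical normalization: trim whitespace and lowercase Latin letters."""
    return mention.strip().lower()


def build_mapping_table(alias_pairs):
    """Map each normalized surface form to the set of KB entry titles it may denote."""
    table = defaultdict(set)
    for alias, entry in alias_pairs:
        table[normalize(alias)].add(entry)
        # An entry's own title is also a valid surface form for that entry.
        table[normalize(entry)].add(entry)
    return table


def candidates(table, mention):
    """Return sorted candidate entries for a mention; an empty list means NIL."""
    # dict.get does not insert missing keys, even on a defaultdict.
    return sorted(table.get(normalize(mention), set()))


if __name__ == "__main__":
    # Hypothetical alias pairs for illustration only.
    pairs = [
        ("苹果", "苹果公司"),
        ("苹果", "苹果 (水果)"),
        ("Apple", "苹果公司"),
    ]
    table = build_mapping_table(pairs)
    print(candidates(table, "苹果"))      # ambiguous mention: two candidates
    print(candidates(table, "apple"))     # alias resolved after normalization
    print(candidates(table, "未知实体"))  # no entry: NIL
```

An ambiguous mention such as 苹果 yields several candidates, which is exactly where the later disambiguation strategies are needed to pick one.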
The LDA model is applied to topic-model-based named entity disambiguation. The overall method combines a disambiguation algorithm based on the entity mapping table, one based on the TF-IDF model, one based on the tag information of Weibo entities, and one based on Fast-Newman clustering. The main contributions of this thesis are:
1. Building an entity mapping table and organizing a Wikipedia entry knowledge base.
2. Proposing a named entity disambiguation algorithm based on a topic model.
3. Proposing a multi-level, multi-strategy entity disambiguation algorithm.
4. Developing a program for Chinese Weibo entity recognition and the encyclopedic knowledge base, and applying for a software copyright for it.
The data used in this thesis come from the Chinese microblog entity linking tasks of the 2nd and 3rd Natural Language Processing & Chinese Computing conferences (NLP&CC 2013 and 2014). In the 2013 evaluation, the knowledge base contained 44,492 entities and 1,274 entities were to be tested; in the 2014 evaluation, the knowledge base contained 378,207 entities and 607 entities were to be tested. In 2013, our accuracy was 84.99%, ranking sixth among the 18 result sets submitted nationwide. In 2014, our accuracy was 84.02% and our team ranked third. After subsequent analysis and improvement, the model and algorithms in this thesis reach an accuracy of 91.40%.
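The TF-IDF-based disambiguation strategy mentioned above can be illustrated by ranking each candidate entry by the cosine similarity between the TF-IDF vector of the Weibo post and that of the candidate's page text. This is a minimal sketch under assumptions, not the thesis's exact formulation: texts are assumed to be already word-segmented into token lists, IDF is computed only over the post plus its candidate pages, and the smoothed IDF formula ln((1 + N) / (1 + df)) + 1 is the one scikit-learn's `TfidfVectorizer` uses by default, chosen here for convenience. The function names and example tokens are hypothetical.

```python
import math
from collections import Counter


def idf_table(documents):
    """Smoothed IDF, ln((1 + N) / (1 + df)) + 1, over tokenized documents."""
    n = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(doc))  # count each term once per document
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(tokens, idf):
    """Raw term frequency times IDF; terms missing from the IDF table are skipped."""
    tf = Counter(tokens)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(post_tokens, candidate_pages):
    """Return (entry, score) pairs sorted by descending similarity to the post."""
    docs = [post_tokens] + list(candidate_pages.values())
    idf = idf_table(docs)
    post_vec = tfidf_vector(post_tokens, idf)
    scores = {
        entry: cosine(post_vec, tfidf_vector(tokens, idf))
        for entry, tokens in candidate_pages.items()
    }
    # Ties are broken by entry title so the ranking is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Hypothetical segmented post and candidate page texts.
    post = ["新款", "手机", "发布", "苹果"]
    pages = {
        "苹果公司": ["苹果", "公司", "手机", "电脑", "发布"],
        "苹果 (水果)": ["苹果", "水果", "蔷薇科", "果实"],
    }
    for entry, score in rank_candidates(post, pages):
        print(entry, round(score, 3))
```

In a multi-strategy pipeline, a ranking like this would typically be applied only to mentions that the mapping table leaves ambiguous, with a score threshold deciding whether to return NIL; how the thesis combines its strategies is not specified here.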
Keywords/Search Tags: Chinese Micro-blog, Named Entity Disambiguation, Entity Linking, Wikipedia Entry Knowledge Base