A Research Of Timeline Based On Personal Micro-blog Characteristics

Posted on: 2016-12-05
Degree: Master
Type: Thesis
Country: China
Candidate: C Chen
Full Text: PDF
GTID: 2298330452971391
Subject: Computer technology
Abstract/Summary:
With the rapid development of the information age, people are eager to learn about the people and things that interest them. Enterprises want to understand their competitors, employees want to know about their managers, fans want to follow their favorite stars, and people want to know how their relatives and friends are doing. Micro-blogging emerged to carry these personal interests.

Micro-blogging is a rapidly developing social networking tool. Through the web, mobile clients, and other channels, users can publish their daily lives and share them with friends. A personal micro-blog post is limited to 140 characters (based on Sina micro-blog) and can carry a title, emoticons, URLs, pictures, and other information. This openness has driven growth in the number of users.

Micro-blog posts are short, carry little information individually, and arrive in great volume, which causes severe data sparsity in text processing. Posts are updated in near real time, which makes micro-blog processing complex from an engineering standpoint. The text is non-standard, with informal grammar, colloquial and everyday language, and spelling errors, all of which add difficulty to event extraction. Posts on the same topic overlap one another and show a long-tail phenomenon, which leads to a seriously uneven data distribution. Together, these characteristics increase the complexity and difficulty of event extraction. As the volume of historical micro-blog information and the number of posts keep growing, they make it even harder to find out about the people and things we are interested in.

Micro-blog event detection differs greatly from traditional text event detection, so extraction algorithms designed for traditional text cannot be applied directly to micro-blog events. Building on traditional text extraction algorithms and adding micro-blog features, this thesis studies micro-blog event extraction and proposes an event extraction algorithm based on micro-blog characteristics.

Because traditional event extraction methods do not fully consider these characteristics, the thesis refines them and incorporates forwarding, comments, likes, labels, URL titles, and other features into an improved TF-IDF, which is then used to extract keywords. Keywords are differentiated by the micro-blog characteristic they come from, and three kinds of keywords are defined on that basis. The extracted keywords are then broken down by characteristic, a similarity is computed for each refined characteristic in turn, and the results are combined into a comprehensive similarity. Finally, using the comprehensive similarity together with the time feature of micro-blogs, an improved clustering algorithm produces the event extraction results. The experimental results demonstrate the effectiveness of the proposed algorithm.
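The abstract does not give formulas, so the pipeline it describes can only be illustrated loosely. The Python sketch below is one possible reading, not the thesis's actual method: it boosts label terms inside TF-IDF, scales scores by forwarding, comment, and like counts, mixes text and label similarity into a single score damped by time distance, and groups posts with a simple single-pass clustering. The `Post` fields, the function names, and every constant (`label_boost`, `label_w`, `tau`, `threshold`) are assumptions made for illustration.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Post:
    tokens: list        # tokenized body text
    labels: list        # label (hashtag-style) tokens
    reposts: int = 0    # forwarding count
    comments: int = 0
    likes: int = 0
    hour: float = 0.0   # posting time, in hours


def engagement(p):
    # Assumed weighting: log-damped sum of forwarding, comments, and likes.
    return 1.0 + math.log1p(p.reposts + p.comments + p.likes)


def weighted_tfidf(posts, label_boost=2.0):
    """Per-post TF-IDF vectors with label terms boosted (assumed scheme)."""
    n = len(posts)
    df = Counter()
    for p in posts:
        df.update(set(p.tokens) | set(p.labels))
    vectors = []
    for p in posts:
        tf = Counter(p.tokens)
        for t in p.labels:
            tf[t] += label_boost          # a label term counts extra
        total = sum(tf.values()) or 1.0
        e = engagement(p)
        vectors.append({
            t: e * (c / total) * (math.log((1 + n) / (1 + df[t])) + 1.0)
            for t, c in tf.items()
        })
    return vectors


def top_keywords(posts, k=3):
    """Rank terms by their summed, engagement-scaled TF-IDF score."""
    scores = Counter()
    for vec in weighted_tfidf(posts):
        scores.update(vec)
    return [t for t, _ in scores.most_common(k)]


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def combined_similarity(pa, pb, va, vb, label_w=0.3, tau=24.0):
    """Text and label similarity mixed, then damped by time distance."""
    union = set(pa.labels) | set(pb.labels)
    label_sim = len(set(pa.labels) & set(pb.labels)) / len(union) if union else 0.0
    time_factor = math.exp(-abs(pa.hour - pb.hour) / tau)
    return ((1 - label_w) * cosine(va, vb) + label_w * label_sim) * time_factor


def cluster_events(posts, threshold=0.2):
    """Single-pass clustering in time order; returns lists of post indices."""
    vecs = weighted_tfidf(posts)
    clusters = []
    for i in sorted(range(len(posts)), key=lambda j: posts[j].hour):
        best, best_sim = None, threshold
        for c in clusters:
            sim = max(combined_similarity(posts[i], posts[j], vecs[i], vecs[j])
                      for j in c)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters
```

Because cosine similarity ignores vector length, the engagement factor in this sketch only affects keyword ranking, not clustering. Whether the thesis uses engagement in the similarity step as well is not stated in the abstract.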
Keywords/Search Tags: Micro-blog, Improvement of TF-IDF, Event detection