Microblogging Hot Topic Discovery Based on Tolerance Rough Sets

Posted on: 2015-01-02
Degree: Master
Type: Thesis
Country: China
Candidate: J Guo
Full Text: PDF
GTID: 2268330422967799
Subject: Computer application technology
Abstract/Summary:
A hot topic is a topic, in any domain, that arises within a specific period and attracts wide public attention. A micro-blog hot topic is a hot topic on a micro-blog platform. A micro-blog is a highly interactive platform for sharing and disseminating information, built on relationships between users. Because every micro-blog user is both a recipient and an originator of information, content spreads quickly across the Internet, and unexpected hot events often surface through it. Hot topic detection not only helps people quickly learn what society is paying attention to in a given period, but also helps administrators identify public opinion in a timely manner and guide it appropriately.

Micro-blogs are open and interactive, and a vast amount of content appears every day. Micro-blog posts are short, carry little information, and use non-standard wording, so traditional hot topic extraction methods cannot be applied directly to micro-blog hot topic detection. In view of this, the work of this thesis is as follows:

First, based on the way information spreads on micro-blogs, the traditional tolerance rough set model is extended, and an extended tolerance rough set model based on micro-blog characteristics is proposed. The traditional tolerance rough set model uses the co-occurrence of attributes to construct the upper and lower approximations of a concept, which reduces or expands the attribute set. Because its tolerance classes are too loose, accuracy and validity are low, and the model is not suitable for direct use in this work. The traditional model is therefore improved using the repost and comment features of micro-blogs.

Second, Sina micro-blog messages are collected as a corpus, the characteristics of the corpus are analyzed, and the messages are represented with a tolerance rough set text representation model based on micro-blog characteristics. Directly applying existing text representation models is found to produce sparse text representations. The tolerance rough set text representation model is therefore used to build the document representation, and the term weighting method is extended.

Third, an incremental agglomerative hierarchical K-means clustering algorithm is proposed to detect micro-blog hot topics. It addresses the problem of selecting initial centroids in K-means and the high complexity of hierarchical clustering algorithms, and the formula for measuring topic heat is improved.
Keywords/Search Tags: micro-blog hot topic, tolerance rough set, IAHC&K-means algorithm, topic heat
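
The abstract describes the traditional tolerance rough set model as building upper and lower approximations from attribute co-occurrence. A minimal sketch of that traditional construction for short texts is given below. It is not the thesis's extended model: the repost and comment features are not specified in the abstract and are left out. The function names, the toy documents, and the threshold `theta` are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to its tolerance class: the term itself plus every
    term that co-occurs with it in at least `theta` documents."""
    cooc = defaultdict(int)
    vocab = set()
    for doc in docs:
        terms = set(doc)
        vocab |= terms
        # Count each unordered pair of distinct terms once per document.
        for a, b in combinations(sorted(terms), 2):
            cooc[(a, b)] += 1
    classes = {t: {t} for t in vocab}
    for (a, b), count in cooc.items():
        if count >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def lower_approximation(doc, classes):
    """Terms whose whole tolerance class is contained in the document."""
    terms = set(doc)
    return {t for t, cls in classes.items() if cls <= terms}


def upper_approximation(doc, classes):
    """Terms whose tolerance class shares at least one term with the document."""
    terms = set(doc)
    return {t for t, cls in classes.items() if cls & terms}


# Hypothetical toy corpus: each document is a list of already-segmented terms.
docs = [
    ["earthquake", "rescue", "sichuan"],
    ["earthquake", "rescue"],
    ["earthquake", "sichuan"],
    ["concert", "tickets"],
]
classes = tolerance_classes(docs, theta=2)

print(sorted(upper_approximation(["rescue"], classes)))
# ['earthquake', 'rescue']
print(sorted(lower_approximation(["earthquake", "rescue"], classes)))
# ['rescue']
```

The upper approximation enriches a one-term post with a related term, which is how this family of models reduces sparsity in short texts. A low `theta` makes tolerance classes large and loose, which matches the weakness of the traditional model noted in the abstract.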