
Research And Implementation On UGC Data Mining

Posted on: 2014-02-07    Degree: Master    Type: Thesis
Country: China    Candidate: Z P Li    Full Text: PDF
GTID: 2248330398972302    Subject: Electronic and communication engineering
Abstract/Summary:
Social networking is attracting growing attention, especially the analysis of user-generated content (UGC) with data-mining methods. We select Twitter as the data source and design and build a demo system suited to a specific organization or institution, with the aim of extracting the information relevant to it. This article focuses on three parts: the crawler, which is based on information retrieval theory; the classifier, which filters out irrelevant data; and the clustering component, which supports many of the system's mining functions, especially automatic summarization.

First, the article describes how to design and implement a method for retrieving relevant, high-quality data. We approach the problem from two perspectives, keywords and users, and combine them into a collaborative retrieval method. For the keywords, we dynamically update them after CHI feature selection. For the users, we propose two different concepts to judge a user's importance: activity and authority.

Second, we put forward a new method to filter out irrelevant information. To build a Bayesian classifier, we fuse three kinds of features: keywords and their sequence locations, high-frequency characters, and the distribution ratios of symbols. A transformation of the feature weights is applied for better performance. The classifier achieves high recall and precision rates.

Third, the article develops several important functions based on clustering, including trend detection, active-user mining, user network construction, and automatic summarization. In the summarization process, the WAF method is introduced to define the Concept Space, within which similar sentences are clustered together. We then select representative sentences to generate the summary. With the help of a definition of novelty, a method for updating the summary is also realized.
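As a rough illustration of the keyword-update step, the sketch below scores terms with the standard chi-square (CHI) statistic over a 2x2 term/class contingency table and keeps the highest-scoring terms that favor the relevant class. The abstract does not give the exact formula or update rule, so the function names (`chi_square_scores`, `top_keywords`), the binary relevant/irrelevant labels, the document-level presence counts, and the positive-association filter are all assumptions made for illustration.

```python
from collections import Counter


def chi_square_scores(docs, labels):
    """Score each term by the chi-square statistic against a binary label.

    docs   -- list of token lists (e.g. tokenized tweets); hypothetical input
    labels -- list of bools, True meaning "relevant"; hypothetical input
    Returns {term: (chi2, favors_relevant)}.
    """
    n = len(docs)
    n_pos = sum(1 for is_rel in labels if is_rel)
    n_neg = n - n_pos

    # Document frequencies per class (presence, not raw counts).
    pos_df, neg_df = Counter(), Counter()
    for tokens, is_rel in zip(docs, labels):
        (pos_df if is_rel else neg_df).update(set(tokens))

    scores = {}
    for term in set(pos_df) | set(neg_df):
        a = pos_df[term]   # relevant docs containing the term
        b = neg_df[term]   # irrelevant docs containing the term
        c = n_pos - a      # relevant docs without the term
        d = n_neg - b      # irrelevant docs without the term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0
        # Chi-square is symmetric, so record which class the term favors.
        scores[term] = (chi2, a * d > b * c)
    return scores


def top_keywords(docs, labels, k):
    """Return the k highest-scoring terms that favor the relevant class."""
    scores = chi_square_scores(docs, labels)
    candidates = [(chi2, term) for term, (chi2, pos) in scores.items() if pos]
    candidates.sort(key=lambda item: (-item[0], item[1]))
    return [term for _, term in candidates[:k]]
```

In a crawler loop, one plausible use would be to re-run `top_keywords` on each newly labeled batch and use the result as the next set of search terms; whether the thesis does exactly this is not stated in the abstract.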
Keywords/Search Tags: twitter, ugc, information retrieval, classification, summarization